Commit 244bbf1

docs: Document how to cast values to different data types in inline stream maps (#3338)
1 parent aa6677a commit 244bbf1

2 files changed: 158 additions, 3 deletions

docs/batch.md

Lines changed: 2 additions & 2 deletions
@@ -62,15 +62,15 @@ When using the `jsonl` encoding format, batch files should contain **raw JSON re

 **Correct format** (raw JSON records):

-```jsonl
+```json
 {"id": 1, "name": "Alice", "email": "[email protected]"}
 {"id": 2, "name": "Bob", "email": "[email protected]"}
 {"id": 3, "name": "Charlie", "email": "[email protected]"}
 ```

 **Incorrect format** (Singer RECORD messages):

-```jsonl
+```json
 {"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Alice", "email": "[email protected]"}}
 {"type": "RECORD", "stream": "users", "record": {"id": 2, "name": "Bob", "email": "[email protected]"}}
 {"type": "RECORD", "stream": "users", "record": {"id": 3, "name": "Charlie", "email": "[email protected]"}}

docs/stream_maps.md

Lines changed: 156 additions & 1 deletion
@@ -292,13 +292,168 @@ records being generated.

 The following logic is applied in determining the SCHEMA of the transformed stream:

-1. Calculations which begin with the text `str(`, `float(`, `int(` will be
+1. Calculations which begin with the text `str(`, `float(`, `int(`, or `bool(` will be
    assumed to be belonging to the specified type.
 1. Otherwise, if the property already existed in the original stream, it will be assumed
    to have the same data type as the original stream.
 1. Otherwise, if no type is detected using the above rules, any new stream properties will
    be assumed to be of type `str`.
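The three rules above can be read as a small decision procedure. The following is a minimal sketch of that reading, not the SDK's actual implementation: the helper name `infer_schema_type`, the `CAST_PREFIXES` table, and the `original_type` parameter are all hypothetical, and the prefix-to-type mapping is taken from the casting table in this commit.

```python
from typing import Optional

# Prefix -> JSON Schema type for the four casts named in rule 1.
CAST_PREFIXES = {
    "str(": "string",
    "float(": "number",
    "int(": "integer",
    "bool(": "boolean",
}


def infer_schema_type(expression: str, original_type: Optional[str] = None) -> str:
    """Hypothetical helper that applies the three documented rules in order."""
    # Rule 1: an expression that begins with a cast call takes that type.
    for prefix, json_type in CAST_PREFIXES.items():
        if expression.startswith(prefix):
            return json_type
    # Rule 2: otherwise, reuse the type of a property that already existed.
    if original_type is not None:
        return original_type
    # Rule 3: otherwise, a new property is assumed to be a string.
    return "string"


print(infer_schema_type("int(count - 1)"))                 # integer
print(infer_schema_type("count - 1", "integer"))           # integer
print(infer_schema_type("count - 1"))                      # string
print(infer_schema_type("None if x is None else int(x)"))  # string
```

The last call shows why the position of the cast matters under this reading: the expression contains `int(` but does not begin with it, so rule 1 does not fire.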

+#### Type Casting
+
+Stream maps support explicit type casting to convert values to different data types. By wrapping expressions with type casting functions, you can ensure proper schema detection and data type conversion.
+
+##### Supported Type Casting Functions
+
+The following type casting functions are available in stream mapping expressions:
+
+| Function | Target Type | JSON Schema Type | Description |
+| :------- | :---------- | :--------------- | :---------- |
+| `int()` | Integer | `integer` | Converts a value to an integer type |
+| `float()` | Decimal/Number | `number` | Converts a value to a floating-point number |
+| `str()` | String | `string` | Converts a value to a string type |
+| `bool()` | Boolean | `boolean` | Converts a value to a boolean type |
+
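Since the four functions in the table are ordinary Python built-ins, their conversion rules are Python's own. The sketch below runs them in plain Python, outside any stream map, on a few made-up inputs to show the cases that tend to surprise: `int()` truncates floats, rejects decimal strings, and `bool()` treats every non-empty string as true.

```python
# Results of Python's built-in casts on a few sample inputs.
examples = {
    "int('42')": int("42"),
    "int(3.9)": int(3.9),            # truncates toward zero
    "float('1.5')": float("1.5"),
    "str(10)": str(10),
    "bool('false')": bool("false"),  # any non-empty string is truthy
    "bool('')": bool(""),
    "bool(0)": bool(0),
}

for expression, result in examples.items():
    print(f"{expression} -> {result!r}")

# A decimal string cannot be passed straight to int(); go through float() first.
try:
    int("3.9")
except ValueError:
    via_float = int(float("3.9"))
    print(f"int(float('3.9')) -> {via_float!r}")
```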
+##### Type Casting Examples
+
+**Converting string values to integers:**
+
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  mystream:
+    # Convert a string literal to integer
+    int_test: int('0')
+    # Convert a calculation result to integer
+    fixed_count: int(count - 1)
+    # Extract year from date as integer
+    create_year: int(datetime.date.fromisoformat(create_date).year)
+```
+````
+
+````{tab} JSON
+```json
+{
+  "stream_maps": {
+    "mystream": {
+      "int_test": "int('0')",
+      "fixed_count": "int(count - 1)",
+      "create_year": "int(datetime.date.fromisoformat(create_date).year)"
+    }
+  }
+}
+```
+````
+
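To see what the three integer expressions above produce, they can be evaluated against a sample record in plain Python. This is only an illustration: the record values are made up, and Python's `eval()` stands in for the SDK's own expression evaluator, which may differ in what it allows.

```python
import datetime

# Hypothetical sample record; the field names come from the expressions above.
record = {"count": 5, "create_date": "2021-03-15"}

expressions = {
    "int_test": "int('0')",
    "fixed_count": "int(count - 1)",
    "create_year": "int(datetime.date.fromisoformat(create_date).year)",
}

# Plain eval() is a stand-in for the SDK's expression evaluator.
names = {"datetime": datetime, **record}
mapped = {prop: eval(expr, {}, names) for prop, expr in expressions.items()}

print(mapped)  # {'int_test': 0, 'fixed_count': 4, 'create_year': 2021}
```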
+**Converting values to floats:**
+
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  mystream:
+    # Convert timestamp to float
+    joined_timestamp: float(datetime.datetime.fromisoformat(joined_at).timestamp())
+```
+````
+
+````{tab} JSON
+```json
+{
+  "stream_maps": {
+    "mystream": {
+      "joined_timestamp": "float(datetime.datetime.fromisoformat(joined_at).timestamp())"
+    }
+  }
+}
+```
+````
+
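One detail of the timestamp expression is worth checking in plain Python: `datetime.timestamp()` interprets a value without a UTC offset as local time, so the result for such input depends on the machine's time zone. The sketch below uses made-up `joined_at` values and assumes the source emits ISO 8601 strings; the `replace(tzinfo=...)` step is one possible way to pin naive values to UTC, not something the commit prescribes.

```python
import datetime

# Hypothetical input that carries an explicit UTC offset.
joined_at = "2024-01-01T00:00:00+00:00"
joined_timestamp = float(datetime.datetime.fromisoformat(joined_at).timestamp())
print(joined_timestamp)  # 1704067200.0

# A naive value (no offset) is read as local time by timestamp().
# Attaching UTC explicitly makes the result independent of the host time zone.
naive = datetime.datetime.fromisoformat("2024-01-01T00:00:00")
pinned_timestamp = float(naive.replace(tzinfo=datetime.timezone.utc).timestamp())
print(pinned_timestamp)  # 1704067200.0
```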
+**Converting values to strings:**
+
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  repositories:
+    # Explicitly cast to string (useful for schema detection)
+    description: str('[masked]')
+```
+````
+
+````{tab} JSON
+```json
+{
+  "stream_maps": {
+    "repositories": {
+      "description": "str('[masked]')"
+    }
+  }
+}
+```
+````
+
+**Converting values to booleans:**
+
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  mystream:
+    # Convert to boolean with null handling
+    is_active: bool(status_value) if status_value else None
+```
+````
+
+````{tab} JSON
+```json
+{
+  "stream_maps": {
+    "mystream": {
+      "is_active": "bool(status_value) if status_value else None"
+    }
+  }
+}
+```
+````
+
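It may help to trace the boolean expression above for a few inputs. Because the condition tests truthiness, every falsy input (`0`, `""`, `False`, `None`) takes the `else None` branch, so the expression as written yields `True` or `None` and never `False`. The sketch below reproduces that logic in plain Python; `keep_false` is a hypothetical variant, not part of the commit, that maps only a missing value to `None`.

```python
def as_written(status_value):
    # Same logic as the mapping expression above.
    return bool(status_value) if status_value else None


def keep_false(status_value):
    # Hypothetical variant: only None becomes None, falsy values become False.
    return bool(status_value) if status_value is not None else None


for value in (1, "yes", 0, "", None):
    print(f"{value!r}: as_written={as_written(value)}, keep_false={keep_false(value)}")
```

Both forms begin with `bool(`, so either would fall under the first schema detection rule described earlier in this file.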
+##### When to Use Type Casting
+
+Type casting is particularly useful in the following scenarios:
+
+1. **Schema Detection Hints**: When creating new calculated fields, wrap the expression in a type casting function to ensure the SDK correctly detects the output type in the schema.
+
+1. **Type Conversion**: When you need to convert a value from one type to another (e.g., converting a numeric string to an actual integer).
+
+1. **Consistent Data Types**: When working with data that may have inconsistent types in the source, you can enforce a specific type in the output.
+
+**Example combining type casting with conditional logic:**
+
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  nested_jellybean:
+    # Extract custom field value and cast to integer, handling null values
+    custom_field_2: >-
+      int(dict([(x["id"], x["value"]) for x in custom_fields]).get(2))
+      if dict([(x["id"], x["value"]) for x in custom_fields]).get(2)
+      else None
+```
+````
+
+````{tab} JSON
+```json
+{
+  "stream_maps": {
+    "nested_jellybean": {
+      "custom_field_2": "int(dict([(x[\"id\"], x[\"value\"]) for x in custom_fields]).get(2)) if dict([(x[\"id\"], x[\"value\"]) for x in custom_fields]).get(2) else None"
+    }
+  }
+}
+```
+````
+
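The combined expression above is dense, so here is the same logic unrolled into a plain Python function. The function name and the sample `custom_fields` lists are made up for illustration; the `id` and `value` keys come from the expression itself. A stream map entry is a single expression, which is why the lookup is written twice there, while the function form can build it once.

```python
def custom_field_2(custom_fields):
    """Hypothetical unrolled form of the mapping expression above."""
    # Build an id -> value lookup from the list of custom field objects.
    lookup = dict([(x["id"], x["value"]) for x in custom_fields])
    value = lookup.get(2)
    # Cast only when a truthy value is present; otherwise emit None.
    return int(value) if value else None


print(custom_field_2([{"id": 1, "value": "a"}, {"id": 2, "value": "7"}]))  # 7
print(custom_field_2([{"id": 1, "value": "a"}]))                           # None
print(custom_field_2([{"id": 2, "value": None}]))                          # None
```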
+:::{note}
+Type casting functions are standard Python built-ins that are available in the stream mapping expression evaluator. The SDK performs static type detection by examining the beginning of the expression string, so the type casting function must appear at the start of the expression for proper schema detection.
+:::
+
 ## Customized `stream_map` Behaviors

 ### Removing a single stream or property
