19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,19 @@
+[v0.1.0 (yyyy-mm-dd)](https://github.com/civitaspo/embulk-filter-join_file/tree/v0.1.0)
+===================
+
+### Breaking Change
+* Change options interface
+  * This change is not backward compatible
+* Support JSON Type
+* Use Embulk v0.8.x
+* Write tests
+
+[v0.0.2 (2015-11-25)](https://github.com/civitaspo/embulk-filter-join_file/tree/v0.0.2)
+===================
+* Fix a bug
+  * Do not call pageBuilder#finish inside pageOutput#add https://github.com/civitaspo/embulk-filter-join_file/pull/2
+
+
+[v0.0.1 (2015-10-11)](https://github.com/civitaspo/embulk-filter-join_file/tree/v0.0.1)
+===================
+* First Release
81 changes: 24 additions & 57 deletions README.md
@@ -8,21 +8,18 @@ This plugin combine rows from file having data format like a table, based on a c

## Configuration

-- **base_column**: a column name of data embulk loaded (hash, required)
-  - **name**: name of the column
-  - **type**: type of the column (see below)
-  - **format**: format of the timestamp if type is timestamp
-- **counter_column**: a column name of data loaded from file (string, default: `{name: id, type: long}`)
-  - **name**: name of the column
-  - **type**: type of the column (see below)
-  - **format**: format of the timestamp if type is timestamp
-- **joined_column_prefix**: prefix added to joined data columns (string, default: `"_joined_by_embulk_"`)
-- **file_path**: path of file (string, required)
-- **file_format**: file format (string, required, supported: `csv`, `tsv`, `yaml`, `json`)
-- **columns**: required columns of data from the file (array of hash, required)
-  - **name**: name of the column
-  - **type**: type of the column (see below)
-  - **format**: format of the timestamp if type is timestamp
+* **on**:
+  * **in_column**: name of the column on input. (string, required)
+  * **file_column**: name of the column on file. (string, default is the same as **in_column**)
+* **file**:
+  * **path**: path of file (string, required)
+  * **format**: file format (string, required, supported: `json`)
+  * **encode**: file encoding (string, default is `raw`, supported: `raw`, `gzip`)
+  * **columns**: required columns of data from the file (array of hash, required)
+    * **name**: name of the column
+    * **type**: type of the column (see below)
+    * **format**: format of the timestamp if type is timestamp
+    * **timezone**: timezone of the timestamp if type is timestamp

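The `encode` option distinguishes plain (`raw`) files from `gzip`-compressed ones. As a rough illustration only (written in Python, not taken from the plugin's Java code), reading such a file could look like the sketch below. The function name `load_join_file` is hypothetical, and the sketch assumes the file holds a single UTF-8 JSON document.

```python
import gzip
import json


def load_join_file(path, encode="raw"):
    """Read a JSON join file, decompressing it when encode is 'gzip'.

    Hypothetical helper for illustration; not part of the plugin.
    """
    if encode == "gzip":
        opener = gzip.open
    elif encode == "raw":
        opener = open
    else:
        raise ValueError(f"unsupported encode: {encode!r}")
    # Text mode with an explicit encoding works for both open() and gzip.open().
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```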
---
**type of the column**
@@ -34,20 +31,24 @@ This plugin combine rows from file having data format like a table, based on a c
|timestamp|Date and time with nano-seconds precision|
|double|64-bit floating point numbers|
|string|Strings|
+|json|JSON|

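To make the role of the declared column type more concrete, here is a minimal Python sketch of casting a raw value from the file to one of the types used in this README (`long`, `double`, `string`, `json`, `timestamp`). It is an illustration under assumptions, not the plugin's implementation: `cast_value` is a hypothetical name, `format` is treated as a Python `strptime` pattern, and timestamps without zone information are treated as UTC rather than honoring the `timezone` option.

```python
import json
from datetime import datetime, timezone


def cast_value(value, column):
    """Convert a raw value to the type declared in a column definition.

    Hypothetical helper for illustration; not part of the plugin.
    """
    if value is None:
        return None
    col_type = column["type"]
    if col_type == "long":
        return int(value)
    if col_type == "double":
        return float(value)
    if col_type == "string":
        return str(value)
    if col_type == "json":
        # Accept an already-parsed structure or a JSON-encoded string.
        return json.loads(value) if isinstance(value, str) else value
    if col_type == "timestamp":
        # Assumption: `format` is a strptime-style pattern; naive results become UTC.
        parsed = datetime.strptime(value, column["format"])
        return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f"unsupported column type: {col_type!r}")
```

For instance, `cast_value("5", {"name": "id", "type": "long"})` yields the integer `5`, while an unknown type raises `ValueError` instead of passing the value through silently.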
## Example

```yaml
 filters:
   - type: join_file
-    base_column: {name: name_id, type: long}
-    counter_column: {name: id, type: long}
+    on:
+      in_column: name_id
+      file_column: id
+    file:
+      path: ./master.json
+      format: json
+      encode: raw
+      columns:
+        - {name: id, type: long}
+        - {name: name, type: string}
     joined_column_prefix: _joined_by_embulk_
-    file_path: master.json
-    file_format: json
-    columns:
-      - {name: id, type: long}
-      - {name: name, type: string}
```

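The join described by the `on` and `file` options in the example can be sketched in a few lines of Python. This is an illustration of the idea, not the plugin's Java code: `join_rows` is a hypothetical name, the master records reuse the sample ids and names from this README, and two behaviors are assumptions: joined columns are named by prepending `joined_column_prefix`, and input rows without a match receive `None` for every joined column.

```python
def join_rows(rows, master, in_column, file_column, columns,
              prefix="_joined_by_embulk_"):
    """Left-join input rows with file records on in_column == file_column.

    Hypothetical helper for illustration; not part of the plugin.
    """
    # Index the file records by join key so each lookup is a dict access.
    index = {record[file_column]: record for record in master}
    names = [column["name"] for column in columns]
    joined = []
    for row in rows:
        match = index.get(row.get(in_column))
        out = dict(row)
        for name in names:
            # Assumption: unmatched rows get None for every joined column.
            out[prefix + name] = match.get(name) if match else None
        joined.append(out)
    return joined


master = [
    {"id": 0, "name": "civitaspo"},
    {"id": 2, "name": "mori.ogai"},
    {"id": 5, "name": "natsume.soseki"},
]
rows = [{"name_id": 2}, {"name_id": 9}]
columns = [{"name": "id", "type": "long"}, {"name": "name", "type": "string"}]
result = join_rows(rows, master, "name_id", "id", columns)
```

In this sketch the row with `name_id: 2` gains `_joined_by_embulk_id: 2` and `_joined_by_embulk_name: "mori.ogai"`, while the row with `name_id: 9` has no counterpart in the file and gets `None` for both joined columns.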
## Run Example
@@ -58,44 +59,10 @@ $ embulk run -I lib example/config.yml
```

## Supported Data Format
-- csv ( **not implemented** )
-- tsv ( **not implemented** )
-- yaml ( **not implemented** )
-- json
+* json

### Supported Data Format Example

-#### CSV
-
-```csv
-id,name
-0,civitaspo
-2,mori.ogai
-5,natsume.soseki
-```
-
-#### TSV
-
-Since the representation is difficult, it represents the tab as `\t`.
-
-```tsv
-id\tname
-0\tcivitaspo
-2\tmori.ogai
-5\tnatsume.soseki
-```
-
-#### YAML
-
-```
-- id: 0
-  name: civitaspo
-- id: 2
-  name: mori.ogai
-- id: 5
-  name: natsume.soseki
-```
-
#### JSON

```
6 changes: 2 additions & 4 deletions build.gradle
Expand Up @@ -13,14 +13,13 @@ configurations {
provided
}

-version = "0.0.2"
+version = "0.1.0"
sourceCompatibility = 1.7
targetCompatibility = 1.7

dependencies {
compile "org.embulk:embulk-core:0.8.+"
provided "org.embulk:embulk-core:0.8.+"
-// compile "YOUR_JAR_DEPENDENCY_GROUP:YOUR_JAR_DEPENDENCY_MODULE:YOUR_JAR_DEPENDENCY_VERSION"
testCompile "junit:junit:4.+"
}

@@ -82,11 +81,10 @@ Gem::Specification.new do |spec|
spec.test_files = spec.files.grep(%r"^(test|spec)/")
spec.require_paths = ["lib"]

-#spec.add_dependency 'YOUR_GEM_DEPENDENCY', ['~> YOUR_GEM_DEPENDENCY_VERSION']
spec.add_development_dependency 'bundler', ['~> 1.0']
spec.add_development_dependency 'rake', ['>= 10.0']
end
/$)
}
}
-clean { delete "${project.name}.gemspec" }
+clean { delete "${project.name}.gemspec" }