5 changes: 5 additions & 0 deletions .gitignore
@@ -6,4 +6,9 @@
/classpath/
build/
.idea
/.settings/
/.metadata/
.classpath
.project
*.iml
out
76 changes: 37 additions & 39 deletions README.md
@@ -9,20 +9,35 @@ This plugin combine rows from file having data format like a table, based on a c
## Configuration

* **on**:
* **in_column**: name of the column on input. (string, required)
* **file_column**: name of the column on file. (string, default is the same as **in_column**)
* **input_column**: name of the column on input. (string, required)
* **file_column**: name of the column on file. (string, required)
* **file**:
* **path**: path of file (string, required)
* **format**: file format (string, required, supported: `json`)
* **encode**: file encode (string, default is `raw`, supported: `raw`, `gzip`)
* **path_prefix**: Path prefix of input files (string, required)
* **parser**: Parser configurations except **columns** option (see below [Supported Parser Type](#supported-parser-type)) (hash, required)
* **decoders**: Decoder configuration (see below [Supported Decorder Type](#supported-decorder-type)) (array of hash, optional)
* **follow_symlinks**: If true, follow symbolic link directories (boolean, default: `false`)
* **columns**: required columns of data from the file (array of hash, required)
* **name**: name of the column
* **type**: type of the column (see below)
* **type**: type of the column (see below [Type of the column](#type-of-the-column))
* **format**: format of the timestamp if type is timestamp
* **timezone**: timezone of the timestamp if type is timestamp
* **timezone**: timezone of the timestamp if type is timestamp
* **column_prefix**: column name prefix added to file `columns` to prevent duplicate column names (string, default: `"_join_by_embulk_"`)
* **parser_plugin_columns_option**: Set the **file.columns** value as this option name into **file.parser** options. (optional, default: `"columns"`, see [Supported Parser Type](#supported-parser-type) in details.)

---
**type of the column**
### Supported Parser Type

* You can use all embulk file-parser plugins.
* [built-in parser plugins](http://www.embulk.org/docs/built-in.html)
* [parser plugins](http://www.embulk.org/plugins/#file-parser).
* You don't need to define an option like **columns** in the **file.parser** options, because the **file.columns** value is set as the **columns** option of **file.parser**. If you set **file.parser_plugin_columns_option**, this plugin instead sets the **file.columns** value in the **file.parser** options under the option name given there.

### Supported Decorder Type

* You can use all embulk file-decoder plugins.
* [built-in decoder plugins](http://www.embulk.org/docs/built-in.html)
* [decoder plugins](http://www.embulk.org/plugins/#file-decoder)

### Type of the column

|name|description|
|:---|:---|
@@ -39,47 +54,30 @@ This plugin combine rows from file having data format like a table, based on a c
filters:
- type: join_file
on:
in_column: name_id
input_column: id
file_column: id
file:
path: ./master.json
format: json
encode: raw
path_prefix: ./example/json_array_of_hash/*.json
parser:
type: jsonpath
root: "$."
columns:
- {name: id, type: long}
- {name: name, type: string}
joined_column_prefix: _joined_by_embulk_
- {name: created_at, type: timestamp, format: "%Y-%m-%d"}
- {name: point, type: double}
- {name: time_zone, type: string}
column_prefix: _join_by_embulk_
```

See [more examples](./example).

## Run Example

```
$ ./gradlew classpath
$ embulk run -I lib example/config.yml
```

## Supported Data Format
* json

### Supported Data Format Example

#### JSON

```
[
{
"id": 0,
"name": "civitaspo"
},
{
"id": 2,
"name": "moriogai"
},
{
"id": 5,
"name": "natsume.soseki"
}
]
$ embulk bundle install --gemfile=example/Gemfile --path vendor/bundle
$ embulk run -b example -Ilib example/config.yml
```

## Build
24 changes: 17 additions & 7 deletions build.gradle
@@ -14,12 +14,19 @@ configurations {
}

version = "0.1.0"
sourceCompatibility = 1.7
targetCompatibility = 1.7

sourceCompatibility = 1.8
targetCompatibility = 1.8

dependencies {
compile "org.embulk:embulk-core:0.8.+"
provided "org.embulk:embulk-core:0.8.+"
compile "org.embulk:embulk-core:0.8.29"
provided "org.embulk:embulk-core:0.8.29"
compile "org.embulk:embulk-standards:0.8.29"
provided "org.embulk:embulk-standards:0.8.29"
compile "com.google.guava:guava:23.0"
provided "com.google.guava:guava:23.0"
compile "org.komamitsu:fluency:1.4.0"
compile "com.okumin:influent-java:0.3.0"
testCompile "junit:junit:4.+"
}

@@ -46,6 +53,7 @@ task checkstyle(type: Checkstyle) {
classpath = sourceSets.main.output + sourceSets.test.output
source = sourceSets.main.allJava + sourceSets.test.allJava
}

task gem(type: JRubyExec, dependsOn: ["gemspec", "classpath"]) {
jrubyArgs "-rrubygems/gem_runner", "-eGem::GemRunner.new.run(ARGV)", "build"
script "${project.name}.gemspec"
@@ -57,9 +65,11 @@ task gemPush(type: JRubyExec, dependsOn: ["gem"]) {
script "pkg/${project.name}-${project.version}.gem"
}

task "package"(dependsOn: ["gemspec", "classpath"]) << {
println "> Build succeeded."
println "> You can run embulk with '-L ${file(".").absolutePath}' argument."
task "package"(dependsOn: ["gemspec", "classpath"]) {
doLast {
println "> Build succeeded."
println "> You can run embulk with '-L ${file(".").absolutePath}' argument."
}
}

task gemspec {
3 changes: 3 additions & 0 deletions example/Gemfile
@@ -0,0 +1,3 @@
source 'https://rubygems.org'

gem 'embulk-parser-jsonpath'
32 changes: 0 additions & 32 deletions example/config.yml

This file was deleted.

37 changes: 37 additions & 0 deletions example/json_array_of_hash/config.yml
@@ -0,0 +1,37 @@
in:
type: file
path_prefix: example/data.csv
parser:
type: csv
charset: UTF-8
newline: CRLF
null_string: 'NULL'
skip_header_lines: 1
comment_line_marker: '#'
columns:
- {name: time, type: timestamp, format: "%Y-%m-%d"}
- {name: id, type: long}
- {name: name, type: string}
- {name: score, type: double}

filters:
- type: join_file
on:
input_column: id
file_column: id
file:
path_prefix: ./example/json_array_of_hash/*.json
parser:
type: jsonpath
root: "$."
columns:
- {name: id, type: long}
- {name: name, type: string}
- {name: created_at, type: timestamp, format: "%Y-%m-%d"}
- {name: point, type: double}
- {name: time_zone, type: string}
column_prefix: _join_by_embulk_


out:
type: stdout
File renamed without changes.
Binary file modified gradle/wrapper/gradle-wrapper.jar
Binary file not shown.
2 changes: 1 addition & 1 deletion gradle/wrapper/gradle-wrapper.properties
@@ -1,4 +1,4 @@
#Wed Jun 21 00:12:09 JST 2017
#Sun Jan 08 00:35:58 PST 2017
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
19 changes: 8 additions & 11 deletions gradlew
@@ -1,4 +1,4 @@
#!/usr/bin/env sh
#!/usr/bin/env bash

##############################################################################
##
@@ -154,19 +154,16 @@ if $cygwin ; then
esac
fi

# Escape application args
save ( ) {
for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
echo " "
# Split up the JVM_OPTS And GRADLE_OPTS values into an array, following the shell quoting and substitution rules
function splitJvmOpts() {
JVM_OPTS=("$@")
}
APP_ARGS=$(save "$@")

# Collect all arguments for the java command, following the shell quoting and substitution rules
eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS"
eval splitJvmOpts $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS
JVM_OPTS[${#JVM_OPTS[*]}]="-Dorg.gradle.appname=$APP_BASE_NAME"

# by default we should be in the correct project dir, but when run from Finder on Mac, the cwd is wrong
if [ "$(uname)" = "Darwin" ] && [ "$HOME" = "$PWD" ]; then
if [[ "$(uname)" == "Darwin" ]] && [[ "$HOME" == "$PWD" ]]; then
cd "$(dirname "$0")"
fi

exec "$JAVACMD" "$@"
exec "$JAVACMD" "${JVM_OPTS[@]}" -classpath "$CLASSPATH" org.gradle.wrapper.GradleWrapperMain "$@"
4 changes: 4 additions & 0 deletions lib/embulk/filter/join_file.rb
@@ -1,3 +1,7 @@
Embulk::JavaPlugin.register_filter(
"join_file", "org.embulk.filter.join_file.JoinFileFilterPlugin",
File.expand_path('../../../../classpath', __FILE__))

Embulk::JavaPlugin.register_output(
"internal_forward", "org.embulk.filter.join_file.plugin.InternalForwardOutputPlugin",
File.expand_path('../../../../classpath', __FILE__))
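
For orientation, the join behaviour that the README changes describe (`on.input_column`, `on.file_column`, `file.column_prefix`) can be sketched in plain Ruby. This is only an illustration, not the plugin's Java implementation: `join_rows` is a hypothetical helper, the sample rows are made up, and two behaviours are assumptions rather than anything this diff confirms: unmatched input rows get `nil` for every joined column, and a later file row overwrites an earlier one with the same key.

```ruby
# Hypothetical helper, not part of the plugin: joins input rows with rows
# loaded from a file, prefixing every file column to avoid name clashes.
def join_rows(input_rows, file_rows, input_column:, file_column:,
              column_prefix: "_join_by_embulk_")
  # Index file rows by join key.
  # Assumption: for duplicate keys, the last file row wins.
  index = {}
  file_rows.each { |row| index[row[file_column]] = row }

  # Every column seen in the file becomes a prefixed output column.
  file_columns = file_rows.flat_map(&:keys).uniq

  input_rows.map do |row|
    matched = index[row[input_column]]
    joined = row.dup
    file_columns.each do |name|
      # Assumption: unmatched rows get nil for every joined column.
      joined["#{column_prefix}#{name}"] = matched ? matched[name] : nil
    end
    joined
  end
end

# Made-up sample data for illustration only.
input_rows = [
  { "id" => 0, "name" => "a" },
  { "id" => 1, "name" => "b" },
]
file_rows = [
  { "id" => 0, "name" => "civitaspo" },
  { "id" => 2, "name" => "moriogai" },
]

result = join_rows(input_rows, file_rows, input_column: "id", file_column: "id")
result.each { |row| puts row.inspect }
```

Because both sides here have a `name` column, the prefix is what keeps the input's `name` and the file's `name` distinct in the joined row.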