
mod-linked-data-import

Copyright (C) 2025 The Open Library Foundation

This software is distributed under the terms of the Apache License, Version 2.0. See the file "LICENSE" for more information.

Introduction

This module provides bulk import functionality for RDF data graphs into the mod-linked-data application. It reads RDF subgraphs in Bibframe 2 format, transforms them into the Builde vocabulary, and delivers them to mod-linked-data via Kafka.

Third party libraries used in this software

This software uses the following Weak Copyleft (Eclipse Public License 1.0 / 2.0) licensed software libraries:

How to Import Data

  1. Upload the RDF file to the S3 bucket specified by the S3_BUCKET environment variable.
  2. Inside that bucket, place the file within the subdirectory corresponding to the target tenant ID.
  3. Trigger the import by calling the following API:
POST /linked-data-import/start?fileUrl={fileNameInS3}&contentType=application/ld+json
x-okapi-tenant: {tenantId}
x-okapi-token: {token}
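
For example, the import could be triggered with curl; the Okapi URL, file name, tenant, and token below are placeholder values:

curl -X POST \
  "https://okapi.example.org/linked-data-import/start?fileUrl=instances.jsonl&contentType=application/ld+json" \
  -H "x-okapi-tenant: diku" \
  -H "x-okapi-token: ${OKAPI_TOKEN}"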

File Format & Contents

  1. The file must be in JSON Lines (jsonl) format.
  2. Each line must contain a complete subgraph of a Bibframe Instance resource, as defined by the Bibframe 2 ontology.

For an example of a valid import file containing two RDF instances, see docs/example-import.jsonl.
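
As a simplified, hypothetical illustration (not taken from that file), a single line could contain one Instance subgraph serialized as JSON-LD, for example:

{"@context": {"bf": "http://id.loc.gov/ontologies/bibframe/"}, "@id": "http://example.org/instance/1", "@type": "bf:Instance", "bf:title": {"@type": "bf:Title", "bf:mainTitle": "Example title"}}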

Limitations

  1. Only RDF data serialized as application/ld+json is supported. Support for additional formats (e.g., XML, N-Triples) may be added in the future.
  2. Only Bibframe Instances and their connected resources can be imported. Standalone resources—such as a Person not linked to any Instance—cannot be processed.

Batch processing

File contents are processed in batches. Batch processing can be configured with the following environment variables (see the example after this list):

  1. CHUNK_SIZE: Number of lines read from the input file per chunk
  2. OUTPUT_CHUNK_SIZE: Number of Graph resources sent to Kafka per chunk
  3. PROCESS_FILE_MAX_POOL_SIZE: Maximum threads used for parallel chunk processing
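
For example, batch behavior could be tuned at deployment time; the values below are illustrative, not recommendations:

# Read 500 lines from the input file per chunk
export CHUNK_SIZE=500
# Send 50 transformed Graph resources to Kafka per chunk
export OUTPUT_CHUNK_SIZE=50
# Process up to 4 chunks in parallel
export PROCESS_FILE_MAX_POOL_SIZE=4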

Interaction with mod-linked-data

mod-linked-data uses the Builde vocabulary for representing graph data.

During import:

  1. This module transforms Bibframe 2 subgraphs into the equivalent Builde subgraph using the lib-linked-data-rdf4ld library.
  2. The transformed subgraphs are published to the Kafka topic specified by the KAFKA_LINKED_DATA_IMPORT_OUTPUT_TOPIC environment variable.
  3. mod-linked-data consumes messages from this topic, performs additional processing, and persists the graph to its database.
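
For troubleshooting, the messages this module publishes can be inspected with the standard Kafka console consumer; the broker address below is an assumption, and the topic name is the module's default (see the Environment variables section):

kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic linked_data_import.output \
  --from-beginning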

Dependencies on libraries

This module is dependent on the following libraries:

Compiling

mvn clean install

Skip tests:

mvn clean install -DskipTests

Environment variables

This module uses S3-compatible storage for import files; both AWS S3 and MinIO Server are supported. The S3_IS_AWS variable specifies which backend is in use: it defaults to false, meaning a MinIO server is used, and must be set to true when AWS S3 is used.

Name                                    Default value               Description
S3_URL                                  http://127.0.0.1:9000/      S3 URL
S3_REGION                               -                           S3 region
S3_BUCKET                               -                           S3 bucket
S3_ACCESS_KEY_ID                        -                           S3 access key
S3_SECRET_ACCESS_KEY                    -                           S3 secret key
S3_IS_AWS                               false                       Whether AWS S3 is used as file storage
CHUNK_SIZE                              1000                        Number of lines read from the input file per chunk
OUTPUT_CHUNK_SIZE                       100                         Number of Graph resources sent to Kafka per chunk
JOB_POOL_SIZE                           1                           Number of concurrent import jobs
PROCESS_FILE_MAX_POOL_SIZE              1000                        Maximum threads used for parallel chunk processing
KAFKA_LINKED_DATA_IMPORT_OUTPUT_TOPIC   linked_data_import.output   Kafka topic where the transformed subgraphs are published for mod-linked-data
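
As a minimal local-deployment sketch using MinIO as file storage (the image name, MinIO endpoint, and credentials are assumptions for illustration, not module defaults):

# Run the module with MinIO as file storage (S3_IS_AWS=false)
docker run -d --name mod-linked-data-import \
  -e S3_URL=http://minio:9000/ \
  -e S3_REGION=us-east-1 \
  -e S3_BUCKET=linked-data-import \
  -e S3_ACCESS_KEY_ID=minioadmin \
  -e S3_SECRET_ACCESS_KEY=minioadmin \
  -e S3_IS_AWS=false \
  -e KAFKA_LINKED_DATA_IMPORT_OUTPUT_TOPIC=linked_data_import.output \
  folioorg/mod-linked-data-import:latest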
