
Releases: RumbleDB/rumble

RumbleDB 2.0.0 Lemon Ironwood

28 Aug 11:33
3a82682


Major release:

  • Support for the JSONiq Update Facility to write to tables managed in the Hive metastore and Delta files
  • Support for the JSONiq Scripting Extension (variable assignments, while loops, applying updates during execution, exit returning, etc)
  • Support for Python with the pip jsoniq package
  • Alpha support for XML (XQuery 3.0)
  • Automatic schema detection upon writing CSV or Parquet files. No need to specify schemas explicitly any more.
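
As a small illustration of the new scripting support, here is a hedged sketch of a variable assignment and a while loop; the concrete query is invented and the syntax follows our reading of the JSONiq Scripting Extension:

```jsoniq
(: sketch of JSONiq scripting: a variable declaration statement,
   a while loop, and assignment statements :)
variable $i := 1;
variable $sum := 0;
while ($i le 10) {
  $sum := $sum + $i;
  $i := $i + 1;
}
$sum  (: 55 :)
```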

Support for Spark 4.0 and Spark 3.5 (Scala 2.13). Note that Amazon EMR does not yet support Spark 4.0 but we expect this to happen soon. EMR 7 should be used with RumbleDB 1.22 because it is on Spark 3.5 and Scala 2.12.

Java 17 or 21 is required for Spark 4.0. Java 11 or 17 is required for Spark 3.5.

Many bug fixes, enhanced schema detection.

Contributors (Ghislain Fourny's students at ETH): Stefan Irimescu, Renato Marroquin, Rodrigo Bruno, Falko Noé, Ioana Stefan, Andrea Rinaldi, Stevan Mihajlovic, Mario Arduini, Can Berker Çıkış, Elwin Stephan, David Dao, Zirun Wang, Ingo Müller, Dan-Ovidiu Graur, Thomas Zhou, Olivier Goerens, Alexandru Meterez, Pierre Motard, Remo Röthlisberger, Dominik Bruggisser, David Loughlin, David Buzatu, Marco Schöb, Maciej Byczko, Abishek Ramdas, Matteo Agnoletto, Dwij Dixit.

Main website: https://www.rumbledb.org
Documentation: https://docs.rumbledb.org
Maven repository: https://central.sonatype.com/artifact/com.github.rumbledb/rumbledb
Javadoc: https://rumbledb.org/docs/latest/api/
Python package: https://pypi.org/project/jsoniq/

RumbleDB 1.23.0 "Mountain ash" beta

26 Mar 14:33
9fcf2e0


Update (July 3, 2025): Spark 4.0 support is available.

Use RumbleDB to query data with JSONiq, even data that does not fit in DataFrames.

Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

Supported versions
RumbleDB 1.23 supports Spark 3.5 with Scala 2.13 as well as Spark 4.0.
The jars are compatible with Java 11 and 17. Because our efforts are increasingly focused on the Spark 4 release and on stability and conformance improvements, and because Spark 4 is based on Scala 2.13, RumbleDB 1.23 drops support for Spark 3.4 and for Scala 2.12. If you use Spark 3.4, or Spark 3.5 with Scala 2.12, please use RumbleDB 1.22, which is stable.

The standalone jar contains Spark 3.5 with Scala 2.13 and thus works out of the box.

General

  • Dropped support for Scala 2.12.
  • Dropped support for Spark 3.4
  • Renamed json-file() to json-lines(); the old name can still be used for now but is marked as deprecated
  • Added support for single quotes '. Strings delimited with single quotes may contain double quotes ", but single quotes inside them must be escaped as \'. Analogously, strings delimited with double quotes may contain single quotes, but double quotes inside them must be escaped as \"
  • Added support for some popular features of the pandas/NumPy libraries
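
To illustrate the new quoting rules (the strings themselves are arbitrary examples):

```jsoniq
(: single-quoted strings may contain " freely, but an inner ' must be
   escaped; double-quoted strings are the mirror image :)
let $a := 'She said "hello", it\'s fine'
let $b := "She said \"hello\", it's fine"
return $a eq $b  (: true :)
```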

JSONiq 3.1

Added an option to use JSONiq 3.1, which changes the JSONiq 1.0 specification to align it more closely with XQuery 3.1. Enabling the option results in the following changes:

  • Objects and arrays no longer have an effective boolean value; requesting one raises an error
  • Keys in object constructors must be quoted
  • atomic is replaced by anyAtomicType
  • Error code JNDY0003 is removed and replaced by XQDY0137
  • Both the JSONiq and XQuery parsers are available. The parser to use can be selected on the command line or with a language declaration in the query file.
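
For example, under the JSONiq 3.1 option object keys must be quoted, and taking the effective boolean value of an object raises an error (a hedged sketch, with invented contents):

```jsoniq
(: valid in both modes: quoted keys :)
{ "name": "Ada", "scores": [ 1, 2, 3 ] }
(: JSONiq 1.0 also accepted { name: "Ada" }; under JSONiq 3.1 the unquoted
   form is rejected, and boolean({ "a": 1 }) raises an error instead of
   returning true :)
```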

Basic XML/XQuery support for both parsers

  • Add doc() function for reading an XML document
  • Add a new xml-files() function that allows for reading and processing of multiple .xml files in parallel
  • Add XPath steps for navigating XML documents. We are able to navigate through 32+ GB of XML data spread over many documents in just a few minutes on an Amazon EMR cluster.
  • Add data() function for atomization of nodes
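
A hedged sketch of how these functions combine; the path and element names are hypothetical:

```jsoniq
(: read many XML files in parallel and navigate them with XPath steps,
   atomizing the result with data() :)
for $d in xml-files("hdfs:///data/articles/*.xml")
return data($d/article/title)
```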

Experimental XQuery Parser

Updated option to use XQuery parser instead of JSONiq. To use it, just prefix your query with xquery version "3.1";. Note: this is in a very early state and many features are still missing.

  • Context item is "." as opposed to "$$" from JSONiq
  • No JSONiq ObjectLookups with "."
  • No JSONiq ArrayLookup and ArrayUnboxing
  • Support for XQuery Map constructor and curly Array constructor
  • Support for String Lookup on Maps and Integer lookup on arrays with the ? operator
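
An illustrative query for the XQuery parser (the map contents are invented):

```xquery
xquery version "3.1";
(: "." is the context item (instead of JSONiq's "$$"); ? performs string
   lookup on maps and integer lookup on arrays :)
let $m := map { "names": [ "Ada", "Alan" ] }
return (
  $m?names?2,           (: integer lookup on an array: "Alan" :)
  (1, 2, 3) ! (. * 10)  (: context item in a simple map: 10 20 30 :)
)
```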

Minor Improvements and Bug fixes

  • subsequence and sequence lookups now use Spark pagination for large positions
  • Rumble shell now keeps history of previous sessions
  • Implements compare() with arities 2 and 3
  • Implements trace() arity 2
  • Implements xs:numeric
  • Adds support for setting base-uri in query and as CLI option
  • Implements FOAR0002, FOAY0001, FOTY0013, FODT0001, FODT0002, XPTY0018, XPTY0019, XQST0032
  • Increase decimal multiplication precision to 18 digits
  • Fixes index lookup throwing an error for indices >= 1'000'000, as well as incorrect behaviour with non-integer indices
  • Fixes calling parallelize on an already parallelized structure throwing an error
  • Fixes index lookup with decimal not adhering to spec
  • Fixes an unnecessary warning
  • Fixes effective boolean value of NaN and decimals equal to 0
  • Fixes stringToCodepoints() on multibyte ranges
  • Fixes indexof() incorrectly finding NaN
  • Fixes some base64 errors
  • Fixes some edge cases in pow, log10, exp10, atan
  • Fixes resolveUri with empty baseUri
  • Fixes some incorrect exceptions of matches()
  • Fixes sum() with zeroElement not behaving correctly if sequence is non-empty
  • Fixes idiv and imult handling of inf and NaN
  • Fixes inner focus sometimes missing in simpleMap
  • Fixes bug allowing missing commas between function arguments

RumbleDB 1.22.0 "Pyrenean oak" beta

24 Oct 14:18
90a7faa


Use RumbleDB to query data with JSONiq, even data that does not fit in DataFrames.

Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

Supported Java versions

The jars are compatible with Java 11. Support for Java 8 is dropped.

Supported Spark versions

Spark 3.2 and 3.3 are no longer supported as of RumbleDB 1.22, as they are no longer supported officially by the Spark team. Spark 3.4 and 3.5 are supported. Spark 4 is currently in preview and not yet supported by RumbleDB, but we are trying it out in order to support it in future releases.

Jars

RumbleDB comes in 4 jars that you can pick from depending on your needs:

  • rumbledb-1.22.0-standalone.jar contains Spark already and can simply be run "out of the box" with java -jar rumbledb-1.22.0-standalone.jar with Java 11.
  • rumbledb-1.22.0-for-spark-3.4-scala-2-12.jar, rumbledb-1.22.0-for-spark-3.5-scala-2-12.jar, and rumbledb-1.22.0-for-spark-3.5-scala-2-13.jar are smaller in size, do not contain Spark, and can be run in a corresponding, existing Spark environment, either locally (in which case you need to download and install Spark) or on a cluster (EMR with just a few clicks, etc.), with spark-submit rumbledb-....jar -q '1+1'

Improvements

  • Support for the W3C-standardized copy-modify-return expression as a more convenient way to transform JSON objects and arrays with the update syntax (insertion, deletion, replacement, renaming)
  • Support for the persistence of updates on objects and arrays read from Delta Lake (with the same update syntax)
  • Support for scripting: variable assignments, while loops, applying updates in the middle of the execution with visible side effects (under snapshot semantics), statements, block statements, continue, break, exit returning.
  • Many performance improvements
  • Many bug fixes
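
A hedged sketch of the copy-modify-return expression combined with the update syntax; the object contents are invented and the exact syntax may differ in details:

```jsoniq
(: transforms a copy of the object; the original bound value stays
   untouched :)
let $o := { "name": "Ada", "age": 36 }
return
  copy $c := $o
  modify (
    replace value of json $c.age with 37,
    insert json { "country": "UK" } into $c
  )
  return $c
```

Assuming the sketched syntax matches the implementation, this should return the object with age 37 and an added country field, while $o itself is unchanged.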

RumbleDB 1.21.0 "Hawthorn blossom" beta

16 May 13:09
53f4df0


NEW! The jar for Spark 3.5 was added and is available for download.

Use RumbleDB to query data with JSONiq, even data that does not fit in DataFrames.

Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

Spark 3.0 and 3.1 are no longer supported as of RumbleDB 1.21, as they are no longer supported officially by the Spark team. Spark 3.4 is newly supported.

RumbleDB comes in 4 jars that you can pick from depending on your needs:

  • rumbledb-1.21.0-standalone.jar contains Spark already and can simply be run "out of the box" with java -jar rumbledb-1.21.0-standalone.jar with Java 8 or 11.
  • rumbledb-1.21.0-for-spark-3.X.jar (3.2, 3.3, 3.4) is smaller in size, does not contain Spark, and can be run in a corresponding, existing Spark environment, either locally (in which case you need to download and install Spark) or on a cluster (EMR with just a few clicks, etc.), with spark-submit rumbledb-1.21.0-for-spark-3.X.jar

Improvements

  • Automatically parallelizes range expressions with more than a million items with no need to call parallelize() any more.
  • Some simple map expressions on homogeneous input are now faster (native SQL behind the scenes).
  • General comparisons on equality are now considerably faster
  • reverse() is now more efficient and faster on homogeneous sequences
  • Fixed bug on equijoin involving homogeneous sequences
  • Add two functions jn:cosh and jn:sinh
  • Automatic optimization of general comparisons to value comparisons when it is detected that the sequences have at most one item (can be deactivated with --optimize-general-comparison-to-value-comparison no)
  • Better static type detection
  • It is now possible to force a sequential execution (without Spark) with --parallel-execution no. This also works with queries containing calls to parallelize() (which will be ineffective), json-doc(), and json-file() (which will simply stream-read from the disk). Other I/O functions (such as csv-file(), etc) will still involve Spark for reading, but immediately materialize for the rest of the execution.
  • It is now possible to deactivate Native Spark SQL execution (forcing a fallback to the use of UDFs by RumbleDB) with --native-execution no.
  • annotate expression (similar syntax to validate expression) allows directly annotating an item without checking for validity.
  • More static types are detected
  • Non-recursive functions are now automatically inlined for faster execution. This can be deactivated with --function-inlining no (reverting to behavior in previous versions)
  • TypeSwitch expressions now support DataFrame execution

Bugfixes

  • Fixed bug when reading longs from DataFrames
  • Fixed an issue with projection pushdowns in join queries
  • Fixed a few bugs with queries that navigate JSON in for clauses; they are compiled to native SQL whenever possible, but some chains were throwing errors (e.g., an array unboxing followed by object lookup)
  • Fixed a bug in which calling count() on a grouping variable did not return 1 when native SQL execution is activated
  • hexBinary and base64Binary values can now be used in order by clauses with parallel execution

RumbleDB 1.20.0 "Honeylocust"

07 Nov 12:57
38e07ca


Use RumbleDB to query data with JSONiq, even data that does not fit in DataFrames.

Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

Spark 3.0 and 3.1 are no longer supported as of RumbleDB 1.20, as they are no longer supported officially by the Spark team.

RumbleDB comes in 4 jars that you can pick from depending on your needs:

  • rumbledb-1.20.0-standalone.jar contains Spark already and can simply be run "out of the box" with java -jar rumbledb-1.20.0-standalone.jar with Java 8 or 11.
  • rumbledb-1.20.0-for-spark-3.X.jar (3.2, 3.3) is smaller in size, does not contain Spark, and can be run in a corresponding, existing Spark environment, either locally (in which case you need to download and install Spark) or on a cluster (EMR with just a few clicks, etc.), with spark-submit rumbledb-1.20.0-for-spark-3.X.jar

New features:

  • Open and query YAML files (also with multiple documents) with yaml-doc()
  • Serialize the output of your queries to YAML with --output-format yaml
  • General comparisons (existential quantification) now scale to very big sequences and are automatically pushed down to Spark.
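
For instance, a YAML document can be opened and navigated like any JSON value (the file name and structure here are hypothetical):

```jsoniq
(: open a YAML document and navigate it with the usual object lookups :)
yaml-doc("deployment.yaml").metadata.name
```

Serializing the result back to YAML is then a matter of passing --output-format yaml on the command line.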

Bugfixes:

  • Fixed an issue preventing reading Decimal types from Parquet with some precisions and ranges
  • Fixed a few bugs in static typing
  • Fixed a bug that didn't throw an error when using the concatenation operator || on sequences with more than one item

RumbleDB 1.19.0 "Tipuana Tipu"

14 Jun 13:17
cd6684b


RumbleDB allows you to query data that does not fit in DataFrames with JSONiq.

Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

RumbleDB comes in 4 jars that you can pick from depending on your needs:

  • rumbledb-1.19.0-standalone.jar contains Spark already and can simply be run "out of the box" with java -jar rumbledb-1.19.0-standalone.jar with Java 8 or 11.
  • rumbledb-1.19.0-for-spark-3.X.jar (3.0, 3.1, 3.2, 3.3) is smaller in size, does not contain Spark, and can be run in a corresponding, existing Spark environment either local (so you need to download and install Spark) or on a cluster (EMR with just a few clicks, etc) with spark-submit rumbledb-1.19.0-for-spark-3.X.jar

Release notes:

  • Fixed the bug with spaces in paths
  • Various fixes and enhancements
  • New functions: repartition#2 to change the number of physical partitions, and binary-classification-metrics#3 and binary-classification-metrics#4 for preparing ROC and PR curves to evaluate the output of ML pipelines.
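
A hedged sketch of repartition#2; the argument order (sequence first, then number of partitions) and the input path are assumptions, not taken from the release notes:

```jsoniq
(: spread a parallelized sequence over 64 physical partitions
   before counting it :)
count(repartition(json-file("large.jsonl"), 64))
```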

RumbleDB 1.18.0 "Scarlet Ixora" beta

12 Apr 14:55
52a3424


Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

RumbleDB comes in 4 jars that you can pick from depending on your needs:

  • rumbledb-1.18.0-standalone.jar contains Spark already and can simply be run "out of the box" with java -jar rumbledb-1.18.0-standalone.jar with Java 8 or 11.
  • rumbledb-1.18.0-for-spark-3.X.jar (3.0, 3.1, 3.2) is smaller in size, does not contain Spark, and can be run in a corresponding, existing Spark environment either local (so you need to download and install Spark) or on a cluster (EMR with just a few clicks, etc) with spark-submit rumbledb-1.18.0-for-spark-3.X.jar

Release notes:

  • FLWOR expressions starting with a series of let are now better optimized and faster.
  • A warning with advice is issued in the command window if a group by is used in a FLWOR expression that starts with a let clause.
  • The shell will no longer exit when an error is thrown.
  • When a query cannot be executed in parallel, a more informative error message is output inviting the user to rewrite their query, instead of the raw Spark error.
  • When launching in shell or server mode, instructions are printed on the screen for next steps
  • Fixed a crash in the execution of some where clauses when a join was not successfully detected and execution fell back to linear execution
  • Support for context item declarations and passing an external context item value on the command line
  • By default, the date type no longer supports timezones (which are rarely used for this type, although supported by ISO 8601). This enables more optimizations (e.g., internal conversion to DataFrame DateType columns and export of datasets with dates to Parquet). Timezones on dates can be activated for those users who need them with a simple CLI argument (--dates-with-timezone yes).
  • Ctrl+C now elegantly exits the shell.

RumbleDB 1.17.0 "Cacao tree" beta

02 Feb 10:41
02b7b3b


Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

  • The CLI was extended with verbs (run, serve, repl) and single-dash shortcuts (-f for --output-format, etc). This is backward compatible.
  • Automatic internal conversion to DataFrames for FLWOR expressions executed in parallel when the statically inferred type is DataFrame-compatible.
  • Fixed a bug that prevented calling a variable $type or looking up a field called "type" without quotes.
  • Fixed a bug in projecting a sequence internally stored as a DataFrame to dynamically defined keys.
  • Fixed some bugs with post-grouping count optimizations on let variables
  • Support for Spark 2.4, which is no longer maintained by the Spark team, is now dropped, but available on request. RumbleDB 1.17 supports Spark 3.0, 3.1 and 3.2.
  • Plenty of smaller bug fixes
  • [Experimental] we also provide a jar that embeds Spark and does not require its installation (rumbledb-1.17.0-standalone.jar). It is for use on a local machine only (not a cluster) and works with java -jar rumbledb-1.17.0-standalone.jar run -q '1+1' rather than with spark-submit. Feedback is welcome! This is just experimental at this point and we will take it from there.

RumbleDB 1.16.2 "Shagbark Hickory" beta

09 Dec 10:14


Pre-release

Interim release.

  • Fix recursive view "input" issue.
  • Nicer message for out of memory errors and hint to use CLI parameters.
  • Reverted to Kryo 4 for Spark 3.2, which depends on Twitter Chill 0.10.0; Chill uses this version of Kryo in a way incompatible with Kryo 5.

Rumble 1.16.1 "Shagbark Hickory" beta

06 Dec 08:46
b703258


Pre-release

Interim release.

  • Fixed race condition issue with min() and max() called multiple times that led to possibly incorrect output.
  • The sum() and count() functions are now able to stream locally on very large (non-parallelized) sequences.
  • Range expressions now support 64 bit integers as well (before this, an overflow happened)
  • The arrow syntax works for dynamic function calls too, so ML pipelines in Rumble can also be invoked with a pipelining syntax: $training-set=>$my-transformer($params)=>my-estimator($params)
  • substring() was fixed to follow standard behavior even with exotic parameters (mostly returning an empty string in these cases)