hyparam/parquet-grep

parquet-grep

MIT license

A CLI tool for searching text within Apache Parquet files. It works like grep, but for Parquet files, with support for recursive directory search and multiple output formats.

Built on top of hyparquet for high-performance Parquet parsing.

Installation

npm install -g parquet-grep

Or use directly with npx:

npx parquet-grep "search term" file.parquet

Usage

parquet-grep [options] <query> [parquet-file]

Options

  • -i - Force case-insensitive search. Without this flag, the search is case-insensitive when the query is all lowercase and case-sensitive when the query contains any uppercase letter.
  • --table - Output in markdown table format (default, grouped by file)
  • --jsonl - Output as JSON lines (one match per line with filename, rowOffset, and value)
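The default case rule for -i can be sketched in a few lines of JavaScript. This is only an illustration of the rule as described above, not code from parquet-grep: the makeMatcher helper is hypothetical, and it assumes plain substring matching, which the tool may or may not use internally.

```javascript
// Hypothetical helper illustrating the smart-case rule described above.
// Not taken from the parquet-grep source; assumes plain substring matching.
function makeMatcher(query, forceInsensitive = false) {
  // A query with no uppercase letters is equal to its own lowercase form.
  const insensitive = forceInsensitive || query === query.toLowerCase();
  const needle = insensitive ? query.toLowerCase() : query;
  return (text) => {
    const haystack = insensitive ? text.toLowerCase() : text;
    return haystack.includes(needle);
  };
}

const lower = makeMatcher('holland');        // all lowercase: case-insensitive
const mixed = makeMatcher('Holland');        // has uppercase: case-sensitive
const forced = makeMatcher('HOLLAND', true); // -i: always case-insensitive

console.log(lower('Holland Lop'));  // true
console.log(mixed('holland lop'));  // false
console.log(forced('Holland Lop')); // true
```

In other words, typing a capital letter is how you opt in to an exact-case match, and -i switches that behavior off.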

If no file is specified, recursively searches all .parquet files in the current directory, skipping node_modules and hidden directories.
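A minimal sketch of that directory walk in Node.js, for readers who want to see what gets skipped. The findParquetFiles function is hypothetical and only mirrors the rule stated above; the actual implementation in parquet-grep may differ (for example, in how it treats symlinks).

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical walker: collects .parquet files under `dir`, skipping
// node_modules and any directory whose name starts with a dot.
function findParquetFiles(dir) {
  const results = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Skip dependency folders and hidden directories such as .git
      if (entry.name === 'node_modules' || entry.name.startsWith('.')) continue;
      results.push(...findParquetFiles(fullPath));
    } else if (entry.isFile() && entry.name.endsWith('.parquet')) {
      results.push(fullPath);
    }
  }
  return results;
}
```

Calling findParquetFiles(process.cwd()) would return the list of files that a query without a file argument is expected to cover.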

Examples

Search a single file:

parquet-grep "Holland" bunnies.parquet

Search recursively in current directory:

parquet-grep "search term"

Case-insensitive search:

parquet-grep -i "HOLLAND" bunnies.parquet

JSONL output:

parquet-grep --jsonl "Holland" bunnies.parquet
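
Because each JSONL line is a standalone JSON object, the output is easy to post-process. The sketch below counts matches per file. The field names (filename, rowOffset, value) come from the --jsonl option description; the sample lines and the countMatchesByFile helper are made up for illustration, and the real shape of value may differ.

```javascript
// Hypothetical sample output: field names follow the --jsonl description,
// but these values are invented for illustration only.
const sampleOutput = [
  '{"filename":"bunnies.parquet","rowOffset":3,"value":"Holland Lop"}',
  '{"filename":"bunnies.parquet","rowOffset":7,"value":"Mini Holland"}',
].join('\n');

// Parse JSON lines and count how many matches each file produced.
function countMatchesByFile(jsonl) {
  const counts = {};
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue; // ignore blank lines
    const match = JSON.parse(line);
    counts[match.filename] = (counts[match.filename] || 0) + 1;
  }
  return counts;
}

console.log(countMatchesByFile(sampleOutput)); // { 'bunnies.parquet': 2 }
```

In practice the input would come from piping parquet-grep --jsonl into a script that reads standard input instead of a hard-coded string.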
