pdfx

A lightning-fast, terminal-native toolkit for indexing and searching PDFs

Features

Fast PDF Indexing: SQLite-powered database with metadata extraction
Lightning Search: Instant filename-based search across indexed PDFs
List & Browse: View all indexed PDFs with detailed information
Export Data: Export your PDF library to JSON, CSV, Markdown, PDF, YAML, and HTML
Cross-Platform: Native support for Linux, macOS, and Windows
Clean UI: Beautiful progress bars and organized output
Zero Dependencies: No external system requirements
Smart Cleanup: Complete data removal with pdfx cleanup

Installation

From Source

# Clone the repository
git clone https://github.com/ionnss/pdfx.git
cd pdfx

# Build and install
cargo install --path .

From GitHub

cargo install --git https://github.com/ionnss/pdfx

Usage

Basic Commands

# Initialize PDF index
pdfx init                    # Index current directory
pdfx init ~/Documents        # Index specific directory
pdfx init ~                  # Index entire home directory

# Search indexed PDFs
pdfx search "machine learning"   # Search for keyword in filenames

# List all indexed PDFs
pdfx list                    # Show all PDFs with details

# Export your PDF library
pdfx export                  # Export all formats to Downloads folder
pdfx export --format json    # Export only JSON format
pdfx export --format csv,yaml # Export multiple formats

# Clean up
pdfx cleanup                 # Remove all indexed data
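
As a rough illustration of what a filename-based search involves, the sketch below does a case-insensitive substring match on the final component of each path. It is not pdfx's actual implementation, which presumably queries the SQLite index; the helper name `search_filenames` is hypothetical.

```rust
use std::path::Path;

/// Hypothetical helper (not part of pdfx): return the paths whose file name
/// contains `keyword`, ignoring case. Parent directories are not compared.
fn search_filenames<'a>(paths: &[&'a str], keyword: &str) -> Vec<&'a str> {
    let needle = keyword.to_lowercase();
    paths
        .iter()
        .copied()
        .filter(|p| {
            Path::new(p)
                .file_name()
                .and_then(|name| name.to_str())
                .map(|name| name.to_lowercase().contains(&needle))
                .unwrap_or(false)
        })
        .collect()
}
```

Lowercasing both sides keeps the match simple; a real index would more likely push the comparison into the database query.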

Workflow Example

# 1. First time setup - index your PDFs
pdfx init ~/Documents
#⠋ Counting files... 221400
#🔍 Indexing PDFs... [00:00:04] [⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿           ] 374539/722498 files | 90,889.0426/s | ETA: 4s

#✅ Index complete!
#📊 Summary: 176 PDFs found | 722497 files processed | 122 directories skipped
#✅ Successfully indexed 176 PDFs in /Users/user/Library/Application Support/pdfx/db.sqlite


# 2. Browse your PDF library
pdfx list
# 📋 All Indexed PDFs
# 📊 Total: 170 PDFs
# 📄 1. The Rust Programming Language.pdf
#     Size: 14.37 MB
#     Path: /Users/user/Documents/books/rust.pdf
#     Modified: 2025-01-15 10:30:00

# 3. Search your indexed PDFs instantly
pdfx search "rust programming"

# 4. Export your library for sharing or backup
pdfx export
# Exporting 170 PDFs to /Users/user/Downloads/pdfx_exports
#   ✅ Generated pdfs.json
#   ✅ Generated pdfs.csv
#   ✅ Generated pdfs.md
#   ✅ Generated pdfs.pdf
#   ✅ Generated pdfs.yaml
#   ✅ Generated pdfs.html
# 🎉 Export complete!

# 5. When you're done (optional cleanup)
pdfx cleanup

Export Formats

pdfx supports multiple export formats for your PDF library:

Available Formats

JSON: Machine-readable format with full metadata
CSV: Spreadsheet-compatible format for data analysis
Markdown: Human-readable format with tables
PDF: Print-ready format
YAML: Structured format for configuration files
HTML: Web-ready format for sharing online

Export Examples

# Export all formats to Downloads folder
pdfx export

# Export specific formats
pdfx export --format json
pdfx export --format csv,yaml
pdfx export --format html
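
The comma-separated `--format` value shown above could be parsed along these lines. This is a minimal sketch and not pdfx's own parser: the `ExportFormat` enum, the `parse_formats` function, and the accepted spellings (for example `md` as well as `markdown`) are assumptions. Only the output file names come from this README.

```rust
/// Illustrative enum covering the export formats named in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExportFormat {
    Json,
    Csv,
    Markdown,
    Pdf,
    Yaml,
    Html,
}

impl ExportFormat {
    /// File name written to the export folder, as listed under Export Location.
    fn file_name(self) -> &'static str {
        match self {
            ExportFormat::Json => "pdfs.json",
            ExportFormat::Csv => "pdfs.csv",
            ExportFormat::Markdown => "pdfs.md",
            ExportFormat::Pdf => "pdfs.pdf",
            ExportFormat::Yaml => "pdfs.yaml",
            ExportFormat::Html => "pdfs.html",
        }
    }
}

/// Hypothetical parser for a value such as "csv,yaml".
/// Names are trimmed and lowercased; duplicates are dropped; unknown names are errors.
fn parse_formats(arg: &str) -> Result<Vec<ExportFormat>, String> {
    let mut formats = Vec::new();
    for raw in arg.split(',') {
        let format = match raw.trim().to_lowercase().as_str() {
            "json" => ExportFormat::Json,
            "csv" => ExportFormat::Csv,
            "md" | "markdown" => ExportFormat::Markdown, // assumed spellings
            "pdf" => ExportFormat::Pdf,
            "yaml" => ExportFormat::Yaml,
            "html" => ExportFormat::Html,
            other => return Err(format!("unknown export format: {other:?}")),
        };
        if !formats.contains(&format) {
            formats.push(format);
        }
    }
    Ok(formats)
}
```

Returning an error for an unknown name, rather than silently ignoring it, makes a typo such as `--format jsno` visible immediately.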

Export Location

Default: ~/Downloads/pdfx_exports/ (or equivalent on your OS)
Files: pdfs.json, pdfs.csv, pdfs.md, pdfs.pdf, pdfs.yaml, pdfs.html

Database & Storage

Where Your Data Lives

# macOS
~/Library/Application Support/pdfx/db.sqlite

# Linux
~/.local/share/pdfx/db.sqlite

# Windows
%APPDATA%\pdfx\db.sqlite
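
The three locations above follow the usual per-platform data directories. The sketch below shows how such a path could be assembled from a base directory. The `Platform` enum and `db_path` function are hypothetical, and the sketch ignores `XDG_DATA_HOME` on Linux, so pdfx itself may resolve the path differently.

```rust
use std::path::PathBuf;

/// Platforms listed in this README (illustrative type, not from pdfx).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Platform {
    MacOs,
    Linux,
    Windows,
}

/// Hypothetical helper: build the database path from a base directory.
/// `base` is the home directory on macOS and Linux, and the value of
/// %APPDATA% on Windows.
fn db_path(platform: Platform, base: &str) -> PathBuf {
    let mut path = PathBuf::from(base);
    match platform {
        Platform::MacOs => {
            path.push("Library");
            path.push("Application Support");
        }
        Platform::Linux => {
            // Default data directory; XDG_DATA_HOME is not consulted here.
            path.push(".local");
            path.push("share");
        }
        // %APPDATA% already points at the per-user data directory.
        Platform::Windows => {}
    }
    path.push("pdfx");
    path.push("db.sqlite");
    path
}
```

Pushing one component at a time lets `PathBuf` insert the separator that is native to the platform the code runs on.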

Privacy & Security

Local Storage Only: No cloud, no tracking, no data sharing
SQLite Database: Industry-standard, portable format
Complete Cleanup: pdfx cleanup removes all traces

Requirements

Rust: 1.70 or later
Operating System: Linux, macOS, or Windows
Terminal: Any modern terminal with Unicode support

Development

Setup

git clone https://github.com/ionnss/pdfx.git
cd pdfx
cargo build
cargo run -- --help

Project Structure

src/
├── cli/          # Command-line interface
├── database/     # SQLite database operations
├── indexer/      # PDF file discovery and indexing
├── helpers/      # Utility functions
└── types.rs      # Core data structures

Contributing

We welcome contributions! Here's how you can help:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Troubleshooting

Common Issues

Q: "Permission denied" errors during scanning

# This is normal on macOS/Linux - system directories are protected
# pdfx will skip these and continue scanning accessible directories
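
To make the skip-and-continue behavior concrete, here is a minimal sketch of a recursive scan that counts unreadable directories instead of aborting. It uses only the standard library, whereas pdfx lists walkdir among its dependencies, so this is an illustration of the idea and not the project's code. `ScanSummary` and `scan` are hypothetical names modeled on the summary line printed after indexing.

```rust
use std::fs;
use std::path::Path;

/// Counters modeled on the indexing summary line (illustrative type).
#[derive(Debug, Default, PartialEq)]
struct ScanSummary {
    pdfs_found: u64,
    files_processed: u64,
    dirs_skipped: u64,
}

/// Hypothetical scanner: walk `dir` recursively, count files and PDFs,
/// and skip any directory that cannot be read (for example, permission denied).
fn scan(dir: &Path, summary: &mut ScanSummary) {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(_) => {
            // Unreadable directory: record it and keep going.
            summary.dirs_skipped += 1;
            return;
        }
    };
    // flatten() drops individual entries that failed to load.
    for entry in entries.flatten() {
        let path = entry.path();
        // file_type() does not follow symlinks, which avoids link cycles.
        match entry.file_type() {
            Ok(kind) if kind.is_dir() => scan(&path, summary),
            Ok(kind) if kind.is_file() => {
                summary.files_processed += 1;
                let is_pdf = path
                    .extension()
                    .and_then(|ext| ext.to_str())
                    .map(|ext| ext.eq_ignore_ascii_case("pdf"))
                    .unwrap_or(false);
                if is_pdf {
                    summary.pdfs_found += 1;
                }
            }
            _ => {} // symlinks and other special entries are ignored
        }
    }
}
```

Calling `scan(Path::new("."), &mut ScanSummary::default())` walks the current directory; any directory that fails to open only increments `dirs_skipped`.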

Q: Database seems corrupted or giving errors

pdfx cleanup    # Remove database and start fresh
pdfx init       # Rebuild index

Q: Where is my data stored?

# View database location after running pdfx init
# Path is shown in success message
# Use `pdfx cleanup` to remove all data

Roadmap

Current Status (v0.2.0)

✅ PDF Indexing: SQLite-based PDF database with metadata
✅ Filename Search: Fast, case-insensitive filename search
✅ List Command: Display all indexed PDFs with detailed information
✅ Export Data: Export to JSON, CSV, Markdown, PDF, YAML, and HTML formats
✅ Cross-Platform: Works on Linux, macOS, and Windows
✅ Clean UI: Progress bars and organized output

Planned Features

📅 Recent Command: Show recently modified PDFs
🔍 Advanced Search: Filter by size, date, path
📊 Statistics: Show indexing statistics and storage usage
🏷 Tagging System: Categorize and tag PDFs for better organization
🤖 AI: Chatting with PDF files

See FUTURE.md for detailed roadmap and feature plans.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built with excellence using:

Rust - Systems programming language
rusqlite - SQLite database operations
clap - Command-line argument parsing
indicatif - Progress bars and spinners
walkdir - Recursive directory traversal
chrono - Date and time handling
