pdfx

A lightning-fast terminal-native PDF indexing and search toolkit


Features

  • Fast PDF Indexing: SQLite-powered database with metadata extraction
  • Lightning Search: Instant filename-based search across indexed PDFs
  • List & Browse: View all indexed PDFs with detailed information
  • Export Data: Export your PDF library to JSON, CSV, Markdown, PDF, YAML, and HTML
  • Cross-Platform: Native support for Linux, macOS, and Windows
  • Clean UI: Beautiful progress bars and organized output
  • Zero Dependencies: No external system requirements
  • Smart Cleanup: Complete data removal with pdfx cleanup

Installation

From Source

# Clone the repository
git clone https://github.com/ionnss/pdfx.git
cd pdfx

# Build and install
cargo install --path .

From GitHub

cargo install --git https://github.com/ionnss/pdfx

Usage

Basic Commands

# Initialize PDF index
pdfx init                    # Index current directory
pdfx init ~/Documents        # Index specific directory
pdfx init ~                  # Index entire home directory

# Search indexed PDFs
pdfx search "machine learning"   # Search for keyword in filenames

# List all indexed PDFs
pdfx list                    # Show all PDFs with details

# Export your PDF library
pdfx export                  # Export all formats to Downloads folder
pdfx export --format json    # Export only JSON format
pdfx export --format csv,yaml # Export multiple formats

# Clean up
pdfx cleanup                 # Remove all indexed data
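
The `search` command matches a keyword against filenames, and the roadmap describes the match as case-insensitive. As a rough illustration of that behavior, here is a minimal sketch that assumes a plain substring match on the file name only. The function name `search_filenames` is hypothetical, and pdfx itself may well run the query inside SQLite instead.

```rust
use std::path::Path;

/// Returns the paths whose file name contains `query`, ignoring case.
/// Assumption: a plain substring match on the file name, not the full path.
/// `search_filenames` is a hypothetical helper, not part of pdfx.
fn search_filenames<'a>(paths: &[&'a str], query: &str) -> Vec<&'a str> {
    let needle = query.to_lowercase();
    paths
        .iter()
        .copied()
        .filter(|p| {
            Path::new(p)
                .file_name()
                .and_then(|name| name.to_str())
                .map(|name| name.to_lowercase().contains(&needle))
                .unwrap_or(false)
        })
        .collect()
}
```

Called with `"machine learning"`, this keeps `/docs/Machine Learning Basics.pdf` but not `/machine learning/notes.pdf`, because only the final path component is compared.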

Workflow Example

# 1. First time setup - index your PDFs
pdfx init ~/Documents
#⠠ Counting files... 221400
#🔍 Indexing PDFs... [00:00:04] [⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿           ] 374539/722498 files | 90,889.0426/s | ETA: 4s

#✅ Index complete!
#📊 Summary: 176 PDFs found | 722497 files processed | 122 directories skipped
#✅ Successfully indexed 176 PDFs in /Users/user/Library/Application Support/pdfx/db.sqlite


# 2. Browse your PDF library
pdfx list
# 📋 All Indexed PDFs
# 📊 Total: 170 PDFs
# 📄 1. The Rust Programming Language.pdf
#     Size: 14.37 MB
#     Path: /Users/user/Documents/books/rust.pdf
#     Modified: 2025-01-15 10:30:00

# 3. Search your indexed PDFs instantly
pdfx search "rust programming"

# 4. Export your library for sharing or backup
pdfx export
# Exporting 170 PDFs to /Users/user/Downloads/pdfx_exports
#   ✅ Generated pdfs.json
#   ✅ Generated pdfs.csv
#   ✅ Generated pdfs.md
#   ✅ Generated pdfs.pdf
#   ✅ Generated pdfs.yaml
#   ✅ Generated pdfs.html
# 🎉 Export complete!

# 5. When you're done (optional cleanup)
pdfx cleanup
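
The `Size:` line in the `pdfx list` output above shows a human-readable value such as `14.37 MB`. One way to produce that kind of string is sketched below. It assumes 1024-based units and two decimal places, which is a guess at the convention rather than something the output confirms, and `format_size` is a hypothetical name.

```rust
/// Formats a byte count as a short human-readable string.
/// Assumptions: units step by 1024 and values are shown with two decimals.
/// `format_size` is a hypothetical helper, not part of pdfx.
fn format_size(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KB", "MB", "GB", "TB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    // Divide down until the value fits the current unit or units run out.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{bytes} B")
    } else {
        format!("{value:.2} {}", UNITS[unit])
    }
}
```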

Export Formats

pdfx supports multiple export formats for your PDF library:

Available Formats

  • JSON: Machine-readable format with full metadata
  • CSV: Spreadsheet-compatible format for data analysis
  • Markdown: Human-readable format with tables
  • PDF: Printable document format
  • YAML: Structured format for configuration files
  • HTML: Web-ready format for sharing online
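
To give a feel for the Markdown export, the sketch below renders a list of entries as a Markdown table. The `PdfEntry` struct, its fields, and the column layout are assumptions made for illustration. The real `pdfs.md` may use different columns.

```rust
/// Hypothetical record for one indexed PDF. The field names are assumptions.
struct PdfEntry {
    name: String,
    path: String,
    size_bytes: u64,
}

/// Renders entries as a Markdown table with one row per PDF.
/// Pipe characters in names and paths are escaped so they do not break the table.
fn to_markdown_table(entries: &[PdfEntry]) -> String {
    let mut out = String::from("| # | Name | Size (bytes) | Path |\n|---|---|---|---|\n");
    for (i, entry) in entries.iter().enumerate() {
        let name = entry.name.replace('|', "\\|");
        let path = entry.path.replace('|', "\\|");
        out.push_str(&format!(
            "| {} | {} | {} | {} |\n",
            i + 1,
            name,
            entry.size_bytes,
            path
        ));
    }
    out
}
```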

Export Examples

# Export all formats to Downloads folder
pdfx export

# Export specific formats
pdfx export --format json
pdfx export --format csv,yaml
pdfx export --format html
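
The `--format` flag accepts either a single format or a comma-separated list. A minimal sketch of how such a value could be parsed is shown below. The `ExportFormat` enum, the `parse_formats` function, and the accepted spellings (for example `md` as an alias for `markdown`) are assumptions, not the actual pdfx implementation.

```rust
/// Hypothetical enum covering the export formats listed in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExportFormat {
    Json,
    Csv,
    Markdown,
    Pdf,
    Yaml,
    Html,
}

/// Parses a value such as "csv,yaml" into a list of formats.
/// Assumptions: names are case-insensitive, duplicates are dropped,
/// and an unknown name is reported as an error.
fn parse_formats(arg: &str) -> Result<Vec<ExportFormat>, String> {
    let mut formats = Vec::new();
    for token in arg.split(',') {
        let format = match token.trim().to_lowercase().as_str() {
            "json" => ExportFormat::Json,
            "csv" => ExportFormat::Csv,
            "md" | "markdown" => ExportFormat::Markdown,
            "pdf" => ExportFormat::Pdf,
            "yaml" => ExportFormat::Yaml,
            "html" => ExportFormat::Html,
            other => return Err(format!("unknown export format: {other:?}")),
        };
        if !formats.contains(&format) {
            formats.push(format);
        }
    }
    Ok(formats)
}
```

Rejecting unknown names up front means a typo like `--format jsno` fails before any files are written.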

Export Location

  • Default: ~/Downloads/pdfx_exports/ (or equivalent on your OS)
  • Files: pdfs.json, pdfs.csv, pdfs.md, pdfs.pdf, pdfs.yaml, pdfs.html

Database & Storage

Where Your Data Lives

# macOS
~/Library/Application Support/pdfx/db.sqlite

# Linux
~/.local/share/pdfx/db.sqlite

# Windows
%APPDATA%\pdfx\db.sqlite
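
The three locations above follow one pattern: a per-platform data directory, then `pdfx/db.sqlite`. The sketch below builds the path from a base directory. It is an illustration under stated assumptions: the `Platform` enum and `db_path` function are hypothetical, the caller supplies the home directory (or the value of `%APPDATA%` on Windows), and overrides such as `XDG_DATA_HOME` are ignored.

```rust
use std::path::PathBuf;

/// Hypothetical platform selector used only for this sketch.
#[derive(Debug, Clone, Copy)]
enum Platform {
    MacOs,
    Linux,
    Windows,
}

/// Builds the database path listed in this README.
/// `base` is the home directory on macOS and Linux, and the value of
/// %APPDATA% on Windows. Environment overrides are not handled here.
fn db_path(platform: Platform, base: &str) -> PathBuf {
    let mut path = PathBuf::from(base);
    match platform {
        Platform::MacOs => {
            path.push("Library");
            path.push("Application Support");
        }
        Platform::Linux => {
            path.push(".local");
            path.push("share");
        }
        // %APPDATA% already points at the per-user data directory.
        Platform::Windows => {}
    }
    path.push("pdfx");
    path.push("db.sqlite");
    path
}
```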

Privacy & Security

  • Local Storage Only: No cloud, no tracking, no data sharing
  • SQLite Database: Industry-standard, portable format
  • Complete Cleanup: pdfx cleanup removes all traces

Requirements

  • Rust: 1.70 or later
  • Operating System: Linux, macOS, or Windows
  • Terminal: Any modern terminal with Unicode support

Development

Setup

git clone https://github.com/ionnss/pdfx.git
cd pdfx
cargo build
cargo run -- --help

Project Structure

src/
├── cli/          # Command-line interface
├── database/     # SQLite database operations
├── indexer/      # PDF file discovery and indexing
├── helpers/      # Utility functions
└── types.rs      # Core data structures

Contributing

We welcome contributions! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Troubleshooting

Common Issues

Q: "Permission denied" errors during scanning

# This is normal on macOS/Linux - system directories are protected
# pdfx will skip these and continue scanning accessible directories
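
For readers curious how a scanner can keep going past protected directories, here is a minimal sketch using only the standard library. pdfx itself is built on walkdir, so this is not its actual code. The `ScanSummary` struct and `scan` function are hypothetical, with counters that mirror the summary line printed after `pdfx init`.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical counters mirroring the summary shown after indexing.
#[derive(Debug, Default, PartialEq)]
struct ScanSummary {
    pdfs_found: u64,
    files_processed: u64,
    directories_skipped: u64,
}

/// Walks `dir` recursively and counts PDFs.
/// A directory that cannot be read (for example "Permission denied")
/// is counted as skipped instead of aborting the scan.
fn scan(dir: &Path, summary: &mut ScanSummary) {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(_) => {
            summary.directories_skipped += 1;
            return;
        }
    };
    // `flatten` drops individual entries that fail to read.
    for entry in entries.flatten() {
        let path = entry.path();
        match entry.file_type() {
            Ok(kind) if kind.is_dir() => scan(&path, summary),
            Ok(kind) if kind.is_file() => {
                summary.files_processed += 1;
                let is_pdf = path
                    .extension()
                    .and_then(|ext| ext.to_str())
                    .map(|ext| ext.eq_ignore_ascii_case("pdf"))
                    .unwrap_or(false);
                if is_pdf {
                    summary.pdfs_found += 1;
                }
            }
            // Symlinks and other entry types are ignored, which also avoids cycles.
            _ => {}
        }
    }
}
```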

Q: Database seems corrupted or giving errors

pdfx cleanup    # Remove database and start fresh
pdfx init       # Rebuild index

Q: Where is my data stored?

# View database location after running pdfx init
# Path is shown in success message
# Use `pdfx cleanup` to remove all data

Roadmap

Current Status (v0.2.0)

  • PDF Indexing: SQLite-based PDF database with metadata
  • Filename Search: Fast, case-insensitive filename search
  • List Command: Display all indexed PDFs with detailed information
  • Export Data: Export to JSON, CSV, Markdown, PDF, YAML, and HTML formats
  • Cross-Platform: Works on Linux, macOS, and Windows
  • Clean UI: Progress bars and organized output

Planned Features

  • 📅 Recent Command: Show recently modified PDFs
  • 🔍 Advanced Search: Filter by size, date, path
  • 📊 Statistics: Show indexing statistics and storage usage
  • 🏷 Tagging System: Categorize and tag PDFs for better organization
  • 🤖 AI: Chat with PDF files

See FUTURE.md for detailed roadmap and feature plans.


License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

Built with excellence using:

  • Rust - Systems programming language
  • rusqlite - SQLite database operations
  • clap - Command-line argument parsing
  • indicatif - Progress bars and spinners
  • walkdir - Recursive directory traversal
  • chrono - Date and time handling