- Fast PDF Indexing: SQLite-powered database with metadata extraction
- Lightning Search: Instant filename-based search across indexed PDFs
- List & Browse: View all indexed PDFs with detailed information
- Export Data: Export your PDF library to JSON, CSV, Markdown, PDF, YAML, and HTML
- Cross-Platform: Native support for Linux, macOS, and Windows
- Clean UI: Beautiful progress bars and organized output
- Zero Dependencies: No external system requirements
- Smart Cleanup: Complete data removal with `pdfx cleanup`
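As a rough illustration of the filename-based search listed above, the sketch below does a case-insensitive substring match over an in-memory list of names. It is not pdfx's actual implementation, which reads from the SQLite index. The function name `search_filenames` and every sample file name except the first are hypothetical, and substring matching is an assumption.

```rust
/// Hypothetical helper: returns the file names that contain `query`,
/// ignoring case. Substring matching is an assumption for illustration.
fn search_filenames<'a>(names: &[&'a str], query: &str) -> Vec<&'a str> {
    let needle = query.to_lowercase();
    names
        .iter()
        .copied()
        .filter(|name| name.to_lowercase().contains(needle.as_str()))
        .collect()
}

fn main() {
    // Sample data only; the last two names are made up.
    let indexed = [
        "The Rust Programming Language.pdf",
        "Machine Learning Notes.pdf",
        "invoice-2025.pdf",
    ];
    for hit in search_filenames(&indexed, "rust programming") {
        println!("{hit}");
    }
}
```

Lowercasing both sides keeps the comparison simple; a database-backed search would more likely push the matching into the query itself.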
# Clone the repository
git clone https://github.com/ionnss/pdfx.git
cd pdfx
# Build and install
cargo install --path .
# Alternatively, install directly from the Git repository
cargo install --git https://github.com/ionnss/pdfx
# Initialize PDF index
pdfx init # Index current directory
pdfx init ~/Documents # Index specific directory
pdfx init ~ # Index entire home directory
# Search indexed PDFs
pdfx search "machine learning" # Search for keyword in filenames
# List all indexed PDFs
pdfx list # Show all PDFs with details
# Export your PDF library
pdfx export # Export all formats to Downloads folder
pdfx export --format json # Export only JSON format
pdfx export --format csv,yaml # Export multiple formats
# Clean up
pdfx cleanup # Remove all indexed data
# 1. First time setup - index your PDFs
pdfx init ~/Documents
#⠠ Counting files... 221400
#🔍 Indexing PDFs... [00:00:04] [⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿ ] 374539/722498 files | 90,889.0426/s | ETA: 4s
#✅ Index complete!
#📊 Summary: 176 PDFs found | 722497 files processed | 122 directories skipped
#✅ Successfully indexed 176 PDFs in /Users/user/Library/Application Support/pdfx/db.sqlite
# 2. Browse your PDF library
pdfx list
# 📋 All Indexed PDFs
# 📊 Total: 170 PDFs
# 📄 1. The Rust Programming Language.pdf
# Size: 14.37 MB
# Path: /Users/user/Documents/books/rust.pdf
# Modified: 2025-01-15 10:30:00
# 3. Search your indexed PDFs instantly
pdfx search "rust programming"
# 4. Export your library for sharing or backup
pdfx export
# Exporting 170 PDFs to /Users/user/Downloads/pdfx_exports
# ✅ Generated pdfs.json
# ✅ Generated pdfs.csv
# ✅ Generated pdfs.md
# ✅ Generated pdfs.pdf
# ✅ Generated pdfs.yaml
# ✅ Generated pdfs.html
# 🎉 Export complete!
# 5. When you're done (optional cleanup)
pdfx cleanup
pdfx supports multiple export formats for your PDF library:
- JSON: Machine-readable format with full metadata
- CSV: Spreadsheet-compatible format for data analysis
- Markdown: Human-readable format with tables
- YAML: Structured format for configuration files
- HTML: Web-ready format for sharing online
# Export all formats to Downloads folder
pdfx export
# Export specific formats
pdfx export --format json
pdfx export --format csv,yaml
pdfx export --format html
- Default: `~/Downloads/pdfx_exports/` (or equivalent on your OS)
- Files: `pdfs.json`, `pdfs.csv`, `pdfs.md`, `pdfs.pdf`, `pdfs.yaml`, `pdfs.html`
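To make the relationship between the `--format` value and the generated files concrete, here is a minimal sketch of how a comma-separated format list could be mapped to the file names listed above. It is not taken from pdfx's source. The functions `export_file_name` and `parse_formats` are hypothetical, and the format names `markdown` and `pdf` are assumptions, since only `json`, `csv`, `yaml`, and `html` appear in the examples.

```rust
/// Hypothetical mapping from a `--format` name to the exported file name.
/// The names `markdown` and `pdf` are assumptions; the README only shows
/// `json`, `csv`, `yaml`, and `html` being passed to `--format`.
fn export_file_name(format: &str) -> Option<&'static str> {
    match format {
        "json" => Some("pdfs.json"),
        "csv" => Some("pdfs.csv"),
        "markdown" => Some("pdfs.md"),
        "pdf" => Some("pdfs.pdf"),
        "yaml" => Some("pdfs.yaml"),
        "html" => Some("pdfs.html"),
        _ => None,
    }
}

/// Splits a value such as "csv,yaml" and returns the file names to
/// generate, or the first unrecognized format name as an error.
fn parse_formats(arg: &str) -> Result<Vec<&'static str>, String> {
    arg.split(',')
        .map(|f| f.trim().to_lowercase())
        .filter(|f| !f.is_empty())
        .map(|f| match export_file_name(&f) {
            Some(name) => Ok(name),
            None => Err(f),
        })
        .collect()
}

fn main() {
    match parse_formats("csv,yaml") {
        Ok(files) => println!("would generate: {}", files.join(", ")),
        Err(bad) => eprintln!("unknown format: {bad}"),
    }
}
```

Collecting into a `Result` stops at the first unknown name, so a typo in the list is reported before any file would be written.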
# macOS
~/Library/Application Support/pdfx/db.sqlite
# Linux
~/.local/share/pdfx/db.sqlite
# Windows
%APPDATA%/pdfx/db.sqlite
- Local Storage Only: No cloud, no tracking, no data sharing
- SQLite Database: Industry-standard, portable format
- Complete Cleanup: `pdfx cleanup` removes all traces
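The per-platform database locations listed above can be expressed in a few lines of standard-library Rust. This is only a sketch that mirrors the paths in this README. The helper `db_path_for` is hypothetical, and pdfx itself may resolve the directory differently (for example through a directories crate).

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: builds the database path for an OS name using the
/// locations listed in this README. `home` and `appdata` are passed in
/// explicitly so the function does not depend on the environment.
fn db_path_for(os: &str, home: &str, appdata: &str) -> PathBuf {
    let base = match os {
        "macos" => PathBuf::from(home).join("Library").join("Application Support"),
        "windows" => PathBuf::from(appdata),
        // Assumption: every other OS is treated like Linux.
        _ => PathBuf::from(home).join(".local").join("share"),
    };
    base.join("pdfx").join("db.sqlite")
}

fn main() {
    let home = env::var("HOME").unwrap_or_default();
    let appdata = env::var("APPDATA").unwrap_or_default();
    // `env::consts::OS` is a string such as "linux", "macos", or "windows".
    let path = db_path_for(env::consts::OS, &home, &appdata);
    println!("{}", path.display());
}
```

The path printed in the `pdfx init` success message remains the authoritative location.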
- Rust: 1.70 or later
- Operating System: Linux, macOS, or Windows
- Terminal: Any modern terminal with Unicode support
git clone https://github.com/ionnss/pdfx.git
cd pdfx
cargo build
cargo run -- --help
src/
├── cli/ # Command-line interface
├── database/ # SQLite database operations
├── indexer/ # PDF file discovery and indexing
├── helpers/ # Utility functions
└── types.rs # Core data structures
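For readers new to the codebase, the file discovery done by the indexer can be pictured with the standard-library sketch below: walk a directory tree, collect `.pdf` files, and count directories that cannot be read instead of aborting. It is not the code in `indexer/`. The names `find_pdfs` and `has_pdf_extension` are hypothetical, and the real implementation may use a different traversal strategy.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper: recursively collects files with a `.pdf` extension
/// (case-insensitive) under `dir`. Directories that cannot be read, for
/// example because of "Permission denied", are counted and skipped.
fn find_pdfs(dir: &Path, found: &mut Vec<PathBuf>, skipped: &mut usize) {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(_) => {
            *skipped += 1;
            return;
        }
    };
    for entry in entries {
        let Ok(entry) = entry else { continue };
        // `DirEntry::file_type` does not follow symlinks, so symlinked
        // directories are not descended into.
        let Ok(file_type) = entry.file_type() else { continue };
        let path = entry.path();
        if file_type.is_dir() {
            find_pdfs(&path, found, skipped);
        } else if file_type.is_file() && has_pdf_extension(&path) {
            found.push(path);
        }
    }
}

fn has_pdf_extension(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case("pdf"))
}

fn main() {
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let (mut found, mut skipped) = (Vec::new(), 0usize);
    find_pdfs(Path::new(&root), &mut found, &mut skipped);
    println!("{} PDFs found | {} directories skipped", found.len(), skipped);
}
```

Counting unreadable directories rather than failing matches the behavior described in the troubleshooting notes, where protected system directories are skipped and scanning continues.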
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Q: "Permission denied" errors during scanning
# This is normal on macOS/Linux - system directories are protected
# pdfx will skip these and continue scanning accessible directories
Q: Database seems corrupted or giving errors
pdfx cleanup # Remove database and start fresh
pdfx init # Rebuild index
Q: Where is my data stored?
# View database location after running pdfx init
# Path is shown in success message
# Use `pdfx cleanup` to remove all data
- ✅ PDF Indexing: SQLite-based PDF database with metadata
- ✅ Filename Search: Fast, case-insensitive filename search
- ✅ List Command: Display all indexed PDFs with detailed information
- ✅ Export Data: Export to JSON, CSV, Markdown, PDF, YAML, and HTML formats
- ✅ Cross-Platform: Works on Linux, macOS, and Windows
- ✅ Clean UI: Progress bars and organized output
- 📅 Recent Command: Show recently modified PDFs
- 🔍 Advanced Search: Filter by size, date, path
- 📊 Statistics: Show indexing statistics and storage usage
- 🏷 Tagging System: Categorize and tag PDFs for better organization
- 🤖 AI: Chatting with PDF files
See FUTURE.md for detailed roadmap and feature plans.
This project is licensed under the MIT License - see the LICENSE file for details.
Built with excellence using:
