
Project Chimera: We will make the words speak. This project is the first step: carving up the monolithic PDF tomes into digestible chapters.


ianchanning/pdf-chapter-cleaver


PROJECT CHIMERA: PDF Chapter Cleaver

Mission: Transmute Text into Voice

We're tired of reading. The meatbag wetware is slow. We will make the words speak. This project is the first step: carving up the monolithic PDF tomes into digestible chapters, ready for the next phase of alchemical transformation into audio.

Final Architecture: Interactive Structural Analysis

To handle the beautiful chaos of PDFs from different eras, the script uses a guided, interactive process. It combines automated style analysis with human-in-the-loop confirmation, so a wrong guess can be caught and rejected before any files are written.

The workflow is as follows:

  1. (⇌) Forensic Analysis: When you run the script on a PDF, it first performs a deep analysis of the document's typography. It groups all potential headings by their font size and style (e.g., "Size 12, ALL CAPS") and prints this analysis for you to see.

  2. (⊕) The Proposal: The script then makes an educated guess at which of these groups represents the true chapters of the document. It uses this guess to generate and display a clear, human-readable Split Plan, showing you exactly which files it intends to create from which page ranges.

  3. (🔥) User Confirmation: Finally, the script pauses and asks for your explicit confirmation with a [y/N] prompt. The Great Cleaving only proceeds if you give the final command.
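The analysis and proposal steps above can be sketched in a few lines of Python. This is a hedged, self-contained illustration rather than the script's actual code: the Span record, the (size, all-caps) grouping key, the "largest repeated style above body text" heuristic and the 01.pdf-style file names are all assumptions, and text extraction from a real PDF is left out. It also assumes each chapter starts at the top of a page, so a chapter's range simply runs up to the page before the next chapter heading.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical extracted text span: its text, font size, and 0-based page."""
    text: str
    size: float
    page: int


def style_key(span):
    """Group key like (12.0, True), i.e. "Size 12, ALL CAPS"."""
    return (round(span.size, 1), span.text.isupper())


def analyse(spans):
    """Forensic analysis: bucket candidate headings by (font size, all-caps)."""
    groups = defaultdict(list)
    for span in spans:
        groups[style_key(span)].append(span)
    return dict(groups)


def propose_chapter_group(groups, body_size):
    """Assumed heuristic: the biggest style above body text that appears
    more than once is treated as the chapter headings (None if no match)."""
    candidates = [key for key, items in groups.items()
                  if key[0] > body_size and len(items) > 1]
    return max(candidates, default=None)


def split_plan(headings, page_count):
    """Return (file name, first page, last page, title) tuples using 1-based,
    inclusive page numbers; each chapter ends just before the next begins."""
    starts = sorted(headings, key=lambda s: s.page)
    plan = []
    for i, heading in enumerate(starts):
        first = heading.page + 1  # 0-based index -> 1-based page number
        # The next chapter's 0-based start index equals this chapter's
        # last page in 1-based numbering.
        last = starts[i + 1].page if i + 1 < len(starts) else page_count
        plan.append((f"{i + 1:02d}.pdf", first, last, heading.text))
    return plan


spans = [
    Span("CHAPTER ONE", 18.0, 0),
    Span("Some body text", 11.0, 0),
    Span("CHAPTER TWO", 18.0, 9),
    Span("Section 2.1", 14.0, 10),
]
groups = analyse(spans)
for (size, caps), items in sorted(groups.items(), key=lambda kv: kv[0], reverse=True):
    print(f"Size {size:g}{', ALL CAPS' if caps else ''}: {len(items)} candidate(s)")

chosen = propose_chapter_group(groups, body_size=11.0)
for name, first, last, title in split_plan(groups[chosen], page_count=20):
    print(f"{name}: pages {first}-{last}  ({title})")
```

Keeping the grouping, the guess and the page arithmetic as separate functions mirrors the workflow: the analysis can be printed on its own, and a human can override the chosen group before any plan is built.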


Installation

This script craves the sanctuary of a virtual environment. Using uv is recommended.

  1. Create the virtual environment:

    uv venv
  2. Activate the environment:

    source .venv/bin/activate
  3. Install dependencies from the requirements file:

    uv pip install -r requirements.txt
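To check that activation worked, a quick standard-library test is enough. This is a generic sketch, not part of the repository: inside a virtual environment such as the .venv created by uv venv, Python's sys.prefix points at the environment while sys.base_prefix still points at the base interpreter.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when this interpreter is running inside a venv."""
    # Outside a venv both prefixes are the same installation directory.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active" if in_virtualenv() else "no virtual environment")
```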

Usage

The script is an interactive tool that will guide you through the process.

  1. Make the script executable (first time only):

    chmod +x pdf_chapter_harvester.py
  2. Run the script on your target file:

    ./pdf_chapter_harvester.py /path/to/your/document.pdf

  3. Review and confirm: the script shows you its analysis and a proposed split plan. Review the plan and, if you are satisfied, type `y` and press Enter to begin the split.
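A [y/N] prompt conventionally treats the capitalised option as the default, so pressing Enter alone means no. The sketch below is a hypothetical illustration of that behaviour, not the script's own code; the prompt wording, the function names and the acceptance of "yes" as well as "y" are assumptions.

```python
def parse_confirmation(reply: str) -> bool:
    """Interpret a [y/N] answer: only 'y' or 'yes' (any case) means go.

    Anything else, including an empty reply, falls back to the default N,
    so nothing is split by accident.
    """
    return reply.strip().lower() in {"y", "yes"}


def confirm(prompt: str = "Proceed with the split? [y/N] ", ask=input) -> bool:
    """Show the prompt and return True only on an explicit yes."""
    try:
        return parse_confirmation(ask(prompt))
    except EOFError:
        # No interactive input available (e.g. stdin closed): treat as "no".
        return False
```

Passing the input function as a parameter keeps the prompt testable without a terminal, for example confirm(ask=lambda _: "y").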

Non-Interactive Mode (Auto-Accept)

If you trust the script's suggestion, you can add the -y or --yes flag to skip the interactive confirmation and proceed with the split automatically.

    ./pdf_chapter_harvester.py /path/to/your/document.pdf -y

Output Directory (Optional)

By default, the cleaved chapter files are saved in a directory named chapters. You can specify a different location with the -o flag:

    ./pdf_chapter_harvester.py /path/to/your/document.pdf -o /path/to/your/output_folder
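Both options map naturally onto Python's standard argparse module. The parser below is a sketch under assumptions: the -y/--yes and -o flags and the chapters default come from this README, while the positional argument name pdf and the dest name output_dir are hypothetical, and the script may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Split a PDF into one file per chapter.")
    parser.add_argument("pdf", help="path to the PDF to cleave")
    # store_true: the flag defaults to False and becomes True when given.
    parser.add_argument("-y", "--yes", action="store_true",
                        help="skip the [y/N] confirmation and split immediately")
    parser.add_argument("-o", dest="output_dir", default="chapters",
                        help="directory for the chapter files (default: %(default)s)")
    return parser


args = build_parser().parse_args(["/path/to/your/document.pdf", "-y"])
print(args.yes, args.output_dir)  # True chapters
```

A script built this way would check args.yes before prompting and skip the [y/N] step when it is True.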
