PROJECT CHIMERA: PDF Chapter Cleaver

Mission: Transmute Text into Voice

We're tired of reading. The meatbag wetware is slow. We will make the words speak. This project is the first step: carving up the monolithic PDF tomes into digestible chapters, ready for the next phase of alchemical transformation into audio.

Final Architecture: Interactive Structural Analysis

To handle the beautiful chaos of PDFs from different eras, the script uses a guided, interactive process. It combines automated style analysis with human-in-the-loop confirmation, so nothing is split until you have reviewed and approved the plan.

The workflow is as follows:

(⇌) Forensic Analysis: When you run the script on a PDF, it first performs a deep analysis of the document's typography. It groups all potential headings by their font size and style (e.g., "Size 12, ALL CAPS") and prints this analysis for you to see.
(⊕) The Proposal: The script then makes an educated guess at which of these groups represents the true chapters of the document. It uses this guess to generate and display a clear, human-readable Split Plan, showing you exactly which files it intends to create from which page ranges.
(🔥) User Confirmation: Finally, the script pauses and asks for your explicit confirmation with a [y/N] prompt. The Great Cleaving only proceeds if you give the final command.
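The first two stages (grouping heading candidates by typography, then turning the chosen group into a page-range plan) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script's actual logic. It assumes heading candidates have already been pulled out of the PDF as (page, text, font size) records. The `HeadingCandidate` type, the "largest font that repeats" heuristic and the `NN_Title.pdf` filename pattern are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadingCandidate:
    """Hypothetical record for one text line that might be a heading."""
    page: int    # 1-based page number where the line appears
    text: str
    size: float  # font size in points


def style_key(c):
    # Bucket by rounded font size plus a coarse case style, e.g. "Size 12, ALL CAPS".
    case = "ALL CAPS" if c.text.isupper() else "Mixed case"
    return f"Size {round(c.size)}, {case}"


def group_by_style(candidates):
    # Forensic analysis: group every candidate by its typographic style.
    groups = defaultdict(list)
    for c in candidates:
        groups[style_key(c)].append(c)
    return dict(groups)


def propose_chapter_group(groups):
    # Assumed heuristic: pick the largest font size that occurs at least twice.
    repeated = {k: v for k, v in groups.items() if len(v) >= 2}
    if not repeated:
        return None
    return max(repeated, key=lambda k: max(c.size for c in repeated[k]))


def split_plan(headings, total_pages):
    # Each chapter runs from its heading's page to the page before the next
    # heading (assumes at most one chapter heading per page).
    ordered = sorted(headings, key=lambda c: c.page)
    plan = []
    for i, h in enumerate(ordered):
        end = ordered[i + 1].page - 1 if i + 1 < len(ordered) else total_pages
        name = f"{i + 1:02d}_{h.text.title().replace(' ', '_')}.pdf"  # hypothetical naming
        plan.append((name, h.page, end))
    return plan


if __name__ == "__main__":
    candidates = [
        HeadingCandidate(1, "CONTENTS", 14),
        HeadingCandidate(3, "CHAPTER ONE", 18),
        HeadingCandidate(3, "A quiet start", 12),
        HeadingCandidate(20, "CHAPTER TWO", 18),
        HeadingCandidate(41, "CHAPTER THREE", 18),
    ]
    groups = group_by_style(candidates)
    for key, members in groups.items():
        print(f"{key}: {len(members)} candidate(s)")
    chosen = propose_chapter_group(groups)
    print(f"Proposed chapter style: {chosen}")
    for name, start, end in split_plan(groups[chosen], total_pages=60):
        print(f"  {name}: pages {start}-{end}")
```

Run as a script, the demo proposes "Size 18, ALL CAPS" and plans three files covering pages 3-19, 20-40 and 41-60. Pages before the first chapter heading (here 1-2) are not assigned to any file in this sketch. Gaps like that are exactly what a human review of the plan is meant to catch.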

Installation

This script craves the sanctuary of a virtual environment. Using uv to create it is recommended.

Create the virtual environment:
```
uv venv
```
Activate the environment:
```
source .venv/bin/activate
```
Install dependencies from the requirements file:
```
uv pip install -r requirements.txt
```

Usage

The script is an interactive tool that will guide you through the process.

Make the script executable (first time only):
```
chmod +x pdf_chapter_harvester.py
```

Run the script on your target file:
```
./pdf_chapter_harvester.py /path/to/your/document.pdf
```

Review and Confirm

The script will show you its analysis and a proposed split plan. Review the plan, and if you are satisfied, type `y` and press Enter to begin the split.
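In a `[y/N]` prompt, the capital letter conventionally marks the default, so a bare Enter declines. Below is a minimal sketch of such a confirmation gate. It assumes the plan is a list of (filename, start page, end page) tuples. `confirm_split` and its parameters are hypothetical names, and accepting `yes` as well as `y` is an assumption rather than documented behaviour. The `assume_yes` parameter stands in for an auto-accept flag, and `ask` can be swapped out so the function is testable without a terminal.

```python
def confirm_split(plan, assume_yes=False, ask=input):
    """Show the split plan and return True only on an explicit "yes".

    `plan` is a list of (filename, start_page, end_page) tuples;
    `assume_yes` skips the prompt; `ask` is injectable for tests.
    """
    print("Split Plan:")
    for name, start, end in plan:
        print(f"  {name}: pages {start}-{end}")
    if assume_yes:
        return True
    try:
        answer = ask("Proceed with the split? [y/N] ")
    except EOFError:
        # No interactive input available (e.g. stdin closed): treat as "No".
        return False
    # The capital N marks "No" as the default: an empty answer declines.
    return answer.strip().lower() in ("y", "yes")
```

For example, `confirm_split(plan, ask=lambda _: "")` returns `False`, which is how a `[y/N]` prompt conventionally treats a bare Enter.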

Non-Interactive Mode (Auto-Accept)

If you trust the script's suggestion, you can add the `-y` or `--yes` flag to skip the interactive confirmation and proceed with the split automatically.
```
./pdf_chapter_harvester.py /path/to/your/document.pdf -y
```

Output Directory (Optional)

By default, the cleaved chapter files are saved in a directory named `chapters`. You can specify a different location with the `-o` flag:
```
./pdf_chapter_harvester.py /path/to/your/document.pdf -o /path/to/your/output_folder
```
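The command-line interface described above (a positional PDF path, `-y`/`--yes`, and `-o`) maps naturally onto Python's standard `argparse` module. This is a sketch under assumptions. The option strings and the `chapters` default come from this README. The `parse_args` helper, the `output_dir` destination name and the help texts are hypothetical, and the real script may parse its arguments differently. The shebang line is what allows `./pdf_chapter_harvester.py` to run directly once it has been made executable with `chmod +x`.

```python
#!/usr/bin/env python3
# Sketch of the CLI surface only; the splitting logic is not shown.
import argparse
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Split a PDF into chapter files after an interactive review."
    )
    # Positional argument: the PDF to cleave.
    parser.add_argument("pdf", type=Path, help="path to the PDF to split")
    # -y / --yes: auto-accept the proposed split plan.
    parser.add_argument(
        "-y", "--yes", action="store_true",
        help="skip the [y/N] confirmation and split immediately",
    )
    # -o: output directory; "output_dir" is a hypothetical destination name.
    parser.add_argument(
        "-o", dest="output_dir", type=Path, default=Path("chapters"),
        help="directory for the chapter files (default: chapters)",
    )
    return parser.parse_args(argv)
```

For example, `parse_args(["book.pdf", "-y"])` returns a namespace with `yes=True` and `output_dir=Path("chapters")`. Only the short `-o` form is declared, since this README does not name a long option for it.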
