PDF tools

25 Apr 2026 • 2 min read

A CLI toolkit of focused Python utilities for PDF manipulation — merge, split, redact, encrypt, OCR, and more, with no rate limits or cloud dependencies.

PDF Tools

A focused collection of single-purpose CLI utilities for PDF manipulation — built to run locally, offline, and without rate limits.

Motivation

PDF workflows are a recurring pain point. Merging reports, splitting contracts, redacting sensitive data, making scanned documents searchable — these are tasks that come up regularly and never at a convenient time.

The typical solution is a free online tool. These tools work, but they come with friction: daily operation quotas, privacy concerns around uploading sensitive documents to third-party servers, and the workarounds that follow (clearing cookies, switching browsers, hunting for alternatives).

The obvious fix is a local toolchain. PDF manipulation is a solved problem in Python — libraries like pypdf, PyMuPDF, and pytesseract cover virtually every use case. The gap was just having purpose-built scripts for each operation rather than reaching for a web UI.

This toolkit was also an experiment in AI-assisted development. Python isn’t my primary language, and building this from scratch would have meant a significant ramp-up. Using an LLM as a coding pair let me focus on the problem domain rather than library APIs — and the result is a set of tools I actually use.

The source code is here.

Stack

| Layer | Libraries |
| --- | --- |
| PDF parsing & manipulation | pypdf, pymupdf |
| OCR engine (optional) | pytesseract, Pillow |
| PDF-to-Word conversion | pdf2docx |
| Encryption | cryptography |
| Progress reporting | tqdm |

Installation

BASH
pip install -r requirements.txt

OCR support requires Tesseract (project page):

BASH
# macOS
brew install tesseract

# Ubuntu / Debian
sudo apt install tesseract-ocr

# Windows
winget install UB-Mannheim.TesseractOCR
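
Since OCR is optional, it can help to confirm that the Tesseract executable is reachable before running the OCR script. The following is a minimal sketch, not part of the toolkit: the helper names `find_tesseract` and `tesseract_version` are hypothetical, and it assumes the executable is called `tesseract` and is expected on `PATH`, which is where pytesseract looks by default.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the `tesseract` executable on PATH, or None."""
    return shutil.which("tesseract")


def tesseract_version(executable: str) -> str:
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some builds print the version banner to stderr instead of stdout.
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0] if lines else "unknown"


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found on PATH; OCR will be unavailable.")
    else:
        print(f"Found {path}: {tesseract_version(path)}")
```

If the check reports that Tesseract is missing right after installation, opening a new terminal or adding the install directory to `PATH` is usually the first thing to try.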

Tools

| Script | Operation |
| --- | --- |
| pdf_bookmarks.py | Inspect and manage bookmarks / table of contents |
| pdf_compressor.py | Reduce file size via image and stream optimization |
| pdf_decryptor.py | Strip password protection from an encrypted PDF |
| pdf_diff.py | Visual diff between two PDF versions |
| pdf_inspector.py | Dump page coordinates, useful for targeting redactions |
| pdf_merger.py | Concatenate multiple PDFs into one |
| pdf_ocr.py | Run Tesseract OCR to produce a searchable, text-layer PDF |
| pdf_page_numbers.py | Stamp visible page numbers onto each page |
| pdf_protector.py | Encrypt a PDF with a user-supplied password |
| pdf_redactor.py | Permanently burn redaction boxes into the page (no hidden data) |
| pdf_reorder.py | Rearrange pages by specifying a new page order |
| pdf_rotator.py | Rotate individual pages or the entire document |
| pdf_splitter.py | Split a PDF by page ranges into separate files |
| pdf_to_docx.py | Convert a PDF to an editable Word document |
| pdf_to_images.py | Render each page to an image (PNG/JPEG) |
| pdf_from_images.py | Pack a set of images into a single PDF |
| pdf_watermark.py | Overlay a text watermark on every page |
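
To give a feel for how small each of these operations can be, here is a rough sketch of splitting by page ranges with pypdf. It is an illustration only, not the code in `pdf_splitter.py`: the function names, the `"1-3,5"` range syntax, and the output file naming are all assumptions made for this example.

```python
from pathlib import Path


def parse_page_ranges(spec: str, page_count: int) -> list[tuple[int, int]]:
    """Parse a spec such as "1-3,5" into inclusive, 1-based (start, end) pairs."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if sep else start  # a single page is a one-page range
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"range {part!r} is outside 1..{page_count}")
        ranges.append((start, end))
    return ranges


def split_pdf(source: str, spec: str, out_dir: str = ".") -> list[Path]:
    """Write one PDF per requested range and return the output paths."""
    # Imported lazily so the range parser stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(source)
    outputs = []
    for start, end in parse_page_ranges(spec, len(reader.pages)):
        writer = PdfWriter()
        for index in range(start - 1, end):  # pypdf page indices are 0-based
            writer.add_page(reader.pages[index])
        target = Path(out_dir) / f"{Path(source).stem}_{start}-{end}.pdf"
        with open(target, "wb") as handle:
            writer.write(handle)
        outputs.append(target)
    return outputs
```

With a hypothetical ten-page `contract.pdf`, calling `split_pdf("contract.pdf", "1-3,5")` would write `contract_1-3.pdf` and `contract_5-5.pdf` to the current directory. Validating the ranges before opening any output file keeps a typo in the spec from leaving a half-finished set of files behind.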
