MVIDELL.SE

pdftoc

A quick draft written about the project.

Problem: I read a lot of PDFs, and many of them lack a proper table of contents (called the document outline in the PDF specification). I wrote a recursive descent parser (in C/C++) that parses the PDF file to find the outline.
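To give a feel for the technique, here is a minimal sketch of a recursive descent parser for a small subset of PDF object syntax: dictionaries, arrays, names, integers and indirect references. It is not the code from pdftoc. The names `Value`, `Parser` and `find_key` are made up for this illustration, strings, streams and comments are left out, and it assumes C++17. The one PDF-specific fact it relies on is that the document catalog points to the outline through its /Outlines entry.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical value type for this sketch; a real PDF has more object kinds.
struct Value {
    enum Kind { Integer, Name, Ref, Array, Dict } kind = Integer;
    long num = 0;                   // integer value, or object number of a Ref
    std::string name;               // name without the leading '/'
    std::vector<std::string> keys;  // Dict keys, parallel to items
    std::vector<Value> items;       // Array elements or Dict values
};

class Parser {
public:
    explicit Parser(std::string text) : src_(std::move(text)) {}

    // value := dict | array | name | integer | reference
    Value parse_value() {
        skip_space();
        if (starts_with("<<")) return parse_dict();
        if (peek() == '[') return parse_array();
        if (peek() == '/') return parse_name();
        if (at_digit()) return parse_number_or_ref();
        throw std::runtime_error("unexpected input at offset " + std::to_string(pos_));
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    char peek() const { return pos_ < src_.size() ? src_[pos_] : '\0'; }
    bool at_digit() const { return std::isdigit(static_cast<unsigned char>(peek())) != 0; }
    bool starts_with(const std::string& s) const { return src_.compare(pos_, s.size(), s) == 0; }
    void skip_space() { while (std::isspace(static_cast<unsigned char>(peek()))) ++pos_; }
    static bool is_delim(char c) {
        return std::isspace(static_cast<unsigned char>(c)) ||
               std::string("()<>[]{}/%").find(c) != std::string::npos;
    }
    long read_int() {
        long n = 0;
        while (at_digit()) n = n * 10 + (src_[pos_++] - '0');
        return n;
    }

    // dict := "<<" (name value)* ">>"
    Value parse_dict() {
        pos_ += 2;  // consume "<<"
        Value v;
        v.kind = Value::Dict;
        for (;;) {
            skip_space();
            if (starts_with(">>")) { pos_ += 2; return v; }
            if (peek() != '/') throw std::runtime_error("expected a name as dictionary key");
            v.keys.push_back(parse_name().name);
            v.items.push_back(parse_value());  // recursion: values can nest
        }
    }

    // array := "[" value* "]"
    Value parse_array() {
        ++pos_;  // consume "["
        Value v;
        v.kind = Value::Array;
        for (;;) {
            skip_space();
            if (peek() == ']') { ++pos_; return v; }
            v.items.push_back(parse_value());  // throws at end of input
        }
    }

    Value parse_name() {
        ++pos_;  // consume "/"
        Value v;
        v.kind = Value::Name;
        while (pos_ < src_.size() && !is_delim(src_[pos_])) v.name += src_[pos_++];
        return v;
    }

    // "12" is an integer, "12 0 R" is a reference; look ahead and backtrack.
    Value parse_number_or_ref() {
        Value v;
        v.num = read_int();
        const std::size_t save = pos_;
        skip_space();
        if (at_digit()) {
            read_int();  // generation number, ignored in this sketch
            skip_space();
            if (peek() == 'R') { ++pos_; v.kind = Value::Ref; return v; }
        }
        pos_ = save;  // not a reference
        return v;
    }
};

// Linear lookup of a key in a parsed dictionary; nullptr if it is absent.
const Value* find_key(const Value& dict, const std::string& key) {
    for (std::size_t i = 0; i < dict.keys.size(); ++i)
        if (dict.keys[i] == key) return &dict.items[i];
    return nullptr;
}
```

Parsing a catalog such as `<< /Type /Catalog /Pages 2 0 R /Outlines 5 0 R >>` with `Parser::parse_value()` and then calling `find_key(catalog, "Outlines")` yields a reference to object 5, which is where the outline would have to be read next. A missing /Outlines key is exactly the case this project is about.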

You still have to do the hard work of writing the table of contents to a file. The goal is that you can just copy over the table of contents printed inside the book (if it has one). Some care is needed to set the page ranges correctly, because many books start with a roman-numeral section before page 1. In K&R (see demo), for example, Contents starts on page v and Introduction on page 1, which is the 15th page of the PDF.
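The range handling comes down to adding a different offset to roman and to arabic page labels. The sketch below shows that arithmetic. It is not taken from pdftoc: `PageRanges`, `roman_to_int` and `label_to_pdf_page` are hypothetical names, and it assumes C++17. The arabic offset of 14 follows from the K&R example (page 1 is the 15th PDF page). The roman offset of 0 is only an assumption that page i is the first page of the PDF, so check it against the actual file.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Convert a roman numeral (either case) to an integer.
// Returns std::nullopt for an empty string or a non-roman character.
// It does not reject malformed numerals such as "iiii".
std::optional<int> roman_to_int(const std::string& s) {
    auto value = [](char c) -> int {
        switch (std::tolower(static_cast<unsigned char>(c))) {
            case 'i': return 1;
            case 'v': return 5;
            case 'x': return 10;
            case 'l': return 50;
            case 'c': return 100;
            case 'd': return 500;
            case 'm': return 1000;
            default:  return 0;
        }
    };
    if (s.empty()) return std::nullopt;
    int total = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const int v = value(s[i]);
        if (v == 0) return std::nullopt;
        const int next = (i + 1 < s.size()) ? value(s[i + 1]) : 0;
        total += (v < next) ? -v : v;  // subtractive notation, e.g. "iv" = 4
    }
    return total;
}

// Offsets from printed page label to 1-based PDF page number (hypothetical type).
struct PageRanges {
    int roman_offset;   // added to roman labels (i, ii, ...)
    int arabic_offset;  // added to arabic labels (1, 2, ...)
};

// Map a printed page label such as "v" or "1" to a 1-based PDF page number.
std::optional<int> label_to_pdf_page(const std::string& label, const PageRanges& r) {
    if (label.empty()) return std::nullopt;
    bool all_digits = true;
    for (char c : label)
        if (!std::isdigit(static_cast<unsigned char>(c))) all_digits = false;
    if (all_digits) {
        if (label.size() > 9) return std::nullopt;  // keep std::stoi within int range
        return std::stoi(label) + r.arabic_offset;
    }
    if (auto n = roman_to_int(label)) return *n + r.roman_offset;
    return std::nullopt;
}
```

With `PageRanges{0, 14}`, `label_to_pdf_page("1", ranges)` gives 15, matching the K&R example, and `label_to_pdf_page("v", ranges)` gives 5 under the assumed roman offset.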

The code for the project can be downloaded with git clone https://mvidell.se/pdftoc.git