
Disclaimer: This content is no longer updated as of October 2021.

East Boston Master Plan Webpage


The digital version of the East Boston Master Plan is a 52-page raster PDF that is almost impossible to read on a smartphone and difficult to read on a large monitor, so I set out to convert it to a webpage and make it accessible to more people.

Here is the result: https://liveeastboston.com/masterplan/

In the two weeks since I published it, over 2,000 people have viewed it.

It features:

You can view the code and content here: https://github.com/nattaylor/east-boston-master-plan-webpage

It’s a shame that the original digital version of the document is lost, because the added step of OCR made the conversion especially difficult.

It went something like this:

  1. OCR with VietOCR
    This worked fairly well with the default settings, but since the source PDF wasn’t especially high quality, there were lots of errors.
  2. Split the resulting plain text into chapters
    Before I did this, the project felt daunting, so this was actually the most important step.
  3. Screenshot & caption all the figures
    There must have been a better way to do this, but screenshotting didn’t take too long.
  4. Manually format in Markdown
    This also was a time suck, but at least it worked (I guess?)
  5. Spellcheck Markdown in MS Word
    Again there may have been a better way, but at least it worked.
  6. Build with pandoc and post-process with PHP
    I could (should?) have done this with a pandoc filter, and only after skipping that route did I realize there’s even a PHP library for writing filters!
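
For anyone attempting something similar, step 2 is easy to script. This is only a rough sketch in Python, not the code I actually used: it assumes each chapter heading sits on its own line in the form "Chapter 1: Title", which may not match what your OCR output looks like, and the names `CHAPTER_HEADING` and `split_chapters` are made up for illustration.

```python
import re

# Assumption: headings look like "Chapter <number>: <title>" on their own line.
# Real OCR output will probably need a different pattern.
CHAPTER_HEADING = re.compile(
    r"^Chapter[ \t]+(\d+)[:.]?[ \t]*(.*)$", re.MULTILINE
)


def split_chapters(text):
    """Return a list of (number, title, body) tuples, one per chapter.

    Any text before the first heading is ignored.
    """
    matches = list(CHAPTER_HEADING.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        chapters.append((int(match.group(1)), match.group(2).strip(), body))
    return chapters
```

Each tuple can then be written out to its own Markdown file, which keeps the manual formatting in step 4 to one manageable chunk at a time.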

Overall, it was mostly an enjoyable project.
