East Boston Master Plan Webpage
The digital version of the East Boston Master Plan is a 52-page raster PDF that is almost impossible to read on a smartphone and difficult to read even on a large monitor, so I set out to convert it to a webpage and make it accessible to more people.
Here is the result: https://liveeastboston.com/masterplan/
In the two weeks since I published it, over 2,000 people have viewed it.
- Responsive layout and images for all screen sizes
- Markdown files for easy version control
- Builds into a static file to simplify deployment
You can view the code and content here: https://github.com/nattaylor/east-boston-master-plan-webpage
It’s a shame that the original digital source of the document has been lost, because the added step of OCR made the conversion especially difficult.
It went something like this:
- OCR with VietOCR
This worked fairly well with the default settings, but since the source PDF wasn’t especially high quality, there were lots of errors.
- Split the resulting plain text into chapters
Before I did this, the project felt daunting, so this was actually the most important step.
- Screenshot & caption all the figures
There must have been a better way to do this, but screenshotting didn’t take too long.
- Manually format in Markdown
This also was a time suck, but at least it worked (I guess?)
- Spellcheck Markdown in MSWord
Again there may have been a better way, but at least it worked.
- Build with pandoc and post-process with PHP
I could (or should?) have done this with a pandoc filter. Only after skipping that route did I realize there’s even a PHP library for writing filters!
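For anyone curious what the filter route might look like, here is a minimal sketch of a pandoc JSON filter. It is written in Python rather than PHP, and it is not what the project actually does: the function name `add_image_class` and the `responsive` class are hypothetical, chosen only to illustrate the idea of rewriting the document tree instead of post-processing the HTML. Pandoc hands the filter the document as JSON on standard input and reads the modified JSON back from standard output.

```python
import json
import sys


def add_image_class(node, css_class="responsive"):
    """Walk a pandoc JSON AST in place and add css_class to every Image.

    The class name is a hypothetical example, not taken from the project.
    """
    if isinstance(node, list):
        for child in node:
            add_image_class(child, css_class)
    elif isinstance(node, dict):
        if node.get("t") == "Image":
            # An Image's content is [attr, alt_inlines, [url, title]],
            # where attr is [identifier, classes, key_value_pairs].
            classes = node["c"][0][1]
            if css_class not in classes:
                classes.append(css_class)
        for value in node.values():
            add_image_class(value, css_class)
    return node


def main():
    raw = sys.stdin.read()
    if not raw.strip():
        return  # nothing piped in, so there is nothing to filter
    doc = json.loads(raw)
    add_image_class(doc["blocks"])
    json.dump(doc, sys.stdout)


if __name__ == "__main__":
    main()
```

Assuming the script is saved as `add_image_class.py` and the source file is called `masterplan.md` (both names are placeholders), it could be run with something like `pandoc masterplan.md --filter ./add_image_class.py -o index.html`. The appeal of this design is that the change happens on pandoc's structured tree, so there is no need to pattern-match against the generated HTML afterwards.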
Overall, it was mostly an enjoyable project.