
How to Import Your Pinboard Bookmarks Into DEVONthink and Convert Them to Searchable Web Archives

May 30, 2019

Pinboard is a web-based bookmarking service that can optionally crawl the websites you save and store a complete copy of how they appeared at that time.

Because Pinboard is a good web citizen, it lets you request an archive of all of your bookmarks and their saved contents as a tar.gz file.

I recently stopped using Pinboard as my primary bookmarking service and wanted to export my data and store it somewhere in a searchable, archived format.

I already use DEVONthink to archive and search all of my scanned documents and PDFs, so it seemed like a natural choice as it also supports just about any other file format – including macOS web archives.

The backup archive that Pinboard gives you contains a folder for each of your bookmarks, holding the complete contents of the scraped website, as well as a JSON-formatted manifest file of metadata.

I spent a few hours trying to wrangle everything into DEVONthink using some AppleScript trickery, but never got it to work. Then two thoughts occurred to me:

  1. You can save a URL to your DEVONthink database and then use a menu command to scrape the website into a PDF or .webarchive.
  2. .webloc files can refer to any URL scheme – including file://.
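
To make the second point concrete: a .webloc file is a small property list whose `URL` key holds the target address. The sketch below is Python rather than the PHP used by the script mentioned later in this post, and the helper names `file_url` and `write_webloc` are made up for illustration. It writes a .webloc that points at a local HTML file through a file:// URL.

```python
import plistlib
from pathlib import Path


def file_url(path: Path) -> str:
    """Turn a local path into a percent-encoded file:// URL."""
    # as_uri() needs an absolute path, so resolve it first.
    return path.resolve().as_uri()


def write_webloc(target: Path, url: str) -> None:
    """Write a .webloc file: a property list with a single URL key."""
    with target.open("wb") as fh:
        plistlib.dump({"URL": url}, fh)
```

Calling `write_webloc(Path("bookmark.webloc"), file_url(Path("saved/page.html")))` would produce a file that macOS treats as a link to that local page (both paths here are placeholders).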

What if I generated a bunch of .webloc files – each one pointing to the location on disk of one of my archived Pinboard bookmarks – then imported the .weblocs into DEVONthink and told it to crawl those URLs?

It worked!

And if you also happen to have this rather unique need, well, I’ve made the PHP script that does it all for you available on GitHub.

The PHP script in the repo will read the contents of your Pinboard archive and generate a .webloc file for each bookmark. Those files can then be imported into DEVONthink as file:// URLs pointing to the archived web content on disk. Then, DEVONthink can “crawl” those file:// URLs and convert them into searchable web archives. Afterwards, the .webloc files can be deleted.
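
For readers who want to see the shape of that first step, here is a minimal Python sketch. It is not the PHP script from the repo, and it guesses at the archive layout: it assumes each bookmark folder has an HTML entry page (`index.html`, `index.htm`, or failing that the first HTML file found), and it ignores the JSON manifest entirely. The names `ENTRY_NAMES`, `find_entry_page`, and `generate_weblocs` are hypothetical.

```python
import plistlib
from pathlib import Path
from typing import Optional

# Assumption: hypothetical entry-point names; the real archive may differ.
ENTRY_NAMES = ("index.html", "index.htm")


def find_entry_page(folder: Path) -> Optional[Path]:
    """Pick the HTML file a bookmark's .webloc should point at."""
    for name in ENTRY_NAMES:
        candidate = folder / name
        if candidate.is_file():
            return candidate
    # Fall back to the first HTML file in the folder, if any.
    pages = sorted(p for p in folder.glob("*.htm*") if p.is_file())
    return pages[0] if pages else None


def generate_weblocs(archive_dir: Path, out_dir: Path) -> int:
    """Write one .webloc per bookmark folder; return how many were written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for folder in sorted(p for p in archive_dir.iterdir() if p.is_dir()):
        page = find_entry_page(folder)
        if page is None:
            continue  # nothing crawlable in this folder
        webloc = out_dir / f"{folder.name}.webloc"
        with webloc.open("wb") as fh:
            # A .webloc is a property list with a single URL key.
            plistlib.dump({"URL": page.resolve().as_uri()}, fh)
        written += 1
    return written
```

Writing the .webloc files to a separate output folder keeps them easy to delete once the import has finished.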

On my iMac Pro with a fast internet connection, importing 3,500+ bookmarks and their 2GB worth of web content took about four hours. After it was finished, I had a fully searchable archive of all of my Pinboard bookmarks that can be synced across all of my Macs and iDevices.

Hopefully someone else will find this script useful.