Importing Jekyll Posts into WordPress

Nearly four years ago I switched my main site over to Jekyll. It’s been great. But late last year I decided to make that site and its blog purely about my software business and move all of my non-work posts over to my tyler.io domain so I could have a personal site again. To encourage myself to write more, I built the site with WordPress so it would be easy to publish. That meant I needed a way to convert and import all of my old Jekyll Markdown posts into WordPress. I found a few scripts that exported WordPress into Jekyll, but not the other way around. So I hacked together my own script, which I’ve pasted below. Hopefully this will help anyone wanting to make the same transition.

The script takes a directory of Markdown posts in the following format, reads their header metadata, and imports them into your WordPress database.

date: 2013-04-08 20:57:14
title: PebbleCam
layout: post
permalink: /blog/2013/04/pebblecam/index.html
slug: pebblecam
---
Post content...
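
To make the header format concrete, here is a minimal Python sketch of just the parsing step. It is not the import script itself: the function name parse_post is made up for illustration, and it assumes every header line is a simple "key: value" pair followed by a line containing only "---".

```python
def parse_post(text):
    """Split a post into (metadata dict, body) at the first '---' line."""
    lines = text.splitlines()
    # Tolerate an optional opening '---' line, as in standard Jekyll front matter.
    if lines and lines[0].strip() == "---":
        lines = lines[1:]
    meta = {}
    for index, line in enumerate(lines):
        if line.strip() == "---":
            # Everything after the delimiter is the post body.
            return meta, "\n".join(lines[index + 1:])
        # Split on the first colon only, so values such as
        # "2013-04-08 20:57:14" keep their own colons.
        key, separator, value = line.partition(":")
        if separator:
            meta[key.strip()] = value.strip()
    # No delimiter found: treat the whole file as body with no metadata.
    return {}, text
```

From there, each parsed post would still need to be written into the WordPress database, which this sketch deliberately leaves out.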

Publishing Your Blog with Dropbox and Jekyll

Back in August I wrote about my experience switching this blog from WordPress to Jekyll. Three months in, I’m happy to report everything is going swimmingly. I survived a few high traffic moments from Hacker News and was thrilled to see the site stay up even when I managed to break MySQL on the server.

The only issue I’ve faced is a higher barrier to writing new content. Switching away from WordPress means I had to give up their web interface and one-click posting. Instead, my workflow is:

  1. Write a post using some text editor – typically TextMate on my laptop.
  2. Preview and double-check that the rendered Markdown content is correct.
  3. Commit the file into git.
  4. SSH into my web server.
  5. Run git pull to fetch the new post.
  6. Run jekyll.

As you might imagine, steps three through six are a little annoying. They’re just invasive enough that I dread — just a little bit — adding new content and especially correcting typos.

What I want is something more automatic. Thanks to Dropbox and a little server side magic, I’ve got a solution that completely eliminates those last four steps. And while I know I’m not the first person to come up with the following solution (although I’m having trouble finding another example online at the moment), I do want to document my setup both for my sake and anyone else looking for the steps involved.

Here’s what’s going to happen:

  1. Write a post using Markdown and save it into the _posts folder of my Jekyll site stored in Dropbox.
  2. The file gets synced to my server which is also running Dropbox.
  3. A cron job on the server notices the new file and automatically runs Jekyll, updating my site with the new content.

Other than actually writing the content, everything else is automatic. The whole system took about twenty minutes to set up. Here’s how…

Configuring Dropbox

I’m assuming you’ve already got a Jekyll site built and stored somewhere in Dropbox. The next step is to share that folder via Dropbox with your server. Installing Dropbox on Ubuntu is relatively painless if you know your way around the command line. Per their instructions:

cd ~ && wget -O - "http://www.dropbox.com/download?plat=lnx.x86_64" | tar xzf -

Then, you’ll want to download their helper script that lets you start/stop the Dropbox daemon. It’s linked at the bottom of their Linux installer page.

Once you’ve got Dropbox installed, I’d suggest creating a new account just for your server. This lets you selectively share folders of content from your primary Dropbox account. This is important for a couple of reasons. First off, I’ve got 60GB of data in Dropbox, which is way more than my small Rackspace cloud instance can handle. Also, I simply don’t feel comfortable having so much personal information just sitting around on my server.

With the software installed and running, use Dropbox to share your Jekyll folder with your new server account and wait for it to sync.

Watching for Changes

The next step is putting in place a process to automatically watch for changes to files in our Jekyll _posts folder and then rebuild the site. I’m sure there are a bunch of tools available on Linux to handle this; the first one I ran across was incron. It was surprisingly easy to set up. Like a cron job, you give it a command to run and when to run it. But instead of a date/time, you give it a path to watch and which filesystem events to listen for. Installing was simple:

sudo apt-get install incron

Then, you need to give your user account permission to run incron jobs.

sudo vim /etc/incron.allow

and add your user account name to the list, then save your changes.

Finally, add your job via

incrontab -e

The incrontab job syntax looks like

<path to watch> <file system event conditions> <command>

On my system, that ends up looking like

/path/to/Dropbox/jekyll/_posts IN_MODIFY,IN_DELETE,IN_CLOSE_WRITE,IN_MOVE /path/to/jekyll /path/to/Dropbox/jekyll /var/www/clickontyler.com

From then on, any changes to your _posts folder should automatically trigger a rebuild of your site.
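
One thing to watch for: a single save can fire several of those events in quick succession, so incron may launch overlapping rebuilds. The following Python sketch is one possible way to guard against that, not part of my actual setup. The idea is to point the incrontab entry at a small wrapper that takes a non-blocking file lock and skips the run if a rebuild is already in progress. The function name and lock path are hypothetical.

```python
import fcntl
import subprocess
import sys


def rebuild(command, lock_path):
    """Run `command` unless another rebuild already holds the lock.

    Returns the command's exit code, or None if the run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails immediately if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # another rebuild is already running, so skip this one
        # The lock is released when lock_file is closed at the end of the block.
        return subprocess.call(command)
```

A wrapper script would call something like rebuild(["/path/to/jekyll", "/path/to/Dropbox/jekyll", "/var/www/clickontyler.com"], "/tmp/jekyll-rebuild.lock"), where the lock path is a placeholder. The trade-off is that a change arriving in the middle of a build is skipped rather than queued, so it only shows up on the next rebuild.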

Switching From WordPress to Jekyll

Last week I finally took the plunge and completely switched this website from WordPress, which I had been using for over four years, to Jekyll. There are tons of articles online about switching, so I’m not going to attempt to write any sort of exhaustive guide about the process. These are just my own first impressions — one week in — along with a few lessons learned and a couple scripts I wrote to automate the process.

Why switch?

First off, let me be clear that I didn’t switch because of any failing on WordPress’s part. I’ve been a happy WP user for years, and I’d still recommend it to other web writers with no reservations. However, because of its dynamic nature, WordPress is susceptible to two problems that I got tired of dealing with:

  1. WordPress can be slow. Because WordPress renders your site from a database on each page view, it can quickly grind to a halt during a burst of traffic. And before you email me, YES, I’m well aware of all the caching best practices and plugins you can use to speed up things. But short of having WordPress output the entire site as static HTML files after each change, you’re always going to run into some initial PHP overhead. Even with WP-Super-Cache installed and tuned, this site became unresponsive the last two times I landed on Reddit and Hacker News. That’s unacceptable.

  2. Security updates are a bitch. That’s especially true for a self-hosted install of WordPress. Every security point release is an annoying fifteen minutes out of my day where I have to download the latest release, upload to my server, test for any regression issues, commit the changes into Git, etc. I’ve done this a thousand times before and frankly I’ve got better things to do with my time. I don’t blame WordPress for the security fixes. In fact, I applaud them for reacting so quickly. As the most popular blogging platform I know they’re a huge target and they do a great job managing that risk. I just don’t want to deal with it anymore. With static HTML files there is no attack vector to worry about.

  3. Let’s be honest. I’m a geek, and the thought of keeping my site organized as a few folders of text files in a git repo is awesome.

Switching

I had poked around and experimented with Jekyll a few times before finally deciding to switch, so I was already familiar with how the system works. (The docs are available if you want to know more.) As a bonus, I’ve been writing my blog posts in Markdown for years, so there were really only two steps between me and a fully static site:

  1. Pull all my blog posts and pages out of WordPress’s database and save them as Jekyll formatted text files.

  2. Convert my existing WordPress theme into a Jekyll layout.

For those who are wondering, the whole process took about three hours on a Saturday night. Not too shabby.

Exporting Out of WordPress

The first big step towards migrating to Jekyll is getting all of your content out of WordPress into a format Jekyll can use. Buried deep inside the Jekyll Ruby gem is an importer script for most of the major blogging platforms including WordPress. Unfortunately for me, I don’t know Ruby, and I’m not familiar with the gem system. I fooled around with their (seemingly out of date) instructions, but decided it would be faster and more foolproof just to write my own export script. Many of my pages and blog posts have custom post fields attached to them for setting things like page titles and URL slugs. Writing my own script ensured all those settings would come through during the export.

As for the script itself, there’s not much to it. It pulls all the content from your WordPress database and saves each post and page out as a Jekyll-formatted text file.
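
For a rough idea of what that conversion involves, here is a minimal Python sketch. It is not my export script. It assumes each row has already been fetched from the wp_posts table as a dict with the standard post_date, post_title, post_name and post_content columns, and the function name to_jekyll_file is invented for the example. Custom post fields are left out.

```python
import json
from datetime import datetime


def to_jekyll_file(row):
    """Return (filename, file text) for one WordPress post row.

    `row` is assumed to be a dict with post_date (a datetime),
    post_title, post_name (the URL slug) and post_content.
    """
    posted = row["post_date"]
    # Jekyll expects post filenames of the form YYYY-MM-DD-slug.ext.
    filename = "{}-{}.md".format(posted.strftime("%Y-%m-%d"), row["post_name"])
    header = [
        "---",
        "layout: post",
        # json.dumps quotes the title so colons or quotes don't break the YAML.
        "title: {}".format(json.dumps(row["post_title"])),
        "date: {}".format(posted.strftime("%Y-%m-%d %H:%M:%S")),
        "slug: {}".format(row["post_name"]),
        "---",
    ]
    return filename, "\n".join(header) + "\n" + row["post_content"] + "\n"
```

Each returned filename and text pair would then be written into the site’s _posts folder.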

Building the Layout

Creating the Jekyll layouts was surprisingly simple. I basically just took my existing HTML as rendered by WordPress, saved it onto my desktop, and cut it up into a few template and include files. The layouts and includes are available to look through.

The flow of the templates is fairly simple. Each Jekyll-controlled page inherits from the layouts/default.html file.

{% include header.html %}
{{ content }}
{% include footer.html %}

The header.html and footer.html includes are just raw HTML that build out the bulk of the site. One thing to note inside each is that I’m using a bunch of Jekyll variables that are echoed out during the Liquid processing. Each page’s title and meta description is pulled from front matter defined in the corresponding Jekyll file. I’m also prefixing all of my static content URLs (images, stylesheets, JavaScript, etc.) with a site.cdn variable which is defined globally. Currently, this points to my CDN domain on MaxCDN, but if they ever should go down (or if I switched away) I only have to change one line and re-run Jekyll to begin serving content from an alternate domain.

Any Concessions?

Yep, but only one. While I’m sure with enough hacking around I could have totally replicated my WordPress site’s structure, I didn’t want to spend a lot of time rebuilding a bunch of archive pages that didn’t matter other than (perhaps) for search engine ranking. So my old monthly archive pages as well as the indexed blog pages went the way of the dodo. To make up for the blog index, and to ensure my old content stays available in Google, I set up a simple index listing all of my posts, ordered by date.

Odds and Ends

Migrating to Jekyll gave me an opportunity to go through my four-year-old Amazon S3 bucket where I store all of my static content. A lot of cruft and abandoned files have built up over the years, so this was a good chance to clean it out. With a few thousand files to go through, I certainly wasn’t going to do it by hand. So here’s a quick script that scans a local copy of the bucket and checks each file to see if it’s referenced anywhere on my site. If not found, it deletes the unused file. It was incredibly easy to do since all of my content is now plain text. (Yay, Jekyll!)
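
The general approach can be sketched in a few lines of Python. This is an illustration rather than the script I ran: the function name find_unused and the list of text file extensions are assumptions, and unlike my script it only reports unreferenced files instead of deleting them, so the list can be reviewed first.

```python
import os

# Hypothetical list of site file types worth searching for references.
TEXT_EXTENSIONS = (".html", ".md", ".markdown", ".css", ".js", ".xml", ".yml")


def find_unused(bucket_dir, site_dir):
    """Return bucket files whose relative path appears nowhere in the site."""
    # Read every text file in the site into one big searchable string.
    chunks = []
    for root, _dirs, files in os.walk(site_dir):
        for name in files:
            if name.endswith(TEXT_EXTENSIONS):
                path = os.path.join(root, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    chunks.append(handle.read())
    site_text = "\n".join(chunks)

    unused = []
    for root, _dirs, files in os.walk(bucket_dir):
        for name in files:
            path = os.path.join(root, name)
            # Compare using the bucket-relative key, e.g. "img/photo.png".
            key = os.path.relpath(path, bucket_dir).replace(os.sep, "/")
            if key not in site_text:
                unused.append(path)
    return sorted(unused)
```

A plain substring match like this will miss references that are URL-encoded or built up dynamically, so it is worth eyeballing the result before removing anything.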

For Any Mac Developers Out There

I’m putting this entire site’s content online in GitHub. Not because that’s where it’s hosted or deployed from, but simply so other people can poke around and hopefully find some useful snippet. Along with all the Jekyll stuff, you’ll also find quite a few PHP scripts buried in my Mac app folders. These are all the integration scripts that connect this site to Shine – the indie Mac dashboard I use to run my little software business. These scripts do things like process upgrades, serve downloads, display changelogs, etc. It’s all there. Just go exploring and you’re bound to find them. And if you have any questions about how/why I did something, feel free to ask.