Serving Static Content on Amazon S3 with s3up

I’ve written twice about using Amazon S3 to host your website’s static content. It’s a great solution for small websites without access to a real content delivery network. And now that Amazon has launched CloudFront on top of S3, it’s even better.

But there are still ways we can improve the performance. The trick is to upload our files using custom headers so they’re served back with a proper expiration date and gzipped when possible — i.e., the techniques recommended by YSlow.

I’ve written a command line tool called s3up which simplifies this process. The idea is simple: it uploads a single file to S3, sets a far future expiration date, gzips the content, and versions the file by combining the filename with a timestamp. Each of these actions is optional and can be controlled via the command line.

The basic syntax:

s3up myS3bucket js/somefile.js somefile.js

would upload a local JavaScript file named somefile.js into your Amazon S3 bucket named myS3bucket inside the js folder.

We can build on this command by adding the -x flag, which tells s3up to set a far future expiration header. By default, it chooses a date ten years in the future:

s3up -x myS3bucket js/somefile.js somefile.js

This lets the browser know to keep the file cached indefinitely.
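
For reference, a far-future Expires value is just an HTTP-formatted date stamped well into the future. Here’s a quick sketch of how you could generate one yourself (purely an illustration, not s3up’s actual source):

<?php
    // Illustration only: an HTTP-date Expires value ten years out,
    // the same kind of value the -x option attaches to uploads.
    $expires = gmdate('D, d M Y H:i:s', strtotime('+10 years')) . ' GMT';
    echo "Expires: $expires\n";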

Passing the -z flag uploads two copies of the file: one as-is, and a second that is gzipped and renamed filename.gz.extension. You can then dynamically serve the compressed version to browsers that support it.

Finally, the -t flag uploads the file renamed as filename.YYYYmmddHHmmss.extension. This lets you easily create versioned files when you need to update an existing file. (You need a new filename because the browser was told earlier to always load the old one from its cache.)
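
If you’re curious what that naming pattern looks like in practice, here’s a tiny sketch of building such a name (illustrative only; this isn’t s3up’s code):

<?php
    // Illustration: a name in the filename.YYYYmmddHHmmss.extension pattern
    $parts     = pathinfo('somefile.js');
    $versioned = $parts['filename'] . '.' . date('YmdHis') . '.' . $parts['extension'];
    // e.g. somefile.20090321154500.js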

Combining all three options we get:

s3up -txz myS3bucket js/somefile.js somefile.js

This uploads your file, compresses it, sets the correct expiration date, and versions the filename — all in one easy step.

If you’d prefer to choose your own string for versioning, you can specify it with the --version flag:

s3up -txz --version=v2 myS3bucket js/somefile.js somefile.js

In that example, the file would be stored in S3 as somefilev2.js.

s3up also echoes the new filename so you can quickly paste it into your HTML. If you’re on a Mac, you can automatically copy the new name onto your clipboard by piping s3up to the pbcopy command.
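
For example:

s3up -x myS3bucket js/somefile.js somefile.js | pbcopy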

If you’d like to take this a step further and integrate s3up into your existing deploy scripts, you can leave off the filename argument and instead pipe the data to upload via stdin. So, if you’re using another tool like the YUI Compressor or byuic, you can run:

somecommand | s3up -txz myS3bucket js/somefile.js | pbcopy

Uploading Multiple Files

A common task is uploading a whole folder of images. You can do this in one step by appending a slash (/) to the S3 filename and using wildcards to specify multiple local files. Example:

s3up myS3bucket images/ /path/to/your/images/*.jpg

When s3up sees images/ ending with a slash, it treats it as a directory and stores all of your files inside it. Make sure you remember the trailing slash on images/; that’s what triggers the special directory upload mode.

Download

You can grab the latest copy of s3up from its GitHub project.

Using Amazon S3 as a Content Delivery Network

[Update: You might also be interested in s3up for storing static content in Amazon S3.]

Earlier this week I posted about my experience redesigning this site, focusing on optimizing my page load times using YSlow. A large part of that process involved storing static content (images, stylesheets, JavaScript) on Amazon S3 and using it like a poor man’s content delivery network (CDN). I made some hand-waving references to a deploy script I wrote which handles syncing content to S3, adding expiry headers, and gzipping the data. A couple of users emailed asking for more info, so here goes.

Why Amazon S3?

Since its launch, nearly every technical blogger on the net has weighed in on why Amazon S3 is (for lack of a better word) awesome. No need for me to repeat them. I’ll just say quickly that it’s cheap (as in price, not in quality), fast, and easy to use. If you’ve got the deep pockets of a corporation backing you, you could probably find a better deal with another CDN provider, but for bloggers, startups, and small businesses, it’s the best game in town.

Amazon S3 is platform and language agnostic. It’s a massive hard drive in the sky with an open API sitting on top. You can connect to it from any system using just about any programming language. For this tutorial, I’ll be using a slightly modified version of the S3 library I wrote in PHP. I say “slightly modified” because I had to make a few changes to enable setting the Expires and gzip headers. These changes will eventually make their way into the official project; I just haven’t done it yet.

The Deploy Process

YSlow recommends hosting static content such as images, stylesheets, and JavaScript files on a CDN to speed up page load times. It’s also best to give each file a far future expiration header (so the browser doesn’t try to reload the asset on each page view) and to gzip it. On a typical webserver like Apache, these are simple changes you can make programmatically through a config file. But Amazon S3 isn’t really a web server. It’s just a “dumb” storage device that happens to be accessible over the web. We’ll need to add the headers ourselves, manually, upfront when we upload.

The deploy script will also need to be smart and not re-upload files that are already on S3 and haven’t changed. To accomplish this we’ll be comparing the file on disk with the ETag value (md5 hash) on S3. Let’s get started.
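
To give you a feel for what that check involves, here’s a rough sketch of comparing a local file’s md5 against the ETag S3 reports. This is illustrative only (it isn’t the objectIsSame method from the library) and assumes the object is publicly readable at the standard bucket URL:

<?php
    // Sketch: does the local file match what's already in the bucket?
    // Issue a HEAD request and compare S3's ETag (the md5 for simple PUTs)
    // against the md5 of the file on disk.
    function local_file_matches_s3($bucket, $object, $local_path)
    {
        $ch = curl_init('http://' . $bucket . '.s3.amazonaws.com/' . $object);
        curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD only, skip the body
        curl_setopt($ch, CURLOPT_HEADER, true);         // we want the response headers
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $headers = curl_exec($ch);
        curl_close($ch);

        if($headers === false || !preg_match('/ETag:\s*"([0-9a-f]+)"/i', $headers, $m))
            return false; // missing or unreadable object, treat it as changed

        return $m[1] === md5_file($local_path);
    }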

Images

Deploying images is straightforward.

  • Loop over every image in our /images/ directory.
  • Calculate the file’s md5 hash and compare to the one in S3.
  • If the file doesn’t exist or has changed, upload it with a far-future Expires header.
  • Repeat for the next image.

Source:

<?php
    // Assumes $s3 (the S3 client), $bucket, and $expires (a far-future date string)
    // are already configured, along with the DOC_ROOT and IMG_PATH constants.
    $files = scandir(DOC_ROOT . IMG_PATH);
    foreach($files as $fn)
    {
        // Skip anything that isn't a jpg, png, or gif
        if(!in_array(substr($fn, -3), array('jpg', 'png', 'gif'))) continue;
        $object   = IMG_PATH . $fn;
        $the_file = DOC_ROOT . IMG_PATH . $fn;
        // Only upload if the file is different
        if(!$s3->objectIsSame($bucket, $object, md5_file($the_file)))
        {
            echo "Putting $the_file . . . ";
            if($s3->putObject($bucket, $object, $the_file, true, null, array('Expires' => $expires)))
                echo "OK\n";
            else
                echo "ERROR!\n";
        }
        else
            echo "Skipping $the_file\n";
    }

JavaScript and Stylesheets

The same process applies to JavaScript and stylesheets. The only difference is we need to serve gzip encoded versions to browsers that support it. As I said above, S3 won’t do this natively so we need to fake it by uploading a plaintext and a gzipped version of each file and then use PHP to serve the appropriate one to the user.

In the master config file on my website, I set a variable called $gz like so:

<?php
    // Only use the gzipped filenames if the browser advertises gzip support
    $gz = (isset($_SERVER['HTTP_ACCEPT_ENCODING']) && strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== false) ? 'gz.' : '';

That snippet detects if the user’s browser supports gzip encoding and sets the variable appropriately. Then, throughout the site, I link to all of my JavaScript and CSS files like this:

<link rel="stylesheet" href="https://clickontyler-clickonideas.netdna-ssl.com/css/main.<?PHP echo $gz;?>css" type="text/css">

That way, if the $gz variable is set, it inserts “gz.” into the filename (main.gz.css instead of main.css). Otherwise, the filename doesn’t change. It’s a quick way to transparently give the right file to the browser.

With that out of the way, here’s how I deploy the gzipped content:

<?php
    // List your stylesheets here for concatenation...
    $css  = file_get_contents(DOC_ROOT . CSS_PATH . 'reset-fonts-grids.css') . "\n\n";
    $css .= file_get_contents(DOC_ROOT . CSS_PATH . 'screen.css') . "\n\n";
    $css .= file_get_contents(DOC_ROOT . CSS_PATH . 'jquery.lightbox.css') . "\n\n";
    $css .= file_get_contents(DOC_ROOT . CSS_PATH . 'syntax.css') . "\n\n";
    file_put_contents(DOC_ROOT . CSS_PATH . 'combined.css', $css);

    // Build the gzipped counterpart alongside the plaintext version
    shell_exec('gzip -c ' . DOC_ROOT . CSS_PATH . 'combined.css > ' . DOC_ROOT . CSS_PATH . 'combined.gz.css');

    if(!$s3->objectIsSame($bucket, CSS_PATH . 'combined.css', md5_file(DOC_ROOT . CSS_PATH . 'combined.css')))
    {
        echo "Putting combined.css...";
        if($s3->putObject($bucket, CSS_PATH . 'combined.css', DOC_ROOT . CSS_PATH . 'combined.css', true, null, array('Expires' => $expires)))
            echo "OK\n";
        else
            echo "ERROR!\n";

        // The gzipped copy needs a Content-Encoding header so browsers decode it
        echo "Putting combined.gz.css...";
        if($s3->putObject($bucket, CSS_PATH . 'combined.gz.css', DOC_ROOT . CSS_PATH . 'combined.gz.css', true, null, array('Expires' => $expires, 'Content-Encoding' => 'gzip')))
            echo "OK\n";
        else
            echo "ERROR!\n";
    }
    else
        echo "Skipping combined.css\n";
You’ll notice that the first thing I do is concatenate all of my files into a single file (another YSlow recommendation to speed things up). From there, we compress it with gzip and then upload the two versions. Looking at this code, there’s probably a native PHP extension to handle the gzipping instead of exec’ing a shell command, but I haven’t looked into it (yet).
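
For the record, PHP’s zlib extension can handle the compression without shelling out. A minimal sketch using gzencode(), assuming the extension is available and the same constants as above:

<?php
    // Compress the combined stylesheet in-process at maximum compression;
    // gzencode() produces the same gzip-format output the shell command created.
    $css = file_get_contents(DOC_ROOT . CSS_PATH . 'combined.css');
    file_put_contents(DOC_ROOT . CSS_PATH . 'combined.gz.css', gzencode($css, 9));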

Also, make sure you notice that I’m adding a Content-Encoding: gzip header to each gzipped file. If you don’t, the browser will crap out on you when it tries to read the compressed data as plaintext.

And We’re Done

So those are the main bits of the script. You can download the full script (and the S3 library) from my Google Code project.

Building a Better Website With Yahoo!

It’s been a long time coming, but I finally pushed out a new design for this website last month. I rebuilt it from the ground up using two key tools from the Yahoo! Developer Network:

  • The Yahoo! User Interface Library (YUI)
  • YSlow

The new design is really a refresh of the previous look with a focus on readability and speed. I want to take a few minutes and touch on what I learned during this go-round so (hopefully) others might benefit.

Color Scheme

Although I really liked the darker color scheme from before, it was too hard to read. There simply wasn’t enough contrast between the body text and the black background. I tried my best to make it work: I searched around for various articles about text legibility on dark backgrounds, increased the letter spacing and the leading, narrowed the body columns, and tried everything else I learned in the intro graphic design class I took in college. The results were better, but my gut agreed with all the articles I read online which basically said “don’t do it.” So I compromised and switched to a white body background, while leaving the header mostly untouched. I find the new look much more readable, and hopefully that will encourage me to begin writing longer posts.

CSS and Semantic Structure

The old site was built piecemeal over a couple months and, quite frankly, turned into a mess font-wise. I had inconsistent headers, font-weights, and anchor styles depending on which section you were viewing. With the new design, I sat down (as I should have before) and decided explicitly on which font family, size, and color to use for each header. I specced out the font sizes using YUI’s percent-based scheme which helps ensure a consistent look when users adjust the size. (Go ahead, scale the font size up and down.) An added bonus was that it forced me to think more about the semantic structure of my markup. (If you have Firefox’s Web Developer toolbar installed, try viewing the site with stylesheets turned off.) If there’s one thing I learned working for Sitening, it’s that semantic structure plays a huge part in your SERPs.

Optimizing With YSlow

At OSCON last summer, I attended one of the first talks Steve Souders gave on YSlow — a Firefox plugin that measures website performance. That, plus working for Yahoo!, has kept the techniques suggested by YSlow in the back of my head with every site I build. But this redesign was the first time I committed to scoring as high as I could.

As usual, I coded everything by hand, paying attention to all the typical SEO rules that I learned at Sitening. Once the initial design was complete and I had a working home page, I ran YSlow.

[Screenshot: YSlow grade before the changes]

Ouch. A failing 56 out of 100. What did YSlow suggest I improve? And how did I fix it?

  • Make fewer HTTP requests – My site was including too many files: three CSS stylesheets, four JavaScript files, plus any images on the page. I can’t cut down on the number of images (without resorting to sprites – which are usually more trouble than they’re worth), so I concatenated my CSS and JS into single files. That removed five requests and brought me up to an “A” ranking for that category. (I’m further toying with adding the YUI Compressor into the mix.)
  • Use a content delivery network – At Yahoo! we put all static files on Akamai. Other large websites like Facebook, Google, and MySpace push to their own CDNs, too. But what’s a single developer to do? Use Amazon S3, of course! I put together a quick PHP script which syncs all of my static content (images, css, js) and stores it on S3. Throughout the site, I prepend each link with a PHP variable that lets me switch the CDN on or off depending on whether I’m running locally or on my production server (there’s a sketch of this switch after the list). (And, in the event S3 ever goes down or away, I can quickly switch back to serving files off my own domain.)
  • Add expiry headers – Expiry headers tell the browser to cache static content and not attempt to reload it on each page view. I didn’t want to put a far future header on my PHP files (since they change often), but I did add them to all of the content stored on S3. This is fine for images that should never change, but for my JavaScript and CSS files it means I need to change their filename whenever I push out a new update so the browser knows to re-download the content. It’s extra work on my part, but it pays off later on.
  • Gzip files – This fix comes in two parts. First, I modified Apache to serve gzipped content if the browser supports it (most do) — not only does this cut down on transfer time, but it also decreases the amount of bandwidth I’m serving. But what about content coming from S3? Amazon doesn’t support gzipping content natively. Instead, in addition to the static files stored there, I also uploaded their gzipped counterparts. Then, using PHP, I change the HTML links to reference the gzip versions if I detect the user’s browser can handle it.
  • Configure ETags – ETags are a hash provided by the webserver that the browser can use to determine if a file has been modified before downloading it. Amazon S3 automatically generates ETags for every file — it’s just a free benefit of using S3 as my CDN.
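
To make the CDN switch and the gzip filename trick from the list above concrete, here’s the rough shape of what I mean. The bucket hostname, the IS_PRODUCTION flag, and the $gz variable are placeholders standing in for whatever your own config defines:

<?php
    // Placeholders: substitute your own bucket hostname and environment flag.
    define('CDN_URL', 'http://mybucket.s3.amazonaws.com');
    $cdn = IS_PRODUCTION ? CDN_URL : '';   // empty string falls back to serving off my own domain
    $gz  = (isset($_SERVER['HTTP_ACCEPT_ENCODING']) && strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== false) ? 'gz.' : '';
?>
<link rel="stylesheet" href="<?php echo $cdn; ?>/css/combined.<?php echo $gz; ?>css" type="text/css">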

So, all of the changes above took about three hours to implement. Most of that time was spent writing my S3 deploy script and figuring out how to make Amazon serve gzipped content. Was it worth it? See for yourself.

[Screenshot: YSlow grade after the changes]

Wow. Three short hours of work and I jumped to a near perfect 96 out of 100. The only remaining penalty is from not serving an expires header on my Mint JavaScript.

Do these optimization techniques make a difference? I think so. Visually, I can tell there’s a huge improvement in page rendering time in both Firefox and Safari. (IE accounts for 6% of my traffic, so I don’t bother testing there any longer.) More amazing, perhaps, is the site’s performance on the iPhone. The page doesn’t just load; it appears.

I’ve made a bunch of vague references to the S3 deploy script I’m using and how to set up gzip on Amazon. In the interest of space, I’ve left out the specifics. If you’re interested, email me with any questions and I’ll be happy to help.