Backing Up Everything (Again)

This will take a while. Bear with me.

I’m obsessive about backing up my data. I don’t want to take the chance of ever losing anything important. But that doesn’t mean I’m a data hoarder. I like to think I’m pragmatic about it. And I don’t trust anyone else to do it for me.

From around 2006 to 2012, I kept a Mac mini attached to our TV with a Drobo hanging off the back. It had all our downloaded movies on it. And every night it would automatically download the latest releases of our favorite TV shows from Usenet so my wife and I could watch them with Plex the next day. It worked great, and all the media files were stored redundantly across multiple hard drives with tons of storage space. (Would it survive a house fire? No. But files like that weren’t critical.) But with the rise of streaming services and useful pay-to-watch stores like iTunes, now I’d rather just pay someone else to handle all of that for me. So, I don’t keep any media files like that locally any longer.

But my email? My financial and business documents? My family’s photo and home video archive? I’m really obsessive about that.

For most of my computing life, all of that data was small enough to fit on my laptop or desktop’s hard drive. In college, I remember burning a CD (not a DVD) every few months with all of my school work, source code, and photos on it for safekeeping. The internet wasn’t yet fast enough to make backing up to a cloud (were clouds even a thing back then?) feasible, so as my data grew I just cloned everything nightly to a spare drive using SuperDuper and Time Machine. It worked for the most part. Sure, I still worried about my house catching fire and destroying my backups, but there really wasn’t an alternative other than occasionally taking one of the backup drives to work or a friend’s house.

But then the internet got fast, really fast, and syncing everything to the cloud became easy and affordable. I was a beta user of Gmail back in 2004. I became an early paid subscriber of Dropbox around 2008. All of my data was stored in their services and fully available on every computer and – eventually – mobile device. At the time, I thought I had reached peak backup.

I was wrong.

Now we have too much data. My email is around 20GB. My family’s photo library is approaching 500GB. That’s more data than will fit on my laptop’s puny SSD. It will fit on my iMac, but it leaves precious little space for anything else. I could connect external drives, but that gets messy and further complicates my local backup routine. (Yes, Backblaze is a good potential solution to that.)

Another problem is that most of our data now is either created directly in the cloud (email, Google Docs, etc) or is immediately sent to it (iPhone photos uploaded to iCloud and/or Google Photos), bypassing my local storage. If you trust Google (or Apple) to keep your data safe and backed up, that’s great. I don’t. I’ve heard too many horror stories about one of Google’s automated AI systems flagging an account and locking out the user. And with no way to contact an actual human, you’re dead in the water along with all your data. Especially if you lose access to your primary email account, which is the key to all your other online accounts.

So, I need a way to back up my newly created cloud data, too. This is getting complicated.

First step: my email. This is easy. Five years ago I set up new email addresses for my personal and business accounts with Fastmail. They’re amazing. I imported my 10+ years’ worth of email from Google (sadly, my pre-2004 college email and personal accounts are lost to the ether), set up a forwarding rule in Gmail, and with the help of 1Password, changed all of my online services to use my new email. It took about a month to switch everything over, but now the only email coming to my old Gmail address is spam. Fastmail keeps redundant backups of my email. And I have full IMAP copies available on multiple computers in case they don’t. And if something ever goes wrong, unlike Google where their advertisers are the customer – and I’m the product – I pay Fastmail every month and can call up a live human to talk to.

Source code. I’m a paying GitHub customer. Everything’s stored and backed up there. But still, what if they screw up? I ran a small, self-hosted server with GitLab on it for a while instead of GitHub and set it to back up all my code nightly to S3. That worked great. But I like GitHub’s UI and feature set better. Plus, it’s one less server I have to manage. So, where do I mirror my code to? (Much of my code is checked out locally on my computer, but not all of it.)

Back in 2006, my boss at the web agency I was working at told me about rsync.net. They provide you with a non-interactive Unix shell account that you can pipe data to over SFTP, rsync, or any other standard Unix tool. You pay by the GB per month, and they scale to petabyte sizes for customers who need that. So, I signed up and used them to back up all of my svn (remember svn?) repos. With the rise of git and the switch to GitHub, I cancelled my account and mostly forgot about them.

But, aha!, I now have new data storage problems. Rsync.net could be a great solution again. So, I re-signed up and set up my primary web server to mirror all of my GitHub repos over to them each night. Here’s the script I’m using…
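Well, a simplified sketch of it, anyway. The GitHub username, personal access token, and destination below are placeholders:

#!/bin/bash
# Nightly mirror of every GitHub repo (public and private) over to rsync.net.
# GITHUB_USER, GITHUB_TOKEN, and the rsync.net destination are placeholders.
GITHUB_USER="your-github-username"
GITHUB_TOKEN="your-personal-access-token"
MIRROR_DIR="$HOME/github-mirrors"
mkdir -p "$MIRROR_DIR"
# Ask the GitHub API for the SSH clone URL of every repo on the account.
curl -s -u "$GITHUB_USER:$GITHUB_TOKEN" "https://api.github.com/user/repos?per_page=100" \
  | grep -o '"ssh_url": *"[^"]*"' | cut -d'"' -f4 \
  | while read -r repo; do
      name=$(basename "$repo" .git)
      if [ -d "$MIRROR_DIR/$name.git" ]; then
        git -C "$MIRROR_DIR/$name.git" remote update --prune   # refresh existing mirror
      else
        git clone --mirror "$repo" "$MIRROR_DIR/$name.git"     # first-time bare clone
      fi
    done
# Ship the whole mirror directory off to rsync.net.
rsync -az --delete "$MIRROR_DIR/" user@server.com:github-mirrors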

Next up, important documents. Traditionally, I’ve kept everything that would normally go in my Mac’s “Documents” folder in my Dropbox account. That worked great for a long time. But once I started paying Google for extra storage space for Google Photos (more on that later), it felt silly to keep paying Dropbox as well. So, after 10+ years as a paid subscriber, I downgraded to a free account and moved everything into Google Drive. Sure, it’s not as nice as Dropbox, but it works and saves me $10 a month.

Like I said above, I mostly trust Google, but not entirely. So, let’s sync my Google Drive’s contents to rsync.net, too. Edit your Mac’s crontab to add this line…

30 * * * * /usr/bin/rsync -avz /Users/thall/Google\ Drive/ user@server.com:google-drive

Also, all of the really important paperwork that would normally live in a fire safe in my garage is kept in a DEVONthink library so I can search the contents of my PDFs. It’s synced automatically with iCloud and available across my mobile devices. But still, better back that up, too.

45 * * * * /usr/bin/rsync -avz /Users/thall/FireSafe.dtBase2 user@server.com:

So, that’s all of my data except for the big one – my family’s photo and home video archives.

For a long time I kept all my family’s archives in Dropbox. I even made an iOS app dedicated to browsing your library. I could have stuck everything in Apple’s Photos.app where it’s available on my devices via iCloud, but that’s tied to my Apple ID. My wife wouldn’t be able to see those photos. Plus, any photos she took on her phone would get stored in her iCloud account and not synced with the main family archive. So, we used the Dropbox app, signed in to my account, to back up our phones’ photos.

But, like I said earlier, our photo and video library became too big to comfortably fit in Dropbox. Plus, Google Photos had just been released and it was amazing. Do I like the thought of Google’s AI robots churning through my photos and possibly using that data to sell me advertisements? No. But their machine-learning expertise and big-data solutions make it really hard to resist. So, I spent a week and moved everything out of Dropbox into Google Photos.

Now everything is sorted into albums, by date, and searchable on any device. I can literally type into their search box “all photos of my wife’s grandmother taken in front of the Golden Gate bridge” and Google returns exactly what I’m looking for. It’s wonderful.

My wife’s phone has the Google Photos app installed with my account on it so every photo she takes gets stored in a shared account we can both access and view on all our devices.

But what’s the recurring theme of this blog post? That’s right. I don’t fully trust any cloud provider to be the only source of my data. Someone clever said “the cloud is just someone else’s computer.” That’s exactly correct. If your data isn’t in at least two different places, it’s not really backed up.

But how do I back up my 500GB+ of photos that are already in Google’s cloud? And how do I keep newly added items synced as well?

As usual, I tried to find a way to make it work with rsync.net. I found a great open-source project called rclone. It’s a command line tool that shuffles your files between cloud providers or any SFTP server with lots of configurable options and granularity.

First off, even if rclone does do what I need, I can’t just run it on my Mac. My internet is too slow for the initial backup. I need to use it on one of my servers so I have a fast data center to data center connection between Google and rsync.net.

Getting it set up on one of my Ubuntu servers at Linode was a simple bash one-liner. Configuring it to then work with my Google and rsync.net accounts was just a matter of running their easy-to-use configuration wizard.
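For reference, the whole thing amounts to something like this (a sketch; rclone’s one-line installer may have changed since):

curl https://rclone.org/install.sh | sudo bash
rclone config   # interactive wizard: add a Google Drive remote and an SFTP remote for rsync.net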

Note: rclone doesn’t support a connection to Google Photos. Instead, you need to log in to Google Drive on the web and enable the “Automatically put your Google Photos into a folder in My Drive” option in Settings. (And also tell your Google Backup & Sync Mac app not to sync that folder locally – unless you have the space available – I don’t.) Then, rclone can access your Google Photos data via a special folder in your Drive account.

With everything configured, I ran a few connection tests and it all worked as expected. So, I naively ran this command thinking it would sync everything if I let it run long enough:

rclone copy -P "GoogleDrive:Google Photos" rsync:GooglePhotos

Things started out fine, but thanks to Google API rate limits the transfer was quickly throttled to 300KB/sec. That would have taken MONTHS to transfer my data. And the connection entirely stalled out after about an hour. I even configured rclone to use my own, private Google OAuth keys, but with the same result. So, I needed a better way to do the initial import.

Google offers their Takeout service. It lets you download an archive of ALL your data from any of their services. I requested an archive of my Google Photos account and eight hours later they emailed me to let me know it was ready. Click the email link to their website, boom. Ten 50GB .tgz files. Now what to do with them?

I can’t download them to my Mac and re-upload them – that’s too slow. Instead, I’ll just grab the download URLs and use curl on my server to get them, extract them, and sync them over.

I don’t have enough room on my primary web server – plus I don’t want to saturate my traffic for any customers visiting my website. So, spin up a new Linode, attach a 500GB network volume, and we’re in business. Right? Nope.

The download links are protected behind my Google account (that’s great!) so I need a web browser to authenticate. Back on my Mac, fire up Charles Proxy and begin the downloads in Safari. Once they start, cancel them. Go to Charles, find the final GET connection, and right-click to copy the request as a curl command including all of the authentication headers and cookies. Paste that command into my server’s Terminal window and watch my 500GB archive download at 150MB(!!)/sec.
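The pasted command ends up looking roughly like this (URL, cookies, and headers redacted here; Charles fills in the real values for you):

curl 'https://REDACTED-takeout-download-url' \
  -H 'Cookie: REDACTED-google-session-cookies' \
  -H 'User-Agent: REDACTED' \
  -o takeout-001.tgz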

(Turns out, extracting all of those huge .tgz files took longer than actually downloading them.)

Finally, rsync everything over to my backup server.
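That last hop is just one more rsync (the local path here is an example):

rsync -az --progress /mnt/takeout/GooglePhotos/ user@server.com:GooglePhotos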

And that’s where I am right now. Waiting on 500GB worth of photos and videos to stream across the internet from Linode in Atlanta to rsync.net in Denver. It looks like I have about six more hours to go. Once that’s done, the initial seed of my Google Photos backup will be complete. Next, I need a way to back up anything that gets added in the future.

Between the two of us, my wife and I take about 5 to 10 photos a day. Mostly of our kids. Holidays and special events may produce a bunch more at once, but that’s sporadic. All I need to do is sync the last 24 hours worth of new data once every night.

rclone is the perfect tool for this job. It supports a "--max-age=24h" option that will only grab the latest items, so it will comfortably fit within Google’s API rate limits. Once again, set up a cron job on my server like so:

0 0 * * * rclone copy --max-age=24h "GoogleDrive:Google Photos" rsync:GooglePhotos

And, that’s it. I think I’m done. Really, this time.

All of my important data – backed up to multiple storage providers – and available on all of my and my family’s devices. At least until the whole situation changes yet again.

A few more notes:

All of my web server configuration files are stored in git. As are all of my websites’ actual files. But, I still run an hourly cron job to back up all of “/var/www” and “/etc/apache2/sites-available” to rsync.net since it’s actually such a small amount of data. This lets me run one command to re-sync everything in the event I need to move to a new server, without having to clone a ton of individual git repos. (I know I need to learn a better devops technique with reproducible deployments like Ansible, Puppet, or whatever the cool tech is these days. But everything I do is just a standard LAMP stack (no containers, only one or two actual servers), so spinning up a new machine is really just a click in the Linode control panel, a couple of apt-get commands, and dropping my PHP files into a directory.)
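The cron entries for that are nothing fancy (a sketch, pointed at the same rsync.net account as above):

0 * * * * /usr/bin/rsync -az /var/www/ user@server.com:var-www
5 * * * * /usr/bin/rsync -az /etc/apache2/sites-available/ user@server.com:apache-sites-available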

My databases are mysqldump’d every hour, versioned, and archived in S3.
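The job behind that is a tiny script along these lines (the database name, backup path, and S3 bucket are placeholders; MySQL credentials live in ~/.my.cnf):

#!/bin/bash
# Hourly MySQL dump, versioned by timestamp, pushed to S3 with s3cmd.
STAMP=$(date +%Y%m%d-%H00)
FILE="/backups/mydatabase-$STAMP.sql.gz"
mysqldump --single-transaction mydatabase | gzip > "$FILE"
s3cmd put "$FILE" s3://my-backup-bucket/mysql/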

All of the source code on my Mac is checked out into a single parent directory in my home folder. It gets rsync’d offsite every hour, just in case. Think of it as a poor man’s Time Machine in case git fails me.

I do a lot of work in The Omni Group’s apps – OmniFocus, OmniOutliner, and OmniGraffle. All of those documents are stored in their free WebDAV sync service and mirrored on my Mac and mobile devices.

All of my music purchases have gone through iTunes since that store debuted however many years ago. I can always re-download my purchases (probably?). Non-iTunes music ripped from CDs long ago, and my huge collection of live music, is stored in iTunes Match for a yearly fee. A few years ago when I made the switch to streaming music services and mostly stopped buying new albums, I archived all of my mp3s in Amazon S3 as a backup. I need to set a reminder to upload any new music I’ve acquired as a recurring task once a year or so.

Also, I have Backblaze running on my desktop and laptop doing its thing. So yeah. I guess that’s yet another layer of redundancy.

A Simple, Open-Source URL Shortener

tl;dr One evening last week, I built pretty much the simplest URL shortening service possible. It’s simple, fast, opinionated, keeps track of click-thru stats, and does everything I need. It’s all self-contained in a single PHP script (and .htaccess file). No dependencies, no frameworks to install, etc. Just upload the file to your web server and you’re done. Maybe you’ll find it useful, too.

Anyway…

I run a small software company which sells macOS and iOS software. Part of my day-to-day in running the business is replying to customer support questions – over email and, sometimes, SMS/chat. I often need to reply to my customers with long URLs to support documents or supply them with custom-URL-scheme links which they can click on to deep-link them into a specific area of an app.

Long and non-standard URLs can often break once sent to a customer or subsequently forwarded around. I’ve used traditional link shortening services before (like bit.ly, etc), but always worried about my URLs expiring or breaking if the 3rd party shortening service goes out of business or makes a system change. Even if I upgraded to a paid plan which supports using a custom domain name that I own, I’m still not fully in control of my data.

So, I looked around for open-source URL shortening projects which I could install on my own web server and bend to my will. I found quite a few, but most were either outdated or overly-complex with tons of dependencies on various web frameworks, libraries, etc. I wanted something that would play nicely with a standard LAMP stack so I could drop it onto one of my web servers without having to boot up an entirely new VPS just to avoid port 80/443 conflicts with Apache. Out of the question was anything requiring a dumb, container-based (I see you, Docker) solution just to get started. Nice-to-haves would be offering basic click-thru statistics and an easy way to script the service into my existing business tools and workflows.

Admittedly, I only spent about an hour looking around, but I didn’t find anything that met my needs. So, I spent an evening hacking together this project to do exactly what I wanted, in the simplest way possible, and without any significant dependencies. The result is a branded URL shortening service I can use with my customers that’s simple to use and also integrates with my company’s existing support tools (because of its URL-based API and (optional) JSON responses – see below).

Requirements

  • Apache2 with mod_rewrite enabled
  • PHP 5.4+ or 7+
  • A recent version of MySQL

Install

  1. Clone this repo into the top-level directory of your website on a PHP enabled Apache2 server.
  2. Import database.sql into a MySQL database.
  3. Edit the database settings at the top of index.php. You may also edit additional settings such as the length of the generated short URL, the allowed characters in the short URL, or set a password to prevent anyone from creating links or viewing statistics about links.

Note: This project relies on the mod_rewrite rules contained in the .htaccess file. Some web servers (on shared web hosts for example) may not always process .htaccess files by default. If you’re getting 404 errors when trying to use the service, this is probably why. You’ll need to contact your server administrator to enable .htaccess files. Here’s more information about the topic if you’re technically inclined.

Creating a New Short Link

To create a new short link, just append the full URL you want to shorten to the end of the domain name you installed this project onto. For example, if your shortening service is hosted at https://example.com and you want to shorten the URL https://some-website.com, go to https://example.com/https://some-website.com. If all goes well, a plain-text shortened URL will be displayed. Visiting that shortened URL will redirect you to the original URL.
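Or, from the command line, a shortened link is just one curl away:

curl "https://example.com/https://some-website.com"   # prints the shortened URL as plain text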

Possibly of interest to app developers like myself: The shortening service also supports URLs of any scheme – not just HTTP and HTTPS. This means you can shorten URLs like app://whatever, where app:// is the URL scheme belonging to your mobile/desktop software. This is useful for deep-linking customers directly into your app.

iOS Users: If you have Apple’s Shortcuts.app installed on your device, you can click this link to import a ready-made shortcut that will let you automatically shorten the URL on your iOS clipboard and replace it with the generated short link.

Viewing Click-Thru Statistics

All visits to your shortened links are tracked. No personally identifiable user information is logged, however. You can view a summary of your recent link activity by going to /stats/ on the domain hosting your link shortener.

You can click the “View Stats” link to view more detailed statistics about a specific short link.

Password Protecting Link Creation

If you don’t want to leave your shortening service wide-open for anyone to create a new link, you can optionally set a password by assigning a value to the $pw_create variable at the top of index.php. You will then need to pass in that password as part of the URL when creating a new link like so:

Create link with no password set: http://example.com/http://domain.com

Create link with password set: http://example.com/your-password/http://domain.com

Password Protecting Stats

Your stats pages can also be password protected. Just set the $pw_stats variable at the top of the index.php file.

Viewing stats with no password set: http://example.com/stats

Viewing stats with password set: http://example.com/stats/your-password

A Kinda-Sorta JSON API

This project aims to be as simple to use as possible by making all commands and interactions go through a simple URL-based API which returns plain text or HTML. However, if you’re looking to run a script against the shortening service, you can do so. Just pass along Accept: application/json in your HTTP headers and the service will return all of its output as JSON data – including the stats pages.
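For example, a script could create a link and pull stats like this (assuming no passwords have been set):

curl -H "Accept: application/json" "https://example.com/https://some-website.com"
curl -H "Accept: application/json" "https://example.com/stats"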

Contributions / Pull Requests / Bug Reports

Bug fixes, new features, and improvements are welcome from anyone. Feel free to open an issue or submit a pull request.

I consider the current state of the project to be feature-complete for my needs and am not looking to add additional features with heavy dependencies or that complicate the simple install process. That said, I’m more than happy to look at any new features or changes you think would make the project better. Feel free to get in touch.

Rockwell – Sort of like a private Foursquare meets Fire Eagle

Back in 2008, when I worked for Yahoo!, I had the good fortune of chatting with Tom Coates a few times about the now defunct Fire Eagle location brokerage service. Fire Eagle was my absolute favorite product to come out of Yahoo! during my time there. I’ve always been fascinated by real-time location data and sad that Fire Eagle’s intersection of privacy and ease of use never caught on.

Anyway, fast-forward to last summer, I was bummed that although so much of my life is documented through photos, tweets, and journal entries, my location data was missing. A few products tried to solve this problem post-Fire Eagle. Google Latitude (née Dodgeball) gave you a way to seamlessly track your location and view your history, but they’ve since sunsetted that product. And, besides, it was a little creepy having Google track your every step. (Which I realize they still are via Google Now, but that’s another conversation.) There were also other apps like Rove and Foursquare, but none of them offered quite the feature set I was looking for. In 2010, I even went so far as to reverse engineer Apple’s Find My iPhone API so I could remotely query my phone’s location. That worked great, but doing so on a regular basis killed my battery life. But, with iOS 7’s advances in background location reporting, I knew there had to be a better way. I wanted something that would automatically track my location as precisely as possible, respect my phone’s battery life, keep that data private by default, yet still offer the ability to granularly share my location as I saw fit.

So I did what I always seem to do. I built my own app. It’s called Rockwell, and it’s available on GitHub.

Rockwell consists of two components. An iPhone app that monitors your location and allows you to annotate where you are using Foursquare’s database of named locations. And a PHP/MySQL web service you can install on your own server that the app talks to.

As you go about your day, the iPhone app uses iOS’ significant location change reporting feature to ping the web app with your location. The web app stores your location history and allows you to go back and search your history either by date or by location.

Further, since the website knows your most recent (current) location, you’re able to share that with others. In my opinion, one of the reasons Fire Eagle failed was it was (by design) very difficult to get location data out of the service. You had to go through an intricate OAuth dance each time.

With Rockwell, you simply choose the location level you’re comfortable sharing – either precise lat/lng coordinates, current city, current state, etc – and a format – plain text or JSON – and Rockwell generates a short url you can pass around. You can use that URL to embed your plain text location on your blog, or you can use the JSON version to do something more API-ish with it. There’s no OAuth shenanigans to deal with. You can have as many short geo-links as you want, and each one can be revoked at any time.

One more thing I’d like to explain. The iPhone app reports two types of check-in data to the web service. The first kind is dumb. Just your latitude, longitude, and timestamp. Many apps like Rove and Foursquare use this data to try and generate automatic location check-ins. Based on your past history, your friends’ location, and your location, they try and guess where you might actually be at a given time. Doing this well is the holy grail of location reporting. The problem is that I’ve yet to see any service get it right. In a dense urban area, with hundreds if not thousands of locations per square mile, there’s just no reliable way to figure out where you really are with precision. (At least not without some serious machine learning, which I assume Foursquare is working on.) Rockwell dances around this problem by allowing you to augment your dumb check-ins with annotated ones. Just launch the app, tap “Check-in”, and Rockwell pulls a list of nearby locations from Foursquare. Just tap on one and you’re done. It’s saved to the web service alongside the rest of your location history.

Rockwell is working great for me currently. All the basics work. The majority of the remaining work is around coming up with nice ways to show your location history in a useful way. The code is available on GitHub. I’d love it if you gave it a try, sent feedback, and maybe even a pull request or two.

Switching from GitHub to GitLab

I’ve been a happy paying customer of GitHub since early 2009. But yesterday, for a few different reasons, I deleted all of my private repositories and moved them over to a self-hosted installation of GitLab. I didn’t make that decision lightly, as I’ve been very happy with GitHub for the last five years, but here’s why…

First, I’ve started working on a new Mac app. Every time I start a new project, unless it’s open source, I create a new private repo for it on GitHub. This project happened to be my 21st private repository on GitHub. If you’re familiar with their pricing structure, you’ll know they charge based on how many private projects you have. $22 a month will get you twenty repos. But as soon as you create that twenty-first one, you graduate onto the $50 a month plan. Maybe if I were actually hosting 50 repositories with GitHub I’d be willing to pay that much, but for the foreseeable future I’m going to be in the low twenties, and $50 a month is just too much. It’s a shame they don’t just outright charge you a dollar per month per project.

The second reason is an issue I’ve been mulling over for quite a while. I love the cloud. I love having my data in the cloud. But some of it is so precious, in this case my code, that I want to know exactly how it’s being taken care of and looked after. While I have no reason to doubt GitHub has plenty of backups in place, I have no way of really knowing for sure how safe my code is. Hosting it myself has its inherent risks, too, but at least I can have full ownership of my data and be certain of the backup strategies in place. This also dovetails nicely with the pleasure nerds like myself get in doing a job themselves. Whether that’s hosting your own email (which I’m not crazy enough to do), managing your own web server (yes, please), or automating your own digital backups, there’s a sick pleasure to be had in doing a job yourself and doing it well.

A final reason for switching away from GitHub was the uneasy feeling I got watching the story of Julie Ann Horvath unfold last week. I didn’t like the idea of my money going to a company that seemed so fundamentally broken. Since then, GitHub has taken forceful, actionable steps to correct the issue, but it still worried me.

So those are my three and a half reasons for moving my private repos away from GitHub. If you agree with me, or if you have your own reasons for wanting to move away, what follows is a brain dump of the steps I took towards getting moved over and situated happily on a GitLab installation.

First off, if you’ve never heard of GitLab, go take a look through their website. It’s a Rails app that is almost comical in how closely it copies the look, feel, and functionality of GitHub. Everything from the activity timeline, to pull requests, to user and team access roles, to issue tracking, to shareable git-backed gists. It’s all very nicely implemented. Many open source projects start off strong and can later falter when the creators get bored. But I feel fairly confident in GitLab as their community open source version is based on an enterprise product they sell and support. Quite a few businesses are using GitLab as a GitHub replacement in situations where their code needs to remain on site.

So, where are we going to host it? My initial thought was to boot up a new virtual server with Rackspace, which is where I host all of my business servers. Rackspace is great. A little expensive, but the customer support makes up for it. Their minimum price for a 512MB server, which is all we’ll need, is around $10 a month. I was just about to create the server when I decided to finally take a look at DigitalOcean. They’re the new hotness in cloud hosting and have a reputation for being extremely inexpensive. (Bonus points: they offer two-factor authentication on their user accounts, which is something Rackspace still lacks.) Poking around, I found I could get a comparable 512MB server with DigitalOcean for a flat $5 a month. But what really sealed the deal is they offer one-click installs of various server apps – WordPress, etc. I wasn’t looking forward to the fairly intensive setup that GitLab requires, but amazingly, GitLab is one of DigitalOcean’s one-click installs.

True to their word, I had a ready-to-go GitLab server up and running in less than a minute after clicking the “create” button. All that remained was fine tuning everything to my needs.

The first step upon getting a new cloud server is to secure it. I always follow the steps outlined in this guide. It does a good job of locking everything down and only takes about five minutes to follow.

Of note, when you get to the section about enabling ufw (the firewall), DigitalOcean boxes don’t come with everything you need installed. I had to run the following command before setting up ufw…

sudo apt-get install linux-image-$(uname -r)

Another note, and this is just personal preference, I also modify my ssh port to be something non-standard. That can be changed in…

/etc/ssh/sshd_config

Also, while the user facing side of GitLab is great, I have no idea how security conscious they are. I’d hate for an unpatched security hole in their web app to expose any of my private code. One way to mitigate that chance is to lock down web traffic to the specific IP addresses you’ll be accessing it from. Your home, your office, etc. With ufw it’s just a quick…

sudo ufw allow from your-ip-address to any port 80

for each of your IPs.

Once you’ve gotten the security taken care of, you can move on to configuring GitLab. Most of the hard work is already done for you by DigitalOcean. You’ll just need to fill in the appropriate values in…

/home/git/gitlab-shell/config.yml

and

/home/git/gitlab/config/gitlab.yml

Then restart GitLab with…

sudo service gitlab restart

With all that done, the next step is moving your repositories from GitHub to GitLab. (I’m sure there is a better direct git-to-git way of doing what follows, but this was the simplest solution for my needs.) For each of your repos, do a clean mirror to your Desktop to make sure you’ve got everything.

git clone --mirror git@github.com:username/repo-name.git

Then, cd into the repo directory and…

git remote add gitlab ssh://git@servername.com:22/username/repo.git
git push -f --tags gitlab refs/heads/*:refs/heads/*

That final git push with all the refs will push every branch and all of your tags, making sure nothing is left behind.

Once done, you can safely delete your repo from GitHub.

The last step is making sure you have rolling backups of your GitLab installation and repositories in place. I looked into piecing together my own backup script until I realized GitLab already has a rake backup task available that stores everything into a single tar file. Perfect. I can then just upload that to S3 for safekeeping. To do that, we’ll be using s3cmd to handle the uploads.

sudo apt-get install s3cmd

Configure it with…

s3cmd --configure

Then, create a script in your git user’s home directory called backup.sh containing…

cd /home/git/gitlab && PATH=/usr/local/bin:/usr/bin:/bin bundle exec rake gitlab:backup:create RAILS_ENV=production
s3cmd put tmp/backups/`ls tmp/backups/ | grep -i -E '\.tar$' | tail -1` s3://bucket-name/git/

Set up cron to run that script once a day and you’re good.
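For example, in the git user’s crontab (the hour and log file are up to you):

0 3 * * * /bin/bash /home/git/backup.sh >> /home/git/backup.log 2>&1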

PebbleCam

My Pebble arrived last week and I’ve been geeking out over it ever since. I’ve been thinking a lot about wearable tech the last few years and signed up immediately when Pebble was first announced last year. (I can’t wait to see what Apple can do in this space.)

So with a full week of Pebble use under my belt, I decided it was time to do something super geeky with my new smart watch. PebbleCam is the result of a few hours this afternoon tinkering around in Xcode.

In a nutshell, PebbleCam is an iPhone app that lets you use your Pebble as a remote shutter for the phone’s camera. You launch the app and it displays the camera. Prop the phone up, put it on a tripod, whatever, then get you and your friends into frame. As long as you’re in Bluetooth range, clicking the “play/pause” button on your Pebble will snap a photo and save it to your phone’s photo library.

How does it work?

Currently, there’s no way to communicate from Pebble back to the phone except for the music control buttons. To take advantage of that, the app plays a blank MP3 file in the background and then listens for any remote control events (play, pause, next, previous) to come in via the Pebble. When a play/pause event occurs, the app snaps a photo and saves it to your phone’s photo library.

The code is fairly straightforward and is available on GitHub. Anyone in the iOS developer program can download the code and install the app on their phone. And for you jailbreakers out there, I’ve committed an .ipa file you can download.

In the next update, I plan on assigning the next track button to change the camera from rear-facing to front-facing. That leaves one button left (previous track) to play with. Any ideas on what I could assign it to?

If there is enough interest, I’m not opposed to submitting the app to Apple for inclusion in the App Store. (I don’t see any reason why it wouldn’t be accepted.) However, before I do that, I’d need someone to create an icon for the app. Right now I’m using the official Pebble iOS app icon with a camera photoshopped on top.

Here’s a video of the app in action.

Automatically Compressing Your Amazon S3 Images Using Yahoo!’s Smush.it Service

I’m totally obsessed with web site performance. It’s one of those nerd niches that really appeal to me. I’ve blogged a few times previously on the topic. Two years ago (has it really been that long?), I talked about my experiences rebuilding this site following the best practices of YSlow. A few days later I went into detail about how to host and optimize your static content using Amazon S3 as a content delivery network. Later, I took all the techniques I had learned and automated them with a command line tool called s3up. It’s the easiest way to intelligently store your static content in Amazon’s cloud. It sets all the appropriate headers, gzips your data when possible, and even runs your images through Yahoo!’s Smush.it service.

Today I’m pleased to release another part of my deployment tool chain called Autosmush. Think of it as a reverse s3up. Instead of taking local images, smushing them, and then uploading to Amazon, Autosmush scans your S3 bucket, runs each file through Smush.it, and replaces your images with their compressed versions.

This might sound a little bizarre (useless?) at first, but it has done wonders for my workflow and that of one of my freelance clients. This particular client runs a network of very image-heavy sites. Compressing their images has a huge impact on their page load speed and bandwidth costs. The majority of their content comes from a small army of freelance bloggers who submit images along with their posts via WordPress, which then stores them in S3. It would be great if the writers had the technical know-how to optimize their images beforehand, but that’s not reasonable. To fix this, Autosmush scans all the content in their S3 account every night, looking for new, un-smushed images, and compresses them.

Autosmush also allowed me to compress the huge backlog of existing images in my Amazon account that I had uploaded prior to using Smush.it.

If you’re interested in giving Autosmush a try, the full source is available on GitHub. You can even run it in a dry-run mode if you’d just like to see a summary of the space you could be saving.

Also, for those of you with giant S3 image libraries, I should point out that Autosmush appends an x-amz-smushed HTTP header to every image it compresses (or to images that can’t be compressed further). This lets the script scan extremely quickly through your files, only sending new images to Smush.it and skipping ones it has already processed.

Head on over to the GitHub project page and give Autosmush a try. And please do send in your feedback.

Open Source Updates

On this lazy Sunday afternoon I thought I’d take the opportunity to mention a few open source projects I’ve recently updated. GitHub makes sharing code so ridiculously easy, it’s a shame not to call attention to it occasionally in case other people might find something useful.

Sosumi 2.0

First up is Sosumi 2.0. Last year, when Apple launched the Find My iPhone component of MobileMe, I immediately saw an opportunity to grab persistent location information from my phone — without background processing. Although Apple didn’t supply an API for this information, it turned out to be easy enough to scrape their site and wrap it up nicely into a PHP class. Nat Friedman even used it as a way to automatically update his Google Latitude position in his playnice project, and I built a similar script for Yahoo!’s Fire Eagle service. It all worked well enough, but it was slow and prone to breaking whenever Apple updated me.com.

Fast forward to last week, Apple released an official Find My iPhone client for iPhone and iPad. The mere fact that they released this means there had to be a hidden “official” API somewhere. After a few hours messing around in Wireshark I found their API endpoint and re-wrote Sosumi to talk to their API just like the client app. The result? Dramatically faster location updates (10x) and a solid script that’s immune to changes on MobileMe’s website.

This new version of Sosumi is available on GitHub and extremely easy to use:

<?php
include 'class.sosumi.php';
// Log in with your MobileMe credentials and ask for the device's current position.
$ssm = new Sosumi('your-username', 'your-password');
$location = $ssm->locate();

That’s it. $location will be an array populated with your phone’s latitude, longitude, and a few other useful data points. What you do with this information is up to you!

PHP HTML Compressor

Like the name says, this project is a small PHP class that accepts an HTML document and shrinks its file size by removing unnecessary whitespace and blank lines. It takes care not to touch fragile areas like <pre> blocks. The result is HTML that renders exactly the same in the browser but (in my testing) can be up to 15% smaller. In today’s increasingly mobile world, every byte over the wire counts — and this is a simple way to speed up your page load times.

The compressor can be used in three ways:

  1. Pass it a string containing HTML and it’ll return the minified version.
  2. As a full-fledged command-line utility. Pass it a filename or pipe content to it via stdin and it will send the minified version back over stdout (see the sketch after this list). This is super useful for adding automatic compression into your deploy/build scripts.
  3. Or as a WordPress plugin that automatically minifies all of your posts and pages. Combine it with wp-super-cache and you’re well on your way to a speedy site — even on a shared host.
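For what it’s worth, the command-line usage from item 2 looks something like this (the script name here is a stand-in; check the repo for the actual entry point):

cat index.html | php html-compressor.php > index.min.html   # HTML in via stdin, minified HTML out via stdout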

For an example of the type of HTML the compressor produces, just take a look at the HTML source of this site. Every page is piped through the compressor before being saved as a static file on my server.

Google Search Shell

My google-shell project is another small command line utility. It’s a simple interface to Google’s search results that talks to their AJAX Search API. It lets you easily pull down the top results for any query — including the result’s URL, title, and a brief abstract from the page. It has quite a few options that allow you to customize the output to be either human readable or digestible by other scripts. For example, here’s an ugly, ugly shell command that shows off the power of what having Google at your fingertips can do:

URL=`gshell -fun1 "imdb american beauty"`; curl $URL | \
sed -n 's/.*\([0-9]\.[0-9]\)\/10.*/\1/p' | head -n1

In case you don’t speak nerd, that tells google-shell to return only the URL of the first result for the query “imdb american beauty”. In other words, the same thing as Google’s “I’m Feeling Lucky” option. It then takes that URL, downloads it, and pipes it through a messy sed and head command that extracts the IMDB rating for American Beauty. Granted, that’s quite a lot to type — especially considering you could open a web browser and google it yourself much faster. However, if you were to add that long command as an alias in your Bash profile, you could very quickly write a command like

imdb "american beauty"

That would instantly return the rating of whichever movie you specify. Nerdy, but cool, right?
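Here’s roughly what that alias could look like as a small bash function (a sketch built from the pipeline above):

imdb() {
  # "I'm Feeling Lucky" via google-shell, then scrape the rating out of the IMDB page.
  local url=$(gshell -fun1 "imdb $*")
  curl -s "$url" | sed -n 's/.*\([0-9]\.[0-9]\)\/10.*/\1/p' | head -n1
}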

Anyway

As always, the three projects above and all my open source code are available on GitHub. Hopefully you’ll find something useful. If you do, I’d love to hear about it — and I always welcome bug fixes and other contributions.

OpenFeedback Part Deux

A year and a half ago I wrote about OpenFeedback, an open source Cocoa framework for gathering feedback from your users. Initially, it was a sister project to Appcaster, my indie dashboard web app. Since then, Appcaster has grown up and morphed into Shine, but OpenFeedback remained unchanged. Tonight, though, I took a few hours off from Highwire and rewrote OpenFeedback from scratch.

The rewrite wasn’t strictly necessary, but it certainly didn’t hurt. The original code was hurried and in poor shape. I was able to cut the amount of code by 30% and give the dialog a more modern looking tab view.

Like before, adding OpenFeedback to your application is trivial — there’s no code required. You simply link your app against the framework and hook up the appropriate actions in Interface Builder. In under five minutes you can have an elegant way to encourage users to ask questions, submit bug reports, and suggest new features.

My long term goal for OpenFeedback has always been for the Mac developer community to rally behind it, making it a drop-in standard much like Sparkle. That hasn’t happened yet (obviously), but Shine has been getting some good attention lately. If I’m lucky, maybe some of that goodwill will carry over and help kickstart things along.

Like the rest of my open source projects, OpenFeedback is MIT licensed and available on GitHub.

Shine – An Indie Mac Dashboard

Two years ago, shortly after I released VirtualHostX 1.0, I wrote about Appcaster – a web dashboard for Mac developers I built that manages my application updates, payment processing, etc. With the release of VHX 2.0 and Incoming!, I decided it was time to rewrite Appcaster as the original code was hurried and hastily patched over the last few years.

Today I’m happy to officially announce Shine, a revamped version of Appcaster re-written from the ground up. The goal of Shine (more on the name in a bit) was to provide a clean, easy-to-use dashboard for indie Mac developers and also to build a stable foundation for future improvements down the road.

I chose the name Shine because, at its heart, it’s a complementary product to Andy Matuschak’s Sparkle project. (Inevitable tagline: Your app already Sparkles, now make it Shine.) The core functionality, like Appcaster before it, is to automatically generate appcast feeds for your product updates. But it does a whole lot more, too.

Shine Screenshot

Shine handles order processing using PayPal’s IPN service. That includes generating the license information (using either Aquatic Prime or your own, custom scheme), emailing it to the user, and managing the database of orders. It also computes aggregate stats based on your users’ Sparkle update requests, collects user feedback (bug reports, feature requests, questions), and automatically stores your application updates in Amazon S3.

In short, Shine manages my entire Indie Mac developer workflow.

The code is based on two of my other open source projects: the Simple PHP Framework and YUI App Theme. SPF provides a clean, lightweight, active record pattern to model the data, and yui-app-theme is an admin area CSS template built on top of the YUI Grids framework. Combining these two projects let me build Shine in record time (about 24 working hours).

The code for Shine is free to use (MIT license) and available on GitHub. Feel free to email me with any questions or feedback.

(Thanks to Steven Degutis of Thoughtful Tree software for his feedback on this project.)

Sosumi – A MobileMe Scraper

Sosumi is a PHP script that scrapes MobileMe and exposes Apple’s Find My iPhone functionality to the command line or your own web application. This lets you pull your phone’s current location and push messages and alarms to the device.

Like my previous blog post that dealt with AT&T’s Family Map service, my goal was to connect my iPhone with Fire Eagle by Yahoo!. There are a few iPhone Fire Eagle updaters available, but they’re all limited by Apple’s third-party application restrictions. Sosumi gets around those restrictions by running every few minutes on your own server rather than the device itself. In my case, I’ve set up a cron job to run the script every fifteen minutes and push my location to Fire Eagle.
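The cron entry for that is about as simple as it gets (the script path is whatever wrapper you’ve written around the Sosumi class):

*/15 * * * * php /path/to/fire-eagle-updater.php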

Until Apple releases a location API for MobileMe (not likely, and not their job), this will have to do.

Grab the code on GitHub.

Example:

<?php
include 'class.sosumi.php';
// Fetch the device's current location, then push a message to it.
$ssm = new Sosumi('username', 'password');
$location_data = $ssm->locate();
$ssm->sendMessage('Daisy, daisy...');