Let Google Do The Work For You

One of the major challenges in web scraping is figuring out which page to scrape in the first place. Here’s a scenario: say you need to pull some information about the film 30 Days of Night off IMDB. It would be great if you knew the URL in advance, or could at least construct it programmatically. Unfortunately, it’s actually http://www.imdb.com/title/tt0389722/. How can you possibly figure that out?

One solution would be to scrape IMDB’s built-in search feature and from there extract the correct URL. For IMDB, that works, but what about a site that doesn’t have a search feature? Or one that does, but it doesn’t work very well?

I’ve been web scraping for years, and hands down the best solution I’ve come up with is to simply let Google do the work for you.

The trick is to take advantage of their “I’m Feeling Lucky” feature. Clicking that button, instead of the standard “Google Search” button, skips the results page and takes you directly to the first result. If you construct your query properly, it will almost always be the page you’re looking for.

Going back to the IMDB example, if you run an I’m Feeling Lucky search for “site:imdb.com $movie_title”, Google will send you a 302 redirect to the appropriate page within IMDB. Voilà! Not only does this get us where we need to be, but (since we’re relying on Google’s ever-improving search index) it will also adjust for spelling mistakes or even partial movie titles. It’s a great technique for scraping any sort of online “encyclopedia” like IMDB, TVRage, Wikipedia, etc.
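
To make the query format concrete, here is a minimal sketch of how the search string might be assembled. The helper name buildLuckyQuery is my own invention for illustration; it simply joins a site: restriction with whatever title you have on hand.

```php
<?php
// Hypothetical helper (name invented for this example): builds a
// "site:example.com some title" query for an I'm Feeling Lucky search.
function buildLuckyQuery($site, $title)
{
    // Collapse runs of whitespace so a sloppy title still yields a clean query.
    $title = trim(preg_replace('/\s+/', ' ', $title));
    return 'site:' . $site . ' ' . $title;
}

echo buildLuckyQuery('imdb.com', '30 Days of Night'), "\n";
// Prints: site:imdb.com 30 Days of Night
```

The result still needs to be URL-encoded before it goes into the search URL.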

Here’s the code. Pass it a search query and it’ll extract the redirect Google sends back.

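<?php
    // Returns the URL Google redirects to for an I'm Feeling Lucky search,
    // or false if the request fails or no redirect comes back.
    function feelingLucky($q)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, "http://www.google.com/search?hl=en&q=" . urlencode($q) . "&btnI=I%27m+Feeling+Lucky");
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($ch, CURLOPT_HEADER, 1);         // include the response headers in the output
        curl_setopt($ch, CURLOPT_NOBODY, 1);         // headers only; all we need is the redirect
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the output instead of printing it
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1) Gecko/20061024 BonEcho/2.0");
        curl_setopt($ch, CURLOPT_REFERER, "http://www.google.com");
        $head = curl_exec($ch);
        curl_close($ch);

        // curl_exec() returns false on failure.
        if ($head === false) {
            return false;
        }

        // Header names are case-insensitive, so match "Location:" loosely.
        return (preg_match('/^Location:(.*?)$/mi', $head, $matches) == 0) ? false : trim($matches[1]);
    }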
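
Once you have the redirect URL, you usually want to pull something out of it. As a rough sketch, assuming IMDB title URLs keep the /title/tt<digits>/ shape shown earlier, a small helper (the name imdbIdFromUrl is hypothetical) could extract the title ID:

```php
<?php
// Hypothetical helper: extracts the "tt0389722"-style ID from an IMDB title URL.
// Assumes the URL contains /title/tt<digits>; returns false otherwise.
function imdbIdFromUrl($url)
{
    return (preg_match('#/title/(tt\d+)#', $url, $matches) === 1) ? $matches[1] : false;
}

// Using the URL from the example above:
var_dump(imdbIdFromUrl('http://www.imdb.com/title/tt0389722/')); // string(9) "tt0389722"
var_dump(imdbIdFromUrl('http://www.imdb.com/'));                 // bool(false)
```

In practice you would feed it the return value of feelingLucky(), checking for false first, since Google won't always hand back a redirect.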

Sort Apple Mail Messages Using the Keyboard

I don’t know how I ever missed this Apple Mail plugin, but you absolutely have to give MsgFiler a try if you’re a heavy keyboard user. It lets you move messages into any folder in your mailbox using only the keyboard.

Press ⌘9 to pull up a TextMate-like list of your mailbox folders. Then, select a folder by typing the first few letters in its name and move the currently selected message(s) into it with ↩.

You can also jump to the selected folder, rather than moving anything into it, by pressing ⌘O instead of ↩. Awesome.

Introducing Appcaster + OpenFeedback

Today I’m proud to announce the release of two new open source projects: Appcaster and OpenFeedback. I’ve been working on them off and on for over nine months, so I’m very excited to finally see them out the door.

Appcaster, which I’ve written about before, is a web-based dashboard for indie Mac developers. It’s designed to manage payment and order processing and generate license files for your users. It even handles your product’s revision history in Amazon S3 and can produce reports from your users’ demographic info. It also serves as a central location to collect user feedback, bug reports, and support questions.

OpenFeedback is a Cocoa framework written in Objective-C that collects feedback from your users directly within your application. Instead of sending your users to a website or asking them to write an email, OpenFeedback gives them a simple window where they can ask support questions, file bug reports, or suggest new features. Their data is automatically sent to Appcaster for you to review. They never have to leave your application.

Collectively, I’m calling the two projects Appcaster since they’re designed to work closely with one another (and since I wanted them to be part of the same Google Code project). However, OpenFeedback can send data to any server-side script that accepts HTTP POST requests — you can easily integrate it into your existing bug tracker or reporting system.
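
If you want to point OpenFeedback at your own script, the receiving end can be very small. The sketch below is only an illustration: the field names (type, email, message) and the three feedback types are my assumptions for the example, not OpenFeedback's documented parameters, so check what the framework actually posts before relying on them.

```php
<?php
// Hypothetical receiver for feedback submitted via HTTP POST.
// Field names and allowed types are assumptions made for this example.
function parseFeedback(array $post)
{
    $allowedTypes = array('support', 'bug', 'feature');
    $type    = isset($post['type']) ? $post['type'] : '';
    $email   = isset($post['email']) ? trim($post['email']) : '';
    $message = isset($post['message']) ? trim($post['message']) : '';

    // Reject anything without a recognised type or an actual message.
    if (!in_array($type, $allowedTypes, true) || $message === '') {
        return false;
    }
    return array('type' => $type, 'email' => $email, 'message' => $message);
}

// Only act on real POST requests; on the command line this branch is skipped.
if (isset($_SERVER['REQUEST_METHOD']) && $_SERVER['REQUEST_METHOD'] === 'POST') {
    $feedback = parseFeedback($_POST);
    if ($feedback === false) {
        header('HTTP/1.1 400 Bad Request');
    } else {
        // Hand $feedback to your bug tracker or reporting system here.
        echo 'OK';
    }
}
```

Keeping the validation in its own function means the same check can sit in front of whatever bug tracker or reporting system you already use.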

Appcaster

When I first built Appcaster last year, I wrote a detailed overview of the application. Aside from fixing a few bugs and upgrading it to use the latest version of the Simple PHP Framework, the only major additions have been support for OpenFeedback and graphs of user demographic data drawn with the Google Chart API.

Google’s Chart API is such a slick, clever way of doing things that I couldn’t pass up the opportunity to use it in a project. After aggregating your data, you simply craft it into a special URL and use that as the source of an <img> tag. Google parses your data out of the URL and returns a PNG-formatted chart.
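
Here is a rough sketch of what crafting that URL can look like. The parameter names (cht, chs, chd, chl) are the ones I remember from the Chart API documentation, while the endpoint, the chartUrl function name, and the sample numbers are assumptions made purely for illustration; this is not the code Appcaster uses.

```php
<?php
// Sketch: build a pie chart URL from label => value pairs.
// Endpoint and sample data are assumptions for this example.
function chartUrl(array $data, $size = '400x200')
{
    $params = array(
        'cht' => 'p',                                       // chart type: pie
        'chs' => $size,                                     // width x height in pixels
        'chd' => 't:' . implode(',', array_values($data)),  // values, plain text encoding
        'chl' => implode('|', array_keys($data)),           // one label per value
    );
    // Pass the separator explicitly so php.ini settings can't change the output.
    return 'http://chart.apis.google.com/chart?' . http_build_query($params, '', '&');
}

// Made-up numbers, just to show the shape of the result.
$url = chartUrl(array('Intel' => 80, 'PowerPC' => 20));
echo '<img src="' . htmlspecialchars($url) . '" alt="Users by architecture">', "\n";
```

Because the whole chart lives in the URL, there is nothing to store or cache on your side unless you want to.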

It took all of half an hour to create some basic stats from the Sparkle update data Appcaster collects. It would be trivial for other developers to add their own custom reports in the future.

OpenFeedback

The idea for OpenFeedback came from Cultured Code’s task management application Things. Clicking on the Support menu item in their Help menu brings up a dialog where you can submit questions and feedback from inside the app, with no need to visit a website or open an email.

I thought their implementation was a great idea and emailed them to ask if I could recreate that functionality as a Cocoa framework for other developers to use. They were nice enough to say yes 🙂

Adding OpenFeedback to your application is trivial. Like Sparkle, there’s no code required. You simply link your app against the framework and hook up the appropriate actions in Interface Builder. In under five minutes you can have an elegant way to encourage users to provide feedback.

My long-term goal is to see the Mac developer community standardize around OpenFeedback much like they have around Sparkle. Not only would it save time for developers, but it would provide users with a consistent interface for submitting feedback. That should help improve the dialogue between developers and our users, and with it Mac software all around.

Speaking of Feedback

Your feedback is always welcome. This is my first open source Cocoa project, so I’m very much flying by the seat of my pants. Suggestions, improvements, bug reports — they’re all welcome. You can send them directly to me or file an issue in our bug tracker.