Scraping IMDB With PHP

May 21, 2008

For an upcoming project, I need to pull in metadata about movies and TV shows: genres, plot summaries, actors, etc. The de facto source is, of course, IMDB. Unfortunately, they’re behind the times and don’t offer an API to access their data. (At least not one that I’ve ever found.)

So, here’s a quick PHP class that takes a movie title (doesn’t have to be exact) or a filename (!) and scrapes IMDB for the relevant info.
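
To give a feel for the filename case, here is a minimal sketch of how a filename might be boiled down to a searchable title before it is sent to IMDB. This is not the code from the class: the function name `guessTitleFromFilename` and the list of release tags are assumptions made for illustration, and it assumes PHP 7 or later for the type hints.

```php
<?php
// Hypothetical helper, NOT taken from the MediaInfo class: turn a media
// filename into a rough title guess suitable for a fuzzy search.
function guessTitleFromFilename(string $filename): string
{
    // Drop any directory part and the file extension.
    $name = pathinfo($filename, PATHINFO_FILENAME);

    // Treat dots, underscores and dashes as word separators.
    $name = preg_replace('/[._\-]+/', ' ', $name);

    // Cut everything from the first release year or common rip tag onward.
    // The tag list is an assumption; extend it to match your own files.
    $pattern = '/\s*[\[(]?\b((19|20)\d{2}|dvdrip|xvid|720p|1080p|bluray|hdtv)\b.*$/i';
    $cut = preg_replace($pattern, '', $name);

    // Collapse leftover whitespace.
    $title = trim(preg_replace('/\s+/', ' ', $cut));

    // If the cut removed everything (the title itself starts with a year),
    // fall back to the uncut name.
    return $title !== '' ? $title : trim(preg_replace('/\s+/', ' ', $name));
}

echo guessTitleFromFilename('/movies/American.Beauty.1999.DVDRip.XviD.avi'), "\n";
```

The result only needs to be close, since the title passed to the scraper doesn’t have to be exact. One known weakness of this sketch: a title that begins with a year keeps any trailing tags, because the fallback returns the whole cleaned-up name.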

Using the scraper is simple.

    $m = new MediaInfo();
    $info = $m->getMovieInfo('American Beauty');

Dumping the result, e.g. with print_r($info), will output something like:

    [kind] => movie
    [id] => tt0169547
    [title] => American Beauty
    [rating] => 8.6
    [director] => Sam Mendes
    [release_date] => 1 October 1999
    [plot] => Lester Burnham, a depressed suburban father in a mid-life crisis...
    [genres] => Array
            [0] => Drama
    [cast] => Array
            [Kevin Spacey] => Lester Burnham
            [Annette Bening] => Carolyn Burnham
            [Thora Birch] => Jane Burnham
            [Wes Bentley] => Ricky Fitts
            [Mena Suvari] => Angela Hayes
            [Chris Cooper] => Col. Frank Fitts, USMC
            [Peter Gallagher] => Buddy Kane
            [Allison Janney] => Barbara Fitts
            [Scott Bakula] => Jim Olmeyer
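
Since the result is a plain associative array, working with it takes only a few lines. The sketch below uses sample data copied from the output above (abridged) instead of calling the scraper. The helpers `summarize` and `castLines` are hypothetical names, not part of the class, and treating the rating as a string is an assumption.

```php
<?php
// Sample data copied from the output above (abridged). In real use this
// would come from $m->getMovieInfo('American Beauty').
$info = [
    'kind'     => 'movie',
    'id'       => 'tt0169547',
    'title'    => 'American Beauty',
    'rating'   => '8.6', // assumption: the real class may return a float
    'director' => 'Sam Mendes',
    'genres'   => ['Drama'],
    'cast'     => [
        'Kevin Spacey'   => 'Lester Burnham',
        'Annette Bening' => 'Carolyn Burnham',
    ],
];

// Hypothetical helper: build a one-line summary from the result array.
function summarize(array $info): string
{
    $genres = implode(', ', $info['genres'] ?? []);
    return sprintf(
        '%s (%s) - %s, directed by %s, rated %s',
        $info['title'],
        $info['id'],
        $genres,
        $info['director'],
        $info['rating']
    );
}

// Hypothetical helper: turn the actor => character map into display lines.
function castLines(array $info): array
{
    $lines = [];
    foreach ($info['cast'] ?? [] as $actor => $character) {
        $lines[] = "$actor as $character";
    }
    return $lines;
}

echo summarize($info), "\n";
echo implode("\n", castLines($info)), "\n";
```

The `??` fallbacks keep the helpers from emitting notices if the scraper could not find genres or a cast list for a given title.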

At the moment, the class only returns data for movies. For TV shows I’m planning on pulling data directly from the database I’ve created for Schmooze.TV (which, in turn, scrapes its info from TVRage).

You can download the source from my Google Code project. As always, this code is released under the MIT License. Comments and suggestions are always welcome.