Let Google Do The Work For You
One of the major challenges in web scraping is figuring out which page to scrape in the first place. Here’s a scenario: say you need to pull some information about the film 30 Days of Night off IMDB. It would be great if you knew the URL in advance, as something you could construct programmatically from the film’s title. Unfortunately, it’s actually http://www.imdb.com/title/tt0389722/, which is built around an opaque ID rather than the title. How can you possibly figure that out? One solution would be to scrape IMDB’s built-in search feature and extract the correct URL from the results. For IMDB, that works, but what about a site that doesn’t have a search …