Question

I'm trying to build a personal movie database and I want the data to be fetched from IMDb. Yes, I know there are plenty of APIs and grabbers out there, but none of them does what I need.

So far I haven't come up with a way to parse the http://www.imdb.com/chart/top list and get my data from it.

I've tried to do it with a cURL script, but no luck!

For example:

I want to know whether The Godfather: Part II is in the top 250 and, if so, what its rank is.


Solution

API

I would first check whether IMDb has an API available. If it does, this will likely be as simple as querying a URL and parsing the returned data with json_decode.
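As a minimal sketch of the API route, assume a hypothetical endpoint that returns the chart as JSON like `[{"rank": 1, "title": "..."}]`. IMDb's real response format, if it offers one, will differ, so the field names `rank` and `title` are assumptions. json_decode turns the response into a PHP array you can index by rank:

```php
<?php
// Hypothetical response body; in practice this would come from the API call,
// e.g. file_get_contents($apiUrl). The "rank" and "title" field names are assumptions.
$json = '[{"rank": 1, "title": "The Shawshank Redemption"},
          {"rank": 2, "title": "The Godfather"}]';

// true => decode into associative arrays instead of stdClass objects
$chart = json_decode($json, true);
if (!is_array($chart)) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// Build a RANK => TITLE array from the decoded rows
$top = array_column($chart, 'title', 'rank');

echo $top[2], "\n"; // prints "The Godfather"
```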

No API available?

Get the webpage

There's no need to use cURL; a simple file_get_contents will do the trick (as long as allow_url_fopen is enabled in your PHP configuration).
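A slightly more defensive version of the download step, as a sketch: file_get_contents returns false on failure, and some sites reject requests that carry no User-Agent header (whether IMDb does is an assumption). The function name fetch_page and the User-Agent value are hypothetical:

```php
<?php
// Hypothetical helper: download a URL, or throw instead of silently returning false.
function fetch_page(string $url): string
{
    // Send a User-Agent header and give up after 10 seconds; the header value is just an example.
    $context = stream_context_create([
        'http' => [
            'header'  => "User-Agent: MyMovieDb/1.0\r\n",
            'timeout' => 10,
        ],
    ]);

    // @ hides the PHP warning; the return value is checked explicitly instead
    $page = @file_get_contents($url, false, $context);
    if ($page === false) {
        throw new RuntimeException("Could not download $url");
    }
    return $page;
}

// Usage: $page = fetch_page('http://www.imdb.com/chart/top');
```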

Extract the list

Now that you have the web page, you have two options:

  1. Parse the web page with a DOM parser (long-winded, not necessary)
  2. Use a regex to extract the info you're after (simple, short)
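For comparison, option 1 might look like the sketch below, using PHP's built-in DOMDocument and DOMXPath classes. It assumes the markup described in the Regex section (a `td` with class `titleColumn` holding the rank followed by a link); the function name parse_chart_dom is hypothetical. It is longer, but more tolerant of changes in whitespace and attributes:

```php
<?php
// Hypothetical helper: extract RANK => TITLE with a DOM parser instead of a regex.
function parse_chart_dom(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser errors instead of emitting warnings
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $top = [];
    // Every <td class="titleColumn"> in the page
    foreach ($xpath->query('//td[@class="titleColumn"]') as $cell) {
        $link = $xpath->query('.//a', $cell)->item(0);
        // The cell text starts with the rank, e.g. "1. The Shawshank Redemption" -> 1
        $rank = (int) trim($cell->textContent);
        if ($link === null || $rank === 0) {
            continue;
        }
        $top[$rank] = trim($link->textContent);
    }
    return $top;
}
```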

Regex

A quick look at the page source shows that each entry in the list has this format:

<td class="titleColumn">RANK. <a href="/link/to/film" title="Director/Leads" >FILM TITLE</a>

The parts in CAPS are the information we need.

Converting this into a regex is simple: remove the noise and replace it with non-greedy wildcards (.*?):

<td class="titleColumn">RANK. <a.*?>FILM TITLE</a>

Add your capture groups:

<td class="titleColumn">(RANK). <a.*?>(FILM TITLE)</a>

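Finally, give the placeholders real patterns (\d+ for the rank, .*? for the title), escape the literal dot after the rank, and wrap the whole thing in delimiters (# here). That's it: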

#<td class="titleColumn">(\d+)\. <a.*?>(.*?)</a>#
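Here is that pattern applied to a made-up two-entry snippet in the format shown above (the sample HTML is illustrative, not IMDb's real markup). It also shows why the wildcards should be non-greedy: if several entries end up on one line, a greedy .* runs on to the last </a> it can find and swallows everything in between.

```php
<?php
// Two made-up entries on a single line, in the format shown above
$html = '<td class="titleColumn">1. <a href="/a" title="Director/Leads" >Film A</a></td>'
      . '<td class="titleColumn">2. <a href="/b" title="Director/Leads" >Film B</a></td>';

// Non-greedy: one match per entry. Group 1 is the rank, group 2 the title.
preg_match_all('#<td class="titleColumn">(\d+)\. <a.*?>(.*?)</a>#', $html, $lazy);
// $lazy[1] is ['1', '2'], $lazy[2] is ['Film A', 'Film B']

// Greedy: .* grabs as much as it can, so both entries collapse into a single match
preg_match_all('#<td class="titleColumn">(\d+)\. <a.*>(.*)</a>#', $html, $greedy);
// $greedy[1] is ['1'], $greedy[2] is ['Film B']
```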

Example

Using this in practice:

$page = file_get_contents("http://www.imdb.com/chart/top"); // Download the page
if ($page === false) {
    die("Could not download the chart\n");
}

// $matches[1] will hold the ranks, $matches[2] the titles
preg_match_all('#<td class="titleColumn">(\d+)\. <a.*?>(.*?)</a>#', $page, $matches);

$top250 = array_combine($matches[1], $matches[2]); // Final array in the format RANK => TITLE

Then you can do something like:

echo $top250[1];

/**
Output:

The Shawshank Redemption

*/

echo array_search("The Godfather", $top250);

/**
Output:

2

*/
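To answer the original question directly (is a film in the top 250, and at what rank?), keep in mind that array_search returns false when nothing matches, and echo prints false as an empty string, so it is safer to check the result strictly. The helper name rank_of is hypothetical, and the ranks in the sample array are placeholders, not real chart data:

```php
<?php
// Hypothetical helper: return the rank of $title in a RANK => TITLE array, or null if absent.
function rank_of(array $top250, string $title): ?int
{
    $rank = array_search($title, $top250, true); // strict comparison
    return $rank === false ? null : (int) $rank;
}

// Sample data in the same RANK => TITLE shape (placeholder ranks)
$top250 = [1 => 'The Shawshank Redemption', 2 => 'The Godfather', 3 => 'The Godfather: Part II'];

$rank = rank_of($top250, 'The Godfather: Part II');
echo $rank === null ? "Not in the top 250\n" : "Rank: $rank\n";
```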

You can then use standard PHP array functions to do things like search for films.
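For example, a case-insensitive partial search (all films whose title contains "godfather") can be sketched with array_filter and stripos. The sample array and its ranks are placeholders, not real chart data; array_filter keeps the original keys, so the ranks survive the filtering:

```php
<?php
// Sample RANK => TITLE data (placeholder ranks)
$top250 = [1 => 'The Shawshank Redemption', 2 => 'The Godfather', 3 => 'The Godfather: Part II'];

// Keep every entry whose title contains "godfather", ignoring case
$godfathers = array_filter($top250, function (string $title): bool {
    return stripos($title, 'godfather') !== false;
});
// $godfathers is [2 => 'The Godfather', 3 => 'The Godfather: Part II']
```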

http://php.net/file_get_contents
http://php.net/preg_match_all
http://php.net/array_combine
http://php.net/array_search


Side note

Especially if you use the no-API method above, you might want to think about storing the results locally and only updating them every X hours/days/weeks to save load time. I assume you are already planning to do this (since you said you wanted a personal movie database), but I thought I'd mention it anyway!
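A minimal sketch of that idea, assuming the RANK => TITLE array is cached as JSON in a local file. The function name cached_chart and the $fetch callback are hypothetical; the callback would wrap the download-and-parse code from the Example section:

```php
<?php
// Hypothetical helper: return the chart from a local cache file, refreshing it
// with $fetch only when the file is missing or older than $maxAgeSeconds.
function cached_chart(string $cacheFile, int $maxAgeSeconds, callable $fetch): array
{
    clearstatcache(true, $cacheFile); // PHP caches filemtime() results
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAgeSeconds) {
        $data = json_decode(file_get_contents($cacheFile), true);
        if (is_array($data)) {
            return $data; // numeric JSON keys come back as integer array keys
        }
    }

    $data = $fetch(); // e.g. download the page and build $top250
    file_put_contents($cacheFile, json_encode($data));
    return $data;
}

// Usage: refresh at most once a day
// $top250 = cached_chart('/tmp/top250.json', 86400, $downloadAndParse);
```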

Licensed under: CC-BY-SA with attribution