Question

I'm looking for some guidance on how to build a third-party search agent for an online marketplace that does not have this functionality.

The online marketplace is pretty old-school and has a single HTML page listing all the products they are selling.

If you are interested in the details, this is the site I'm referring to: http://www.returbilen.se/category.html?SHOW=new&anl=1

The thing I want to build is a tool that can search the page once a day and check it against my predefined search criteria. To keep the question simple, let's say I'm interested in Volvos. Every day I want the tool to scan the page and check whether any Volvos are listed.

If there are any Volvos, I would like the tool to send me an e-mail notification.

Any thoughts on how you would make such a tool? Or do tools like this already exist?

These are the steps for an alpha version:

1) Check website

2) If website contains the word 'Volvo' -> Send an e-mail notification

This question is very broad, but it is conceptual and is tagged as such.


Solution

If you want to find something specific and fine-tune your searches, you could build a basic web crawler that reads the HTML of a page and searches for the text you were expecting to find. You'll need to know how the site's pages are laid out more or less, but using .NET you could just use the WebClient to download the HTML as a string like so...

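using System.Collections.Generic; // Dictionary
using System.Net;                 // WebClient
using System.Text;                // StringBuilder

// arguments could be passed into a method that wraps all this
// we're just setting them for now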

var html = string.Empty;
var uri = "http://www.returbilen.se/category.html";
var query = new StringBuilder();
var args = new Dictionary<string, string>
{
    { "SHOW", "new" },
    { "anl", "1" }
};

// loop through the arguments to build your query string
// using a counter because you can't get the index of an
// unordered Dictionary and I'm loath to order query strings

var count = 0;

foreach (var arg in args)
{
    count++;
    // append "&" after every pair except the last one
    query.AppendFormat("{0}={1}{2}", arg.Key, arg.Value, count < args.Count
                                     ? "&" : string.Empty);
}

// now fetch your HTML as a string

using (var wc = new WebClient())
{
    html = wc.DownloadString(string.Format("{0}?{1}", uri, query.ToString()));
}
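
The counter in the loop above exists only to avoid a trailing `&`. Most platforms ship a URL helper that takes care of separators and encoding for you. As a rough sketch, here is the same query built in JavaScript with the standard `URL` class; `buildUrl` is a hypothetical helper name, not something from the site or a library.

```javascript
// Hypothetical helper: append query parameters to a base URL.
function buildUrl(base, params) {
  const url = new URL(base);
  for (const [key, value] of Object.entries(params)) {
    // searchParams inserts the "&" separators and percent-encodes values
    url.searchParams.set(key, value);
  }
  return url.toString();
}

console.log(buildUrl('http://www.returbilen.se/category.html', { SHOW: 'new', anl: '1' }));
```

.NET and PHP have comparable helpers, so the manual loop is a matter of taste rather than a requirement.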

Once you have the HTML, you can use the HtmlAgilityPack to parse the nodes and find what you want. However, you could also do something similar with a simple PHP script that loads the HTML based on the criteria you specify, then checks whether your search term exists...

// same argument setup as before and this could also be passed
// into a basic function call, same looping logic, etc.

$uri = 'http://www.returbilen.se/category.html?';
$query = '';
$args = array(
    'SHOW' => 'new',
    'anl'  => '1'
);    

$count = 0;

foreach ($args as $k => $v) {
    $count++;
    $query .= $k . '=' . $v;

    if ($count < count($args)) {
        $query .= '&';
    }
}

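// now load the HTML with the PHP Simple HTML DOM Parser library
// (file_get_html() comes from its simple_html_dom.php, not from PHP itself)

require_once 'simple_html_dom.php';

$html = file_get_html($uri . $query);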

// now loop through the nodes to find the product you want
// making sure your search is more or less case invariant

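foreach ($html->find('div.product') as $product) {
    // find() with an index returns a single element, or null if none matched
    $name = $product->find('div.name', 0);

    // stripos() is the case-insensitive variant of strpos()
    if ($name !== null && stripos($name->plaintext, 'volvo') !== false) {
        // do whatever you wish with the result
    }
}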
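
For the alpha version described in the question (does the page mention "Volvo" at all?), a DOM parser is not strictly needed. Below is a minimal sketch in JavaScript, assuming the page HTML has already been downloaded as a string. `pageMentions` is a hypothetical helper name, and the tag stripping is deliberately crude, so it is no substitute for a real parser once you need structured product details.

```javascript
// Hypothetical helper: returns true when the visible text of an HTML
// string contains the search term, ignoring case.
function pageMentions(html, term) {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ') // drop inline scripts
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')   // drop inline styles
    .replace(/<[^>]*>/g, ' ');                   // drop remaining tags
  return text.toLowerCase().includes(term.toLowerCase());
}

// Made-up sample markup, not the real site's HTML.
const sample = '<div class="product"><div class="name">VOLVO V70</div></div>';
console.log(pageMentions(sample, 'volvo'));
console.log(pageMentions(sample, 'saab'));
```

Stripping tags first keeps a term that only appears in an attribute, such as a link target, from counting as a hit.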

After you set the PHP script up, you can place it in a WAMP folder and schedule a job to call it at a given time, then open whatever report file it generates when it's done. Or you can create a page that calls it like so, assuming it returns its results as JSON...

$.getJSON('searchsite.php', function (data) {
    // parse results into Knockout or add via jQuery
});

... and check whether you have any hits for Volvo, or just scrape the whole product detail using the PHP DOM parser. You can also create something similar in .NET, but you'd need to create a Web API project or a web service and then return your results as JSON.
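
Whichever language produces the results, the JSON only needs to carry enough for the page, or a notification step, to decide whether there were hits. Here is a sketch of one possible shape in JavaScript. The function name `buildReport` and the field names `term`, `count`, and `hits` are illustrative assumptions, not part of any existing API.

```javascript
// Hypothetical: turn a list of product names into a small JSON-ready report.
function buildReport(productNames, term) {
  const needle = term.toLowerCase();
  // keep only the names that contain the term, ignoring case
  const hits = productNames.filter((name) => name.toLowerCase().includes(needle));
  return { term, count: hits.length, hits };
}

// Made-up product names for illustration.
const report = buildReport(['Volvo V70 2.4', 'Saab 9-5', 'VOLVO S60'], 'volvo');
console.log(JSON.stringify(report));

if (report.count > 0) {
  // This is where an e-mail notification would be triggered.
  console.log(`Found ${report.count} match(es) for "${report.term}"`);
}
```

Keeping the matched names in the report means the e-mail can list what was found rather than only saying that something matched.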

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow