Search Algorithm for a web application that needs to look for a specific value

https://stackoverflow.com/questions/21817743

12-10-2022
|

Question

I'm developing a webapp that will need to download the html form a website and then iterate through the code and try to find a specific but ever changing value (in our case it will be the price for the product).

For this, I was thinking about asking the user (upon installation and setup) to provide the system with a few lines of html from the page (that has the price) and then from then on, every time we need to fetch the price we would try to search for those lines and find the price.

Now, I believe this is a horrible and slow way of doing this and since there are no rules and the html can be totally different from one website to another (even the same website might change) I couldn't find a better way.

One improvement that I thought about was to iterate through the first time and record the line at which we find the code. Once found, the subsequent times we would then start from a few lines before the expected location and start the search. Any Thoughts on how I can improve on this?

I posted this question on https://cstheory.stackexchange.com/ but they commented that it's not on topic and that I should post it here.

I have the code for the above and if needed I can post it, I'm simply thinking that there must be a better, faster way of doing this.

La solution

This is actually something I tried for a project recently (using BeautifulSoup and Python). The solution that worked for me was to workout CSS selectors (which can map to jQuery selectors) that targeted the elements that contained the values I was looking for. In my case I was able to narrow down the full document to just the elements that contained what I was looking for but if you couldn't get exactly what you where after you could combine this with some extra lactic like test to see if it looks like a price (via regex) or test what it is next to.

Licencié sous: CC-BY-SA avec attribution

Non affilié à StackOverflow