Anemone: ignore URL links containing a certain phrase
26-10-2019
Question
I am running a web scraper with Anemone in Ruby, and it causes problems on my server when it visits pages that require a login.
These pages all have a phrase, say "account", in the URL. I want the crawler to ignore them completely and never follow any link whose destination contains this string.
How can I do this?
Solution
Anemone has a skip_links_like method:
skip_links_like(*patterns)
Add one or more Regex patterns for URLs which should not be followed
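So adding something like
skip_links_like(/\/account\//)
should take care of it:
Anemone.crawl("http://somesite.co.uk", :depth_limit => 1) do |anemone|
  # Do not follow any link whose URL contains "/account/".
  # Parentheses avoid Ruby's "ambiguous first argument" warning for regex literals.
  anemone.skip_links_like(/\/account\//)
  #...
end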
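Note that /\/account\// only matches when "account" is a whole path segment followed by a slash, while the question asks about any URL containing the phrase. A pattern can be checked against a few sample links before crawling. The following is a minimal sketch in plain Ruby that does not use Anemone itself: the helper skipped_urls and the sample URLs are hypothetical, and it assumes the crawler tests each pattern against the URL path.

```ruby
require "uri"

# Hypothetical sample links; replace with URLs from the site being crawled.
SAMPLE_URLS = [
  "http://somesite.co.uk/products/1",
  "http://somesite.co.uk/account/login",
  "http://somesite.co.uk/my-account",
  "http://somesite.co.uk/accounting/reports"
].freeze

# Returns the URLs whose path matches at least one of the patterns.
# Assumption: patterns are tested against the path part of each URL.
def skipped_urls(urls, *patterns)
  urls.select do |url|
    path = URI(url).path
    patterns.any? { |pattern| path =~ pattern }
  end
end

# Strict pattern from the answer: "account" as a full path segment.
strict = skipped_urls(SAMPLE_URLS, /\/account\//)
# Broader pattern: any path containing the string "account".
broad = skipped_urls(SAMPLE_URLS, /account/)

puts "strict: #{strict.inspect}"
puts "broad:  #{broad.inspect}"
```

With these samples, the strict pattern skips only the /account/login link, while the broader /account/ pattern also skips /my-account and /accounting/reports. Which one is appropriate depends on how the site's login-protected URLs are actually structured.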
Licensed under: CC-BY-SA with attribution