Question

http://www.example.com/books?_pop=mheader

What regular expression would match this URL and any other URL that contains "books"? The site has a books category with various sub-categories under it. How do I crawl down through the site to find all the book URLs?

require 'anemone'
Pattern = %r[(\/books)*]
Anemone.crawl("http://www.example.com/") do |anemone|
  anemone.on_pages_like(Pattern) do |page|
    puts page.url
  end
end

Solution

http://rubular.com/ is a useful tool to test regex for Ruby.

The regex can be simple: /http:\/\/.+(books)/. It also matches http:// to help ensure the string is a URL. Here is a Rubular test against http://www.example.com/reference-books-2300.
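To see how that regex behaves outside of Rubular, here is a minimal sketch using only core Ruby. The helper name matches_answer_regex? and the sample URLs other than the two taken from this page are made up for illustration.

```ruby
# Returns true when the URL matches the regex from the answer above:
# "http://", then one or more characters, then "books".
def matches_answer_regex?(url)
  url.match?(/http:\/\/.+(books)/)
end

# Sample URLs; the last two are hypothetical, added for contrast.
sample_urls = [
  "http://www.example.com/reference-books-2300",
  "http://www.example.com/books?_pop=mheader",
  "http://www.example.com/music",
  "https://www.example.com/books"
]

sample_urls.each do |url|
  puts "#{url} -> #{matches_answer_regex?(url)}"
end
```

Note that because the regex spells out http://, the https:// URL does not match. If the site serves both schemes, something like /https?:\/\/.+books/ may be closer to what you want.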

OTHER TIPS

The pattern to match /books in your URL should just be "/books". The * in (\/books)* allows zero occurrences, so the original pattern matches every URL.
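A quick way to compare the two patterns is to run them over a few URLs. This is only a sketch in core Ruby: the helper names and the URLs other than the ones from this page are hypothetical.

```ruby
# Pattern from the question: "*" permits zero repetitions,
# so it finds an (empty) match in any string.
def matches_original?(url)
  url.match?(%r[(\/books)*])
end

# Simpler pattern: the literal text "/books" anywhere in the URL.
def matches_books?(url)
  url.match?(%r{/books})
end

# Sample URLs; the last two are hypothetical.
urls = [
  "http://www.example.com/books?_pop=mheader",
  "http://www.example.com/reference-books-2300",
  "http://www.example.com/books/fiction",
  "http://www.example.com/music"
]

urls.each do |url|
  puts "#{url} original=#{matches_original?(url)} books=#{matches_books?(url)}"
end

# Keep only the URLs that contain "/books".
books_urls = urls.select { |url| matches_books?(url) }
puts books_urls
```

In the crawler from the question, the equivalent change would presumably be Pattern = %r{/books}. Note that "/books" does not match reference-books-2300, because there the word is preceded by a hyphen rather than a slash; use %r{books} if those pages should be included too.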

This is a good site to test your regular expressions: http://regexpal.com. Use it to make sure you have at least that part of your code right.

Licensed under: CC-BY-SA with attribution