How to get the content of a particular URL using the Nutch database

https://stackoverflow.com/questions/22854912

ruby-on-rails
solr
nutch

27-06-2023

Question

I am new to Nutch. To the best of my knowledge I have configured everything properly. I am able to crawl the links, and I can also get the crawled URLs.

My problem is that I want to fetch the content of the web pages separately for every link, and I have not been able to find a solution for it.

Can anyone please help me?

Thank you.

Solution 2

I separated the files with some logic of my own. I am able to get the content for all URLs in a single file, in which a particular pattern repeats for every record (that is, for every URL), so I split the content on that repeating row field.
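The answer does not show its splitting logic, so here is a minimal sketch of the idea. It assumes the single file is a text dump such as the one written by Nutch's `bin/nutch readseg -dump` command, where each record starts with a `Recno::` line followed by a `URL::` line. The function name `split_dump` and both regular expressions are hypothetical; adjust the patterns if your dump marks records differently.

```python
import re
from typing import Dict

# Assumed record header: each record in the dump begins with a line like "Recno:: 0".
RECORD_START = re.compile(r"^Recno::\s*\d+\s*$", re.MULTILINE)
# Assumed URL line inside each record, e.g. "URL:: http://example.com/".
URL_LINE = re.compile(r"^URL::\s*(\S+)\s*$", re.MULTILINE)


def split_dump(text: str) -> Dict[str, str]:
    """Split one dump string into {url: record_text}, one entry per record."""
    records: Dict[str, str] = {}
    # Offsets at which each record header begins.
    starts = [m.start() for m in RECORD_START.finditer(text)]
    for i, start in enumerate(starts):
        # A record runs until the next header, or to the end of the text.
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        chunk = text[start:end]
        url_match = URL_LINE.search(chunk)
        if url_match:  # skip records that carry no URL line
            records[url_match.group(1)] = chunk.strip()
    return records


if __name__ == "__main__":
    sample = (
        "Recno:: 0\n"
        "URL:: http://example.com/a\n"
        "Content::\n<html>page A</html>\n\n"
        "Recno:: 1\n"
        "URL:: http://example.com/b\n"
        "Content::\n<html>page B</html>\n"
    )
    for url, record in split_dump(sample).items():
        print(url, len(record))
```

Each value in the returned dictionary can then be written to its own file, which gives one file of content per crawled URL.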

Other tips

Use the nokogiri gem (http://rubygems.org/gems/nokogiri) to parse the content of the web pages, and select the links with Nokogiri selectors.
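Nokogiri is a Ruby library. To keep the examples in one language, this is a rough Python analogue that uses the standard library's `html.parser` instead of Nokogiri. It collects the `href` of every `<a>` tag, which is roughly what a Nokogiri `css('a')` selection followed by reading each element's `href` would return. The names `LinkCollector` and `extract_links` are illustrative, not part of any library.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag in the fed HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Return the href values of all <a> tags, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<html><body><a href="/a">A</a> <p>text</p> <a href="http://example.com/b">B</a></body></html>'
    print(extract_links(page))  # ['/a', 'http://example.com/b']
```

Relative links such as `/a` can be resolved against the page's own URL with `urllib.parse.urljoin`.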

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow