Question

While following the Nutch tutorial (http://wiki.apache.org/nutch/NutchTutorial), I get the following error and can't figure out why:

    runtime/local$ bin/nutch parse $s1
    ParseSegment: starting at 2013-10-11 17:43:36
    ParseSegment: segment: crawl/segments/20131011173126
    Exception in thread "main" java.io.IOException: Segment already parsed!
        at org.apache.nutch.parse.ParseOutputFormat.checkOutputSpecs(ParseOutputFormat.java:89)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:975)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
        at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:213)
        at org.apache.nutch.parse.ParseSegment.run(ParseSegment.java:247)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.parse.ParseSegment.main(ParseSegment.java:220)


Solution

This happens when you try to parse a segment that has already been parsed. Note that the "crawl" command also parses the segment, so running "bin/nutch parse" on a segment it produced fails with this error.
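If you have a mix of segments, some produced by "crawl" and some only fetched, a small wrapper can skip the already-parsed ones instead of hitting the exception. This is a minimal sketch, assuming the Nutch 1.x layout from the tutorial (crawl/segments/<timestamp>, run from runtime/local) and that a crawl_parse directory inside a segment marks it as parsed. NUTCH_BIN and parse_unparsed_segments are hypothetical names, not part of Nutch.

```shell
#!/usr/bin/env bash
# Sketch: parse only segments that have not been parsed yet.
# Assumptions: Nutch 1.x segment layout; run from runtime/local.
# NUTCH_BIN and parse_unparsed_segments are hypothetical, not Nutch names.

NUTCH_BIN="${NUTCH_BIN:-bin/nutch}"

# Walks every segment directory under $1. Segments that already contain
# crawl_parse (e.g. ones produced by the "crawl" command) are skipped;
# the rest are handed to "bin/nutch parse".
parse_unparsed_segments() {
  local segments_dir="$1"
  local segment
  for segment in "$segments_dir"/*/; do
    [ -d "$segment" ] || continue      # glob matched nothing: no segments
    segment="${segment%/}"             # drop the trailing slash from the glob
    if [ -d "$segment/crawl_parse" ]; then
      echo "skip (already parsed): $segment"
    else
      "$NUTCH_BIN" parse "$segment" || return 1
    fi
  done
}
```

Usage, from runtime/local: `parse_unparsed_segments crawl/segments`. Checking for the directory up front avoids launching a Hadoop job only to have it rejected.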

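If you really do want to parse it again, remove the crawl_parse directory inside the segment (here, crawl/segments/20131011173126/crawl_parse) and run the parse command again. The parse job refuses to start while that directory exists, which is the check failing in ParseOutputFormat.checkOutputSpecs in your stack trace.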
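The remove-and-re-parse step can be wrapped in a helper so it is only done deliberately. This is a minimal sketch, assuming Nutch 1.x and that the script runs from runtime/local as in the tutorial. NUTCH_BIN, segment_is_parsed, reset_parse_output and reparse_segment are hypothetical names for illustration. Following the answer, it removes only crawl_parse and leaves the other segment directories alone.

```shell
#!/usr/bin/env bash
# Sketch: force a re-parse of one Nutch 1.x segment.
# Assumption: run from runtime/local. All names here are hypothetical.

NUTCH_BIN="${NUTCH_BIN:-bin/nutch}"

# True if the segment has a crawl_parse directory, the condition that makes
# "bin/nutch parse" fail with "Segment already parsed!".
segment_is_parsed() {
  [ -d "$1/crawl_parse" ]
}

# Deletes crawl_parse (only that directory) so the parse job can run again.
reset_parse_output() {
  local segment="$1"
  if segment_is_parsed "$segment"; then
    rm -r -- "$segment/crawl_parse"
  fi
}

# Validates the path, clears the old parse marker, then re-runs the parse step.
reparse_segment() {
  local segment="$1"
  if [ ! -d "$segment" ]; then
    echo "no such segment: $segment" >&2
    return 1
  fi
  reset_parse_output "$segment"
  "$NUTCH_BIN" parse "$segment"
}
```

Usage with the segment from the question: `reparse_segment crawl/segments/20131011173126`.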

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow