How to solve the "Segment already parsed!" error for segment: crawl/segments/*

https://stackoverflow.com/questions/19317994

nutch

30-06-2022

Problem

While following the Nutch tutorial at http://wiki.apache.org/nutch/NutchTutorial I get the error below and can't figure out what causes it.

runtime/local$ bin/nutch parse $s1
ParseSegment: starting at 2013-10-11 17:43:36
ParseSegment: segment: crawl/segments/20131011173126
Exception in thread "main" java.io.IOException: Segment already parsed!
    at org.apache.nutch.parse.ParseOutputFormat.checkOutputSpecs(ParseOutputFormat.java:89)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:975)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
    at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:213)
    at org.apache.nutch.parse.ParseSegment.run(ParseSegment.java:247)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.parse.ParseSegment.main(ParseSegment.java:220)

Solution

This happens when you try to parse a segment that has already been parsed. Note that the "crawl" command also parses the segment, so a segment produced by it is already parsed.

If you really want to parse it again, remove the crawl_parse directory inside your segment (in this case crawl/segments/20131011173126/crawl_parse) and run the parse command again.
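
The steps above could be scripted roughly as follows. This is only a sketch: the function names segment_is_parsed and clear_parse_output are made up for illustration and are not part of Nutch. It assumes, as the answer states, that the crawl_parse directory is what makes Nutch treat the segment as parsed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): succeeds if the segment
# directory given as $1 already contains a crawl_parse directory.
segment_is_parsed() {
    [ -d "$1/crawl_parse" ]
}

# Hypothetical helper (not part of Nutch): removes crawl_parse from the
# segment given as $1 so that it can be parsed again. It leaves every
# other directory in the segment untouched and does nothing if the
# segment has not been parsed yet.
clear_parse_output() {
    if segment_is_parsed "$1"; then
        rm -r -- "$1/crawl_parse"
    fi
}
```

With those definitions loaded, and working from runtime/local as in the question, the re-parse would look like this:

    s1=crawl/segments/20131011173126
    clear_parse_output "$s1"
    bin/nutch parse "$s1"

Removing the directory is destructive, so it may be worth copying the segment first if the existing parse output matters.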

License: CC-BY-SA with attribution
