Problem

While following the Nutch tutorial at http://wiki.apache.org/nutch/NutchTutorial, I get the error below and can't figure out what is causing it:

runtime/local$ bin/nutch parse $s1
ParseSegment: starting at 2013-10-11 17:43:36
ParseSegment: segment: crawl/segments/20131011173126
Exception in thread "main" java.io.IOException: Segment already parsed!
    at org.apache.nutch.parse.ParseOutputFormat.checkOutputSpecs(ParseOutputFormat.java:89)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:975)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
    at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:213)
    at org.apache.nutch.parse.ParseSegment.run(ParseSegment.java:247)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.parse.ParseSegment.main(ParseSegment.java:220)

Solution

This happens when you try to parse a segment that has already been parsed. Note that the "crawl" command also parses the segment, so a segment produced by "crawl" is already parsed by the time you run "parse" on it.

If you really want to parse it again, remove the crawl_parse directory inside your segment (in your case crawl/segments/20131011173126/crawl_parse) and run the parse command again.
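The two steps above can be sketched as a small shell helper. This is only an illustration, not something shipped with Nutch: the function name reparse_segment and the NUTCH_CMD variable are made up for this example, and it assumes you run it from runtime/local so that bin/nutch resolves.

```shell
#!/bin/sh
# Sketch: re-parse a segment that Nutch reports as "already parsed".
# NUTCH_CMD and reparse_segment are hypothetical names used for this example.
NUTCH_CMD="${NUTCH_CMD:-bin/nutch}"

reparse_segment() {
    segment="$1"

    # Refuse to continue if the argument is empty or not a directory.
    if [ -z "$segment" ] || [ ! -d "$segment" ]; then
        echo "not a segment directory: $segment" >&2
        return 1
    fi

    # The parse job refuses to run while crawl_parse exists, so remove it first.
    if [ -d "$segment/crawl_parse" ]; then
        rm -r "$segment/crawl_parse"
    fi

    # Issue the parse command again for this segment.
    "$NUTCH_CMD" parse "$segment"
}

# Usage (from runtime/local):
#   reparse_segment crawl/segments/20131011173126
```

Only crawl_parse is removed here, as described above. Whether other parse output in the segment also needs cleaning up depends on your Nutch version, so check the segment directory before re-running.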

License: CC-BY-SA with attribution
Not affiliated with StackOverflow