Problem

I just want to follow up this question.

So, I downloaded the Wikipedia dump of February 2014 and ran WikiExtractor.py as suggested:

cat mywiki-pages-articles.xml | python WikiExtractor.py -b 500K -o extracted

However, after more than an hour of running, I got nothing but an empty file named wiki_00.

Do you have any suggestions for this problem?


Solution

OK, so I found the solution to this problem.

Last time, when I ran the command above, I prefixed it with the "screen" command. In that case, screen only runs the first command, cat, on the XML file without piping its output to WikiExtractor.py. The result is therefore an empty file.

I fixed this by putting the above command in a script, making the script executable, and running the screen command on it.
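The fix can be sketched as follows. This is a minimal example, assuming the script is created next to WikiExtractor.py and the dump file; the script name extract.sh is my own choice, not from the original post:

```shell
# Write the whole pipeline into its own script, so that screen
# later runs the entire pipeline rather than just "cat".
cat > extract.sh <<'EOF'
#!/bin/sh
cat mywiki-pages-articles.xml | python WikiExtractor.py -b 500K -o extracted
EOF

# Make the script executable.
chmod +x extract.sh

# Then run it inside a screen session (detach with Ctrl-A d):
#   screen ./extract.sh
```

Running `screen ./extract.sh` starts the script in its own terminal session, so the pipe between cat and WikiExtractor.py stays intact and the extraction keeps running after you detach.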

License: CC-BY-SA with attribution
Not affiliated with StackOverflow