Question

I'm trying to use crawler4j to crawl websites. I was able to follow the instructions on the crawler4j website. When it finishes, it creates a folder containing two .lck files, one .jdb file, and one .info.0 file.

I tried to read in the file using the code I provided in this answer, but it keeps failing. I've used the same function to read text files before, so I know the code works.

I also found someone else who asked the same question a few months ago. They never got an answer.

Why can't I use my code to open and read these .lck files to memory?


Solution

Crawler4j uses BerkeleyDB to store crawl information. See here in the source. The files you found are BerkeleyDB's on-disk format: the .jdb files are the database's log/data files and the .lck files are lock files, which is why reading them as plain text fails.

From the command line you can use the DB utilities to access the data. This is already covered on SO here.
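For example, Berkeley DB Java Edition ships a `DbDump` utility that can list and dump the databases in the storage folder. A rough sketch, assuming `je.jar` is on hand and the crawl data lives under `/path/to/crawler4j/frontier` (both paths and the database name are assumptions to adapt to your setup):

```shell
# List the database names stored in the crawler4j storage folder
java -cp je.jar com.sleepycat.je.util.DbDump -h /path/to/crawler4j/frontier -l

# Dump one of the listed databases in printable form
java -cp je.jar com.sleepycat.je.util.DbDump -h /path/to/crawler4j/frontier -s DocIDs -p
```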

If you want to access the data from your Java code, simply add the BerkeleyDB Java Edition library as a dependency (Maven instructions there) and follow the tutorial on how to open the DB.
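As a starting point, a minimal sketch of opening the environment read-only and listing the databases it contains, assuming the BerkeleyDB JE dependency is on the classpath and that `crawler4j/frontier` is where your crawl controller was configured to store its data (that path is an assumption):

```java
import java.io.File;

import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

public class ReadCrawlDb {
    public static void main(String[] args) {
        // Open the BerkeleyDB environment read-only so we don't
        // interfere with any lock files left by the crawler.
        EnvironmentConfig config = new EnvironmentConfig();
        config.setReadOnly(true);

        // Assumed storage folder; use the folder your crawler wrote to.
        Environment env = new Environment(new File("crawler4j/frontier"), config);
        try {
            // Print the names of the databases stored in this environment.
            for (String name : env.getDatabaseNames()) {
                System.out.println(name);
            }
        } finally {
            env.close();
        }
    }
}
```

From there you can call `env.openDatabase(...)` on one of the listed names and iterate its records with a cursor, as shown in the tutorial.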

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow