Question

I am trying to load a DBpedia dataset in N-Triples (.nt) format into MarkLogic using MarkLogic Content Pump (mlcp). I'm using MarkLogic 7, with an XDBC server running on port 8005 on my machine. My data is in the file persondata_en.nt, and I am loading it with the following command:

C:\mlcp-Hadoop2-1.2-1\bin\mlcp import -mode local ^
-host localhost -port 8005 -username admin -password admin ^
-input_file_path "C:\dbp\persondata_en.nt" ^
-input_file_type RDF

This command fails with a "Premature EOF" exception:

2014-03-18 11:56:28.401 WARNING [1] (AbstractRequestController.runRequest): Error parsing HTTP headers: Premature EOF, partial header line read: ''
2014-03-18 11:56:28.503 WARNING [1] (AbstractRequestController.runRequest): Error parsing HTTP headers: Premature EOF, partial header line read: ''
2014-03-18 11:56:28.605 WARNING [1] (AbstractRequestController.runRequest): Error parsing HTTP headers: Premature EOF, partial header line read: ''
2014-03-18 11:56:28.707 WARNING [1] (AbstractRequestController.runRequest): Error parsing HTTP headers: Premature EOF, partial header line read: ''
2014-03-18 11:56:28.809 WARNING [1] (AbstractRequestController.runRequest): Error parsing HTTP headers: Premature EOF, partial header line read: ''
2014-03-18 11:56:28.810 INFO [1] (AbstractRequestController.runRequest): automatic query retries (5) exhausted, throwing: com.marklogic.xcc.exceptions.ServerConnectionException: Error parsing
 [Session: user=admin, cb={default} [ContentSource: user=admin, cb={none} [provider: address=localhost/127.0.0.1:8005, pool=0/64]]]
 [Client: XCC/7.0-20140204]
 com.marklogic.xcc.exceptions.ServerConnectionException: Error parsing HTTP headers: Premature EOF, partial header line read: ''
 [Session: user=admin, cb={default} [ContentSource: user=admin, cb={none} [provider: address=localhost/127.0.0.1:8005, pool=0/64]]]
 [Client: XCC/7.0-20140204]

I'm using essentially the same command as the example load scripts in the tutorial. Has anyone run into this problem before? Any help would be appreciated. Thanks!


Solution

Thanks for your help, guys. I managed to figure out the cause of the issue: I had not configured the MarkLogic XDBC server properly. I reset the server following the instructions in the documentation, and was then able to insert the triples into the store successfully.

Other tips

Have you tried -input_file_type rdf instead of -input_file_type RDF? Looking at http://docs.marklogic.com/guide/ingestion/content-pump I see lower-case "rdf" in various examples.

In general, an "Error parsing HTTP headers: Premature EOF" means the server's response was cut off before the client could read a complete header line. This is not a very common error, but I have seen it happen for various reasons.
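A minimal way to see this failure mode outside MarkLogic is a toy server that accepts a connection, reads the request, and hangs up without sending any response. The sketch below uses Python's standard http.client as a stand-in for the XCC client that mlcp uses; that substitution is an assumption for illustration only. XCC reports this situation as "Premature EOF, partial header line read: ''", while http.client raises RemoteDisconnected (a subclass of ConnectionResetError). The function names are hypothetical.

```python
import http.client
import socket
import threading


def serve_one_then_hang_up(listener: socket.socket) -> None:
    """Accept one connection, read the request, then close without replying."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(4096)  # read the request bytes the client sent
    # Leaving the `with` block closes the socket before any status line or
    # header has been written, like a server or firewall dropping the link.


def demonstrate_premature_eof() -> str:
    """Return the name of the exception the client sees, or "no error"."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    listener.settimeout(10)  # don't let accept() block forever
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_one_then_hang_up, args=(listener,))
    worker.start()
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()  # tries to parse the status line and headers
    except ConnectionResetError as exc:
        # http.client.RemoteDisconnected subclasses ConnectionResetError;
        # depending on timing the OS may also report a plain reset.
        return type(exc).__name__
    finally:
        client.close()
        worker.join()
        listener.close()
    return "no error"


if __name__ == "__main__":
    print(demonstrate_premature_eof())
```

The client's only evidence is that the byte stream ended before a header line arrived, which is why the same message can come from very different root causes on the server or network side.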

One cause is a JVM that is short of memory. In that case it spends nearly all its time in garbage collection, and the connection times out. That might seem unlikely here, since the persondata file should be smaller than 1 GiB and mlcp should not need the whole file in memory anyway. But you could test that theory by making a smaller .nt file with, say, 1% or 10% of the lines. If you want to see how often GC is running, add -verbose:gc to the JVM arguments in the mlcp script.
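Making the smaller test file can be scripted. This is a minimal Python sketch, assuming the input is UTF-8 N-Triples; because N-Triples puts exactly one statement per line, a line-based sample is itself a valid N-Triples file. The function sample_ntriples and its every parameter are hypothetical names, not part of mlcp: every=100 keeps roughly 1% of the triples and every=10 roughly 10%.

```python
from pathlib import Path


def sample_ntriples(src: Path, dst: Path, every: int = 100) -> int:
    """Copy every `every`-th triple from src to dst; return how many were written.

    Blank lines and '#' comment lines are skipped and not counted as triples.
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    written = 0
    seen = 0
    with src.open("r", encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # not a triple
            if seen % every == 0:
                # Ensure each written triple ends with a newline, even the last one.
                fout.write(line if line.endswith("\n") else line + "\n")
                written += 1
            seen += 1
    return written
```

You would then point -input_file_path at the sampled file (for example a hypothetical C:\dbp\persondata_sample.nt). If the small import succeeds where the full one fails, memory pressure becomes a more plausible explanation; if it fails the same way, the cause is more likely the connection or server configuration.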

Another cause I have seen is a firewall that closes connections after N seconds. Another is a badly overloaded server: one that is paging heavily or is otherwise unable to let MarkLogic do its work.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow