I'm very new to Solr (I installed it yesterday...) and I have already run into my first problem.

I have a directory, Solr-Indexing, into which I copied post.jar and which contains my core directory, Collection1. In Collection1 I have the conf directory with schema.xml, solrconfig.xml and some PDF documents (which I want to import/index).

I start my Solr instance with java -Dsolr.solr.home="D:/download/Solr-Test/Solr-Indexing" -jar start.jar (I see no problems in the log), and then I try to load a PDF into my collection with the command java -Dtype=application/pdf -Durl=http://localhost:8983/solr/collection1/update/extract -jar post.jar collection1/Lucene.pdf.

The output is the following:

D:\download\Solr-Test\Solr-Indexing>java -Dtype=application/pdf -Durl=http://localhost:8983/solr/collection1/update/extract -jar post.jar collection1/Lucene.pdf
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/collection1/update/extract using content-type application/pdf..
POSTing file Lucene.pdf
SimplePostTool: WARNING: Solr returned an error #404 Not Found
SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/collection1/update/extract
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/collection1/update/extract..
SimplePostTool: WARNING: Solr returned an error #404 Not Found for url http://localhost:8983/solr/collection1/update/extract?commit=true
Time spent: 0:00:00.015

Where is the problem? Am I missing something?

What I can see is that the URL in the browser is http://localhost:8983/solr/#/Collection1, with a #, while the URL I use has no #. Could this be the problem? I tried adding the # to my command too, but then I get this message:

D:\download\Solr-Test\Solr-Indexing>java -Dtype=application/pdf -Durl=http://localhost:8983/solr/#/collection1/update/extract -jar post.jar collection1/Lucene.pdf
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/#/collection1/update/extract using content-type application/pdf..
POSTing file Lucene.pdf
SimplePostTool: WARNING: Solr returned an error #405 HTTP method POST is not supported by this URL
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 405 for URL: http://localhost:8983/solr/#/collection1/update/extract
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/#/collection1/update/extract..
Time spent: 0:00:00.031

Here is my schema.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="simple" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField"/>
    <fieldtype name="long" class="solr.TrieLongField"/>
    <fieldtype name='text' class='solr.TextField'>
        <analyzer>
            <tokenizer class='solr.WhitespaceTokenizerFactory' />
            <filter class='solr.LowerCaseFilterFactory' />
        </analyzer>
    </fieldtype>
  </types>

  <fields>
    <field name="id" type="long" required="true" />
    <field name='fullText' type='text' multiValued='true' />
    <dynamicField name="*" type="string" multiValued="true" indexed="true" stored="true" />
    <copyField source="*" dest="fulltext" />
  </fields>

  <uniqueKey>id</uniqueKey>
  <defaultSearchField>fullText</defaultSearchField>
  <solrQueryParser defaultOperator="OR" />
</schema>

and solrconfig.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <luceneMatchVersion>LUCENE_45</luceneMatchVersion>
  <directoryFactory name='DirectoryFactory' class='solr.MMapDirectoryFactory' />

  <lib dir='${solr.core.instanceDir}\lib' />

  <requestHandler name='standard' class='solr.StandardRequestHandler' default='true' />
  <requestHandler name="/select" class="solr.SearchHandler" />
  <requestHandler name='/update' class='solr.UpdateRequestHandler' />
  <requestHandler name='/update/extract' class='solr.extraction.ExtractingRequestHandler'>
    <lst name="defaults">
      <str name="uprefix">s_meta_</str>
      <str name="fmap.content">t_content</str>
      <str name="fmap.dcterms:created">d_created</str>
    </lst>
  </requestHandler>
  <requestHandler name='/admin/' class='org.apache.solr.handler.admin.AdminHandlers' />
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>

</config>

As you can see, I'm on a Windows 7 machine.


Solution

It's a file path error. Try collection1\conf\Lucene.pdf. You're on a Windows machine, so use \ as the path separator.

What the error is telling you is that it can't find the PDF file you specified, so the last part of the command, collection1/Lucene.pdf, is the likely culprit. You said you put the PDF files to be indexed in collection1\conf, so try this:

java -Dtype=application/pdf -Durl=http://localhost:8983/solr/collection1/update/extract -jar post.jar collection1\conf\Lucene.pdf

If it still doesn't work, try giving the full (absolute) path to the PDF you're trying to index.
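
To rule out a path problem before posting, you can check how the file argument resolves from the directory you run post.jar in. The following is only a minimal sketch: the class PdfPathCheck and its two helper methods are hypothetical names made up for illustration and are not part of Solr or post.jar. It assumes you run it from the same working directory as the post.jar command.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Solr: shows where a file argument
// actually points when it is given relative to a base directory.
class PdfPathCheck {

    // Resolve fileArg against baseDir. An absolute fileArg is returned
    // unchanged (apart from normalization), as Path.resolve specifies.
    static Path resolve(String baseDir, String fileArg) {
        return Paths.get(baseDir).resolve(fileArg).toAbsolutePath().normalize();
    }

    // True only if the path exists, is a regular file and can be read.
    static boolean isReadableFile(Path p) {
        return Files.isRegularFile(p) && Files.isReadable(p);
    }

    public static void main(String[] args) {
        // user.dir is the directory the JVM was started from.
        String baseDir = System.getProperty("user.dir");
        // Forward slashes are accepted by Java on Windows as well.
        String fileArg = args.length > 0 ? args[0] : "collection1/conf/Lucene.pdf";
        Path resolved = resolve(baseDir, fileArg);
        System.out.println(resolved + (isReadableFile(resolved) ? " : found" : " : NOT found"));
    }
}
```

Run from D:\download\Solr-Test\Solr-Indexing, passing the same file argument you give to post.jar. If it reports NOT found, the path in the post.jar command needs correcting.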

A word of advice: when learning Solr, don't begin by rearranging the default file structure.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow