Question

We're currently running ColdFusion 8, but planning to move to ColdFusion 10 soon. One of the biggest issues with this move is that one of our most important applications includes a full-text document search that is currently built on Verity collections. It lets users search the text content of hundreds of PDF documents.

I just created a new Solr collection in my development ColdFusion 9 instance and tried to populate it with our existing indexing logic, which runs daily and indexes PDF documents stored on the local server as F:\PDFS\[documentId].PDF:

<cfsetting requesttimeout="3600" />

<cfquery name="getDocs" datasource="myDB">
    SELECT DISTINCT
        itemNo,
        edition,
        description,
        status,
        'F:\PDFs\'
            CONCAT documentId
            CONCAT '.PDF'   AS  document_file
    FROM    SKU_ATTRIBUTES
</cfquery>

<cfindex
    query="getDocs"
    collection="mysolrcollection"
    action="refresh"
    type="file"
    key="document_file"
    title="description"
    custom1="itemNo"
    custom2="status"
    custom3="edition" />

It ran for about 10 minutes, and then bombed with the following exception:

Java_heap_space__javalangOutOfMemoryError_Java_heap_space___at_orgapacheluceneutilUnicodeUtilUTF16toUTF8UnicodeUtiljava236___at_orgapachelucenestoreIndexOutputwriteStringIndexOutputjava102___at_orgapacheluceneindexFieldsWriterwriteFieldFieldsWriterjava232___at_orgapacheluceneindexStoredFieldsWriterPerFieldprocessFieldsStoredFieldsWriterPerFieldjava56___at_orgapacheluceneindexDocFieldConsumersPerFieldprocessFieldsDocFieldConsumersPerFieldjava37___at_orgapacheluceneindexDocFieldProcessorPerThreadprocessDocumentDocFieldProcessorPerThreadjava234___at_orgapacheluceneindexDocumentsWriterupdateDocumentDocumentsWriterjava762___at_orgapacheluceneindexDocumentsWriterupdateDocumentDocumentsWriterjava745___at_orgapacheluceneindexIndexWriterupdateDocumentIndexWriterjava2215___at_orgapacheluceneindexIndexWriterupdateDocumentIndexWriterjava2187___at_orgapachesolrupdateDirectUpdateHandler2addDocDirectUpdateHandler2java238___at_orgapachesolrupdateprocessorRunUpdateProcessorprocessAddRunUpdateProcessorFactoryjava60___at_orgapachesolrhandlerXMLLoaderprocessUpdateXMLLoaderjava140___at_orgapachesolrhandlerXMLLoaderloadXMLLoaderjava69___at_orgapachesolrhandlerContentStreamHandlerBasehandleRequestBodyContentStreamHandlerBasejava54___at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava131___at_orgapachesolrcoreSolrCoreexecuteSolrCorejava1333___at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava303___at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava232___at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089___at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365___at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216___at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181___at_orgmortbayjettyhan request: http://localhost:8983/solr/mysolrcollection/update?commit=true&waitFlush=false&waitSearcher=false&wt=javabin 

When I look at the Solr collection in ColdFusion Administrator, it is far larger than the original Verity collection: the Verity collection was about 84-85 MB with 9000+ documents, while the Solr one is already 1.3 GB with only 847 documents.

This search functionality is critical to the application, and I fear if the migration to Solr does not work, we'll have to hold off on upgrading to CF10.


Solution

Make sure you have Cumulative Hot Fix 2 for ColdFusion 9.0.1 installed.

Cumulative Hot Fix 2 | ColdFusion 9.0.1

The hotfix includes some major bug fixes for Solr, especially for indexing PDF files. Alternatively, install ColdFusion 9.0.2, but note that it no longer supports Verity, so you won't be able to switch between Verity and Solr.

OTHER TIPS

This sounds like a one-time import process. Have you tried batching your results, maybe 500 docs per iteration? In my experience, ColdFusion doesn't do well when a page runs longer than about a minute.
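
A rough, untested sketch of what that batching could look like, reusing the getDocs query and the collection name from the question. The batch size of 500 and the variable names (batchSize, batch, cols, startRow, r) are just assumptions for illustration. Note that it uses action="update" per batch instead of action="refresh": refresh clears the collection before indexing, so calling it once per batch would throw away the earlier batches. A single purge up front gives the same "rebuild from scratch" effect.

```cfml
<cfsetting requesttimeout="3600" />

<!--- Assumed batch size; tune it to whatever the Solr heap can handle --->
<cfset batchSize = 500 />
<cfset totalRows = getDocs.recordCount />
<cfset cols = "itemNo,edition,description,status,document_file" />

<!--- Empty the collection once, since "update" does not clear it --->
<cfindex collection="mysolrcollection" action="purge" />

<cfloop from="1" to="#totalRows#" step="#batchSize#" index="startRow">
    <cfset endRow = min(startRow + batchSize - 1, totalRows) />

    <!--- Copy the current slice of getDocs into a small temporary query --->
    <cfset batch = queryNew(cols) />
    <cfloop from="#startRow#" to="#endRow#" index="r">
        <cfset queryAddRow(batch) />
        <cfloop list="#cols#" index="col">
            <cfset querySetCell(batch, col, getDocs[col][r]) />
        </cfloop>
    </cfloop>

    <!--- Index only this slice; same attributes as the original cfindex call --->
    <cfindex
        query="batch"
        collection="mysolrcollection"
        action="update"
        type="file"
        key="document_file"
        title="description"
        custom1="itemNo"
        custom2="status"
        custom3="edition" />
</cfloop>
```

This still runs everything in one request, so it only keeps each cfindex call small. If the long-running page itself is the problem, the same loop body could be driven one batch per request (for example from a scheduled task), but I haven't tried that against Solr.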

Licensed under: CC-BY-SA with attribution