How can I copy hadoop data to SOLR

Question 1

If you have a standalone Solr instance, then you could face some scaling issues, depending on the volume of data.

I am assuming high volume given you are using Hadoop/HDFS. In which case, you might need to look at SolrCloud.

As for reading from hdfs, here is a tutorial from LucidImagination, that addresses this issue, and recommends the use of Behemoth

You might also want to look at Katta project, that claims to integrate with hadoop and provide near real-time read access of large datasets . The architecture is illustrated here

EDIT 1

Solr has an open ticket for this. Support for HDFS is scheduled for Solr 4.9. You can apply the patch if you feel like it.

Question 2

You cannot just copy custom data to Solr, you need to index* it. You data may have any type and format (free text, XML, JSON or even binary data). To use it with Solr, you need to create documents (flat maps with key/value pairs as fields) and add them to Solr. Take a look at this simple curl-based example.

Note, that reading data from HDFS is a different question. For Solr, it doesn't matter where you are reading data from as long as you provide it with documents.

Storing index on local disk or in HDFS is also a different question. If you expect your index to be really large, you can configure Solr to use HDFS. Otherwise you can use default properties and use local disk.

* - "Indexing" is a common term for adding documents to Solr, but in fact adding documents to Solr internal storage and indexing (making fields searchable) are 2 distinct things and can be configured separately.