Question

I understand that Splunk does not need much of the functionality that a MySQL database provides, and that a relational database might not be a good option for indexing and searching Big Data.

Does Splunk use Lucene as its search engine, or have they made their own on-disk data format?

I am sorry if there are any problems in the way I am asking the question. This is my first question on Stack Overflow.


Solution

Splunk uses its own search engine; it is not based on any third-party product.

Its search engine is file-based, with no database behind it. It does not store fields, only raw data. Fields are extracted at search time, which makes them very dynamic. It is also very fast at finding keywords in the data (a needle in a haystack).
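To illustrate what extracting fields at search time means, here is a minimal Python sketch. It is not Splunk's implementation: the sample events, the `key=value` extraction rule, and the names `extract_fields` and `search` are all assumptions made up for this example.

```python
import re

# Raw events are kept exactly as they arrived; no fields are stored.
# (Sample data invented for this sketch.)
RAW_EVENTS = [
    "2024-01-01T10:00:00 host=web1 status=200 path=/index.html",
    "2024-01-01T10:00:05 host=web2 status=500 path=/login",
    "2024-01-01T10:00:09 host=web1 status=404 path=/missing",
]

# Hypothetical extraction rule: every key=value pair becomes a field.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw):
    """Derive fields from one raw event, at search time only."""
    return dict(KV_PATTERN.findall(raw))


def search(keyword, events=RAW_EVENTS):
    """Return (raw, fields) for every event whose raw text contains keyword."""
    return [(raw, extract_fields(raw)) for raw in events if keyword in raw]


results = search("web1")
statuses = [fields["status"] for _, fields in results]
```

Because the fields are computed only when a search runs, changing the extraction rule changes the fields for all existing data without rewriting anything on disk.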

To be more detailed, Splunk stores data in the following way:

  1. Breaking the data into time-based events and attaching a timestamp to each raw event.
  2. Marking every word found in the events and its location across the index.
  3. Storing the events in a compressed format (tar.gz).

This makes it possible to:

  1. Search very quickly for keywords inside the events.
  2. Look at the original raw data.
  3. Create new fields on the raw data and use them with statistics commands.
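The storage ideas in this answer (timestamped events, a word-to-location index, compressed raw data) can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions, not Splunk's actual on-disk format: the class `ToyIndex`, its methods, and the sample events are hypothetical, and everything lives in memory.

```python
import gzip
import re
from collections import defaultdict


class ToyIndex:
    """Toy model: timestamped events, a keyword index, compressed raw data."""

    def __init__(self):
        self.events = []                  # (timestamp, gzip-compressed raw bytes)
        self.postings = defaultdict(set)  # word -> ids of events containing it

    def add(self, timestamp, raw):
        """Store one raw event and record where each of its words occurs."""
        event_id = len(self.events)
        self.events.append((timestamp, gzip.compress(raw.encode("utf-8"))))
        for word in re.findall(r"\w+", raw.lower()):
            self.postings[word].add(event_id)
        return event_id

    def search(self, keyword, earliest=None, latest=None):
        """Return raw events containing keyword, optionally within a time range."""
        hits = []
        for event_id in sorted(self.postings.get(keyword.lower(), ())):
            timestamp, blob = self.events[event_id]
            if earliest is not None and timestamp < earliest:
                continue
            if latest is not None and timestamp > latest:
                continue
            # Only matching events are decompressed back to raw text.
            hits.append(gzip.decompress(blob).decode("utf-8"))
        return hits


index = ToyIndex()
index.add(100, "ERROR disk full on web1")
index.add(160, "INFO backup finished on web2")
index.add(220, "ERROR timeout on web2")
errors = index.search("error")
recent_errors = index.search("error", earliest=200)
```

The keyword lookup touches only the events listed for that word, which is why a "needle in a haystack" search can stay fast even when most of the data is irrelevant.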

Sources:
http://www.splunk.com/web_assets/pdfs/secure/Splunk_for_BigData.pdf
http://docs.splunk.com/Documentation/Splunk/6.5.1/Indexer/Howindexingworks

I have 3+ years of experience as a Splunk architect.

Other tips

Splunk has a proprietary data format for its indexes. Lucene is not used, and Splunk has its own search language called SPL.

Licensed under: CC-BY-SA with attribution