Question

I have an ESB that processes thousands of transactions per second (around 5,000). It receives all types of requests in different formats (XML, JSON, CSV, and some with no format at all). As you can imagine, that is a lot of requests being processed.

The problem is that, due to requirements, I have to log every single one of these requests for auditing/issue resolution. The data has to be searchable by any part of the request that comes to the user's mind. The major problems are:

  • The data (XML) is heavy and causes insert locks on our RDBMS (SQL Server 2008).

  • Querying this large data (XML and other unstructured data) takes a lot of time, especially when it is not optimized. (Full-Text Search didn't solve my problem; it is still too slow.)

  • The data grows very fast (as expected; I am hoping there are databases that can compress stored data to conserve space). A few months of data eat up hundreds of gigabytes.

The question is: what database, or even design principle, can best solve my problems? NoSQL, RDBMS, something else? I want something that can log very fast and search very fast using any part of the stored data.

Solution

I would consider Elasticsearch: http://www.elasticsearch.org/

The benefits for your use case:

  1. It can scale very large. You just add nodes to the cluster as the data grows.
  2. It is based on Lucene, so you know it's a time-tested search engine.
  3. It is schemaless, so you don't have to do any ETL to store data. Just store it as is.
  4. It is well supported by a good community and is used by many enterprise companies (including Stack Overflow).
  5. It's free!
  6. It's easy to search against, and it provides lots of control over how to boost certain results, so you can tune it for your domain.
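To illustrate points 3 and 6, here is a minimal sketch of what storing a raw request and searching it full-text looks like against Elasticsearch's REST API. The index name (`esb-audit`), type name, and field names are assumptions for illustration; the functions just build the HTTP method, path, and JSON body that would be sent to the cluster:

```python
import json

# Hypothetical audit record: the raw request is stored as-is (schemaless),
# plus a small metadata field for filtering by format.
def make_index_request(raw_body, fmt):
    doc = {
        "format": fmt,        # "xml", "json", "csv", or "raw"
        "payload": raw_body,  # full request text, analyzed for full-text search
    }
    # Would be POSTed to the Elasticsearch node, e.g. http://es-host:9200
    return ("POST", "/esb-audit/request", json.dumps(doc))

# Full-text query matching any part of the stored payload.
def make_search_request(text):
    query = {"query": {"match": {"payload": text}}}
    return ("POST", "/esb-audit/request/_search", json.dumps(query))

method, path, body = make_search_request("ORDER-12345")
print(method, path)
```

Because the document is stored whole, a later search for any fragment of the original request (an order ID, a customer name, a CSV field) hits the analyzed `payload` field without any schema work up front.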

I would consider putting a queue in front of it in case you are trying to write faster than it can handle.
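A minimal sketch of that buffering idea, assuming an in-process queue and Elasticsearch's `_bulk` endpoint (which takes NDJSON: an action line followed by the document line, ending with a newline). In production the queue would be a durable broker rather than `queue.Queue`, and the HTTP POST (omitted here) would go to the cluster:

```python
import json
import queue

log_queue = queue.Queue()  # producers (ESB worker threads) enqueue raw request docs

def to_bulk_payload(docs, index="esb-audit"):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document line
    return "\n".join(lines) + "\n"  # bulk bodies must end with a newline

def drain(batch_size=500):
    """Consumer: pull up to batch_size queued docs and flush them in one bulk call."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(log_queue.get_nowait())
        except queue.Empty:
            break
    if not batch:
        return None
    payload = to_bulk_payload(batch)
    # POST payload to http://es-host:9200/_bulk here (omitted)
    return payload

log_queue.put({"payload": "<order id='1'/>", "format": "xml"})
payload = drain()
```

Batching writes this way decouples the ESB's 5,000 TPS ingest rate from the indexing rate: producers never block on Elasticsearch, and the consumer flushes in large bulk requests, which are far cheaper than one HTTP call per document.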

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow