Question

I am building a website that will host articles, policies, laws and similar text content. I am storing all the data (in some cases articles of more than 8,000 characters) in an MS SQL Server 2008 database. I have read some articles saying that text data should not be stored in databases. Where should it be stored instead? In .txt files or something similar? I also want to search through the data. If it is stored in the DB, I can use stored procedures and so on. If it is stored in documents, I would need a tool like Lucene. Am I right? Is my approach of using a DB wrong for this project? Please enlighten me.

Solution

You will be using a DB of some description for this project no matter how you look at it, whether it is:

1) An old-fashioned flat-file database (txt documents; not recommended for large-scale projects, IMHO)
2) A traditional database that stores the text itself
3) A database of documents

Whether to use a DB of text or a DB of documents depends on which skills and knowledge you have, or are likely to get access to (or assistance with). It sounds to me like you are more comfortable with a DB of text, and in my opinion there is nothing wrong with that. In the worst case, if there turns out to be a genuine need for documents rather than plain text storage in the long run, you should be able to generate the documents automatically from a text database. I suspect the reverse (converting a load of proprietary documents to text for storage and insertion) would be a lot trickier. Generating a plain text file from a text database is trivial, and most vendor document formats support importing plain text for subsequent formatting.
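As a rough illustration of how little work the text-to-file direction takes, here is a minimal Python sketch. It is not tied to SQL Server: it uses the standard library's `sqlite3` with an in-memory database as a stand-in, and the `articles` table, its `id`, `title` and `body` columns, and the `article_<id>.txt` file naming are all made up for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def export_articles(conn, out_dir):
    """Write each row of the (hypothetical) articles table to its own UTF-8 .txt file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    query = "SELECT id, title, body FROM articles ORDER BY id"
    for article_id, title, body in conn.execute(query):
        path = out_dir / f"article_{article_id}.txt"
        # Title on the first line, a blank line, then the body.
        path.write_text(f"{title}\n\n{body}\n", encoding="utf-8")
        written.append(path)
    return written


def demo_connection():
    """Build a throwaway in-memory database with two sample rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles (title, body) VALUES (?, ?)",
        [
            ("Privacy policy", "We store as little as possible."),
            ("Café licensing law", "Section 1 applies to all premises."),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in export_articles(demo_connection(), tmp):
            print(path.name)
```

Against a real server you would swap the connection for whatever driver you already use; the loop itself would stay much the same.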

For a large project like this you really need to spend some time considering what your documents are likely to be used for and by whom, and which storage method best matches that. If you are providing a database for people who use MS Word heavily and want to download your data, you probably need to consider a document DB. If you just want to provide the information itself (and web-based tools), think about how you want to manipulate your own data.

This is all opinion, obviously, but my last piece of advice is to make sure you use UTF-8 text from the outset if you go down the text route (bitter experience).
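To show the kind of trouble that advice is meant to avoid, here is a small Python sketch. The sample strings and the `roundtrip` helper are invented for the illustration; the point is only that text written in one encoding and read back in another comes out garbled, and that a legacy single-byte encoding cannot represent every character at all.

```python
def roundtrip(text, write_encoding, read_encoding):
    """Encode text as if saving it, then decode it as if a later reader loaded it."""
    return text.encode(write_encoding).decode(read_encoding)


def can_encode(text, encoding):
    """Return True if every character in text is representable in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Same encoding on both sides: the text survives unchanged.
    print(roundtrip("café", "utf-8", "utf-8"))
    # UTF-8 bytes misread as Latin-1: the accented letter turns into two characters.
    print(roundtrip("café", "utf-8", "latin-1"))
    # Latin-1 has no code for CJK characters, UTF-8 does.
    print(can_encode("法律", "latin-1"), can_encode("法律", "utf-8"))
```

Picking one encoding at the start and stating it explicitly everywhere text is read or written is what keeps the first case the only one you ever see.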

Licensed under: CC-BY-SA with attribution