Question

For faster reporting and performance analysis, we want to insert our web server logs into SQL Server. This will allow us to see traffic patterns, issues, and slowdowns in near real time.

We have a daemon that listens for request/response events from our load balancer and bulk inserts into the database.

However, we get around 1 GB of logs per day, and we only need to keep about a week of it (at least in this raw form).

What is the best way to store this data and the best way to delete old entries?

We've talked about storing each day's data in its own table, e.g. Log_2011_04_07 would have all the entries for that day, and then dropping the oldest table. A view could be created to span all the day tables for easy querying. Is this feasible?
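For concreteness, roughly what we have in mind (column names are just placeholders):

```sql
-- A hypothetical sketch of the per-day-table idea; column names are placeholders.
CREATE TABLE Log_2011_04_07 (
    RequestTime DATETIME2      NOT NULL,
    Url         NVARCHAR(2000) NOT NULL,
    StatusCode  INT            NOT NULL,
    DurationMs  INT            NOT NULL
);
GO
-- The view would have to be recreated each day as tables come and go.
CREATE VIEW LogLastWeek AS
    SELECT * FROM Log_2011_04_07
    UNION ALL
    SELECT * FROM Log_2011_04_06;
    -- ... one UNION ALL branch per retained day
GO
```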


Solution

You should look into partitioning.

http://technet.microsoft.com/en-us/library/dd578580%28SQL.100%29.aspx

The cool thing about partitioning is that you have just one table name (as opposed to the multiple table approach) so your insert statements remain static. It works with every application - it's completely transparent to queries. You don't have to worry about what happens if you end up with different indexes or statistics on each of the tables, either.

You create a partition function that decides how to break the table up into multiple partitions behind the scenes. The function can only take one input parameter/field, and in your case, it would be a date field. The function can break the table up by day, week, month, or year - in your case, you'd want one partition per day, i.e. a 24-hour period.
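A minimal sketch of what that setup might look like (all object names here - pf_LogDate, ps_LogDate, WebLog - are invented for illustration):

```sql
-- One boundary per retained day; RANGE RIGHT puts each date in its own partition.
CREATE PARTITION FUNCTION pf_LogDate (DATE)
AS RANGE RIGHT FOR VALUES
    ('2011-04-01', '2011-04-02', '2011-04-03',
     '2011-04-04', '2011-04-05', '2011-04-06', '2011-04-07');
GO
-- Map all partitions to one filegroup for simplicity.
CREATE PARTITION SCHEME ps_LogDate
AS PARTITION pf_LogDate ALL TO ([PRIMARY]);
GO
-- The log table is created on the scheme; inserts just target WebLog.
CREATE TABLE WebLog (
    LogDate    DATE           NOT NULL,
    Url        NVARCHAR(2000) NOT NULL,
    StatusCode INT            NOT NULL
) ON ps_LogDate (LogDate);
GO
```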

Then build a SQL Server Agent job that uses T-SQL to switch out the oldest partition every day. The delete becomes a metadata operation, and it's blazing fast: switch the partition out to a staging table, then drop the old data.
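A hedged sketch of what the nightly job's T-SQL might do, continuing with the invented names above:

```sql
-- Staging table must match the source table's structure and filegroup.
CREATE TABLE WebLog_Staging (
    LogDate    DATE           NOT NULL,
    Url        NVARCHAR(2000) NOT NULL,
    StatusCode INT            NOT NULL
) ON [PRIMARY];
GO
-- Move the oldest day's rows out of the main table: metadata only, near-instant.
-- A real job would compute the partition number, e.g. via $PARTITION.pf_LogDate().
ALTER TABLE WebLog SWITCH PARTITION 2 TO WebLog_Staging;
-- Throw the old rows away, retire the now-empty boundary,
-- and add a boundary for the next day.
TRUNCATE TABLE WebLog_Staging;
ALTER PARTITION FUNCTION pf_LogDate() MERGE RANGE ('2011-04-01');
ALTER PARTITION SCHEME ps_LogDate NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_LogDate() SPLIT RANGE ('2011-04-08');
```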

Other tips

We developed a web-statistics logging product six years ago that lets us track every click of a user's visit.

What we did was record every visit, as you describe, and then have a scheduled daemon parse the logs and normalize the data for later lookup. As soon as a record was parsed, it was removed to keep the data volume low.

For the next version of the product, we will distribute the bulk collectors separately on the websites and then use the daemon to collect the data and clean up afterwards by issuing commands to the bulk service.

This way we can handle scheduled maintenance without losing data.

Regarding the cleanup issue on the central server, our current plan is to add timestamps so we can archive data after, e.g., three months.

We think of this just like mip-map textures in 3D games/rendering: the closer you get, the more detailed the data; the further away, the more grouped and less detailed it is.

So on a day-to-day basis we can observe visitor patterns, but after three months that data isn't really relevant anymore, and we compress it into less detail.
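As a rough illustration of that roll-up (the PageHits and PageHitsDaily tables are invented for the example):

```sql
-- Hypothetical roll-up: collapse per-click rows older than three months
-- into per-URL daily counts, then delete the detailed rows.
DECLARE @cutoff DATETIME2 = DATEADD(MONTH, -3, SYSDATETIME());
BEGIN TRANSACTION;
INSERT INTO PageHitsDaily (HitDate, Url, HitCount)
SELECT CAST(HitTime AS DATE), Url, COUNT(*)
FROM PageHits
WHERE HitTime < @cutoff
GROUP BY CAST(HitTime AS DATE), Url;

DELETE FROM PageHits
WHERE HitTime < @cutoff;
COMMIT;
```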

We haven't decided whether we will break the database into chunks to keep each detail level in its own database. But we just might, as there are some naming issues if we store different levels in the same database.

Hope you can use this for something. I can't provide you with example code, as it's part of our company's product.

Create another table, Daily_tables, with two columns: Table_name and Date_table_created. In the code that creates a new daily table (and loads the web logs into it), add another step that populates Daily_tables with the name of the table created and a timestamp (the current date and time). Then create a SQL Agent job that runs a T-SQL script every week. The script should drop every table listed in Daily_tables (by Table_name) whose Date_table_created timestamp is older than 7 days, as in the sketch below.
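That weekly cleanup script could look something like this (a sketch, assuming the two-column Daily_tables layout described above):

```sql
-- Dynamic SQL is needed because DROP TABLE cannot take a variable name.
DECLARE @name SYSNAME, @sql NVARCHAR(300);
DECLARE old_tables CURSOR FOR
    SELECT Table_name FROM Daily_tables
    WHERE Date_table_created < DATEADD(DAY, -7, GETDATE());
OPEN old_tables;
FETCH NEXT FROM old_tables INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Drop the expired daily table and remove its bookkeeping row.
    SET @sql = N'DROP TABLE ' + QUOTENAME(@name) + N';';
    EXEC sp_executesql @sql;
    DELETE FROM Daily_tables WHERE Table_name = @name;
    FETCH NEXT FROM old_tables INTO @name;
END;
CLOSE old_tables;
DEALLOCATE old_tables;
```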

Hope this is what you were looking for :)

Licensed under: CC-BY-SA with attribution