Question

Suppose I am developing an application to keep track of the daily sales of a small retail shop with only three employees.

Suppose the owner agrees to buy only one PC and refuses to purchase a DBMS license such as MS SQL Server or Oracle. As a result, in order to have a no-frills system, I have decided to develop an MS Access file-based desktop application.

On average, the retail store handles 5 customers/hour, and each customer purchases 15 items. The store is kept open 15 hours/day. So, on average, roughly 1,100 records per day (≈7,900 per week) will be entered in the DailySales table. There will be no multi-user access: one user at a time, with a change of user every 5 hours or so.

Suppose, I design the database as follows:

 1. UnitType {ID, Type, Description} 
 2. Product {ID, Name, Description, UnitTypeID}
 3. ProductPriceHistory {ID, DateTime, ProductID, PricePerUnit}
 4. DailySales {ID, DateTime, ProductID, Qty}

But I suspect this design is sure to end up with one table holding a huge number of rows. For instance, after only one week, DailySales table will become so huge that the database will be cumbersome to manage. If I can anticipate correctly, this design is destined to fail.

How can I address this issue and think of a better design?

Solution

You could use SQLite. It can store a lot of rows and works on many operating systems (Windows, Linux, Android, macOS).

You could consider installing some Linux system on that single PC and developing your system on it (perhaps as a web application using some database), with a free software RDBMS like PostgreSQL or MariaDB (or MySQL, which is very close to MariaDB). They are capable of dealing with a lot of rows (see this for PostgreSQL, and this and other things for MySQL). In practice, the limits are set by the hardware capabilities.
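As a rough illustration, the four tables from the question could be created in SQLite as sketched below. The use of Python's standard `sqlite3` module, the column types, the ISO 8601 text timestamps, and the helper name `open_database` are all assumptions made for this sketch; the question only gives the table and column names.

```python
import sqlite3

# Column names come from the question; the types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS UnitType (
    ID          INTEGER PRIMARY KEY,
    Type        TEXT NOT NULL,
    Description TEXT
);
CREATE TABLE IF NOT EXISTS Product (
    ID          INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL,
    Description TEXT,
    UnitTypeID  INTEGER NOT NULL REFERENCES UnitType(ID)
);
CREATE TABLE IF NOT EXISTS ProductPriceHistory (
    ID           INTEGER PRIMARY KEY,
    DateTime     TEXT NOT NULL,   -- ISO 8601 text (assumed representation)
    ProductID    INTEGER NOT NULL REFERENCES Product(ID),
    PricePerUnit REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS DailySales (
    ID        INTEGER PRIMARY KEY,
    DateTime  TEXT NOT NULL,
    ProductID INTEGER NOT NULL REFERENCES Product(ID),
    Qty       REAL NOT NULL
);
"""

def open_database(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn

# Usage: an in-memory database with one unit type, one product, one sale.
conn = open_database()
conn.execute("INSERT INTO UnitType (Type, Description) VALUES ('kg', 'kilogram')")
conn.execute("INSERT INTO Product (Name, UnitTypeID) VALUES ('Rice', 1)")
conn.execute(
    "INSERT INTO DailySales (DateTime, ProductID, Qty) VALUES (?, ?, ?)",
    ("2024-01-01T09:00:00", 1, 2.5),
)
conn.commit()
```

Passing a file name instead of `":memory:"` would keep the whole database in a single file on the shop's PC.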

I have decided to develop an MS Access file-based desktop application.

That might not have been the best decision. You should consider free software alternatives (like those mentioned above), and you might think of some web application (usable from several browsers, perhaps on cheap tablets).

Notice that databases with hundreds of millions of rows are routinely deployed on Linux systems running PostgreSQL or MariaDB (or MySQL, a near equivalent).

Whatever technical solution you choose, don't forget to back up the data regularly, to define a backup procedure, and to check once in a while that you are able to restore from the backups.
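A backup step with a basic restore check could look like the sketch below. It assumes the data lives in an SQLite file and uses Python's standard `sqlite3` module; the function name `backup_database` and the file paths are hypothetical. Other database systems have their own backup tools.

```python
import sqlite3

def backup_database(source_path, backup_path):
    """Copy a live SQLite database to backup_path and verify the copy.

    Returns True when the copy passes SQLite's integrity check.
    """
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)  # online backup API (Python 3.7+)
        result = target.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        target.close()
        source.close()
    return result == "ok"
```

Opening the backup file and running a few queries against it from time to time is what actually confirms that a restore would work.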

Most of the cost is probably related to your development time and efforts and to your skills. That is probably more costly than the hardware or any software license you'll need.

If I can anticipate correctly, this design is destined to fail.

This is false if you use a real RDBMS on a Linux system (those freely available on most Linux distros), or if you use SQLite. Your design is valid (and you could use free software for it; all the products mentioned here are free software). Your choice of database and of operating system is questionable. BTW, developing your own POS software from scratch might be more expensive than using existing solutions (and you might even find, adapt and improve some free software ones).

For instance, after only one week, DailySales table will become so huge that the database

10,000 more rows each day is tiny. Most RDBMSs (and SQLite) can handle that. In 3 years, that means about 10 million rows, which is not a big deal. Of course you need to size the disk correctly (but even assuming 4 KB of disk space per row, 40 GB is not much; in your case each row probably consumes only a few dozen bytes). Your database is small or tiny by today's standards. Don't worry about the number of rows (but do define the relevant database indexes correctly; they depend on the queries you'll make). Most databases can very easily handle many tens of millions of rows (if your database schema is good enough); this is not an issue today. So you don't have a "huge number of rows" but a rather small one.

If (for a reason you did not explain) you need to develop desktop software (not something running in a browser), you could develop a desktop application with a GUI on Linux (e.g. using Qt) on top of some RDBMS. However, a web application could be used from several cheap tablets. And you can find HTTP server libraries (e.g. Wt or libonion, for C or C++ on Linux) to develop it (see also this).
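The back-of-the-envelope estimate above can be reproduced with a few lines of Python. The function name `estimate` is hypothetical, and the second call uses the question's own figures (5 customers/hour, 15 items each, 15 hours/day) with an assumed 40 bytes per row.

```python
def estimate(rows_per_day, years, bytes_per_row):
    """Return (row count, raw data size in bytes) after the given number of years."""
    rows = rows_per_day * 365 * years
    return rows, rows * bytes_per_row

# The generous figures used in this answer: 10,000 rows/day at 4 KB each.
rows, size = estimate(10_000, 3, 4_000)
print(f"{rows:,} rows, about {size / 1e9:.1f} GB")

# The question's own figures: 5 * 15 * 15 = 1,125 rows/day, assumed 40 bytes each.
rows_q, size_q = estimate(5 * 15 * 15, 3, 40)
print(f"{rows_q:,} rows, about {size_q / 1e6:.1f} MB")
```

Even the pessimistic case stays within what an ordinary disk holds, and the realistic case is a few tens of megabytes.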

OTHER TIPS

after only one week, DailySales table will become so huge that the database will be cumbersome to manage

I question that. I have been using MS Access regularly for more than 20 years, and have used it successfully for databases with 500,000 to 1 million records, as long as there were only a few concurrent users. You need, however, to take care of the same things as with bigger systems:

  • 100% physical separation between the database backend and frontend (which is easily missed in case you are developing both backend and frontend with MS Access)

  • proper normalization and indexing

  • proper upgrade strategy for new versions of the application & the DB schema

  • implementing a backup & recovery strategy, and maybe a database compact & repair strategy

  • long term archival strategy

Given the expected number of records per week, I guess you should aim for a strategy where you archive older sales records from time to time when they are older than (approx.) six months.

Depending on the requirements of your client and the use cases involving older sales, this could mean

  • to archive the whole database file and then delete all older sales from the current "live database", and/or

  • to aggregate older sales in a space-efficient way, so one can provide those data online for a much longer period than just 6 months. Maybe the "total sales per day" is sufficient after 6 months?

  • to let the users switch between older database files for accessing older sales (probably in a "read-only" fashion), and newer database files for newer sales.

If a six-month interval is too short for your customer (12 months is often required for tax reasons), you may pick a different serverless DB system which supports more records and files larger than 2 GB (like SQLite or SQL Server Express LocalDB). Or you just try MS Access and see how far it gets you; I would not be astonished if a full year of data can be handled properly in your case. Just make sure you implement an archive strategy which can be applied when it becomes necessary.
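The "aggregate older sales" option could be sketched as follows. This is only an illustration in Python with SQLite rather than MS Access; the summary table `DailySalesSummary`, its columns, and the function name `archive_old_sales` are hypothetical, and timestamps are assumed to be ISO 8601 text. The cutoff should fall on a day boundary so that no day is split between two runs.

```python
import sqlite3

def archive_old_sales(conn, cutoff):
    """Roll DailySales rows older than cutoff into per-day totals, then delete them.

    Returns the number of detail rows removed.
    """
    with conn:  # one transaction: both steps are committed together or not at all
        conn.execute("""
            CREATE TABLE IF NOT EXISTS DailySalesSummary (
                SaleDate  TEXT NOT NULL,
                ProductID INTEGER NOT NULL,
                TotalQty  REAL NOT NULL,
                PRIMARY KEY (SaleDate, ProductID)
            )""")
        conn.execute("""
            INSERT INTO DailySalesSummary (SaleDate, ProductID, TotalQty)
            SELECT date(DateTime), ProductID, SUM(Qty)
            FROM DailySales
            WHERE DateTime < ?
            GROUP BY date(DateTime), ProductID
        """, (cutoff,))
        cur = conn.execute("DELETE FROM DailySales WHERE DateTime < ?", (cutoff,))
    return cur.rowcount
```

Taking a full backup of the database file before running such a step keeps the detailed rows available if they are ever needed again.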

The limits

The volumes should not be a problem for a modern database.

MS Access limits a database file (and therefore any single table) to approximately 2 GB. Looking at the type sizes, it appears that one record of DailySales is currently around 24 bytes. Let's round it up to 40. This means that MS Access would still be able to store 50 million records, which means 64 years of sales data if your shop makes on average 15,000 lines per week.

The type of the ID field could look like a more concrete constraint, but it is not a practical one here. If you go for auto-numbering, the ID is a 4-byte signed integer (Long Integer), which allows about 2 billion values, far more than the 50 million records that fit in the file. So there is no need for a workaround such as a composed primary key made of the business year and an autonumber that is reset every year.
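The arithmetic behind that figure, as a small Python sketch. It uses the numbers given in this answer (2 GB, 40 bytes per record, 15,000 lines per week) and ignores index and file overhead, so the real capacity would be somewhat lower.

```python
MAX_BYTES = 2 * 10**9        # approximate MS Access size limit
BYTES_PER_RECORD = 40        # rounded-up record size from the text
RECORDS_PER_WEEK = 15_000    # weekly volume assumed in the text

max_records = MAX_BYTES // BYTES_PER_RECORD
years = max_records / RECORDS_PER_WEEK / 52

print(f"{max_records:,} records, about {years:.0f} years of sales")
```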

The performance

You may be more worried about performance. What is important there is to index the tables at least on their ID fields (to speed up the joins). Also index the date in DailySales (to speed up sorting and filtering by date).

Just for illustration, indexing allows the database to find any record in 10 years of sales data in less than 15 reads, instead of going through 7 million records.

The biggest impact on performance with MS Access is multi-user access, since every PC runs its own MS Access engine that has to access the file on its own, whereas with a DBMS you have a dedicated server process. However, in your use case, you only have one PC, so this should not be your main concern.
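To see whether a query really uses an index, most engines can show their query plan. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in for MS Access; the index name `idx_dailysales_datetime` and the helper `query_plan` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DailySales ("
    "ID INTEGER PRIMARY KEY, DateTime TEXT, ProductID INTEGER, Qty REAL)"
)
# Index on the sale date, as suggested in the text above.
conn.execute("CREATE INDEX idx_dailysales_datetime ON DailySales (DateTime)")

def query_plan(conn, sql, params=()):
    """Return the human-readable steps of SQLite's plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

plan = query_plan(
    conn,
    "SELECT * FROM DailySales WHERE DateTime >= ? AND DateTime < ?",
    ("2024-01-01", "2024-01-02"),
)
print(plan)
```

A plan step that mentions the index name means the engine searches the index instead of scanning the whole table.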

The design

Without knowing the objectives, it's difficult to judge the design. But from what I can see:

  • Quantitative sales statistics on products will be easy, assuming that the unit type of a product never changes.
  • Sales figures will be difficult to compute because there's no easy join between ProductPriceHistory, where the price is stored, and DailySales, which holds the quantities to multiply by the unit price. You'd better store a ProductPriceHistoryID in DailySales.
  • I'd even suggest storing the price used in DailySales, because this would allow registering ad-hoc rebates, in case of customer bargaining or small issues with a specific product box.
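To illustrate why the join is awkward with the original schema, here is a sketch of the query needed to compute revenue: for each sale it has to look up the latest price that was in effect at the time of the sale. The example uses SQLite through Python as an assumption, with made-up sample rows and ISO 8601 text timestamps.

```python
import sqlite3

# For each sale, pick the most recent price recorded at or before the sale time.
REVENUE_SQL = """
SELECT SUM(s.Qty * (
    SELECT p.PricePerUnit
    FROM ProductPriceHistory AS p
    WHERE p.ProductID = s.ProductID AND p.DateTime <= s.DateTime
    ORDER BY p.DateTime DESC
    LIMIT 1
))
FROM DailySales AS s
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductPriceHistory (
    ID INTEGER PRIMARY KEY, DateTime TEXT, ProductID INTEGER, PricePerUnit REAL);
CREATE TABLE DailySales (
    ID INTEGER PRIMARY KEY, DateTime TEXT, ProductID INTEGER, Qty REAL);
-- Sample data (made up): the price changes from 2.0 to 3.0 on 1 February.
INSERT INTO ProductPriceHistory (DateTime, ProductID, PricePerUnit) VALUES
    ('2024-01-01T00:00:00', 1, 2.0),
    ('2024-02-01T00:00:00', 1, 3.0);
INSERT INTO DailySales (DateTime, ProductID, Qty) VALUES
    ('2024-01-15T10:00:00', 1, 10),
    ('2024-02-15T10:00:00', 1, 10);
""")

revenue = conn.execute(REVENUE_SQL).fetchone()[0]
print(revenue)  # 10 * 2.0 + 10 * 3.0
```

With the price (or a ProductPriceHistoryID) stored on each DailySales row, the same figure becomes a plain sum over one table or a simple key join.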

Conclusion

If, despite your arguments, the owner doesn't want to invest in a DBMS, you can certainly start small with MS Access. If, after the first years, performance decreases significantly despite indexes and other optimizations, you could then switch to a more robust system.

If you want to support ad-hoc queries, just use SQLite, SQL Server Express or another 'local' DB engine.

If you do not expect to have to run ad-hoc queries, but only to support a fixed set of reports, you can look into the approach of 'materialized views'. This is an approach where you look at each view as an observable over underlying observables. The stream of data entry events is your root observable, which is pushed down a processing pipeline that fans out into each materialized view. In other words, how you store those views and retrieve them is then a much more trivial problem.
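A minimal in-memory sketch of that idea in Python: every sale event is published once and each subscribed view updates its own precomputed result. The class names `SalesStream` and `DailyTotalsView` and the event fields are hypothetical; a real system would also persist the views.

```python
from collections import defaultdict

class SalesStream:
    """Root observable: pushes every sale event to the subscribed views."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        for callback in self.subscribers:
            callback(event)

class DailyTotalsView:
    """Materialized view: total quantity per (day, product), updated per event."""
    def __init__(self):
        self.totals = defaultdict(float)

    def on_sale(self, event):
        day = event["datetime"][:10]  # 'YYYY-MM-DD' prefix of an ISO 8601 timestamp
        self.totals[(day, event["product_id"])] += event["qty"]

# Usage: the report reads the precomputed totals instead of scanning all sales.
stream = SalesStream()
view = DailyTotalsView()
stream.subscribe(view.on_sale)
stream.publish({"datetime": "2024-01-01T09:00:00", "product_id": 1, "qty": 2.0})
stream.publish({"datetime": "2024-01-01T15:30:00", "product_id": 1, "qty": 3.0})
print(dict(view.totals))
```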

Licensed under: CC-BY-SA with attribution