Question

'I have to develop software for the business analysts of the “Future Stores” supermarket. The software performs association rule mining on the supermarket's sales transaction data and prepares a discounting policy by building combo offers. It uses the Apriori algorithm. The association rules will be displayed in a user-friendly manner so that a discounting policy can be generated from the positive association rules.'

Where can I get a supermarket dataset to test the Apriori algorithm that I have coded?


Solution

To get a market dataset, go to fimi.ua.ac.be/data/ and download the retail dataset.

It is an anonymized dataset of transactions from a Belgian store.

It is perfect for testing Apriori or other frequent itemset mining and association rule mining algorithms.
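As a rough sketch of how such a file could be fed to your own code: the snippet below assumes the download is a plain-text file with one transaction per line and item IDs separated by spaces, and that it is saved locally as retail.dat (the file name is an assumption). The helper names parse_transactions and item_supports are made up for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List


def parse_transactions(lines: Iterable[str]) -> List[FrozenSet[int]]:
    """Turn one-transaction-per-line text into a list of item-ID sets."""
    transactions = []
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            transactions.append(frozenset(int(tok) for tok in tokens))
    return transactions


def item_supports(transactions: List[FrozenSet[int]]) -> Dict[int, float]:
    """Relative support of every single item (fraction of transactions containing it)."""
    counts = Counter(item for t in transactions for item in t)
    n = len(transactions)
    return {item: count / n for item, count in counts.items()}


# Tiny inline sample in the assumed layout, so the sketch runs without the download.
sample = ["1 2 3", "2 3", "1 3 4", ""]
transactions = parse_transactions(sample)
supports = item_supports(transactions)

# With the real file (assumed name), the call would look like:
# with open("retail.dat") as f:
#     transactions = parse_transactions(f)
```

Checking the single-item supports first is a cheap sanity test that the file was parsed as expected before running the full algorithm on it.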

OTHER TIPS

Instead of looking for a real-world dataset, you should design a small, specific dataset for each unit test. The dataset should provide the minimal necessary precondition to verify a single feature of the system. This will make it easier to detect bugs, maintain tests over time, and demonstrate the capabilities and usage patterns of the system to other developers.

An example from a different domain would be tests for a User Subsystem that creates and validates logins to a website.

  • addsNewUser - empty dataset
  • throwsExceptionForDuplicateUsername - single-user dataset
  • correctPasswordPasses - same dataset
  • throwsExceptionForIncorrectUsername - same dataset
  • throwsExceptionForIncorrectPassword - same dataset
  • throwsExceptionWhenNewUsernameExists - two-user dataset
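Carried back to the Apriori case, the same idea might look like the sketch below: a four-transaction dataset small enough to check by hand, plus a brute-force reference that enumerates every itemset. The item names, the 0.5 threshold, and the function names are all invented for illustration; the brute-force routine is deliberately not Apriori, it only produces the answer your implementation should match.

```python
from itertools import combinations

# Hand-made dataset: small enough that every support can be verified on paper.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def brute_force_frequent(transactions, min_support):
    """Reference answer: test every possible itemset (only viable for tiny data)."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[frozenset(candidate)] = s
    return frequent


expected = brute_force_frequent(TRANSACTIONS, min_support=0.5)

# Confidence of the rule bread -> milk: support(bread, milk) / support(bread)
confidence_bread_milk = (
    support({"bread", "milk"}, TRANSACTIONS) / support({"bread"}, TRANSACTIONS)
)

# A unit test for your own code (hypothetical function name) could then be:
# assert set(my_apriori(TRANSACTIONS, 0.5)) == set(expected)
```

Here five itemsets reach the threshold: the three single items, {bread, milk}, and {bread, butter}. A separate small dataset per feature (for example, one where no pair is frequent) keeps each test focused on a single behaviour.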

Update: If you need a very large dataset for integration or performance testing, you will probably have to write a program that generates a random collection of purchases. I doubt any existing supermarkets are willing (or able) to part with their real datasets.
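A generator along those lines could be as simple as the following sketch. Everything here is an assumption chosen for illustration: the function names, the parameters, and the 1/(i+1) weighting that makes low-numbered items more popular. Randomly drawn baskets carry no real buying patterns, so this kind of data is suited to measuring speed and memory use, not to judging the quality of the rules.

```python
import random


def generate_transactions(n_transactions, n_items, max_basket_size, seed=0):
    """Generate random baskets of integer item IDs, reproducible via the seed."""
    rng = random.Random(seed)
    items = list(range(n_items))
    # Skewed popularity: item 0 is the most likely, item n_items - 1 the least.
    weights = [1.0 / (i + 1) for i in items]
    transactions = []
    for _ in range(n_transactions):
        size = rng.randint(1, max_basket_size)
        # choices() samples with replacement, so the set may hold fewer than `size` items.
        basket = set(rng.choices(items, weights=weights, k=size))
        transactions.append(sorted(basket))
    return transactions


def to_lines(transactions):
    """Serialize as space-separated item IDs, one transaction per line."""
    return [" ".join(map(str, t)) for t in transactions]


data = generate_transactions(n_transactions=1000, n_items=50, max_basket_size=8, seed=42)
lines = to_lines(data)
```

Fixing the seed means a failing performance or integration test can be rerun against exactly the same data.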

That being said, while working as a contractor for a health insurance provider many years ago (pre-HIPAA) I was given a sample dataset to work with. It contained real patient information including SSNs and confidential medical history. :(

Licensed under: CC-BY-SA with attribution