Question

According to the book I am reading, a statistical database is a database that permits queries that derive aggregated information but not queries that derive individual information. Is it currently possible to build a statistical database? If so, how? Can it be created using SQL?

Solution

What is a statistical database?

At a high level, a statistical database (SDB) is a database that stores only statistical data. A census database is an example. Access control for a pure SDB is typically straightforward: certain users are authorized to access the entire database.

It's about the data, not the engine

Keep in mind that this is about the data, not about which relational database management system you choose (MySQL, SQL Server, Oracle, etc.). Any database can be a statistical database if what you store in it is statistical data, such as census information.

Typical structures for statistical databases

That being said, most statistical databases are Online Analytical Processing (OLAP) systems, which are characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations (i.e., they derive aggregated information rather than individual information).

[Figure: OLAP vs OLTP]

Can statistical databases be created using SQL?

Yep. Any database engine can be used to create a statistical database, provided two things hold: the data it serves is statistical in nature (counts, averages, and so on), and the users accessing it query that aggregate data rather than the individual records the statistics are ultimately based on.

Since most popular database engines use some form of SQL to get data into and out of the underlying system, the answer to the question is "yes".
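As a minimal sketch of the idea (not a definitive implementation), one common approach is to keep the individual records in a base table and expose only an aggregate view to statistical users. The table `patient_visits`, the view `visit_stats`, and the sample rows below are all made up for illustration. SQLite is used only so the example runs anywhere; it has no user permissions, so in an engine such as MySQL, SQL Server, or Oracle you would additionally grant users access to the view and not to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table of individual records (not meant to be exposed to users).
    CREATE TABLE patient_visits (
        patient_name TEXT,
        department   TEXT,
        cost         REAL
    );
    INSERT INTO patient_visits VALUES
        ('Alice', 'Cardiology', 100.0),
        ('Bob',   'Cardiology', 300.0),
        ('Carol', 'Oncology',   200.0),
        ('Dave',  'Oncology',   400.0);

    -- Aggregate-only view: this is what statistical users would query.
    CREATE VIEW visit_stats AS
    SELECT department,
           COUNT(*)  AS visit_count,
           AVG(cost) AS avg_cost
    FROM patient_visits
    GROUP BY department;
""")

# Query the view: only counts and averages come back, no patient names.
stats = conn.execute(
    "SELECT department, visit_count, avg_cost FROM visit_stats ORDER BY department"
).fetchall()
for row in stats:
    print(row)
```

The view returns one row per department with a count and an average, and no column identifies an individual patient.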

Why is individual information restricted?

This depends greatly on the business or organization using the statistical database. In general, the individual records that the statistics are based on are restricted for privacy reasons. Consider, for example, medical statistics derived from the medical records of a local hospital.

Security is therefore a big concern for statistical databases: users must be prevented from using aggregated statistical information to unearth individual information.
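One simple safeguard sometimes used for this is a minimum query-set size: refuse to answer an aggregate query when too few records match it. The sketch below is only an illustration under assumed names (`patient_visits`, `safe_average`, and the threshold `MIN_GROUP_SIZE` are all hypothetical), and a check like this does not by itself stop every inference, since a user may still combine several permitted queries.

```python
import sqlite3

MIN_GROUP_SIZE = 3  # hypothetical threshold; a real policy would choose its own value


def safe_average(conn, department):
    """Return the average cost for a department, or None if too few records match."""
    count, avg = conn.execute(
        "SELECT COUNT(*), AVG(cost) FROM patient_visits WHERE department = ?",
        (department,),
    ).fetchone()
    if count < MIN_GROUP_SIZE:
        return None  # group too small: answering could expose an individual
    return avg


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_visits (patient_name TEXT, department TEXT, cost REAL)"
)
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO patient_visits VALUES (?, ?, ?)",
    [
        ("Alice", "Cardiology", 100.0),
        ("Bob", "Cardiology", 300.0),
        ("Erin", "Cardiology", 200.0),
        ("Carol", "Oncology", 250.0),
    ],
)

cardiology_avg = safe_average(conn, "Cardiology")  # three records match: answered
oncology_avg = safe_average(conn, "Oncology")      # one record matches: refused
print(cardiology_avg, oncology_avg)
```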

Examples of how individual records can be compromised with statistics

A statistical user of an underlying database of individual records is restricted to obtaining only aggregate, or statistical, data from the database and is prohibited from accessing individual records. The inference problem in this context is that a user may infer confidential information about individual entities represented in the SDB. Such an inference is called a compromise. The compromise is positive if the user deduces the value of an attribute associated with an individual entity, and negative if the user deduces that a particular value of an attribute is not associated with an individual entity. For example, the statistic sum(EE · Female, GP) = 2.5 compromises the database if the user knows that Baker is the only female EE student: the sum then covers a single record, so it reveals Baker's GP directly.
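The Baker example can be reproduced with a small sketch. The `students` table, its column names, and every row other than Baker's GP of 2.5 are assumptions made up for illustration. The query itself is a perfectly ordinary aggregate, which is the point: it never selects an individual row, yet it still leaks one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, sex TEXT, major TEXT, gp REAL)")
# Hypothetical data: Baker is the only female EE student, as in the example above.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [
        ("Baker", "Female", "EE", 2.5),
        ("Adams", "Male", "EE", 3.0),
        ("Clark", "Male", "EE", 3.5),
        ("Dodd", "Female", "CS", 3.8),
    ],
)

# An aggregate-only query, i.e. sum(EE · Female, GP) expressed in SQL.
total_gp, matching = conn.execute(
    "SELECT SUM(gp), COUNT(*) FROM students WHERE major = 'EE' AND sex = 'Female'"
).fetchone()

# A user who already knows that Baker is the only female EE student
# can read the sum as Baker's own GP: a positive compromise.
print("sum of GP:", total_gp, "records matched:", matching)
```

Because only one record matches, the "aggregate" is really an individual value, which is why controls on the number of matching records are often considered for statistical databases.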

Hope that helps!

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange