Question

Here is what I understand about the term "dark data":

Dark data is a type of unstructured, untagged and untapped data that is found in data repositories and has not been analyzed or processed. It is similar to big data but differs in how it is mostly neglected by business and IT administrators in terms of its value.

Also, IDC, a research firm, stated that up to 90 percent of big data is dark data.

My questions are:

-- Why the hell does Big data exist and make so much noise in the market if Dark data is more important?

-- Also, what factors separate Big data from Dark data?

I would really appreciate it if you could share some knowledge on this topic.


Solution 2

Dark Data refers to actual data in the narrow sense (bits and bytes, text, images, sound and so on) with certain characteristics, mostly around being neglected or underappreciated in some way.

The following statement therefore makes sense:

We've accumulated 100TB of dark data that we have no idea what to do with.

Big Data is a set of technologies, practices and solutions related to solving business problems in a particular way, mostly a variation on collecting and storing vast bodies of information and using it to some purpose. Big Data does not typically refer to data in the narrow sense (bits and bytes, etc.).

Consider this:

We've accumulated 100TB of big data on our servers.

Doesn't it sound awkward?

As you may tell, Big Data is more of a marketing/business metaphor. When Marketing picks up the scent of Dark Data and turns it into a buzzword like Big Data, then we can start comparing them apples to apples. But for now, we have:

Dark Data == underutilized and underappreciated data 
Big Data == collecting, storing and analyzing vast bodies of information

With that in mind I can attempt to divine the meaning of the initial quote stating that "up to 90 percent of big data is dark data" (wording which I personally find lame and mostly designed to grab attention):

Up to 90% of data collected under Big Data initiatives is not utilized to its fullest potential: most of its real value still lies hidden and unrealized.

I'm guessing the rest of the piece talked about how data science is still in its infancy and how much work still lies ahead, if we ever hope to tap all those unseen insights.

OTHER TIPS

There are three types of dark data:

  1. Data that is not currently being collected.

  2. Data that is being collected, but that is difficult to access at the right time and place.

  3. Data that is collected and available, but that has not yet been utilized, or fully applied.
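The three categories above can be read as a simple decision rule. The following is only an illustrative sketch: the names `DarkDataType` and `classify`, and the three boolean flags, are hypothetical and do not come from any library or from the original article.

```python
from enum import Enum


class DarkDataType(Enum):
    """The three kinds of dark data listed above (names are illustrative)."""
    NOT_COLLECTED = 1   # type 1: data that is not currently being collected
    HARD_TO_ACCESS = 2  # type 2: collected, but hard to reach at the right time and place
    UNUSED = 3          # type 3: collected and available, but not yet utilized


def classify(collected: bool, accessible: bool, used: bool):
    """Return the dark-data category of a data source, or None if it is in active use."""
    if not collected:
        return DarkDataType.NOT_COLLECTED
    if not accessible:
        return DarkDataType.HARD_TO_ACCESS
    if not used:
        return DarkDataType.UNUSED
    return None  # collected, accessible and used: not dark data
```

For example, `classify(True, True, False)` describes a data set that sits in storage and is reachable but that nobody has analyzed, which falls under the third type.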

Big data problems are not caused by the inaccessibility of data, but by the abundance of data.

Companies going after dark data problems are usually not playing in existing markets where customers are already aware of their problems. They are creating new markets by surfacing new kinds of data and building exceptional applications with that data. When they succeed, though, they become big companies.

For more on the difference, see the linked article by Doug Miles, director of market intelligence for AIIM.

Dark data is unmanaged, uncategorized and untapped. It occupies valuable storage and could contain hidden risks, because it often exists at the periphery of a company's information and retention policies and therefore has not yet been analyzed or processed. While it is similar to big data, the difference is that the business neglects its potential value.

Dark data can be the email inboxes of long-departed employees, old financial information, or forgotten copies of spreadsheets. If auditors and lawyers have to become involved in finding this information, costs can become astronomical. Holding dark data can put companies at high risk of being fined or sanctioned, because unprotected, confidential information sits on their systems without being managed or protected. Like Big Data, Dark Data can take up terabytes of disk space.
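As a rough illustration of how forgotten files like these might be surfaced, here is a minimal sketch that walks a directory tree and lists files that have not been modified for a long time. It rests on assumptions: "not modified in a year" is only a crude proxy for dark data, the one-year threshold is arbitrary, and the function names `find_stale_files` and `total_bytes` are made up for this example.

```python
import os
import time

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_age_days=365, now=None):
    """Return a list of (path, size_bytes, age_days) for files under root
    whose last modification is older than max_age_days."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            age_days = (now - st.st_mtime) / SECONDS_PER_DAY
            if age_days > max_age_days:
                stale.append((path, st.st_size, age_days))
    return stale


def total_bytes(stale):
    """Sum the sizes of the files returned by find_stale_files."""
    return sum(size for _path, size, _age in stale)
```

Calling `total_bytes(find_stale_files("/some/share"))` (the path is a placeholder) gives a first estimate of how much storage such candidate files occupy. Deciding whether they are confidential or subject to retention rules still needs a human or a dedicated tool.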

Big data is the information that is managed, structured and protected. It makes a lot of noise because there is a lot of it (we create 2.5 quintillion bytes of data every day) and companies need to provide resources to safeguard it (think volume, velocity and variety).

Through analysis, some Dark Data may be converted to Big Data.

Dark data is digital information that is not currently being used. However, it may consist of assets that an organization collects, processes and stores in the course of its regular business activity for future use.

Potentially, this data can be used to drive new revenue sources, eliminate waste and reduce costs. Many organizations also store dark data for regulatory compliance purposes.

Big data, by contrast, refers to large-scale data that is generated in a digital environment. It is generally large in size and has a short generation cycle.

It includes not only numeric data but also text and image data, so a big data environment is more diverse than earlier ones. Generally speaking, big data covers all the records produced by IoT devices, machines, and other devices. Some solutions are designed specifically for big data (e.g. Machbase database, Hadoop). They process large volumes of data, in some cases in real time, and include data storage and analysis functions.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow