Blog of the Hauptbibliothek

Dark Data in Research

16 April 2019 | HBZ


In 2018, IBM estimated the share of dark data at approx. 80% of the total amount of data. Overall, only 0.5% of all data is ever analysed. The potentially usable part of dark data is estimated at approx. 35% (Source: https://www.ibm.com/blogs/think/be-en/2018/04/24/marketing-dark-dark-data/). Whether or not these estimates are accurate, it is clear that large amounts of data are produced but never (re)used.

Dark data is commonly understood as a subset of big data, a popular but vague term. It includes video and audio recordings of human speech as well as unstructured text data, such as the text produced in vast quantities on social media today. Big data analysts have set themselves the task of finding ways to extract new insights from this data. Commercial companies usually aim to generate economic value and hope for better predictions of market developments and user behaviour. Dark data lies unused in the shadows and could be valuable. In practice, however, much of it is not.

Dark data in research consists of unstructured raw data, log files and notes that are never reused but still take up valuable storage space and thus contribute to continuous data growth. Until recently, even well-structured and well-documented data from research projects was often withheld from the public and thereby relegated to dark data. Only since research funders reoriented their policies towards Open Data has valuable research data become visible. Data repositories, in which researchers can register and document their data, play a key role in this process.

However, publishing research data does little on its own to solve the problem. While some repository operators enforce minimum quality standards, other repositories accept anything without review, even undocumented or poorly documented data. Although most individual research projects do not qualify as big data, the total volume of research data distributed across the cloud does. Researchers therefore face a twofold challenge. They must preserve data that is worth keeping and, so to speak, "business-relevant". They must also delete worthless data in a well-considered way.

The Swiss National Science Foundation (SNSF) supports the exchange of information on these issues through its "Scientific Exchange" programme, which enables researchers to discuss documentation and metadata standards with one another: http://www.snf.ch/en/funding/science-communication/scientific-exchanges/Pages/default.aspx

Picture Credits: David Huang, Unsplash

Filed under: Open Access, Research Data