Is big data getting too big? A whopping 90% of all existing data in the world was produced in just the last two years – and only 20% of it is being used. With big data analytics unable to keep pace, this exponential rise is proving impossible to cope with, and there’s one very obvious result: most of the data being collected by businesses, individuals and Internet of Things sensors is not being used.
Unstructured, unused and unloved data lurking on the computers, servers and archives of organisations across the globe is clear evidence that businesses, while becoming increasingly digitised and data-centric, are still living in the dark ages.
Unused or 'dark' data is the story of the business world failing to live up to expectations on a massive scale. Dark data is defined by Gartner as ‘the information assets that organisations collect, process and store during regular business activities, but generally fail to use for other purposes’.
“Primarily we are talking about transactional information, log files, metadata which has not been used, small bits of unanalysed information which appear to have no value and may well be seen as the waste product of other systems and processes,” says John Culkin, Director of Information Management at Crown Records Management, who advises firms on data policies. He also adds to that list draft, temporary and old emails, and ZIP files.
“Dark data makes up around 80% of total content in any organisation,” says Stephen Mackey, Senior IM Consultant at information management firm Kefron, who insists that it’s the result of standard day-to-day business processes. “Dark data is all the content that is left behind, hidden in systems and servers, and underused or forgotten about,” he adds.
According to IDC, 90% of unstructured data is never analysed, which is often the result of a dangerous anti-delete attitude, fuelled by both compliance regulation and the availability of cheap data storage in the cloud and elsewhere.
“For a retail or manufacturing company, for instance, financial information may be rightly kept as a record,” says Culkin, adding, “but although data generated by many sales and delivery systems is not required, it is rarely deleted.” But this cautious, keep-everything attitude towards data creates risks of its own.
There are two main ways dark data can damage a business. Firstly, there’s a security risk in not deleting data. “It’s important that files are not forgotten about,” says Mackey. “If they are not monitored and kept safe, the business-critical information they contain could be mined without knowledge and used for nefarious reasons.”
Data that isn’t going to be used should either be deleted or protected from unauthorised access. Unstructured information can contain confidential or sensitive material such as customer account details, and leaving it exposed creates compliance issues.
The second way dark data can harm a business is by indirectly costing it money. “Many businesses are unaware of what kind of data even exists, and it’s this hidden data that hampers internal reviews and external audits,” says Mackey. “What if an issue is raised about an account from two years ago, and payment is called into question, but the invoices and records cannot be found?” he asks. The answer is simple: dark data costs businesses money.
Being overloaded with silos of big data can also mean a loss of business focus – not being able to see the wood for the trees was never what data generation and the IoT were meant to be about. “Being overwhelmed by data and not having useful managed information can be the difference between excessive costs and risk, and ultimately the ability to deliver high quality products to customers,” says Culkin.
There’s a belief that all data could potentially be helpful one day, and that big data analytics – particularly as the field sees improvements – can extract something useful somewhere along the line. But big data which doesn’t provide insight that addresses a company’s needs is counterproductive. “Modern analytical tools are easier to use and more powerful than ever, but they are not magic,” says Culkin. “The right questions still need to be asked of data before useful information is found.”
The IoT is adding to the problem, at a rate that few imagined and one that is only going to increase. Much IoT activity is fully justified, with sensors providing real-time data that can be transformed into business-critical information: to prevent problems with machinery, for example, to highlight trends, or to measure both the performance and quality of industrial assets.
“Not all data is equal – just because you can generate data doesn’t mean it needs to be kept indefinitely,” says Culkin, who calls IoT sensors ‘data acquisition systems’ that continually collect data of all types purely because they can.
Having a clear understanding of why data is being collected, what questions the system is going to be used to answer, and who it will be useful to are important considerations in trying to fend off data overload, according to Culkin.
“These questions should be asked when configuring or even purchasing sensors and telematics otherwise you can end up with disjointed, unanalysed data that is simply collected for the sake of it and that has no value,” he says of dark data.
Keeping disparate data in a business system is unproductive and risky, but there are ways to address this that can generate a competitive advantage. “If you can connect, organise and analyse this dark data, not only can you secure it against potential risks, but you’ll also be able to use the content for productive purposes,” argues Mackey. “The company that is able to identify useful and relevant data, and subsequently adopt or make use of it to the fullest, is the organisation most likely to succeed, outperform and deliver value outwardly and inwardly.”
Of course, there is a major IT resources issue at the heart of the problem of dark data. “Many companies do not have the basic content management, storage systems, search functionality and reporting ability in place to utilise dark data effectively,” says Mackey.
The result is a 'splintering' of responsibility for data within organisations – an AIIM Research report found that 80% of companies have yet to allocate a senior role responsible for overseeing what is done with their data.
“That results in disorganisation and poor intelligence when it comes to analytics,” notes Mackey. Until that’s addressed – and particularly in the coming era of a fast-growing IoT – more and more data is likely to be generated and left to go dark.