The importance of metadata in Big Data | Van Bussel Document Services

John McAfee fled Belize for fear of being charged in connection with a murder investigation, but he did not disappear. He remained visible to the public through blog posts, tweets and media reports. He could probably have kept evading the police while staying virtually present, were it not for a single electronic breadcrumb: a photo of McAfee on the website of Vice, a New York magazine about art and culture. The image itself revealed little, but the information embedded in the photo gave the exact coordinates of the place where it was taken. And that was the beginning of the end for McAfee.
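To make the 'breadcrumb' concrete: photo metadata in the Exif format typically stores a position as degrees, minutes and seconds plus a hemisphere letter. The sketch below is only an illustration of how such fields turn into a point on a map. The coordinate values are made up (they are not the ones from the Vice photo) and the helper name `dms_to_decimal` is hypothetical.

```python
from fractions import Fraction

def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere letter
    ("N", "S", "E" or "W") into signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    return float(-value if ref in ("S", "W") else value)

# Hypothetical metadata as it might be read from a photo file.
# The keys follow the Exif GPS tag names; the values are invented.
exif_gps = {
    "GPSLatitude": (15, 30, 0),
    "GPSLatitudeRef": "N",
    "GPSLongitude": (88, 45, 0),
    "GPSLongitudeRef": "W",
}

lat = dms_to_decimal(exif_gps["GPSLatitude"], exif_gps["GPSLatitudeRef"])
lon = dms_to_decimal(exif_gps["GPSLongitude"], exif_gps["GPSLongitudeRef"])
print(lat, lon)
```

Four small fields that nobody sees when looking at the picture are enough to place it on a map.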

According to ‘The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East’, a December 2012 report by IDC and EMC, metadata is one of the fastest-growing subsegments of the digital universe.

The problem is that although the volume of metadata is growing (as is its importance for understanding data), it is not keeping pace with the growth of ‘big data’. IDC calls this the ‘big data gap’, and it requires CIOs to rethink their data management strategy.

One example is the use of the Resource Description Framework, or RDF. It is used to represent information on the Web: ‘the data is broken up into subjects, verbs and predicates (also called “triples”), then graphed’. In RDF, data and metadata are so closely interwoven that the two are hard to tell apart.
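A minimal sketch may help show why data and metadata blur together in a triple store. In the RDF specification the three parts of a triple are called subject, predicate and object. The example below uses plain Python tuples rather than an RDF library, and every identifier in it (the `ex:` names, the `match` helper) is invented for illustration.

```python
# Each statement is one (subject, predicate, object) triple.
triples = [
    ("ex:photo1", "ex:depicts", "ex:person1"),      # arguably "data"
    ("ex:photo1", "ex:takenOn", "2012-12-03"),      # arguably "metadata"
    ("ex:photo1", "ex:publishedBy", "ex:magazine1"),
    ("ex:person1", "ex:name", "Alice Example"),
]

def match(triples, s=None, p=None, o=None):
    """Return all triples matching the given pattern.
    None acts as a wildcard for that position."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Everything known about the photo, content and context alike.
about_photo = match(triples, s="ex:photo1")

# Follow an edge in the graph: who is depicted, and what is their name?
person = match(triples, s="ex:photo1", p="ex:depicts")[0][2]
name = match(triples, s=person, p="ex:name")[0][2]
print(len(about_photo), name)
```

The statement about when the photo was taken has exactly the same shape as the statement about what it shows, which is why the two are hard to separate in this model.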

‘However, in order to use this approach, typical data stores have to be broken down into these triples that can be connected’, says Gwen Thomas, founder of the Data Governance Institute. ‘The tight coupling of data and metadata is almost certainly in everybody’s future, but not in this decade for most of us. Companies investing in “big data” tools will need to think beyond storing and analyzing large data sets to consider the tags or labels that give the data context over time. Without metadata, companies will forfeit some of the deep insights big data can yield, including the identification of important business trends by analyzing detailed data over time’.

‘The interesting thing about big data tools is that they let you store pretty much anything that you want very easily, and they “persist” the data’, says Phil Shelley, CTO of Sears Holdings Corp. and CEO of MetaScale. ‘But the data is completely useless if it’s not meaningful in a way that you can use it and retrieve it’.

Gwen Thomas says that the arrival of ‘big data’ is changing attitudes toward metadata. ‘When you’re talking about “small data,” there’s always the possibility of actually sampling the data itself to see what’s in it’, she says. ‘But you don’t have that option with big data. It’s like drinking from a fire hose: You’re going to get knocked over’.

The role of metadata in providing context has long been recognized, but not always embraced, says Shelley. ‘Documenting the metadata is a boring chore most people don’t want to do. People want to get the data in, use it and produce value. Traditionally, the business has been able to circumvent the dreaded metadata discussion because data repositories, such as a relational database, neatly organize the data into rows and columns, and the metadata is heavily implied by the structure’. But: ‘The advantage of big data tools is that they don’t enforce a strict schema on your data when you put it away. They allow you to read the data and apply schemas onto the data at the time you read it’.
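What Shelley describes is often called ‘schema on read’. The sketch below illustrates the idea in plain Python, without any actual big data tooling: records are stored as raw text, and a schema is applied only when they are read back. The record layout, the `order_schema` mapping and the `read_with_schema` helper are all assumptions made up for this example.

```python
import json

# Raw records are stored exactly as they arrived; nothing is validated on write.
raw_store = [
    '{"order_id": "1001", "amount": "19.99", "ts": "2013-01-15"}',
    '{"order_id": "1002", "amount": "5.00", "ts": "2013-01-16", "coupon": "X1"}',
]

# The schema is metadata kept separately from the stored data:
# field name -> function that converts the raw value to the desired type.
order_schema = {"order_id": int, "amount": float, "ts": str}

def read_with_schema(raw_lines, schema):
    """Parse each raw line and keep only the fields named in the schema,
    converting each one to the type the schema prescribes."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: cast(record[field])
            for field, cast in schema.items()
            if field in record
        }

orders = list(read_with_schema(raw_store, order_schema))
print(orders)
```

The flexibility comes at a price: if `order_schema` is lost or never written down, the store still holds the strings, but nobody can say with confidence what they mean.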

The promise of the ‘big data’ tool Hadoop, for instance, is ‘its ability to maintain years’ worth of corporate data for analysis. But the disadvantage is if you don’t have a good handle on what the data is and on what the metadata is, you really don’t know after a while what you have’, says Shelley.

‘Metadata not only helps provide a data legacy for businesses’, says David Marco, president of EWSolutions. ‘Metadata can also help companies establish data consistency’. There is another reason as well: ‘reducing the IT footprint’. ‘When companies have multiple order entry systems, financial systems and copies of the same data, they’re spending money to keep the redundant systems operational, and that can be incredibly costly’, says Marco. ‘How do you get off of them? You have to know what data you have, what it means, where it’s located, and that’s all metadata management’.
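Marco’s three questions (what data you have, what it means, where it is located) map naturally onto a simple metadata catalog. The sketch below is a toy illustration, not a description of any real metadata management product: the system names, table names and the `find_redundant` helper are hypothetical.

```python
from collections import defaultdict

# One catalog entry per data set: where it lives and what it describes.
catalog = [
    {"system": "order_entry_a", "table": "customers", "describes": "customer"},
    {"system": "order_entry_b", "table": "cust_master", "describes": "customer"},
    {"system": "finance", "table": "invoices", "describes": "invoice"},
]

def find_redundant(catalog):
    """Group catalog entries by the business entity they describe and
    return only the entities that are stored in more than one place."""
    locations = defaultdict(list)
    for entry in catalog:
        locations[entry["describes"]].append(f'{entry["system"]}.{entry["table"]}')
    return {entity: locs for entity, locs in locations.items() if len(locs) > 1}

redundant = find_redundant(catalog)
print(redundant)
```

Without the `describes` field, nothing links `customers` to `cust_master`, and the duplication stays invisible.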

Marco has written two books on metadata management, both well worth reading.
