Big Data – Big Data Basics


This post continues from the introductory post Big Data – Introduction and looks at the “3Vs” that define Big Data. As I researched the subject, three terms stood out: Volume, Velocity and Variety. Together they form the widely accepted definition of Big Data from Doug Laney, an analyst at Gartner (the world’s leading information technology research and advisory company), who has characterised Big Data as “data that’s an order of magnitude greater than data you’re accustomed to.”

Accordingly, this “3Vs” model describes Big Data along three dimensions: data increasing in volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources).

The first dimension, Volume, is about the amount of data. Edd Dumbill, program chair for the O’Reilly Strata Conference (an event that brings together practitioners, researchers, IT leaders and entrepreneurs to discuss big data, Hadoop, analytics, visualisation and data markets), describes Big Data as “data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”

To give you an idea of how the volume of data is growing exponentially year on year, Walmart is reported to collect more than 2.5 petabytes of data every hour from its customer transactions. Perhaps these infographics, courtesy of the online storage site Mozy and of Cisco, will help you visualise what petabytes of data mean and how that volume expands further into zettabytes in the future.

Visualizing The Petabyte Age

Infographic credit : http://mozy.com/blog/misc/how-much-is-a-petabyte/

The Internet in 2015

Infographic credit : http://blogs.cisco.com/news/the-dawn-of-the-zettabyte-era-infographic/
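To put the Walmart figure into perspective, here is a small back-of-the-envelope sketch in Python. It assumes decimal (SI) units, where a petabyte is 10^15 bytes and a zettabyte is 10^21 bytes, and it treats the reported 2.5 petabytes per hour as a constant round-the-clock rate, which is my simplification rather than something the sources state. The function names are purely illustrative.

```python
# Decimal (SI) storage units, expressed in bytes.
PETABYTE = 10**15
ZETTABYTE = 10**21

# Figure quoted in the post; treating it as a constant rate is an assumption.
PB_PER_HOUR = 2.5


def petabytes_per_day(pb_per_hour: float) -> float:
    """Petabytes collected in one 24-hour day at a constant hourly rate."""
    return pb_per_hour * 24


def petabytes_per_year(pb_per_hour: float) -> float:
    """Petabytes collected in a 365-day year at a constant hourly rate."""
    return pb_per_hour * 24 * 365


def petabytes_in_zettabyte() -> int:
    """How many petabytes make up one zettabyte."""
    return ZETTABYTE // PETABYTE


def years_to_zettabyte(pb_per_hour: float) -> float:
    """Years needed to accumulate one zettabyte at a constant hourly rate."""
    return petabytes_in_zettabyte() / petabytes_per_year(pb_per_hour)


if __name__ == "__main__":
    print("PB per day:", petabytes_per_day(PB_PER_HOUR))
    print("PB per year:", petabytes_per_year(PB_PER_HOUR))
    print("PB in one ZB:", petabytes_in_zettabyte())
    print("Years to reach one ZB:", round(years_to_zettabyte(PB_PER_HOUR), 1))
```

Under these assumptions, a single retailer at that rate would still need decades to reach one zettabyte on its own, which hints at how large the zettabyte scale is.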

Velocity, the second dimension, describes the frequency at which data is generated, captured and shared. Every imaginable device now produces torrents of data.

I am sure you have heard of a batch process: it takes a chunk of data, submits a job to the server and waits for the result to be delivered. Batch processing works when the incoming data rate is slower than the batch processing rate and the result is still useful despite the delay. For many new applications and sources of data, batch processing is just not possible anymore, because the speed of data creation is even more important than the volume. The data now arrives as real-time or nearly real-time information streaming into the server continuously.
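The difference between the two styles can be sketched in a few lines of Python. This is only a toy illustration, not how any particular Big Data product works: the function names and the sample readings are made up, and a real streaming system would read from a network source rather than a list.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch style: wait for the whole chunk, then compute the result once."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated average after every new reading."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


if __name__ == "__main__":
    sensor_readings = [20.0, 22.0, 21.0, 25.0]  # hypothetical sample data

    # Batch: one answer, available only after all the data has arrived.
    print("batch:", batch_average(sensor_readings))

    # Streaming: an up-to-date answer as each reading arrives.
    for running in streaming_average(iter(sensor_readings)):
        print("streaming:", running)
```

The batch function cannot answer until it holds every reading, while the streaming generator keeps only a running count and total, so it can respond as each value arrives without storing the whole data set.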

The data available in the world today comes from everywhere. Variety, the third dimension, signifies the proliferation of new data types that no longer fit into the neat, easy-to-consume structures of traditional transactional data. Much of this data exists as a by-product of ordinary operations. Some is generated by humans: posts to social media sites, digital pictures and videos, purchase transaction records, and GPS signals from cell phones. Some is “sensor” data generated by computers, network devices and embedded chips, whether used to gather climate information or built into anything from refrigerators and airplanes to bodily implants, and more.

The International Business Machines Corporation (IBM) adds Veracity as the fourth dimension of Big Data. Veracity concerns how much confidence can be placed in the quality (precision and accuracy) of the data, given the variety and number of information sources it comes from.

I guess this is enough to know, briefly, about the basics of Big Data.

References:
About 2012, O’Reilly Strata Conference, viewed 13 December 2012, < http://strataconf.com/strata2012/public/content/about >

McAfee, A & Brynjolfsson, E 2012, Big Data: The Management Revolution, Harvard Business Review October 2012, Boston, MA, USA

Feinleib, D 2012, The 3 I’s Of Big Data, Forbes, viewed 13 December 2012,
< http://www.forbes.com/sites/davefeinleib/2012/07/09/the-3-is-of-big-data/ >

Diya, S 2012, The 3Vs that define Big Data, Data Science Central, viewed 13 December 2012, < http://www.datasciencecentral.com/forum/topics/the-3vs-that-define-big-data >

Fernandes, L, O’Connor, M & Weaver, V 2012, Big Data, Bigger Outcomes, American Health Information Management Association, viewed 18 November 2012,
< http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_049741.hcsp?dDocName=bok1_049741 >

Stefan, S 2012, The 3 V of BIG Data, Agile Commerce, viewed 13 December 2012,
< http://multichannel-retailing.com/2012/05/the-3-v-of-big-data/ >

What is big data? 2012, International Business Machines Corporation (IBM), viewed 18 November 2012, < http://www-01.ibm.com/software/data/bigdata/ >
