What is big data?

“… everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it”. ~ Dan Ariely

What is big data? Why do we use the term big data? Why is there such high demand for big data developers and scientists? In this post I review some of its most important concepts and define a few key terms.

As O’Reilly said: “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it”.

According to IBM: “Big data is the data characterized by 3 attributes: volume, variety and velocity”. These are the most important terms, so each of them needs to be defined.

Volume refers to the size of the data. In big data, the data is so large that we can no longer handle it the way we used to. A common rule of thumb: small data is data that fits in RAM, and big data is when the program crashes because the data does not fit in RAM. So we need additional storage, such as hard disks, to handle this problem.
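To make the "does not fit in RAM" idea concrete, here is a minimal sketch in Python. It assumes the data is a CSV file with a numeric column; the function name mean_of_column and the tiny in-memory sample are made up for illustration. The point is only that the rows are read one at a time, so memory use stays flat no matter how large the input is.

```python
import csv
import io


def mean_of_column(lines, column):
    """Average a numeric column while keeping only one row in memory."""
    reader = csv.DictReader(lines)
    total = 0.0
    count = 0
    for row in reader:  # rows are read lazily, one at a time
        total += float(row[column])
        count += 1
    return total / count if count else 0.0


# Stand-in for a large file on disk; an open file object works the same way.
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
result = mean_of_column(sample, "value")
print(result)
```

With a real file you would pass open("data.csv", newline="") instead of the sample. Data that is too large even for one disk needs distributed tools, which this sketch does not cover.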

In the context of big data, Variety means data heterogeneity. Data comes from different sources and can be structured, like traditional table-based data; semi-structured, like key-value data; or unstructured, like text. Any big data system has to be prepared to work with these different data types.
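The three kinds of data can be illustrated with a small Python sketch. The function names and sample values below are hypothetical, and a real system would use much more careful parsing. Each kind of input simply needs its own reader before the data can be analyzed together.

```python
import csv
import io
import json


def from_table(text):
    """Structured: rows and columns with a fixed header."""
    return list(csv.DictReader(io.StringIO(text)))


def from_key_value(text):
    """Semi-structured: a JSON document whose keys may differ per record."""
    return json.loads(text)


def from_free_text(text):
    """Unstructured: plain text, split here into words as a first step."""
    return text.split()


table_rows = from_table("name,age\nAda,36\n")
document = from_key_value('{"name": "Ada", "tags": ["math"]}')
words = from_free_text("Ada wrote the first program")

print(table_rows)
print(document)
print(words)
```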

Velocity indicates that data is generated at a very high rate. We need proper tools to ingest the data, preprocess it, and either store it or process it in real time.
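As a rough sketch of what processing data as it arrives can look like, the Python snippet below keeps a rolling average over the most recent values of a stream. The name rolling_average and the sample readings are assumptions for illustration; a production system would read from a message queue or socket rather than a list.

```python
from collections import deque


def rolling_average(stream, window_size):
    """Yield the average of the last `window_size` values as each one arrives."""
    window = deque(maxlen=window_size)  # old values drop out automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)


# Stand-in for a live feed; any iterator of numbers would work.
readings = iter([10, 20, 30, 40])
averages = list(rolling_average(readings, window_size=2))
print(averages)
```

Because the function is a generator, each result is available as soon as its input value arrives, and only the current window is kept in memory.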

There are also less well-known Vs, such as Vacillation, Variability and Value, which are used in some contexts. They mostly determine how you have to analyze the data.

In this short article, I talked about big data and its general aspects. I went through the definitions of the three common Vs of big data (Volume, Variety and Velocity) and mentioned three less common ones (Vacillation, Variability and Value). In future posts I will write more about the importance of big data, its proper uses, and how to analyze it.