Big Data: It's not a technology. It's a collection of very large numbers of data points, generated from various sources at very high speed. Together, they hold a lot of valuable information that can be put to good use in almost every field.
“A revolution that will transform how we live, work and think.”
The 5 V's that define Big Data
1.Volume: Volume refers to the sheer amount of data being generated from various sources, now measured in exabytes and zettabytes. A couple of decades ago, big firms mostly collected & stored data about their own employees.
Now, apart from employee data, these firms also collect details of their clients, partners, and the products & services they deal in, so the amount of data keeps growing. By one often-quoted estimate, all the data generated from the beginning of time until 2003 is equivalent to what is now generated every 2 days. So, that's volume.
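To get a feel for these units, here is a small sketch of the arithmetic. It assumes decimal (SI) units, where each step up is a factor of 1,000; the `UNITS` table and the `convert` helper are made up for this illustration.

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}

def convert(amount, from_unit, to_unit):
    """Convert a quantity between two units listed in UNITS."""
    return amount * UNITS[from_unit] / UNITS[to_unit]

# How many terabytes fit in a single zettabyte?
terabytes_in_zb = convert(1, "ZB", "TB")
print(f"1 ZB = {terabytes_in_zb:,.0f} TB")
```

In other words, holding one zettabyte would take about a billion 1 TB drives.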
2.Variety: We mainly consider three types of data: structured, semi-structured and unstructured. Of these, we're most familiar with structured data, such as text fields (a person's name) or numeric fields (their age) stored in the rows and columns of a database. The other two types came to prominence with big data.
Unstructured data comes in the form of PDF files, video files, audio files, images, tweets, likes, comments, etc. Semi-structured data comes in the form of XML files, JSON files, emails, JavaScript files, server log files, sensor data, etc. These are the varieties of data we generate from sources like mobile devices, satellites, social media networks, IT & non-IT organizations, etc.
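The difference between the three types is easiest to see side by side. The snippet below is only a sketch with made-up sample records (the names, tags and text are invented); it uses Python's standard `csv` and `json` modules.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured = io.StringIO("name,age\nAsha,31\nRavi,45\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but fields and nesting can vary per record.
semi_structured = json.loads(
    '{"user": "asha", "tags": ["bigdata", "hadoop"], "device": {"os": "android"}}'
)

# Unstructured: free text with no schema at all.
unstructured = "Loving the new phone!! battery life is gr8 #android"

print(rows[0]["name"], rows[0]["age"])   # look up a value by column name
print(semi_structured["device"]["os"])   # walk nested keys
print(len(unstructured.split()))         # only generic operations, e.g. a word count
```

Structured data can be queried by column, semi-structured data by key path, while unstructured data needs extra processing before anything specific can be pulled out of it.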
3.Velocity: When we deal with a huge volume of different types of data generated from various sources, the data has to be processed fast, often as it arrives. This is called analysis of streaming data. In other words, big data velocity is about the speed at which data flows in from sources like machines, business processes, networks, mobile devices, social media sites, etc. The flow from these sources is gigantic and constant, so the data needs to be stored and processed quickly, which is not possible with traditional data processing applications.
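The idea behind streaming analysis can be sketched in a few lines: handle each event as it arrives and keep only running totals, instead of storing everything first and analysing it later. The `event_stream` source and its sample events are hypothetical stand-ins for a real feed, and a real pipeline would use a dedicated streaming framework rather than a plain loop.

```python
from collections import Counter

def event_stream():
    """Hypothetical source: yields (source, payload_bytes) events one at a time."""
    events = [
        ("mobile", 120),
        ("sensor", 40),
        ("social", 300),
        ("sensor", 60),
        ("mobile", 80),
    ]
    for event in events:
        yield event

def summarize(stream):
    """Update running aggregates per event without keeping the events themselves."""
    counts = Counter()
    total_bytes = 0
    for source, size in stream:
        counts[source] += 1
        total_bytes += size
    return counts, total_bytes

counts, total_bytes = summarize(event_stream())
print(dict(counts), total_bytes)
```

Because `summarize` never holds more than one event at a time, the same loop works whether the stream has five events or five billion.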
4.Veracity: Data points collected & stored from various sources, in different forms, are often inaccurate. Under veracity we have to deal with poor-quality data, again in huge volumes (for example, Twitter posts with hashtags, typos, abbreviations and colloquial speech), which is imprecise and uncertain. But big data and analytics technology allows us to work with this kind of data.
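As a rough illustration of what working with messy data can look like, here is a minimal clean-up sketch for social media posts. The `ABBREVIATIONS` table and the `clean_post` function are invented for this example; it only normalizes case, hashtags, punctuation and a few abbreviations, and does not attempt to fix typos.

```python
import re

# Hypothetical lookup table; a real one would be far larger.
ABBREVIATIONS = {"gr8": "great", "u": "you", "thx": "thanks"}

def clean_post(text):
    """Normalize a post: lowercase, unwrap hashtags, drop punctuation, expand abbreviations."""
    text = text.lower()
    text = re.sub(r"#(\w+)", r"\1", text)   # "#bigdata" -> "bigdata"
    text = re.sub(r"[^\w\s]", " ", text)    # replace punctuation with spaces
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)

print(clean_post("Thx u r gr8!! #BigData"))
```

Even a simple pass like this makes posts easier to compare and count, although it cannot tell whether what a post says is actually true.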
5.Value: Whether the data is big or small, and wherever and in whatever format it was generated, it should have some value, meaning we can put it to proper use for the right purpose. The significance, worth, or usefulness of the data to those consuming it is arguably what matters most to firms and organizations. Data in itself has no importance or utility; it becomes valuable only when we can extract information from it.
To conclude, all of these V's of Big Data are discussed in any Big Data Hadoop training. Hope you found it interesting!





