Introduction to BIG DATA: Examples, Types & Characteristics

Big Data! Do you really know what it is, and how it influences the world today?

To understand the term 'Big Data', we first need to know what 'data' is. The Oxford dictionary defines 'data' as "The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media." Now imagine what 'Big Data' is. Big Data is a term for collections of data sets so large and complex that they are difficult to store and process with available database management tools or traditional data processing applications.

Examples of Big Data:

The following are some examples of 'Big Data'. The New York Stock Exchange generates about one terabyte of new trade data per day. Amazon handles about 16 million pieces of user data per day to recommend products. Walmart handles more than 1 million customer transactions every hour. Facebook stores, accesses, and analyzes more than 35 petabytes of user-generated data. A single jet engine can generate more than 15 terabytes of data in 30 minutes of flight time; with thousands of flights per day, the data generated reaches many petabytes. Over 260 million tweets are created daily. The list goes on and on.
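
To get a feel for these numbers, it helps to convert them into per-second rates. The snippet below is a rough back-of-the-envelope sketch that uses only the figures quoted above; the variable names are made up for illustration, and it assumes decimal units (1 terabyte = 1000 gigabytes) and that the load is spread evenly over time, which real traffic is not.

```python
# Back-of-the-envelope rates derived from the figures quoted in the text.
# Assumption: load is spread evenly over time (real traffic is bursty).
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

# Walmart: more than 1 million transactions every hour.
walmart_tx_per_second = 1_000_000 / SECONDS_PER_HOUR

# Twitter: over 260 million tweets per day.
tweets_per_second = 260_000_000 / SECONDS_PER_DAY

# Jet engine: 15 terabytes in 30 minutes.
# Assumption: decimal units, 1 TB = 1000 GB.
jet_engine_gb_per_second = (15 * 1000) / (30 * 60)

print(f"Walmart: about {walmart_tx_per_second:.0f} transactions per second")
print(f"Tweets: about {tweets_per_second:.0f} per second")
print(f"Jet engine: about {jet_engine_gb_per_second:.1f} GB per second")
```

Even as averages, these rates show why a single traditional database server struggles to keep up.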

Characteristics of Big Data:

1. Volume: This refers to the amount of data, which is growing at a very fast pace. The volume of data generated by humans and machines is massive.

2. Variety: This refers to the heterogeneous sources and nature of data: structured, semi-structured, or unstructured. Earlier, data came mostly from spreadsheets and databases; now it also arrives as images, audio, video, PDFs, and so on. This variety of unstructured data poses challenges for storing, mining, and analyzing it.

3. Velocity: This refers to the speed at which data is generated. How fast data is generated and processed to meet demand determines its real potential. Velocity covers the speed at which data flows in from sources such as business processes, social media sites, sensors, mobile devices, application logs, and networks. The flow of data is massive and continuous.

4. Veracity: This refers to doubt or uncertainty about the available data, caused by inconsistency and incompleteness. Data can be messy and may be difficult to trust. Sheer volume is often the reason behind the lack of quality and accuracy in the data.

5. Value: Having access to big data is fine, but unless we can turn it into value, it is useless. The questions to ask are: does analyzing big data actually benefit the organization doing it? Is the organization achieving a high ROI (Return on Investment)? Unless it adds to profits, working on big data is a waste of money and resources.
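
Of these characteristics, veracity is the easiest to illustrate with a few lines of code. The sketch below is only an illustration: the sensor readings, the field names, and the plausible temperature range are all invented for the example. It flags records that are incomplete (a missing value) or inconsistent (a value outside the assumed range) and reports what share of the data can be trusted.

```python
# Hypothetical sensor readings; None marks a missing value.
readings = [
    {"sensor": "A", "temp_c": 21.5},
    {"sensor": "B", "temp_c": None},   # incomplete: value is missing
    {"sensor": "C", "temp_c": 999.0},  # inconsistent: physically implausible
    {"sensor": "D", "temp_c": 22.1},
]

# Assumed plausible range for this example, in degrees Celsius.
VALID_RANGE = (-50.0, 60.0)


def is_trustworthy(reading):
    """Return True if the reading is present and inside the assumed range."""
    temp = reading["temp_c"]
    low, high = VALID_RANGE
    return temp is not None and low <= temp <= high


clean = [r for r in readings if is_trustworthy(r)]
quality = len(clean) / len(readings)

print(f"{len(clean)} of {len(readings)} readings pass the checks ({quality:.0%})")
```

Real pipelines apply many more checks than this, but the idea is the same: measure how much of the data can be trusted before drawing conclusions from it.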

Categories of Big Data:

Big data can take three forms:

1. Structured: Data that can be stored and processed in a fixed format is called structured data. Data stored in a relational database management system (RDBMS) is one example. Structured data is easy to process because it has a fixed schema. Structured Query Language (SQL) is often used to manage this kind of data.

2. Semi-Structured: This type of data does not conform to a formal data model, such as a table definition in a relational DBMS, but it still has organizational properties, such as tags and other markers that separate semantic elements, which make it easier to analyze. XML files and JSON documents are examples of semi-structured data.

3. Unstructured: Data that has no known form, cannot be stored in an RDBMS, and cannot be analyzed until it is transformed into a structured format is called unstructured data. Text files and multimedia content such as images, audio, and video are examples of unstructured data. Unstructured data is growing faster than the other types; experts say that 80 percent of the data in an organization is unstructured.
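
The three forms can be compared side by side in a short Python sketch. This is a minimal illustration using only the standard library; the table, the JSON events, and the review text are all made-up sample data, and the word count at the end is just one simple way of imposing structure on free text.

```python
import json
import sqlite3

# Structured: a fixed schema in a relational table, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
)
alice_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("alice",)
).fetchone()[0]
conn.close()

# Semi-structured: no fixed schema, but keys label each field.
# Note that the two records do not share the same set of keys.
raw = '[{"user": "alice", "tags": ["sale"]}, {"user": "bob", "device": {"os": "android"}}]'
events = json.loads(raw)
users_with_device = [e["user"] for e in events if "device" in e]

# Unstructured: free text with no markers at all.
# Structure (here, word counts) has to be imposed before analysis.
review = "Great phone. Battery life is great, camera is average."
word_counts = {}
for word in review.lower().replace(".", " ").replace(",", " ").split():
    word_counts[word] = word_counts.get(word, 0) + 1

print("Structured total for alice:", alice_total)
print("Semi-structured users with device info:", users_with_device)
print("Unstructured count of 'great':", word_counts["great"])
```

The amount of work needed before any question can be answered grows from one form to the next: the SQL query runs directly, the JSON needs a check for which keys exist, and the text has to be tokenized first.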
