Sunday, April 21, 2013

What is Big Data?

A lot of people are talking about Big Data these days, but what is it, and why is it important? Throughout the day, I talk to a fair number of customers and get the impression that many don’t really understand what big data is, what it can do, or why it matters. So I wanted to use a post to try to clear that up.

In short, big data is just what it sounds like: a lot of data stored in a database and ultimately used to create reports that tell a story. Because there is so much data, the story it tells is generally pretty accurate. As a rule of thumb, the more data there is, the more accurate the story is likely to be.

With that basic definition done, let’s back up and define a couple of terms. First, what is data? Data are pieces of information. For example, data would include names, purchase history, medical information, internet searches, and so on; really, any information.

The next term to define is database. A database is an organized ‘holding bin’ for information. For example, you can think of your iPod as a database of music or your phone’s contact list as a database of your friends’ contact information. The important thing about a database is that it is well organized. Each data element is in its own field, so a person can report on each element or on various combinations of the elements.

As a visual person, I find the best way to understand databases is by thinking of Microsoft Excel. As you probably know, Excel has rows and columns that can be sorted, searched, and so on. You can think of a database as a giant Excel sheet where each column has a title in the top row and each field down that column contains a distinct piece of information. We’ll use a phone’s contact list as an example (yes, this is a sample of my real contact list, haha):

Notice how, in the example, each individual piece of information is in its own box. The column headers tell us what is in the fields below. Now, you might say, ‘my contact list doesn't look like that’ and you’d be right. When information is displayed to the user, the information is pulled from these organized tables and displayed in a prettier way. That is called a user interface.
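
For readers who like to see things in code, here is a minimal sketch of the same idea using Python's built-in sqlite3 module. The rows, the column names, and the helper functions (build_contacts_db, last_names_sorted, display_contact) are all made up for illustration; they are not my actual contact list, and a real phone stores contacts in its own way.

```python
import sqlite3

# Hypothetical sample rows, invented for illustration only.
CONTACTS = [
    ("Ann", "Lee", "555-0101", "Nowhereville", "CA"),
    ("Bob", "Ruiz", "555-0102", "Nowhereville", "CA"),
    ("Cy", "Okafor", "555-0103", "Nowhereville", "CA"),
]


def build_contacts_db():
    """Create an in-memory table where each column holds one kind of information."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts ("
        "first_name TEXT, last_name TEXT, phone TEXT, city TEXT, state TEXT)"
    )
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?, ?, ?)", CONTACTS)
    return conn


def last_names_sorted(conn):
    """Report on a single column, sorted, much like sorting a column in Excel."""
    rows = conn.execute(
        "SELECT last_name FROM contacts ORDER BY last_name"
    ).fetchall()
    return [row[0] for row in rows]


def display_contact(conn, first_name):
    """Pull fields from the table and show them in a friendlier form (a tiny 'user interface')."""
    row = conn.execute(
        "SELECT first_name, last_name, phone FROM contacts WHERE first_name = ?",
        (first_name,),
    ).fetchone()
    return f"{row[0]} {row[1]}: {row[2]}"


conn = build_contacts_db()
print(last_names_sorted(conn))
print(display_contact(conn, "Ann"))
```

The point is only that the data sits in tidy, labeled columns, while what the user sees is built from those columns on demand.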

Now, with that defined, let’s go back to big data. You see in my example above, I have only 6 rows containing 9 columns of specific information types (first name, last name, etc.). With big data, those rows and columns would be almost innumerable but still well organized. With that much information, you can generate massive reports that tell a story. For example, the story the contact list above tells us is that all of my contacts live in Nowhereville, CA. That is a story. If the database were bigger, we might be able to see the percentage of my contacts that live in Nowhereville as compared to other places where I have contacts. With enough information, one may be able to predict the probability that the next contact I add to my list will or will not live in Nowhereville, CA.
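
To make the percentage idea concrete, here is a small Python sketch. The list of cities is invented for illustration ("Elsewhere" is a made-up second town), and city_shares is a hypothetical helper name. Treating the current share as the chance that the next contact lives in Nowhereville is a deliberately naive estimate, not a real predictive model.

```python
from collections import Counter


def city_shares(cities):
    """Return the percentage of contacts living in each city."""
    counts = Counter(cities)
    total = len(cities)
    return {city: 100 * n / total for city, n in counts.items()}


# Hypothetical data: 6 contacts in Nowhereville and 2 in a made-up second town.
cities = ["Nowhereville"] * 6 + ["Elsewhere"] * 2

shares = city_shares(cities)
print(shares)

# Naive guess: assume the next contact follows the same proportions seen so far.
chance_next_in_nowhereville = shares["Nowhereville"] / 100
print(chance_next_in_nowhereville)
```

With only eight contacts this guess is shaky; the more rows there are, the more the proportions can be trusted.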

With big data there will be more and different columns (and definitely more rows). Those might include things like purchasing history, terms searched on Google, medical information… you name it. With enough data, other, bigger stories emerge. For example, we might see that people in Nowhereville, CA all buy a particular type of widget or that people in Nowhereville tend to get a particular disease.

This quantity of data can help to find correlations that no one knew existed. In medicine, for example, we may find that a particular type of person, with particular habits, with a particular disease, and taking a particular medication, has a higher rate of ear infections than people who do not meet the same criteria. How is that valuable? Well, if we know what specific factors contribute to a disease or condition, we can work to proactively prevent it, thus improving length and quality of life.
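
Here is a rough Python sketch of what "a higher rate than people who do not meet the same criteria" means in practice. The records are invented for illustration, and the field names (meets_criteria, ear_infection) and the helper infection_rate are hypothetical. Keep in mind that a gap between the two rates shows a correlation worth investigating, not proof of a cause.

```python
def infection_rate(records, meets_criteria):
    """Share of people in one group (criteria met or not) who had an ear infection."""
    group = [r for r in records if r["meets_criteria"] == meets_criteria]
    if not group:
        return None  # no one in this group, so no rate can be computed
    return sum(r["ear_infection"] for r in group) / len(group)


# Hypothetical records: 4 people who meet the criteria, 4 who do not.
records = [
    {"meets_criteria": True, "ear_infection": True},
    {"meets_criteria": True, "ear_infection": True},
    {"meets_criteria": True, "ear_infection": True},
    {"meets_criteria": True, "ear_infection": False},
    {"meets_criteria": False, "ear_infection": True},
    {"meets_criteria": False, "ear_infection": False},
    {"meets_criteria": False, "ear_infection": False},
    {"meets_criteria": False, "ear_infection": False},
]

rate_with = infection_rate(records, True)
rate_without = infection_rate(records, False)
print(rate_with, rate_without)
```

Eight records prove nothing, of course; the value of big data is that the same comparison can be run across millions of rows.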

Big data is not only useful in medicine; it can also be used in business and marketing to identify buying behaviors and develop predictions that help marketers “speak” the right language to the right people to improve sales. In law enforcement, trends can be identified that may predict crime before it happens. The possibilities are as enormous as the data.

The next logical question is where the data comes from. That’s simple – you are on it right now. The data comes from computers. They are everywhere and can record everything. For example, when you buy something at the store, there is a computer at work recording the details. The credit card company gathers data about the purchase such as the product, the store and its location, and so on. What if you pay cash? Haven’t you ever had a store clerk ask for your zip code, phone number, or email address? Do you belong to any store’s discount or rewards club? It’s all data. What about driving? Your car more than likely has a computer in it. Do you have E-Z Pass? Do you use GPS? Data. Internet searches – data. Social media posts – data. Cell phone usage – data. Data is recorded EVERYWHERE. According to IBM, we create 2.5 quintillion bytes of data every day. That means 2,500,000,000,000,000,000 bytes, or units of information. Put it all together, and voila, you have seriously BIG data.
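
To get a feel for that number, here is a quick back-of-the-envelope calculation in Python. It takes the 2.5 quintillion figure above at face value and uses decimal units (1 exabyte = 10^18 bytes, 1 terabyte = 10^12 bytes); the variable names are just for illustration.

```python
# 2.5 quintillion bytes per day, written as an exact integer (25 x 10^17).
BYTES_PER_DAY = 25 * 10**17
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Decimal units: 1 exabyte = 10^18 bytes, 1 terabyte = 10^12 bytes.
exabytes_per_day = BYTES_PER_DAY / 10**18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**12

print(f"{exabytes_per_day} exabytes per day")
print(f"about {terabytes_per_second:.1f} terabytes per second")
```

In other words, that daily total works out to roughly 29 terabytes of new data every second.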

The challenge for big data providers, obviously, is getting all that data into one centralized database. First, the data is so huge that normal computers cannot accommodate it, so massive numbers of huge and powerful servers are used. Also, it is spread all over the place. Still, big data companies are making progress. Who are these providers? Well, while many others may exist, the big companies like IBM, Oracle, and SAP are the first that come to mind.

Now that a simplified foundation is laid, if you want to learn more about big data, I recommend visiting IBM’s website. They have a great definition and some examples of use cases that may help to make it all clear. 

Also, here are some articles that show big data at work in a variety of industries: 

