A lot of people are talking about Big Data these days, but what is it, and why is it important? I talk to a fair number of customers throughout the day, and I get the impression that many don’t really understand what big data is, what it can do, or why it matters. So I wanted to use this post to try to clear that up.
In short, big data is just
what it sounds like; it is a lot of data stored in a database and ultimately
used to create reports that tell a story. Because there is so much data, the
story it tells is generally pretty accurate. The bigger the data, the more
accurate it is likely to become.
With that basic definition done, let’s back up and define a couple of terms. First, what is data? Data are pieces of information. For example, data would include names, purchase history, medical information, internet searches, and so on; really, any information.
The next term to define is database. A database is an organized ‘holding bin’ for information. For example, you can think of your iPod as a database of music or your phone’s contact list as a database of your friends’ contact information. The important thing about a database is that it is well organized. Each data element is in its own field, so a person can report on each element or on various combinations of the elements.
As a visual person, I find the best way to understand databases is by thinking of Microsoft Excel. As you probably know, Excel has rows and columns that can be sorted, searched, and so on. You can think of a database as a giant Excel sheet where each column has a title in the top row and each field down that column contains a distinct piece of information. We’ll use a phone’s contact list as an example (yes, this is a sample of my real contact list, haha):
Notice how, in the example,
each individual piece of information is in its own box. The column headers tell
us what is in the fields below. Now, you might say, ‘my contact list doesn't look
like that’ and you’d be right. When information is displayed to the user, the
information is pulled from these organized tables and displayed in a prettier
way. That is called a user interface.
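To make the "organized table versus prettier display" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names, the phone numbers, the field names, and the two helper functions `display_card` and `column` are made up and are not taken from my real contact list or from any particular phone.

```python
# Hypothetical contact table: each dict is one row, each key is one column.
# All names and values here are invented for illustration.
contacts = [
    {"first_name": "Ann", "last_name": "Example",
     "city": "Nowhereville", "state": "CA", "phone": "555-0100"},
    {"first_name": "Bob", "last_name": "Sample",
     "city": "Nowhereville", "state": "CA", "phone": "555-0101"},
]


def display_card(contact):
    """Format one stored row the way a contact app might show it."""
    return (
        f"{contact['first_name']} {contact['last_name']}\n"
        f"{contact['city']}, {contact['state']}\n"
        f"Phone: {contact['phone']}"
    )


def column(rows, field):
    """Pull a single column (field) out of every row."""
    return [row[field] for row in rows]


print(display_card(contacts[0]))      # the "user interface" view of one row
print(column(contacts, "last_name"))  # a report on one element across all rows
```

The stored rows never change; only the way they are presented does. That is the difference between the table and the user interface.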
Now, with that defined, let’s go back to big data. In my example above, I have only 6 rows containing 9 columns of specific information types (first name, last name, etc.). With big data, those rows and columns would be almost innumerable but still well organized. With that much information, you can generate massive reports that tell a story. For example, the story the contact list above tells us is that all of my contacts live in Nowhereville, CA. That is a story. If the database were bigger, we might be able to see the percentage of my contacts that live in Nowhereville as compared to other places where I have contacts. With enough information, one may be able to predict the probability that the next contact I add to my list will or will not live in Nowhereville, CA.
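Here is a rough sketch of that percentage idea in Python. The list of cities is made up (including the city name "Elsewhere"), and `share_by_city` is a hypothetical helper written just for this post. Treating a city's share as the chance that the next contact lives there is a deliberately naive estimate, not a real predictive model.

```python
from collections import Counter


def share_by_city(cities):
    """Return each city's share of the list as a percentage."""
    counts = Counter(cities)
    total = len(cities)
    return {city: 100 * n / total for city, n in counts.items()}


# Made-up data: a bigger contact list where not everyone lives in one place.
cities = ["Nowhereville"] * 6 + ["Elsewhere"] * 2

shares = share_by_city(cities)
for city, pct in shares.items():
    print(f"{city}: {pct:.0f}% of contacts")
```

With only eight contacts, that estimate is shaky. The more rows there are, the more the percentages settle down, which is the whole point of having a lot of data.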
With big data there will be more and different columns (and definitely more rows). Those might include things like purchasing history, terms searched on Google, medical information... you name it. With enough data, other, bigger stories emerge. For example, we might see that people in Nowhereville, CA all buy a particular type of widget or that people in Nowhereville tend to get a particular disease.
This quantity of data can help to find correlations that no one knew existed. In medicine, for example, we may find that a particular type of person, with particular habits, with a particular disease, and taking a particular medication, has a higher rate of ear infections than people who do not meet the same criteria. How is that valuable? Well, if we know what specific factors contribute to a disease or condition, we can work proactively to prevent it, thus improving length and quality of life.
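The comparison described above boils down to splitting people into two groups and comparing a rate between them. A minimal sketch, with completely made-up records and a single hypothetical criterion (`takes_medication`) standing in for the full set of habits, disease, and medication:

```python
def infection_rate(people):
    """Fraction of the group with an ear infection on record."""
    if not people:
        return 0.0
    return sum(p["ear_infection"] for p in people) / len(people)


# Invented records for illustration only; not real medical data.
people = [
    {"takes_medication": True, "ear_infection": True},
    {"takes_medication": True, "ear_infection": True},
    {"takes_medication": True, "ear_infection": False},
    {"takes_medication": False, "ear_infection": False},
    {"takes_medication": False, "ear_infection": True},
    {"takes_medication": False, "ear_infection": False},
    {"takes_medication": False, "ear_infection": False},
]

# Split into people who meet the criterion and people who do not.
matching = [p for p in people if p["takes_medication"]]
others = [p for p in people if not p["takes_medication"]]

print(f"Meets criteria: {infection_rate(matching):.0%}")
print(f"Does not:       {infection_rate(others):.0%}")
```

Seven rows prove nothing, of course. A gap like this only starts to mean something when it holds up across a very large number of records.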
Big data is useful not only in medicine. It can also be used in business and marketing to identify buying behaviors and develop predictions that help marketers to “speak” the right language to the right people to improve sales. In law enforcement, trends can be identified that may predict crime before it happens. The possibilities are as enormous as the data.
The next logical question is
where the data comes from. That’s simple – you are on it right now. The data
comes from computers. They are everywhere and can record everything. For
example, when you buy something at the store, there is a computer at work
recording the details. The credit card company gathers data about the purchase
such as the product, the store and location, etc. What if you pay cash? Haven’t
you ever had a store clerk ask for your zip code, phone number, or email
address? Do you belong to any store’s discount or rewards club? It’s all data.
What about driving? Your car more than likely has a computer in it. Do you have
E-Z Pass? Do you use GPS? Data. Internet searches – data. Social media posts –
data. Cell phone usage – data. Data is recorded EVERYWHERE. According to IBM, we create 2.5 quintillion bytes of data every day. That is 2,500,000,000,000,000,000 bytes, or units of information. Put it all together, and voila, you have seriously BIG data.
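A number with eighteen zeros is hard to picture, so here is a small piece of arithmetic that restates the quoted figure in more familiar units. It assumes decimal units (1 gigabyte = 10^9 bytes, 1 terabyte = 10^12 bytes, 1 exabyte = 10^18 bytes), and the constant names are my own.

```python
# The daily figure quoted above: 2.5 quintillion bytes.
BYTES_PER_DAY = 2_500_000_000_000_000_000

# Decimal (SI) unit sizes, assumed for this sketch.
GIGABYTE = 10**9
TERABYTE = 10**12
EXABYTE = 10**18

SECONDS_PER_DAY = 24 * 60 * 60

exabytes_per_day = BYTES_PER_DAY / EXABYTE
gigabytes_per_day = BYTES_PER_DAY // GIGABYTE
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / TERABYTE

print(f"{exabytes_per_day} exabytes per day")
print(f"{gigabytes_per_day:,} gigabytes per day")
print(f"about {terabytes_per_second:.1f} terabytes every second")
```

In other words, the quoted figure works out to 2.5 exabytes a day, or roughly 29 terabytes every second.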
The challenge for big data providers, obviously, is getting all that data into one centralized database. First, the data is so huge that normal computers cannot accommodate it, so massive numbers of large, powerful servers are used. Second, the data is spread all over the place. Still, big data companies are making progress. Who are these providers? Well, while many others exist, big companies like IBM, Oracle, and SAP are the first that come to mind.
Now that a simplified
foundation is laid, if you want to learn more about big data, I recommend visiting
IBM’s website. They have a great definition and some examples of use cases that
may help to make it all clear.
Also, here are some articles that show big data at work in a variety of industries: