Prince
Author of Genesis. A technical consultant with experience in information technology and services. Loves exploring and developing new, innovative, and interesting things. Knowledgeable in an assortment of programming languages, scripting languages, operating systems, and applications.

Big Data: What Does it Really Mean?

A simple article explaining what big data really means.

Summary

Every day in the U.S., some 7 billion shares of stock or other securities are traded on various financial exchanges, and fully two-thirds of these trades are executed by computers using algorithms to trade with other computers, without human participation. Roughly 33,000 discrete trades take place every second on the New York Stock Exchange, and of course they must (and do) take place in a particular order, one trade at a time, each trade separated from the next by just a few microseconds.

Financial transactions generate an enormous quantity of data to be stored, processed, and re-processed. But all human activity is now generating such data at an accelerating rate. From investing and retail spending, to Web browsing, social media, and mobile phones, it's estimated that 90% of all data in use today have been accumulated within just the last two years!

The term “big data” is often taken to stand for the enormous quantity of data, but in a classic 2001 Gartner report, analyst Doug Laney suggested that problems are created not just by the volume of data being generated, but by the increasing velocity at which it is being created (33,000 ordered trades per second, for instance), as well as by its proliferating variety (think of “unstructured” data such as text documents, videos and pictures, for instance).

If you have been as baffled as I have been about why “big data” is somehow qualitatively different from not-so-big data, then I highly recommend a brand-new book on the subject by Viktor Mayer-Schonberger and Kenneth Cukier, entitled Big Data: A Revolution That Will Transform How We Live, Work, and Think. The authors’ definition of “big data” was worth the read, all by itself:

"Big data," they say (emphasis mine), "refers to things one can do at a large scale that cannot be done at a smaller one.”

And it really is that simple. Data now have sufficient volume, velocity, and variety that we can do things with data today that would have been impossible before.

For example, rather than having to settle for processing statistical samples of data, computers can now troll through complete, comprehensive data sets, which makes a big difference.

  • Xoom, a “big data” firm specializing in the analysis of international money transfers, detected a slight aberration in the volume of Discover Card transactions coming from New Jersey. While each transaction looked fine on its own, the pattern as a whole revealed criminal activity, a detection that would have been impossible with just a sample of the data (a toy sketch of this effect follows the list).
  • AirSage analyzes the geo-locations of millions of mobile-phone subscribers to compile real-time traffic reports in more than a hundred US cities.
  • MarketPsych compiles real-time sentiment analysis indices based on tweets, portraying the changing amounts of optimism, gloom, fear, anger, and other emotions that tend to drive securities trading, in one-minute increments across 119 countries.
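
The book doesn't describe Xoom's actual methods, but the statistical intuition behind "complete data beats a sample" is easy to sketch. In the toy Python below (all numbers are hypothetical), a subtle 2% aberration in a region's daily transaction volume stands out clearly against the complete data set, yet is swamped by noise when only a 1% sample is examined.

```python
import math
import random

# Toy sketch (not Xoom's actual method): a region's daily transaction count
# drifts up by a subtle 2%. Against the complete data the shift stands out
# clearly, but in a 1% sample it is buried in sampling noise.
random.seed(7)

BASELINE = 200_000                 # hypothetical typical daily count for the region
observed = int(BASELINE * 1.02)    # today's true count: a subtle 2% aberration

def z_score(count, expected):
    """How many Poisson standard deviations a count sits from its expectation."""
    return (count - expected) / math.sqrt(expected)

# Full data: every transaction is counted, so the 2% shift is roughly 9 sigma.
print("full-data z-score:", round(z_score(observed, BASELINE), 1))

# 1% sample: keep each transaction with probability 0.01; noise hides the shift.
rate = 0.01
sampled = sum(1 for _ in range(observed) if random.random() < rate)
print("1% sample z-score:", round(z_score(sampled, BASELINE * rate), 1))
```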

Now contrast the "big data" approaches mentioned above to how data have been traditionally compiled. The Consumer Price Index, for instance, is the most common measure of inflation in the US, published every month by the Bureau of Labor Statistics and relied on by banks and businesses, not to mention the Federal Reserve and any number of other governmental and international agencies. Calculating these figures costs $250 million a year, and involves hundreds of government staff making phone calls, visiting stores, and sending emails to track some 80,000 prices. Then, a few weeks after each monthly survey is fielded, the new CPI figure is released. The problem, as Big Data's authors point out, is that sometimes even "a few weeks can be a terribly long lag” (as demonstrated when the 2008 financial crisis hit).

A "big data" alternative approach to tracking inflation was hatched by two MIT economists, using software to crawl the Web and collect more than half a million prices of products sold in the U.S. every single day (that's roughly 200 times as much data as is being sampled each month by the BLS). The "PriceStats" tool created by Alberto Cavallo and Roberto Rigobon now produces more accurate CPI indicators on a real-time basis, and for pennies on the dollar. Is it useful? Well, their project detected a noticeable deflationary swing in prices in September 2008, immediately following the Lehman bankruptcy, while official CPI data didn’t report it until two months later, in November!

Having a universe of data also means that the figures don't have to be flawlessly accurate. If there is a lot of data, we can tolerate some messiness, which is in stark contrast to the prevailing idea that governs how samples of data are drawn for analysis. If all you have is a sample, then every point in the sample really counts, and even slight errors can throw off your conclusions. So if it costs money to put your sample together, you want to make sure the sample is accurate. But the pricing data that Cavallo and Rigobon retrieved by scouring the Web can afford to be messy precisely because it is exhaustive, and the errors tend to cancel each other out.

One of the bedrock principles of doing business with big data is that the data will inevitably include errors and mistakes, but the radically increased volume, variety, and velocity of the data tend to overwhelm those errors. Because the errors are largely random and the data is nearly universal, the messiness can be tolerated.
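
A tiny simulation makes the point concrete. In the hypothetical Python below (all figures invented), the same random recording error is applied to a small sample and to a million observations; because the errors pull in both directions, their effect on the average shrinks roughly with the square root of the number of data points.

```python
import random
import statistics

# Toy illustration of why messiness can be tolerated at scale: the same sloppy
# +/- 2.00 recording error that dominates a 100-point sample mostly cancels out
# across a million observations, since the typical error of an average shrinks
# roughly as 1 / sqrt(N).
random.seed(1)
TRUE_MEAN = 10.00

def noisy_mean(n, error=2.0):
    """Average of n observations, each off by a uniform error of up to +/- error."""
    return statistics.mean(TRUE_MEAN + random.uniform(-error, error) for _ in range(n))

print("error of a 100-point messy sample:    ", abs(noisy_mean(100) - TRUE_MEAN))
print("error of 1,000,000 messy observations:", abs(noisy_mean(1_000_000) - TRUE_MEAN))
```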

Some cautions are appropriate when we consider big data’s promise. In addition to the privacy issues involved in such massive troves of instantly available data, we’re all going to have to become better at understanding the scientific method. There is an established discipline called “evidence-based medicine,” in which doctors are encouraged to use good data practices rather than their own hunches to diagnose and treat patients more effectively. All of us in business will need to start thinking more carefully about how we evaluate, relate to, and use evidence ourselves.

Big Data's authors suggest that business managers everywhere will soon be required to have at least a basic knowledge of how to use and relate to data, arguing that in the near future “Mathematics and statistics, perhaps with a sprinkle of programming and network science, will be as foundational to the modern workplace as numeracy was a century ago and literacy before that.”

And as the book Extreme Trust argues, all this data is radically increasing the level of transparency in the world, which makes it ever more difficult and costly to keep secrets or tell lies. Big Data will indeed mean big changes, for governments, businesses, and all of us. The latest issue of Customer Strategist, the Peppers & Rogers Group journal, is substantially devoted to how Big Data will change business competition. And it will be a Big Change.

Hope you find this article useful.
