What is big data?

How would you define Big Data? Big hype, Big BS? The best definition I have heard thus far is from George Dyson, who recently stated at the Strata conference in London that the era of big data started “when the human cost of making the decision of throwing something away became higher than the machine cost of continuing to store it.” What I like about this definition is that it frames the term more abstractly and therefore has broader validity. It implies a relation between two kinds of costs and thus does not depend on absolute values in terms of TBs. Classical definitions, such as the one you can find on Wikipedia, talk about big data being “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools.” But what will “on-hand” databases look like ten years from now, and won’t they be capable of easily processing what we today consider to be big?

There are also the three “Vs” that are supposed to be typical for Big Data: Volume, Variety and Velocity. As noted above, volume is a moving target: dozens of terabytes, petabytes, whatever. There are sectors that dealt with what today is considered big data long before the term was coined. For example, scientific data volumes have always been among the largest; just think of the vast amounts of data that are being processed at CERN (one petabyte per second) and in the LHC Computing Grid (many petabytes per year).

Variety is something that has been managed in the past as well, before big data became such a hype. As a data miner you always look for new and different data sources to increase the quality of your models and the amount of variance they can explain. When I worked at Otto we looked into integrating messy call-center data and weather data into models that would predict customer responses, i.e. sales, so that catalogues could be sent out more efficiently. At XING we found a way to integrate the non-standardized tags with which users can describe their profiles (wants, haves), which definitely falls under the data variety aspect (see paper).

Thus in my opinion the most notable aspect of big data is velocity: the aim is to get away from batch processes that update your data, analyses and models once a day, and to move to online streaming updates. Recommender systems are a good example of such a use case: their underlying models shouldn’t be updated only once a day but online, to reflect current trends in user behaviour. If users suddenly started buying product Z together with product A, you wouldn’t want to wait until the next day to update your recommendations. You would lose sales by not recommending product Z to users who have already looked at product A but not at Z. Another example is the notorious prediction of click-through rates (CTR) in online marketing environments. This is done to decide which ad to serve or to determine how much to spend on an ad. Here is another interesting use case of real-time analytics: http://userguide.socialbro.com/post/16003931427/how-can-real-time-analytics-for-twitter-be-useful-for-yo. One of the more important technologies for stream processing of data is Storm (http://storm-project.net/). The most prominent user of Storm is Twitter, which in 2011 acquired BackType, the company that developed Storm.
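To make the recommender example a bit more concrete, here is a minimal sketch in Python of what "updating online" could look like. It is only a toy under stated assumptions: the class name OnlineCooccurrenceRecommender and its methods are made up for illustration, the model is a plain co-occurrence count rather than anything a production system or Storm would use, and everything lives in memory on a single machine.

```python
from collections import Counter, defaultdict
from itertools import permutations


class OnlineCooccurrenceRecommender:
    """Toy item-to-item recommender (hypothetical, for illustration only).

    The model is just a table of co-purchase counts that is updated
    per event instead of being rebuilt in a nightly batch job.
    """

    def __init__(self):
        # co_counts[a][b] = number of baskets seen so far containing both a and b
        self.co_counts = defaultdict(Counter)

    def update(self, basket):
        """Fold one purchase event into the model as soon as it arrives."""
        for a, b in permutations(set(basket), 2):
            self.co_counts[a][b] += 1

    def recommend(self, item, k=3):
        """Return up to k items most often bought together with `item`."""
        counts = self.co_counts.get(item, Counter())
        return [other for other, _ in counts.most_common(k)]


model = OnlineCooccurrenceRecommender()
model.update(["A", "B"])
print(model.recommend("A"))  # ['B']

# Users suddenly start buying Z together with A.
for _ in range(2):
    model.update(["A", "Z"])
print(model.recommend("A"))  # ['Z', 'B']
```

The point of the sketch is only that product Z shows up in the recommendations for product A right after the events arrive, not a day later. In a real streaming setup the update step would run inside the stream processor and the counts would probably need some form of decay so that old trends fade out.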
