It was my first time at a “Predictive Analytics World” conference and I was quite curious how it would differ from other data analytics conferences. I was especially interested in how it would compare to the Strata conference in London, which took place at the beginning of October. So let me jump right into the talks that I found noteworthy:
The keynote by Eric Siegel, who is also the program chair of this conference, was about uplift modeling and how it represents a paradigm shift compared to classical response and churn modeling. His definition of uplift modeling is “analytically modeling to predict the influence on a customer’s buying behavior that results from choosing one marketing treatment over another.” So instead of trying to model the behavior itself, the aim is to model the marketing influence on behavior. The key idea is that, contrary to ordinary response modeling, it takes into account that a marketing campaign can have a negative effect on some recipients (the so-called “Do-Not-Disturbs”) or no effect at all (“Sure-Things” and “Lost-Causes”). Not contacting these segments increases campaign ROI. Building such incremental-gain models is more complex than e.g. ordinary logistic regression or decision-tree modeling, since you are working with two different data sources. If you are interested in the details you should check out this paper by Radcliffe and Surry: http://stochasticsolutions.com/pdf/sig-based-up-trees.pdf . A prerequisite for uplift modeling is having two datasets: one with users who received a specific treatment (campaign A) and a control group who didn’t (or who received campaign B). A single user can only be in one of those sets. On the basis of these two sets one model is developed. I haven’t figured out yet whether it’s actually worth going through the effort of developing my own uplift modeling algorithm in R according to the paper above; apparently this more complex modeling approach only pays off in very specific settings.
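If you just want to play with the basic idea, here is a minimal sketch of the simple “two-model” approach in R (not the significance-based uplift trees from the Radcliffe/Surry paper); the data frame `campaign` and its columns are made up for illustration:

```r
# Minimal "two-model" uplift sketch (NOT the tree-based method from the paper).
# The data frame `campaign` and its columns (converted, treated, age, spend, tenure)
# are hypothetical.
treat   <- subset(campaign, treated == 1)   # users who received campaign A
control <- subset(campaign, treated == 0)   # control group (or campaign B)

# One response model per group
m_treat   <- glm(converted ~ age + spend + tenure, data = treat,   family = binomial)
m_control <- glm(converted ~ age + spend + tenure, data = control, family = binomial)

# Uplift = predicted response with treatment minus predicted response without it.
# Positive uplift ~ "Persuadables"; negative uplift ~ "Do-Not-Disturbs".
campaign$uplift <- predict(m_treat,   newdata = campaign, type = "response") -
                   predict(m_control, newdata = campaign, type = "response")

head(campaign[order(-campaign$uplift), ])   # best candidates to contact first
```

If I read the paper correctly, this naive two-model variant is exactly what Radcliffe and Surry argue against in favor of modeling the difference directly, so take it only as a way to get a feel for the idea.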
One of the reasons I wanted to attend the PAW conference was Suresh Pillai’s talk on attribution modeling at eBay. Attribution modeling and user journey analysis are topics I am quite interested in. Basically, attribution modeling is the attempt to attribute the adequate portion of a conversion’s success to a specific online marketing campaign. For example, if a user clicked on a banner ad for eBay, afterwards searched for a specific product on Google, clicked on an eBay PPC ad and finally bid on an auction, how much of this “conversion” (the actual bid) should be attributed to the banner click and how much to the Google ad? Currently the most common attribution “model” is the last-click model, i.e. in our case the conversion would be attributed entirely to the Google PPC ad. This is obviously wrong for many reasons, as are other arbitrary “toy” models such as uniform, first-click or U-curve attribution. eBay’s approach is to look at the baseline of a user’s activity before contact with a specific marketing campaign and to analyse the incremental gain after the contact. What they found is that the channel in which the contact happens (e.g. search ads vs. display banners) matters less than the specific user behavior before the online marketing contact, for example captured in RF segments (recency and frequency of activity).
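To make the “toy” models concrete, here is a tiny R sketch of last-click vs. uniform attribution for a single converting journey. The journey vector is made up; eBay’s baseline/incremental approach is not reproduced here, since the talk didn’t spell out the exact math behind it:

```r
# Toy last-click vs. uniform attribution for one converting journey.
# The journey vector is hypothetical; real journeys come from user-level tracking data.
journey <- c("display_banner", "google_ppc")   # ...ends in a conversion (the bid)

last_click_attribution <- function(journey) {
  credit <- setNames(rep(0, length(unique(journey))), unique(journey))
  credit[tail(journey, 1)] <- 1   # 100% of the conversion goes to the last touchpoint
  credit
}

uniform_attribution <- function(journey) {
  prop.table(table(journey))      # every touchpoint gets an equal share
}

last_click_attribution(journey)   # display_banner 0.0, google_ppc 1.0
uniform_attribution(journey)      # 0.5 each
```

Both functions are arbitrary allocation rules rather than models of causal impact, which is precisely the criticism above.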
Michael Sinn presented how Otto (Europe’s biggest e-commerce retailer) is using artificial neural networks to forecast demand for specific products. They implemented an algorithm from Blue Yonder (NeuroBayes) which is apparently based on an algorithm also used at CERN and which, according to the talk, outperforms every other commercial data mining tool. Before, more than 63% of their articles deviated from plan by more than +/-20%; afterwards only 11% did. Implementation took three years, with cost savings in the double-digit millions. The models incorporate 135 GB of historic data and 300 million data sets per week with 200 predictors, and generate 1 billion individual forecasts per year.
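NeuroBayes itself is proprietary, but just to make the general idea tangible, here is a minimal neural-network demand forecast in R using the `nnet` package on simulated data; this is only an illustration and has nothing to do with Otto’s actual setup of 200 predictors and 300 million data sets per week:

```r
# Minimal neural-network demand forecast on simulated data (nnet package).
# Purely illustrative; not related to Blue Yonder's NeuroBayes.
library(nnet)

set.seed(42)
history <- data.frame(
  price = runif(500, 10, 100),
  promo = rbinom(500, 1, 0.2),
  week  = sample(1:52, 500, replace = TRUE)
)
history$demand <- with(history, 200 - 1.5 * price + 80 * promo +
                         20 * sin(2 * pi * week / 52) + rnorm(500, sd = 10))

# Single-hidden-layer network; linout = TRUE gives a regression (not classification) output
fit <- nnet(demand ~ price + promo + week, data = history,
            size = 5, linout = TRUE, decay = 0.01, maxit = 500, trace = FALSE)

new_articles <- data.frame(price = c(25, 60), promo = c(1, 0), week = c(48, 48))
predict(fit, new_articles)   # forecast demand for two hypothetical articles
```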
On the second day, Fred Türling’s talk stood out. He talked about how United Internet Media (one of Germany’s top online publishers, with a reach of 30 million users) integrates the information it has about users’ contacts with UIM’s online marketing campaigns on the one hand with CRM data from a client’s data warehouse on the other. The user journey and online marketing data is mainly cookie-based and treated as “interest” data (e.g. interest in cars, sports etc.). The CRM data from a client’s data warehouse provides the “intent” data (what products a user actually looked at). Through the implementation of tags/pixels on a client’s website, up to 5 attributes from the client’s data warehouse are merged with the cookie-based online marketing user journey. This data is then used to develop different kinds of models, e.g. to calculate a prospect’s similarity to existing high-CLV customers. One of the most apparent applications is to identify users who aren’t yet customers of a client but are similar to its valuable customers. The analytics stack at UIM is based, amongst others, on Hadoop, EXASOL, SPSS, Weka, SAS and UNICA. Interesting stuff; integrating CRM data and online marketing data will definitely be the future of online marketing.
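UIM didn’t go into the details of their models, but one way to sketch the “similar to existing high-CLV customers” idea is a simple k-nearest-neighbour lookalike score on the merged attributes; everything below (package choice, data, column semantics) is my own assumption:

```r
# Hypothetical lookalike scoring: how similar is a non-customer cookie profile
# to existing high-CLV customers, based on the merged interest/intent attributes?
library(FNN)   # k-nearest-neighbour search

set.seed(1)
# Made-up merged data: rows = users, columns = (numeric) interest/intent attributes
high_clv  <- matrix(rnorm(200 * 5, mean = 1), ncol = 5)   # known valuable customers
prospects <- matrix(rnorm(1000 * 5), ncol = 5)            # cookies with no customer match

# Average distance to the k nearest high-CLV customers; smaller = more "lookalike"
nn <- get.knnx(data = high_clv, query = prospects, k = 10)
lookalike_score <- -rowMeans(nn$nn.dist)

# Target the prospects that look most like existing valuable customers
head(order(lookalike_score, decreasing = TRUE), 20)
```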
What really blew my mind was Dr. Manfred Beleut’s (Pareq AG) presentation on how predictive analytics and modeling techniques are being applied to predicting a cancer patient’s response to a specific therapy. According to Dr. Beleut, the sad truth is that therapies for cancer patients haven’t really become more effective over the last couple of decades. Thanks to more advanced technology, cancer is diagnosed a lot earlier than in the past, but the subsequent therapies haven’t increased life expectancy significantly. What is new about Dr. Beleut’s approach, compared to the classical “tumor suppressor / oncogene” method, is that it takes into account the reciprocal influences of systems of gene mutations. Beleut’s procedure classifies a patient as being in one of the cancer states A, B, C, D or E. What sounded revolutionary is that this classification is independent of the size of the tumor and correlates with the survival rate across all major types of cancer. As a “business” analyst, my takeaway is to always take into account the systemic nature of the “systems” (duh) we try to model and not to mix up correlation with causality.
Compared to the Strata conference in London, the PAW conference was a lot more business focused. Whereas Strata is more of a data science and big data technology conference, PAW concentrated on specific cases of data mining applications. At the end of PAW’s first day there was a panel discussion on what a data scientist is. The panelists agreed that a data scientist is someone who is proficient in IT/hacking, business and statistics, but not necessarily an absolute expert in any one of those areas. For a data team to work you need experts for each of those topics, since the knowledge required, e.g. of specific machine learning algorithms such as SVM or SVD, plus the IT knowledge to handle big data and make those algorithms scale, is in sum too much for a single person. This is quite contrary to what is being suggested by others (http://www.forbes.com/sites/danwoods/2012/03/08/hilary-mason-what-is-a-data-scientist/). In my experience, people who are experts in all three areas are very rare, even ones with expert status in two of the three. At Strata there were a couple of people I would consider to be “data scientists”, e.g. Ben Fields (http://strataconf.com/strataeu/public/schedule/speaker/139626) or Klaas Bosteels (http://strataconf.com/strataeu/public/schedule/speaker/139216). If you need to build a data (science) team you shouldn’t wait for the one, but rather hire experts in each of those areas and have someone who is good at project management and knows enough about stats and hacking to efficiently develop the team.