Demo of SAP Predictive Analytics with SAP HANA



Hello, I'm Philip Mugglestone, and I'd like to demonstrate how you can take advantage of predictive analytics on big data with SAP HANA. In this example we're going to look at customer segmentation in real time.

First, let's have a quick look at the data we've got for the customers. For each customer we have information on their lifetime spend, their more recent spend, their income, and also an index of their loyalty. What we'd like to do is group these customers into a small number of groups, or segments, using clustering with predictive analytics. We can see that the cluster column is empty for the moment.

We'll try a first example on 150 customers, just to get a good understanding of the data we're working with. For this we're using the k-means algorithm, which is delivered as standard, in memory, with the SAP HANA Predictive Analysis Library. We can see that the application has created three clusters and how many customers have been put into each of them, 150 customers in total. So it's very simple and straightforward to do.

How about a larger volume of data, though: a whole database of just over a million customers? How long will the clustering take? Let's launch the task, and in fact in just two or three seconds we're able to perform clustering on over one million customers, again using those four variables, and we can see how many customers have been put into each of the three clusters.

What we can also see for our clustering is the silhouette. In this case, where we've chosen to create three segments, the silhouette is about 0.5. A value close to 0 typically means we can't have much confidence in the clusters we've created, whereas the closer the silhouette is to 1, the more confident we can be that the clustering is effective.

Now, if you are a developer wondering how to put this kind of functionality into an application, it's very straightforward. The SAP HANA platform provides a couple of different ways of accessing the predictive analysis capabilities, and you can use them from all sorts of front ends, whether that's the SAP Predictive Analysis tool, your own custom applications, or indeed SAP's own applications, which take advantage of these capabilities as well. In terms of the code required to do what we just did, it's very straightforward: we were using the k-means capability from the Predictive Analysis Library, and we just need to make a very simple call through SQLScript in our custom application. To get the silhouette value, we make the validate k-means call immediately afterwards.
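For reference, the SQLScript involved is along these lines. This is only a minimal sketch: it assumes the _SYS_AFL direct-call interface of the Predictive Analysis Library in SAP HANA 2.0 (the demo itself would have used the earlier AFL wrapper-generator procedures), and the CUSTOMERS table and its column names are illustrative rather than taken from the demo.

    -- Minimal sketch: k-means clustering through the Predictive Analysis Library.
    -- Assumes the _SYS_AFL direct-call interface; output signatures can differ
    -- slightly between HANA revisions, and the table/column names are made up.
    DO BEGIN
      -- Input: one row per customer with the four clustering variables.
      lt_data = SELECT customer_id, lifetime_spend, recent_spend, income, loyalty_index
                  FROM customers;

      -- Control parameters for PAL: ask for three clusters.
      lt_params = SELECT 'GROUP_NUMBER'              AS param_name,
                         3                           AS int_value,
                         CAST(NULL AS DOUBLE)        AS double_value,
                         CAST(NULL AS NVARCHAR(100)) AS string_value
                    FROM dummy;

      -- The k-means call itself; the output table variables receive the cluster
      -- assignments, the cluster centers, a model, and run statistics.
      CALL _SYS_AFL.PAL_KMEANS(:lt_data, :lt_params,
                               lt_result, lt_centers, lt_model, lt_stats, lt_placeholder);

      -- How many customers landed in each cluster?
      SELECT cluster_id, COUNT(*) AS customers
        FROM :lt_result
       GROUP BY cluster_id;

      -- The overall silhouette shown in the demo comes from the separate
      -- validate-k-means function called right after this one.
    END;

The same call runs unchanged whether the input is 150 rows or a million; only the contents of lt_data differ.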
Now, we've seen an example with the Predictive Analysis Library, which runs everything in memory inside the HANA database. We don't yet have every capability we might want in the Predictive Analysis Library today, however, so we also offer the ability to perform the analysis using R, working hand in hand with SAP HANA on the server. For this example we can run another clustering on the same data, this time with R, and we'll choose to create five clusters. Again, in one or two seconds the process is done and we're able to group those customers, this time using a clustering algorithm available in R. On the development side, the only difference is that we embed a little bit of R script into our application on the server side.
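The server-side embedding he describes looks roughly like the sketch below. It assumes SAP HANA's R integration (an Rserve host configured for the database), uses R's built-in kmeans() as a stand-in for whatever algorithm the demo actually used, and all table, type, and column names are invented for illustration.

    -- Illustrative table types for the R-based clustering call.
    CREATE TYPE tt_customers AS TABLE (
      customer_id    INTEGER,
      lifetime_spend DOUBLE,
      recent_spend   DOUBLE,
      income         DOUBLE,
      loyalty_index  DOUBLE
    );
    CREATE TYPE tt_clusters AS TABLE (
      customer_id INTEGER,
      cluster_id  INTEGER
    );

    -- The R script is embedded directly in the procedure body; HANA ships the
    -- input table to the configured R server as a data frame and reads the
    -- output data frame back into the result table.
    CREATE PROCEDURE cluster_customers_r (IN data tt_customers, OUT result tt_clusters)
    LANGUAGE RLANG AS
    BEGIN
      fit <- kmeans(data[, c("LIFETIME_SPEND", "RECENT_SPEND", "INCOME", "LOYALTY_INDEX")],
                    centers = 5)   # five clusters, as in the demo
      result <- data.frame(CUSTOMER_ID = data$CUSTOMER_ID,
                           CLUSTER_ID  = as.integer(fit$cluster))
    END;

From the application's point of view this is called like any other SQLScript procedure; where the R code actually runs is a configuration detail of the HANA system.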
So we've seen how we can use the Predictive Analysis Library and how we can use R. From a user's perspective, though, what's important is to work out how many clusters is optimal for us: are we creating three, or five, or some other number? One thing we can do, by taking advantage of HANA's ability to do this in real time, in memory, is have the application run through all of the cluster counts from 2, 3, 4, 5 up to 10, perform the clustering for each, and collect the silhouette values, looking for the highest or best-fitting silhouette. Sorting the results in descending order, we can see that the best silhouette is where K, the number of clusters, is 2, so the most appropriate number of clusters for this data is probably 2. We can use the power of HANA to make that possible, extremely quickly and in real time.

A second aspect of this example application is time series analysis, performed here with exponential smoothing, one of the many algorithms available in the Predictive Analysis Library today. We've got similar information here: a million sales transactions. The difference is that these are sales over time, transactions over the last couple of years for a number of electronics items. What we want to do is to better understand the patterns in how the sales have developed over time, and also use that to forecast what future sales might be. We use the predictive analysis capabilities of HANA to make that happen.

We can run a first example over the full set of one million sales transactions, a fair amount of data, and perform a simple time series analysis. We get a blue bar for each of the actual sales figures: HANA has aggregated over a million transactions in real time for the entire data set and presented the results at a monthly level. The green line shows the fit, which is the predicted amount based on single exponential smoothing.

What's perhaps more interesting is to look at a subset. We have about five different product categories, so let's do a quick analysis of cameras. This time HANA has not only aggregated the data in real time but also filtered it down to just the camera subset, so the numbers are a little lower than before. If we look at the fit, the green line seems to lag behind; it's almost one period behind the actual values. That's quite typical for single exponential smoothing, which involves a lag factor, but there are other, more sophisticated exponential smoothing options available to us. To avoid that lag we can try double exponential smoothing. Running the example again in real time, HANA aggregates the data and performs the fit, and this time the line looks much better. With double exponential smoothing we've also been able to forecast a few months into the future. The fit looks very good and is particularly interesting here because this data trends, very gently upwards, with bigger sales in more recent periods, so it shows how well the predictive analysis capabilities have been able to fit this data.

We might also want to look at a different product category, for example televisions. We rerun the analysis, which HANA again does interactively and very quickly even on a very large volume of data, and we can see that the fit is not as good as it was a moment ago. Looking at the monthly summaries, every second month in the quarter seems to be a little low, so there may be some seasonality in how the data evolves over time, and double exponential smoothing may not be the best fit. Another method available is triple exponential smoothing, which takes seasonality into account, whether quarterly, annual, weekly, or daily, as well as the trend that double exponential smoothing handles. If we try a triple exponential smoothing example, the green fit line looks a little better, although it still doesn't seem to sit in quite the right place all the time.

With HANA we have a lot of options to fine-tune the fit. For example, there is a seasonal smoothing factor between 0 and 1. If we move that factor up a little, to give more weight to recent data and a little less to the more distant data from two years ago, we may get a better fit. At this point we can also choose to pull the forecast forward a little further. We run our time series analysis again in real time, and now the fit is much closer, and we have eleven months of forecast into the future based on the seasonal model we've put together.

So what we've seen with HANA is that both business analysts and data scientists can take advantage of predictive analytics in big data scenarios with the in-memory data processing capabilities of HANA, and also that developers can build custom applications, this being one example, that inherently embed and integrate these capabilities. Thank you.
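For developers who want to reproduce the time series part, the exponential smoothing functions follow the same call pattern as k-means. The sketch below corresponds roughly to the final triple exponential smoothing run, with a raised seasonal smoothing factor and an eleven-period forecast; it again assumes the _SYS_AFL interface, and the SALES table, its columns, and the parameter values are illustrative rather than taken from the demo.

    -- Minimal sketch: triple exponential smoothing on monthly sales with PAL.
    -- Procedure and parameter names can differ between HANA revisions, and the
    -- source table is an assumption for illustration.
    DO BEGIN
      -- Aggregate the raw transactions to one value per month for one category.
      lt_series = SELECT CAST(ROW_NUMBER() OVER (ORDER BY sales_month) AS INTEGER) AS period_id,
                         CAST(SUM(amount) AS DOUBLE)                               AS sales_amount
                    FROM sales
                   WHERE category = 'Televisions'
                   GROUP BY sales_month;

      -- GAMMA is the seasonal smoothing factor (between 0 and 1), CYCLE the
      -- season length, FORECAST_NUM the number of periods to forecast ahead.
      lt_params = SELECT 'GAMMA' AS param_name, CAST(NULL AS INTEGER) AS int_value,
                         CAST(0.7 AS DOUBLE) AS double_value, CAST(NULL AS NVARCHAR(100)) AS string_value
                    FROM dummy
                  UNION ALL
                  SELECT 'CYCLE', 12, NULL, NULL FROM dummy
                  UNION ALL
                  SELECT 'FORECAST_NUM', 11, NULL, NULL FROM dummy;

      CALL _SYS_AFL.PAL_TRIPLE_EXPSMOOTH(:lt_series, :lt_params, lt_forecast, lt_stats);

      -- Fitted values for the historical months plus the eleven forecast periods.
      SELECT * FROM :lt_forecast;
    END;

Swapping in single or double exponential smoothing is a matter of calling the corresponding PAL function with its own parameter set; the surrounding aggregation and filtering stay the same.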

One Comment

  1. Jens Wachtel said:

    Hello saptechnology. I don't think it's possible but is the source code and date of the demo available somewhere?

    June 29, 2019
