Amplify Predictive Analytics with Data Visualization



Hi everyone, and thank you for joining us for today's webinar, Amplify Predictive Analytics with Data Visualization. I'm Hayley, with RapidMiner, and I'll be your moderator for today's session. I'm joined today by the founder and CTO of RapidMiner, Dr. Ingo Mierswa. Welcome, Ingo.

Hey, good morning everybody, and thanks, Hayley, for the introduction. Today's topic is one I'm really interested in. I think this is one of the three key problems we as data scientists often encounter. The other two are really about ease of use and the skills gap around data science, and of course we always have data quality problems. But the third problem is really about how we can get the greatness we are creating, all the goodness of the models, into the hands of more people. How can we make sure that all the good patterns we find are also exploited and used for improving our business? That is really what this webinar is about: the amplification of all our results with the help of data visualization.

Before we talk about how we can do this, I would like to motivate the problem a little better, and in order to do so I would like to introduce a new character to you: a data scientist named Joe. All right, so let's have a look into the story of Joe. We will see what Joe is doing every single day, and I hope you can relate to this at least a little; many of the people here in the webinar today will probably see problems similar to Joe's. Joe is sitting, like we all do, in front of his computer, working with programs like RapidMiner, sifting through the huge amount of data he has access to. He is really looking for new patterns, trying to optimize the business by figuring out how to do the right things. That is Joe's daily job, and he is doing it every single day. Of course he isn't always finding good stuff, but let's say six months later
Joe finally found that churn pattern in the data. Let's say you have a churn problem, and you are looking for the people who are shifting over to a competitor, for example, or cancelling their contract. So you found some pattern you believe you can actually exploit. In our world, this is really a pot of gold. And that's a pot of gold in the drawing, not a potato; I just want to make sure people don't think my drawing skills are that bad. This pot of gold is important for Joe, and not just for Joe but also for his fellow data scientist colleagues, Fred and Peter. Fred and Peter are totally excited; they congratulate Joe, he gets first prize as the best data scientist ever, the three of them celebrate the whole night, and everybody is happy.

But about a week later something happens: Joe suddenly becomes very sad. Why is that? That's a typical pattern I see as a data scientist, and so do other data scientists. At first you are totally excited about the results, but a little later, when you would like to take the results and transform your business, you figure out that nobody else really cares. It's only Fred and Peter, your fellow data scientist colleagues, who really like what you did; everybody else, not so much. Why is that? In many cases Joe's models are not used, or not even recognized, or their goodness is not recognized, because predictive models tend to be really hard to understand: all those formulas, the mathematics, strange patterns which really take some time to understand fully. That's also why, unfortunately, poor Joe, Fred, and Peter often sit in the basement. Other people do not really recognize the great value they are creating, because they simply do not understand that value, and this disconnect of course is a problem. But Joe thought about what he could do to amplify his voice: how can he take this great pot of gold he found and give it into the hands of more
people, so that the business can actually improve?

Okay, I hope you can relate to the small story of Joe the data scientist, because I certainly can. I have been in this industry for 15 years now, and I have run into this problem over and over and over again: I found a great model with the potential to change the business, but unfortunately I didn't always manage to get this model into production use. Often I did, but not always. So how can I do this in a way that works much better? How can I increase the number of times my models are actually used?

In general, I would like to present a framework to you, and in this framework we talk about the operationalization of models. Think about this: if you create a predictive model, many people believe, well, this is about generating some insights. Yes, it is. Or it's about just predicting what's going to happen. Yes, it is that as well. But if you're really honest with yourself, does it matter that much that just you, Joe, know what's going to happen? Honestly, not that much, because if I just know what happens but I'm not doing anything about it, then nothing really changes. The only thing which really matters to your organization is that you take some action, that you do something today, so that you will get to a better outcome tomorrow. And that's exactly what we call the operationalization of models: you take the model, yes, you do all the scoring and create all the predictions, but then there is something else which needs to happen, and this something else is what we call operationalization. Often it's about turning those predictive insights you created into actions, and then performing those actions. That is the whole vision here for predictive analytics: if you think about it, how can analytics help us to find the right path, the right business action, in the
sense of getting to a better output or outcome tomorrow? I would like to motivate this in general, both for those of you who are new to predictive analytics and for the experienced people, just to help you understand this whole framework around predictive analytics a little better, before we see how we can connect it to data visualization products. I'd like to show you one particular way of operationalization, but let's spend a minute or two on the framework first.

In order to motivate this, let's have a look at a very, very simple example. I think it's the simplest example I could think of, and it's about the weather forecast. If you know that it's going to rain tomorrow, should you bring your umbrella, yes or no? Think about this for a second. You can't give me the answer right now, but if you write it down, or write it in the chat, probably many people will just say: well, sure, if it's going to rain, of course I should bring my umbrella. I, on the other hand, say: well, it depends. And what does it depend on? First of all, yes, I have the prediction; the weather forecast says it's going to rain. But there might be more information available to you. So you know it's going to rain, but maybe the forecast also tells you it's windy, or it's going to be windy. In that case, bringing an umbrella is not a good idea, because if you commute by walking, for example, the umbrella is going to be blown away. So if it's windy and rainy, maybe you should take the car for your commute to work instead. But if you take the car, and it's windy and rainy, many other people will do the same, so there will be more traffic, so you should plan for a longer commute.
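The chain of reasoning in the umbrella example can be sketched as a few decision rules. The field names and the 0.5 threshold are invented for illustration; they are not from the webinar.

```python
# A sketch of the umbrella decision framework described above:
# a simple rain prediction plus extra context (wind) turns into an action.

def plan_commute(rain_prob, windy):
    """Pick a commute action from a rain probability and a wind flag."""
    if rain_prob < 0.5:
        return "walk, no umbrella"
    if not windy:
        return "walk, bring umbrella"  # rain but calm: umbrella works fine
    # Rainy AND windy: the umbrella gets blown away, so take the car,
    # and since everyone else will do the same, plan for more traffic.
    return "drive, plan a longer commute"

print(plan_commute(0.95, True))   # rainy and windy
```

Even this toy version shows the point of the example: the prediction alone does not determine the action; the surrounding context does.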
So you see that even a very simple prediction, like whether it's going to rain, combined with the question of whether you should bring your umbrella, already leads to a fairly complex decision framework, depending on how much other information you have available. That is exactly the point: you go from just gathering information and making the decision yourself, to adding forecasts to that information, and maybe you can go even a little further and use all the collected information to find, or automatically optimize, the right course of business action.

This really comes in four different layers. The four layers start with business intelligence, the more traditional data visualization, something we will definitely see a bit more of today, and they end with something we call prescriptive analytics. Let's go through them with the weather forecast here on the left, and also the churn example we already touched upon on the right. With the pure business intelligence approach, you could look into the data from the past and say, for example: well, in the last year it rained on 231 days. That is really an interesting piece of information; unfortunately it is not going to help you at all with creating a weather forecast for the next day. So this piece of information alone is not particularly helpful to you. For the customer churn use case you are in a similar situation. You could say, for example: well, we lost five million customers last year, which is 23% of our overall customer base. That is good to know, but at the same time it's also shocking, and more specifically, it's too late. You can't do anything about it, because those customers have already moved on. They're gone; you can't keep them any longer. So the pure BI approach, helpful as it is for so many things, is unfortunately not really helping you find the right course of business action, or
at least not an optimal one. You can get a little better insight and build a better gut feeling, but you are most certainly not finding the optimal course of action to get to the best outcome in the future.

Well, maybe you can add a little more predictive style into the mix by saying: maybe I'm not just looking at the last year, I'm looking at the previous three years. I know that it rained 231 days last year, 217 days the year before, and 253 days the year before that. It's probably a pretty safe bet now to say it will rain at least 200 days next year. That's what I call the BI-ish prediction: you basically take the aggregated information from the past, and on this aggregated information you find some trend curves or trend lines, or some thresholds you think will be reached. That's definitely better than nothing, but it's exactly what I meant before: if all you have is this aggregated information and you try to turn insights from the past into predictions for the future, it's a little better, but it's still kind of a gut feeling. How do you know it's really at least 200 days and not fewer? You don't really know, and especially you don't know anything about the concrete question: is it going to rain tomorrow? So it's still difficult to find a course of action. The same is true for the customer base: if you lost 23 percent last year, 21 percent the year before, and 25 percent in the year before that, then it's a safe bet to say you're going to lose at least 20 percent of the customers. So yes, it's somewhat predictive, but not really, and there's typically not a lot of predictive analytics actually involved.
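The "BI-ish prediction" above is nothing more than a floor derived from aggregated history; it can be written in a few lines. The yearly numbers are the ones quoted in the talk.

```python
# The BI-ish prediction: no model, just a "safe bet" from aggregated history.
rain_days_per_year = [231, 217, 253]      # last three years, from the talk
safe_bet = min(rain_days_per_year)        # "at least this many days next year"
print(f"It will rain on at least {safe_bet} days next year")

churn_rate_per_year = [0.23, 0.21, 0.25]  # customer loss, last three years
print(f"We will lose at least {min(churn_rate_per_year):.0%} of customers")
```

Note what is missing: nothing here says anything about tomorrow, or about any individual customer, which is exactly the limitation the talk points out.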
And that's exactly the next phase, the next approach: real predictive analytics. For example, you could take a predictive model now, built from all the particular days in the past and the information you had about those days, and use this model to create the probability of rain for tomorrow. So now you could say, for example: tomorrow it's going to rain with a likelihood of 95%. This is really where most data scientists feel very much at home; this is exactly what we are doing every day, finding those models so that we are able to create predictions for every single situation. In the churn case, we can for example create a prediction which tells us: you will lose John Smith tomorrow with a likelihood of 92%. So this is really the norm for us.

But if you then add the predictions to other information and run optimization methods on top of that, that is what we really call prescriptive analytics, because now you take the other information into account and figure out: with this other information, I actually should go by car and plan for a longer commute. Or, in the case of Mr. Smith in the churn example, you could figure out that I should give Mr.
Smith a call, tell him about our new service improvement initiative, and also offer a three percent discount on renewal. Three percent, not ten percent: three percent is enough to keep John Smith. Finding the right course of action, that is really prescriptive analytics.

And that is interesting, because if you think about the value, then funnily enough the value of the BI-like predictions is often relatively small. You can't really act on them; they are really more insights. Yes, you can make some business changes based on them, and this will definitely deliver some value, but the interesting thing is that going down to the detailed level, and automatically doing the right thing for every single situation, creates even more value. So there is really a kind of operationalization spectrum, and this spectrum is interesting, because for some of the decisions you would not even go down the route of creating millions of smaller predictions. Those are the cases which are infrequent and a little bigger: the big decisions you need to make. At one end of the spectrum here we have, for example, strategic decisions like: should we create a new product line, and what kind of product line? Should we acquire a company? Yes, you can definitely use analytics in general, and also predictive analytics, to figure out what is likely going to happen, but those are really one-off decisions, they take a long time, and automating them probably doesn't make a lot of sense. Even for many tactical decisions, for example defining pricing policies or underwriting policies, full automation doesn't make a lot of sense. But as you go further along the spectrum, the number of decisions typically grows and the duration per decision gets shorter.
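The prescriptive step described above, picking three percent rather than ten because it is just enough to keep the customer, amounts to choosing the action with the best expected value. The offers, costs, effect sizes, and customer value below are invented for illustration; they are not the webinar's data.

```python
# A toy version of the prescriptive step: given a churn probability,
# pick the retention action with the best expected value.
# All numbers here are hypothetical.

def best_action(churn_prob, customer_value, offers):
    """offers: list of (name, cost, churn_reduction) tuples."""
    def expected_gain(offer):
        name, cost, reduction = offer
        saved = customer_value * min(churn_prob, reduction)
        return saved - cost
    offers = [("do nothing", 0.0, 0.0)] + list(offers)
    return max(offers, key=expected_gain)[0]

offers = [("3% discount", 15.0, 0.40), ("10% discount", 50.0, 0.45)]
# John Smith: 92% churn likelihood, $500 lifetime value
print(best_action(0.92, 500.0, offers))   # the cheap offer already wins
```

The point mirrors the talk: the bigger discount retains slightly more customers, but the extra cost outweighs the gain, so the smaller offer is the right prescription.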
Single decisions, for example making a specific hire, or defining a price for a specific product on a specific day, are decisions you can already operationalize a little more. Scale plays an important role here, because the number of decisions grows. And then you go further along into operational decisions, which are fully automated. Examples would be making a cross-selling offer, approving a credit, or even stopping fraudulent transactions, which of course needs to happen extremely fast. Our churn case definitely falls into this operational bucket. If you think, for example, about a telco with 50 million customers that wants to predict who is really loyal and who is about to churn, those are cases where you can't just have a single person look at every case. I'm actually skipping the next slide, because we talked about this already: it's often the most visible thing, and you can save a lot of money by operationalizing and fully automating this.

But the big question really is: is that good? We have millions of decisions, and in this particular case, if you have 50 million customers and you make predictions for every customer every day, you basically end up with billions of decisions. Why not just fully automate this, then? Why not take the predictive model you have created, put it into some automation system (and RapidMiner can actually do this, as many of you might know), just automate it, and forget about it? Well, that's exactly the reason Joe often has problems and sits in the basement, sad about all this. There are many different reasons, sometimes people call them political or whatever, why people do not want to fully automate this whole process and want to have a human being in the loop. I'll give you a couple of examples. Let's say you created a model which is totally disrupting your business
process, and you go to your boss and tell them: hey, look, I found this predictive model; if you would just change your business process, we could reduce our churn rate by 10%. Of course people are excited to hear something like that, but at the same time, somebody needs to make the decision that the whole business process should be changed. To be able to make this decision, you can't just trust some kind of little crystal ball, because that is really what a predictive model, despite all the math and statistics, is for many people. If the trust is not there, that decision won't be made, and you're not going to just automate the whole thing. People first need to understand what's going on and whether it really works, and just presenting a cross-validation and an accuracy estimation is not good enough for that, because that's another thing people don't understand. They need to see it working for some time first and build some trust.

Fear is another interesting aspect. Just imagine going to your doctor, where robots take some pictures automatically, the decision to perform surgery is automatically triggered, and done: based on what the machine learning algorithm says, some robot performs surgery on you. Although that decision might actually be better than the decision any human being would make, I personally (and I really am a data scientist, I am kind of Joe) have some problems with that myself. I can't imagine any time soon that I would just accept the fact that no human being is involved any longer in life-and-death situations or decisions about myself. And that's kind of funny, because although I trust machines with a lot of things, I always like to have some human element in decisions like that. So this is often the case: I still believe that machine learning can support the decision-making
process, for example by pointing out the most likely course of action, or by supporting doctors, pointing out regions on x-rays where there is most likely some cancer. But we human beings often prefer to still have a human in the loop, confirming what the machine is saying.

Then there is rarity. Take landing on the moon: that's not happening every single day, so you probably still want to have some human being there in the loop, although most of those algorithms could completely fly the spaceship alone. A probably better example would be acquiring another company. This is not happening very frequently, and it's also not just based on a question like: if we acquire this company, will it develop positively, what revenue will they make? A predictive model can maybe help you with a revenue forecast, but there are so many other factors: do the company cultures fit, how is the market developing, et cetera. With so many other factors, this is very unlikely to be fully automated any time soon.

And then, last but not least, there is familiarity. People are not very familiar with machine learning models yet, but they are very familiar with data visualization products like Tableau or Qlik, for example. And it's interesting that this might be a very good channel for us as data scientists for sharing information, for sharing our models, for sharing the predictions or prescriptions with other people. I will focus now mostly on this bottom-right bucket, because it can also help you with the other three: if you present your predictions in an environment which other people are comfortable with, it's much more likely that people will accept those results. Okay, so that is really the whole framework, or the
motivation, really, for why you should care about bringing together machine learning, predictive and prescriptive models, and data visualization: you can amplify your voice, you can bring the results in front of many more people in a form they are familiar with, and they can at least support us in choosing the right course of action. How cool would it be, then, to deliver predictions and recommendations for actions right into a normal dashboard? And that's exactly what we are looking at next.

So I'm now shifting gears a little and opening a couple of products. By the way, just an invitation: please feel free to ask questions at any point in time. Hayley is monitoring them and may throw them over to me from time to time, and at the end we will definitely also spend some time answering as many questions as possible. Okay, so let's have a look into Qlik first. Here we go: this is Qlik Sense, a product which was recently released by Qlik in addition to their QlikView product line. It's their data visualization product, a BI product many of you might be familiar with. I created a very simple app here, around churn, so we stay with the churn use case. We have some data for the United States, and here is a typical result in a more BI-like fashion: you take data from the past, so for example we have aggregated information here for the different states of the United States, and we calculate some properties, for example how much churn we have. In Montana it's 21%, you can see this in the table here on the right; in Delaware it's 14%, in Colorado 13%, and so on. With this information you can of course build a visualization like this map here, and we can definitely see that Montana has the highest churn rate, and here we have, for instance, New Hampshire. Well, I only moved here a couple of years ago, so don't ask me about every single state of the United States.
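The per-state churn figures on the dashboard are a plain aggregation of historical records. A sketch of that aggregation, with made-up rows rather than the webinar's data set:

```python
# The BI-style aggregation behind the map: churn rate per state.
# The rows below are invented; real data would come from the billing system.
from collections import defaultdict

rows = [  # (state, churned_flag)
    ("MT", 1), ("MT", 1), ("MT", 0),
    ("DE", 1), ("DE", 0),
    ("CO", 0), ("CO", 0), ("CO", 1),
]

totals, churned = defaultdict(int), defaultdict(int)
for state, flag in rows:
    totals[state] += 1
    churned[state] += flag

rates = {s: churned[s] / totals[s] for s in totals}
for state, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{state}: {rate:.0%}")   # highest-churn states first
```

This is all the pure BI layer gives you: a ranking of the past, with nothing yet about which customers will churn next.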
I would probably fail terribly, so let's not go into more detail here for now. That is a typical situation, and of course I could now drill down here and have a look. For example, if I am a regional manager, I'm most interested in Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, and Connecticut, and I'm actually almost impressed that I found all of them right away. Let's say you are responsible for New England, so this is your region. You can make a selection here and get the updated information: of all those states, New Hampshire actually has the highest churn rate, followed by Massachusetts and so on, and the average churn is 9.78 percent. Those are typical applications: you now know your region has a high churn share, and maybe you should do something about it. But what, exactly? What can you really do?

And that's exactly where the combination of a data visualization product like Qlik and RapidMiner can kick in. This is the first dashboard I created today in Qlik Sense. Let's make this selection, so we have now selected those six states, as you can see at the top, and move on to this prediction tab. In fact, I need to do something first, because the last time I did this I did it for Montana, so I have to reload the calculation for the selection I just made. What happens now in the background is this: I take the selection, those six New England states, and deliver this information about the selection to RapidMiner. RapidMiner takes this information, creates a churn model for those six states, makes a prediction for all the cases we are most interested in, and delivers those predicted cases back into Qlik. That's exactly what happens in the background, and additionally we see the most important influence factors and how they differ for the New England states.
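The round trip just described (selection in, model fit on the labeled rows, predictions for the unlabeled rows back out) can be sketched end to end. The customer records and the per-age-bracket "model" below are invented stand-ins for the real RapidMiner process, just to show the data flow.

```python
# Sketch of the background flow: the dashboard's state selection acts as a
# filter, a churn model is fit on labeled rows, and predictions for the
# unlabeled rows go back to the dashboard. Data and model are hypothetical.

NEW_ENGLAND = {"ME", "NH", "VT", "MA", "RI", "CT"}

customers = [
    {"id": 1, "state": "NH", "age": 25, "churn": "yes"},
    {"id": 2, "state": "NH", "age": 27, "churn": "yes"},
    {"id": 3, "state": "MA", "age": 45, "churn": "no"},
    {"id": 4, "state": "MA", "age": 52, "churn": "no"},
    {"id": 5, "state": "NH", "age": 24, "churn": None},  # unlabeled
    {"id": 6, "state": "CT", "age": 50, "churn": None},  # unlabeled
    {"id": 7, "state": "NY", "age": 26, "churn": None},  # outside selection
]

selected = [c for c in customers if c["state"] in NEW_ENGLAND]
labeled = [c for c in selected if c["churn"] is not None]
unlabeled = [c for c in selected if c["churn"] is None]

def bracket(age):
    return "20-30" if 20 <= age < 30 else "30+"

# "Model": churn rate per age bracket, estimated from the labeled rows only.
rates = {}
for b in {bracket(c["age"]) for c in labeled}:
    group = [c for c in labeled if bracket(c["age"]) == b]
    rates[b] = sum(c["churn"] == "yes" for c in group) / len(group)

predictions = [(c["id"], rates[bracket(c["age"])]) for c in unlabeled]
print(predictions)
```

Note that the New York customer never reaches the model: the selection filter runs first, exactly as the macro-driven filter does in the RapidMiner process shown later.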
If I scroll down here, you can actually see how large the predictions are for your region, and you can see that also in the future, New Hampshire will probably see the most churn; the rest is pretty much equal, compared to New Hampshire at least. So you now get the predictions for your own region: where is churn going to be highest, and where should you maybe focus? In this particular case it was New Hampshire, and it was strong on both dashboards, so it's not a huge surprise, but at least you get this confirmation here as well, while the other states actually changed a little.

You can also see typical reasons for churn here, or maybe not always reasons, but patterns which are sticking out. And indeed, for your region, New England, you can see that this age bracket between 20 and 30 is much more likely to churn, in fact around 2.5x more likely than what you see across all the states. The red bar chart shows the information for all the states, but in your particular region this age bracket is even more important, and the share of males is higher too. So if you now want to work with your marketing team on a couple of churn campaigns, it might be a good idea to focus on this age bracket first, for example male customers between 20 and 30.

So now we know what's going to happen: you get a prediction about your region, you made this selection like you're used to in Qlik Sense, and you get the prediction delivered into Qlik (I will show you later how you actually integrate both products). But can we go a step further? Of course we can, because RapidMiner can not only create the prediction on top of the aggregated information; we can also get the
predictions at the level of individual customers: who are the customers, and how many customers per state? The most are here in New Hampshire, seven I think, followed by Massachusetts and so on. And who are the customers with the highest likelihood to churn? That is exactly what this table here shows, sorted by churn confidence. Those are the people you should focus on first; those are the people who are most likely to churn.

If I scroll down a little, you get to a region where people are actually not very likely to churn, and I also created a so-called upselling model. I didn't talk a lot about this, but you are probably familiar with the concept: this upselling model was created for all those people who are not likely to churn and who are currently using one of our cheaper product packages, P2 and P3, and it figures out who is most willing to purchase the more expensive package P1 instead. So this is now very, very actionable insight: you know exactly who is not churning, you can focus on those people, and you know where you have an opportunity to increase your revenues. If I were responsible for the New England region, I would now know exactly what I need to do, and I could actually act on it.

And this is exactly the difference. On the pure dashboard side, as we data scientists know it, it's like: great, I know what has happened, I would maybe focus on Montana, and that's it. But over here we see: New Hampshire and Massachusetts stand out, but Massachusetts actually becomes less of a problem, for whatever reason, maybe because of other activities you already did. So let's focus on New Hampshire, and those are the people who are churning in New Hampshire; we can see them here and focus on them. It's actionable, and based on predictions of the future. This is of course what RapidMiner can bring to the table.
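The two customer tables on the dashboard, the churn list and the upsell list, are simple transformations of the scored records. The names, packages, confidences, and the 0.5 cutoff below are hypothetical, just to show the shape of the logic.

```python
# The two dashboard tables, sketched: customers sorted by churn confidence,
# and upsell candidates (low churn risk, on a cheaper package).
# All records and thresholds here are invented.

scored = [
    {"name": "John Smith", "package": "P1", "churn_conf": 0.92},
    {"name": "Ann Lee",    "package": "P3", "churn_conf": 0.08},
    {"name": "Bob King",   "package": "P2", "churn_conf": 0.15},
    {"name": "Eva Moore",  "package": "P2", "churn_conf": 0.71},
]

# Table 1: who to call first -- most likely churners on top.
by_risk = sorted(scored, key=lambda c: c["churn_conf"], reverse=True)

# Table 2: upsell candidates -- unlikely to churn, on a cheap package.
upsell = [c for c in scored
          if c["churn_conf"] < 0.5 and c["package"] in ("P2", "P3")]

print([c["name"] for c in by_risk])
print([c["name"] for c in upsell])
```

The point of the two tables is the same as in the talk: one list tells you where you lose revenue, the other where you can grow it, and both are directly actionable.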
And instead of having to understand every single model, people can stay in their usual working environment, which in this particular example is Qlik Sense.

Okay, so how are we doing this? Let's switch over to RapidMiner; I hope you can all see my screen here. This is the process I created, a very simple one. In the beginning I just load the raw data, then I take the parameters from Qlik, and the parameter I'm taking from Qlik is just the states you are selecting. You could also select the year used for modeling, but I didn't explain that, so let's skip it for now. Here we have a filter defined which uses the states coming from Qlik. Some of you might not be familiar with RapidMiner at all: this is a so-called macro, which can be filled from outside of the process. In fact, you can see that I pre-filled it with all the states of the United States, so as a default I just deliver everything; I'm not filtering anything out. That's what's happening here, but Qlik can deliver the inputs, namely the states I clicked on, so these are the states that I, the regional manager for New England, am most interested in.

The rest is fairly typical data mining stuff. If I show you the data here in RapidMiner, I have this churn status column, and you can see that in many cases we don't know the churn status, meaning we don't know whether those people are going to churn, yes or no, while in other cases we actually have this information: some are loyal people, and some are churn cases. So of course, as always, the task is to build a predictive model taking the information we have into account: we filter out the people we know about, create a churn model, and then we apply this model to
the people we don't know about, and those are the predictions we have seen on the second and third dashboard. We do something very similar for the upselling opportunities: we know what packages are being used, and we can try to find opportunities among people who show a profile closer to the higher-value package, which is P1 here. A quick look at the data: we have information about how much they spent in the previous three years on mobile billing and on landlines, their age, the package they are using, and their gender; we saw all of this already in the dashboard. So it's a typical data mining workflow.

What can you do next, then? The next step is that you can easily store this on RapidMiner Server. You install a new server repository, which you can do here: create a repository and point it at the server. After you have created the new server repository, you just save the process there; that's really all you need to do so that deployment works. It's pretty simple. After you've done this, let's now shift to the server; it only takes a few clicks to turn this process into a so-called web service. Let me log in here, assuming I can type my password correctly. All right, so all you need to do is go to the process we just deployed on the server, which is this churn-and-upselling process, take this process, and click on "Export as service" here on the right side. After you've done this, you get a new web service, and this web service is really easy to configure. This is the process which is going to be executed, and you can select what kind of output you want to create: you can create charts or maps, or XML or JSON files, or deliver whatever can be created with RapidMiner, but in this particular case I just go with a very simple table here.
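Once such a process is exposed as a web service, calling it from outside is just an HTTP GET with the inputs encoded on the URL. A sketch of building that request URL; the host name and the parameter names are placeholders, not a real server:

```python
# Building the request URL for a deployed web service whose macros are
# bound to URL query parameters. Host and parameter names are hypothetical.
from urllib.parse import urlencode

BASE = "https://rapidminer-server.example.com/api/rest/process/churn-upsell"

def service_url(states, year):
    # One parameter carries the state selection, one the modeling year,
    # mirroring the two macros bound in the web-service configuration.
    return BASE + "?" + urlencode({"states": ";".join(states), "year": year})

url = service_url(["ME", "NH", "VT", "MA", "RI", "CT"], 2015)
print(url)
# A BI tool (or a plain HTTP client) would then GET this URL and receive
# the prediction table as its data source.
```

The separator character and parameter layout would of course follow whatever the server's web-service configuration actually defines; the sketch only illustrates the binding idea.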
the last thing I can configure is, well, the two macros: one for the year used for modeling, and one for the states I'm most interested in, and I bind URL query parameters to those two macros. What does that mean? Well, if I test this web service on the RapidMiner Server, you can see now that I get a URL, and I can for example say let's go with New York and test this thing, and a few moments later I'm only getting the information for New York instead of the full data set. So I'm getting now all the information, for example what the prediction is; in cases where we knew the status already, we don't need a prediction, and in all other cases we have the prediction, including the confidence and so on. So every single piece of information is delivered now from this web service. Now you can take this URL and feed it into Qlik. So let's move over here: how are you doing this? Well, in Qlik there is this data load editor, and I'm not really truly a Qlik expert myself, I have to admit, but I spent really only a couple of hours learning how to use it, and I have to say I managed quite fast to set it up. All I'm doing here is, well, I define a new data set, I call it churn, and I'm loading all different kinds of columns here, and I rename some of them, all from this URL I just have been specifying. And I can use those parameters here: those are the states which are selected. As a default I select all of them, but you can actually see already how we can select a specific state, and also here the year. This is the way you can access the information that you clicked on in Qlik: you use the dollar symbol and then round brackets and the variable name. If I scroll down a little bit you see the same also here; they actually do the same for both selecting the states and selecting the years.
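What the binding described above amounts to is query parameters appended to the web service URL. Here is a rough sketch of how such a URL could be composed; the host, path, and parameter names are invented for illustration, since a real RapidMiner Server deployment exposes its own URL and the macro names you configured.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, not a real deployment.
BASE_URL = "https://rapidminer-server.example.com/api/rest/process/churn_upsell"

def service_url(states, year):
    # Each selected state becomes one repeated query parameter, mirroring
    # how a dashboard selection narrows the table the service returns.
    params = [("state", s) for s in states] + [("year", year)]
    return BASE_URL + "?" + urlencode(params)

# Testing with only New York selected, as in the demo:
url = service_url(["NY"], 2016)
```

Requesting that URL with the full state list is the "default deliver everything" case; requesting it with one state is the filtered case shown on screen.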
So this is the format, and that's all you do: you define the name, you define what columns you're interested in from the table which is delivered by the web service, and then it's just the URL from the web service, which you can copy here from this direct link, and that's kind of it. Then you click on load data, and that takes a while. All right, so now I loaded the data, and I can go into our app here in Qlik and work with the data as we have seen before. So this is the overall workflow. There is only one small caveat for those of you who use both products and want to try it out: after you made a selection, you need to tell Qlik that it actually needs to reload this data. If you look on Google for a Qlik reload button, you will find a nice solution for how to do this out of the dashboard, which is a little bit more elegant than making you go back to the data load editor. Both ways work: you can reload the data through the editor, which is kind of the standard Qlik Sense way, or you add this reload button to your dashboard, which I find a little bit more comfortable than going to the editor. Okay, so let's go back into the presentation and recap what we just saw. How do you amplify predictive analytics with Qlik, and probably with all the other visualization systems we use? We make use of this deployment mechanism of RapidMiner Server. You design an analytical process first in RapidMiner, then you just save it in the RapidMiner Server repository, and on the server you go into the web interface and turn this process into a web service; that's basically one single click, and we return a table. Now, on the Qlik side, you use this RapidMiner web service as a data source, you design a dashboard like you always would in QlikView or Qlik Sense, and whenever you make a selection, you can also feed it through this mechanism into your RapidMiner processes.
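The dollar-sign expansion mentioned above, where Qlik substitutes the user's selection into the load script before fetching the data, can be pictured with Python's `string.Template`, which happens to use a similar dollar syntax. This is a schematic stand-in, not actual Qlik script: the URL, variable names, and the semicolon-separated state list are all assumptions.

```python
from string import Template

# Schematic stand-in for the idea of Qlik's $(variable) expansion:
# the selected states and year are substituted into the data-load URL
# before the dashboard requests fresh results from the web service.
url_template = Template(
    "https://rapidminer-server.example.com/api/rest/process/churn_upsell"
    "?state=$states&year=$year"
)

def expand(selected_states, selected_year):
    # Join multiple selections into one parameter value for the sketch.
    return url_template.substitute(states=";".join(selected_states),
                                   year=selected_year)

url = expand(["NY", "MA"], 2016)
```

Each new selection produces a new concrete URL, which is why the reload step matters: the dashboard has to re-run the load with the freshly expanded values.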
So you basically react to user selections and, for whatever the user is most interested in, run all kinds of further processing and deliver the results back to Qlik. And on a side note, since we can run all kinds of processes, you can of course not, quote unquote, only use this for creating predictions; you can also do all kinds of data preparation work, especially when it's more advanced data preparation or more advanced statistics, for example finding and removing outliers, or normalizing data, all those kinds of things. You can do this kind of data prep with RapidMiner, as many of you know, and then run those RapidMiner processes to prepare the data for Qlik as well. We of course also have the whole data blending and cleansing functionality as part of RapidMiner, so you can do this too. And since you still work with a full RapidMiner process, with its complete set of operators, there are a lot of operators, for example for sending out emails or triggering other web services; whatever you can do in a RapidMiner process, you can embed this course of action into Qlik through this web service integration. And in general, for those of you who are a little less familiar with RapidMiner, and this might be the first touch point for you: this is really an important point for the whole RapidMiner platform. We really go through all three very important phases of the data science workflow, or the analytical lifecycle. The first phase, or most often the first phase, is really ingesting and, more importantly, preparing the data: connecting all the data points or data sources, making sure that you blend them together in the right way, and cleaning the data. There are hundreds of operators in RapidMiner for doing the data prep part, no matter where the data is coming from or what the scale is; you can do the data prep in Hadoop clusters, in memory, wherever you want.
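As a concrete flavor of the data prep steps just mentioned, here is a minimal sketch of outlier removal and normalization. The z-score threshold and the numbers are made up for illustration; RapidMiner's own operators implement this kind of step, this just shows the idea.

```python
import statistics

# Sketch of two common prep steps mentioned above: drop values far from
# the mean (a simple z-score rule), then min-max normalize the rest.
# The threshold of 1.5 is an arbitrary choice for this toy example.

def remove_outliers(values, z_max=1.5):
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev <= z_max]

def normalize(values):
    """Min-max scaling to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

spend = [100, 110, 105, 95, 1000]   # one obvious outlier
cleaned = remove_outliers(spend)
scaled = normalize(cleaned)
```

Running prep like this server-side before the dashboard loads the table means the visualization layer only ever sees clean, comparable values.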
The same is true for the modeling and validation phase, which is of course key to figuring out what's going to happen and finding the right course of action, so finding the right machine learning model. And that is a central point: there are 250 models in total in RapidMiner, you can try them all out, and you can also do a lot of model optimization and automate this whole process, like doing the right parameter selection and figuring out the right features to build the model on. Most of you will know this, but just for the few of you who have never seen RapidMiner in action before: it's really very powerful there. And of course the validation is equally important, so we know exactly how well those models will work in practice and in the future. And then the last bucket, which is really the one we focused on today the most, is the operationalization bucket. In general, of course, you can deliver all those results into all kinds of business applications, and you can also automatically trigger the execution of certain actions afterwards. Imagine, for example, instead of just visualizing the churn cases, why not create a campaign to those people automatically, let's say in marketing automation software, HubSpot, you name it, and send out this campaign? So we could in theory also automate this whole process, but as we have discussed before, that's not always the best and most appropriate approach. Sometimes really the best approach is just to embed the predictive insights and actions into a data visualization platform like Qlik. So this is one particular example of one particular operationalization. And then, before we wrap up and open this up for questions and answers: I think you really should give this integration a try, and RapidMiner in general. We talked a lot about the operationalization bucket, which is definitely one of the things that makes us unique: how many systems you can connect, and how well the predictive modeling is then integrated into the usual business workflows.
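To make the validation point above concrete, here is a toy k-fold cross-validation. The "model" is a deliberately trivial majority-class predictor on invented labels; RapidMiner's validation operators do this same bookkeeping with real learners, this sketch only shows why the estimate is honest: every row is scored by a model that never saw it during training.

```python
import random

# Toy k-fold cross-validation: estimate accuracy on unseen data by
# holding out each fold in turn. Labels are made up for illustration.

def k_fold_indices(n, k, seed=42):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]   # k disjoint, near-equal folds

def cross_validate(labels, k=5):
    folds = k_fold_indices(len(labels), k)
    scores = []
    for test_idx in folds:
        # Train on everything except the held-out fold.
        train = [labels[j] for f in folds if f is not test_idx for j in f]
        majority = max(set(train), key=train.count)
        hits = sum(labels[j] == majority for j in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / k

labels = ["loyal"] * 80 + ["churn"] * 20
accuracy = cross_validate(labels)   # majority guessing lands near 0.8
```

An always-guess-"loyal" model scoring about 80% on an 80/20 class split is also a reminder that accuracy alone can flatter a useless model, which is exactly why proper validation matters before operationalizing anything.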
You know that RapidMiner really is about the self-service predictive analytics aspect, so coding is optional: you can code if you want to, but you don't have to. It's very easy, it's effortless, as we call it; it guides you with things like our Wisdom of Crowds, so you can actually get recommendations on what the next best step is. Very important for us: this speeds you up in finding the right model. We saw in the beginning, why is Joe still taking six months? Well, because sometimes there is nothing in the data, but in those six months he was running through so many different data sets and trying so many different use cases, and that's only possible because we are accelerating this whole model finding process. And of course, as an open source leader here, we really embrace a lot of innovative solutions coming from ourselves but, more importantly, also from our community, and we embrace other modern solutions as well, especially in the big data space. Yeah, I think I will wrap up at this point. It's a great platform, and Qlik Sense is a great platform as well, so bringing together two great platforms seems to be a very, very good idea. For those of you who have Qlik Sense already, or QlikView, which works in practically the same way, you should definitely give it a try, because this is really the key, as we discussed in the beginning: the ability to really get your voice heard and make sure that your models are really well recognized and then hopefully also fully operationalized, and to optimize your business processes. So at this point I would like to open it up for questions and answers. Haley? Thanks. Thank you for a great presentation today; you covered a lot of really great information and we appreciate your time. So, as Ingo mentioned, we'll take any questions that you might have now. If you could
submit your questions through the questions panel, we'll take those questions now. It looks like we have a question here for you, Ingo: is this integration possible with QlikView Enterprise Server, or is it only possible with Qlik Sense? No, it's also possible with QlikView Enterprise Server, and it works practically in the same way. In fact, I don't have a QlikView really available right now, otherwise I would show it to you; the first time I tried this, I actually created it first with QlikView, and it works practically the same way. By the way, on our documentation server, docs.rapidminer.com, there is a complete set of documentation around QlikView and Qlik Sense and how to integrate this; either it is already published or it will be published in the next couple of days. So it's possible with both, and it's also documented. Great, thanks. I have another question here for you. This person says: I use QlikView mainly; how does one integrate QV and RapidMiner? Is there a QV connector for RapidMiner, and do you have examples of this integration? Unfortunately I don't have an example I could show you right now, which is too bad, but a few points. There are multiple ways. The first thing I should mention: in RapidMiner we also have the possibility to write the QlikView file format; I don't know exactly which extension it is, but we can export all kinds of data into the QlikView format right away, and from there you can load it again. That is always possible and is a good way. In addition, similar to the answer before, you can also make the integration via the web services, and that looks practically the same as we just saw for Qlik Sense: no significant differences, a different UI, but that's about it; the concept stays exactly the same. Can we get trial software? You're asking about Qlik Sense or RapidMiner, and the answer actually is in both cases yes. For Qlik Sense you
can definitely download a trial version from the Qlik website; for QlikView I think the same is true, but I'm not a hundred percent sure. For RapidMiner, and that might be the core of the question: yes, there is, first of all, our Community Edition, which is freely available to you, also without any limitation in time, and then for our commercial offering there is a trial version as well. Great. Looks like we have another question: can RapidMiner integrate with any other applications, like Tableau? Yes, indeed; the integration with Qlik Sense was just used as an example. The integration with Tableau is not, let's say, completely bi-directional: we can, for example, deliver data into Tableau through the Tableau data format, and we can also write Tableau formats out of RapidMiner processes. What I mean by not completely bi-directional is that, right now, to the best of my knowledge, we can't take user selections out of Tableau and deliver them into RapidMiner web services; our web services can do this, obviously, as you just saw here for Qlik, but there is no integration on the Tableau side for this full, what I would call bi-directional, communication. Beside that, since Tableau was an example: there are many, many other data visualization products out there, and pretty much all of them by now support this kind of web-service-based integration, so the integration in theory always works more or less in the same way. But in addition to this, we also have a lot of other business application connectors, let's say Salesforce connectors, Marketo, HubSpot; a lot of connectors for on-premise software, but we have even more, in fact by now more than 500 connectors, to cloud-based business applications. So I would be so bold as to say: you name the application, and the answer is very likely going to be yes, we support that as well. Great, thanks. Another question: are you able to embed R
and Python code in a RapidMiner process? Of course we are. That is actually a feature that is relatively frequently used, because so many data scientists and data science teams have R or Python coders as part of the team. Let me actually create a new process here; I'm not even sure if I actually installed it. Yeah, here for example I have the Python scripting operators installed. There is an extension on our Marketplace; in general, whenever you are missing something in RapidMiner, it's always a good idea to click here to get more operators. This will bring up the RapidMiner Marketplace, and on our Marketplace there are additional operators which are created by RapidMiner, by third parties, or by community members, so the Python and R extensions are here. There's the R one, you can see it moves up here: Execute R, I think that's actually, yeah, that's right. So the R and Python extensions are both RapidMiner extensions, so we are hosting those, and really all you do is put them as part of your workflow. Let's say you have some data: you can feed the data into the Python script here, and then you can do whatever you want to do inside of this script. You can add your own homegrown data prep scripts or your most preferred models, in case you have something in Python or R, and can then deliver, as we do here, the results at the end, which can then be fed into the RapidMiner process again. I think this is a really important point, because sometimes you have such a special data prep step, just as an example; you could do this the RapidMiner way, but if you already have an R script or a Python script which solves that particular problem, why not just embed it? And I think the key point I would like to make around this is: if somebody of you spent the time to code this, wouldn't it actually be a nice idea to share it with so many other people, maybe even non-coders as well? That's exactly the idea.
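To give a feel for what such an embedded script looks like: in RapidMiner's Python Scripting extension, the Execute Python operator calls a function named `rm_main` with the incoming data and feeds whatever it returns back into the process. The real operator exchanges pandas DataFrames; the stand-in below uses a plain list of dicts so it stays dependency-free, and the column names are invented.

```python
# Simplified stand-in for a script inside RapidMiner's Execute Python
# operator. The real operator passes pandas DataFrames to rm_main; here
# a list of dicts plays that role, and the columns are hypothetical.

def rm_main(data):
    """Toy data prep step: derive total spend from two billing columns."""
    for row in data:
        row["total_spend"] = row["mobile_billing"] + row["landline_billing"]
    return data

# What the operator would do with one incoming example row:
example = [{"mobile_billing": 120.0, "landline_billing": 30.0}]
result = rm_main(example)
```

Whatever `rm_main` returns flows on to the next operator, so a homegrown script slots into the visual workflow exactly like any built-in step.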
You can now create your favorite scripts, manage them in RapidMiner, and even save those predefined operators with your scripts as building blocks, so that you and other people can reuse them again. This accelerates your work and supports the collaboration in a team, and it's also really about getting the best out of both worlds: the R and Python scripting world as well as the non-coding world. It just makes you faster for the standard tasks; you don't need to code, for example, how to do a multiple nested cross-validation including feature selection, parameter optimization, etc. This is something you can put together with a couple of clicks in RapidMiner, but you would need multiple hundreds of lines of code to implement it in, let's say, Python. So it's really all about combining the best of both worlds, and that's why I call this coding optional: in the beginning you don't have to code for most use cases, but if you want to, you can always do it. One last quick comment on that one: since we also have our Radoop offering, which is pushing down computations into Hadoop clusters, we also support PySpark and SparkR. So even if you want to push down the execution of an R or Python script into Hadoop clusters, you can use RapidMiner for governing even this process. And that's really what we see often in practice: people use RapidMiner as the governing platform, also taking care of version control, scheduling, the operationalization, integrations through web services and database tools, doing all the standard analytical tasks, and then somewhere in the middle is your very, very specialized R script or Python script, solving a problem only you have, that you actually have to code because there is no other way you could solve it. This is really a very efficient way to work, and a situation we encounter quite frequently. Great, thanks. Another question here: can the integration with Qlik Sense be used in the Community
version of RapidMiner? Well, there is one way at least to do this, which would be basically by writing out the file formats; that is part of the Community Edition, I think. So that would be one version, but it's of course not as elegant as the integration via the web service, because that's the truly bi-directional approach where you can also react directly to the user selections in Qlik. For that you would need the RapidMiner Server product, in a version which is not freely available as a community product; this is a commercial offering of ours. Great. Can you do market basket analysis in RapidMiner? Yes, you can. There is actually, again on the Marketplace, I don't think I have this installed right now, let me check quickly, no, I don't have it, a complete extension around this. I don't remember the name, otherwise I would look it up right now, but there is an extension on our Marketplace for exactly that. You also have a couple of standard algorithms like FP-Growth and association rule learning as part of RapidMiner itself; if you look for FP-Growth here, you already see the small shopping carts here. Those are operators which are available to you as part of the RapidMiner core, and then there are extensions which further optimize those functionalities. Great, thanks. I think I have another question for you: how does the complexity of a model affect the actionability of the insight? What's the question about, how does the accuracy of the model affect this? Or, sorry, I just didn't get it acoustically. Sorry: how does the complexity affect the actionability? Ah, well, in at least two different ways. Great question, actually. The first way: a more complex model, if you're not doing your validation of the model right, tends to be less robust and tends to produce more overfitting. What that means is really that you're likely very
good on the data you trained it on, but when the data changes, and data always changes because the world changes, the model is not very robust. So I often prefer to go with a more robust model, even if it's not as accurate, one which maybe comes with a little bit less complexity, because I would expect that for small data changes, or smaller changes in the world, my model is not immediately outdated. This matters for both kinds of operationalization: if you automate the processes, you want to be sure that everything is working as expected, but also if you visualize the results to the user. If those results change too frequently, people, who are again in this process of building trust, wonder a little bit like, whoa, what's going on, why is this model, which created some prediction yesterday, creating a different prediction today? That is not wrong per se, but it's a result of the data changing and the model not being robust, so it reacts too fast to those changes, and that's not really good. Well, that's probably the biggest effect. What was the second one I had in mind? In terms of higher complexity: it also means, especially in the fully automated process, that somebody needs to sign off, somebody needs to say, well, I trust this model, it is good. Often what I found in practice is, you can prove of course, with the right validation techniques, that this model is going to perform well in 98% of the cases in the future; I properly cross-validated this, etc., etc., you do all your work as a data scientist, and then sooner or later somebody asks: yeah, but can you explain the model a little bit to me? And of course, with a decision tree with a depth of three, that's easy, and people can typically follow and understand, but then explain a deep neural network, or explain a support vector machine with a radial basis function as kernel function, and all of a sudden things get a little bit difficult.
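The robustness point above can be shown with a deliberately extreme toy example, with numbers made up for illustration: a "complex" model that memorizes every training point is perfect on the data it has seen, but falls apart once the inputs shift slightly, while a "simple" model (always predict the training mean) barely moves.

```python
import statistics

# Toy illustration of overfitting vs. robustness, with invented numbers.
train = [(1.0, 10.0), (2.0, 12.0), (3.0, 11.0)]
shifted = [(1.1, 10.0), (2.1, 12.0), (3.1, 11.0)]  # the world changed a bit

memorized = dict(train)                            # "complex": exact lookup
mean_pred = statistics.mean(y for _, y in train)   # "simple": one number

def error(model, data):
    """Mean absolute error of a prediction function over (x, y) pairs."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)

# The lookup model is flawless on training data but misses every shifted input.
complex_err = error(lambda x: memorized.get(x, 0.0), shifted)
simple_err = error(lambda x: mean_pred, shifted)
```

Memorization is a caricature of a very high-capacity model, but the pattern is the real one: the better a model clings to its training data, the harder small changes in the world can punish it.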
Nobody can really follow what the model is doing. So even if people get this abstract number, well, it's 98% correct, then they look into the model, they don't get it, and then they really have a problem trusting it. This is, again, a little more important when it comes to the automated version of operationalization. When the human being is still in the loop, what you can always do, of course, is create the predictions, and often you can also create the predictions plus the why: why this prediction is how it is. So basically you show the influence factors which led to the prediction, and by doing that you can build enough trust in the human being. So it's really those two effects: understandability is one thing, and the other one is just the normal overfitting problem and missing robustness. That's why I personally, in practice, often prefer a little less complex model over a super accurate one coming with a higher complexity, but that's just based on my own experience. Great, thanks, Ingo. So it looks like we're just about at the top of the hour, so I'm going to go ahead and wrap up the question and answer. If your question was not addressed in today's session, we'll have someone follow up with you with an answer. As a reminder, the recorded version of this presentation is going to be sent to all registrants within the next few business days, and RapidMiner will also be at Qlik Qonnections next week; we're a sponsor of Qonnections down in Orlando, so if you're there, make sure you stop by our booth. Thanks again to everyone for joining today's presentation, and have a great day. Thanks, bye.
