When Big Data and Predictive Analytics Collide

Right, so I'd like to introduce Alex [surname unclear]. Is he in the room? From Zementis. Alex is an expert in predictive... help me out here, sorry... Model Markup Language, thank you. I'm tired, I'm jet-lagged, I totally blanked on how to say PMML. PMML, thank you very much. He's vice president of analytics for Zementis and is involved in [inaudible] PMML. [inaudible] A thousand welcomes.

Thanks, Charles, and thanks to the organizers. It's great to be here. It has been two years, so it's really a pleasure to come back. So the title is "When Big Data and Predictive Analytics Collide." I was just trying to think of an interesting title to make people pay attention. I'll be talking about PMML, as Charles alluded to, but PMML is part of this collision course of big data and predictive analytics, and I hope I can show you that. When two things collide, you can get good things or bad things out of it, right? Usually new things come out of a collision, so I'm hoping I can show you that.

It's fascinating. Everybody's talking about big data these days; we've seen quite a few speakers talk about big data. But how does it differ from predictive analytics? In my view, predictive analytics is actually part of big data. When people talk about big data today, they are also talking about the techniques to solve the big data challenge: getting value out of big data.

So, the outline. I like to make my outlines as questions, so these are the questions I'm going to try to answer. We're going to talk a little bit about predictive analytics and big data, obviously. What is predictive analytics? I'll give you an example. What is big data? What is the big deal about big data? Can you show me a predictive model that uses big data? I hope I will do that. And what now? And then, you
know, on this collision course, I'm going to try to introduce PMML to you, which is the Predictive Model Markup Language, a standard to represent predictive solutions, big data or not. So what is PMML? I'll try to define that: what it's made of, who supports it, what the benefits are of using a language such as PMML to define your predictive solutions, and what kinds of predictive workflows you can represent. I thought the word "workflows" would be nice for the rules community, but I'm talking in terms of predictive workflows here. Then I'm going to talk a little bit about the perfect analytic storm. I think we really are in that kind of scenario, and I really enjoy being where I am now, because there are a lot of interesting and cool things being done. Then I'm going to attempt to give you a live demo, so forgive me if it doesn't work; with demos it's always dicey, but we'll try. And then I'll talk a little bit about resources and literature.

So, predictive analytics. We've been talking about it the entire day here. Analytics is a big field, and it can usually be divided into two fields; it's at the bottom of this slide. You can divide it into descriptive analytics, which tries to answer "what happened?" For example: the average transaction amount for the last 30 days in an account, or how much you lost to fraud in the last week. These are things where you look at data in the past and you come up with these values. You can then have dashboards or BI kinds of systems to show them, so you can give them to the business folks to make decisions on top of.

Now, predictive analytics tries to look forward: given what I know, how can I predict how things are going to behave in the future? So I had that report icon and I decided to put the arrows there, and that's really the idea of predictive analytics. I have all this past
data, and I can analyze it in interesting ways, learn patterns from that data, try to see if these patterns are going to repeat in the future, and try to predict things that way. So in a way you're really using clever mathematical techniques on top of past data to forecast the future. That's really what predictive analytics is.

So I thought of an example. Predicting churn is quite an obvious example for predictive analytics, and it's kind of similar to what [name unclear] was telling us about. I decided to separate this into two populations. Let's say Matt and Scott represent those populations. Matt is someone that churned: he was discontented and he left the company. Maybe a telco; let's keep the example as generic as that. Scott, on the other hand, actually used his Facebook to like your company, so he's a loyal customer. So I have these two populations. Now John: I don't know anything about John, so I'm going to try to predict what John's behavior is going to be, given that I know Scott's and Matt's behavior. So that's an example of using predictive analytics: is John going to churn in the next few days or not? Is he a loyal customer?

So here are some churn-related features. This is just an example, but it's something that we see almost every day. This can come from ERP or CRM systems that you already have; it's the traditional demographic kind of information. This is the traditional data that people have been building their predictive models on. For example, Matt had three complaints in the last six months, he opened two support tickets in the last four weeks, in terms of money he spent not a lot, he purchased two items in the last four weeks, and, for demographics, he lives in LA. Scott, on the other hand, had no complaints, he opened one support ticket, he spent quite a bit of money with your
company, he bought 12 items in the last four weeks, and he's a male living in Chicago. So, demographic information, that kind of information. These are typical features that you can use for predicting churn. There are many more than these, but I'm just trying to give an example here.

So, turning the page to big data. I can build a predictive model out of that data that actually works; I can predict churn. But do I want to infuse that data with something more? Do I want to bring my predictive modeling to the next level? The answer is yes. You would infuse your typical CRM and ERP data with big data: much richer data. In a way you want to have a 360-degree view of your customers. This is of course for companies that deal with customers, but you can have different kinds of predictive models that work with sensor data or any other kind of data.

The way I like to see big data is as an ever-expanding ocean. I like the world map for that because it captures a little bit of the view I have; I like to have images that define things. The ocean here is on a topographic map. Consider the Pacific Ocean, or the oceans in general: they have breadth and depth, and that's the volume. Big data has three V's associated with it, and volume is number one. It's really lots and lots of data. Then you have variety: data from different things. I have some here. Transaction records are the most common one. Social media is the thing that we've all been talking about, how to use that data. But there is also climate information (you can imagine the amount of data), mobile GPS signals (everywhere we walk now, our mobile phones are giving out our GPS location), and healthcare data. Healthcare is a huge field here because you have electronic medical records, x-rays, lab work, all kinds of data that are really
encapsulated in this big data narrative. And there's the smart grid: smart meters, voltage sensors, all kinds of things on the smart grid. And then the digital breadcrumbs. Everybody's leaving digital breadcrumbs now, because if you are surfing the internet, you're leaving all these breadcrumbs behind. This is also big data, and it can be used in different ways to do predictive analysis on you.

It's interesting that ninety percent of the data today was created in the last two years alone. That's a big deal, right? But I would also say that there is a big problem with big data, and that's the other V that I didn't talk about, which is velocity, or speed. Big data is created extremely fast; all these things are creating data all the time. But I will also say that it gets old very fast as well. Your customer is going to change very quickly; the Facebook data is going to change very quickly. If you don't act on that big data very quickly, you're going to lose the value in it, or most of the value that is in that data. It depends on the kind of data, obviously. So if you're using big data, you have to be extremely agile and fast. It's a big challenge, because now you have lots of data and you have to act on it very fast to get the value out of it, the insights that you need.

So I was trying to think of some churn-related big data features, and these are different from the features I showed you before. We're now expanding our universe of features from the traditional CRM, demographic kind of data into the big data realm. These are things that I can deduce from big data. Say Matt has 12 friends in his network who are my customers as well. I have two complaints from friends of Matt in the last six months. The average age of friends for Matt is 41 years. So you see I'm using his social network to get
all kinds of information about my company and how it relates to all his friends, and friends of friends, let's say; you can imagine that. And I can look at digital breadcrumbs. When was the last time that Matt visited our website? How many times in the last seven days? How many pages were opened? How much time did he spend looking at them? How many newsletters did he open? Did he click on any links that were in my newsletter? As you can see, we are really expanding the traditional data that we had toward a 360-degree view of who Matt is and who his colleagues are. Now I can see that maybe Matt is connected to a lot of people in his network, and if these people churned, let's say, two weeks before, I'm going to try to mitigate their impact when it comes to Matt, and I'm going to try to keep him. So this is really going an extra level, bringing in big data to take your predictive analytics to the next level. But at the end of the day you really are infusing your original data with big data and using all these mathematical techniques to get where you want.

So this is the traditional process of building a predictive model. You have the two populations that Matt and Scott represent: the people that churned and the people that are loyal customers. Then you're building, or training, a predictive model. You're building this mapping function between the two populations and the target values, and of course you want your model to generalize. I list here at the bottom our most typical mathematical algorithms for creating predictive models. Neural networks happen to be one of my favorites, but there are also regressions, support vector machines, CART decision trees, clustering, association rules, and the list goes on and on. More recently, people are not just settling for one predictive algorithm; they are actually using several, and they
are combining them together. The idea is that one plus one equals three: you can actually get more value that way. You can build different models for different populations, or for different parts of your big data, and you can assemble them all together using a model ensemble, in different ways, to put it all together. A typical example of a model ensemble is something called random forest, and I'm sure you're quite familiar with it. A random forest model is basically thousands and thousands of decision trees that work together to give you a prediction.

But what I mentioned before is that you have to be agile. This big data is coming at you as a tsunami, in a very fast way, and the data is going to get old, so you need to manage it very quickly. You need to be agile; you need to deploy very quickly. The idea here is that you build your model and then you have to deploy it. You build your model on past data, but now you want to use this model on new people, on everything that happens with your existing clients, to try to predict if they're going to churn or not. So you need to move this model that you built on your desktop, let's say using data mining tools, into the production environment. You need to operationalize it. This task traditionally has been a hard one to accomplish. This is the traditional scenario of deploying predictive solutions, and in this scenario it takes months to move your model from the desktop of the data scientist, the person that was responsible for bringing that big data and predictive analytics together, into your operational system.

I have Scott and Matt here again. It happens that Scott is a data scientist; we didn't know that, it was not in the big data, so I was just leaving it for later. And Matt is an IT engineer. They live in different worlds; they speak different
languages. As you can see, Scott knows R, SAS, SPSS, Spotfire, Python, and Perl: all these languages, the universe of techniques that a data scientist needs to master. The IT engineer, though, lives in a different world: Java, .NET, SQL. To have the model that the data scientist builds operationally deployed, traditionally you have to write a requirements doc. You pass it to your IT engineers; they don't quite understand it or have questions; there are questions back and forth; and it sometimes takes months for a model to be deployed. This is not acceptable when it comes to big data and predictive analytics these days. So something new needs to happen here, something that will allow you to deploy your models in minutes, not months.

And that's where PMML comes in, the Predictive Model Markup Language. PMML is an XML-based language, so it's not extremely exciting; it's XML at the end of the day. But it's used to define statistical and data mining models, and, as I said, you can share these between PMML-compliant applications. It's a very mature standard; it has been around for more than 15 years now, and it's developed by the DMG, the Data Mining Group. Zementis, the company I'm with, is part of the DMG, and I'm the representative there. I've been working with the DMG for the last four or five years; it has been a few years. PMML really eliminates the need for custom deployment, custom coding of the solution. There is no more requirements doc. You just create your predictive solution, export it as PMML, pass it to the IT folks, and they upload it into a PMML-compliant scoring engine, and you're ready to go. It takes minutes, not months, to do that.

The other nice thing is that PMML is not just for predictive models. What I try to say is that PMML is actually for predictive solutions. The solution incorporates pre-processing, like massaging your data, as well as post-processing. PMML can even have rules at the end, simple rules that define thresholds and business decisions. So the whole thing can be incorporated into a PMML file: predictive workflows.

What kinds of components does it have? Everything that you can imagine that is used to represent a predictive model or a solution. You have input validation for outliers and missing values; this is extremely important when you move your solution into production, because bad things will happen to your data. There are all kinds of data pre-processing elements, then a bunch of predictive models that can be represented in PMML, obviously, and data post-processing. Not only that: you can have model ensembles, model segmentation, model chaining or composition. So all kinds of multiple predictive models, together in a single unified file or dispersed in different PMML files that talk to each other.

So, the benefits. Besides avoiding custom coding of predictive solutions, there are many benefits associated with an open standard. But I think the really important thing here is transparency. I've seen quite a few companies that have large data science teams that master different modeling environments: one team likes SAS and uses it, another team likes R, another uses Spotfire. But if everything is represented in PMML, then you have transparency into what everybody is doing, because PMML can be understood by all. It disseminates knowledge: something that's being done by one group can be understood by the other group. It fosters best practices between data scientists. And it transforms predictive analytics into really dynamic assets. It's not something that's sitting on a desktop for a while until it gets deployed; it's really agile.

In terms of support, I may have missed a few companies here, but it's really supported across the board. Different companies support different parts of the standard. I was just talking to Jacob from OpenRules, and they are actually supporting rule sets in PMML; it's part of the standard and they support that. So the rules community is supporting, of course, mostly decision trees and rule sets, which are more in line with business rules. But you can see here quite a few companies that are behind this standard.

So, the perfect analytic storm. You have big data, and then you have predictive analytics and open standards that can make this very agile. And then you also have cost-efficient processing power, represented by, as I mentioned, Hadoop and cloud computing. I thought it would be quite interesting to have a demo of scoring with one of these systems in the cloud, because cloud computing and Hadoop really allow all this processing to happen very fast, even in real time. I mean, Hadoop probably not, but you can process a lot of data in a very short amount of time. So it's really the perfect analytic storm, and we are in a very exciting period, because I think that, let's say, five years from now, a lot of things will be already set in stone and defined. So this is a period to define things and get the process in order.

So I'm going to show you the live demo, which I hope will work. The idea here is that now you have
PMML, and you can use any of these model-building tools on the left side to create a predictive solution, with normal data or with big data, and then export that solution to PMML. Then you would import it, in this case, into the Zementis scoring engine. We have our platform for scoring models, living on the production side of the equation. ADAPA is our decision engine, which incorporates business rules. We have it running on the Amazon cloud and the IBM cloud, we have it for on-site, and we also have a solution for in-database scoring and Hadoop.

So this is the system I'm going to be showing you: the ADAPA scoring engine. It's completely standards-based. There is integration of predictive analytics and business rules: ADAPA has Drools inside it, so you can write all these rules and wrap them around your predictive models; it can be very complex. You can deploy and manage one or many models, or one or many rule sets. It provides real-time or batch scoring, in this case via web services or the console; we can do all kinds of scoring, and it's extremely fast. It's available on Amazon and IBM SmartCloud. This is really interesting because the cloud is pervasive; the cloud is everywhere around the world, and we have lots of clients in emerging economies that tap into that. It's offered as a service, and on the Amazon cloud, for example, it starts at 99 cents an hour. So if you are in India, that really matters, and you're more likely to use the cloud to deploy your models there. Or if you're in the US and that's not a concern, then you would bring it in-house.

OK, so let's see. I thought of building a predictive model in KNIME. KNIME is a model-building environment that comes from a company in Germany. It's open source, so you can download it and play with it yourself; it doesn't cost anything. What you do in it is define a kind
of process flow. By the way, let me show you the data first. This is my data set, and these are all the features I showed you earlier: my original CRM and ERP data features as well as my big data features. I have all kinds of features here. Each record is for one customer: let's say, number of support tickets in the last four weeks, and, let's say, percentage of friends that churned in the last 30 days. As you can see, there is a lot of big data ETL that goes into getting to this data set. Then I sampled down from millions of customers to 180,000, I think, that I use for training here. So we have 170,000, 180,000 customers represented in this flat file.

So let me go back to KNIME. KNIME has all these nodes to do all this processing for you. The CSV reader just reads that file in. Then I have identification of variables, I normalize all the features, I bin them, and I do some other kinds of processing here. And then I have my neural network, so I'm going to train a neural network to predict churn for this population. And then I have a PMML writer, which means the neural net is going to be exported. As you can see, these blue nodes here are PMML nodes. All these graphical nodes in KNIME actually export PMML and pass it down the line, so it really follows the process flow. At the end of it you have a PMML file that represents the entire computation together with the neural network.

To do that in KNIME I would just say "execute." Let me do that for the neural net. You see on the screen here that the nodes have been queued, and you can see the first node there: it's normalizing all the data, then it's going to bin it, and eventually it's going to come to your neural network and generate your neural net. Then finally, when all that is done and it has built the neural net, I can just execute the PMML
writer node, which writes my entire process into a PMML file. It's going to take a while, so I already generated a PMML file for this flow; believe me, it comes from this. In the interest of time, I have here the KNIME neural net PMML, and I can open it for you. It's XML, so it's not extremely exciting; I'll be quick. There we go. Here is my neural net; we can see all the neurons, the connections, and all that. Let me go to the top. There is a header that says it comes from KNIME, the version, and the name, of course [inaudible]. Then I have the data dictionary, which specifies all the input fields that come into my predictive solution. Then I have a mining schema. In this case I also have all kinds of treatments for invalid values and missing values that I can define in the mining schema. As you can see, the target variable that I used to train this model is called churn flag, and that's the predicted value. Then I have all kinds of transformations: I did all those normalizations, but I also did some binning, and that's all represented in the PMML file together with the neural network.

So once I have this, how do I go about deploying it? How can I use it to score all those customers, to see if they're going to churn or not in the near future? I can deploy it using ADAPA, and I'm going to choose the cloud here, the Amazon cloud. We have a free trial of ADAPA running on the Amazon cloud, so I'm just going to log in to my trial. What you see here is the server view of the scoring engine that's running on an Amazon instance right now. Of course, if you have this product, it's private to you; no one else has access to it. I already deployed quite a few models: support vector machines, a generalized linear model, k-means (a clustering model). I'm just going to go ahead now and upload that model that I created with KNIME.
So it's uploading it, and I don't see the OK button here. [inaudible] But I have a way to do this that actually works. It's because the screen is so small that I don't see the button. So if I do that again... I'm going to log in again, just to show you that the model is there. There is usually an OK button that you can click, but [inaudible]. So here is the KNIME neural net. It's deployed. That means the process that took months before took minutes. I built my model and I moved it to the cloud. This instance could be running in Asia, it could be running in Brazil; I could access it in real time using web services. That's really how agile it can be.

So that's part of the demo. The other thing is that we have actually created a... You've probably heard of Datameer. Datameer is a Hadoop-based company; they have this Excel-looking product on top of Hadoop, and there is a plug-in for it. So I have Datameer running here, and I have data, the churn data, and I also have some audit data. This is just to show you that the same thing applies: you can move your PMML file anywhere. I have already deployed some models here. It's the same way I did it in the cloud; I'm deploying these in Datameer on top of Hadoop. I have a decision tree here, and I can choose the file that I just built, and I'm going to add that PMML file. It logged me out; that's part of live demos. So this is a plug-in for Datameer; you go to the Universal PMML Plug-in, and then let's try it again. There we go. My neural net is now available in Datameer for me to use to score my big data on top of Hadoop. This is as simple as it gets. When I score, it's going to be a function inside Datameer; it's going to come up as the predictive model, the model that we built using the neural net. So that's really how PMML can revolutionize this deployment of
predictive models. It comes with all the benefits of open standards as well.

So, going back to the presentation: that was it for the demo. I'm sorry for the hiccups. Here are some resources. I have a blog, which you're welcome to take a look at. It's called predictive-analytics.info. I've been writing it for a while and we have a lot of subscribers, so I welcome you to take a look at it; we have lots of content. There is a resources page on our website: webinars, community forums, white papers, mostly for partners, PMML examples, and tools. And just a few months ago IBM was looking for someone to write a series on predictive analytics, and I was fortunate to be invited. I wrote four articles, a series called "Predicting the Future," and I promise you I actually write better than I speak, so I hope you take a look at it. There is a link here; it's on developerWorks. It has a video. You don't need to look at the video, please avoid the video, but the content is pretty good. If you want to know more about predictive analytics and how it can be used, not just in terms of building predictive models but deploying predictive models, it's all covered in this four-part series. And finally, thank you very much.
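
The PMML layout walked through in the demo (a header naming the exporting tool, a data dictionary, and a mining schema whose target is a churn flag) can be sketched in a few lines of Python. This is only an illustration, not the file shown in the talk: the PMML fragment, its field names, and the `summarize` helper are all hypothetical, and the neural network layers are left out, so the fragment is not a complete, schema-valid model. It assumes the PMML 4.1 namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment (NOT the file from the demo).
# Field names are invented; the network layers are deliberately omitted.
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative churn model">
    <Application name="KNIME"/>
  </Header>
  <DataDictionary numberOfFields="3">
    <DataField name="complaints_6m" optype="continuous" dataType="integer"/>
    <DataField name="pct_friends_churned_30d" optype="continuous" dataType="double"/>
    <DataField name="churn_flag" optype="categorical" dataType="string"/>
  </DataDictionary>
  <NeuralNetwork functionName="classification" activationFunction="logistic">
    <MiningSchema>
      <MiningField name="complaints_6m" missingValueReplacement="0"/>
      <MiningField name="pct_friends_churned_30d" missingValueReplacement="0.0"/>
      <MiningField name="churn_flag" usageType="predicted"/>
    </MiningSchema>
  </NeuralNetwork>
</PMML>"""

# PMML elements live in a versioned namespace, so lookups need a prefix map.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def summarize(pmml_text):
    """Return the exporting application, declared fields, inputs, and targets."""
    root = ET.fromstring(pmml_text)
    app = root.find("pmml:Header/pmml:Application", NS)
    fields = [f.get("name")
              for f in root.findall("pmml:DataDictionary/pmml:DataField", NS)]
    mining = root.findall(".//pmml:MiningSchema/pmml:MiningField", NS)
    # A MiningField with no usageType attribute is treated as an active input.
    inputs = [m.get("name") for m in mining
              if m.get("usageType", "active") == "active"]
    targets = [m.get("name") for m in mining
               if m.get("usageType") == "predicted"]
    return {
        "application": app.get("name") if app is not None else None,
        "fields": fields,
        "inputs": inputs,
        "targets": targets,
    }


summary = summarize(PMML_DOC)
print(summary)
```

Because the model travels as plain XML, any consumer can inspect the inputs and the target this way without knowing which tool produced the file.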
