Predictive Analytics with Alteryx



Good afternoon, my name is Rachel and I'm part of DS4. My class today is about predictive analytics (apologies in advance if the stress keeps landing on the wrong syllable of "predictive", I blame my Portuguese). The plan: a quick introduction to predictive analytics, then we move to Alteryx and work through two exercises. I chose two different types of analysis that, I think, cover the most important kinds of work you do with predictive analytics: one about grouping and one about modelling.

Before opening Alteryx, a few words for those who, like me, come from another background and are not used to this. What exactly do we do when we work with predictive analytics? In a sense, we build a statistical model that helps us figure out relationships in our data and gives us insights. We can use it to understand things in the present, but also to predict things in the future, and we will do both today: the grouping exercise is about data we already have, and the regression exercise is about predicting what will happen in the future.

The framework on this slide comes from the Problem Solving with Advanced Analytics course on Udacity; only the basic part is free, but it's worth a look if you want to understand the fundamentals better. It describes the steps of a predictive-analysis project: first, understand exactly what question you are trying to answer, which means knowing your data well and knowing the situation you are trying to forecast; then prepare the data, sometimes pulling more data from other sources; and once you believe you have enough data to answer the question, you build the model, validate it, and finally present or visualise the results.

Before going to the tools in Alteryx it's also worth remembering that we have two types of variables, continuous and categorical. This matters because, as we'll see, different types of variables are used by different models and different kinds of predictive analysis. Today we will work with numeric data. In the modelling case we work with sales: we receive nine candidate locations for a new store and, based on the sales data from stores that are already open, we try to discover which location is the best option. Because the target is continuous, we will compare a linear regression, a boosted model and a third model. We will also work with segmentation, using a cluster analysis to group beers.
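Since that distinction between continuous and categorical variables decides which tools apply later on, here is a minimal sketch, outside Alteryx, of what it looks like in practice. The column names and values are invented for illustration and are not from the demo data:

```python
import pandas as pd

# Invented example: "sales" is continuous, "region" is categorical.
df = pd.DataFrame({
    "sales": [120.5, 98.0, 143.2, 110.7],           # continuous -> regression-style models
    "region": ["north", "south", "north", "east"],   # categorical -> grouping / classification
})

# Continuous fields should be numeric dtypes; categorical ones can be marked as 'category'.
df["region"] = df["region"].astype("category")
print(df.dtypes)
```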
A few examples. These days we can think of lots of situations where predictive analysis is used; in a sense we are being guided by it all the time. When we go to the supermarket, for instance, the way products are displayed is often chosen because someone ran an analysis to discover which products tend to be bought together (the classic market-basket idea: milk and bread, or a specific kind of beans alongside something else). It's funny for us Brazilians, because in Brazil rice and beans are products that have to be together, we eat rice and beans all the time, so in a Brazilian supermarket you expect to find them side by side. That wouldn't make sense here, because consumer habits are different. Grouping, in the sense of clustering, can help you identify exactly these kinds of patterns between products or types of consumption.

Another use is predicting energy consumption, and I've noticed the companies here in the UK do this a lot. Something funny happened to us: we use gas, not electricity, to heat the house, but when it started to get colder the energy company sent us a letter saying they would increase our tariff during those months, and we had to call and explain that we don't use electricity for heating, so there is no difference between our consumption in summer and in winter. They didn't know that; they simply expect that during winter most consumers will use more energy than in the other seasons.

I've also listed a few other situations where prediction is used, for instance how a virus would spread, which is more a scientific area than a commercial one. On that subject, shortly before coming to DS4 I came across the story of the foot-and-mouth disease outbreak here in the UK, and one of the problems was that they worked with a terrible model: the model they used to imagine how the disease would spread was completely wrong, and that's part of the reason it turned into a catastrophe. It's an interesting point, because a model can help you, but a badly built model can also make you fail terribly.
So, a little about Alteryx: why use Alteryx to build models and try to predict things? First, it's easy to use. You don't need to know how to code, and you don't even need to know how the algorithm works internally. I can use it and, by the end, tell you something about the data without having a lot of technical knowledge; you obviously need to learn how to analyse data, but you don't need to be an expert and you don't need to code. Another good thing is that it's easy to test, so you can compare results and choose the best model. And for those who do code, the predictive tools are built on R: as we'll see in Alteryx, you can click into many of the tools and see how the code works, and if you know R you can change it and create new things. Today we'll be working with the fully automated grouping and modelling tools, but those are not the only ways to work with predictions in Alteryx; I've put up an image just to give you an idea of the other possibilities.

Let's go to our first example, which is about grouping. I'm starting with this one because it's faster than the second and, I think, more fun to start with something easier; then we'll move on to modelling. We'll be doing a cluster analysis: basically, we try to create groups based on continuous data about whatever we are analysing. You can also create groups based on categorical data, which is what the supermarket example from before would use (products are categories, not numbers), but today we are working with numbers, so we'll do a cluster analysis based on k-means.

For those who have never heard of cluster analysis: the idea is to create groups. If you picture the data as a set of bubbles on a chart, the clustering will try to form groups so that the distance between the bubbles inside a group is as small as possible, while the distance between the bubbles in one group and the bubbles in the other groups is as large as possible. In other words, it tries to emphasise the differences between the groups in the data we are working with.
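To put that same idea in the usual notation (my formula, not one shown in the talk): k-means with $K$ clusters looks for the assignment that minimises the within-cluster sum of squared distances,

$$\underset{C_1,\dots,C_K}{\arg\min}\;\sum_{k=1}^{K}\sum_{x_i \in C_k}\lVert x_i - \mu_k\rVert^2,$$

where $\mu_k$ is the centroid (mean) of cluster $C_k$. Keeping every point close to its own centroid is what pushes the clusters apart from each other.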
Here is the scenario: we have received data about several beers, and our client is building an application that suggests beers to its users. They would like to group the beers into a few big groups so that, when you mark in the app that you like a beer, it can suggest another beer classified as similar. The data set the client sent contains, for each beer, the brewery, the style, the size, the ABV and the IBU, and based on that we will create four big groups. One possibility I'm showing first (but not the one we'll use) is a visualisation of the same data plotting average IBU against average ABV. The problem is that beers in one corner of the chart are much more similar to their neighbours than to beers on the other side, yet if you only look at averages they end up lumped together. That's why we want a different way of grouping, one that ignores the averages and looks at the similarities between the beers themselves.

So, over to our friend Alteryx. We have the file with the beers; browsing it, we see the beer, the location, the style, the size, the ABV and the IBU. A cluster analysis like this can only use continuous data, so unfortunately we cannot use the style; among the continuous fields I'll use ABV and IBU, because I don't think size is very important for grouping (there's no sense in grouping beers by whether the bottle is big or small). The first step is a Select tool: we don't need the first column, and we also change ABV and IBU, because they are coming through as strings rather than numbers, and if Alteryx doesn't see them as numbers the tool we want to use won't work. We set them to Double, and now we have the same data without the first column and with ABV and IBU identified as numbers.

Now we start the cluster analysis, and for that we discover this beautiful new part of Alteryx: the Predictive Grouping category, the green tools with the grapes icon (better suited to analysing wine than beer, but today it's beer). We take the K-Centroids Cluster Analysis tool and, in its configuration, select the fields to use. Only ABV and IBU appear here, precisely because I changed them to Double earlier; the other fields are still strings, so the tool doesn't offer them. We keep K-Means rather than K-Medians, mostly as a matter of common practice, and since our brief was to split the beers into four big groups, we set the number of clusters to four. Then we run it. The tool takes a little while, because Alteryx is actually fitting the clusters, so we need to be a bit patient; when it finishes we have two outputs to look at.
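For anyone who wants to see the same grouping outside Alteryx, here is a rough scikit-learn equivalent. The file name `beers.csv` and the exact column names are assumptions, and the K-Centroids tool may pre-process the fields differently, so treat this as a sketch of the idea rather than a reproduction of the tool:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical extract with the fields from the demo: Brewery, Style, Size, ABV, IBU.
beers = pd.read_csv("beers.csv")

# Same idea as the Select tool step: ABV and IBU arrive as strings, force them to numbers.
beers["ABV"] = pd.to_numeric(beers["ABV"], errors="coerce")
beers["IBU"] = pd.to_numeric(beers["IBU"], errors="coerce")
beers = beers.dropna(subset=["ABV", "IBU"])

# K-Means with four clusters, mirroring the K-Centroids configuration.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
beers["cluster"] = kmeans.fit_predict(beers[["ABV", "IBU"]])

# Rough profile of each group (like the appended cluster field, summarised).
print(beers.groupby("cluster")[["ABV", "IBU"]].mean())
```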
The R output is the report, and the O output is the model object, which is what we need to connect back to our data to put the groups together with the information we had before. (Someone asked about the anchors: yes, O is the object and R stands for report.) The report isn't the prettiest thing, it's a lot of numbers, but it is useful: you can see that four clusters were created, the size of each cluster, the average distance between the points and their centroid, the maximum distance, and the separation between the clusters. It's also worth knowing that these predictive tools are macros: if you right-click and open the macro you can see exactly what Alteryx is doing with the data, so if you'd like to explore afterwards, open the macro and look at the code inside.

Now that we have the model object, we need to connect it back to the data, because on its own it isn't readable for us. For that we use the Append Cluster tool, which takes the cluster assignments and joins them back onto the records we fed in from the Select tool. At this point my laptop decided to think very hard about life, so while we waited, a small aside about the name of the tool: the K in K-Centroids is the K of the clustering itself, the number of groups we are asking for, which is why it appears in the name. After restarting Alteryx, reconnecting everything and running again, we finally see our data with the groups appended.

A question from the room: if you build clusters on the same data in Tableau, the results come out slightly different. Why is that? We'll compare them in a moment by doing the same clustering in Tableau.
The reason, as someone in the room explained, is that k-means starts by throwing out random points as initial centroids and then iteratively minimises the distances; Alteryx and Tableau use different random seeds for that starting point, so the clusters you get can change, sometimes significantly. Tableau's clustering is quick and convenient when you're exploring a data set, and you can choose how many clusters you want (or let Tableau decide), but you don't have as much tuning power. In Alteryx you have more flexibility and control: you can use K-Medians rather than K-Means, you can use different clustering algorithms, and if push comes to shove you can drop into the programming tools and use a completely different clustering approach.

So let's do the test in Tableau. We save the Alteryx output, bring it into Tableau, and plot ABV against IBU coloured by the clusters we created in Alteryx; then we duplicate the sheet and let Tableau create its own clusters, with the same number, four. As you can see, the result is quite different: Tableau even left some beers in a "Not Clustered" group, and one of its clusters is very, very small compared to the others, while the Alteryx clusters are more evenly populated, each group has more beers in it.
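You can see the effect of that random start outside either tool. This little sketch (synthetic points, scikit-learn's KMeans restricted to a single initialisation) simply shows that two runs with different seeds need not produce the same grouping:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
points = rng.normal(size=(60, 2))   # synthetic data with no real structure

# Same algorithm, same data, different random starting centroids (one init each).
labels_a = KMeans(n_clusters=4, n_init=1, random_state=1).fit_predict(points)
labels_b = KMeans(n_clusters=4, n_init=1, random_state=2).fit_predict(points)

# 1.0 would mean the two runs grouped the points identically; lower means they differ.
print(adjusted_rand_score(labels_a, labels_b))
```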
Now the second part, the second analysis: how to work with models. Unlike what we just did, where we had data about beers and were only grouping data we already have, with a model we try to get an answer about something in the future; we are trying to predict. To build a model you need two basic things: the target variable, the variable you want an answer about, and the predictor variables, the variables you use to answer the question. In our case we work with sales: sales will be the target, and we'll look for predictor variables that help us discover how sales are affected. The process follows three steps: first we feed the algorithm some data and it trains, learning from what we give it; then we validate that process, checking whether the algorithm is telling us something sensible or something very strange; and only after training and validation do we apply the results to our real question.

The question is this: our client has a list of nine possible locations for its next shop and would like to know which of the nine it should open first. To help us, they sent two files: demographic data about the stores that are already open, and the sales of those stores. Using the data about the working stores, we will try to predict which of the possible new stores should be opened first.

Back to Alteryx (I have to open everything again because it closed on me). We bring in two Input tools, one for the stores file and one for the sales file. Browsing them, the stores file has lots of demographic fields but no sales, and the sales file has only the sales, so the first step is to join them. We join on Store ID, dropping the duplicate Store ID field, and the result has both things together, which is exactly what we need, because the tool has to see the target variable and the predictor data at the same time.

The next step is to create samples. We will not hand the algorithm our full data set at once because, as I said, we first need to train the algorithms and then validate whether they are working well; think of it as letting the algorithm practise on part of the data and then testing, on the rest of that same data, whether what it learned holds up. In this exercise we will actually test more than one model at the same time, and at the end we'll see which model worked best. So we add the Create Samples tool, which asks for an estimation percentage and a validation percentage. We don't have a huge data set here, so we use all of it. The two values are percentages of the data, so they cannot add up to more than one hundred; putting 90 and 50 would mean working with more data than exists. They don't have to add up to exactly one hundred either: with a very big data set I could use, say, 70 for estimation and 20 for validation and leave 10 percent unused.
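As a rough equivalent of the Create Samples split outside Alteryx (hypothetical file name, and the illustrative 70/20/10 split mentioned in the talk rather than the exact percentages used in the demo):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical joined file: store demographics plus "Sum of Sales" for the open stores.
stores = pd.read_csv("stores_with_sales.csv")

# 70% estimation; of the remaining 30%, two thirds (20% overall) for validation,
# leaving 10% as a hold-out that is simply not used.
estimation, rest = train_test_split(stores, train_size=0.7, random_state=1)
validation, holdout = train_test_split(rest, train_size=2/3, random_state=1)

print(len(estimation), len(validation), len(holdout))
```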
If you look at the Create Samples tool after it runs, it has three outputs: an E, a V and an H. The E is the part we use to estimate, the V is the part we use to validate, and the H is the hold-out, a kind of leftover, the part of the data we're not using. In our case we're using all the data, so there's nothing in the H. We run it and create our samples.

Now for a trick Paul taught me today, which I hope works (if not, we'll ask for help). We're about to put three models side by side, and the problem is that Alteryx gets rather slow while it runs and tests them; it takes real time. So the best practice is to put all the models inside a Tool Container: once they have run, we can disable that part of the workflow and it stops running, instead of being re-run every time.

We will use three different models and check which gives the best result. All three are designed for a continuous target; they just analyse it in different ways. We drop in a Linear Regression, a Boosted Model and a Spline Model, and the first thing each model needs is to be fed the estimation data we split off before, so the E output goes into all three. Then we configure each one. Starting with the linear regression: the first thing is to tell the model which is my target variable, and the target we're working with here is Sum of Sales. Then we select the predictor variables, and we don't want all of them: Store ID isn't going to explain anything, so it goes, and the same for the site postcode, the site name, and the latitude and longitude. We also have to remember that Sum of Sales itself should not be among the predictors, because it is our target variable. Now we need to do the same for each of the other models.
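Continuing the Python sketch from the sampling step, this is roughly what the model tools are doing. The column names are assumptions based on the fields mentioned in the demo, the remaining demographic columns are assumed to be numeric, and the Spline Model has no direct scikit-learn equivalent, so only two of the three candidates are shown:

```python
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

# Drop the identifier fields and the target itself from the predictor list,
# mirroring the configuration described above.
drop_cols = ["Store ID", "Site Postcode", "Site Name",
             "Latitude", "Longitude", "Sum of Sales"]
X_est = estimation.drop(columns=drop_cols)
y_est = estimation["Sum of Sales"]

models = {
    "linear": LinearRegression().fit(X_est, y_est),
    "boosted": GradientBoostingRegressor(random_state=1).fit(X_est, y_est),
    # A spline/MARS-style model would be the third candidate; it is omitted here.
}
```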
On to the second model: in the Boosted Model the target field is Sum of Sales again, and again we get rid of Store ID, site postcode, site name, latitude, longitude and Sum of Sales from the predictors. The same once more for the Spline Model; by the third time I start speaking in Portuguese so that you pay attention to what I'm doing (there was a brief detour here about Brazilian versus Portuguese pronunciation that I'll spare you). Now all three models are configured, and the next step is to run and see what we get, and now you see the big problem: it takes a while. That's exactly why we're using the container, so we can kill this part of the workflow afterwards.

While it ran, a good question from the room: what would happen if you skipped the Create Samples step and just trained on everything? You get a process called overfitting. The model basically finds a way to fit every individual data point, so it looks great on the data it was trained on, but when you put it out in the real world it doesn't work; quite a number of people have built models that were obsessively accurate on their own data and then failed outside it. It's one of the general limitations of models.

One more best-practice step: after each model I add an Output tool, so the fitted model objects are saved to disk. That way we don't have to wait minutes and minutes every time we want to run the next part of the workflow. Note also that, like the clustering tool, these model tools produce a report output, and they are macros too: right-click the Linear Regression, for instance, and you can open the macro and see exactly what Alteryx is doing inside, or click Help to find more information about how the tool works and what each part of the configuration means. And one correction from my first attempt this morning: the outputs have to be saved in a format Alteryx can read back in as a model object, which I hadn't done, so sorry, we need to run once more and save them properly so we can reuse them.
While Alteryx thinks about life, a few links where you can find more about predictive analytics: the Udacity nanodegree (the introductory part is free, the rest you pay for); the predictive analytics section of the Alteryx site; and, on our website The Data School, Benedita wrote about this same exercise, a part one about clustering and a part two about the sales modelling we're doing now. There is also an article I read recently, referenced here, that discusses problems with the use of algorithms. As I said before, if your data is not very good you have problems with your predictions, and the article covers cases like ad algorithms that end up not showing well-paid job listings to women, because someone built the algorithm in a way that preferentially shows that kind of ad to men. It also discusses predictive policing, which is interesting too: if you always feed in the number of recorded crimes, you risk starting to over-police an area, and if you over-police an area you probably start to arrest more people who did not necessarily commit a crime, so you get more arrests in the data and police the area even more. It's another problem that can happen.

Back to our beautiful Alteryx, which has finished running. We now have the models, but just as with the clustering we had to connect the cluster results back to the rest of the data, here we need to do the same. That's why I saved the model objects: if you don't, then each time you want to run one of the following steps you re-run everything and spend your life waiting for Alteryx to fit the three models. This way we simply input the saved objects and disable the container (after a brief fight with the canvas to actually find and select it). What we need now is to connect the data again and check whether the models are working well: we take each model object, give it a Score tool, one per model, and connect each Score tool to the V output. Remember that many minutes ago we split our data into the part used to estimate and the part used to validate; now we go back, take the validation sample, and connect it to each Score tool. And, of course, Alteryx takes its time to run again; my computer is not having its best day.
Here we go, three minutes of patience please. (You'll see a warning icon on the Score tools, but you can ignore it; it only appears because we haven't run yet.) What the Score tool does is add a Score field: for each validation record, the score is the sales value predicted by that model. Then we can compare the score with reality. Note that we are not predicting the new stores yet: for now we are working with stores whose sales we already know, purely to test whether each model does a good job, and only after that will we apply the best model to the file with the new stores.

Once the scores are in, the first thing to calculate is the difference between the score and the actual Sum of Sales, so we add a Formula tool on each branch. Let's call the field Difference and write Sum of Sales minus Score. But there is a problem: sometimes that gives a negative number, and the following step is to sum all the results, so the negatives are an issue. If one store is out by plus ten and another by minus ten, the sum is zero and you get the impression that the model is perfect, which isn't true, because both stores had a difference. Someone in the room suggested wrapping the expression in ABS() to take the absolute value, which also works; what I'll do instead is square the difference, so I always get a positive number. (There was a short detour here about the syntax, whether to use ABS before the brackets or raise the expression to a power; both routes give you an always-positive difference, squaring is just the one we went with.)
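Continuing the earlier Python sketch, this is the same check: score the validation records with each candidate model and total the squared differences (nothing here is Alteryx's own code, just the idea):

```python
# Score the validation sample and compare predictions with the known sales.
X_val = validation.drop(columns=drop_cols)
y_val = validation["Sum of Sales"]

sq_error_totals = {}
for name, model in models.items():
    pred = model.predict(X_val)
    # Squaring keeps every difference positive, so over- and under-predictions
    # cannot cancel each other out when they are summed.
    sq_error_totals[name] = ((y_val - pred) ** 2).sum()

print(sq_error_totals)   # the smallest total points at the best of the candidates
```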
There was a good discussion in the room about which error measure to use: you can take the residual sum of squares, a mean absolute percentage error, or look at the R-squared value, and as long as you are consistent about which error check you use across the models it doesn't really matter, because all we are asking is which model shows the biggest variation from the actual values. The residuals are squared so they are all positive, and then you can either sum them or take an average error.

So, in the workflow: the Formula tool gave us the squared differences, with no negatives, and now we add a Summarize tool for each model to sum all of those values (I'm building both steps at once because the runs take so long). Whether you sum or average doesn't change anything for our purpose: the next step is only to compare, and if you averaged instead you would divide each total by the same number of records, so the smallest stays the smallest. The important thing is that we've created a consistent way to compare the models, and we are looking for the smallest value.

Why the smallest and not the biggest? Because this value is the total squared error, the amount by which the predictions missed reality, so less is better. That is different from R-squared, which someone brought up: R-squared is the share of the variation in the data that the model explains, so if a model's R-squared is, say, 0.94, it explains 94 percent of the variation, and there you want the largest value, the one closest to 1.
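For reference, the R-squared the group was discussing is the same squared-error total, just normalised by the spread of the actual values (the standard definition, not something computed in the demo):

$$R^2 \;=\; 1 \;-\; \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},$$

where $y_i$ are the actual sales, $\hat{y}_i$ the predicted sales and $\bar{y}$ the mean of the actuals. Because of that denominator it is scale-free and, for a sensible model, sits between 0 and 1, with values nearer 1 meaning more of the variation is explained.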
That normalisation is exactly why R-squared behaves nicely: the residuals are divided by the variation in the actual values, so it lands on a 0-to-1 scale, whereas our raw sum of squared differences could be any size at all, it depends on the units, it could be any amount of dollars. That's fine here only because we are comparing models on exactly the same data.

Now, to compare the three totals we could simply look at them, but let's do something a bit more professional. On each branch I add a Formula tool that creates a field called Model containing the model's name ("linear", "boosted", "spline"), so we can Union the three results together and sort them (ascending or descending, it doesn't matter, it's just to make it readable). Sorry for the typos along the way: I usually use a Portuguese keyboard, so the symbols are never where I expect them. One hiccup: I created the Model field with the wrong data type. The values going into it are the words "linear", "boosted" and "spline", so it has to be a string, not a double, which is why it didn't work the first time. We fix the type, run again, and here we are: the three models compared. The best one is the Boosted Model, because it has the smallest total squared difference. Does everyone agree that's our best model?

So now we want to use it, which means going back to the container, which promptly went missing for a minute (a container is apparently easy to lose when the canvas gets cluttered), but we found it.
I was planning to keep everything tidy inside the container, but Alteryx is being strange again, so I'll simply clear away the test branches and keep what we need. What we need now is basically the same setup as before, but connected to the new data, which isn't in our workflow yet. So: an Input tool for the file the client sent with the stores they are evaluating, the new-store census. Browsing it, we see lots of demographic data and, obviously, no sales data, because none of these stores exist yet. We need a Score tool again, the same as before, but this time, instead of feeding it the validation sample to check the model, we feed it the data we actually want to predict: we connect the new-store census to the boosted model and generate scores. (Alteryx stalls again; if it stops on you, check the results, because the run we care about is this connection of the model with the new stores.) When it finishes we have a new column with the Score, and this score is the prediction of sales for each of the stores they are planning to open, stores that are not real yet, predicted from the same demographic fields we used to train the algorithm. The only thing left is to sort the Score descending to discover which store is expected to have the best sales. (While Alteryx ran we even had time to start taking bets on which store would come first.)
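In the running Python sketch, the final step looks something like this (again with an invented file name, and assuming the census file carries the same demographic columns, all numeric, as the training data):

```python
import pandas as pd

# Hypothetical census file for the nine candidate stores (no sales column).
new_stores = pd.read_csv("new_stores_census.csv")

id_cols = ["Store ID", "Site Postcode", "Site Name", "Latitude", "Longitude"]
X_new = new_stores.drop(columns=id_cols, errors="ignore")

# Predict sales with the winning model and rank the candidates, best first.
new_stores["Score"] = models["boosted"].predict(X_new)
print(new_stores.sort_values("Score", ascending=False))
```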
While that ran there was a useful side conversation about containers and reuse. If you disable a container, whatever is inside it doesn't run; so when you have finished with a slow chunk of the workflow, tools you are not using any more but might need later, put them in a container and disable it, and you keep them around without paying for them on every run. Someone also pointed out that instead of reading the saved boosted-model file back in, I could simply have reused the existing output from the container and connected it directly, and yes, that is exactly the same object, so either way works; the saved copy is just insurance against having to re-run the models.

And the answer: with all the stores sorted by score, the best store to open is store number eight. That's the modelling demonstration.

Since we have time, a little about limitations, which I think are important; these points come from the article referenced on the slide. First, the quality of your model, and so the quality of your prediction, depends on the quality of your data. Second, you need to be clear about what you are trying to answer: if the question is about sales, think about which data is actually relevant to sales. You may have lots of data you don't need; in our case we didn't use the location or the name of the store. Third, when you train your algorithm, the training data has to be representative of what you are trying to answer; as the article puts it, we usually use data from the past to try to predict the future, which is exactly what we did in this example. Fourth, some things are genuinely hard to predict because they change a lot over time or depend on variables that are difficult to measure. The pollsters here in the UK can answer that better than I can: they got the past two elections wrong, and Brexit, and in one election they were saying Labour would win and in the end the Tories won with a majority. Not everything can be predicted, and some things are very tricky. And the last point is the one I mentioned before about policing: predictive analysis can lead to an ever-increasing focus on one thing.
If you send lots of officers to police an area, they will start to arrest more people in that area even if those people are not committing more crime; then the data shows lots of arrests there, so you send even more police, and the loop feeds itself. Those are some of the limitations of predictive analytics. It is obviously very helpful, especially in business, but I think it's also important to know that, as with everything in life, it has limits. The links on the slide have more if you'd like to learn a bit more. That was my presentation. Thank you, and if anyone here, or anyone watching, has questions, I'm happy to take them; otherwise I'll finish here.

5 Comments

  1. ARJUN S said:

    Hi Can you provide the sample dataset (beer.csv)?

    June 29, 2019
  2. Herbert Dt said:

    was this an undergraduate group project presentation.

    June 29, 2019
  3. Bridget Costigan said:

    The person is very hard to understand – every word is a struggle

    June 29, 2019
  4. rohit somani said:

    Thanks for the instructional video. It was very useful.

    June 29, 2019
  5. Moller Toma said:

    Can you provide the sample dataset (beer.csv)? I want actually try it myself using this presentation as guide.

    June 29, 2019
