DI&A Slides: Descriptive, Prescriptive, and Predictive Analytics

Hello and welcome. My name is Shannon Kempe and I'm the Chief Digital Manager of DATAVERSITY. We'd like to thank you for joining the latest installment of the DATAVERSITY webinar series Data Insights and Analytics, brought to you in partnership with First San Francisco Partners. Today John and Kelle will be discussing descriptive, predictive, and prescriptive analytics.

Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A in the bottom right-hand corner of your screen, or, if you like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DIAnalytics. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and additional information requested throughout the webinar.

Now let me introduce our speakers for today. Well-known industry analyst John Ladley is a business technology thought leader and recognized authority in all aspects of enterprise information management, with 30 years of experience in planning, project management, improving IT organizations, and the successful implementation of information systems. He is the President and Chief Delivery Officer at First San Francisco Partners. Also joining us is Kelle O'Neal. Kelle is the founder and CEO of First San Francisco Partners. Having worked with the software and systems providers key to the formation of enterprise information management, Kelle has played important roles in many of the groundbreaking initiatives that confirmed the value of EIM to the enterprise. Recognizing an unmet need for clear guidance and advice on the intricacies of implementing EIM solutions, she founded First San Francisco Partners in early 2007. And with that, I will turn it over to John and Kelle to get today's webinar started.

Hello and welcome. We hope everyone is doing
fine out there; good morning, good afternoon, or good evening, wherever you may be. Today we're talking about analytics in a bit of detail; to be more precise, descriptive, predictive, and prescriptive analytics. Today's offering will be a bit on the educational side, and a bit on the side of putting things in perspective with our annual topic this year of Data Insights and Analytics.

Before that, we have a couple of poll questions, which we like to ask about this very dynamic and busy discipline. So first of all, get your mouses (or your mice) ready: what type of statistical analyses do you use or plan to use? You can choose multiple answers. We will allow about 30 seconds for this, then we'll take a look at the results and move on to the next one. Wow, everyone is typing in their answers today. You don't have to be a data scientist to listen to this today; in fact, we rather hope that you're one of those people who will be supporting data scientists, or thinking about supporting them, or peripherally involved, looking over the cubicle wall and going, "I don't understand a lot of what's going on, but sometimes I feel like I should." That's who we're pointing our little session at today.

I think we have an answer; yes, we do. "No answer" is the biggest one, and we have a dead-on tie between descriptive and predictive. Very interesting findings, and they're not too far ahead of "I don't know." Okay, let's go on to our next question. (If you don't want to answer one of these, one of these weeks we're going to ask you why you don't want to answer, and then you're going to have to answer.) All right: how frequently do you use statistical analysis in your work? If you don't use it, put that; or less than once a week, once or twice a week, once a day, whatever. Please answer that and we'll let the time race by for that one as well, and then we will get started. Now, when I used to be in radio, this is where you did the
weather forecast, but it would be a whole lot of different weather forecasts at one time. I think we're ready to go here on the result; we'll take a look at that, and away we go. We had a question for you, Shannon: the first poll didn't seem to work right. Do we have the second one ready to go? I don't see the answer; I'll keep an eye on it. There we go: most people don't use them now. Okay, that's very good, so we're in an educational mode today, and that's what we were thinking.

We're going to start with an overview and a definition, take a slightly deep dive into all three of these things, and follow with some examples. We're going to pick on the retail industry today, because that's something a lot of people understand, and then we're hopefully going to give you folks some takeaways at the end of this webinar.

So let's get going. Let's talk first of all about the process of statistical analysis. When you do this type of work, what you're really saying is: we need an answer to something, but we don't have the resources to get all of the data and have a comprehensive understanding of the situation. So we're going to take a subset, and we're going to make statistical inferences, and there are a lot of different ways to do that. No matter which way you do it, however, there is a method to this, and that is to have a hypothesis; there are two possible results of testing your hypothesis, the null and the alternative. We need to have a data source, and then we need to test our hypothesis. I'm going to go over this process briefly, because the thought process is very important if you're trying to understand this line of thinking and what data
scientists are doing. I think you'll find some surprises today versus some of the common perception of data science and statistical analysis.

Our first step is the hypothesis. The null result means that, although we think this is the answer, we find that chance accounts for what we see in our answer. The alternative is that we do have regularities, and they are going to have a bearing on our conclusion. Here is what we're really trying to do, using the example of a retail chain. Our hypothesis is: if we train our sales associates better, we'll get more sales. So we're going to take a sample of data and think about what might happen. In experiment one, with a null result of no difference in the amount sold between those trained and those not trained, we would say, okay, the training doesn't matter. But we might notice in our data that there is some difference in the amount sold between the people that had some training and those that didn't. At that point we know there's a difference; we don't yet know whether that difference is meaningful, but we know there's a difference. In the second experiment we shift our view, our hypothesis, to "the trained salespeople did better," and we focus our analysis on whether, if we trained them, they sell more on average. So there's a difference between simply doing a comparison and having some type of indication that they sell more on some kind of average by which to compare them.

The takeaway from this slide is that, with the hypothesis, you have to understand exactly what it is you're looking for. It isn't that the answer is going to leap out at you; you have to understand the type of answer you're looking for. Secondly: what are the
appropriate data sources? Let's not overlook something first and foremost: yes, we're talking Data Insights and Analytics this year, and the theme is big-data-ish, but it's not necessarily big data. This can be done, and has been done, for many decades without big data; big data was preceded by many years of statistical analysis. It's also about the data that you don't need. There is a tendency to collect every possible thing and then dive in and hope there's an answer, but the data scientist isn't going to look at everything all the time. There is going to be some setting aside of data, and there might be consideration of external data. The key here, as with any other data insight and analytics or BI or data warehouse effort, is: what do you need it for, what condition is it in, what's it going to cost us to handle it, and all of those kinds of things. That's an easy one for most of us data folks.

The last step is this: we have our hypothesis, and we understand the type of null or alternative result we're going to get, so now we have to figure out how confident we are. This is where we get a little bit into statistics; you can reach incorrect conclusions if you don't have a realistic view of your confidence. In our example we're going to take a 95% confidence level. We don't ever pick one hundred percent for a confidence level, because if we needed the data to be one hundred percent certain, we wouldn't be doing statistical inference; we would have to look at all the data and draw an absolutely firm conclusion. So what we're saying is: if we can get within 95%, we're going to accept that, and that gives us a margin of error we can work within. There are a couple of errors that can happen: we reject something that we shouldn't have, or we fail to reject something we should have. That takes a little bit of training on
how to do that, but you're going to hear those terms from the data scientists. The key is: which type of error is more detrimental to your investigation? Then you treat your data accordingly.

Now a quick look at our example, and we'll keep moving along. There are a lot of numbers on this slide, and you don't need to understand all of them. The key things: you have 0.05, which is called alpha; notice that 100% minus 95% gives you 0.05, or five percent. Then there is the "Sig. (2-tailed)" value that comes out of the analysis. If that significance value doesn't exceed 0.05, given the variances here, we know, because of some statistical theory, that the data supports our hypothesis. Again, we're not going to dive into all of this in detail today; the point is just to show an example. If you look at the bar chart, it looks like, wow, if we train people we're going to get an uptick in sales, and that's pretty good. Now, this doesn't replace common sense. Common sense would say: well, at the same time we did this, there was this wonderful new product, and everyone flocked to the store to get it. You have to use common sense with this kind of stuff, but this is the beginning of understanding how these types of analysis work.

So I'm going to cover the top three real quick, and then we'll dive into them one at a time. Very briefly: descriptive analytics uncovers insight from things that have happened. What happened? We have some data, we take a look at it, and we come away with a deeper understanding; we call that descriptive. The next one is predictive, which helps forecast behavior; it answers "what could happen."
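To make the reject-the-null logic concrete, here is a minimal sketch in Python. The sales numbers are invented, and this uses a simple permutation test rather than the t-test shown on the slide, so treat it as an illustration of the reasoning, not the slide's exact method:

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=5000, seed=42):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)           # fixed seed so the result is repeatable
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)             # relabel who was "trained" at random
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm   # p-value: share of relabelings as extreme

# hypothetical daily units sold per associate (invented numbers)
trained   = [12, 15, 14, 16, 13, 17, 15, 14]
untrained = [8, 9, 10, 7, 11, 9, 8, 10]

alpha = 0.05                  # 100% minus the 95% confidence level
diff, p = permutation_test(trained, untrained)
# if p < alpha we reject the null hypothesis of "training makes no difference"
```

If the observed difference would almost never arise from random relabeling, the null of "no training effect" is rejected at the chosen confidence level; otherwise we cannot distinguish the difference from chance.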
When you look at the market and the literature, everything is called predictive analytics or advanced analytics right now, and what we're finding is that many places use the term "predictive analytics" to cover all three. It's important for you to know that there are shades within this advanced-analytics world. Predictive is the second; the third is prescriptive, which is "what should be done": we do some analysis, and some actions or activities are suggested by the result of that analysis. At this point I am going to hand the descriptive topic over to Kelle while I catch a little beverage, and we'll pick up from there. Kelle, take it away.

Okay, sure. Descriptive analytics are really the backbone of analytics, and although, as John mentioned, they don't get much credit these days, descriptive analytics are generally a good starting point for further, more complicated analysis like predictive and prescriptive. While the findings of a descriptive investigation may not be as exciting as a complicated model, you wouldn't actually be able to complete the more complicated models, or even know whether they're appropriate, without descriptive analytics. So I wanted to spend a minute on descriptive analytics, because they are still very valuable. The earlier example regarding the salespeople training was a type of descriptive analytics: it was dealing with means, or averages. We're going to go through the two primary types of descriptive analytics and then walk through another example, again using a retail chain.

There are two main types of descriptive analytics: measures of central tendency, which is what we saw in the previous example, and measures of dispersion. Measures of central tendency most people are familiar with and use commonly. The mean is the average; in the previous example it was the average sales per person. The median is the most common, or,
sorry, no: the median is the middle-of-the-road answer, and the mode is the most common answer, the one with the highest frequency.

The second type of descriptive analytics is measures of dispersion. The range is the minimum, the maximum, and the difference between the two; this tells you the raw spread of the data. The variance is the average degree to which the points differ from the mean, that is, how similar or different the values in the data are; it tells you the spread not between your maximum and your minimum but between the average and the tails. The standard deviation, found by taking the square root of the variance, is the most common way of expressing and discussing the spread of a data set.

Now, a quick example on the right-hand side of the screen. What we're looking at here is a buying analysis, again using very simple data, as a way to show what a descriptive-analytics example might look like. In this instance we are looking at customers, the number of items purchased, and the amount spent, in an effort to get to know our customers overall. Here we see that the mean, or average, amount spent is six dollars and fifty cents; we find the median amount spent; and for the number of items purchased, the most common purchase was one item. So within this retail chain example we can see that there's quite a big spread in the number of items purchased, and our conclusions about our customers might be swayed if we look only at the average amount spent and don't understand the median or the mode. In this instance the number of items purchased is quite high on an average basis, but the most common purchase is just one item.
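These measures can all be computed with Python's standard `statistics` module. A small sketch; the data below is invented to echo the slide's $6.50 mean and one-item mode:

```python
from statistics import mean, median, mode, pstdev, pvariance

# hypothetical per-customer data, invented to mirror the slide's example
amount_spent = [2.00, 3.50, 5.00, 6.00, 6.50, 7.00, 9.00, 13.00]
items_bought = [1, 1, 1, 2, 1, 3, 1, 6]

# measures of central tendency
avg = mean(amount_spent)      # 6.50: the average can hide a skewed distribution
mid = median(amount_spent)    # the middle-of-the-road value
typical = mode(items_bought)  # the most frequent value: most customers buy 1 item

# measures of dispersion
spread = max(amount_spent) - min(amount_spent)  # the range
var = pvariance(amount_spent)                   # average squared distance from the mean
sd = pstdev(amount_spent)                       # square root of the variance
```

Reading the three central-tendency numbers together, rather than the mean alone, is exactly the point Kelle makes: here the mean spend looks healthy while the mode shows the typical basket is a single item.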
So this is where the measures of central tendency can give us a wider view: not just looking at the average items purchased, but also at things like the median and the mode, to learn even more about our customers and their buying habits within this retail chain example. With that, I am going to turn it over to John to talk about predictive analysis; we will then continue with an example of predictive analysis and, lastly, prescriptive analysis as well.

Thank you, and I could have predicted that this one was next. Sorry, a little levity; when I took statistics in college we definitely needed levity, and we didn't get it there either, did we? Okay, predictive analytics. Some folks, because of the name, jump to predicting future events, and that's not really what's happening here. It's more about answering the question "what could happen" based on the factors that we're putting into some sort of model. The models can be very simple, looking at a few factors, or very complex, crunching through a lot of things, but either way we're looking at what could happen. The example here is sentiment analysis. We see this a lot now with tweets: organizations finding out in the morning that there are bad tweets or good tweets, so you can start to say, well, if we do something this way, we're going to make a bad social media impression, things like that. Lots of models can be used here: forecasting, simulation, regression, classification, clustering, and many more. We're going to look at just a handful of these, to give you an idea of what's going on within them. Remember the scenario we had in mind when we put this together: you're a data architect or a data engineer or some such, you're sitting in a kickoff meeting, and all these words are flying around you; this helps you figure out what folks are talking about,
because maybe the data management person, or a data governance person, or a BI person, is starting to look at an environment where much more sophisticated things are going on, and you might have something to contribute if you understand this a little better. That's our mindset. If you have any questions, don't forget to enter them; we leave time at the end, and we try very hard to get to the questions, write our answers, and get those sent out. Last week we broke our record: we had things turned around in about four or five hours.

Okay, forecasting. The first one is something we've all seen: take some points, plot them on a line, and extrapolate what that line would look like. We take the means of various periods; periods one through four give us period five, then periods two through five give us period six, so we have a rolling set of means, and the line smooths out. Notice our curve: the brown part, before the blue part, is kind of jagged, but the predicted part is smoothed out, because some type of smoothing occurred there. With any forecasting, you are using past data to give a rough projection of the future, and the longer the period of time you're forecasting, the more variable this can become. If I were to take this line and ask what's going to happen in 2050, we, as rational people, pretty much know that a lot is going to happen between now and then. So again, you have to use common sense here, but we're using past data to project forward. In our example with the retailer, we took some sales for store C, we plotted them, we applied a smoothing technique (there are a bunch of those, and you can try them out in Excel if you want), and we got our little line pushed out into the future, into the years '17, '18, '19, and '20.
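The rolling-means idea described above can be sketched in a few lines of Python; the sales figures are invented for illustration:

```python
from statistics import mean

def moving_average_forecast(history, window=4):
    """Forecast the next period as the mean of the last `window` periods."""
    if len(history) < window:
        raise ValueError("not enough history for this window size")
    return mean(history[-window:])

# hypothetical yearly sales for "store C" (invented numbers)
sales = [100, 120, 110, 130, 125, 140]

# periods 1-4 predict period 5, periods 2-5 predict period 6, and so on;
# here the last four observed periods predict the next, unobserved one
next_period = moving_average_forecast(sales, window=4)
```

Averaging over a window is what smooths the jagged observed line into the flatter predicted curve on the slide; wider windows smooth more but react to recent changes more slowly.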
The next technique in predictive analytics is simulation. There are a lot of ways to do this, but what you're trying to do is run a model of the system "as it is", in a simple, sample-data fashion; that's what simulation means here. Most of us are familiar with simulators, like airplane simulators or climate simulations, and we see the results of really interesting models all the time on the evening news or The Weather Channel. This is in that ballpark. The queueing model is a really good one and is used a lot: wait time, queue length. Anyone who's been in a Costco or a Sam's or a grocery store of any sort, wondering why all of a sudden the lines go from five deep to two deep: someone has done some queueing modeling, a little indicator went off somewhere saying it's time to open more lines, and there's been some analysis behind that. Next is the discrete-event model, where, when you can't use queueing, you can go look for bottlenecks. The last one we'll talk about is the Monte Carlo simulation, a very sophisticated type of simulation, but one that is used a great deal.

Let's look at an example with queueing. The takeaway, on the highlighted items: we have a situation in our retail store where customers are arriving, and in certain scenarios we have a wait time of 30 minutes. Oh my gosh, that's too long. This might be, say, service in the layaway department or the automotive department. We find we get four people an hour, and we can handle six an hour; with one person, people are waiting, but if we add a person, the utilization goes way down, so we're paying people who only get to work a third of the time, which is another thing we really can't tolerate. So we ask: what if we just trained people for more throughput and got more out of fewer people? So if I up the service
rate to 10 an hour for the one server, that's scenario 2: utilization does improve, and the wait time goes down a little, from eleven point three to ten. The probability that a customer waits goes up; however, the time they spend waiting goes down. You can see how this helps you make your decision. Notice, and this is where the name "predictive" is a bit of a misnomer, that it's not telling you what's going to happen; it's telling you what can happen based on the data. You still need to make a decision; it's not a cut-and-dried answer. You're going to ask yourself a lot of questions, like: how much do I invest in this training to increase the service rate? Will this person quit if I make them work harder? Is there something else people can do when they're not helping a customer, at a utilization rate of 60 or 66 percent? Again, this is where common sense comes in, but this is a pretty cool way to break down a problem and help you make your decision. That's what queueing-type models do: they give you a lot of alternatives. Of course, that's what predictive modeling does in general: it gives you relationships that tell you what could happen, not predictions of what will happen.

The best example of this is the Monte Carlo simulation, where you have a lot of variables. Here we have our retailer again, asking: should we build a new store? We have maybe 18 or 20 variables; normally, when you do these, there could be hundreds. We put in the ranges that we can tolerate, and the simulation runs, using random number generation, through all kinds of permutations and comes up with what things would look like if all those things happened. It gives you a bunch of data to look at.
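Backing up to the queueing example for a moment: numbers like those in that scenario can be approximated with the textbook single-server M/M/1 formulas. The 4-per-hour arrival rate and the 6 and 10 per-hour service rates come from the discussion above, but the tool behind the slide isn't specified (and the slide also compares one- and two-server setups), so this is an approximation, not a reproduction:

```python
def mm1_metrics(arrivals_per_hr, served_per_hr):
    """Steady-state metrics for a single-server M/M/1 queue."""
    if arrivals_per_hr >= served_per_hr:
        raise ValueError("arrival rate must be below service rate")
    rho = arrivals_per_hr / served_per_hr              # server utilization
    w_system = 1 / (served_per_hr - arrivals_per_hr)   # avg hours in the system
    w_queue = rho / (served_per_hr - arrivals_per_hr)  # avg hours waiting in line
    return {
        "utilization": rho,
        "minutes_in_system": 60 * w_system,
        "minutes_in_queue": 60 * w_queue,
    }

base = mm1_metrics(4, 6)      # one clerk serving 6 customers/hr
trained = mm1_metrics(4, 10)  # the same clerk after training: 10/hr
```

With these formulas, the 6-per-hour clerk yields a 30-minute average time in the system, matching the 30-minute wait in the example, and raising throughput to 10 per hour cuts it to 10 minutes while dropping utilization from about 66% to 40%.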
You then have to look at a particular scenario and say: that's what's going to happen if all these other things happen. This is a really cool one, so cool that in some tools it's an upcharge, and in other tools it's a much-requested feature for statistical tools to have built in.

We've got two more to go, so a real quick look at regression. Regression analysis is about understanding your independent and dependent variables; that's the old statistical conundrum of causality and correlation, and now we can start to get some tools to help us work through it. For logistics it's a good one: if our store is closer, will someone shop there, or will they go to the competitor? You can look at linear relationships, for example daily store revenue by the number of customers who enter the store. In other words, if someone enters the store, are we going to have more revenue? Can we tie the two together? We can learn to see how things tie together with this type of modeling. What you'll find in our example is that if people do come into your store, we can predict revenue; we have a correlation between a head count and a revenue forecast. It's really important to understand accuracy here, and again to use common sense. There are lots of ways to do regressions, but that's the general idea of this type of model.

The next one is near and dear to my heart, because I have had a lot of uses for classification and clustering in my professional life. Classification is where you group things by common characteristics and then see how they relate to others. For example, you want your marketing team to pay attention to social media for your grocery store, so they might do some type of sentiment analysis, classify the content of various posts as positive or negative, and put those into some type of model.
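The head-count-to-revenue relationship described above is a job for ordinary least-squares regression. A minimal sketch with invented daily figures, fitting revenue = intercept + slope × customers:

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    xbar, ybar = mean(x), mean(y)
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    return ybar - slope * xbar, slope

# hypothetical daily observations: customers entering vs. revenue in dollars
customers = [50, 80, 110, 140, 170]
revenue   = [600, 900, 1200, 1500, 1800]

intercept, slope = fit_line(customers, revenue)
forecast = intercept + slope * 200   # predicted revenue if 200 customers walk in
```

Real data will not sit this neatly on a line, which is why John's caution about accuracy matters: you would also want to check the fit (for example R²) before trusting the forecast.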
The colors would indicate, perhaps, the sentiment or a category, and you'd look at the grouping of revenue numbers against, say, the number of items purchased. Of course everyone looks at the upper right-hand corner or the lower left-hand corner, things like that. Again, remember, this is showing us how things relate; I, or the data scientist, or the business person, then have to sit down, look at this, and make a decision from there.

Clustering: there are a lot of ways you can look at clustering. For our retail chain example, suppose we look at the data from a customer rewards program: customers, items purchased in the year, and the total money they spent. You can take that data and look at whether the rewards program is working; you can see whether customers respond to rewards feedback, whether the targeted promotion worked, and whether you should do more targeted promotions. All of us who have shopped, or have a credit card, get things in the mail and little messages on our phones; we log in to Amazon to buy something and it tells us something. This is the type of analysis that makes those things happen with retailers.

And we actually use a form of this in our practice. In really large organizations, when we have hundreds and hundreds of information requirements or uses of data, we will cluster them by category to see where they sit. Here's where this has helped us in the past, using statistical analysis in doing data architecture in a really large enterprise: you will always be challenged with "why do I have to standardize things when my area does its job; leave us alone." Using clustering, you can say
your data requirements match 60% of the rest of the organization's data requirements; therefore the burden of proof is not on us, it's on you. What makes you so different that we have to spend extra money on a special architecture for you? We've done exactly that type of analysis in our architecture work, and that's why this one is a pretty cool example.

From there we move to the really cool one, which we saved for Kelle, and that's prescriptive analysis. (Did I say predictive again? I did.) I thought mine was cool, but prescriptive is actually much cooler, because it pushes people's thinking: when something tells you that you ought to do something, there's a neat cultural aspect to that, which we've talked about in our other webinars. Take it away, Kelle; I got it right that time.

Sure, okay. If we go to the next slide; I meant to be advancing these, sorry. Okay, great, thank you. This is our last category. We'll take a similar approach to the previous categories: talk about the most frequently used types of analysis and then provide an example to give you a feel for what this could look like. As John said, prescriptive is really "what should happen," or "what should I do?" Realistically, you can get some of those answers via descriptive and predictive analysis as well, so a prescriptive analysis is not the only way to figure out what you should do, but it is a way of using statistical analysis, based on past performance, to say what should happen. Usually prescriptive analytics answers explicit questions that you're looking to solve to improve your business, and usually it focuses on maximizing profit or minimizing cost. Many of these are done through programming, such as the four examples that we have listed here,
but some of these you can do more simply, in a more manual way. Let's go through and define the four examples we've got here; the next slide will then show an example of linear programming.

Linear programming is used to minimize or maximize an output, again usually minimizing cost or maximizing profit, based on multiple variables: for example, where you have a limited supply of resources, such as, in the next example, a limited amount of storage space. Maybe it's manufacturing time, assembly time, or a limited amount of parts, and each product uses a different amount of the resource and provides a different amount of profit. In linear programming, just as the title says, all of the variables are linear in the way they are represented. Nonlinear programming is programming in which at least one variable is not linear; it's a more complex way of looking at how those variables relate to each other, and therefore at what the prescriptive output would be. Integer programming is a subset of linear programming in which at least one of the variables must be an integer, or whole number. There are some typical use cases for this. Capital budgeting usually uses integer programming, where you can only order, say, four of the five product options. Warehousing location is another, continuing with the retail example: you must minimize the costs associated with transportation between a warehouse and store locations given a specified route. It can also be used in scheduling: say you've got ten salespeople who live in various parts of the city and five stores in different locations around the city. Where do you want to put those salespeople, based not just on their location but also on their skill set and how they've been trained to sell the particular items that are represented? And then mixed-integer
programming is a subset of integer programming where some variables are constrained to be integers and some are not again for typical types of prescriptive data analysis so we're going to take a minute and go through a linear programming example and in this example what we're trying to solve for is that I guess rust-colored row at the top we're trying to determine what is the optimal product quantity to order across our five product lines in order to maximize profit so our goal is what is that total profit and what is the the maximization of the profit in this instance we've got some constraints so you'll see below those constraints that we're trying to you take into consideration each item takes a different amount of storage space each item has a different degree of spelling effort so the storage space can be very specific the selling effort is probably a scale that has been created in a pro and agreed upon and approved by your stakeholders minimum order is bound by the provider of that product or the manufacturer so in this instance we're looking at product a provides a has a profit per unit of five dollars storage space is 0.05 has a low selling effort so it's 0.25 versus we see that product e has a high selling effort of 7 and it's got a minimum order of 100 so you can see how all of these relate to each other so within a linear programming example you can actually solve this by hand or graph to create a graph and you can then see how that graph extends in order to find the profit the maximum profitability per order however it is of course much faster to do this using software so using linear programming software we can solve for the problem below and you can see that the linear programming software has given us a solution of ordering the minimum amount for products a and B and C but for product D we want to sorry I'm getting this wrong the minimum amount for products a and B we want to maximize our order for product C and then we're looking at getting the 
minimum amount for e and D as well and what we find is that we have a maximum profit for this order of depending upon the scale either 14,000 or 14 million dollars now one of the things that this also tells us is that we have some unused storage space so if we look kind of at the bottom here on the towards the bottom right hand corner the output of this also tells us that we have used up eight hundred and fifty two point five of our storage space when there's a thousand available so one of the benefits of this type of programming example is you can actually get additional information that may be helpful in other sorts of decisions so if you have some leftover warehousing space maybe there is a different product line that you want to use for that storage space or maybe you want to use your human analysis to modify this order just slightly to take advantage of that additional storage space so again there's the output from the linear programming software that solves for the profitability and then also taking advantage of some of the additional information that is found as a result of that linear programming example I think that's back to you John Oh Oh already already so I we went through those fairly quickly uh what I wanted to just revisit on the predictive and this is what I mentioned something culturally I wanted to come back to that and I think you you mentioned it was you know this type of analysis is the one that's going to say well you know this product is not the most profitable it takes the most space up or your order quantities are wrong so you might want to consider things like cutting the product not selling the product outsourcing the product all kinds of things and these are the ones that this is the type of analysis that can you know raise eyebrows with a lot of people the other ones will say well you can consider this you can that here's the ramifications this is the analysis that goes oh really and people will will open their eyes ends and be and go 
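To make the slide's optimization concrete, here is a minimal sketch in the same spirit. The slide's actual figures aren't reproduced here, so the profits, storage footprints, and order bounds below are invented, and the five product lines are trimmed to three. Because quantities move in lots of 50, this is really a tiny integer program, small enough to solve by exhaustive search; a real problem would go to a proper LP/MIP solver (simplex or branch-and-bound) rather than brute force.

```python
from itertools import product

# Hypothetical data -- illustrative stand-ins for the webinar slide's
# profit per unit, storage per unit, and min/max order for each product.
PRODUCTS = {
    "A": {"profit": 5.0,  "storage": 0.05, "lo": 100, "hi": 600},
    "B": {"profit": 8.0,  "storage": 0.50, "lo": 50,  "hi": 500},
    "C": {"profit": 12.0, "storage": 1.00, "lo": 50,  "hi": 500},
}
CAPACITY = 500.0  # total storage space available
STEP = 50         # orders come in lots of 50, making this an integer program

def best_order():
    """Exhaustively search feasible order quantities for maximum profit."""
    names = list(PRODUCTS)
    ranges = [range(p["lo"], p["hi"] + 1, STEP) for p in PRODUCTS.values()]
    best_profit, best_qty, best_slack = float("-inf"), None, None
    for qty in product(*ranges):
        used = sum(q * PRODUCTS[n]["storage"] for n, q in zip(names, qty))
        if used > CAPACITY:
            continue  # infeasible: exceeds available storage
        profit = sum(q * PRODUCTS[n]["profit"] for n, q in zip(names, qty))
        if profit > best_profit:
            best_profit = profit
            best_qty = dict(zip(names, qty))
            best_slack = CAPACITY - used  # leftover storage, as on the slide
    return best_profit, best_qty, best_slack

profit, order, slack = best_order()
print(order, profit, slack)
```

Note that, just as in Kelly's example, the solver's byproduct (the storage slack) is itself useful information for other decisions.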
Excuse me. So let's revisit these and do a comparison for a bit. We have a few questions that have come in; please keep them coming, we will do our best to answer them, and if we don't get them all answered live, we will put the answers to paper and get them to you in the near future.

The most common of our three types here is the descriptive, and one of the questions that came in can be answered as we go through it. The best practice is to perform this first: understand your means, modes, and standard deviations. This will help with your hypothesis, with the null or alternative answers you're looking for. It's the initial pass someone makes at the data, to see whether the data is even worth using, so it's a very important type of analysis. After that comes predictive: maybe you want to know what could happen once all the variables are understood. And then prescriptive: what should we do? We run some additional modeling and it tells us that, based on certain assumptions, you should do X or Y. Is there a precise boundary between all three of these? No, not really. Like anything else in this world, there are going to be indistinct borders between them at times, but they are three distinct thought processes.

Then, realistically: how much time do you have? Obviously, when you get into predictive and prescriptive, you're going to have to run a model. You'll have to stage some data in whatever tool runs that model, or make sure the data you have positioned is visible to the tools. Do you have the right people who know and understand those tools? Is the data accurate enough to support the hypothesis and the analysis you're doing? And don't forget: the same data set can support one type of analysis and be totally unable to support another. That's something the data scientist or the analyst will be considering, and when you're supporting those folks or talking to them, it's a good thing to bear in mind: the super-duper great thing they did a few weeks ago might not be repeatable on another data set.

Then: how accepted is the model? If you have a model that says "don't sell product C anymore," but product C is the product great-granddad founded the company on, you've got a business dilemma, a conundrum, because something that might be your brand or your visibility isn't really doing anything for you. Subscribing to "that's how we've always done it" may not work, but on the other hand, make sure that stakeholders are aware of what type of analysis is going to come out of these models. These models will give you answers you did not expect, and answers you may be uncomfortable with. We've seen this in many, many disciplines. Those of us who travel a lot are experiencing the a-la-carte-ization of airfares: you're going to get charged to put your bag in the overhead bin, or else you're not getting charged at all. That is a model that came about from a keen analysis of data and spending patterns, of what people will and will not pay for, and again the results are a little uncomfortable for the stakeholders, but we will see what happens. The recent election in the United States had one set of data scientists saying it would go one way, while a model in another country said it would go another way; one model turned out to be right and the other turned out to be incorrect. So common sense and reality are the kinds of things we want to bring to bear on all of these.

So, to review: descriptive is what happened; predictive is what could happen, based on a bunch of variables; and prescriptive is what should happen, given all of that. Now, one of the questions was: why is clustering not considered descriptive? Because clustering takes a lot of variables into play. It is one of those simulations where I have to categorize things, and then I can try different categorizations and different permutations of the data. With descriptive, you're pretty much taking the data as-is and describing it, which is more or less why it's called descriptive: you're describing the characteristics of that statistical sample. That's why clustering is put in the predictive category rather than the descriptive one. Kelly, I think it's your turn.

Yes. Before we go on from this slide, another thing to consider is that you don't have to use these in isolation. We did talk about how descriptive can be a foundation, validating what information you have before you build a more complex model, but the output of a predictive model, a forecast, could also be fed into a prescriptive model. It isn't either/or, one versus the other; there can be benefits in the way you integrate these different sorts of models, both to provide that additional information, what could happen versus what should happen, and to get to that next level of granularity. That's one thing I wanted to highlight on this slide. The second thing, to add to what you were saying, John, is that this last bullet point is very important: it's not just making sure you have validated with your stakeholders that there is resonance and consensus around the output, but that the model you're using, the way you're getting to the output, is accepted.
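Returning for a moment to the clustering question above, a minimal sketch of one-dimensional k-means (Lloyd's algorithm) shows why clustering lands in the predictive bucket: descriptive statistics would simply summarize the numbers, while the clustering loop iteratively tries and revises categorizations of the data. The points and starting centers here are invented for illustration; in practice you'd reach for a library implementation.

```python
# Toy 1-D k-means (Lloyd's algorithm): alternate between assigning each
# point to its nearest center and moving each center to its cluster mean.
def kmeans_1d(points, centers, iters=20):
    for _ in range(iters):
        # Assignment step: each point joins the nearest center's cluster.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```

Even this toy version has the "simulation" character John describes: different starting centers can produce different categorizations of the same data.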
That acceptance could come from a level of understanding of the model, or from the inputs that go into it, but it's just as important to ensure you have agreement that the model is valid and meaningful, in the same way that, once you get the output, you need agreement and consensus around the output in order to action a decision. That's all I wanted to add.

Very good. Well, let's stay here and give our audience some takeaways, and thanks again, everyone, for listening. We do have time for some questions, and probably room for a few more; I haven't looked at the list in the last minute or two, but it's a really good turnout, we appreciate it, and we hope we're helping.

Key takeaways for today: you have to plan this out, as Kelly was saying. You have to have some awareness of, and consistency in, how you do these things. There's a perception out there that the data scientist just dives in and some miracle occurs. No: there's discipline here, there is science going on. It is not a replacement for common sense, though, so don't just take everything at face value. We had a client a few years ago, an example I gave in one of our first talks, where the initial recommendation was to close some huge percentage of branch offices, and it turned out that was driven by an incorrect assumption about the data; someone didn't use common sense. So you have to be careful and apply some common sense. There are a lot of resources out there on this stuff, and the key phrase to look for is "applied statistics." If you run down to the local university and grab a 400-level statistics course book, well, if it were me, my head would explode. You need to look at treatments where the techniques are applied.
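As a small, concrete instance of that applied first pass, the descriptive means, modes, and standard deviations John mentioned earlier, Python's standard library is enough; the sample values below are invented:

```python
import statistics

# Invented sample -- stands in for any numeric column you'd profile
# before deciding whether the data is worth deeper modeling.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(sample),      # central tendency
    "median": statistics.median(sample),  # middle value
    "mode": statistics.mode(sample),      # most frequent value
    "stdev": statistics.pstdev(sample),   # population standard deviation
}
print(summary)
```

A few lines like this are often all it takes to decide whether a data set can support the heavier predictive or prescriptive work.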
Applied treatments give you good, concrete examples. Also: big data is not required. Data Insights and Analytics, the name of our webinar series, covers big data, but you can use all of this with little data too.

One of the questions that came in, Kelly, asks for examples of tools for the three types of analysis. Any of the long-established, well-known statistical suites, and I'm not advertising for anybody here, SPSS for example: they work in big data environments, they work in mid-size and small data environments, they have the ability to do all of these things, and they are priced and configured in various ways. But tools like Alteryx and others also have big chunks of functionality that cross all of these. When you're evaluating and picking tools, the types of models you want to run is a big criterion.

And a basic understanding of statistics helps. Excel is a genuinely useful statistical tool, believe me. If you just want to start playing with this stuff, Excel can smooth curves: you can do exponential smoothing, logarithmic smoothing, all kinds of plotting, even clustering if you really get fancy, and you can download or purchase macros that sit on top of Excel to do even more sophisticated things. There's a lot out there. In the old days, and I'm dating myself, you could put too much data on your PC and the tools wouldn't work; nowadays you can stuff a whole lot more data onto your desktop and go crazy.

And Kelly, you can weigh in on this one: a basic understanding of statistics goes a long way. As we were putting this material together, Kelly and I had to revisit some material we had done in the past; it had been a few years. Maybe you heard it in college and never got back to it, or, like me, you worked with people who did it every day.
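Exponential smoothing, one of the Excel capabilities just mentioned, is only a few lines in any language. Here is a sketch with an invented series and an assumed smoothing factor alpha of 0.5, seeding the smoothed series with the first observation:

```python
# Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1].
# Higher alpha tracks the raw series more closely; lower alpha smooths more.
def exp_smooth(series, alpha):
    smoothed = [series[0]]  # seed with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(exp_smooth([10.0, 12.0, 14.0, 13.0], alpha=0.5))
```

The same recurrence is what Excel's exponential-smoothing feature computes for you behind the scenes.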
I listened to them and saw their results, but it had been a long time since I myself sat down and did something like this, and refreshing our awareness was invaluable, both for this particular topic and for the whole data insights and analytics subject. It really helped expand the vision of how this stuff is going to be used, how we support it, and how we get the data into the right place to do it. It was a great refresher for us as well. Here are some examples from our bibliography: "Statistics in Plain English," a piece on when predictive models fail, and a podcast on statistics, proving there's a podcast on just about anything nowadays. So there are some other sources for you to consider. Kelly, anything to add before we take a look at the questions?

No, I think this is all really good. We did cover a lot, so there are probably a bunch of questions we could go through, and of course anything we don't get through live we will answer in written format.

Sure. Here's one I wanted to pop in: "Data analysis or data analytics: is there a difference, and which is the correct one?" That's an interesting question. I think data analytics is more a label for the discipline, and data analysis is a label for a process; that's how I would view it. Kelly, your thought on that?

Yes, I would agree; I think that's a good way to think about it. The implication is that "analysis" tends to feel more simplistic, while "analytics" sounds like the bigger, fancier picture, but I think your distinction is accurate, John: one is more of a practice, the other more of a process.

And this is a really well-timed question, because, as happens in all human endeavors, labels get assigned and impressions come with them. With someone who is a data analyst versus someone who does data analytics, the impression might be that the latter is a much more sophisticated, smarter, more highly paid person, but the reality might be that they're identical, or that it's reversed; it just depends. The takeaway is to look underneath the label at what they're actually doing. Are they running these types of models? Are they just moving the data? Are they doing only descriptive models and coming up with a relatively simple understanding of the data, versus moving up the curve? So, good question: the functionality, what you're doing, the intended results, and what you'll do with those results are the important considerations, well before the label.

Let's see, we have another question here: "Data-driven versus data-informed: big deal or not?" I'll let you take that one, Kelly.

Sure. In my opinion those are levels of maturity, and some organizations need to pass through data-informed before they can be data-driven. The difference between the two, in my view, is that with the first we are using data to make decisions, to identify trends, and often to validate a hypothesis we formed without data. Data-driven means we are proactively incorporating data into all decision-making processes, and we do not make a decision until we have actually done some data analysis. So it's the difference between using data to inform decisions and using data to truly drive decisions. It's a nuance, but I would say there is a difference, and it's mainly a cultural difference, likely a maturity progression.

I think that's right: data-driven and data-informed are shades of maturity, and both require a certain acceptance of models, similar to the predictive-versus-prescriptive acceptance we discussed. I'm not so sure you have to go through one before you get to the other; there might be a way to go directly from point A to point C. But it's definitely a good way to distinguish, subtly, between someone who is really going to build data into everything and someone who will just consider it as they see fit.

At that point, we're out of time for questions; we're at the top of the hour. In a moment I'll turn it back to Shannon, but please join us in a month for the next webinar, "Building a Flexible and Scalable Analytics Architecture." In that one we'll be throwing up some punch lists for architecture and some ideas for a reference architecture that is broad-spectrum, all the way from your traditional BI to the more sophisticated big data capabilities. We look forward to presenting that material to you. Shannon, or Kelly, anything to add, or shall we turn it back over to Shannon?

All right. Thank you, John and Kelly, so much for a great presentation. Just a reminder for everyone: look for the follow-up email by end of day Monday with links to the slides, links to the recording of the session, and anything else that was requested. It looks like we got through all the questions; we'll just do another quick scrub of those. Thanks to all of our attendees for being so engaged and for asking great questions; we appreciate it so much, and I hope everyone has a great day. Again, John and Kelly, thank you so much.

Thank you, appreciate it. Talk to you in a month. Bye-bye, all.
