Quality and Testing of Predictive Analytics and Big Data Applications

Here is an example from amazon.com. How many of you use amazon.com? It learns what you are doing and what your behaviour is, and it recommends things based on that. So prediction is already part of your everyday life; you don't need to go anywhere, just look at amazon.com. A little bit about me: I am a PhD student in Bangalore, and I also work in the travel industry, on flight search and shopping.
To start with, a little bit of inspiration. This is a robot from a company called Boston Dynamics. Watch it carefully, because there is a quiz question and a goodie for the right answer. Before we go on with the video: how do you test this? Can anybody here tell me how, technically, you would test this robot? It sees an obstacle through its camera and, as you can see, it jumps over it. You are not going to do all of this as real-world testing; you test it in the lab. So let me ask you a simple question: what is the equation of a projectile? The robot follows the path of a projectile. As a tester, you should know what the camera gives you: the distance to the obstacle in front, the angle at which the robot takes off, and whether it clears the top of the obstacle. That is a real test of this robot.
Coming to the agenda: I'll talk about data patterns (I've extended this part a bit to make it easier), then predictive applications, then the key problems and challenges, then up to three case studies depending on the time we have, and finally what you should walk out of this room with. So what are predictive applications? First, they have a lot of data; data is the key. The data is complex: you know the Vs, volume, variety and velocity. More importantly, you need to know what patterns exist in it. So the first thing is patterns.
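The robot question comes down to projectile motion, which a tester can turn into lab test oracles instead of relying on field trials. A minimal sketch, assuming an ideal projectile (no air resistance, flat ground) and a thin obstacle: with take-off speed v and launch angle θ, the height after travelling x metres is y(x) = x tan θ − g x² / (2 v² cos² θ). The function names and numbers are hypothetical and are not taken from the robot's actual controller.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def height_at(x: float, speed: float, angle_deg: float) -> float:
    """Height (m) of an ideal projectile after travelling x metres horizontally."""
    theta = math.radians(angle_deg)
    return x * math.tan(theta) - G * x**2 / (2 * speed**2 * math.cos(theta) ** 2)

def clears_obstacle(distance: float, height: float, speed: float, angle_deg: float) -> bool:
    """Test oracle: is the jump above a thin obstacle of `height` at `distance`?"""
    return height_at(distance, speed, angle_deg) > height

# Hypothetical test cases derived from the equation, not from field trials.
print(clears_obstacle(distance=1.0, height=0.3, speed=4.0, angle_deg=45))  # True
print(clears_obstacle(distance=1.0, height=0.3, speed=2.0, angle_deg=45))  # False
```

The point is the oracle: once the physics is known, the expected outcome of each jump can be computed before the robot ever moves.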
To find patterns, you need to be a good anthropologist. What is anthropology? Observation: an anthropologist is a person who sits there, observes things, and understands what really matters.
The second thing is processing. How many of you have worked on traditional BI applications, the data warehousing and reporting kind of applications? You might ask what is different in predictive applications. In my view the foundations are the same; additionally we do data mining and so on, but we never had data volumes like this. Now we have huge volumes of data, so predictive applications are pretty much the evolution of BI. And one size doesn't fit all: as a tester, when you move from one application to another, you cannot reuse the same approach. The techniques may look the same, but each application has its own needs.
The third thing is insights, which come from intelligence, that is, machine learning. There are three types of learning: supervised learning, reinforcement learning and unsupervised learning. Let me explain each with an example, and there is a quiz question: give me the right answer and you get a goodie.
Take the WTC attack in New York and the question of whether events that happened before it could have signalled it. You can read The Signal and the Noise by Nate Silver; it's an excellent book, and it also covers election predictions such as Obama's. This is a form of supervised learning: you have prior data with known outcomes, and you learn from it.
The second one is cycling, no guesses needed. When you ride a bicycle, there is a set of rules and actions: for example, hold the handlebar straight and keep your body balanced. If you violate a rule, an action is taken, and there is a consequence. When you fall down, you learn by adjusting yourself. That is reinforcement learning: it is about rules, actions and consequences. If you break a rule, this is what you get, and these are the consequences.
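Reinforcement learning as described here (rules, actions, consequences) can be illustrated with a toy example that is not from the talk: tabular Q-learning on a hypothetical five-cell "balance" track. Cell 0 means falling off (reward −1), cell 4 means arriving (reward +1), and the agent learns from these consequences alone. The learning rate, discount and exploration rate are arbitrary assumptions.

```python
import random

N_STATES = 5            # hypothetical track: cell 0 = fall, cell 4 = goal
ACTIONS = (-1, +1)      # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # assumed learning, discount and exploration rates

def step(state: int, action: int) -> tuple[int, float, bool]:
    """The rules of the environment: return (next_state, reward, episode_done)."""
    nxt = state + action
    if nxt == 0:
        return nxt, -1.0, True   # consequence of a bad action: you fall
    if nxt == N_STATES - 1:
        return nxt, 1.0, True    # consequence of good actions: you arrive
    return nxt, 0.0, False

def train(episodes: int = 500, seed: int = 0) -> list[list[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 2, False                 # every episode starts in the middle
        while not done:
            if rng.random() < EPSILON:
                a = rng.randrange(2)           # explore: try a random action
            else:
                a = max(range(2), key=lambda i: q[state][i])  # exploit what was learned
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][a] += ALPHA * (target - q[state][a])    # learn from the consequence
            state = nxt
    return q

q = train()
policy = [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(1, N_STATES - 1)]
print(policy)  # [1, 1, 1]: in every inner cell, move towards the goal
```

Nobody tells the agent the right move; the falls and arrivals are enough, which is the cycling intuition in miniature.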
The third one is unsupervised learning, where nobody has labelled the data. Facebook photo tagging is an example: it tries to recognise your face and your friends' faces and works out where they are in a photo, using something like cluster analysis. Believe me, Facebook does this at a scale of around a billion a day. It doesn't do it by hand: it uses neural networks, in fact complex deep neural networks. This is a really critical area of machine learning.
Now let's understand a bit of what training versus prediction is. Training is basically feeding the machine a lot of data; the machine learns by itself from what are called features, using algorithms. Then, at prediction time, it says with some probability whether something will happen or not. So in training you feed your datasets in, and the algorithm automatically takes care of the learning.
Here is a very intuitive example. Can anybody think of machine learning you use in the real world every day? Email spam filtering. How does spam filtering work? Whether something ends up in your spam folder is based on history. First, a set of examples is given; then the system builds up spam keywords; then it learns the probability of a particular word appearing together with other words, and from that it decides whether a mail is spam. That's how it works, and all of this is machine learning.
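The spam example describes learning word probabilities from labelled history. One standard way to do this, swapped in as an illustration rather than what any particular mail provider uses, is multinomial naive Bayes with Laplace smoothing. The tiny corpus is made up, and part of it is held out so the model is built on one part of the data and tested on the other.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    return text.lower().split()

def train_nb(examples: list[tuple[str, str]]):
    """Count words per class; examples are (text, label) with label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def predict(model, text: str) -> str:
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # prior P(label)
        for word in tokenize(text):
            # Laplace smoothing: an unseen word must not zero out the whole probability
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)

data = [  # made-up labelled history
    ("win a free prize now", "spam"),
    ("free offer click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting moved to monday", "ham"),
    ("please review the test plan", "ham"),
    ("lunch on monday with the team", "ham"),
    ("free prize offer inside", "spam"),   # held out for testing
    ("test plan review monday", "ham"),    # held out for testing
]
train, test = data[:6], data[6:]  # build on one part, test on the other
model = train_nb(train)
accuracy = sum(predict(model, t) == y for t, y in test) / len(test)
print(accuracy)  # 1.0
```

With real mail the held-out accuracy would of course be below 1.0; the split is what makes that number honest.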
So the first thing you need is a dataset. You take the data for your application, split it into two parts, build the model on one part and then test it on the other. You can't do anything without data, and data analysis is pretty important.
Now let's look at this example. How many of you recognise this gentleman? Prediction existed even in the old world: basically, the sundial. The data is the position of the sun, the model is the dial, and there is a calculation behind it; in my view it is a trained model for predicting time. What I want to make intuitive is that much of what you see around you has a prediction part to it, and our ancestors did this pretty well. The reason I'm showing this slide is that it is simply amazing how they built this dial using simple mathematical calculations. You can still go to Jaipur and see the shadow follow exactly the time you expect. It is a prediction model.
Now, these are the problems associated with it. First, it is not trivial. And last, it's not glamorous. It's not a sexy part of life; it is the most unsexy part. You deal with a lot of back-end work and a lot of spreadsheets. The most irritating moment is when a manager comes to look at your work, because there is nothing to demo.
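The sundial analogy can be made concrete with one formula. For a simple horizontal sundial (a textbook design, not the Jaipur instrument itself), the shadow's angle θ from the noon line satisfies tan θ = sin φ · tan h, where φ is the latitude and h is the sun's hour angle, which grows by 15° per hour. Inverting it turns the "data" (the shadow) into a "prediction" (local solar time). The latitude used below is an assumption, roughly that of Jaipur.

```python
import math

def shadow_angle_deg(solar_hour: float, latitude_deg: float) -> float:
    """Shadow angle from the noon line on a horizontal dial at a given solar hour."""
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))  # the sun moves 15 degrees per hour
    return math.degrees(math.atan(math.sin(math.radians(latitude_deg)) * math.tan(hour_angle)))

def solar_time_from_shadow(shadow_deg: float, latitude_deg: float) -> float:
    """Inverse model: local solar time (hours) from an observed shadow angle.

    Valid between 6 a.m. and 6 p.m., where the hour angle stays within +/-90 degrees.
    """
    tan_h = math.tan(math.radians(shadow_deg)) / math.sin(math.radians(latitude_deg))
    return 12.0 + math.degrees(math.atan(tan_h)) / 15.0

LATITUDE = 26.9  # assumed latitude in degrees north, roughly that of Jaipur
observed = shadow_angle_deg(15.0, LATITUDE)   # the "data": the shadow at 3 p.m.
print(round(solar_time_from_shadow(observed, LATITUDE), 6))  # 15.0
```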
More importantly, understanding a model is not easy. A simple model is much easier to explain, but real models are very, very complex. These are the challenges, and somehow we have to deal with them.
So let me take the first case study. How many of you have faced this frustration: you search and see a fare for SpiceJet or IndiGo, and when you go to book, the price has changed? How many of you know why it happens? First, traffic, of course: many people search the same routes. The other is seat availability: while you are booking a route, somebody else is also looking at the same route and booking it. Since I work in travel, my first case study comes from this. We call it the search quality monitor.
Before I explain it, a question: what is the attention span today, in seconds? Just give a number. [Audience: seven.] That's right; what's your name, sir? Compared with the previous generation, attention span has dropped from eight or nine seconds to six or seven, thanks to mobile. So if you put up a travel website and the search takes too long, the user will probably go and look at some other site. So what we do is keep a huge cached dataset that can be accessed very fast and serve responses from it.
Here's a very interesting example of what happens when we do that. A person named Samuel wrote an email to us saying that our product doesn't work. We used two parameters to compute this cache; one of them is popularity, how popular a route is, for example the busy daily routes. We compute only those and put them into the cache, but then the cache shows a price that may no longer be valid.
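The caching idea, precomputing answers only for popular routes so that most searches return fast, can be sketched as follows. This is a hypothetical illustration: the routes, fares, `search_live` stand-in and top-N cutoff are invented, and only popularity is modelled because the talk does not name the second cache parameter. The last lines show the catch behind the case study: a cached fare is a snapshot and can go stale.

```python
from collections import Counter
from typing import Callable

def build_cache(search_log: list[str], search_live: Callable[[str], float], top_n: int) -> dict[str, float]:
    """Precompute fares for the top_n most searched routes (popularity only)."""
    popular = [route for route, _ in Counter(search_log).most_common(top_n)]
    return {route: search_live(route) for route in popular}

def search(route: str, cache: dict[str, float], search_live: Callable[[str], float]) -> tuple[float, str]:
    """Serve from the cache when possible (fast); otherwise fall back to the live system (slow)."""
    if route in cache:
        return cache[route], "cache"
    return search_live(route), "live"

live_fares = {"BLR-DEL": 5200.0, "BOM-DEL": 4800.0, "PNQ-DEL": 6100.0}  # hypothetical fares
log = ["BLR-DEL", "BOM-DEL", "BLR-DEL", "PNQ-DEL", "BOM-DEL", "BLR-DEL"]  # hypothetical search log
cache = build_cache(log, live_fares.__getitem__, top_n=2)
print(search("BLR-DEL", cache, live_fares.__getitem__))  # (5200.0, 'cache')
print(search("PNQ-DEL", cache, live_fares.__getitem__))  # (6100.0, 'live')

live_fares["BLR-DEL"] = 6900.0  # seats sell out and the live fare moves...
print(search("BLR-DEL", cache, live_fares.__getitem__))  # (5200.0, 'cache'): the cache is now stale
```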
Another example: a customer was searching for a return trip to Berlin. What they saw in search was 300, but when they went to book, it was 600. Our response was, "Our product is 90% accurate." And they got back saying, "Yes, that's great, but the remaining ten percent is what we use." A great example. So this is search quality monitoring: if what you see is something you cannot book, search quality is poor. Whatever you search for, you should be able to book.
The KPIs we use are findability, meaning finding the same recommendation, the same flight at the same price (for example, I cannot show you Air Arabia in search and then offer you SpiceJet or IndiGo when you book), and availability: I shouldn't show you a seat that, when you click on it, turns out not to be available.
So how do you monitor this system? On this slide you see the cache and a monitoring system. The booking system at the top is the system that has the real price, the source of truth, and we monitor search quality against it. One number I'd like to share with all of you: it deals with about 0.7 million messages per day, from customers such as travel agents using our products.
How do you test this? Look at the search quality monitor: for a particular route you see this ECG-like graph. How do you test that? I have seen lots of testers who use Selenium to inspect elements; it's not going to work that way. You need to really understand what the system is doing, and understand sampling techniques. Sampling is about this: out of the millions of records you have, you pick, say, one percent of the data and test it, and that one percent must truly represent the whole. You get this from statistics: sampling techniques such as stratified sampling across routes, or whatever the strata are. And one more thing: you cannot just say "I'll do a little bit of testing". Your test environment should be an exact look-alike of production, so that you can spot these errors.
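The sampling approach in this case study (test about one percent of the traffic, but make sure every route is represented) is stratified sampling. A minimal sketch under stated assumptions: each record pairs the cached answer shown at search time with what the live booking system returned, and the field names, the 1% rate and the synthetic records are hypothetical. Findability is read here as "same flight at the same price found live" and bookability as "the live system can actually sell it"; these operational definitions are an interpretation of the KPIs named in the talk.

```python
import random
from collections import defaultdict

def stratified_sample(records: list[dict], key: str, rate: float, seed: int = 0) -> list[dict]:
    """Sample `rate` of the records within each stratum, keeping at least one per stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * rate))
        sample.extend(rng.sample(group, k))
    return sample

def quality_kpis(sample: list[dict]) -> dict[str, float]:
    """Share of sampled searches whose cached answer is found live, and can be booked live."""
    n = len(sample)
    findable = sum(r["cached_flight"] == r["live_flight"] and r["cached_price"] == r["live_price"]
                   for r in sample)
    bookable = sum(r["live_available"] for r in sample)
    return {"findability": findable / n, "bookability": bookable / n}

# Synthetic records: a busy, healthy route and a thin route with stale cached fares.
records = (
    [{"route": "BOM-DEL", "cached_flight": "X1", "live_flight": "X1",
      "cached_price": 4800, "live_price": 4800, "live_available": True}] * 10_000
    + [{"route": "PNQ-CCU", "cached_flight": "X2", "live_flight": "X2",
        "cached_price": 5100, "live_price": 6300, "live_available": False}] * 500
)
sample = stratified_sample(records, key="route", rate=0.01)
kpis = quality_kpis(sample)
print(len(sample), {k: round(v, 3) for k, v in kpis.items()})
# 105 {'findability': 0.952, 'bookability': 0.952}
```

Stratifying by route guarantees that thin routes, where stale fares tend to hide, always appear in the test sample.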
Coming to the next case study. You're travelling to Delhi and you have three options. Which one do you pick? The right question to ask is what type of traveller I am. Let me give an example. The first option suits somebody like a couple going on vacation; they don't mind an early flight. But if the airport is two hours away, you get up at 3 o'clock to reach there by six, and it's a mess. The second option, as you see, is much easier for a business traveller, like a speaker who wants to give a lecture in Kolkata and then reach Delhi. So you need to ask what kind of customer it is; that is what we deal with every day.
This is customer choice modelling, CCM. Customer choice modelling is about this: if we give five options, which option will a customer pick? These are probabilities, because you cannot say for certain.
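Customer choice modelling treats the pick among several options as probabilities. One common formulation, used here as an illustration rather than as the model the speaker's team used, is the multinomial logit: each option gets a utility U_i, and P(i) = exp(U_i) / Σ_j exp(U_j). The utility weights for a "leisure" and a "business" traveller and the three itineraries are hypothetical.

```python
import math

def choice_probabilities(options, w_price, w_duration, w_early):
    """Multinomial logit: utility is minus a weighted sum of price, duration and early start."""
    utilities = [
        -w_price * o["price"] / 1000 - w_duration * o["hours"] - w_early * o["early_start"]
        for o in options
    ]
    m = max(utilities)                          # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

options = [  # hypothetical itineraries to Delhi
    {"name": "cheap 6 a.m. flight", "price": 3500, "hours": 2.0, "early_start": 1},
    {"name": "mid-morning direct", "price": 6500, "hours": 2.0, "early_start": 0},
    {"name": "cheap connection", "price": 4000, "hours": 6.0, "early_start": 0},
]
# Assumed traveller profiles: leisure is price-sensitive, business is time-sensitive.
leisure = choice_probabilities(options, w_price=1.0, w_duration=0.2, w_early=0.5)
business = choice_probabilities(options, w_price=0.2, w_duration=1.0, w_early=2.0)
for o, p_l, p_b in zip(options, leisure, business):
    print(f"{o['name']:<22} leisure={p_l:.2f} business={p_b:.2f}")
```

The same three options produce different probability profiles for the two travellers, which is the "what type of traveller am I" question expressed as numbers.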
Let me give you an example. We had an interesting problem with a competition run by a travel company. The competition was about the number of bookings you make on ten percent of the traffic: they route ten percent of the traffic to your servers, and based on the number of bookings you make that month, you get eighty percent of the traffic the next month. So the more bookings you make, the better terms you get with the customer. They give you terabytes of data to track, search logs, whatever you want, and the question is how you track it.
This slide is a little bit on the data side: we do parametrisation, using analytics to predict the customer profile and what sort of things they need. A few examples. The first one is the convenience factor: knowing the profile, say a business traveller who always leaves on Monday and comes back on Friday, you can target and price for them. Then there are low-cost carrier options: for example, Lufthansa has a low-cost carrier called Germanwings, which offers cheap recommendations. People also take packages, so you combine recommendations, and testing all bookable combinations is, I would say, impossible. It is also real time, and predicting the destination is a problem in itself.
This is the slide I talked about. On the left-hand side, whatever you sample, the pattern is the fastest and the cheapest; most of us want the fastest and the cheapest given a choice, and you see the number of such recommendations is very small. In the second one, you see people choosing the middle options, trading off the fastest path against the price. That is the customisation analysis we do.
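The "fastest and cheapest" pattern corresponds to a standard filter: keep only the options that no other option beats on both price and duration (the Pareto front), since a traveller would never rationally pick a dominated one. A minimal sketch with made-up fares; it is not the production recommendation logic.

```python
def pareto_front(options: list[tuple[str, float, float]]) -> list[str]:
    """Names of options not dominated on (price, hours); lower is better for both."""
    front = []
    for name, price, hours in options:
        dominated = any(
            p <= price and h <= hours and (p < price or h < hours)
            for _, p, h in options
        )
        if not dominated:
            front.append(name)
    return front

recommendations = [  # (name, price in INR, duration in hours), hypothetical
    ("A", 3500, 6.0),
    ("B", 4200, 2.5),
    ("C", 4500, 3.0),   # dominated by B: dearer and slower
    ("D", 7800, 2.0),
    ("E", 3900, 7.5),   # dominated by A: dearer and slower
]
print(pareto_front(recommendations))  # ['A', 'B', 'D']
```

This quadratic scan is fine for a handful of recommendations; for large candidate sets, sorting by price first gives an O(n log n) version.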
Time is running short, so let me move on. The tools we use include MATLAB, Python and Hadoop. I mention them so you understand a little of what we use; as a tester you would want to move more towards them, and these are the tools to learn.
The last case study is especially relevant for India: rail plus fly. Rail plus fly is a concept where we try to combine rail and flight: you fly from one city to another and take a train onward, and India has a good rail network for that. Let me give an example. Say you travel from Mumbai to a town beyond Chennai, a place where I studied. The first option is to fly to Chennai and then take a train: the price is around five thousand rupees, and three flights are available. Airlines have exploited routes like this and price them high. Then you see the convenience factor. Given such a choice, how do you even model it? These are really tricky problems to solve, and that is why we are interested in this kind of problem and in network analysis.
There are recent innovations that combine air and rail, for instance for passengers on a first-class rail waitlist. And if you come to Bangalore, you can take a bus straight from the airport to Mysore: flying plus bus, on one booking.
Then there is dynamic pricing. How many of you have used railway dynamic pricing, where the price changes closer to departure? It is based on research done at IIT Bombay in 2007, and it has since been implemented, with the price changing at the last minute. My question to you is how you would even test this. How do you estimate demand, and how do you model and rationalise such networks?
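Rail plus fly is at heart a network problem. A minimal sketch, assuming a small hypothetical graph in which each edge is a flight, train or bus leg with a fare: Dijkstra's algorithm finds the cheapest combined itinerary. Real itinerary search must also respect timetables, connection times and seat availability, which this ignores; "Town" and all fares are placeholders.

```python
import heapq

def cheapest_route(graph, source, target):
    """Dijkstra over fares. graph[city] = [(next_city, fare, mode), ...]."""
    queue = [(0, source, [])]          # (total fare so far, city, legs taken)
    best = {}                          # cheapest settled fare per city
    while queue:
        fare, city, legs = heapq.heappop(queue)
        if city == target:
            return fare, legs
        if city in best and best[city] <= fare:
            continue                   # already reached this city more cheaply
        best[city] = fare
        for nxt, leg_fare, mode in graph.get(city, []):
            heapq.heappush(queue, (fare + leg_fare, nxt, legs + [(city, nxt, mode)]))
    return None                        # target unreachable

graph = {  # hypothetical multimodal network, fares in INR
    "Mumbai": [("Chennai", 4200, "flight"), ("Town", 9000, "flight")],
    "Chennai": [("Town", 800, "train"), ("Town", 1500, "bus")],
}
print(cheapest_route(graph, "Mumbai", "Town"))
# (5000, [('Mumbai', 'Chennai', 'flight'), ('Chennai', 'Town', 'train')])
```

Swapping the edge weight from fare to travel time, or to a weighted mix of both, turns the same code into the "fastest" or "most convenient" search.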
So we talked about three problems. The first was the search quality problem, the second the customer convenience problem, and the third the network problem. These are three different problems, and I have a feeling that unless you understand the model and the mathematical principles behind it, it's very hard to test. As a tester in such a world, it is going to be hard even to go and look at how the model is coded.
So when you walk out of this room, I'd like you to think about these things, and very importantly, about what it takes to be a tester for big data applications. These are the traits you'd like to have. The first one: redefine your test assets. You cannot have one approach and say "I'm going to do only regression testing and wait for the build to come". You'd better go and build your tools yourself; redefine your role.
The second one is intelligence and intuition. There is a study that says robots will take over; do you know the estimated time? There is a point at which everything merges with the machine. Any guesses how many years? It's about 25 years from now. The thing is, machines are getting good at breaking problems down further, but the human mind is good at putting the problem together, seeing the big picture and understanding what it is.
So when you are a tester or programmer in this space, you should have a good sense of intelligence and intuition, and you should back it up. You can't work by rationality alone: you say "I think this will work", and then you do the experiment.
The third is creativity and innovation, and it's all up to you. If you are creative, you can prove anyone wrong, or right.
Next, you should have the flexibility to learn new things: the data, the tools I told you about, the problems, and the speed we talked about. Also, the data you see has a time value. You cannot take old data and say "I want to use it because it's a lot of data". Does an experiment you did five years ago really still hold? It may have no value for prediction. Data has a time value.
Many people ask me after a talk like this how they should start. The first thing is maths and probability. It was always with us, we were good at it, and we lost it. In grad exams like the GRE and GMAT, Indians top the quantitative sections, yet something like the projectile equation I mentioned today is a basic thing we learned and forgot. We have to learn the basics again. Read a lot: a lot of intelligence comes from sites like TechCrunch and Mashable; Facebook is good, but use it productively. If you really want to start with machine learning, you can take Stanford's course on Coursera. More importantly, there are open government datasets: with recent government initiatives, you get a lot of data from data.gov.in, so please note it down. You can go and pick one problem and solve it today. Where data is not published, you can request it; you can ask the government, "I need this data".
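The point that data has a time value can be expressed by down-weighting old observations. A minimal sketch, assuming exponential decay with a hypothetical one-year half-life (the talk gives no figure): the estimate leans on recent observations instead of treating a five-year-old one as equally informative. The fare history is made up.

```python
def decayed_mean(observations: list[tuple[float, float]], half_life_years: float) -> float:
    """Weighted mean of (age_in_years, value) pairs; each weight halves every half_life_years."""
    weights = [0.5 ** (age / half_life_years) for age, _ in observations]
    return sum(w * v for w, (_, v) in zip(weights, observations)) / sum(weights)

# Hypothetical average fares observed 5, 3, 1 and 0 years ago.
fares = [(5.0, 3000.0), (3.0, 3500.0), (1.0, 4800.0), (0.0, 5200.0)]
plain = sum(v for _, v in fares) / len(fares)
print(round(plain), round(decayed_mean(fares, half_life_years=1.0)))  # 4125 4909
```

The half-life is the design choice: a shorter one trusts only recent behaviour, while a very long one falls back to the plain average.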
There is also Kaggle, kaggle.com, if you like hackathons: it's basically a data science website where you can go, look at problems and solve them. You can learn there and then experiment; a lot of tutorials are available. And last but not least, you should have a passion for solving problems. The fundamental qualities of mathematics, probability and problem solving, combined with data, are going to make you powerful. I take pride in being part of a community where maths was part of our heritage, and we kind of forgot it. When you talk to people, they feel "this is what I learned in school, so what do I do with it?" You should have a thirst for learning maths.
This is the last slide. There are green dots, yellow dots and blue dots, each with a year. I want you to guess: what are the x-axis and the y-axis, and what do the dots represent? Look at it closely. A clue: it's related to crops. So what are the axes? Give your name if you want the hoodie. It is monsoon prediction. This is not my slide; it comes from another team, and I just borrowed it.
It is this complicated. As you can see, the yellow dots represent drought, the blue dots represent flood, and the green dots represent a normal monsoon. In 1982 El Niño was strong and the monsoon was very bad, but take 1997: El Niño was high and yet the monsoon was good. So even if we account for climate change and El Niño, this problem is still unsolved. This is research at IIT Bombay on how to predict the monsoon when climate change and El Niño come into play. The problem is right in front of us: the monsoon affects around seventy percent of our country, and monsoon prediction is not trivial. I leave you with this thought: problems like these are open for you to explore, and if you want to become a data scientist, this is a good place to start. Thank you. [Applause]
Two things can happen in a presentation: either it was completely clear and there are no questions, or something comes to mind and you're curious to ask.
On working with data science teams: they say, "We are very good at this part; this is what we did, this is the research." Then the lead data scientist is asked, "Can you look at this and make the prediction ten percent more accurate?" You should read this book to see how they approach it.
On databases: for modern data volumes, say a terabyte of data, you need to move towards non-relational stores.
On languages: when I went to France two years ago, there was a team of about thirty scientists, and I asked how many of them knew Python. I said, "Within one year I need five Python people." It happens that way. You also need to apply the language to the problem. You cannot say "okay, I'm finished", because new technology keeps getting faster.
Thanks for your time and your attention, and walk away with pride that we are good in maths. [Applause]
