Data Analytics Tutorial with R, Excel, and Tableau Part 1 | Business Analytics Tutorial



My name is [name inaudible], and I am a developer and a trainer. I have over 15 years of experience in IT, in languages like C++ and Java and databases like Oracle SQL, and I have some background in data analytics. To be a good data analyst or business analyst, you need some understanding of the mathematics behind it. If you do not know that yet, R is fully capable of doing all the analysis and producing the results; we just need to build the skill set to run those commands and interpret those results, which we will learn over the course of the syllabus.

So let's get started. Today's session is an introduction, so there will be minimal hands-on work; we will go through the basics of data analytics with R, Excel, and Tableau. This is the agenda for today: what is business analytics, applications and use cases, relevance of business analytics, stages of business analytics, sources of data, what is a model, steps in problem solving, what is data science, who is a data scientist, machine learning, why data science, and how to become a data scientist.

So what is business analytics? In today's world a lot of data is being generated, and analyzing that data is becoming a mammoth task: you need software and enormous processing power to make sense of it. With the advent of IoT and digital technologies, voluminous sensor-driven data is being generated at a very high velocity. To give you an example, in the world of IoT (Internet of Things), sensors emit data as often as every millisecond. To analyze trends or patterns in that data you need processing power and analytics, and you can't do all of this in Excel, because Excel has its own limitations: there is a limit on the number of rows and columns, and if you have worked with voluminous data in Excel, you will have seen that it slows down. If there are a lot of
formulas in Excel, it will take a tremendous amount of time to give you results. That is where technologies like Python, or packaged software like SAS, Minitab, or SPSS, come in handy. That is what this slide talks about.

Then there is input data. Presently, humans or experts in their domain analyze that input data and then give business insights or inferences, and what we want is to minimize that human intervention. That is why you see a lot of news articles about automation taking over and jobs being lost: automation is leading to a scarcity of jobs, or jobs will become redundant in the coming few years with the advent of machine learning, because machine learning and data analytics can be automated using scripts in R and similar tools.

Okay, what is business analytics? Whenever we start doing anything in our lives, there is a problem to be solved. If there were no problem to be solved, what would we achieve? Because a problem exists, we look for solutions. So there is a business problem, and there is a mathematical or statistical problem. The business problem can be qualitative or quantitative. If it's quantitative, then, as it says here, there is a mathematical or statistical problem, and we solve it using a mathematical or statistical solution. If it's qualitative, well, what we see in the industry is business problems being solved with the backing of data: if there is data, that data can be analyzed and solutions can be given. In my own function, where I'm working right now, we do a lot of analysis and suggest solutions or policy-related decisions to the line functions, say HR or finance. For example, if our company is incurring a lot of travel expenses, in the sense that
people use a lot of taxis and some of the claims may be bogus, we do a lot of business analytics to arrive at the correct position on policy-related matters. So what this slide says is that a problem is required to arrive at a solution. In R or anything else, it's very important to have a stated problem before even attempting a solution. Why? Because otherwise you can arrive at multiple solutions. If you are given a data set comprising, say, a thousand rows and 11 or 12 columns, or even fewer, and you don't have a stated problem, you'll be wandering in the forest, trying to find a solution to everything: this variable is inversely related to that one, the other variable is directly related to this one. But what is the problem at hand? It is very important to know the business problem that needs a solution; this is one of the important factors.

So what is business analytics, from a definition perspective? What is stated here is pretty well understood, but there are different dimensions, so let's look at them. First: use of data and information, both structured, semi-structured, and unstructured, along with human intelligence. You may have heard of databases like MongoDB, or NoSQL databases in general; these are used for unstructured data, as opposed to structured databases like our traditional RDBMSs, Oracle or SQL, where there is a data table with predefined fields. In the case of MongoDB or NoSQL, data like Twitter feeds, if they have to be stored, require unstructured storage rather than a fixed data table, and a lot of intelligence can be derived from this unstructured data. That's why this point says use of data and information along with human intelligence. On this point,
I would just like to tell you that there is a term called DIKW; you may like to write it down. D is for data, I is for information, K is for knowledge, and W is for wisdom. There is data available, and from data we have to get information; raw data on its own is of little use until we extract information from it. From information we have to extract knowledge, so knowledge is a notch higher than information, and repeated usage of that knowledge leads us to wisdom. That's how people grow in their careers: when the same kind of data is repeatedly thrown at them, they will instantly tell you, "Look at this dimension of the problem," or "Look at this." You would have seen your seniors instantly tell you, "This can be the solution." That's wisdom, because over so many years they know that with this data set, this could be the issue; they have climbed that ladder of data, information, and knowledge. That ladder is called the DIKW hierarchy; it's just for your information.

Coming to the second dimension: in combination with information technology, software, and frameworks. To be able to conduct business analytics you need software, technology, and tools. One of the tools is R. R is open source, so it's free, and you will be downloading it during the sessions onto your laptops. We will also download an IDE, an integrated development environment, where you will run the commands. And, as I said earlier, there are packages that are licensed products: QlikView, or you may have heard of Qlik Sense, SAS, Minitab, and SPSS. These licensed products are expensive. So analytics can be done with the help of technology, software, and frameworks, triggering
informed business decision-making so that business value can be optimized. If you have used Excel or done any data analysis, you will recognize this. So what is business analytics? It is triggering informed business decision-making. Another way I would put it is that it is a decision support system, because to make any decision you need data or prior experience, and prior experience also comes with some prior data. A choice can be made based on previous experience, and previous experience is based on previous data.

Business analytics is a very structured thinking process. It is called structured because there is a chain of thought; it is not random. Today, if you want to take a decision, that decision has to be backed by data, because decision-making here is not unilateral: you need the consensus of your seniors and peers, and only then will you be able to implement those decisions. That is why it involves a lot of brainstorming sessions with all the stakeholders. You have to convince the other stakeholders, those who will or may get affected, all the interested parties, and those who have to carry out the decision, and they can be convinced only when you share data with them. The data can be shared only when you've done an analysis of it and can give them some insights into it.

Next, data can be in any form and has no defined source. Many of you may be IT professionals, and we think of data in terms of what we get in our emails as attachments, but as humans we are always doing analytics in our minds and making choices. So when it says data can be in any form, consider this: suppose tomorrow you are buying a home or a
car. You are not sitting down and doing anything in Excel, other than perhaps your EMI calculation, but in the back of your mind you have already gathered data from your friends, parents, anyone. Somebody will say this car is better, this one is better value for money, this one has better mileage; that is another input you get. All of that is data; it's just that we are not putting it down in the form of Excel, Word, or PowerPoint.

So, applications and use cases. There are a lot of use cases and applications, and data science has picked up as a very big practice and revenue source for Indian IT companies. If you look at news articles and business magazines, they are all talking about artificial intelligence, data analytics, and data science. At its base, artificial intelligence is built on data science or business analytics: at the core of artificial intelligence, machine learning, and neural networks is data science or business analytics.

Listed here are some of the use cases. Digital marketing: how to assign coupons, or when a digital marketing manager should send an email. Today, whenever you go shopping, to a mall, or anywhere in the world, they will take down your email ID and your phone number, build a database around you, and try to figure out your spending power. If you go to a high-end restaurant or buy an Apple product, they will know that this person has the ability to spend, and they will push higher-end products to you. Consumer companies are doing a lot of this analysis, and I'm sure you are aware of it. On Facebook and elsewhere, if you search for something in one browser, the same thing will appear as an ad in your other browser windows. That's targeted
marketing, and in the background click analysis is being done for digital marketing and promotions. Companies like airfare companies and the Amazons of the world do a lot of this data analytics or business analytics. For instance, if you buy, say, a camera, or even a CD or video CD (though nowadays people don't buy those), or any product on one of these e-commerce websites, it also says, "People who bought this have also bought this." That is nothing but market basket analysis; there is something called the Apriori algorithm. They analyze that when people buy something, they also buy some associated thing along with it, and that is where those recommendations come from.

Then healthcare: what is the probability that a person will be readmitted to the hospital within 30 days of the date of the procedure? The healthcare industry is also betting big on analytics, and there are a lot of advancements in healthcare, including in the medical engineering fields that do blood tests, which are built on data analytics. If you are keeping abreast of the changes happening in the medical field, they have now come up with machines where you just cough into a mask and it tells you the capacity of your lungs and whether you have any respiratory disease. So non-invasive techniques, products, and equipment are being developed by the healthcare industry, and analysis is being done. When we get a blood test done, they draw a sample of blood and then give you your diagnostic parameters, but the latest advance is that you just cough into a mask-like device and it tells you the parameters that an invasive technique would normally have given. All this is possible through analytics. I say analytics because they would have taken a sample of, say, 100,000 people and studied the patterns of people who cough.
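
As a small illustration of the market basket analysis mentioned above ("people who bought this have also bought this"), here is a minimal Python sketch that scores one-item rules by support, confidence, and lift on a handful of made-up baskets. The baskets and the 40% support cut-off are assumptions for illustration. This only shows how candidate rules are scored; the Apriori algorithm itself adds a smarter way of choosing which itemsets to count, which matters once there are thousands of products.

```python
from itertools import combinations

# Hypothetical shopping baskets: each set is one customer's purchase.
baskets = [
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"camera", "camera bag"},
    {"memory card", "usb drive"},
    {"camera", "memory card", "camera bag"},
]


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rule_stats(antecedent, consequent):
    """Support, confidence and lift of the rule 'antecedent -> consequent'."""
    both = support(antecedent | consequent)
    confidence = both / support(antecedent)  # estimate of P(consequent | antecedent)
    lift = confidence / support(consequent)  # > 1: bought together more often than chance
    return both, confidence, lift


# Score every one-item -> one-item rule whose items appear together
# in at least 40% of baskets (an arbitrary cut-off for this example).
MIN_SUPPORT = 0.4
items = sorted(set().union(*baskets))
for a, b in combinations(items, 2):
    for lhs, rhs in ((a, b), (b, a)):
        s, conf, lift = rule_stats({lhs}, {rhs})
        if s >= MIN_SUPPORT:
            print(f"{lhs} -> {rhs}: support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

In this toy data, camera and camera bag have a lift of 1.25, so they are bought together more often than their individual popularity would suggest, while camera and memory card have a lift just below 1 despite appearing together most often. That difference is why recommendation engines look at lift and not only at raw co-occurrence.
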
As for how that cough-based test was built, I don't know exactly, but ideally they would have started by measuring how much moisture comes out when a person coughs, then checked the heartbeat, made a pattern out of it using analytics, and then extrapolated those studies.

Insurance is a very big field. Analytics is used extensively in the BFSI industry: banking, financial services, and insurance. Who is going to buy my insurance? What should the premium be? There is a complete science around this called actuarial science, which deals with the mathematics behind it. Just to give you an example, when we pay we may not have thought about it, but suppose today you go to buy term insurance, an endowment policy, or a life insurance policy: how do they quote the premium amount? There is some science going on behind the calculation of that premium amount, and there are qualified actuaries who do a lot of mathematics to arrive at how those premium amounts are calculated. If you want to buy a 50 lakh policy, your term insurance premium may be less than 5,000 rupees, say 4,000, or rupees 3,821, but those numbers are arrived at scientifically using analytics. This has been a field for a long time now; it's just that analytics is being talked about in the wider domain and is getting prominence.

The next is customer relationships. A lot of customer satisfaction indexes, customer relationships, and customer buying patterns are being studied. When people post their comments on Facebook or other websites, companies analyze them, and a manager reverts to the customer to ease their pain point. People are going digital now and voice their complaints on public platforms, and if there is a
growing number of people who complain, the management surely takes notice of it and takes corrective action. So these are the applications, though these are very few of the use cases. Analytics is also being used in the telecom industry, and that is where we see a lot of automation creeping in. You would have seen that a lot of job losses are happening in the telecom industry, because it's a highly process-oriented industry where automation can be used.

A few other cases. The stock market has been a favorite for years, because it's one of the most process-oriented industries. I say process-oriented because data is available and captured every second, even every millisecond, and when you have data in a structured fashion, data analysis becomes much easier. Now there is the world of algorithmic stock trading: bots are being built to do a lot of stock trading. These bots run 24x7 across stock exchanges, find arbitrage opportunities, and make money out of them, and the human brain is not quick enough to capture those opportunities. This has been largely unregulated, so the regulators are now trying to put a stop to these stock-trading bots.

Now, with the IPL, the Kabaddi league, and the commercialization of sport, a lot of analytics is being done. Whenever those cricket matches are on, the commentators give a lot of trivia; sometimes they will say, "In that year, this was the highest record." Where does that come from? The commentator does not do all that analytics himself. They are powered by analytics engines; you may have seen it written at the bottom of the screen sometimes that the analytics is "powered by" [name unclear]. This is served to the commentators by these engines, which do a lot of
analytics. In the entertainment industry, analytics is being used extensively, not just in the film industry but also for concerts and shows, though the penetration is still low. I say low because the entertainment industry is not fully organized or process-oriented, and wherever there is less process orientation or digitization, there is less analytics.

Pricing decisions: product pricing will, firstly, depend on the components that go into making the product, and then there are a lot of other variables, like market variables and competitive product pricing. With the help of mathematical models, a lot of variables can be programmed in and prices can be arrived at.

Relevance of business analytics: it helps in formulating the right strategies at the right time (as I said, policy-making decisions require data analysis, and that's how you formulate the right strategies); smart decision-making, where it can act as a decision support system; achieving business targets on time; driving operational efficiency; driving profitability; and getting a clear picture of the data through better data visualization. I would stress the last three points, because they encapsulate all the others. For all of you working in IT fields, IT-enabled services, or product-based companies, you would see a lot of focus now on profitability and operational efficiency. Why is this being driven? Because the industry and the economy are not what they used to be; the growth picture is not rosy. That is why senior leadership and management are focusing on driving better profitability and operational efficiency, asking for more and more data from the relevant departments, and trying to see where and how costs can be optimized. Business analytics can help a great deal there. On the visualization capabilities of... I
think somebody is [inaudible]; I can hear some noise. Okay. So R and Python have a lot of visualization techniques. We all must have used Excel charts and graphs, but those graphs are slightly limited, even though we may not have used the entire repository of graphs available in Excel. If you explore graphs and the visualization capabilities of R or Python, they are amazing; you can in fact build a graph right from scratch. There is in fact an entire cookbook just for R graphs. Their visualization techniques are far superior to the plain-vanilla versions in Excel, and we will see some of those as we go through the material. Data visualization, the last point on this slide, is itself a big science. There is something called exploratory data analysis: before building any model, you can arrive at a lot of inferences just by plotting the data in the form of a graph, and that's a science in itself.

So these are the stages of business analytics. Just like we have CMM levels 1, 2, 3, 4, and 5, business analytics starts with descriptive analytics and goes on to diagnostic, predictive, prescriptive, and cognitive. We'll see each of these and what the differences are. You need not remember all of this, but it is good to know at least these words: descriptive analytics and predictive analytics. I would say these two are the important words from this slide.

So what is descriptive analytics? It is a method of finding historical trends; it helps in reflecting the as-is scenario; it shows what has happened in the past and how the business performed in the past; it can be delivered using business intelligence tools and basic statistics; it makes business metrics easy to interpret; and what happened will be analyzed. That is the definition perspective. Now, suppose you are doing descriptive analytics; to give you a
very simple example that will stick in your memory: in your school days you did basic statistics, like mean, median, mode, range, minimum, maximum, and interquartile range. All of that is descriptive statistics. Say you are analyzing some data of your company, sales person-wise, region-wise, or employee-wise. You compile the minimum, the average, the most frequent value (the mode), the median, and the maximum. That is all descriptive. It is just analyzing the data as is, and that is what we do in our regular functions. Why is it called descriptive? Because it describes the data: the range, the median, the mode, the average, the maximum, and the minimum describe the data as is. And if decisions are taken on that alone, it is extrapolated without any scientific measure.

Next is diagnostic analytics: anomalies and key factors will be analyzed, and root causes of the problem will be analyzed, through the different stages of problem analysis, quantifying the problem and drilling into the details of its causes. It involves a lot of reporting with different reports, and some degree of correlation and association. To give you an example, doctors ask you for tests, a lot of results are thrown at them, and they read out those results and tell you, "Yes, this parameter is out of range." That is diagnostic analysis. If you look at all these points, they get at that: the cause of the problem is analyzed and key factors are analyzed. If you do a CBC, they will check whether your white blood cells are high; if they are high, some disease is being manifested, because WBCs, which are the fighting cells, should be within a range. Or, in a company, if some salary is very low or very high compared with the median or average salary for that grade or band, you would say, "Something is wrong or amiss here: either that person has not had a raise for too long, or they are an exceptional performer in that band who deserves a promotion." That is diagnostic analytics.
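
To make the descriptive versus diagnostic distinction concrete, here is a minimal Python sketch using the standard-library `statistics` module (Python 3.8+ for `quantiles`) on a made-up list of salaries for one band. The descriptive part just summarizes the data as is. The diagnostic part applies one common rule of thumb, Tukey's 1.5 × IQR fences, which is an assumption chosen for illustration rather than a rule from the session, to flag salaries that deserve a closer look.

```python
import statistics

# Hypothetical monthly salaries (in thousands) for one grade or band.
salaries = [52, 55, 58, 58, 60, 61, 63, 64, 66, 120]

# Descriptive analytics: summarize the data as is.
q1, _, q3 = statistics.quantiles(salaries, n=4)  # first and third quartiles
summary = {
    "min": min(salaries),
    "max": max(salaries),
    "range": max(salaries) - min(salaries),
    "mean": statistics.mean(salaries),
    "median": statistics.median(salaries),
    "mode": statistics.mode(salaries),
    "iqr": q3 - q1,
}
for name, value in summary.items():
    print(f"{name:>6}: {value}")

# Diagnostic analytics, one simple rule: flag salaries outside Tukey's
# 1.5 * IQR fences as candidates for a root-cause check.
low_fence = q1 - 1.5 * summary["iqr"]
high_fence = q3 + 1.5 * summary["iqr"]
outliers = [s for s in salaries if s < low_fence or s > high_fence]
print("needs a closer look:", outliers)
```

With this data the single salary of 120 is flagged, and it also pulls the mean (65.7) above the median (60.5); that gap is itself a quick diagnostic hint. Note that `statistics.quantiles` uses its "exclusive" method by default, so its quartiles can differ slightly from the defaults in Excel or R.
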
Next is predictive analytics which, as I said in the beginning, is the other important word you should know. What is likely to happen? Can we predict future events given historical data? How well we can predict depends on the previous stages. It involves statistical model building; statistical models require probability and probability distributions; and it involves future projection. Predictive analytics is a domain of its own. Generally I would classify analytics into descriptive and predictive, and predictive is the stage that has only lately been done scientifically, because we might have been doing prediction before, but without any scientific basis. Say some data set is given to you and you present the descriptive analytics to your bosses. Now, based on that data, plans or predictions for the future need to be made, and those predictions have to be scientifically based, not based on some extrapolation. Otherwise somebody may give one prediction and somebody else another, based on their individual experience; with mathematical models, their opinions should converge. Logistic regression and linear regression are scientific methods, and there are many others: there is SVM, there are decision trees, random forests, principal component analysis, and neural networks. All of these are scientific methods of doing predictive analytics, and a basic understanding of probability is required for them.

So the two words to remember here are descriptive and predictive. Predictive goes a notch, or a stage, higher than descriptive; descriptive just describes the data as is. To give you an example, suppose we study the heights of people in a particular region in India. We collect the heights of those people and say the average height of women
in this region is this much and the average height of men is this much. They may also study the heights of children. But to predict, given the average height of a man and a woman, how tall a child born to them will grow, or how much that child will weigh: that is predictive analytics. The same goes for stock market predictions based on previous trends. One thing is just to say, "The 52-week high is this much, the 52-week low is this much, and the past seven-week trend is this." It's good to know that data, but how can you use it? Using all that data to make predictions is predictive analytics. Now that prediction may turn out to be right or wrong, and that's where probability comes in: "This stock is likely to go up by this much percentage in this time period," with a probability value associated with it.

Prescriptive analytics, as the name suggests, prescribes: it prescribes recommendations. If business is predicted to go down, what is the recommended action? It covers business-informing strategies, alternative strategies, preparing a plan B, and reactive analysis, setting up actions that trigger plan B if the prediction goes wrong. In a model there are many variables. What is a variable? If you studied basic mathematics, you know $y = mx + c$, or $y = m_1x_1 + m_2x_2 + m_3x_3$, where $x_1$, $x_2$, and $x_3$ are the variables. In terms of Excel, if the columns hold values like region, city, age, or number of children, these can all be variables. In data analytics lingo they are called features; features or variables are generally put in columns. If all those variables need to be factored in, a mathematical model will be built, and then it will be figured out which variable is most important in the model and which is not, which variable is trending down, and what corrective action can be taken.
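
To make the multi-variable model described above (an intercept plus one coefficient per feature) concrete, here is a minimal pure-Python sketch that fits y = c + m1*x1 + m2*x2 + m3*x3 by ordinary least squares, solving the normal equations. The three features and their values are hypothetical, and the target is generated noise-free from a known formula so you can check that the fit recovers it; real data would have noise. In R, the built-in `lm()` function performs this kind of fit in one call.

```python
def solve(a, b):
    """Solve the square system a . w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][k] * w[k] for k in range(r + 1, n))) / m[r][r]
    return w


def fit_linear(rows, y):
    """Least-squares coefficients [c, m1, m2, ...] from the normal equations X^T X w = X^T y."""
    x = [[1.0, *r] for r in rows]  # leading 1.0 column estimates the intercept c
    p = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(w, row):
    """Apply y = c + m1*x1 + m2*x2 + ... to one row of features."""
    return w[0] + sum(m * v for m, v in zip(w[1:], row))


# Hypothetical features per customer: (age, number_of_children, city_tier).
rows = [(25, 0, 1), (32, 1, 2), (40, 2, 1), (28, 0, 3), (45, 3, 2), (36, 1, 1), (50, 2, 3)]
# Made-up target "monthly spend", generated from a known formula
# (5 + 0.2*age + 3*children - 2*tier) so we can check the fit recovers it.
spend = [5 + 0.2 * age + 3 * kids - 2 * tier for age, kids, tier in rows]

w = fit_linear(rows, spend)
print(f"intercept c = {w[0]:.3f}")
for name, coef in zip(["age", "children", "city_tier"], w[1:]):
    print(f"coefficient for {name} = {coef:.3f}")
print(f"predicted spend for (30, 1, 2): {predict(w, (30, 1, 2)):.2f}")
```

Reading a variable's importance straight off its coefficient only works when the features are on comparable scales, so a common step is to standardize each feature first. Solving the normal equations directly is fine for a handful of features; statistical software generally uses more numerically stable methods such as a QR decomposition.
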
No mathematical model is exactly right, but some are more right than others. A mathematical model is just an attempt by humans to gain some visibility into the future, but how the future will behave is not entirely known and can never be fully predicted; it can only be captured with some degree of probability or accuracy. I may arrive at one model and each of the participants here may arrive at a different model, and the level of accuracy of all these models will differ: some may be more accurate, some less, and we will know that only when the actual event has happened. So please bear in mind that all predictive models are wrong; to what extent or degree they are wrong, only time can tell. Yet we do predictive modeling because we want a lever or handle on all the factors that can go right or wrong, and we want to predict with some accuracy.

What is cognitive analytics? This is the artificial intelligence and machine learning layer. If you know the prediction and the recommended action to be taken, can a machine take the decision? Can we do what-if analysis? Can machines decide what recommendation is to be given? Cognitive functions are based on reasoning and logic, and machine learning models are a form of cognitive computing. Cognitive comes from cognition, which means to recognize. When you see fingerprint detection, facial recognition, or retina recognition in today's cameras or phones, that is cognitive analytics; biometrics is nothing but cognitive analytics. Suppose somebody else tries to open your laptop, which uses retina recognition or a fingerprint reader: that is all cognitive. Now there are Google APIs which can study your face and tell you what kind of mood you are in. So what they have done is they
have made machines learn. They have thrown millions of pictures at their machine learning algorithms and studied the facial expressions. If the lips are expanded, the person is generally happy; in a somber mood you will be looking down, your eyes drooping, and your lips will not be expanded. So today, if you upload a picture, it will tell you the mood of the person: happy, sad, somber, angry, or fearful. All of that comes under cognitive analytics; even truth serum tests or lie detector tests come under cognitive analytics. Cognitive analytics is machine learning, and in machine learning, machines are trained on a lot of data. Take how Google does it: they have uploaded pictures of thousands and millions of faces, broken those faces down into pixels, and learned what a happy face looks like. What are the signs or expressions of a happy face? The lips are expanded horizontally, the teeth are slightly showing, and the eyes are brighter or opened up more. Then they have trained their machines that these are the traits of a happy expression. What are the traits of a sad expression? The eyes are droopy and dim, and the face is not stretched; those are sad or more somber expressions. What are the expressions of fear? There are slight furrows on the forehead, and the person is just staring blankly. What I'm trying to say is that cognitive comes from cognition, which means to recognize; machines are made to learn, and that is cognitive analytics.

In the past there was a machine called Deep Blue, a challenger which defeated Garry Kasparov in chess, but it did not come about in one go: the Deep Blue machine was played against by the best chess players to learn their patterns of play. There is a high level of training that goes on before these technologies are
That kind of trained recognition is cognitive analytics. Another example of cognitive analytics is self-driving cars, which are the talk of the town. A lot of cognition happens there: the cameras study the driving patterns of the other cars and have to keep track of the other objects in the vicinity, and a lot of training happens before such cars are ready. So cognitive analytics, as the slide says, involves a set of rules.

Sources of data: there is structured data, semi-structured data, and unstructured data. Structured data is a spreadsheet, anything that you can put in Excel. SQL tables are also structured, because in a database the tables have fields and there are values in those fields; the structure, or template, of the table is predefined. Semi-structured data is XML and JSON. A lot of JavaScript applications use JSON objects; JSON stands for JavaScript Object Notation. XML preceded it and was extensively used, but JSON is the newer standard, and much more data exchange now happens through JSON. Then there is unstructured data: a lot of data is now being generated on WhatsApp and Facebook as text, images, audio, and video, which cannot sit in or be analyzed through traditional DBMSs like SQL tables or spreadsheets. If you want to analyze data containing videos or images in Excel, you will be handicapped, and that is where NoSQL databases come in.

What is a model? If you can see my screen: math formula, stat formula, and human judgment. This is a very simplistic view of defining a model. In a way, our beliefs are our models. Sometimes, when you belong to a particular region and you go to another region or to a foreign country, you say, "People of this ethnicity, or people of this region, will have a particular trait."
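To make the difference between structured and semi-structured data concrete, here is a small Python sketch using only the standard json module. JSON records carry their own field names and need not match each other, whereas a table has a predefined set of columns. The records and column names below are hypothetical.

```python
import json

# Hypothetical semi-structured JSON: each record names its own fields,
# and the second record has no "phone" field at all.
raw = '''
[
  {"name": "Asha", "city": "Chennai", "phone": "044-1234"},
  {"name": "Ravi", "city": "Mumbai"}
]
'''

def to_table(records, columns):
    """Flatten JSON records into fixed-column rows, like a spreadsheet or SQL table.
    A missing field becomes None, much as it would show up as NULL in a table."""
    return [[record.get(col) for col in columns] for record in records]

records = json.loads(raw)
columns = ["name", "city", "phone"]  # the predefined "template" of the table
for row in to_table(records, columns):
    print(row)
# ['Asha', 'Chennai', '044-1234']
# ['Ravi', 'Mumbai', None]
```

Going from JSON to rows like this is what has to happen before semi-structured data can be analyzed in a spreadsheet-style tool.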
Saying that people of a region have a particular trait is also modeling. We say people in South India are rice-loving people, so rice is consumed there; that's a model. People in the North may consume more wheat; that's also a model, based on human judgment, and that human judgment has come from experience. You can put it in a model, but the attempt of data analytics or business analytics here is to do all this using scientific principles rather than human judgment. Human judgment is still involved, but to a lesser and lesser degree, and that is why automation can be brought in. Take the diamond industry: right now the gems are graded by humans, who decide that this gem belongs to this category or that one, so human involvement is very high, but attempts have been made to automate it.

A simplistic model is the equation of a straight line, y = mx + c, where m is the slope and c is the intercept. Why is it a model? When you draw the Cartesian x and y axes and say y = 3x + 2, then as you plug in values of x you get a straight line cutting the y-axis at 2. If x is 1, then 3x + 2 gives y = 5. This is a really simplistic model, but in our minds we are constantly constructing models, and those models are nothing but our beliefs. Some models are more accurate than others, and that is true of our belief system too: beliefs can change over time, though with great difficulty, and some beliefs are more right than others. The attempt here, in the world of analytics, is to arrive at mathematical beliefs.

Steps in problem solving: as I said in the beginning, the definition of the business problem is a must. Without a problem to solve, we will be struggling and may come up with a lot of solutions which may not actually be required.
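The straight-line model from the example above, y = 3x + 2, can be written as a tiny Python function. The default values m = 3 and c = 2 simply mirror that example; the course itself will use R.

```python
def line(x, m=3.0, c=2.0):
    """Straight-line model y = m*x + c (m = slope, c = y-intercept)."""
    return m * x + c

# Plugging in values of x traces out the line.
for x in range(4):
    print(x, line(x))
# 0 2.0   <- x = 0 gives the intercept: the line cuts the y-axis at 2
# 1 5.0   <- 3*1 + 2 = 5, as in the worked example
# 2 8.0
# 3 11.0  <- each step of 1 in x raises y by the slope, 3
```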
Coming back to the steps in problem solving, if you read this slide: define the business problem, capture it, do feature construction, translate it into a math or stat problem, identify trends, get a mathematical solution, and implement it. This is generally what we do unconsciously anyway; it is how we have always solved problems, and the slide just puts it down for visualization. Suppose you have to solve a business problem: say, give me the hotel rate trends in our country, because I want them for my tender. That is the business problem. Then comes feature construction: what variables do you need? A feature is nothing but a variable, one of the columns relevant to you. A table in an RDBMS will have several columns, but not all columns or fields may be of significance to you; you want to evaluate only a few parameters, and choosing them is feature construction. Then you do descriptive analytics on that data and identify trends, and if you are equipped with statistical knowledge, you go a step further and try to figure out why the median is lower than the mean. Is the data right-skewed or left-skewed? What are the outliers? Using math or statistics you then build a model around it, interpret the results, and take a decision. For example, hotels in Chennai may be slightly cheaper, Mumbai may be expensive, and Delhi hotels may be even more expensive, so perhaps only employees of certain grades should be allowed to stay in 5-star hotels and we should reduce hotel spend. All that analysis can be done. We do this unconsciously; this is just a way of showing what is going on.

What we generally do is work on the data in Excel, put the results in a PPT, and show it to our bosses. Our bosses try to analyze it: this year sales have gone up, and so on.
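The descriptive step in the hotel-rate example (comparing mean and median to judge skew, then looking for outliers) can be sketched in Python with the standard statistics module. The rates are invented for illustration, the mean-versus-median comparison is only a rule of thumb, and the 1.5 * IQR fence for outliers is one common convention rather than a method prescribed in this course. In R, summary() reports the mean, median, and quartiles in one call.

```python
import statistics

# Hypothetical nightly hotel rates (in rupees) for one city; the last value
# is a deliberately extreme luxury booking.
rates = [3000, 3200, 3500, 3600, 3800, 4000, 4200, 15000]

def describe(values):
    """Mean-vs-median skew check plus 1.5 * IQR outlier detection."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    # Rule of thumb: a mean pulled above the median suggests a long right tail.
    if mean > median:
        skew = "right-skewed"
    elif mean < median:
        skew = "left-skewed"
    else:
        skew = "roughly symmetric"
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return mean, median, skew, outliers

mean, median, skew, outliers = describe(rates)
print(mean, median, skew, outliers)  # 5037.5 3700.0 right-skewed [15000]
```

The single luxury booking drags the mean well above the median, which is exactly the "why is the median lower than the mean" question raised in the hotel example.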
Then our bosses look at the data customer by customer: this customer has not grown, this customer has grown, so let's focus more here; for the customer who has not grown, let's put in more sales and marketing effort. Then the big decisions are taken.

So what is data science? It is a multidisciplinary blend of data inference, algorithm development, and technology in order to solve analytically complex problems. This is a Venn diagram; let's just look at it. When we do our introductions at the end, we may find that some of you come with domain expertise. Domain expertise means somebody who has been working in, say, the telecom or banking industry and knows the ins and outs of that domain; those are called domain experts. Some people are experts in computer science, in the sense that they know a programming language well. Math and stats is also a kind of domain expertise: somebody may be good at physics, and that is also a domain. Now, if you are a domain expert and you learn something on the analytics side, you will become a techno-functional person. Since you have chosen to be a part of this course, I understand that you are trying to make a career shift: either you know a programming language, which means you are in the field of computer science, or you are a domain expert. I am very glad that you have taken the effort to learn something as complex as data science and that you are willing to step out of your comfort zone, learn a new technology, and become a data scientist. Nothing is impossible. There is the 10,000-hour principle, if you have read Malcolm Gladwell: the more effort you put in, the more of an expert you will become. As humans we learn everything through practice. When a child first learns to walk, it just walks aimlessly, without a goal, because it enjoys doing that.
In the same way, when you start with R and data science, you will enjoy running some commands in R, and with more and more effort you will slowly migrate to solving problems. If you have the passion and the willingness to put in hard work, I can guarantee that you will become an expert; that should be the goal. So the whole objective of this diagram is for you to become techno-functional. If you are an expert in computer science, say from the IT or processing industry (KPOs, BPOs), you can pick up the domain side; or if you are a domain expert, say in real estate, you can pick up some computer skills. Either way, you become what is called a techno-functional person. People who are good at math and stats may not know the tools, and people who are good at R or Python may not know the basics of the underlying statistics, so an attempt is made here to marry the two.

As the slide says, data science domain experts know how to approach high-level challenges with a clear eye on what is important, and they bring communication skills to share their techniques and discoveries, intellectual curiosity, and industry knowledge. In short, a domain expert has complete knowledge of the domain or industry in which he is operating. If you go to a bank, a teller who has been at the counter for years together knows the challenges from experience. He may be worlds apart from the field of computer science, but he is a domain expert. A computer science graduate, on the other hand, may know nothing about a teller's job; he just knows that he hands the cheque or token to the teller and gets the money. But the teller, who is the domain expert, knows the regulations and processes he needs to fulfil before handing out the money.
So you are attempting to become techno-functional, and all of this talks about domain expertise and statistics. The world is now moving towards the usage of more and more data, and that data has to be analyzed. Descriptive analytics can be done by running one simple command in R, and you will get all the descriptive statistics; I can guarantee you that the real ability lies in interpreting that data and making predictions based on it. If you read the last line here: as a data scientist you should know matrix algebra, linear algebra, and basic math. This should not act as a deterrent for you. Let me restate: even if you are not strong in linear algebra, basic mathematics, and statistics, R will come to your rescue, because it does the computation when you run the commands. Still, understanding is very important, so during the course of the sessions we will also attempt to understand the basics of statistics and of linear and matrix algebra.

On the other hand, computer science is an ever-expanding field. Maths and stats are different: the principles are well laid out and the rules are crystal clear. Maths and stats as a field is also expanding, but it is a very niche field, and the expansion happening there is beyond the purview of our common understanding; it is used by the militaries of the world and in very scientific fields. Computer science, by contrast, is expanding at a rapid pace. The ERPs of yesteryear were on-premises technology; right now everything is on the cloud, which means everything is on the Internet. You can apply for your leave on the Internet and approve things from your phone; everything is being made available on the phone and on the cloud. So the computer science or technology you learn today may become obsolete in the next couple of years. As I said, if we learn the basics of stats and maths, then learning any tool is very simple.
Here in our course we will also make an attempt to understand the basics of maths and stats, because any programming language will have a syntax, and learning the syntax is very easy. What is going on behind the command, what is happening to the data, is more important, and we will make an attempt to do justice to that. Just to read out the second point on the slide: the earlier idea of a data scientist was a jack of all trades but master of none. You can have your own view on what you want to be, whether an expert in one or two languages or someone who moves across all the languages; that call is yours. My take is that it is better to try to become an expert in one language.

Machine learning is that branch of data science where the system gets better as it learns new patterns over a period of time, specific to a particular area, and a data scientist should be thinking towards large-scale machine learning. Machine learning is a very popular field today and is being used extensively; as end users we are seeing more and more uses of machine learning in our lives. As I said, retina scanning or thumbprint scanning on your laptop are examples of machine learning.

What does a data scientist do? A data scientist tries to understand the business problem and the availability of the data that may be needed, builds hypotheses from them, runs various experiments on the data by applying mathematical and statistical techniques for data discovery and pattern recognition, and conveys the results as stories to the stakeholders about their business problem and possible recommendations. You must have heard the word quants. A quant is nothing but a quantitative analyst, somebody who does data science in the quantitative space. A lot of insurance companies and the banking and financial services industry employ quants, who do number crunching.
Communication, or articulating whatever the quant, data scientist, or business analyst has figured out, is very important. Today, if as a seasoned analyst, business analyst, or data scientist you have come up with some recommendation or some finding, it is very important to articulate it in layman's terms so that the end user understands it. You have to speak his language; otherwise you will not be able to get him on your side. Articulating the solution and the problem the way the end user understands them is very important, not just for data scientists but in any field. If tomorrow you go to a lawyer and the lawyer tells you "section this, article that", how does it matter to you? Your mind will work only when you understand it in plain English; we get confused when we are given long legal sentences with a lot of commas and no full stop. So as a data scientist, your primary job is also to be able to articulate in layman's terms.

Again, what are the skill sets of a data scientist? Linear algebra, multivariate calculus, graph theory, probability. It's okay even if you don't know all this. To get started, I would say we should know five things, but those five things should be enough to confuse or convince a lot of people. There are a million things in the world which we don't know; the world is full of knowledge, and we cannot get all of it. We try to build expertise in a few things, and depending on your passion and your ability to put in hard work, you will be picking up a lot of things by yourself. Multivariate calculus, just to tell you, is nothing but working with multiple variables; as I said, variables are nothing but columns, whether you want to do data analysis using multiple variables or a single variable.
Linear algebra: this is nothing but the simple equation I mentioned, y = mx + c, which is used in linear regression and logistic regression. Correlations and all that stuff, trust me, are going to come up a lot. Statistical distributions: the Gaussian curve, the F distribution, the t distribution. The technologies being used are R, Python, SAS, Ruby, and Perl, and deployment happens through tools such as Spark and Elasticsearch. Among these, R and Python are open source, which means they are freely available. SAS, as I know, is a licensed product with a very expensive license. Ruby and Perl are also open-source scripting languages but are not used extensively now. Ruby on Rails is being used, but I don't know its capabilities or challenges. Which one is better, R or Python? Being an avid R user, I am more fond of R because it is a very concise scripting language: just by writing two or three commands you can do a lot of stuff. I have not used Python extensively, and if you search the Internet and read up, you will see there is a raging debate on which one is better. It's like asking which actor is better; different people will have different opinions. For me R is better, and I personally find it much superior to the other languages, but those who know Python will be able to argue the other side.

The data science pipeline: ask questions, which is nothing but the business problem; research solutions; form a hypothesis; train a model; communicate the results. This is the general lifecycle of how a model is built, and we will see it when we actually build one.

What is machine learning? "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
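Since linear regression and correlation come up repeatedly in this course, here is a minimal Python sketch of fitting y = mx + c by ordinary least squares and computing Pearson's correlation coefficient. The course itself uses R, where the built-in lm() and cor() functions do this work; the data points below are made up, roughly following y = 3x + 2 with a little noise added by hand.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares: slope m and intercept c of y = m*x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, c

def pearson_r(xs, ys):
    """Pearson correlation: near +1 direct, near -1 inverse, near 0 no linear relation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical observations.
xs = [0, 1, 2, 3, 4]
ys = [2.1, 4.9, 8.2, 10.8, 14.0]
m, c = fit_line(xs, ys)
print(f"y = {m:.2f}x + {c:.2f}, r = {pearson_r(xs, ys):.3f}")  # y = 2.97x + 2.06, r = 0.999
```

The fitted slope and intercept land close to the 3 and 2 the data were built around, and r close to +1 signals a strong direct relationship.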
You do not need to remember Tom Mitchell's definition of machine learning word for word. What it essentially says is that machines cannot learn on their own: humans have to feed them a lot of data to make them learn, and that data is the experience. We are also constantly learning through our experiences; in humans that learning happens through the brain, but now machines are being made to learn so that our lives become simpler. That is what this definition says: experience is the mother of all teachers.

This again is the machine learning pipeline: identify features (please remember, features are nothing but columns or variables), pre-processing, exploration, validation, implementation. There will be various versions of it. Somebody will say: get the data and try to visualize it first. By visualization I don't mean close your eyes and make a mental graph; I mean use some graphical functions in R to see what the pattern is. Is it directly related? Is it inversely related? Do you see a correlation? Then start working on it, validate it, and then improve it. Representation is nothing but visualization; then you evaluate and optimize, because no mathematical model can stand the test of time on its own. It has to be continuously fed with new data and outcomes, which is nothing but optimization. Learning: trees, neural nets. Representation is the representation of the data: decision trees, neural nets, Markov models, Markov chains. Then there are probability, precision, and recall; that is evaluation. And then gradient descent; we will see these later. Some words to take away from here are Markov chains, decision trees, probabilities, and precision.

Why data science? Smarter computers, because human brains can process only a certain number of variables. Two, three, four, five variables are fine, but if you have to build a complex mathematical model, then you need the help of computers.
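Precision and recall, listed above under evaluation, are easy to compute once predictions are compared with what actually happened. Precision asks: of everything the model flagged as positive, how much was really positive? Recall asks: of all the real positives, how many did the model catch? A minimal Python sketch; the churn labels are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention, not a universal rule.

```python
def precision_recall(actual, predicted, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)
    fp = sum(1 for a, p in pairs if p == positive and a != positive)
    fn = sum(1 for a, p in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels: 1 = customer churned, 0 = customer stayed.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 1, 0, 1, 0]
print(precision_recall(actual, predicted))  # (0.6, 0.75)
```

Here the model flagged five customers, three of whom really churned (precision 0.6), and it caught three of the four actual churners (recall 0.75).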
Another reason to use computers is that data analysts or quants command very high salaries, so it helps profitability if you can deploy computers to do those tasks. You can also build a cluster of models rather than a single model, because a single model may not give you accurate results. The computer can give you multiple models, and you can evaluate which model should be used for making predictions; that is auto-selection of machine learning models. Based on whichever model has given you the most accurate results, which only time will tell, we can fine-tune that model or select it.

How to become a data scientist? The basic prerequisites are school-level maths, some components of linear algebra, analysis of algorithms, and databases. You need not go too deep into linear algebra, but it is good to know, and this is what we will see next. If you have not done the installation of R or RStudio and you are computer savvy, please do it; we will still do the installation of R on the desktops and laptops tomorrow. I presume you will be using Windows machines; Mac installations can be slightly tricky, so it will be good to have those machines ready when we do the installation. It can also be done by yourself, since it is available for download.

So that brings us to the end of today's session. The key takeaway from today's session is what business analytics is, and simply what descriptive analytics and predictive analytics are. These are the buzzwords being used in the industry: machine learning, artificial intelligence, data analytics, data science, descriptive analytics, predictive analytics, cognitive analytics. It is good to be aware of all these buzzwords. Tomorrow we will do some hands-on work in R, which you will find more interesting.
