noc18-cs45-lecture 01-Introduction to cloud Computing



Introduction to cloud computing.

Preface: content of this lecture. We will discuss a brief introduction to cloud computing and focus on questions such as: why clouds, what is a cloud, and what is new in today's clouds. We will also distinguish cloud computing from the previous generation of computing systems, that is, distributed systems.

Scalable computing over the Internet. The evolutionary changes that have occurred in distributed and cloud computing over the past 30 years are driven by applications with variable workloads and large data sets. These evolutionary changes are happening in machine architectures, operating system platforms, network connectivity, and application workloads. Distributed computing systems use multiple computers to solve large-scale problems over the Internet; thus distributed computing becomes data intensive and network centric.

The emergence of computing clouds. The demands of high-throughput computing systems built with distributed computing systems have led to the need for the cloud. In high-throughput computing, the systems that appear are computing clusters, service-oriented architectures, computational grids, peer-to-peer networks, Internet clouds, and the future Internet of Things.

Let us look at the hype around the cloud, which was forecast around 10 years back, and see how far those predictions have held as the cloud has moved forward. Gartner in 2009 predicted that cloud computing revenue would soar much faster than expected at that point in time, would cross a huge revenue mark, and would account for 19 to 20 percent of IT spending growth by 2015. Similarly, IDC in 2009 predicted that spending on IT cloud services would triple in the next five years, and Forrester in 2010 predicted the same, that is, that cloud computing would grow, as far as spending is concerned,
between 2010 and 2020, a fivefold increase. Companies and even federal governments are also using cloud computing.

With this forecast in mind, let us see the current status: how many key players are available as cloud providers. The first is Amazon Web Services (AWS), the most prominent cloud provider. It provides cloud services of three different types: the first is the Elastic Compute Cloud (EC2), the second is S3, the Simple Storage Service, and the third is EBS, the Elastic Block Store. The second well-known cloud provider is Microsoft Azure; Amazon and Microsoft Azure provide similar kinds of cloud services. The third is Google, with Google Compute Engine and App Engine, which is also a cloud provider. Besides these three prominent cloud providers there are several others, such as RightScale, Salesforce, EMC, GigaSpaces, 10gen, DataStax, Oracle, VMware, Yahoo, and Cloudera, and there are hundreds more in this arena.

There are two categories of clouds: the private cloud and the public cloud. Private clouds are accessible only internally to the company and are managed internally by the company; energy usage and maintenance are all borne by the company itself. Hence it is called a private cloud: it is not accessible to outside people. The other type is the public cloud, which provides services to any paying customer. It is open to anyone who wants to use the cloud by paying the cost, which is why it is called a public cloud. The cloud provider has to maintain it and bear the cost of energy and so on; customers who want to use it simply pay as they use it. One example of a public cloud is Amazon S3. Amazon S3, the Simple Storage Service, stores arbitrary datasets, and the user
has to pay according to the amount of space rented or used for storage, that is, in terms of GB per month. The second kind of public cloud service is given by Amazon as EC2, the Elastic Compute Cloud, which provides compute services to the client. Any user can upload and run arbitrary operating system images, and different applications can run on them. The operating system images and the applications that run on them require CPU, so the user pays not by the number of CPUs but by the CPU-hours used by the applications. The third example of a public cloud service is Google App Engine. Here users develop their applications within the App Engine framework, upload their data, which is imported into its format, and run them. Google App Engine gives the customer the flexibility to program directly against the framework to solve their application, and they pay according to use.

As far as customers are concerned, in this cloud scenario customers save time and money. How? If AWS is used, a new server can be up and running within minutes, about three minutes, compared to the several weeks or months needed to purchase a server and put it into service. The overhead of invoicing, purchasing, and installation is removed, and only two or three minutes are required to bring up a new server. So time is saved if the cloud is chosen as the method of computing instead of owning one's own servers. Another
example concerns online services, which reduce operational cost by around 30 percent of the company's internal spending. Why? Because no internal operational cost is required; the only money to be paid is for what is used. So there is a saving of money as well if the cloud is used for computing purposes. A private cloud of virtual servers inside a data center has saved crores of rupees, because the company can share computing power and storage resources across the servers. So this is also very cost effective, even if the private cloud is maintained inside the data center. We will see the economics of when to go for a private cloud versus a public cloud. Also, startup companies can harness large computing resources without buying their own machines. This too is based on economic calculations: whether to build your own private cloud or use the public cloud. So many options are available, and these options open up the new arena of cloud computing.

So what is a cloud? Advances in virtualization make possible the growth of Internet clouds as a new computing paradigm. There is a dramatic difference between developing software for millions to use as a service and developing software and distributing it to run on their own PCs. So the architecture in cloud computing has changed: software is given as a service to millions rather than distributed to run on their PCs. The cloud has changed the paradigm.

Let us trace back through the history. In 1984, John Gage of Sun Microsystems gave the slogan "the network is the computer." Later, in 2008, David Patterson of UC Berkeley said "the data center is the computer." More recently, Rajkumar Buyya of Melbourne University
simply said "the cloud is the computer." So see how the paradigm is shifting: the definition of the computer has changed from the network to the data center, and now to the cloud. Some people view clouds as grids or clusters changed through virtualization. These clouds are anticipated to process the huge data sets generated by the traditional Internet, social networks, and the future IoT.

Let us look at what is inside a cloud. There are two kinds of setup. The first is the single-site cloud, within one premises, which is called a data center. It comprises compute nodes, sometimes also called servers, which are grouped into racks. Then come the switches connecting the racks: every rack has a top-of-rack switch, and these switches connect all the racks. Then comes the network topology, which has two levels. Within the rack there are also storage back-end nodes connected to the network, that is, nodes primarily meant for storage, with SSDs in them. A front end is there for submitting jobs and receiving client requests. This can often be treated as a three-tier architecture: a core switch connects the different top-of-rack switches, forming the hierarchy. And there are software services that are used to run applications on this structure, which is nothing but a data center with clusters within it.

Some cloud providers have deployed this kind of
setup at more than one site. These are called geographically distributed clouds: they comprise multiple sites, that is, multiple data centers connected together, each site perhaps with a different structure and different services running within it, and the sites communicate over the network.

That was a description of the internals of a cloud, of what a cloud comprises. Now let us see the computing paradigm that distinguishes cloud computing. There is a wide overlap between cloud and distributed computing. Distributed computing comprises multiple autonomous computers, each having its own memory, that communicate through messages, which is called message passing. A cloud can be built with physical or virtualized resources over large data centers, which are distributed systems. On these distributed systems, virtualization of resources creates a pool of virtualized resources that can be allocated to applications. Therefore cloud computing overlaps with distributed systems and adds flexibility, or elasticity, in terms of computing resources. Cloud computing is also considered a form of utility computing or service computing.

Let us trace the history of the development of cloud systems. In the 1940s, large machines such as ENIAC were installed. These systems were housed in big rooms, and those big rooms full of CPUs are what we call data centers, though with much slower computing facilities than we have now. Then came the time-sharing companies, and the data processing industry was transformed into time-sharing systems, where
terminals were used to access those systems, and the data, if it was large, was supplied in the form of punch cards. That industry was called the data processing industry. Then there was a change after the 1980s: systems became PCs, personal computers, which were given to people directly to use. Around the same time grids and clusters also evolved, and using these systems peer-to-peer systems were formed. They are the precursors of the current cloud computing systems. Today's cloud computing systems have the same setup, data centers, that existed in the 1940s to 1960s, so the same cycle is being repeated, but with different notions of computing, so that more data, or big data, can be processed in these data centers. To summarize: the precursors to clouds are peer-to-peer systems, because many PCs were available and were brought together to form a computing system, and the cloud is a further advancement of these kinds of systems.

The amount of data, and the flexibility that applications need from these resources, give rise to scalable computing trends, and the technology is evolving around them. So let us see how the trend in technology has taken shape and given birth to cloud computing. As far as hardware is concerned, we have seen steady growth: storage doubles every 12 months, bandwidth doubles every nine months, and CPU compute capacity doubles every eighteen months. Doubling here means that for the same cost you get double the capacity, double the speed, and double the bandwidth. This is the doubling
phenomenon. What are the laws behind these doubling periods? Moore's law indicates that processor speed doubles every 18 months. Clock speed itself is no longer doubling, but development is taking place horizontally: more cores are being packed onto a chip. That is the trend now. Similarly, Gilder's law indicates that network bandwidth has doubled each year in the past. Earlier, bandwidth was in kilobits per second; by 2015 we see terabits per second as a link speed for the same cost. So there is a tremendous increase, and it follows the principle of a doubling period. Similarly for disk capacity: today's PCs have terabytes, far more than 1990s supercomputers. All of this is moving toward the reality of utility computing.

Let us see what we mean by utility computing. It aims toward autonomic operations that can be self-organized to support dynamic discovery; major computing paradigms are composable with quality of service and service level agreements (SLAs). In 1965, MIT's Fernando Corbató, of the Multics operating system, envisioned a computing facility that works like a power company or a water company. A power company or a water company works like plug and play: power is available in homes in the form of a socket, and whenever required you can plug in and use it without knowing the problems of production at the power stations. Similarly, without knowing the water company's problems, you open the tap and water comes out. Computing should be provided in a similar manner, and if it is, it is called utility computing: a thin client can plug into the computing utility and play, that is, use the compute and run its applications. So the cloud in some form is realizing
the hope of utility computing. Utility computing focuses on a business model in which customers receive computing services from a paid service provider. All grid and cloud platforms are regarded as utility service providers.

Features of today's clouds. There are four features that characterize a cloud, or the applications we call cloud problems.

The first is massive scale: very large data centers contain tens to hundreds of thousands of servers, and you can run your application across as many servers as you want and as many as your application will scale to. We will see in more detail what massive scale means.

The second feature is on-demand access, which means pay-as-you-go, pay as you use. This is in contrast to an upfront cost, where you pay in advance whether or not you end up using that much.

The third feature is the data-intensive nature. What was megabytes earlier has now become terabytes, petabytes, and zettabytes. Data sizes are growing, and if the data is that large, the problem falls among cloud problems. Examples are daily logs, forensics reports, and web logs, which continuously generate data of that size. Such data has to be processed in a data-intensive manner, and the computing required for that category of system is cloud computing.

The fourth feature is new cloud programming paradigms. Problems involving big data, or applications of a data-intensive nature,
require new cloud programming paradigms, for example MapReduce and its open-source version, Hadoop. These new programming paradigms are classified as one of the features of a cloud problem. Another is the key-value store, for which systems like Cassandra are used; similarly, if the database is NoSQL, MongoDB is used. So newer programming paradigms are available, and if they are required, the problem is categorized as one for today's clouds. If one or more of the above features are present, then we can classify the problem as a cloud computing problem.

Let us see in more detail the first feature of cloud computing, massive scale. Take Facebook, with data as of 2012: 30,000 servers were deployed in 2009, which grew to 60,000 servers in 2010, doubling in one year, and to 180,000 servers in 2012. So 180,000 servers are used to run one application, Facebook: that is massive scale. Similarly, Microsoft in 2008 was using 150,000 machines, with a growth rate of 10,000 machines per month, and 80,000 servers were running one application, Bing. In 2013, Microsoft's Cosmos application required 110,000 machines, deployed in four different regions. This massive scale is required to serve these applications, and that is what is termed the cloud. Similarly, Yahoo in 2009 had 100,000 servers, split into clusters of 4,000; that
means there are different sites of at most 4,000 servers each, which together make up the 100,000 servers that run Yahoo's services. So Yahoo too is using the cloud at this scale. Next is Amazon EC2: in 2009, 40,000 machines were required to run EC2, each an eight-core system. Similarly, eBay required 50,000 machines to run its applications, and HP 380,000. As for Google, it requires a great many servers; the total is not disclosed by Google, but it is known to be larger than any of the above. So this is massive scale, the first feature of a cloud problem.

What is inside this massive scale, inside the data center, at one site or at several sites? These data centers house a lot of servers in racks, all connected together. In this room, which is called a data center, the entire room is filled with racks, and all the racks are connected. On the right you can see the back side of a rack: servers interconnected with each other. Inside every rack, the servers are in the form of blades or boards fitted into the rack, powered, and connected through the network. Such a data center requires huge power to run, supplied by power stations. When so much power is used within one room a lot of heat is generated, so how do we cool it? These are all maintenance requirements, and they cost money. How to reduce this energy use is one of the challenges in
cloud computing data centers. Water is used for cooling, and water usage is measured as annual water usage divided by the IT equipment energy used (water usage effectiveness, WUE); if this parameter is low, that is good. Similarly, power usage is measured as total facility power divided by IT equipment power (power usage effectiveness, PUE); again, low is good. Google has shown that its data centers achieve about 1.11. Less than 1 is not possible, but this is very close to 1, so nearly all the power drawn is used for computing, without much wastage. For cooling, various methods are used in cloud data centers: for example, air is drawn in and combined with purified water, and the cooled air is moved through the system.

Now the second feature of a cloud, on-demand access. In industry terms it is classified by the "as a service" (*aaS) classification: HaaS means hardware as a service, IaaS means infrastructure as a service, PaaS means platform as a service, and SaaS means software as a service. On-demand access is one of the important features of a cloud problem. On-demand means that you are not buying; you are renting, paying as you use. For example, with the AWS Elastic Compute Cloud (EC2) you pay for the CPU-hours your application uses, and only that is paid by the customer. Similarly, AWS provides the storage service S3, the Simple Storage Service, which is paid by use in GB per month: you pay for the storage space you use per month. That is on-demand access to resources like compute and storage. It is classified as follows. The first class is hardware as a
service (HaaS): you get access to bare-bones hardware machines and do whatever you want with them, that is, your own cluster, owned by someone else, as a service. But it is not a good idea because of the security risks. Therefore infrastructure as a service (IaaS) is more popular than hardware as a service, and many companies offer it to customers. In IaaS you get access to flexible computing and storage infrastructure, and this is done through virtualization. IaaS has subsumed HaaS: hardware as a service is not used in industry, but infrastructure as a service is, and it covers hardware as a service within it. Examples of IaaS are Amazon Web Services (EC2 and S3), OpenStack, Eucalyptus, RightScale, Microsoft Azure, and Google Cloud.

Another class of on-demand access is platform as a service (PaaS): you get access to flexible computing and storage infrastructure coupled with a programming platform, often tightly coupled. A programming paradigm is provided, and users program against it to run their applications. An example is Google's App Engine. It is not as flexible as infrastructure as a service, but it is easy to use.

Another class is software as a service (SaaS): you get access to software services when you need them. This subsumes service-oriented architectures. Examples of software as a service
are Google Docs and MS Office on demand.

The third important feature of a cloud problem is data-intensive computing, in contrast to computation-intensive computing. Earlier, computing was computation intensive: the data was small and had to be processed very fast, so MPI-based high-performance computing clusters and grids were formed, and supercomputers were developed to compute on the data very fast. But the trend is changing. Now applications have large data, and the data cannot be moved to where the compute nodes are; instead the computation has to move to where the data is, because the data is too big. That is why it is called data-intensive computing. Typical data-intensive computing, one of the key features of cloud computing systems, first stores the huge amount of data at the data center, and second uses compute nodes nearby: the data cannot move because of its size, so nearby compute nodes run the computation. So the focus shifts from computation to data. CPU utilization is no longer the most important resource metric; instead I/O, that is, disk and network, is what matters.

The next important feature of a cloud problem is new cloud programming paradigms, which must make it easy to write and run highly parallel programs. For example, Google provides a new programming paradigm
which is called MapReduce, along with Sawzall. Amazon also provides the Elastic MapReduce service, where you pay as you go. Google uses MapReduce for indexing, which requires a chain of 24 MapReduce jobs. So MapReduce solves big problems of data-intensive computing. Similarly, Yahoo has used Hadoop, the open-source version of MapReduce, together with its own layer, Pig. Facebook also uses Hadoop, with Hive, processing about 300 terabytes of total data. Another new programming paradigm is NoSQL, in contrast to MySQL, which is an industry standard; the key-value store Cassandra is reported to be 2400 times faster than MySQL.

As far as the cloud is concerned, there are two categories: public and private. As you know, private clouds are accessible only within the company, and public clouds are provided as a service to customers. Whether to use a private cloud or a public cloud is a matter of economics. Take a medium-sized organization that runs its computing services for a certain number of months. The service requires 128 servers and 524 terabytes of storage. Suppose it outsources using Amazon AWS on a monthly basis; let us compute the cost. S3 storage will cost about 62K dollars, and the CPU cost is computed from 1024 × 24 × 30, so the total comes to about 136K dollars per month for outsourcing the entire computing infrastructure. In contrast, if you want to own a private cloud, the storage costs 349K dollars to purchase, divided by the
total number of months you are going to use it. You also have to add the cost of maintenance in terms of manpower, system administration, and so on. If you equate the outsourcing cost to the purchasing cost over a period of months, you obtain a break-even analysis. The break-even analysis in the slides shows that if the number of months is more than six for storage, it is better to own the infrastructure, that is, to go for a private cloud, and if it is more than twelve months overall, only then should you go for the private cloud. If it is less than six months for storage and less than twelve months overall, the break-even says you should go for the public cloud, not the private one. Therefore startup companies, which use such infrastructure only for a short period, maybe less than a year for experimentation, are better off going for the public cloud. That is why clouds are becoming more popular with these companies.

Conclusion. Clouds build on many previous generations of distributed systems: the deployment of distributed systems with virtualization has built the new generation of cloud computing systems. These cloud computing systems have the features that characterize cloud problems: massive scale, on-demand access, data-intensive computation, and new programming paradigms. Thank you.
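
The break-even reasoning from the private-versus-public discussion can be sketched in a few lines of Python. This is only an illustration, not the lecturer's own calculation: it uses the figures quoted in the lecture (about 62K dollars per month for outsourced storage, about 136K dollars per month in total, 349K dollars to purchase storage), reads the quoted "1024 × 24 × 30" as CPU-hours per month (an assumption), and ignores maintenance and staffing costs. The function and variable names are hypothetical.

```python
import math

# Figures quoted in the lecture, in US dollars.
OUTSOURCED_STORAGE_PER_MONTH = 62_000   # S3 storage, per month
OUTSOURCED_TOTAL_PER_MONTH = 136_000    # storage + CPU, per month
OWNED_STORAGE_PURCHASE = 349_000        # one-time purchase cost of storage

# Assumption: "1024 x 24 x 30" is read as cores x hours/day x days/month.
CPU_HOURS_PER_MONTH = 1024 * 24 * 30


def break_even_months(purchase_cost, rent_per_month):
    """Months of use after which a one-time purchase beats renting.

    Simplification: ignores maintenance, power, and staffing costs.
    """
    return purchase_cost / rent_per_month


def implied_cpu_hour_rate(total_per_month, storage_per_month, cpu_hours):
    """Per-CPU-hour price implied by the quoted monthly totals."""
    return (total_per_month - storage_per_month) / cpu_hours


storage_break_even = break_even_months(
    OWNED_STORAGE_PURCHASE, OUTSOURCED_STORAGE_PER_MONTH
)
cpu_rate = implied_cpu_hour_rate(
    OUTSOURCED_TOTAL_PER_MONTH, OUTSOURCED_STORAGE_PER_MONTH, CPU_HOURS_PER_MONTH
)

print(f"CPU-hours per month: {CPU_HOURS_PER_MONTH}")
print(f"Implied price per CPU-hour: ${cpu_rate:.3f}")
print(f"Storage break-even: {storage_break_even:.2f} months "
      f"(owning pays off from month {math.ceil(storage_break_even)})")
```

Under these assumptions the storage break-even falls between five and six months, which is consistent with the lecture's rule of thumb that owning storage pays off only beyond about six months. The overall twelve-month figure depends on purchase and maintenance costs that the lecture does not quote, so it is not reproduced here.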
