Rethinking Big Data Analytics with Google Cloud (Cloud Next '18)



Ladies and gentlemen, good afternoon. Please welcome Sudhir Hasabe.

Hello everybody, I'm Sudhir Hasabe, director of product for data analytics on GCP. Thank you for coming to this session. I know it's just after lunch, or around lunchtime, so I hope I'm not going to bore you too much; we'll keep it exciting. Let's get started.

Most of the folks in the audience, and outside, know Google from the search box. The first experience people have with Google is the search box: you search for a term and you get the interesting results you're looking for. Behind the scenes, when you search for anything in that box, there is a lot of infrastructure and a lot of analytics going on. We are one of the largest organizations that collects massive amounts of data, analyzes it, and uses it. And it's not just search: we have more than seven products with over a billion monthly active users each, and as I heard in the keynote earlier today, we may have an eighth one with Drive, which is heading toward a billion monthly active users.

The key here is that big data is in our DNA. We leverage data and machine learning to deliver those experiences in all of these products, and we do it through internal technology we have built. Think about Dremel, which we use internally for all our analytics: BigQuery is essentially the enterprise version of that same technology, made available to enterprises. So what we are doing is taking the technology we have invested in over years and making it available to our customers in the cloud.

Data across the world is growing; it will be about 163 zettabytes by 2025. As datasets grow within organizations, you want infrastructure and analytics capability that can actually process that amount of data. Just one data point: when one of our customers started their data collection and streaming analytics pipelines, they collected around 50 million events a day; within 18 months they were up to 5 billion. As you start seeing value from your data, you will collect more and more, but you want infrastructure that can seamlessly scale as your needs grow within your organization.

Similarly, there was an MIT survey about machine learning and AI: how many customers are using it and how it is going. The key finding was that organizations actually using AI are able to make 5x faster decisions, make twice as many data-driven decisions, and execute 3x faster on the decisions they make. So overall, machine learning and AI are going to be critical for all of your organizations. The key point, though, is that if your organization is not great at analytics, it's never going to be great at AI. First you have to have a great foundation in analytics: how you process data, how you analyze data. Then you can think about how you do machine learning on top of that data and leverage AI for differentiation. And yet, if you look at the numbers, less than 1% of the world's unstructured data is actually used for analysis today, and less than 50% of structured data is analyzed within organizations.

So what is our approach? At Google, there are four key things we are doing.
First, we are focusing on infrastructure and solutions that allow you to focus on analytics, not infrastructure; we will talk more about that. Second, developing comprehensive solutions: we know customers need a whole portfolio of solutions to do analysis, so we are focusing on end to end, all the components you need in an analytics and ML lifecycle, and we'll look at that quickly. And third, being innovative and open: being an open cloud, making sure you have options with open source so you can run your workloads the way you want to run them, is super critical for us, and we invest a lot in making sure we promote that.

Let's talk about what focusing on analytics, and not on infrastructure, means. If you're doing analysis with BigQuery, which is our cloud-scale data warehousing product, you can get started within seconds; you can bring your datasets and start analyzing instantly. If you're not using a serverless product like BigQuery or Dataflow, you have to worry about monitoring, performance tuning, and infrastructure: how many nodes do I need, what cluster size do I need, how do I tune performance? None of that is a problem if you're focused on serverless. That's what our focus is: we want to provide infrastructure that automatically scales and gives you the ability to do analysis, and you don't have to worry about anything; just bring your data and start analyzing it.

Let's talk about the second point: end-to-end, comprehensive solutions. Analysis actually starts with ingestion: how do I get my data, and in particular, how do I get my streaming data? We have lots of customers with massive amounts of streaming events coming at them, and the question is how to scale that infrastructure seamlessly. Cloud Pub/Sub is our solution that allows you to collect millions of events per second and do analysis on them. Similarly, a lot of our customers use Google products like AdWords and DoubleClick for advertising, so we have made it really easy for customers who want to use Google Cloud for marketing analytics: within a few clicks you can literally get your AdWords and DoubleClick data into BigQuery for analysis. IoT is also super critical — you saw some amazing announcements this morning with Edge TPU — and we have Cloud IoT Core, so if you are interested in collecting IoT data you can do that seamlessly and leverage the whole platform. So that covers ingestion.

For reliable data processing and streaming pipelines, we have multiple options for customers. One is Dataflow with Beam: Beam is an open source SDK for building batch and streaming pipelines with the same programming model, and Dataflow runs those pipelines and automatically scales them, so you can build large-scale data processing pipelines; it's great for developers. We also realize a lot of our customers have in-house capabilities with Spark and Hadoop, and they love Spark — I used to use Spark in my previous roles, so I love Spark too — so for that we have a managed Hadoop and Spark environment with Cloud Dataproc. And for analysts, we know a lot of the analyst community that is familiar with the data also wants to do data wrangling and data preparation, because they know best, before the data is used, what kind of analysis they want to do and how to clean up the data; for that audience we have Cloud Dataprep.
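To make the Pub/Sub plus Dataflow/Beam pattern described above concrete, here is a minimal sketch of a streaming pipeline in the Beam Python SDK. The project, topic, and the simple per-minute count are illustrative assumptions, not something shown in the talk.

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.options.pipeline_options import PipelineOptions

# streaming=True is required for unbounded sources; on Dataflow you would also
# set the runner, project, and region options.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (p
     | "ReadEvents" >> beam.io.ReadFromPubSub(
           topic="projects/my-project/topics/clickstream")   # hypothetical topic
     | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
     | "OneMinuteWindows" >> beam.WindowInto(window.FixedWindows(60))
     | "PairWithOne" >> beam.Map(lambda event: (event, 1))
     | "CountPerWindow" >> beam.CombinePerKey(sum)
     | "Log" >> beam.Map(print))
```

The same transforms run as a batch pipeline if you swap the source, which is the point of the unified Beam programming model mentioned above.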
After that, once your data is ready, you want to do analysis at scale. You want to build your data lakes: you can use GCS, Google Cloud Storage, to store all your structured and unstructured data and then process it, or you can use our cloud-scale data warehouse, BigQuery, to store that data at petabyte scale and do analysis on top of it. Once the analysis platform is ready, then for advanced analytics you can use ML Engine and TensorFlow; for visualization you can use Data Studio — we'll see some of the new enhancements we are making available there — and also Sheets. A lot of our customers, especially G Suite customers, use Sheets every day, and we are making enhancements to easily make data from BigQuery and other places available there.

If you think about the ML lifecycle, it is a whole lifecycle: you start from ingestion, then you have to explore, prepare, and pre-process the data, and then you start the process of training, hyperparameter tuning, testing, and prediction. We provide a whole suite of products that support every one of those steps, but we are also making it very easy for you to do machine learning — you heard some of the announcements earlier today, and I will go into a bit more detail on that; we also have an amazing customer demo for you later in the session.

From a momentum perspective, that's our portfolio, and we have seen tremendous growth on the data analytics side with our customers. Lots of customers use the whole portfolio across industry verticals, from financial services to retail, from gaming to media and entertainment, to manufacturing; all across the board we are seeing tremendous growth in our data analytics capabilities being used in different organizations, and across different sizes of datasets too. You heard earlier today that Twitter talked about moving their large-scale Hadoop deployment — I think it was mentioned as 300 petabytes of data — into GCP and running a cluster at that scale. The highlight was that our network, and the capability we provide with our networking stack, allows you to decouple storage and compute, which really makes it easy to manage the whole environment and reduce costs. So we are seeing tremendous growth with the Twitters and Yahoos of the world, but also with a lot of enterprise customers using the platform.

With that, let me invite Aireen Omar, Deputy CEO of AirAsia, on stage to talk more about this. Can you do a quick introduction about yourself and your role, and tell us a bit more about AirAsia?

Sure. AirAsia is the largest low-cost carrier in Asia. We started back in 2001 with just two aircraft, carrying about 20,000 passengers, and now, 16 years later, we have over 230 aircraft. Over the years we have carried over 500 million passengers, and this year we're looking at carrying about 90 million passengers. So we've grown very fast. We have bases across Southeast Asia — ASEAN is our backyard — and the reason we're focused on building that market base is that it has a population of over 600 million, the third largest after China and India.
It also has a very young population, with a median age of 28 or 29 years old: 50% of the population is under 30 and 70% is under 40. Half of the population lives in urban areas, it's one of the fastest-growing regions in the world in terms of GDP, and it has one of the fastest-growing groups of middle-income earners in the world. So for a low-cost carrier this is a fantastic opportunity to grow. And if you look at the geographical landscape of Southeast Asia, it's surrounded by water, and that's where we feel there is a lot of opportunity to learn about the population, to grow further, and to build other business opportunities apart from just running an airline.

Tremendous growth — within a couple of decades, from two planes to 230 now. What were the key challenges you were facing? Tell us more about the business challenges and how you are using Google Cloud for some of those.

I think the key challenge is that we have operations in various countries — Malaysia, Thailand, Indonesia, the Philippines, and recently India and also Japan — and we're getting data from all over, from various systems. We have data coming from our booking system; 80% of our bookings go through our website and our mobile app, unlike other airlines, where it's the other way around. Then we have data coming from our aircraft and from our engines. We use our aircraft in the most efficient way and maximize the utilization rate: the A320s that we use fly 14 hours a day with 25-minute turnarounds, so that we can fit in as many sectors as we can. Across the whole group that's about 1,500 departing flights per day and almost 300,000 departing passengers a day. So there's a lot of data coming in, and when you are running an efficient operation you need it to be precise, and you need something scalable and accurate, so that we can understand that data better and focus more on serving our consumers. The data we need is really about how we improve the consumer experience and the revenue we can get from them, how we provide the right kind of products and offerings, and how we use this data to improve the overall operational efficiency of our operations, so that we reach productivity in the most efficient way and can focus more of our effort on insights — not only into our operations but also into the behavior of our consumers, so we can provide better products and offerings.

Got it. I know you use BigQuery, Data Studio, and the other tools in Google Cloud. Are there key metrics you can share with us, where you've seen real growth or savings?

I'm also in charge of digital transformation, so the key thing for us is to integrate all this data coming from various sources, to combine it and build meaningful algorithms out of it. What we have found, even though we probably use less than 20% of the data we have combined so far, is that the conversion rate per consumer has doubled, and every one percent of conversion rate actually increases revenue by about fifty million US dollars.
What we have also seen, because we're able to predict better in some operations — maintenance and so forth — is that we have reduced the number of aircraft on ground, which means a better experience for passengers, and we have seen costs come down by at least ten percent or so, which is actually quite big when you're running an airline.

That's amazing, especially in an airline where, as you said, the operational cost is heavy. So a ten percent saving, doubling the conversion rate, and you're just using twenty percent of the data.

Yes — probably a little bit less than that, because we only started a couple of years ago and there's a lot to do. So it's very key to be able to streamline all of this in BigQuery; it's a powerful tool that allows us to be scalable, work faster, and be more focused on the requirements of our consumers.

That's great, thank you. Awesome results, and I'm looking forward to what we can do together as you get to 20, 30, 100% of the data and start analyzing it. Thank you very much.

So that was AirAsia. Let's talk about the four key areas we normally focus on when we talk to our customers using our portfolio of solutions. One is, of course, modernizing data warehouses, and we'll talk more about that. Second is analyzing streaming data, which is super critical as organizations collect massive amounts of event data from different places, from clickstream to IoT devices; processing streaming data is super critical in organizations. Third is running open source software, and fourth, visualizing and activating your data — using it in a visual manner is critical for organizations.

Let's talk about BigQuery for a second. BigQuery is our cloud-scale data warehouse, built natively for the cloud — if you haven't read the Dremel paper, you should check it out; it's a data warehouse built from the ground up. It's cloud scale: you can run petabyte-scale queries within seconds. It supports standard SQL. You can get started at no cost; there is a free tier available. How many of you actually use BigQuery here? Great — and for the many of you who don't, my recommendation is to go check it out; it will take a couple of minutes to bring your data in and start analyzing. As I said, it's completely serverless: you don't have to worry about infrastructure, just bring data in and start analyzing. It's highly secure — we encrypt the data at rest — it's highly available, and real-time streaming is native to BigQuery: you can stream hundreds of thousands of events directly into BigQuery and analyze them at the same time.

One of the announcements you heard this morning was Rajan talking about BigQuery ML. There were two big challenges we kept hearing from our customers. First, it's great to use BigQuery and bring massive amounts of data in, but if you want to do any machine learning you have to move that data out — and something like 80% of a data scientist's time is spent on data preparation, moving data around, and testing models. So our question was: how do you reduce that time by making machine learning available in the data warehouse? Instead of moving data to the machine learning engine, why can't we move the machine learning engine closer to the data? That was the whole premise. The second challenge was the skill-set gap in the industry: we just don't have that many PhD data scientists to do advanced machine learning. So our thinking was: can we leverage the skill our audience already has, which is SQL, and make machine learning available to them in SQL? That's exactly what we have done: BigQuery ML is SQL-based machine learning model creation within BigQuery. If you have BigQuery, you're already using SQL to analyze data; you have queries ready and you understand your data. Just write two statements on top of it: CREATE MODEL, with the type of model you want (we can auto-detect options if you want), the input columns, and what you want to predict; and then for prediction you just SELECT from ML.PREDICT and get the predictions out. That's how easy it is to do machine learning within BigQuery.
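As a rough illustration of that CREATE MODEL / ML.PREDICT flow, here is a minimal sketch using the google-cloud-bigquery Python client; the dataset, table, and column names are placeholders, not a real schema from the talk.

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes credentials and a default project are configured

# Train a model directly over data already in the warehouse.
client.query("""
    CREATE OR REPLACE MODEL mydataset.purchase_model
    OPTIONS (model_type = 'logistic_reg') AS
    SELECT
      country,
      pageviews,
      did_purchase AS label          -- the column to predict
    FROM mydataset.sessions
""").result()

# Score new rows with the trained model using ML.PREDICT.
rows = client.query("""
    SELECT *
    FROM ML.PREDICT(MODEL mydataset.purchase_model,
                    (SELECT country, pageviews FROM mydataset.new_sessions))
""").result()
for row in rows:
    print(dict(row))
```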
One of the things you saw earlier today was 20th Century Fox, where they talked about how they were able to predict which audiences are most likely to come back for a new movie they're launching. I want to take a different example right now with Geotab, so let me invite Neil to come on stage and help us understand what Geotab does.

Can you do a quick introduction of yourself and tell us a bit more about Geotab?

Sure. Geotab is a global leader in vehicle telematics. Many people ask what vehicle telematics is: we have a little device that collects data out of a vehicle. We are in 1.2 million vehicles, we collect all that data, and then we analyze it at massive scale. We collect information about where the vehicle is, how fast it's moving, how the engine is performing, fuel consumption, whether you went over a pothole, whether you slammed on the brakes. You can imagine that the opportunities we have to use that data to deliver results to our customers — with products like BigQuery and machine learning — are really massive. That's really what we do.

Awesome. Can you share more about your existing infrastructure before you got into BigQuery ML? What kind of technology do you use from Google Cloud, and then we can discuss your transition to BigQuery ML.

Sure. We think of our relationship with Google as our competitive advantage. We have more than 500 servers in GCE that process the data, and every single piece of data that the organization generates is pushed up into Google BigQuery. We're a massive user of Google ML and TensorFlow, we use Dataproc, we use products like Kubernetes, and anything that gets announced by Google we look at very keenly. The real benefit — and it's an understated problem — is that first you start collecting the data and you have it in one place; the next point is that if you want to leverage AI and ML, you have to have that ML close to where the data is, otherwise you spend your life just moving data around. So it's been a great relationship and a great partnership.
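For a sense of what "pushing every event up into BigQuery" can look like in practice, here is a minimal sketch of a streaming insert with a recent google-cloud-bigquery Python client; the table name and fields are hypothetical, not Geotab's actual schema.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table and fields, standing in for a telematics event stream.
table_id = "my-project.telematics.engine_events"
rows = [
    {"vehicle_id": "v-1042", "event": "harsh_braking", "speed_kmh": 63.5,
     "ts": "2018-07-25T17:04:12Z"},
]

# Streaming inserts make rows available for query within seconds.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print("Insert errors:", errors)
```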
I know you've been involved with BigQuery ML since we announced our alpha, and I also know you have a demo. So why don't you tell us what you're going to show in the demo and what audience we are targeting, and then show us the demo.

Sure, I'll do that. Just to level set: we probably have the most comprehensive and largest big data set of vehicles in the world, and as I mentioned, this data set is very, very rich. We know the ambient air temperature, air pressures, whether it's a dangerous intersection; we know a tremendous amount. One of the things I'm going to show you here today is an add-in to our standard fleet management product, and this one is focused more around smart cities: we're going to use ML to predict outcomes for safety based on weather. Let me get it loaded and I'll show you how that all fits together.

While you're getting set up: the other key thing we are launching today is the GIS alpha. BigQuery will natively support GIS capabilities, like GIS data types, within the data warehouse. We'll talk more about it a bit later, and there's a detailed session at 3:15 as well. With that, I'll hand it over to Neil for the demo.

Okay, super — we're up. What you're seeing here is a view inside our product. As I mentioned, this is an add-in, one of hundreds available in the product, and this is a really cool one: it leverages Google ML and the BigQuery GIS features that have just been announced here, in order to get some really interesting data — and this just starts to scratch the surface of where we can go with it. What you're seeing on the left-hand side is a view of the dangerous intersections in the Chicago area over the last two weeks; the hot spots are the areas that are more dangerous. You might ask how we could possibly tell that. Well, we see just about a hundred thousand accidents a year happening in our pool of vehicles, and we know when people slam on the brakes, so if we aggregate that data we can look at where people are having accidents and where people are slamming on brakes, making dangerous lane changes, swerving, and so on.

What our big data team — who are actually sitting here today — did is take that data and say: let's use the public weather dataset that's available in Google BigQuery. We know, for a particular date, time, and location, what the weather was in that location, and they used 250 different metrics to analyze and compute what weather can tell us about how it impacts safety. They ran those experiments, and I'm going to show you some of the results. Let's drop the temperature down to around freezing, set it to snow, and I'm going to run the predictive analysis now, live. What we see is actually really interesting: some of the areas that were dangerous before are still dangerous, but there has been a big change in the pattern, and things look remarkably different. If we zoom in, we can start seeing where those dangerous intersections are. Let's take one little area here: whenever it snows, we seem to have a dangerous area near a school. So we might consider what is happening — maybe the parents are waiting across the road to pick up the kids, it's snowing, and the kids are running across the road; or perhaps vehicles break down there. But the point is, by leveraging ML and by leveraging this data, cities can now look at what the infrastructure is and change the way the roads are set up in order to keep everybody safer.
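A simplified sketch of the kind of join the Geotab team describes — telematics events against the public weather data in BigQuery — might look roughly like the following. The braking-events table and its columns are hypothetical; the weather columns follow the public NOAA GSOD dataset, and the bucketing is only for illustration.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Join hypothetical braking events against public NOAA GSOD weather tables
# by station and day, then count events per simple weather bucket.
sql = """
SELECT
  CAST(ROUND(w.temp) AS INT64) AS mean_temp_f,     -- GSOD mean temperature (F)
  w.prcp > 0.0                 AS had_precipitation,
  COUNT(*)                     AS harsh_braking_events
FROM `my-project.telematics.braking_events` AS e    -- hypothetical table
JOIN `bigquery-public-data.noaa_gsod.gsod2018` AS w
  ON e.station_id = w.stn
 AND e.event_date = DATE(CAST(w.year AS INT64), CAST(w.mo AS INT64), CAST(w.da AS INT64))
GROUP BY mean_temp_f, had_precipitation
ORDER BY harsh_braking_events DESC
"""
for row in client.query(sql).result():
    print(dict(row))
```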
And this demo is really just starting to scratch the surface of what you can do when you leverage tools as powerful as Google BigQuery and Google ML.

Thank you, Neil, this is awesome. The key thing is making cities smarter and having that kind of impact — and because you can do model generation and prediction so fast, it just expedites the whole solution. Absolutely; one of the key things was how quickly our team was able to put this together. There was no coding involved, no Kubernetes, no spinning up magnitudes of servers — we'd love any of those too, but we have SQL people, and we love SQL. Thanks, Neil. There is a session at 3:15 that goes deep into the Geotab solution and the GIS capabilities, so if you are interested in GIS data types, that would be a good session to attend later today.

Beyond that, we have also worked with our partners to give an integrated experience for the BigQuery ML capability. Looker, for example, has built an end-to-end workflow within Looker where you can pull out a dataset, see it in Looker views, create a model, run a prediction, visualize the prediction, and then fine-tune your model from the Looker UI itself. We will be working with more partners to bring these kinds of integrated capabilities, so analysts using those tools can leverage BigQuery ML from within them, which makes it really easy to create models and visualize the results. Looking forward to that going forward. A couple of notes on BigQuery ML: linear and logistic regression models are already available, and the beta is open, so please go try it and give us more feedback.

A couple of other things we are announcing. Clustering beta is coming. I won't be able to go into the details of partitioning and clustering here, but think about it this way: you could already run a petabyte-scale query in BigQuery two years ago, and you can do it now, but with partitioning plus clustering you can reduce the cost drastically, because the queries become much more efficient — we only access the data that is required within that cluster or partition. So partitioning plus clustering will make your queries far more efficient and, if you are using the on-demand pricing model, reduce cost drastically. There's a detailed session at 3:15 today by Jordan Tigani; if you are interested in that topic you should absolutely go — Jordan does some amazing demos in that session.

As we just touched on, GIS alpha is available today. The scenario we kept hearing from customers was, for example: say we're in the center of Manhattan — how many taxis are available within a two-mile radius of this location? Historically, that kind of query has been really difficult to do, and with the GIS features you can now do it directly inside the query.
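A minimal sketch of that kind of radius query with the new GIS functions, run through the Python client, might look like this; the taxi table and its latitude/longitude columns are made up for illustration.

```python
from google.cloud import bigquery

client = bigquery.Client()

# "How many taxis are within a two-mile (~3,219 m) radius of this point?"
sql = """
SELECT COUNT(*) AS taxis_nearby
FROM `my-project.mobility.taxi_locations`            -- hypothetical table
WHERE ST_DWITHIN(
        ST_GEOGPOINT(longitude, latitude),           -- each taxi's current position
        ST_GEOGPOINT(-73.9857, 40.7484),             -- reference point (midtown Manhattan)
        3219)                                        -- radius in meters
"""
result = list(client.query(sql).result())
print(result[0].taxis_nearby)
```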
We also have some new connectors going live, and one of the other key things we are launching is our new BigQuery UI. It looks better, and it has one-click experiences to go into Data Studio and do visualization; we'll also take a quick look at the Google Sheets integration that's available. Along with the core GIS data types and the ability to query them, we are launching a visual tool that lets you fire a query and look at the points on a map — because if your query is "show me all the points within a two-mile radius of another point", how are you going to visualize the result? It's really difficult. So we worked with the Earth Engine team at Google on a visual tool that gives you the ability to visualize that data; please take a look at it. And with Sheets, as I said, a lot of our customers use Sheets for analysis and for moving data around. Now Google Sheets has a connector for BigQuery: from within Sheets you can connect to your BigQuery instance, pull data in, and start analyzing and visualizing it out of the box. Making it easy to do analysis and connect to your datasets has been one of our big themes this year. So that's BigQuery.

Streaming analytics, which I touched on earlier: we have a whole portfolio of products that allow you to do this. You can collect millions of events with Pub/Sub, Dataflow lets you do large-scale data processing, and you can use Cloud ML or BigQuery to do analysis on top of that data. Brightcove is one of the best examples: they collect 8,500 years' worth of video viewing per month — that's seven billion events a day — and they use Dataflow plus Pub/Sub to analyze those videos and draw some great insights. It's not just Brightcove: Traveloka uses it for e-commerce clickstream collection and analysis, Qubit is doing point-of-sale analysis in retail, Nintendo has amazing scenarios around in-game analysis and in-game utilization of consumables, and Nest uses it for IoT data. For any kind of large-scale event collection, processing, and analytics, you can use Pub/Sub and Dataflow.

We are announcing a few enhancements in that space. One of the big ones is Python: Python is one of the fastest-growing languages on GitHub if you look at the commits, and we wanted to make it easy for Python developers to do streaming. We are now enabling Python streaming with Beam, going into beta, so customers can build scalable streaming data pipelines in Python. We also have the Dataflow Streaming Engine and shuffle capabilities, which help you do large-scale data processing more easily, with autoscaling coming along with them; there are detailed deep-dive sessions on these that you should check out if you're interested. We have also enhanced performance and made our client libraries for Pub/Sub much more efficient, in seven different languages.
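As a small illustration of what publishing events through one of those client libraries looks like, here is a sketch with the Python Pub/Sub client; the project, topic, and event payload are hypothetical.

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Hypothetical project and topic names.
topic_path = publisher.topic_path("my-project", "clickstream")

# Publish one JSON-encoded event; the client batches and retries under the hood.
future = publisher.publish(
    topic_path,
    data=b'{"user_id": "u-123", "action": "play", "video_id": "v-987"}',
    source="web",  # optional attributes are key/value strings
)
print("Published message id:", future.result())
```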
But in addition to that, we have a lot of customers who love Kafka; they say, I already use Kafka and I want to continue using it on GCP — what are my options? Historically you could deploy and manage it yourself, but now, with Confluent, we have a managed Kafka solution available. If you want a managed service for Kafka, you can use Confluent Kafka on GCP. Working with partners to provide these end-to-end solutions is one of our strategies, and that's already available for you to use.

One of the other things that is core to our strategy, and a core belief, is open source and being an open cloud. We fundamentally look at things this way, from Istio to Kubernetes, and in the big data world we invest a lot in open source technology. Look at the big data roadmap over the last 15 years and the amount of innovation Google has driven: before Google Cloud, we used to make this work available as papers so the industry could learn from the research we had done — everything from the Dremel paper to MapReduce to GFS — and we are also building a lot of our products on these technologies. There are two key product areas we have been investing in on the open source side. One is Cloud Dataproc, our managed Hadoop and Spark capability. The other is Cloud Composer. Composer is fascinating: when it was in private alpha we had more than a thousand customers using it — I have no idea how you keep something private with that many customers using it — it just took off. It's based on Apache Airflow, customers loved it, and we saw tremendous adoption. So we are announcing the GA for Composer; it's available now and you should be able to use it. There are also major enhancements on the Dataproc side: autoscaling and custom packages. Custom packages let you, with a few clicks, pick the top-level Apache projects you want to deploy in Dataproc, and autoscaling will automatically scale your Hadoop and Spark clusters on your behalf based on your resource needs. And of course, we announced a few weeks back that Hortonworks now supports their infrastructure natively on GCP, so you can use HDP or HDF directly on GCP.

With that, let me call up Michael from Blue Apron to talk about how they are using GCP. Michael, welcome. Can you do a quick intro of yourself, the company, and your role?

Sure, absolutely. Everyone, I hope you're enjoying the second day of Next — I'm Mike. Blue Apron was founded six years ago with a, well, modest goal, and that goal was to reinvent how this country cooks and eats. We've made some good progress; it is an audacious goal, as visions are supposed to be. We thought we could get at this vision by making home cooking more accessible, easier, and more affordable for more people in this country, and in doing so we could go out there, work with farmers and producers, and make sure we were investing in sustainable agriculture, humane ways to raise livestock, all of these different things. Basically, what we do is send pre-portioned, seasonal ingredients to you in a box, with the recipes to make them, and we are around millions of dinner tables in the US every single night, which is a privilege.

And I'm one of them — I love Blue Apron. So how is data analytics used at Blue Apron?
One of the greatest privileges about working in food, I think, is that people always want to tell you what they think; we don't really have to go out and solicit much customer feedback. As I said, you're around people's dinner tables; it's a personal, intimate moment, and we have a responsibility to listen. People will tell us exactly what they want in their recipes — we were joking before that all the recipes have kale in them in the summer; don't ask me to fix that. So data is a really core part of how we make our business decisions, and that's not immediately obvious if you look at what we do; you might think, oh, you ship a box of food. But actually we are looking at the customer lifecycle at every stage, and we are ingesting data about what you like: what recipes appeal to you, what photos appeal to you, what titles appeal to you, and we're building up a profile of what you like. And as I said, people tell us how they feel — if you've ever written a comment on one of our recipes, know that a human being has read it.

But we can do better, and what we think is that we can build a virtuous cycle here. If I use an example of something my team does a lot of, which is recipe recommendations — helping make sure we put the right recipes in the box, ones that you'll like — then better recommendations mean better forecasting, better forecasting means better purchasing, and we go out and source the right ingredients, the right proteins, the right dry goods that meet our needs. That reduces food waste, and it cuts out another middleman in the chain, the supermarket. If we get better at that, we end up saving thousands and thousands of tons of wasted food. So every small change is really important for us, because at scale it makes a huge difference.

Tell me more about your philosophy around open source software and how you use it within the organization.

Right. We're on the record as using a laundry list of GCP services: our enterprise data warehouse is BigQuery, we use Dataflow for our streaming processing, we use Dataproc for machine learning, and we use GCS for our data lake — for our prepared features, our trained models, all of this stuff. But a lot of the orchestration uses Airflow. We have been using Airflow from more or less day one that data engineering existed at Blue Apron, and it's incredibly important for us: it helps us ingest information from outside sources, it runs our batch ETL processes, it runs our batch machine learning models, all of that, and it's actually a key piece of how we serve our batch machine learning predictions as well. We use Airflow to compute about 122 million recommendations every day, and we load those into a little LevelDB artifact that we serve in memory from our services. That's great, because it means we can serve 122 million recommendations daily with about 15 microsecond latency.

That's pretty good. Wow — we can work with that. That's awesome.
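Since Composer and Airflow come up repeatedly here, this is a minimal sketch of the kind of DAG that orchestrates a daily batch ETL job. The task names, schedule, and commands are illustrative, not Blue Apron's actual pipeline, and the imports follow the Airflow 1.x style that matched Composer at the time.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(dag_id="daily_batch_etl",
         start_date=datetime(2018, 7, 1),
         schedule_interval="@daily",
         default_args=default_args,
         catchup=False) as dag:
    # Placeholder commands; real tasks would call extract/transform/load jobs.
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    load = BashOperator(task_id="load", bash_command="echo load")

    extract >> transform >> load   # run the three steps in order
```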
But open source is a huge part of that. We got burned early on — I think the story is familiar to everyone who has worked at a startup — we got burned early on by vendor lock-in on certain clouds, and we've been committed to open source from the beginning. That really made us realize we have to take open source seriously as an engineering organization and not end up in that position again. We're not a big engineering organization; data engineering for us is only 15 people, so we have to work on the things where we have a competitive advantage, and that is not running Airflow. Our data operations team did the most recent Airflow 1.9 update on our cluster themselves, and they did not sleep well that week. We don't want to get locked in; we want to write it once and run it everywhere in our hybrid cloud. So when Google says it is committed to an open cloud, that is very important to us, because it means you can compete for our business on every other dimension — not because we're locked into your product — and that's a good signal to us. Beam, Spark, TensorFlow: these are all things we have big investments in, and if it's open source we can move it anywhere we like.

We hope you never move them, but I get it — you have the option of moving them wherever you want. Thanks a lot, Michael. Any other key metrics or business results you'd like to share before we wrap up?

You can't ask me that the week of our earnings release! But no — basically we have seen a huge uptick in engagement with our product, and when we give customers more ways to give us feedback, we get even more feedback, so it really is a virtuous cycle. We're also using those insights to help our culinary team and our amazing chefs plan recipes better, and that's an exciting new frontier for us: using AI to provide feedback on what we know our customers will like, so that on the menu there is something for everyone and things people are going to love.

Awesome, thank you. Thanks, Michael. As you saw, when I talk to customers across the board, this whole idea of an open cloud really resonates, especially given the expertise customers have in Spark and Hadoop, as well as what we have done with Beam and in other areas.

The fourth topic I want to talk about quickly is visualizing and activating your data. Self-service BI is a priority in many organizations: how do you let your users explore the data themselves, and how do you enable collaborative, data-driven decision-making? Those topics come up in every conversation I have with customers. If you haven't used Data Studio, it's a BI tool that is highly collaborative — it's basically built for collaboration. And with the new BigQuery UI capability I announced, you can literally go from your query, click once, and do visualization and data exploration directly: you can explore the dataset, blend it with other sources like AdWords, pull the data in, and create a report out of it within seconds — you don't need a specialist to do that. Additionally, we now have pre-built templates available: there's a template for Cloud Billing, so if you want to visualize your Google Cloud billing you have a template for that, and if you want to analyze your AdWords data, there's a template for that too. Really good capabilities.
We also have a data visualization developer preview available, where you can create custom D3-based visualizations. The other area where we have invested, with our partner Trifacta, is the Cloud Dataprep solution. A lot of our customers want to do data wrangling — analysts want to do it visually — and Dataprep lets you visualize your data, which may be in BigQuery, figure out what anomalies are there, clean the data up, and store it back. We are getting ready for GA with that tool in the next few months; we focused a lot on the feedback from the beta, and some key capabilities will be available. One big area of enhancement is team-based data wrangling: how you share your recipes, copy and share your flows, and reuse your custom sample recipes. There is also a focus on productivity — quick shortcuts to popular items — and a comprehensive redesign that looks much better and is more efficient.

Before I jump into the next topic: somebody told me a while back that being good is not enough; you should also do good. So we have been working with some nonprofits to see how we can help democratize our analytics and machine learning capabilities in the nonprofit community. Let's run the video of how the Foundation for Precision Medicine is using them, and then I will talk more about that. Can you take it over?

My name is Robert Abs, and five and a half years ago my mother was diagnosed with Alzheimer's. I knew none of the medications were working; the entire time was a downward spiral. I also lost my grandfather to the disease about 25 years ago. My family at the time felt like it was already too late to change the trajectory of the disease, and it breaks my heart when I hear the same stories today. The mission of the Foundation for Precision Medicine is to bring AI and health care together to detect Alzheimer's early; if you can detect Alzheimer's very early on, that is when the disease is most susceptible to treatment. The data we have access to is anonymized electronic health care records. We needed a HIPAA-compliant environment, which is why we use Google Cloud. We're dealing with hundreds of variables on millions of patients, which generate billions of lines of data. Google Cloud enables us to scale our operations, and with BigQuery ML we are able to develop machine learning models faster and utilize our entire dataset. Being a nonprofit, we rely on our volunteers across the US, and this is what really enabled that: we wanted them to be able to apply machine learning on the data and look at trends themselves, to empower them to come up with more innovative approaches to change the progression of the disease. This work is so important to me because it helps us address a devastating illness that has no cure. I heard somewhere that someone said, don't forget that the dots on the plot are people, and we really take that seriously.

That's a great example of how the Foundation for Precision Medicine is using our data analytics capabilities — BigQuery ML along with other BigQuery features — to make an advancement in their area.
So today we are announcing Data Solutions for Change. It's a program we are launching for nonprofits across the world, where on a needs basis they can get access to Google Cloud credits, along with self-training resources and hands-on enablement. As I said, our goal is to democratize analytics and machine learning for nonprofits around the world and put these capabilities in the hands of organizations that want to do good.

One more thing we are launching is Visualize 2030. This is a collaboration with the World Bank, the United Nations, the UN Foundation, and other affiliated organizations, where we want to drive awareness and action around the UN Sustainable Development Goals — there are 17 goals we want to meet within the next 12 years. It's a data storytelling competition for undergraduate and graduate students around the world: they can create visual stories, insights, and calls to action using Data Studio and the public datasets in BigQuery — BigQuery has 70-plus public datasets available that you can start analyzing today — and submit them by the end of September, and we will announce the winners at the UN World Data Forum in Dubai in October. Earlier we were talking about 80 million students using G Suite; we want to extend similar capabilities in data analytics to this audience, so they can start analyzing, visualizing, and coming up with insights to help solve these problems.

Along with that, one thing I do want to talk about is our partner ecosystem, which is super important to us. We have partners across the board: for ingestion, if you want to get data into BigQuery or the other analytics products we have, there are amazing partners that provide those solutions; we have data integration partners; and we have partners for visualization — you saw the earlier example of Looker; Tableau is a big partner, as is Qlik, and there are a bunch of partners that provide BI tools — as well as a lot of SI partners coming on board to help you with whatever engagements you may have.

Google was named a leader in insight platforms as a service by Forrester, and I'm hoping we'll be recognized more and more in upcoming reports. There is a lot more information available on our big data solutions, so please take a look at that. There are amazing sessions — I highlighted the GIS one, the deep dive on clustering, and the enterprise data warehouse session by Jordan Tigani — and a lot of other good sessions on big data topics at the conference; please attend them and give us more feedback. Thank you, everybody. [Applause]
