Making sense of IoT data with the Cloud – Google I/O 2016



So this is "Making Sense of IoT Data with the Cloud", so if you're in the wrong room, you can check out now. My name is Ian Lewis. I'm a developer advocate on the Google Cloud Platform team, and I'm based in Tokyo, Japan, so if you're ever out there, give me a holler. I'm into Python, Go, IoT, cloud, and those types of things, so if you're interested you can hit me up on Twitter at @IanMLewis. I'm always on there, and I'd love to hear from you if you think of questions later, after the talk.

Today we're going to be talking about IoT. IoT is the Internet of Things, obviously, and it lets you get a lot more data about the world around you and use that data to make better decisions: improve things in your life, be more efficient, and so on. That covers anything from smartphones to dishwashers to connected devices that measure sensor data like temperature or humidity. All of those are creating data, but dealing with that data can actually be pretty hard if you want to be able to use it later.

So let's say you have a great idea for a connected device. Maybe it's a robot arm, maybe it's a sensor, maybe it's a dishwasher. Once you have this new idea and it's collecting all this data, there are a lot of things you have to think about. You have to think about connecting the device to the network to be able to send the data: establishing connections, doing authentication, and managing the connection so that it stays live and you can actually send data. Then you have to actually read the data from the sensors: you have to know enough about the hardware to be able to
read the data from the sensor, take it into your program, and send it. Then you have to actually send the data over the network. This can be done in a number of ways, but what if your connection is wireless and goes in and out? A lot of IoT devices are on SIM networks, on 3G, or that type of thing, so you have to be able to deal with the connection issues that might arise.

You then have to take that data and process it on the server, so you have to figure out how to do some processing with it and then store it. You can imagine that if there are lots and lots of devices, there is a lot of data coming in, and you have to be able to handle all of it. Then you have to actually store the data. If you have a large number of devices that are all generating data, you have to be able to store a large amount of data and handle that kind of throughput.

And there's one last thing that I think people overlook. You can get the data and store it, but then you have to be able to get the data out easily, analyze it, and figure out how to make smarter decisions and be more efficient. So this analysis part, getting the data out once you've stored it, is also really important.

In this talk I want to go through how you would actually build an IoT application that creates a lot of data and needs to process and store that data in the cloud. I'm going to go through, at a high level, what this type of project would look like and the typical things you would see in an IoT application, and then eventually we'll get down into a particular use
case, a case study about how to actually architect an application.

So, some of the typical components in an IoT application. Here's a high-level architecture. You might have a bunch of devices. Say this is an industrial application; I'm thinking mostly about industrial applications. A lot of people think of IoT as this connected-home type of thing, and that's also relevant, but I think of the connected home as a special type of industrial use case, so I'm going to speak in a very generic way about how to view IoT applications.

Here's a typical architecture. You might have a bunch of devices in your home, in a factory, in a power generation plant, or wherever. They're all collecting data and sending it to a cloud service that can process and store it. You might have the devices connected by just a cable, or maybe over a short-range wireless link, to something like a gateway server, and then from the gateway you can send the data over the network to the Internet. So you might have a number of different setups. Or the device itself could be connected directly to the Internet, in which case it serves as its own gateway. You can have a number of different scenarios, but as a general rule you quite often have this gateway type of thing in your architecture.

Then, when you zoom in on the cloud side, you might have a number of moving parts there. You might have a front end that takes in the data from your device, and then some back ends that process and store the data. There might be any number of permutations of this type of back end, but almost
all architectures will tend to fall into this type of diagram. Now that we've looked at that, I'll try to go in and fill in the blanks about what you might actually use for the front end and back end of your application in a case study later.

Next I want to talk about the actual types of data that you would send between devices and your service. We've got a blog post about this. One of the types of data that you would have is metadata. These are device IDs, classes, models, revisions, those types of things about your device. This is basically data that exists but doesn't really change very much; it's very static for a particular device. It might give you information about the capabilities of the device. It could also be fairly freeform: you don't necessarily know exactly what might be in here, and it might differ depending on the type of device.

Also, state data. State data is very similar to metadata, except that it changes fairly often, because the device itself might change state or change location or whatever. Here we might have the status of the device, whether it's ready, in an error state, or failing. You might have the status of each of the sensors; in this case we have a temperature sensor that shows the current temperature and says that it's reading. We might also have location, if, say, your device has a GPS. This might be updated every minute or every hour or whatever, but it's continually updated as the state changes for your device.

Next is telemetry data. Telemetry is not really a commonly used word, but it's generally time series data. This is the meat and
potatoes type of application data that you're generating. Over time the device generates read-only data that can be viewed as a time series: maybe the humidity in the room, or the amount of water flowing through a pipe, or something like that. It's basically read-only, so it doesn't update, but depending on the number of devices you have, you could very quickly get into very large amounts of data at a very high throughput. You need to be able to handle that, and generally you quickly have to apply big-data types of strategies.

The fourth type of data that I wanted to talk about is commands. Commands are things that are sent to the device, telling a specific device to spin 360 degrees or turn 90 degrees, or run a self-cleaning cycle, or do something. They have specific properties in that they are not very easily represented as state data. When you send a command to a device, you want that command to execute one time, and you want to be able to know whether the device succeeded in running the command or not. Consider sending a command and having it run twice because of intermittent communication problems: if you told the device to turn 90 degrees and it ran the command twice, it would turn 180 degrees, and you don't really want that. So you want these run-only-once semantics. Commands are also temporal, so you want some sort of TTL in order to be able to time out these commands.

So those are some of the types of data and a general view of the architecture for an IoT application. Next I want to talk about applying those ideas, applying some of the cloud to
solve some of those problems. I'm going to do a case study: take a set of requirements and then try to create an architecture that satisfies them.

The requirements for this particular application are pretty simple. It reads temperature data from a large number of devices for analysis later, and then we're going to manage the info and status of each device. If we read these requirements, we can see the types of data that I was talking about pop out. The temperature data is our telemetry data; this is the data that runs as a time series. The info about each device is metadata, and the status of each device can be seen as state, right?

So let's start trying to architect an application like this. Our architecture is going to be pretty simple. I'm not going to use a gateway; our device will connect directly to the Internet, so in that sense it's acting as its own gateway. When you look at the overall architecture it looks very simple, but if you drill in it can be a little more complicated, and there are a number of things you have to think about.

On the device side, you have to think about which device you're going to use: how you're going to build it, what sensors you're going to have on it, and what software is going to run on it. You have to be able to create the software that runs on the device, so what kind of software is that going to be?

Then on the cloud side, you need to handle things like this: say you're creating telemetry data and continuously sending it to the cloud. What if the wireless is kind of flaky? How do you handle that sort of thing? Also, how do you handle the stream
processing? If the device sends data to the cloud and the cloud is not ready to process it, how do you deal with that? We're going to deal with that by using a message queue. The message queue takes a message from our device, in this case a message carrying temperature data, and stores it in the queue until the stream processing system is ready. Then the stream processing system takes that data from the message queue and writes it to our time series database. And we're going to store our state data and metadata in a state database. I'm going to go through all of these and fill in what each one is as we go along.

The first thing I want to talk about is the actual device. For my little application I'm going to be using a BeagleBone, and that's this device over here, if I can switch over. So I have a little device over here. This is a BeagleBone with some temperature sensors on it; I'll show you that a little bit later.

Going back: the BeagleBone is an open source mini computer that runs Linux. It's very similar to a Raspberry Pi, if you're familiar with that. This particular one is developed by Seeed Studio. It's open source hardware, so you could print these circuits out yourself if you wanted to. It also comes with these really cool connectors, I2C and RS-232 connectors, which make it easy to plug in some devices without having to solder them. Seeed Studio also makes some cool Grove sensors, which you can plug into the BeagleBone and get connected and working without soldering, which is pretty cool for prototyping. Here I'm going to use this analog-to-digital converter and a temperature sensor. The temperature sensor creates
some analog data, and that gets converted to digital to go to the Grove, sorry, to the BeagleBone.

The next thing I want to talk about is the software. The BeagleBone runs Linux, so what I'm going to do is create an application on Node.js. There's this really cool library called johnny-five, which is a JavaScript robotics and IoT platform. It's got a really cool logo, and yeah, I picked it for the logo. But it's also got a really nice interface, it's written in JavaScript, most people know JavaScript, and JavaScript is a heck of a lot better than writing C++, so that's what I chose.

If you write an application with johnny-five, it looks like this. It's just regular JavaScript. You require the johnny-five module, and I'm also using the beaglebone-io module, which is used to connect to the interfaces on the BeagleBone. Then you pass the BeagleBone object to johnny-five to create a board object, and you set an on-ready function so that when the board is ready you can run some code. It sets up the board and all of the connectors and things like that on it, and when it's ready it runs the function. That's really good, because you don't have to deal with timing and things like that when you're running your application.

The next thing I'm going to do is store our state data in Firebase. You might have heard about Firebase from the keynote. Firebase is a platform for creating mobile applications or real-time applications, especially mobile, but what's really cool about it is that it includes this NoSQL back end which is JSON based, so it looks a lot like a JSON object and works like a JSON object.
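To make the idea of JSON-shaped state data concrete, here is a minimal sketch of what a device's state document might look like and how a whole object can be set at a key path. It is written in Python with only the standard library, not the JavaScript shown in the talk, and it does not use the Firebase API. The field names, the `devices/device-001/state` path, and the helpers `make_state` and `set_at_path` are all assumptions made up for illustration.

```python
import json
import time


def make_state(status, temperature_c, sensor_status, timestamp=None):
    """Build the state document for one device as a nested dict (a JSON tree)."""
    return {
        "status": status,                      # e.g. "ready", "error", "failing"
        "sensors": {
            "temperature": {
                "status": sensor_status,       # e.g. "pending", "reading"
                "value_c": temperature_c,
            }
        },
        "updated_at": int(timestamp if timestamp is not None else time.time()),
    }


def set_at_path(tree, path, value):
    """Set `value` at a slash-separated key path, creating branches as needed."""
    keys = [k for k in path.split("/") if k]
    node = tree
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return tree


# An in-memory stand-in for the database (hypothetical layout): one subtree per device.
db = {}
set_at_path(db, "devices/device-001/state",
            make_state("ready", None, "pending", timestamp=1463500000))
# A later reading overwrites the whole state object under the same key.
set_at_path(db, "devices/device-001/state",
            make_state("ready", 20.0, "reading", timestamp=1463500005))

print(json.dumps(db, indent=2, sort_keys=True))
```

Overwriting the whole object on each reading keeps the device code simple, at the cost of resending fields that did not change.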
So it has a tree structure inside of it, and I'm going to be using this to store the state data. As the application is running, it'll update the state data. You connect to Firebase like this: you require the firebase module, and then you authenticate to it using the Firebase token generator. Once you've authenticated, you have a reference to the database, and you can use that to set objects in the database. Right now I'm just setting a whole JSON object to a particular key in the database, and this contains all of my state data for the device. As I'm looping through the application and it's reading data from the device, it updates the state as well.

For the back-end part, for the telemetry data, I'm going to be using Cloud Pub/Sub. This is the message queue that we're going to use to take the telemetry data. We're then going to process that data using Dataflow, and we're going to store the time series data in Cloud Bigtable. I'll talk about each one of these in turn, but what's really great is that each one of them scales independently to handle a large amount of data coming in, which will allow me to handle many, many devices in the future.

Cloud Pub/Sub is our fully managed real-time messaging service, and it basically just acts as a message queue. You put data into the queue, and on the other side Dataflow takes messages from the queue and processes them as it's ready. This does a lot to help smooth out sending data from the device to the processing back end.

Dataflow is a unified programming model and managed service for processing streaming data. It's very similar to something like Spark or Hadoop, or it's
kind of an in-between of them, but it's unified so that it works with streaming or batch models. In this case we're going to use it for streaming, but you can also use it for batch. Basically, you create a pipeline with a bunch of different steps, and in each of those steps you define how to handle one piece of data, and then it handles how to scale that up and parallelize it for you. We'll look at that a little bit later.

And lastly, Cloud Bigtable is a NoSQL database that Google uses internally for a lot of our core services, like Search, Analytics, Maps, and Gmail. Essentially it's a key-value type of database, but the keys are sorted lexicographically, which is great for things like time series data. It also maps from one key to a row of data, so you can have multiple values, each in its own column. So you have a key, and it can map to any number of pieces of sensor data. In our case we're just going to be using temperature data, but you can imagine that you could have many different types of sensor data.

On the application side, in order to send to Pub/Sub, we create our thermometer object. This is a johnny-five object which reads data from the Grove sensor every five seconds. We set a callback on this thermometer, so when there's data available it runs this function, and we just publish that data to Cloud Pub/Sub. Here I'm sending the data with our device ID, a channel, a timestamp, and the actual temperature data, so this goes into Cloud Pub/Sub as JSON data.

On the Dataflow side, we have a couple of things to set up so that Dataflow can read from Pub/Sub and write to Cloud Bigtable. These are basically objects that I'm going to create from some command line
arguments to tell Dataflow which Pub/Sub topic, which Bigtable table, and which Bigtable cluster to use. Then we create the Dataflow pipeline. This is Java code, so we create a pipeline object from our options, and then we create a three-part pipeline. The first part is a built-in step for Dataflow that reads from Pub/Sub, and we read that using a JSON coder, which allows us to read JSON data. Then we apply this mutation transform, which is a transform that I wrote in Java. And at the end we write to a particular table in Bigtable, based on the options that we had.

Here's how I write my actual mutation transform. I can write code for Cloud Dataflow that defines how to process each record that comes in. We're reading from Pub/Sub, so each message that comes into Pub/Sub is passed to my transform, and this processElement method is called to process it. Going through that: we read from the single JSON object that was passed in and get all of its fields, device ID, channel, timestamp, and data. Then we create a row key for Cloud Bigtable using our device ID and the timestamp, and we create a put request for Cloud Bigtable. Then I loop through the keys that were sent in the data, which in this case is just the temperature data, and add each one as a column to our row. Finally, the transform emits the put request as its output. So it basically just changes our JSON object into a put request for Cloud Bigtable.

Now that I've talked a little bit about that, I'm going to jump over and show you a working example.
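The per-record logic of the transform described above can be sketched roughly as follows. This is a plain Python stand-in rather than the Java Dataflow code from the talk, and it does not call any Dataflow or Bigtable API. The JSON field names, the `device_id#timestamp` row key format with a zero-padded millisecond timestamp, the `sensors` column family, and the function names `row_key` and `message_to_put` are all assumptions chosen for illustration.

```python
import json


def row_key(device_id, timestamp_ms):
    """Device ID plus zero-padded timestamp, so keys sort by device, then time."""
    return "{}#{:013d}".format(device_id, timestamp_ms)


def message_to_put(message_json, family="sensors"):
    """Turn one JSON telemetry message into a put-request-like dict."""
    msg = json.loads(message_json)
    put = {
        "row_key": row_key(msg["device_id"], msg["timestamp"]),
        "columns": {},
    }
    # One column per reading in "data"; here that is only the temperature.
    for name, value in msg["data"].items():
        put["columns"]["{}:{}".format(family, name)] = str(value)
    return put


# A message shaped like the one the device publishes (field names assumed).
sample = json.dumps({
    "device_id": "device-001",
    "channel": "telemetry",
    "timestamp": 1463500005000,
    "data": {"temperature": 20.5},
})
put = message_to_put(sample)
print(put)
```

Zero-padding the timestamp matters in this sketch because the keys are compared as strings: without it, a shorter number could sort after a longer one.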
First I want to start my application. Here I'm on my BeagleBone, and I'm going to start my Node.js app. This takes a second to start up, but once it does we should be able to go into Firebase. This is actually the old console here. Once the app starts up, we should see the state data start getting updated in Firebase, and you can see from the management console in Firebase which fields are getting updated. Here we can see that the status of this sensor is pending. Once it gets started, we can see that it's now reading, and the temperature currently is a nice 20 degrees Celsius. I have this set up to read every five seconds, so this timestamp gets updated as well as the temperature.

Now that I've got that, the app is reading data and sending it to Pub/Sub, so Pub/Sub is queuing up that data, and I can start my Dataflow pipeline. To start Dataflow, you just run it like a regular Java program. Here I'm going to use Maven to start it up. There's also an Eclipse plugin that you can use to run Dataflow jobs directly from Eclipse, so if you're an Eclipse person, that's definitely something you should look into. But I'm kind of a command line junkie, so I like to use the command line.

If you run this, it starts a Dataflow job, and Dataflow starts up some VMs on GCE, on Compute Engine, to run your processing. That scales up and down based on the number of messages coming into Pub/Sub, but it takes a few minutes to start those VMs. So I have one already running here; that's the cooking show way of doing things.

Let's go and look at Dataflow. If you go into the cloud console and go to Big Data and then to Dataflow, once you've
created a job you can see it in the list of jobs in Dataflow. If you click on that job, you can see a visualization of the steps within it. Here, these are the three steps that I created in my Java program: this is the Pub/Sub read, this is the Java transform that I wrote, and this is the part that writes to Bigtable. You could have any number of permutations of how you create a Dataflow pipeline, but I just created a simple one here. It doesn't have to be linear like this, either. After my transform I could have one step that writes to Bigtable and also another that writes to, say, BigQuery. BigQuery is another analytical database that you could use to store data and then run SQL queries over it. So you could definitely do things like that; that's one of the really great and flexible things about Dataflow.

Going back, I have a little visualization of the data, if I go over here. This is a little app that queries Bigtable in real time to check the current temperature. This is pretty boring; it's basically staying the same. But if we go over here and I squeeze the sensor with my fingers, hopefully it should make the temperature increase. So I'm pretty hot today, I guess: 25 degrees. So you can do these sorts of real-time applications as well. This shows you that the data is going all the way through the whole pipeline and ending up in Bigtable, because we're doing a Bigtable scan here to get the data.

I'm going to jump back to the slides. Just to wrap up, I wanted to go through and make sure that we checked all the boxes for the requirements for our application. We were able to read the temperature data from the device. We
were able to handle a large number of devices, because each of the back-end pieces, Pub/Sub, Dataflow, and Bigtable, is able to scale. We're also able to pull out and analyze that data. We can do that either by scanning Bigtable or, like I said, by other methods such as writing directly to BigQuery and then doing SQL queries on it and doing analysis that way. And we were also able to manage the info and status of each device using Firebase. So this is one way you could put together different parts of the cloud to create a full solution for managing IoT devices and their data.
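As an illustration of why lexicographically sorted keys suit the scan-based analysis mentioned in the wrap-up, here is a small, self-contained Python sketch. It is a toy in-memory table, not the Bigtable client API, and the `device_id#timestamp` key format, the `SortedTable` class, and the `key_for` helper are assumptions introduced only for this example.

```python
import bisect


class SortedTable:
    """A toy key-value table that keeps its row keys in lexicographic order."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def scan(self, start_key, end_key):
        """Yield (key, row) pairs with start_key <= key < end_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


def key_for(device_id, timestamp_ms):
    """Hypothetical row key: device ID, then a zero-padded timestamp."""
    return "{}#{:013d}".format(device_id, timestamp_ms)


table = SortedTable()
table.put(key_for("device-002", 1000), {"temperature": 18.0})
table.put(key_for("device-001", 3000), {"temperature": 25.0})
table.put(key_for("device-001", 1000), {"temperature": 20.0})
table.put(key_for("device-001", 2000), {"temperature": 20.5})

# All readings for device-001 with 1000 <= timestamp < 3000.
readings = list(table.scan(key_for("device-001", 1000), key_for("device-001", 3000)))
for key, row in readings:
    print(key, row["temperature"])
```

Because one device's readings sit next to each other in key order, a time-range query becomes a single contiguous scan instead of a search over the whole table.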

