David Simmons [InfluxData] | InfluxDB for IoT



Okay, so it's after lunch, you all have full bellies, and you're going to be asleep in about ten minutes. I'm fully aware of that and I won't blame you. But we also don't have a whole lot of time, so I want to go ahead and get through this. It typically takes me a lot longer to get through this, so this may go a little more quickly than usual.

I'm going to talk a little bit about IoT architecture as it relates to InfluxDB, and how you might organize your architecture and your deployment of InfluxDB based on your IoT needs. There are a couple of things to keep in mind. Centralized data collection versus distributed data collection: are you going to collect most of your data out at the edge, or are you going to send everything back to the cloud? A lot of that is going to depend on how the data is going to be used and who needs access to the data, as well as some of the infrastructure that you have in place to do that. All of these things have to be looked into before you decide, "I'm going to do all my data collection in the cloud." There's a whole bunch of things you need to think about before you make that decision.

So let's say you want to collect it all in the cloud, so you're going to send everything from every IoT device to your cloud instance. Okay, so you need a highly available network, you need a low-latency network, and you probably need a cheap network, because you're going to be sending a lot of data over your network to some remote instance. So you need it to be fairly cheap, and definitely high availability, otherwise you will lose lots of data. Low latency also. Depending on how critical your data is, and especially how critical the alerts on that data are, you may want to rethink that, and I'll give some examples of that as well.

The benefit of that is I can do my analysis and my visualizations from anywhere. I can build dashboards based on one central source of truth, and I can distribute those dashboards and
everybody can see those dashboards, and they can do their analysis on a central source of truth, from all the data from all my sensors, from anywhere. So there's an advantage to that. But depending on things like network availability, latency, cost, and things like that, I might need to collect it much closer to the source, out at the edge.

The example I like to use is a customer I talked to that puts oil rigs out in the Gulf of Mexico. Their backhaul network is a high-latency, extremely expensive satellite uplink. They don't really want to send all of their pump data over this satellite network to a back-end system and then back down to the oil rig to do analysis for alerts. That can be a deadly situation: pump pressure skyrockets, and they're waiting for the satellite to make another trip around the globe before the data can get back to the data center and the alert can be generated, and then one more trip around before it gets back down. By the time the alert happens, they've had a bigger alert, which was that the thing caught on fire and exploded. That's not the kind of alert you want. So being able to do the data collection out at the edge is really important. Being able to do not just the data collection but the analysis, the visualization, and the alerting out at the edge is really important in some scenarios.

Let's say you're monitoring multiple oil rigs. You may need to collect data at lots of different collection points out at the edge, then feed those back to a more central location that's not necessarily a back-end cloud, and then from there feed those back to a cloud. So there can be this layered architecture, and this is what I call a data-layer IoT: I can collect my data, store my data, analyze my data, visualize my data, and alert on my data anywhere along this whole collection and storage architecture, whether
it's at the extreme edge, on a piece of hardware embedded in a wall somewhere, where I can still collect and store my data and still generate alerts, or it's all the way back in the cloud, where I can backhaul all my data and generate some alerts back there. This gets back to who needs access to the data, who needs access to the visualizations, who needs access to which parts of the data, and how they need access to that data.

This distributed version gives you, in a lot of ways, a more fault-tolerant data collection architecture, because it's unlikely that all of my edge devices are going to go down at once. My network may go down, but at least I can store my data locally on those edge devices, I can generate alerts, I can keep the oil rig from catching fire, and I can keep people from dying. When the network comes back online, I can backhaul some of that data to the data center so that the business people can see that, yay, we didn't explode today. That's what they need to see: yay, we didn't explode. If I'm on the oil rig, I need to see that we're about to, and I need to see very highly granular data at really regular intervals. The people back in the data center just need to know that the rig functioned all day, we were able to produce oil all day, nobody died, and it's all good. And I want to be able to see that from all of my oil rigs.

I use a shop floor example as well. I have a person who is responsible for ten machines on a shop floor. The machines all have a bunch of sensors on them, and that person needs a dashboard that shows that all ten of those machines are functioning correctly, are not about to have a bearing failure, and everything's going okay. They need that local machine access: local data access, local dashboards, local alerting. Then you can feed that data, you can downsample that
data and feed it back into a factory-wide system, so the person who's responsible for the factory can see that all of the various areas of the shop floor were working correctly, no alerts were generated, we're all good, and we produced everything we were supposed to produce today. Then that data can be downsampled again and fed back to a larger system, where the person responsible for all the factories can see that all the factories were working correctly, and I can do different dashboarding there. So who needs access to what data, and the immediacy of that data, can really inform how you are going to lay out your data architecture.

I think I already went over this, I just didn't get to the slide: deciding who needs the data, where they need it, and what the costs are of sending that millisecond-level temperature data all the way back to the cloud versus storing it locally.

One of the things that you get with InfluxDB that you don't get with a lot of these other systems is that you can deploy the entire stack on very limited hardware. You can deploy the entire stack on a Raspberry Pi very easily. Anybody remember the Intel Edison? No? You do? Yeah, long live the Edison. Thank God it's gone. But it was a very small piece of hardware, and I ran the entire Influx stack on that very small piece of embedded hardware. So I can deploy this stuff to a whole bunch of different locations in a whole bunch of different scenarios, and the nice thing is that it's the same software, the same stack, and the same deployment model no matter where I'm doing it, whether it's on embedded hardware or on a multi-node server instance.

I'm going to blind you with this slide, because it really only works with a white background, and I'm sorry about that. I can have a bunch of these sensors out here, and those sensors can feed back to an access point, let's say, and that access point can feed it back,
over the Internet, to the cloud. And here's the cool part: I can use Telegraf to collect my data out on that edge node, and I can run it in the cloud to collect all the data from those edge nodes. I can run InfluxDB on that edge node for short-term storage of my data, and I can run it in the cloud for long-term storage, so I can have both. I can run Kapacitor, as long as I'm not running the 2.0 stuff, out on the edge for my local alerts, to keep my oil rig from catching fire, and I can do my system-wide alerts running Kapacitor on the back end. Same thing with Chronograf: I can have local dashboards out at the edge and system-wide dashboards back in the data center. Again, it's the same software, it's the same stack, and it runs anywhere along that entire architecture. There's a lot of actual value in having that same stack deployed in multiple places. I don't have to have different people, trained differently, to deploy this stuff on the thing that's in the wall versus the people who are deploying it in the data center.

So, things that are really important. I was actually talking to somebody yesterday about this. When people talk about the IoT, they generally talk about how many billions of devices are going to be deployed over the next ten years. I don't know what number Gartner and Forrester are making up this week; it's probably somewhere between 20 and 50 billion devices in the next five to ten years. The numbers vary depending on how they feel that day. If you're a device manufacturer, that's a really cool number to know. But if you actually start to think about it, the number of devices is really kind of trivial in terms of what's going to happen with IoT, because each of those devices is going to start generating a data stream. At least one data stream, and typically anywhere between six and maybe fifteen
data streams per device, depending on how many sensors are on that device. So let's average it at ten data streams per device. If you go with twenty billion devices, each of them putting out ten data streams, you're talking 200 billion data streams. Good luck with that. That's a much more important number than the number of devices. What are you going to do with that data? How are you going to ingest all that data? How are you going to store it, how are you going to analyze it, and how are you going to react to it? High-volume data acquisition, storage, analysis, and alerting is really key to making any of that work.

It's actually the whole reason that I work at InfluxData. I've been doing IoT for a really long time, and the typical answer to "what do you do with your data?" was some version of "mumble something mumble analytics." But when you actually drill down to what you are storing it in and how you are going to deal with it, it's, "Well, maybe I'm putting it in MongoDB." Good luck with that. How many people are doing MongoDB? Stop doing that. Really. In our benchmark tests, which were actually run by a third party, MongoDB could ingest about 25,000 data points per second, which sounds great. On the same exact hardware with the same exact data set, InfluxDB did about 1.4 million data points per second. When you're talking about this level of data coming at you at this velocity, you need something that's going to be able to ingest that data at that velocity for a sustained amount of time.

The thing about IoT data is that once I put that sensor out there and turn it on, and it turns on its ten data streams, it's never going to stop. It's never going to shut up, ever. Kind of like me up here. So being able to handle that not just immediately but long term, for the next 20 years, is really important to how you acquire, store, analyze, and visualize your data.

So here's a little dashboard that I use,
and first of all I have to apologize. I generally have a demo that I run, and I have a bunch of sensors up here. You can see there's a display up here that shows you the relative humidity and the temperature in this room, except that somebody dropped this repeatedly, so one of the 7-segment displays is a little wonky. It's actually 33 C in here. One of the ones I normally run has a CO2 monitor in it, and yesterday I was running the CO2 monitor in the big room out there, and we were at about 600 parts per million, which is actually quite good. Then it developed a short somewhere and quit putting out data. I'm 3,000 miles from home, and I don't have my soldering iron or any of my stuff, so I can't fix it right now. Sorry about that.

But this is a dashboard that I run, and I'll do a short demo and show you the dashboards. The top one says "raw versus compensated CO2." One of the things that I have been able to do with InfluxDB 2.0 using Flux is what's called cross-measurement math: I can now do math across measurements. I can take the temperature data and the pressure data that's coming from one sensor and stored in one place, and I can take the CO2 data that's coming from a different sensor and stored in a different measurement, and I can compare the two and do an ideal gas law calculation in real time. So I can compensate the raw CO2 levels for temperature and pressure, and basically calibrate the sensor in real time in the visualization, which is really cool. I couldn't do that without Flux. It's a hairball of Flux right now, but it runs really well, and I can do an ideal gas law calculation in real time for my CO2 compensation.

The other thing I get to do with all of this, and it comes sort of free, is that I get to monitor the platform. Because if I'm collecting all this data and I'm storing it and I'm analyzing it and I'm
doing alerts, it's really important to know that the platform I'm running it on is also healthy. One of the things that I'm not showing in this dashboard, because it hasn't shipped in Telegraf yet, because I haven't fixed the test suite so that it passes everything (it works, it just doesn't pass the test suite), is monitoring the Wi-Fi interfaces. On a remote embedded system that may have Wi-Fi-connected sensors, it's really important to monitor that part of the infrastructure. If I've set a deadman alert on a carbon monoxide detector that's way up in a vent somewhere, and I suddenly stop getting data from it, it makes a huge difference to me whether the sensor died or the Wi-Fi that was providing connectivity to the sensor died. If the sensor died, I've got to roll a truck with a technician who's small enough to go up in the duct, get it, fix it, and get back out. If it's the Wi-Fi, I just have to call somebody local and tell them to go kick the Wi-Fi router, and it comes back up and everything's happy. Knowing the difference between those two is really important.

So I'm going to talk a little bit about Flux. Have people gone to the Flux talks already? You know a little bit about Flux, because I'm sure people have been talking about Flux for the last couple of days, so I'm not going to bore you too much. It is supposed to be usable, readable, composable, and contributable, which is actually quite important, and part of the demo I'll show is a contribution that I've made to Flux separately.

One last thing about IoT data. I have said these three things about IoT data for the last 15 years. It's got to be timely: if I'm just collecting data and it's arriving 45 minutes late, then how useful is it? It's got to be accurate: I need to know that it's accurate. And the most important thing is that it's got to be actionable. If I'm not taking action
on the data that I'm collecting, then why am I collecting it? The example I use for this is a factory automation group that we talked to. They were collecting all of this data and they weren't doing anything with it. They were collecting vibrational analysis data on some of their machines, and one of their machines blew a bearing and went down, and it took the whole shop floor down for a week. Oh no, the machine died, now we have to order the part, the part has to get here, then we have to fix it, and only then can we bring the shop floor back up. They lost about a million dollars from being shut down for a week. It turns out they were collecting the vibrational analysis data, and it would have told them, had they been paying attention to it, that the bearing was about to go. But they weren't doing anything with it. It wasn't actionable. Had it been actionable, they would have seen that the bearing was about to go, they could have ordered the part and scheduled the maintenance, they could have shut the place down for maybe an hour while they replaced the part, and then brought it all back up, and they would have lost maybe $10,000 instead of a million. There's a huge difference between making your data actionable, accurate, and timely and just collecting it. If you're just collecting it, you're just keeping disk storage companies in business and you're not helping yourself. The whole point of IoT is to monitor the environment and your systems so that you can take action and get some business value out of it.

All right, so this was supposed to be a hands-on workshop, and you were supposed to have Influx 2.0 installed before you got here, at least according to the prerequisites. But going through a hands-on exercise in the equivalent of 35 minutes is really, really tough to do. The last time I did a hands-on workshop involving installing and running
and configuring Influx 2.0, it was a three-hour workshop, and compressing that into 40 minutes is pretty tough.

So there are a couple of ways that you can do this. Of course there's Docker: you can install the Docker container and just run that. You can do a download and install. On my Mac I don't run it in Docker. I just download the binary and run it, because whenever we do a spin of the 2.0 alpha, all I have to do is remove the old binary, install the new one, and run that instead. Same thing for Linux: you can just run it locally. If you're on Windows, I apologize; we don't really support Windows for running, and especially for doing deployments of, InfluxDB.

Finally, you can go and sign up for a free account on the Influx 2.0 cloud instance, and this is a pretty simple way to do it. It's fairly limited in the amount of data that you can write to it and the retention policy you get, but it's a great way to test it out, write data to something, build some visualizations, and see how you can use it. And it's actually free. It's not "give us a credit card and we'll let you use it for a while, at which point we'll start charging you." You don't have to have a credit card to sign up, you can use it in the limited form for as long as you want, and if you want to upgrade later, we'll do that for you.

It's all set up via the browser, so it's really simple to set up. It's a couple of clicks: click "Getting Started," add a user, give the user a password and a couple of other fields, and you're ready to go and configure a data collector so that you can start collecting data, because what good is it without collecting data? Telegraf is super simple to set up now. The database itself will generate a Telegraf configuration for you, so all you have to do is decide what you want to monitor and run that configuration. This is just going to set up
the monitoring of the platform. It creates a Telegraf configuration for you, and you can run that configuration and start pumping data into InfluxDB.

Some really important points that I like to point out: as of 2.0, security is on by default and it can't be turned off. I've tried. It can't be turned off. This is a good thing. All of the access is token-based, so any write to the database and any read from the database has to present a valid token or it will not succeed. You can generate different kinds of tokens. You can generate tokens for clients that are write-only, where all they can do is stream data into the database. You can generate tokens that are read/write. And then you can generate the uber-token that can do anything, like create measurements and create buckets and delete things. But every single request to the database has to include a token. That's different in 2.0 than it was in 1.0, and it's actually kind of important for IoT, because each access from your IoT device has to include a token or it can't write to the database.

Then we install Telegraf. All my slides will be available, by the way, so if you want to go through this later and work through building the instance, you'll have full access to the slides and how to do this.

I start Telegraf from the command line, and it starts writing data to InfluxDB immediately. Then I can go off and create dashboards. We ship with some templates for dashboards, like system metrics and other metrics, and you can just start one of those up and see how your instance is running. So we have these pre-built system dashboards, and we have some metrics of Influx itself. Again, part of what's important when you have an IoT deployment is not just the data that you're writing, but you
really need to monitor the whole platform. I need to make sure that the system it's running on is healthy, and I need to make sure that the database is healthy. I need to monitor all of that, especially if my data collection is critical: things like oil rigs that can explode, things that can have safety implications. I need to know not only that I'm collecting data and generating alerts based on it, but also that the system it's running on is healthy and not oversubscribed, and that the database behind it all is also healthy.

InfluxQL is dead. Long live Flux. You can now use Flux with the 1.x line as well. TICKscript is dead, and we've moved to tasks. I have not met anybody yet who will miss TICKscript. If you will miss TICKscript, feel free to raise your hand. I won't shame you, but I certainly won't miss it.

We've been through this, so some things to keep in mind as you're moving to Flux. Everything's now a bucket: all of your data goes into buckets. We use this pipe-forward operator to chain operations together in Flux. And everything is returned in tables. If you're using the Chronograf front end, one of the little buttons you'll see as you're building your query says "show raw data," and when you click it, what you see are tables, because now everything is returned in tables. Depending on what you're querying, you may be returned multiple tables for a single query.

So here's a quick example. We always have our time range. If you were in the talk before lunch, you will understand the importance of having a time range. Not having a time range can very easily send you into an out-of-memory loop, because you'll just bring the machine to its knees. Then I'm going to filter on measurement, so I'm going to get the cpu measurement and the usage_system field, and I only want cpu-total, and then I'm
going to call yield. Now, in Chronograf, in the UI, you don't actually have to call yield. The UI assumes that the last thing to do is call yield, and it will do that for you.

What used to be Kapacitor is now being handled by Flux tasks. A Flux task is basically just a Flux query that is run at intervals, and that interval is completely configurable. In this demo that I have running, I have a bunch of tasks for CPU, for gas, and for humidity and temperature. I'm sorry that this is so small. I've actually written an output from Flux to MQTT, so now I can query the database, get back values, and write those values out to MQTT, basically as alerts.

What I have up here is a CO2 sensor that's no longer measuring CO2, and a temperature, humidity, atmospheric pressure, and light sensor that's actually working. Those are sending data directly to an InfluxDB instance that is running in a data center. I have a server in a data center back in North America, and they're writing all that data directly to it. Then I've got these tasks running that are querying that data roughly every 4 seconds and writing out an alert to an MQTT broker, which this thing is listening to. So it's telling me that we are at 62.3 percent relative humidity. I don't know if that's good or bad for London, but it's better than it was yesterday when it was raining. And we're at 33 degrees in here. If you've been watching this, you'll probably have noticed that it changes every now and then.

You think it's 23? Okay, I'll have to admit to this now. The temperature sensor is in this little box, and this little box has a small processor in it that generates a fair amount of heat, so it's actually off by about 8 degrees. So you caught me on that one. But if I breathe into the box, there's about a five-to-seven-second delay, and we should see those numbers start to change in
some direction, at least. We'll see. So let me go to the demo. Oh, yeah, no, we were right, we're at 23. It was at 25, and it says 24 now, and there was a small spike in humidity right here. So you can see these dashboards; these are real-time dashboards.

One of the things I did with these, and I think it's the reason this thing had a short and will no longer read CO2, is that I redid them last week so that they don't have to be plugged in to work. Down here you'll see that these sensors are monitoring their own battery health and reporting it. The reason that one's not updating anymore is that it has lost its mind and is no longer reporting data. I can reset it, and you'll see that the CO2 is reporting no results, because it's shorted out and the sensor is no longer working. So it runs through a test pattern, finds Wi-Fi, and remembers what it's supposed to be looking for. 62.31, and what do you know, we're at 62.31.

So this is basically a complete, not very useful, IoT deployment. I'm collecting this data and I'm storing it. I'm not doing local collection and storage; I'm sending it all the way back to a data center halfway across the planet, and then I'm sending my alerts back here so that I can monitor all of this.

It's kind of important with things like CO2. I'm really upset that my CO2 monitor isn't working, because it's really handy to have. I travel with it, I take it to meetings, and I always run it during meetings, because there are certain things that happen to human beings based on CO2 concentration in the atmosphere. Research has shown that over about a thousand parts per million, decision-making is compromised. It turns out that if you're in a small conference room with a bunch of people, like having a
business meeting in a small conference room, the CO2 concentration in that room is typically between about 1,200 and 1,500 parts per million. So you're having a meeting, and by definition decision-making is compromised. I was giving this workshop in Spain last month, and the room was very poorly ventilated. We had about half as many people in a room about a quarter this size, and CO2 was at 3,200 parts per million. It was really tough to keep people awake, because people start to nod off at about 2,500 parts per million. People really start to be unable to concentrate. That's why I typically use that sensor, and that's why I'm so annoyed that it's not working.

Since I can't do my ideal gas law calculation, I was thinking that one of the things we could do is live-code the heat index, the temperature versus humidity calculation, in Flux. So I went during the break and looked up the calculation for the heat index, and that's the calculation from the US National Weather Service. And I said, we are not doing that in real time. Not because it's not possible; it's just a real pain, and doing that in Flux is not something I'd like to do live and inflict on you. So I hope you appreciate that I'm not inflicting that on you.

I am just about out of time, so here's my task example in something that you can read. I've got a range, I'm looking for the measurement that I want, and I'm sending it to MQTT. For this MQTT part of Flux, I need to give it the broker, the topic, a client ID, the format, and the values, and it will then send that information out to the MQTT broker. It's really handy. This will be coming in a future version of Flux; again, I just have to finish the test suite, so we'll get to that.

If you are writing for Arduino, if you're playing with Arduinos, there is a library for writing
directly to InfluxDB. That library has now been updated for InfluxDB 2.0. Basically, you now have to set the bucket, the version, the organization, the port, and the token. Once you've done those things, you can create a row to go into the database, add your values, and add your tags, and you can either write single rows or prepare them in batch mode and write a batch, and that will go directly to InfluxDB. It's super simple to do. As you can see, you can do it in very few lines of code, depending on how many values and how many tags you want to add. This is exactly how I'm getting the data from this sensor into the database. In fact, I created two instances of the Influx object, one for writing to my local laptop here and one for writing to the data center, so it writes to both at once. If the Wi-Fi connection to the Internet is down, I can at least run it locally: it will write to the local database. Those two, the local and the remote, get the same data at the same time, and if the remote one fails, I don't care, because it's going locally.

And I finished almost in time. Do I have questions? Yes?

That comes down to a personal choice or an architecture decision. Do I want to write directly to the database? Do I think I need an intermediary like MQTT or Telegraf? I've done both, depending on just how I felt that day. All of the above work. Any other questions? Yes?

So when I've done this multi-tiered deployment, what I typically see, and what I recommend, for syncing it to the cloud is this: I don't necessarily need or want to send that really millisecond-level data to the cloud, so downsample it. I can downsample that, especially in Flux, and I can write it out to InfluxDB, and I can actually write it out to a
remote instance of InfluxDB, so I can just ship it off that way. There are other things that I could do to compensate for network availability. We have an integration with Apache NiFi, so I can use that. I can use Kafka to pipeline things. So there are different ways to compensate for whether or not I have a reliable backhaul network, but typically I'm going to downsample my data before I send it back to an upstream instance. There was another question over here. Oh, so I got that one.

No, mostly because when I'm talking about storing it at the edge, I'm talking about doing it in a small embedded environment. Having an enterprise version there means I'm basically trying to build something like a Raspberry Pi cluster to run the enterprise version at the edge, which you can do. But usually a single instance of the open-source version at the extreme edge is fine, and then, no, you're not going to have a cluster to sync your data across, so you're going to have to find another way to downsample and upstream your data.

Yes, right now you have to implement your own mechanism for that, from an open-source version to an enterprise version. You had a question?

Basically, a task is now a continuous query, so yeah, you'd use a task. I don't know if we have one of the engineers in the room to answer this question, but I believe that continuous queries are basically turned into tasks now that you just run at a very high frequency.

Great. Well, thank you very much. I really appreciate your coming today. Thanks.

[Applause]
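
The downsampling step recommended in the answers above (keep the fine-grained data at the edge, send only summaries upstream) can be sketched in a few lines of Python. This is a minimal illustration and not how Flux or InfluxDB implement it: the function name `downsample_mean`, the `(timestamp_seconds, value)` tuple layout, and the 60-second window are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def downsample_mean(points, window_s=60):
    """Average (timestamp_seconds, value) points into fixed-width windows.

    Returns a sorted list of (window_start_seconds, mean_value) tuples.
    Hypothetical helper for illustration; not part of any InfluxDB client.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of the window it falls in.
        window_start = int(ts // window_s) * window_s
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Four edge readings collapse into two one-minute averages.
raw = [(0, 20.0), (30, 22.0), (60, 24.0), (90, 26.0)]
summary = downsample_mean(raw, window_s=60)
```

Only `summary` would need to cross the backhaul link in this sketch; the raw points stay on the edge node for local dashboards and alerts.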

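The point made in the talk that every 2.0 write must carry a token can be illustrated by building, without sending, a write request from a device. The sketch below uses only the Python standard library and targets the InfluxDB 2.x HTTP write endpoint (`/api/v2/write` with an `Authorization: Token ...` header). The helper name `build_write_request`, the host and port, the org, the bucket, the token, and the sample line of line protocol are placeholders assumed for the example, not values from the talk.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(base_url, org, bucket, token, line):
    """Build (but do not send) an InfluxDB 2.x write request.

    `line` is a single record in line protocol with a timestamp in seconds.
    Hypothetical helper for illustration only.
    """
    query = urlencode({"org": org, "bucket": bucket, "precision": "s"})
    return Request(
        url=f"{base_url}/api/v2/write?{query}",
        data=line.encode("utf-8"),
        method="POST",
        headers={
            # Without this header the server rejects the write.
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )


# Placeholder values; substitute your own instance, org, bucket, and token.
req = build_write_request(
    base_url="http://localhost:8086",
    org="example-org",
    bucket="sensors",
    token="example-token",
    line="environment,sensor=demo temperature=23.0,humidity=62.31 1560000000",
)
```

Passing `req` to `urllib.request.urlopen` would perform the write against a running instance. A write-only token is the natural fit for a sensor, since it limits what a compromised device can do.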