Thanks for the invitation. We're studying climate informatics because of the threat of climate change and of extreme events, such as extreme storms and heat waves, which can cause wildfires and droughts, and because of their effects on communities and ecosystems. This is based on a vision that machine learning can shed light on climate change. There is certainly a scientific consensus on climate change; some things are known, but there are a variety of open questions, such as: how does climate change affect extreme events? Let's look at what is known. The Intergovernmental Panel on Climate Change (IPCC), the panel that advises the UN, puts out reports. In their 2013 report they looked at observed data, specifically surface temperature trends since 1900, and there has been a warming: on average, temperature is increasing. So let's take a very simplistic view, based on a plot the IPCC also put out in their Summary for Policymakers. Suppose we had a PDF, a probability distribution, for temperature, and we had warming of the mean: we just get a mean shift to the right, from a Gaussian in gray to the dotted line. If that's all that's happening, then we get a heavier tail on the hot end, so a higher probability of extreme hot weather, but you'll see there would then be a lower probability of extreme cold weather. Since there has been a mean shift (we've observed warming), we might be getting some of these effects, and this effect alone, the mean shift, might explain some of what we're observing. But of course this is just a small part of the story. Take a normal PDF without a mean shift: just by increasing the variance we end up with heavier tails on both ends, which means not only a higher probability of extreme hot events but also a higher probability of extreme cold events. This may be happening; there's a lot of uncertainty around the variance, and generally speaking there's a lot of uncertainty about what's going to happen in these tails, because that was a very idealized setting where everything was symmetric. There is an infinite number of scenarios: we could certainly be in a case where the probability of extreme cold events stays about constant while extreme hot events are increasing. I bring up this problem because understanding extremes has been put forward as one of the grand challenges by climate scientists. So let's take this as an example of how we would apply machine learning. Say we were going to study extremes. By definition, extreme events are rare, so we're not going to have many positive examples of them in historical data, and we just talked about ways, for temperature alone, that climate change might be changing its distribution. The trouble is that statistics we can get from historical data may not be sufficient on their own for future prediction. What's really unique about this as an application for machine learning is that scientists have been trying to simulate the climate for over 40 years. These physics-based simulations can output simulated data; I'll get into them in a moment, but essentially we're evolving millions of variables over the surface of the globe over time. So now we're in a massive-data setting, and that's where machine learning comes in. I want to argue that this is an interesting and rather unique application area for machine learning. We'll get to the climate models in a second, but our past historical data is mostly very limited: you may have had a sea captain writing down the temperature in fountain pen, and then it rained, or the ship sank. We didn't have a global grid of measurement stations; instead we have proxies. There are biological records: you can look at tree rings to infer nutrient and temperature properties, and similarly with corals in the ocean. You can take a core of earth under a lake, and it's
known that different pollen species thrive under different nutrient and temperature conditions. You can take a core in a glacier, a long cylinder of ice, which will have trapped air bubbles showing you atmospheric gas concentrations, such as CO2, which we care about for understanding the atmosphere, and you can also look at water isotopes to infer temperature. So this is actually not a big-data problem: it's mostly very heterogeneous collections of small data about the past. Only in the last few hundred years have we been doing global measurements, and we're now also taking massive amounts of satellite imagery; we are in a very measurement-heavy era, but it doesn't go back far in time. The climate model simulations, though, the outputs from these physics-driven models, give us a lens into the distant past and the distant future, with some caveats that I'll get to shortly. I think this is a unique area, and we do have some critical mass: climate informatics now refers to a community. We had our first international workshop in 2011, and we've had a lot of international reach, with people from France, Germany, Asia, and the Middle East. I wanted to quickly plug the workshop in Boulder, with a submission deadline of Monday for two-to-four-page abstracts (actually, I guess that's Sunday); we have a lot of funding, and we usually try to bring students and early-career scientists. I would also mention that if this interests you, this coming Tuesday we're having a practice hackathon for our climate informatics hackathon; you're certainly welcome to participate, and the link to the starter kit is already online. When I was pitching this problem to the machine learning community, I wanted to mention the National Center for Atmospheric Research, a big center really focused on understanding the atmosphere. It's a great place to meet climate scientists if you're a data scientist or your student is one; Boulder is beautiful that time of year, but it's also simply a good place to meet a climate scientist. When we try to explain to machine learners where we could make a dent, we mention that we're not as far along as bioinformatics: it's like the early days of bioinformatics, and anything is possible. We can try to break up the world of climate informatics into potential problems, though there might be other problems as well. I mentioned the paleoclimate proxy data: that's important for trying to reconstruct past climates, because we didn't have good measurements back then, and for putting current climates in context. Governments, such as the UK government, are now asking for climate predictions at the level of postal codes, which is much more fine-grained than the predictions we're getting from the physics-based models, so there's a whole field on downscaling. I'll also mention that our two-hour version of this tutorial is online on both my and Arindam Banerjee's websites, if you want to watch the video or look at the slides on some of these other topics. My group has mostly focused on using ensembles of these physics-driven climate model predictors, and on trying to improve ensemble predictions and reduce their uncertainty. I'm specifically interested in spatiotemporal data; I'm glad John introduced non-stationarity at the beginning, because I'm going to talk today about how you learn when you have non-stationarity not only in time but also in space, and I'm motivated by extremes. I hope to talk about some hot-off-the-press extremes results at the very end, if I have time. The goal of this talk: it's basically a variable-length talk, so I'm going to adaptively decide how much to do, and that's why I'm going to give you the take-home messages at the beginning.
One take-home message is that you should work on climate informatics: it's compelling for machine learning algorithmically. As I said, I'm going to focus on non-stationarity over both time and space, but along the way, and relevant to this workshop, I want to demonstrate that starting with an application you can still come across problems that really ask new algorithmic questions in machine learning. The punch lines: online learning when you also have distributed, spatial, or other dimensions at play, possibly with non-stationarity in those directions too, which is largely open; making predictions at multiple time scales simultaneously, a question begged by specific applications in climate but also relevant in a financial stability and monitoring setting; and tracking highly deformable patterns, not just the object tracking you have in computer vision, now that we're looking at extreme storms, hurricanes, and those sorts of patterns in fluids. Again, quickly, at the level of punch lines: can we actually learn the level of non-stationarity from our data? Can we use a multitask approach to predict at multiple time frames simultaneously and perform better than treating each task independently? And can we use various approaches that exploit local structure, where you might want to be local in space but may also want to exploit temporal structure? I do have to describe these climate models in slightly more detail. All you really need to know is that they're trying to simulate four major systems: atmosphere, ocean, land, and cryosphere, which covers processes involving ice. Each of those systems has physical components which are themselves each a mathematical model: each process, such as advection of heat from land, precipitation, or the various processes of gases changing in the atmosphere, is a differential equation, a non-stochastic partial differential equation. The parameters are thought to be essentially known from first principles, but there are still a lot of differences in how different modeling groups do the simulations. First off, we can't resolve at scale, so we're doing some kind of discretization, and different modeling teams do this differently: the grid box might be around 100 kilometers per side for some, and the vertical coordinate, instead of being height, could be based on level sets of pressure. This is actually a really hard problem; you have massive differences in scale. Consider the coupling of the atmosphere and the ocean in terms of time scale: the atmosphere circulates on the order of a couple of weeks, the ocean on the order of hundreds of years, and where they meet you have to resolve those differences in time scale, and similarly for spatial scale. Why do I bring up these differences between the models? The Intergovernmental Panel on Climate Change is informed by a lot of different models; these are some of them. Each model (the first was from Princeton in 1969) represents 30 or 40 years of a laboratory of people implementing these first scientific principles, modeling assumptions, and discretizations, typically in Fortran. These are really large software systems that you can then run: you can set a bunch of initial conditions, let all the processes interact, and then measure certain things in the future, like temperature and humidity at certain locations, and you'll get trajectories for any individual variable. So here is a measure of temperature called anomaly (I'll explain that shortly) that in this case is averaged both globally and annually, and the models from all these
different countries are predicting all over the place; their mean is in red, and the observed values in the past are in blue. You'll see we have a lot of uncertainty into the future: even though it would have been okay to predict with the multi-model mean here, we start to see divergence of the mean, and divergence for most of the models. So there's this question of how to combine the predictions. "Anomaly" just means that at each location you take the time series and subtract out a constant, the average of that time series at that location, so that when you average these time series globally you reduce the variance, as opposed to looking at raw temperatures. So here's the machine learning problem: no one model predicts best all the time. The red curve, the average prediction, was found to be basically better than using any one fixed model from one of the countries. They've done this massive project of actually saving the outputs of these physics-driven simulations, which we now view as input data. This is on a very grand scale: all the stored model simulation outputs dwarf all the stored satellite measurements, and a lot of it hasn't been analyzed. There are starting to be Bayesian approaches, etc., but they sometimes make relatively strong assumptions, like "the mean is good" or "there exists a good model in the ensemble." So there was this interest in ensembles coming from the climate community, and I had been working on ensembles in machine learning, so let's view this as a challenge for machine learning: we want to improve the predictions of the ensemble. We want to predict the future value of the blue curve, in this example the future value of temperature, but it's not just a time-series prediction problem, because instead of just having access to this curve, we have access to the predictions from all the different models.
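The anomaly computation just described can be sketched as follows (a minimal sketch; the array shapes and the toy data are made up for illustration):

```python
import numpy as np

# Toy temperature data: (time, lat, lon); values are synthetic, for illustration.
rng = np.random.default_rng(0)
temps = rng.normal(loc=288.0, scale=5.0, size=(120, 4, 8))  # 120 months on a 4x8 grid

# Anomaly: at each location, subtract that location's own time-series mean.
climatology = temps.mean(axis=0)   # per-location average over time
anomaly = temps - climatology      # deviation from the local average

# Averaging anomalies globally gives one low-variance value per time step,
# unlike averaging raw temperatures across very different local climates.
global_anomaly = anomaly.mean(axis=(1, 2))
```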
Okay, this has been one of the main thrusts of my work, and probably what I'll spend most of the time on, though certainly not going over all of it. Looking specifically now at online learning, let's first just consider time-varying data: the spatial effects are removed because we've averaged over the whole globe, and we'll show an approach for learning from data that varies over time, that is, learning in the non-stationary setting. Where we're going is that we actually have spatiotemporal data that varies over both time and space. The red curve, the multi-model mean, says: we have some ensemble of models (I've put five here; there are now about thirty-five from all the different countries) and we're going to bet our money equally, essentially a discrete uniform distribution over the models. That's what the red curve is doing. If you take a data-assimilation approach, or allow the observation to be revealed at the end of each prediction interval, then of course you can do an adaptive weighted average. So in January all the models predict temperature for January; at the end of the month we've observed the temperature, we can take, say, the squared loss between predicted and observed (or any loss that makes sense for a scalar), and then update our weights accordingly, renormalizing them. You can do this with any such algorithm, but remember, climate is changing: the sequence of losses may be non-stationary. If model B had accrued weight by having the lowest loss in the first several rounds, because it made the best predictions, but then conditions change a little and model A is better, then whether you're predicting with the mode of the distribution or with a weighted average, it's going to take a while for model A's prediction to contribute much, because it has such a low weight. In terms of AI, this is an instance of explore versus exploit. The trade-off is exploiting, by predicting with the currently best-predicting climate model, versus exploring, by always being nimble, hedging your bets a little more, and being ready to switch to another model should conditions change. This instance of explore/exploit hinges on how often the best climate model changes. In my own history, I started by studying non-stationary data and designing algorithms with theory behind them, and later we resurrected that algorithm when I started working on climate applications. What we're really interested in getting to is the spatiotemporal approach, but let's first look at the temporal approach. I said the trade-off hinges on how often the identity of the best model switches; that's essentially the non-stationarity of your observations, and a long time ago we had worked on an algorithm to learn that level of non-stationarity. This whole online learning literature was introduced with John's talk on bandits; here we are in a full-information setting. Let me zoom in on what these algorithms are. Each algorithm updates weights over the climate models, which are your experts (they could be stocks, etc., but in this case they're climate model predictions), and the algorithms differ only in their setting of the switching-rate parameter. This is not a Bayesian algorithm, but it can be derived by writing down an appropriately defined graphical model, where nodes are random variables and edges are conditional probability distributions. There's a hidden Markov chain over which climate model, which expert, is currently best, so if there are 35 models you have a state space of 35 states, and the observations are the temperatures. If you've seen an HMM, this is more
generalized, because we're allowing arbitrary dependencies between the observations. A whole family of online learning algorithms, the multiplicative weights algorithms, just falls out as Bayesian updates of this graphical model, defined as follows. Whatever your loss is (in this climate application, just the squared difference between predicted and observed), view it as the negative log-likelihood of the observation, that is, as the emission probability given the value of which expert is best, allowing dependence on past observations; then you get this as your Bayesian update. I haven't yet talked about the transition dynamics, but this gives you a family of online learning algorithms. The one you've probably heard of most is Hedge. For the transition dynamics, Hedge just assumes there is one best expert. The transition dynamics of a Markov chain form a stochastic matrix, number of states by number of states, where each row says: if I'm in a particular state at the current time, give me a probability distribution over the next state. To get Hedge, you use the identity matrix as your transition matrix: with probability one I will be in the same state at the next time, and the probability of transitioning to any other state is zero. That's how we simplify the update, and this update, as you can see, is going to drive the weights of the experts with high loss down to zero exponentially fast. If you start with uniform weights, you'll quickly hone in and get a very low-entropy distribution, which is not great if you have non-stationary data. A nice thing that had been done in the literature (and this relates to one of John's slides as well) is to say: instead of assuming that if the UK model is best in January then the UK model will be best in February, you could say that if the UK model is best in January, then with high probability it's best in February.
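As a concrete sketch of that simplified update (illustrative Python; the loss values and learning rate are made up):

```python
import numpy as np

def hedge_update(weights, losses, eta=1.0):
    """One round of Hedge: a multiplicative weights update.

    exp(-eta * loss) plays the role of the emission probability of the
    observation given that this expert is best; the identity transition
    matrix means no switching between experts is modeled.
    """
    w = weights * np.exp(-eta * np.asarray(losses, dtype=float))
    return w / w.sum()  # renormalize to a probability distribution

# Three experts (climate models) with uniform prior weights.
w = np.ones(3) / 3
w = hedge_update(w, losses=[0.1, 0.5, 2.0])
# High-loss experts are driven down exponentially fast: after a few such
# rounds the distribution has very low entropy, honed in on one expert.
```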
That "high probability" is quantified by some parameter: this is the exploit part, how much we're exploiting the current best model. Now let's mix in a little exploration: share some probability with all the other experts; if there are n experts, divide the remaining mass among the other n - 1 of them. The idea is that this parameter, the switching rate α between best experts, can't be known beforehand, so you might as well use a Hedge-style (static) update to learn that parameter over a set of meta-experts: a deeper architecture where each meta-expert is an algorithm using a different value of the switching-rate parameter. This was just something I had happened to work on, so I applied it to my domain more recently, but if you're interested in this idea (we had a great motivation at the beginning: learning the level of non-stationarity), there's a nice survey paper covering these and other algorithms. In the setting we were trying to address, doing better than the red curve at predicting these temperature anomalies, this worked well: we had a lot of different experiments on historical data where it worked well. But how do you do a future simulation, given that we don't have labels in the future? We had a climate science collaborator who suggested using what's called the "true model" or "perfect model" assumption in the climate literature: you clamp one of the models, say, pretend the NASA model is the truth, use it as the label sequence, remove it from the ensemble of experts for training, and see how well you can predict it; then you repeat this with a variety of different holdouts. With respect to that fake labeling sequence, there was another model, the one in green, that did pretty well at minimizing loss (here, up is bad), and the learning algorithm can do as well as, or sometimes even better than, the best expert, whose identity is only known in hindsight.
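A sketch of that sharing update, often called Fixed Share (the parameter values here are illustrative; learning α with meta-experts would wrap this in another Hedge-style layer):

```python
import numpy as np

def fixed_share_update(weights, losses, eta=1.0, alpha=0.05):
    # Exploit: the same multiplicative (Hedge-style) loss update.
    v = weights * np.exp(-eta * np.asarray(losses, dtype=float))
    v = v / v.sum()
    # Explore: each expert keeps 1 - alpha of its mass and shares the
    # remaining alpha equally among the other n - 1 experts, so a currently
    # bad expert can be switched to quickly if conditions change.
    n = len(v)
    return (1 - alpha) * v + alpha * (1 - v) / (n - 1)

w = np.ones(3) / 3
w = fixed_share_update(w, losses=[0.1, 0.5, 2.0])
# With alpha = 0, this reduces exactly to the Hedge update.
```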
So that's the temporal story; now we want to get to the spatiotemporal story and model spatial influence. This is where I want to pop out and say that this is something that really came from applications but is certainly open in terms of theory. John talked at the beginning about how contextual bandits don't have theory in the non-stationary case; the online learning algorithms I talked about do have theory in the non-stationary case, and right now we are trying to work out the online-plus-spatial theory. We have the algorithms, but we're trying to back them with theory, and this was largely open in machine learning. One thing you could do is just run the algorithm at a bunch of different places simultaneously, but how do we exploit local structure? This picture, by the way, shows terrible local structure: it just says I'm related to my neighbors to the north, south, east, and west. Really, there are techniques where you can learn geographical relationships; for example, in South America, over the Andes, there are not a lot of connections or correlations between one side of that mountain range and the other. But suppose someone gives us a neighborhood scheme. With respect to that neighborhood scheme we can change the algorithm to propagate neighborhood influence, and now we're just changing the transition dynamics of the hidden Markov chain I talked about before. We can turn the geospatial influence β off and get back to the update we had before; to the extent that we turn it on, we look at the neighborhood set of a particular region, check how a particular climate model is doing, in terms of the weight the algorithm gave it in the neighboring region, and use that to potentially increase our probability of switching to that expert.
What do I mean here? Maybe in Paris the NASA model was predicting really well; if we're also trying to model Saclay, which is just outside Paris, we might want to put more weight on the NASA model, because Saclay is a nearby region and we know the NASA model was performing well in Paris. This is what I'm calling a distributed online learning approach, distributed spatially. You could also note that, since we derived those online learning algorithms by appealing to Bayesian updates of a hidden Markov model, we should be able to get to a hidden Markov random field over the whole globe: we had a hidden Markov model over time, and now we extend it over space, with a lattice over space, and also model non-stationarity over space. So now we have this lattice, with parameters we're fitting or learning online for non-stationarity with respect to time and non-stationarity with respect to space; maybe the UK model is better in one location than in another, for example. I just want to say this approach is not lightweight and distributed; it is highly complex. We've got this hidden Markov random field evolving over time, so at each time step we get a whole new layer of the grid and then have to recompute the marginals everywhere. But we did an implementation of this at certain resolutions using Gibbs sampling, so that we could compare all the methods. The method I talked about earlier, in the global setting, does not do as well at minimizing annual prediction loss as the spatially explicit techniques. That was for a global experiment; that curve goes away when we consider a regional experiment, because it's not a regional method. There, looking at regional losses averaged globally, we see that just making things spatially explicit gets some improvement, and the most improvement comes from the Markov random field approach.
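The lightweight distributed variant can be sketched roughly like this (an illustrative sketch, not the exact algorithm from the talk: β is the geospatial influence parameter, and here I simply average the neighbors' weight vectors):

```python
import numpy as np

def spatial_update(local_w, neighbor_ws, losses, eta=1.0, beta=0.3):
    """One round of the region-local update with neighborhood influence.

    local_w:     this region's weights over the climate models
    neighbor_ws: list of weight vectors from the neighboring regions
    beta:        geospatial influence; beta = 0 recovers the purely
                 local (temporal-only) update.
    """
    w = local_w * np.exp(-eta * np.asarray(losses, dtype=float))
    w = w / w.sum()
    if neighbor_ws and beta > 0:
        neigh = np.mean(neighbor_ws, axis=0)  # how models fare nearby
        # A model that earned weight in a neighboring region (e.g. NASA's
        # model in Paris) gains switching probability here (e.g. in Saclay).
        w = (1 - beta) * w + beta * neigh
    return w / w.sum()

saclay = np.ones(3) / 3
paris = np.array([0.7, 0.2, 0.1])  # model 0 doing well in the neighboring region
saclay = spatial_update(saclay, [paris], losses=[0.5, 0.5, 0.5])
```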
But given its high computational overhead, we view our lightweight distributed online learning algorithm as the better idea, and that's the flavor we're now trying to analyze theoretically. In modeling spatial influence we were exploiting spatial neighborhoods; I want to take the same idea and think about exploiting neighborhoods in time. This is again where I want to pop out and say that, starting from applications, we came to a problem we think is interesting and that has not been well studied in machine learning. In climate, we had predictions of the form where each model (the NASA model, the UK model, the French model, etc.) has to make a prediction one month in advance, but also two months, three months, four months ahead, so it has to output an ensemble of 11 or 12 predictions at every time step. In January I have to predict February through December, so 11 different predictions, and in February I again output 11 different predictions. The idea is that this means we're solving a bunch of tasks simultaneously: how could we apply multitask learning so that our prediction at any of the desired time frames is more skillful than treating them independently? Here we extended a multitask online learning approach. You do have to specify your task-similarity matrix; I'd say it's future work to learn this, and for now we just assumed some locality in time. We're basically saying that if the French model is good at predicting April from January, then it's assumed to be relatively good at predicting March and May too: locality in time, monthly, with some parameter for how much temporal influence you include between tasks. Now you can forget what we said about learning the level of non-stationarity; here we just use standard Hedge, but modify it to look at the similarity matrix between tasks to get the multitask version.
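A rough sketch of that multitask modification (illustrative: the similarity matrix here just decays with the gap between lead times, governed by the parameter s, and s = 0 recovers independent per-task Hedge):

```python
import numpy as np

def task_similarity(n_tasks, s=0.5):
    # Locality in time: the April-from-January task is most similar to the
    # March and May tasks, with influence decaying as s ** |lead-time gap|.
    gaps = np.abs(np.subtract.outer(np.arange(n_tasks), np.arange(n_tasks)))
    S = (s ** gaps) if s > 0 else np.eye(n_tasks)
    return S / S.sum(axis=1, keepdims=True)  # rows are distributions over tasks

def multitask_hedge_update(W, losses, S, eta=1.0):
    # W: (n_tasks, n_experts) weights; losses: same shape, per-task losses.
    V = W * np.exp(-eta * np.asarray(losses, dtype=float))
    V = V / V.sum(axis=1, keepdims=True)
    W_new = S @ V  # each task borrows from similar (nearby-lead-time) tasks
    return W_new / W_new.sum(axis=1, keepdims=True)

S = task_similarity(n_tasks=11, s=0.5)  # 11 lead times, 1 to 11 months ahead
W = np.ones((11, 5)) / 5                # 5 models, uniform weights per task
losses = np.tile([0.1, 0.5, 1.0, 1.5, 2.0], (11, 1))
W = multitask_hedge_update(W, losses, S)
```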
We varied the parameter s, and now down is good, because we're looking at improvements against standard Hedge, across all of the forecast periods: predicting at two weeks, 1.5 months, 2.5 months, etc. We get improvement for some s, and only on two of the tasks do we eventually get degradation beyond some value of the parameter. For the short tasks that can be explained by the fact that if you're doing the two-week task, you had the freshest data the last time you updated, so the other tasks aren't going to help you that much. We also did a variant of this to predict the volatility of the Dow from implied volatility on the individual stocks at a lag. Okay, wow, I was much faster than I thought, which is great, because I wanted to discuss some recent work. This is an ongoing project here in Paris with my postdoc, Sophie Giffard-Roisin, and an M2 intern, Mo Young, in collaboration with Guillaume Charpiat at LRI. Let's see how much time I have; okay, there are some deleted slides that maybe I can include. You had all these cases where, say, Puerto Rico (or was it Cuba?) was hit twice and didn't know to prepare; there were a lot of very unpredictable storm tracks. Can we use machine learning in this setting? We're going to have storm track data, and maybe we can also consider field data that we have, like wind fields, temperature fields, pressure fields, etc. Can we predict tracks and intensifications, and is deep learning an effective approach here? First, with respect to machine learning and computer vision, there are actually papers on tracking deformable objects, but typically that means something like a person, an animal, or a soccer ball, which deforms only a little, not some trace of fluid in a more fluid-dynamic setting. Actually, if people know about machine
learning algorithms for this setting, please let me know. On the climate side, there are essentially what I would call nearest-neighbor techniques, this idea of the method of analogs: store data on all past storms, and then, given a new storm, see which one it looks like the most, in whatever setting we're looking at. There are starting to be machine learning approaches; there was one approach that got a lot of attention, but it wasn't looking at time at all. Also, a plug for Tuesday's hackathon: we have all these storm tracks, that is, measurements of location (plus a few other things, like wind speed) taken every six hours on storms since 1979, tropical depression or above. The scale goes tropical depression, tropical storm, and then the categories of extreme storms; it's called either a hurricane, a cyclone, or a typhoon depending on where you live, which is just language. What we have results for now is a deep learning model where we wanted to take into account image data, which is what deep learning is good for. You can get images by looking at the field of some variable: we're looking at the fields of the u and v wind components (a field for each) at three pressure levels, and for now just at the current time and the previous time, which is six hours ago. For those we can do convolutions, because we have image data; we're using ReLU and batch norm, and then some fully connected layers at the end to predict just an (x, y) coordinate for the next location of the center of the storm. What's really great about this data is that the storm center positions come from domain experts who can determine that, and we just want to predict the (x, y) of the next center. These images are not from the storm track data; this is data we created by centering boxes around the (x, y) position of the center. We can also use the track data, which are not images, so there we just use a fully connected neural network. We first train each of these networks independently to predict the (x, y) coordinate of where the storm is going to be six hours from now, and then we combine them with some fully connected layers to get down to a final output. These are very trial-and-error deep learning choices, and this is the first time we've presented this, but I will say that the pre-training is really important, and neither of these networks worked best independently. You can drill down and look at some of the predicted tracks. Here's an actual storm track in black. The baseline just says that the displacement over the next six hours will be the same, in the same direction, as the displacement over the previous six hours; that's actually one of the features input to the whole neural network, so we have the baseline as one of our features. Often our prediction is better than that: the baseline might say to continue on this trajectory, but we've evidently learned something that can lead to a change. These ones are pretty good: as you can see, the baseline just adds a vector in the same direction as the previous step, but the prediction is doing something a little more interesting, and even on some relatively difficult storm tracks the prediction still does okay. It can also get messed up, though: here the red did better at following this sudden curve, but then, when the storm actually straightened out, our deep learning model sent it off somewhere else, whereas just predicting the previous displacement would have been okay. In terms of cumulative results, here's how the test errors compare against the baseline of using the previous displacement; we don't yet have plots comparing to any other methods.
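For reference, the persistence baseline just described is trivial to write down (a minimal sketch; the sample track coordinates are made up):

```python
import numpy as np

def persistence_forecast(track):
    """Predict the next storm-center position by repeating the previous
    six-hour displacement (same direction, same magnitude)."""
    track = np.asarray(track, dtype=float)  # (x, y) positions every 6 hours
    prev_displacement = track[-1] - track[-2]
    return track[-1] + prev_displacement

# A storm that moved from (0, 0) to (1.0, 0.5) is forecast at (2.0, 1.0).
next_center = persistence_forecast([(0.0, 0.0), (1.0, 0.5)])
```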
could do as well as the fused network; this is test error. Then, although we train on the whole data, which is obviously different from our test set, in the test set we restrict to storms with wind speed greater than 64 knots, those that classify as hurricanes, because that's what the weather literature compares on, and the result is similar.

Okay, so quick take-home messages: I tried to get you interested in climate informatics, and talked about learning the level of non-stationarity in both time and space, and exploiting local structure in space and time. There are open questions; these are some of the new machine learning questions that our research addressed, but we were completely spurred by applications. Anyway, if people are interested in the hackathon, please see me after and I'll get you hooked up with that, and here's information on the climate informatics community. Any questions for Claire?

Thank you for your talk. In terms of the method that you talked about, is there a reason why you are using a lattice model for the spatial part, rather than a more continuous spatial model?

I think we would probably get better results by learning that field, and Arindam Banerjee's group has done that. They did it in South America, for example, and found that there were no edges over the Andes. So you could start by learning your graphical model spatially, and then do your updates from that. That's a combination of our work and their work that I think would be interesting to do.

I was just wondering to what extent the climate dynamics are taken into account by the domain-expert models — how much historical data do they use?

Oh, I love this question. The first time I put up this plot of the data, which is towards the beginning, it was at Snowbird, and Yann LeCun had invited
me, but he was in the back of the room, and he stood up and said: it looks like they're fitting to the past data, right? We get this big fan-out. So first, one caveat: when I mentioned computing anomalies, where you take an average over a benchmark period and subtract it from each time series per location — the benchmark period is actually in this region, and that's why you get values close to zero here; you generally get a tightening wherever you take your anomalies. But yes, what has happened, at least from anecdotal evidence as I understand it, is that if there was some kind of comparison and a model didn't perform well, those modelers would say, oh, why don't we simulate some more aerosols, or something. So there's no direct, data-driven tuning as we view it — they're not using observed data as input to change their predictions here — but it's basically done by hand: if a model is consistently performing badly, they might say, we need to change one of the parts of the model that our results are sensitive to. I mention aerosols because the results are usually quite sensitive to those, and they're very hard to monitor. So yes, that certainly happened. Luckily, in terms of the theory, the regret bounds, or relative performance guarantees, hold even if the experts, or predictors, are completely correlated. An even bigger source of correlation, though, is: what is a model? It's all these different components, and the sea-ice component that is implemented in one country is used in several countries for sea ice. There was a software engineer who did a study on the code-base overlaps, so some of the models overlap in that way as well.

Thanks for the talk. I have one concern, actually — we've talked about causality today. From my point of view, why don't you consider causality to study interactions between models in this exact setting? I mean,
there are meteorological models which eventually lead to, say, a geological model — so why don't we consider causality? Have you considered it?

Actually, there's a big climate informatics group considering causality; I would refer you to Imme Ebert-Uphoff at Colorado State. They did this beautiful work looking at various variables having to do with circulation in the northern hemisphere, and they would learn a Bayes net. It was actually, I thought, an interesting use of Bayes nets, because usually you're trying to use them generatively, or for classification, etc., or predictively; they were learning from the learned structure. She learned this Bayes net at different times, deleted edges that she could disprove as not being causal — of course you can never prove an edge as causal — looked at the persistent edges, and they had a finding that storm tracks are moving north in the northern hemisphere with climate change. Now, this is known in the meteorological literature, but it was a fully causal, Bayesian approach. It's just not what my group has worked on; I was working on these lightweight online learning algorithms and regret bounds, so I'm coming more from that community. But causality is really important, and we've had various speakers from exactly that field, so I would highly encourage you to look at the work of Imme Ebert-Uphoff.

Hi, thanks for the talk. I have this very weird question about models — you hinted at this a bit earlier. How much changing of parameters is going on in each of these models that you do not see, that you're essentially trying to fit to, or, I mean, trying to apply your methods to? How do you have trust in any of these models that have essentially changed over time?

Okay, so there are two ways your question could be interpreted, so I'll answer both. Within a single run — if we look just at this purple curve that comes from, say, Britain — the parameters don't change; they're fixed in all the
differential equations. But you make a very good point, and I'll say two things. One is that ensembles of models from different countries are one way that models are studied, but actually a lot of what they're doing is saying, you know: in the UK, with our model, let's take one parameter — say something to do with ice melting — and vary it slightly, and now we get an ensemble of simulations from that; they're interested in studying ensembles that way. They're also saying, even fixing the parameters, if we change the initial conditions slightly — and initial conditions actually are sometimes based on data observations — we get drastically different results, and then you can throw machine learning at those ensembles to do all sorts of things. Of course, we as the data miners don't really care how the data was generated, but it is frustrating, and they have a whole field of uncertainty quantification: you can quantify uncertainty with respect to the distribution of whatever you perturbed, and you can get nice schedules for exploring the parameter space, but it becomes existential, or philosophical, how to draw the boundaries of the space that you're exploring. We've seen this in finance; that would be fun to chat about, but there's not much more we can say about it here.

Sorry to ask a second question, then. Do you feel that researchers in Britain, for instance, who are interested in that particular parameter have changed their point of view on whether it should be studied more, based on the fact that the model has been doing well? My question comes down to this: research-wise, people on the methodological side of things end up spending most of their lives on very specific parameters, and they are essentially putting their input into those models. You seem to be
saying that eventually we cannot care about this?

Oh no, I'm acknowledging it as a problem. I think what is new is to come in with more tools for data analytics than they had been using before, and to also come in using observed data in addition to the physics-based models. But I've given talks about this to climate scientists, and I've had modelers say: you know, we're working on the next generation of climate models, all this funding is going into it — what if you put the new-generation climate models and the old-generation climate models all together in an ensemble and see which ones get more weight, etc.? I didn't think I was going to get a machine learning paper out of that, but there are a lot of interesting issues there. I definitely want the modeling effort to continue and to continue to be funded — it's important to argue for that, especially in my country right now — but I sort of bill data analytics as the cheapest way to unlock insights from all this data, observed and modeled, that we have, and that hasn't really been fully exploited. [Applause]
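As a rough sketch (not the speaker's actual code), the persistence baseline described in the talk — predicting that the next six-hour displacement simply repeats the previous six-hour displacement — can be written in a few lines. The function name and the example track are illustrative assumptions:

```python
import numpy as np

def persistence_baseline(track):
    """Given a track of (x, y) storm-center positions sampled every six
    hours, predict the next position by adding the most recent six-hour
    displacement to the current position (the baseline from the talk)."""
    track = np.asarray(track, dtype=float)
    prev_displacement = track[-1] - track[-2]
    return track[-1] + prev_displacement

# Example: a storm moving roughly north-west.
track = [(10.0, 20.0), (9.5, 21.0), (9.0, 22.1)]
print(persistence_baseline(track))  # approximately [8.5, 23.2]
```

In the talk this same quantity, the previous displacement, is also fed to the fused network as one of its input features, which is why the learned model can only win by deviating from it at the right moments.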
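The method of analogs mentioned on the climate side is essentially nearest-neighbor search over stored past storms. A minimal sketch, assuming hypothetical fixed-length feature vectors (the feature choice and data are made up for illustration):

```python
import numpy as np

def nearest_analog(past_storms, query):
    """Method-of-analogs sketch: store feature vectors for all past
    storms and, given a new storm, return the index of the most
    similar past storm under Euclidean distance."""
    past = np.asarray(past_storms, dtype=float)
    dists = np.linalg.norm(past - np.asarray(query, dtype=float), axis=1)
    return int(np.argmin(dists))

# Hypothetical features: (latitude, longitude, wind speed in knots).
past = [(12.0, -40.0, 35.0), (25.0, -70.0, 80.0), (15.0, -55.0, 64.0)]
print(nearest_analog(past, (14.0, -54.0, 60.0)))  # -> 2
```

In practice the "whatever setting we're looking at" from the talk determines the distance and the features; variable-length tracks would need alignment or summary statistics before such a comparison.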
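The anomaly computation referenced in the fan-out discussion — subtracting a benchmark-period average from each time series per location — can be sketched for a single location as follows (the numbers are made up; real benchmark periods are multi-decade):

```python
import numpy as np

def anomalies(series, benchmark):
    """Anomalies for one location: subtract the mean over a benchmark
    period from the whole time series. Values inside the benchmark
    period hover near zero by construction, which is the 'tightening'
    noted in the Q&A."""
    series = np.asarray(series, dtype=float)
    return series - series[benchmark].mean()

temps = np.array([14.0, 14.2, 14.1, 14.6, 15.0])   # annual means, degrees C
print(anomalies(temps, slice(0, 3)))               # benchmark = first 3 years
```

This is why comparisons of observed and modeled series look artificially tight in the benchmark region: every series is pinned to mean zero there regardless of model skill.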
