Signal Processing in Financial Applications | Dr Tuncer Aysal | Big Data Analytics Conference 2015

Yeah, thank you for having me. I feel like I'm kind of the odd one out, the one coming from industry rather than academia, so I thought I'd spend one slide on what we do and who we are at Winton. Winton is effectively a global investment management company, but it's fully driven by data analysis. We mainly trade futures and stocks all around the globe, and it's fully systematic, so we don't really have tall and handsome guys in suits deciding whether to buy gold or sell the S&P, or things like that. Because it's completely systematic, data collection, processing and analysis are at the heart of the company. As usual, it always starts for us with collecting data. We collect massive amounts of historical data, ranging from historical wheat prices to microsecond, order-book-level data. Then obviously we spend a lot of time cleaning and organizing that data set, putting it into databases and making sure we've got the right prices for the right assets, and that takes a lot of effort in our company. Finally, you get to a stage where you start thinking about ideas: what's driving the markets, and how we can start predicting where the markets are going to go. Once we come up with an idea, we do a lot of testing and peer reviewing, just as you would in the normal publications world. And once we're happy that we've found something, we put that strategy live, and it's executed in a fully automatic fashion as well.

I wanted to give you guys an idea of what sort of data we look at, because there might be some misconceptions about what type of data financial data can be. Let's just start with Apple prices, for instance. Say we're interested in predicting where Apple is going to go next week or next month. One thing you might think is, oh well, let me just look at what Apple has been doing historically, so you can
pull out this data, even from Yahoo Finance, and say, oh well, it looks like it's been going up recently, maybe it'll continue to go up. That would be one way of trying to predict where this thing is going. But there might be other interesting data out there that might help you predict this. You might think, and I'm not saying this is the case, that the company's balance sheet is relevant: maybe the fact that they recently declared, I don't know, 17 billion in earnings is relevant information in terms of where this company is going. You might also say, oh, their tangible assets have decreased compared to the last time they declared their balance sheet, and you might think that's relevant information. You might also say, well, Apple is selling iPads and iPhones all around the world. According to this data set, forty percent of their sales come from the United States, or North America, but a good chunk comes from Europe and Asia. So you might think that to know where Apple's price is going, it's important to know what's going on in the EUR/USD exchange rate, because if the euro falls a long way against the dollar, maybe Europeans won't be able to buy iPhones anymore; they'll be really expensive for them, and so on. So you might think the FX rate is a relevant predictor of where Apple's price is going. But all that information was directly related to Apple, and there might also be other stuff going on around Apple that's relevant to where its price is going. Here's a little example, a customer-supplier chain within the information technology sector. An arrow indicates that one company is supplying something to another company. For instance, you can see QLogic is
supplying to HP, and AMD is supplying to HP, so those are companies supplying services or products to other companies. You might say, well, if AMD is not able to sell any chips anymore, probably HP won't be able to sell any more computers either. So clearly it's not just HP's prices that are relevant; what's going on in AMD's world is also relevant to predicting what's going to happen to HP in the long run. So the relevant information isn't only about the stock itself; there's information around that stock that might help predict it. You might also say, well, the market share of Android has been increasing recently, and that might indicate that in the future Apple won't be able to sell as many iPhones, which is going to have some impact on its future prices. So there are lots of sources of information that might be relevant to predicting stock prices.

Another thing: maybe the global economy is relevant. When the economy is contracting, maybe particular sectors like health care and consumer staples, because you still have to buy bread and drink water, are still going to do OK, but companies that require a lot of capital, like IT or energy, maybe aren't going to do so well. So where the global economy is going is also relevant to predicting what stock prices are going to do. This is an interesting one: this guy, a really successful investor and investment manager, once said that to be successful in business and investing you've got to have skin in the game and a stake in the company. So he's kind of saying, well, if you're on the board of Apple and you start selling Apple shares, maybe that means something; maybe you know something that helps predict where the stock price is going to go. Or if
you start exercising options in your own company, maybe that says something about how the company is going to perform. So that sort of data could also be relevant to where Apple is going.

So, putting all that together, our data sets generally look something like this. I put dates over there, but that's not necessarily restrictive; it could be date and time, so I could be reporting data per millisecond. For ease of explanation, for the moment it's just days. What you've got is effectively a date and lots of prices corresponding to lots of stocks, because, as we discussed, maybe the history of Apple is somewhat relevant to where Apple is going to go in the future, but we also discussed that Apple and Google are related in some shape or form, because one is supplying to the other, so those prices could also be relevant to predicting Apple. So we effectively have a matrix of data just from prices. But then we said the balance sheet could be relevant. The fact that Apple disclosed on the 20th of May that their revenue was 47 billion may be relevant information, so we bring that into play, but it arrives in an asynchronous fashion: a couple of days earlier, Google announced that they had 17 billion of earnings, so maybe there are some dynamics in there that we need to take into account. We bring that data set in too. As we discussed, it could also be that the FX rate plays a role in how many iPhones or iPads Apple is able to sell, so we need to bring that information into play as well: what the FX rate was on that day for all these global currencies. The other interesting thing we discussed: on the 15th of May, for instance, Tim Cook, one of the top guys at Apple, decided to sell 10,000 shares of his own stock. So what does that
mean about where the price is going? Maybe he knows something about short-term price moves. Similarly, maybe Larry Page, a couple of days after that, decided to buy 25,000 shares. These are all possibly relevant pieces of information that we have to bring to the table. The other example was where the global economy is going; maybe that is helpful in predicting particular stocks. On a particular day, say the 17th of May, you might have a GDP announcement, say it's now 2.2 percent. A couple of days before that you could have a CPI announcement showing an increase of 0.1 percent, and on the 21st of May you could have a non-farm payrolls announcement that says 222,000. That maybe gives you a feel for where the economy is going, and if we think that information is relevant, we should bring it to the table as well.

So what I wanted to get across with those examples, I guess, is that financial data is very non-stationary: its properties are not time-invariant. It's also very heterogeneous: one piece of data is prices, another is percentages, another is numbers of shares, and FX rates are something completely different again. As you can see, it's multivariate: to predict one stock we're interested in lots of different sources of information. And by its nature it's asynchronous, because companies don't all report their balance sheets on the same day or at the same time, so you have to take that into account. If Google has declared its balance sheet now but Apple hasn't so far, you can probably infer something about what Apple's balance sheet might look like, so the asynchronicity itself is interesting and important. Financial data can also be contextual. For instance, a
couple of days ago there was a really funny example. There was a GDP announcement; I can't remember exactly now, but I think it increased by half a percent or something, and Bloomberg said stock markets were up because GDP had increased by half a percent. Then, half an hour later, the stock market actually turned around completely and was down 1%, and the Bloomberg headline literally just changed from "stock markets up due to GDP increase" to "stock markets down due to GDP increase". The GDP increase meant there's a healthy economy, but that might imply they would raise interest rates, so borrowing might become more expensive, and that might have some effect. So the message I'm trying to get across is that GDP being up half a percent doesn't always have the same implication for what prices are going to do. Financial data is contextual.

And as we've seen, there's a lot of it that we have to deal with. I think it's very similar to some of the problems that we've discussed: we have a lot of variables, but we don't have a lot of time points, or timestamps if you like. And, unfortunately for us, we only have one history. I can't go and create another price history for Apple. If you want to understand whether a drug is working, maybe you can go ahead and run more trials; we kind of struggle with that. I can't get someone to create another Apple stock and trade it on the side for me so that no one else sees it and only I get to see it. We don't have that luxury; we only have one history. So once we use that data, it's consumed, and we have to be very careful before we start consuming data and make sure that it is clean. But in our case I find, and it's probably similar in a lot of other cases, that even clean data is
full of tricks and traps. One thing we tend to come across, for instance: take that 47 billion number from Apple's earnings announcement. The vendors, the people who collect that kind of data, tell you that on the 27th of February that was the number, 47 billion. You go ahead and do analysis, you do a lot of work on it, and you find, oh, that's interesting, maybe these earnings are actually predictive. But then you learn that the number was changed retrospectively: the original number was, say, 30 billion or whatever, and two months down the line they corrected it to 47 billion, but they still assigned it to the historical date. So you had some future information in your data, even though it was clean. It was the right number; their earnings were 47 billion. But you wouldn't have known that number at the time it was announced; you'd only have known it two months down the line, after they corrected it, and the markets tend to be clever enough to incorporate that kind of information very, very quickly.

Another major problem for us, and I think this one is probably more unique to us, is that the historical data has our footprint in it. We trade; even as we speak, Winton is trading, and the prices we collect have that impact in them. Obviously the current price of Apple wouldn't be what it is if we were not in the market, because when we, as Winton, go into the market and buy 1,000 shares of Apple, that has an impact on the price. And as soon as it has an impact on the price, the historical prices you're collecting are effectively something you had a hand in. You can't treat them as a completely new set of data and pretend it's brand-new data. It's not. The data is
correlated to your actions to some degree already, if you like. So there are all sorts of tricky biases in there that we could probably talk about for hours.

So effectively our challenge can be summarized in this one slide, if you can digest everything we've already seen. We're interested in predicting stock returns for tomorrow, next week or next month using all the information available to us, knowing that we don't actually know what information is relevant. We can hypothesize, as we discussed half an hour ago. Should we just let the machine learning algorithms go wild and find that the fact that Apple went up, down, up, down, up, down means something, or should we hypothesize that if it goes up and up, it should go up the next day? It's a tricky question. But generally, given all the information that we're aware of and think might be relevant, and conditioned on all of it, what we're interested in finding is, in effect, a conditional density. Ideally we'd like it to be sparse, because we want some control over overfitting and all the problems that come with having lots of parameters. We'd like it to be smooth, because we don't want our predictions to be very jittery over time. And, again touching on some of the stuff we discussed, we'd really like it to be interpretable, because it's really hard to tell a client: give me ten billion, I'll just pass it through this black box, it tells me to buy some Apple and sell some gold, and trust me, it's going to work. That's a hard sell. So it needs to be at least somewhat interpretable, so that we have some understanding of why this idea and this methodology might work, and it needs to have sound fundamentals behind it. I mean, absolutely, we do let machine learning algorithms go wild and try to see if they can find
something that we couldn't hypothesize. But on top of that, suppose there's something we're actually convinced by: what we end up doing a lot of the time is trying to pick it apart, and we spend a lot of time trying to decompose it into things that we can actually understand. In the majority of cases we'd be happy to give up a lot of performance in exchange for interpretability and for understanding what's actually going on. So it's a very complicated and tough challenge for us, but it's not like it's not doable.

Here's a little example that I cooked earlier. I like it when the cook brings out something from behind that's already perfectly done, prepared in the background; this is one of those. This is one of those F functions that we came up with after many years of research, which takes into account tons of asynchronous, non-stationary, heterogeneous time series in order to predict thousands of stocks. I can't talk about what the function is, but I can maybe tell you how it seems to be performing. Look at how it's been performing in the last, say, 15 years. What I'm plotting there is the cross-sectional correlation between what I think those thousands of stocks are going to do and what really happened. You can see that some days I'm at 10 percent, some days I'm at minus thirty percent, and it just looks like a really wild ride, something you wouldn't be able to tolerate. But if you put all that data together and look at how it's been working on aggregate over the 15 years (generally speaking we go further back; this is just for presentation purposes), you see that my prediction quality looks like it's only one and a half
percent. And you might be like, oh, look at this guy, he's got one and a half percent prediction quality and he's giving me a lecture. But in finance, one and a half percent can actually be interesting, because there's not that much predictability in financial prices. Anyway, you can take that one and a half percent and possibly (actually, one of my managers said make sure you put "possibly" in there, because we're never sure) turn that one and a half percent prediction quality into a profitable investment strategy for our clients. This is one of the examples: one and a half percent, which doesn't look great, can in finance look like this out of sample, in the real world.

I also find it's a bit of a test of your character and emotional stability. We put this system live, I think, in January 2013, and I was feeling pretty smug about it. In October 2013 I was pretty proud: look at me. This is a zero-sum game, and there are a lot of other clever people trying to do something better or just competing with us, so if I'm making money, someone else must be losing. So I was feeling good in October. And then comes August 2014, and we're actually back where we started a year and a half earlier, and I'm questioning my soul and my education: should I have gone to Harvard instead of Cornell, maybe the UK was a better choice? But you hang in there. You rely on your research, you hope that you've done everything you can, you stick with it, and then the markets tell you whether you have something. If you hang in there, you might be able to realize that one and a half percent, and that is a great success story if it happens to be the case, because it's very hard to extract money from the markets.

But unfortunately it doesn't even end there. To trade an investment strategy like the one we've just seen, it's not enough; it's just a starting point to have a good return
prediction model. Somehow, from return predictions, we have to get to positions: we have to say, well, what do I want to do today about Apple? Do I want to buy this much or sell that much? We actually do that execution ourselves as well, within Winton, and that's just as challenging a problem. What happens in the markets is like an auction process, where the red is effectively the lowest sell price (the best ask) and green is the highest buy price (the best bid), with everything else going on around them: the white levels and the shades show how many orders are sitting there waiting to be executed. If the price goes up to that level, your order gets filled; if it doesn't, you don't get filled. And if you want to realize your return model, you need to achieve the position you want to get into, so that poses another decision-making problem.

Let's just say this is intraday, say a minute's worth of data, and in a minute you can get millions of updates. We're at that dashed line in time, and we now need to decide how we're going to sell a hundred shares of Apple. If you put your sell orders too high, because you want to get a high price for your shares, you might miss the market; the market might go away from you. If you put them too close, you've sold them too cheap, and that's not great. And if you put them somewhere in between, then, because it's a FIFO process, you might not get the trade, depending on whether the market hits your order. So this is again a very complicated and great challenge: how many orders, and at what distance from the current price, should we place our sell orders? This needs to take into account the participants in the market, how they would react if they were to see your order, what's going on in the
other assets, and whether there's been a recent announcement in the market. Again, there are a lot of variables going into this complicated modeling.

And that kind of brings me to the end. I discussed a little bit about prediction and execution algorithms in this talk, just a few of the many challenging tasks we have at Winton, and it's an ongoing challenge. We don't have a ground truth, so we don't know how the markets work, and taking into account the non-stationarity or doing the contextual analysis is going to be an ongoing process for us. We're always interested in finding new data sources that might be relevant to predicting stock returns. I've just given a few, like the FX rate, or whether Tim Cook has just sold some shares of Apple, but there might be other types of data sets out there that are relevant. In addition to returns and execution, we're very interested in predicting risk: we want to know the uncertainty of our predictions. I might say Apple is going to go up half a percent, but it would be nice if someone could tell me how certain that is. Is it somewhere between plus 0.3 and plus 0.7 percent, or is it actually somewhere between minus two and plus five percent? Obviously your reaction and your decision would be very different. So we're interested in predicting the variance of assets, the correlation across assets and the concentration of our portfolio; in an ideal world we'd like a lot of return without risking anything, so that's a challenge. Portfolio construction is a very complicated challenge as well: given that we now have these return and risk predictions, and this is the way we're going to execute these trades, what positions should we be taking tomorrow? That's a bit of a control problem, because you need to do multi-period planning, but you also have to take into account your return horizon, how you can execute, and how
much it's going to cost. So that problem usually tends to look like a constrained optimization problem, if you like. And this slide just says that everything I say is a lie.
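The data-revision trap described earlier in the talk works like this: a figure is first published as, say, 30 billion, later corrected to 47 billion, and the correction is stamped with the original date. It is commonly handled by storing every revision together with the date it became known, and then querying values "as of" a date. Below is a minimal sketch under that assumption. The class `PointInTimeSeries`, its methods, and the 2015 dates are hypothetical illustrations, not taken from the talk or from Winton's systems.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class PointInTimeSeries:
    """Revision history of a single reported figure (hypothetical helper).

    Each revision stores the date the value became known (first publication
    or later correction) together with the value. Revisions must be added
    in order of that knowledge date.
    """
    known_dates: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def add_revision(self, known_on: date, value: float) -> None:
        if self.known_dates and known_on < self.known_dates[-1]:
            raise ValueError("revisions must be added in knowledge-date order")
        self.known_dates.append(known_on)
        self.values.append(value)

    def as_of(self, query: date) -> Optional[float]:
        """Value actually available on `query`, or None if nothing was published yet."""
        i = bisect_right(self.known_dates, query)  # revisions known on or before `query`
        return self.values[i - 1] if i else None

    def latest(self) -> Optional[float]:
        """Final corrected value: what a 'clean' vendor file shows for every date."""
        return self.values[-1] if self.values else None


# Illustrative figures in billions, following the talk's example;
# the year 2015 is an assumption.
earnings = PointInTimeSeries()
earnings.add_revision(date(2015, 2, 27), 30.0)  # first published figure
earnings.add_revision(date(2015, 4, 27), 47.0)  # correction two months later

print(earnings.as_of(date(2015, 3, 1)))  # 30.0: what a backtest may use on 1 March
print(earnings.latest())                 # 47.0: using this on 1 March leaks the future
```

If a backtest calls `as_of` with the simulation date, it only ever sees numbers that existed on that date. Reading `latest` instead reproduces the look-ahead bias described in the talk. The same as-of lookup also copes with asynchronous announcements, because each series is sampled at its last known value on every date.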
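The performance plot described in the talk shows, for each day, the cross-sectional correlation between predicted and realized returns across stocks, and then that quality aggregated over many years. The sketch below assumes plain Pearson correlation per day and a simple mean across days. The talk does not say exactly how the aggregate is computed, and the function names and toy numbers are made up for illustration.

```python
from math import isnan, sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined if either side has no spread that day
    return sxy / sqrt(sxx * syy)


def daily_cross_sectional_corr(predicted, realized):
    """Map each day to the correlation, across stocks, of predicted vs realized.

    `predicted[day]` and `realized[day]` are lists of per-stock returns with
    the stocks in the same order in both.
    """
    return {day: pearson(predicted[day], realized[day]) for day in predicted}


def average_corr(daily):
    """Aggregate prediction quality: mean of the daily correlations, skipping NaN days."""
    return fmean([c for c in daily.values() if not isnan(c)])


# Toy returns in basis points (made up) for four stocks over three days,
# chosen so the daily correlations are exactly +1, -1 and +1.
predicted = {"day1": [1, 2, 3, 4], "day2": [1, 2, 3, 4], "day3": [1, 2, 3, 4]}
realized = {"day1": [1, 2, 3, 4], "day2": [4, 3, 2, 1], "day3": [2, 4, 6, 8]}

daily = daily_cross_sectional_corr(predicted, realized)
print(daily)                            # {'day1': 1.0, 'day2': -1.0, 'day3': 1.0}
print(round(average_corr(daily), 4))    # 0.3333
```

Real daily values are far noisier than these toy ones, swinging between positive and negative as the talk describes. That noise is why a small but persistent average, such as the one and a half percent quoted, is what matters.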
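The talk closes by describing portfolio construction as a constrained optimization problem. The sketch below is a much simpler single-period stand-in that ignores transaction costs and multi-period planning. It maximizes expected return minus a risk penalty, `mu . w - (risk_aversion / 2) * w' cov w`, subject to a cap on each position's absolute weight, and solves it with projected gradient ascent. The objective, the box constraint, the parameter names and the numbers are all illustrative assumptions, not the method used at Winton.

```python
def optimize_weights(mu, cov, risk_aversion, max_abs_weight, steps=5000):
    """Projected gradient ascent (illustrative sketch) for

        maximize   sum_i mu[i] * w[i] - (risk_aversion / 2) * w' cov w
        subject to -max_abs_weight <= w[i] <= max_abs_weight for every i

    `mu` is a list of expected returns and `cov` a symmetric positive
    semi-definite covariance matrix given as a list of rows.
    """
    n = len(mu)
    # Step size 1/L, where L = risk_aversion * (largest eigenvalue of cov)
    # is the Lipschitz constant of the gradient. The largest absolute row
    # sum is a cheap upper bound on that eigenvalue (Gershgorin), so this
    # step is conservative but safe.
    eig_bound = max(sum(abs(c) for c in row) for row in cov)
    step = 1.0 / (risk_aversion * eig_bound)

    w = [0.0] * n
    for _ in range(steps):
        # Gradient of the objective: mu - risk_aversion * cov @ w
        grad = [
            mu[i] - risk_aversion * sum(cov[i][j] * w[j] for j in range(n))
            for i in range(n)
        ]
        # Take an ascent step, then project back onto the box by clipping.
        w = [
            min(max(w[i] + step * grad[i], -max_abs_weight), max_abs_weight)
            for i in range(n)
        ]
    return w


# Toy inputs (made up): three assets with uncorrelated returns.
mu = [0.02, -0.01, 0.005]
cov = [[0.04, 0.0, 0.0],
       [0.0, 0.01, 0.0],
       [0.0, 0.0, 0.09]]
weights = optimize_weights(mu, cov, risk_aversion=1.0, max_abs_weight=0.6)
print([round(x, 4) for x in weights])  # [0.5, -0.6, 0.0556]
```

With a diagonal covariance the problem separates by asset, so the result can be checked by hand. Each weight is `mu_i / (risk_aversion * sigma_i^2)` clipped to the cap. That gives 0.5, then -0.6 (capped from -1.0), then about 0.056. Adding costs or several periods, as the talk mentions, would change the objective but keep the same constrained structure.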
