Areas-of-Interest for OpenStreetMap with Big Spatial Data Analytics

Okay, hello everybody. My name is Stefan Keller, from the University of Applied Sciences Rapperswil, which is at the other end of Lake Zurich in Switzerland. I want to talk about areas of interest, with an experience report on how to handle them at planet scale.

The overview is this: I want to define what areas of interest are, what the state of the art is, and give you our definition of areas of interest. I will then show you the implementation, which many of you are looking forward to, and how the areas are processed, then some remarks on further work, and finally a few words about big spatial data.

First of all, I just want to ask: who knows areas of interest on Google Maps? And just as a proof, how are they shown on the map, I mean on mobile? What's the color? They are shaded orange, they sit very much in the background, and Google has a vague definition of what they are. So once again, about half a dozen or so know this kind of interesting additional information.

There are in fact several different notions behind areas of interest. For AOI there is a disambiguation page in Wikipedia; there is even a Japanese girl group or something. So there are different notions. In computer-assisted editing, including the Humanitarian OpenStreetMap Team, they define areas of interest as the focus area where they expect to have buildings or streets, for example, the areas to look at. Inside the Tasking Manager this is also a kind of area of interest, because interest is a broad notion. A little bit more on topic is the notion in tourism: shopping, entertainment and cultural areas, polygons that help travelers to explore the world. So if you visit Milan for the first time, you would know where to go, I mean this evening, as an alternative to the [unclear] area after the beer event, or instead of it. So that's
what I want to show you. As my example I will take Milan, the quarter a little bit between here and to the west.

This is how Google Maps looks. It's probably difficult to see from the back: the orange shaded areas are, for example, around here, and this one is more pink. This is a building shaded as an area of interest, and this is not an area of interest, so you have to look closely. Perhaps you can see it up here at Loreto.

Two years ago there was a blog post by the Google Maps team. They defined areas of interest simply as the places where there is a lot of human activity, "areas with the highest concentration of restaurants, bars and shops"; that's a small citation out of their blog post. In addition they reported that in high-density areas like New York City they are using a human touch, which sounds good to my non-English ears: they have some human intervention. That was in this blog post. There are those shaded orange areas, and it's a single category comprising all touristic activities, which is something I want to differentiate on the next slide. They are most probably using user tracks, I mean GPS tracks from their mobile applications on Android phones, from which they also derive information like opening hours, or when people enter a building, and things like this. So most probably, almost for sure, they are also using this kind of human activity data to indicate whether there is an area of interest.

Note also that there are two kinds of visualization in Google Maps. One is building based: once a building contains a shop or something, the whole building is shaded. That's the building level, as I call it. When you zoom out to around zoom level 14, you get an areal visualization with polygons, which is the kind of area I'm looking for. So I'm looking at these areal visualizations.

Recently we found
out that there is a startup from Spain, from Catalonia, called AVUXI. They make a living from selling areas of interest, differentiated into four or five categories which they define as shopping, sightseeing, eating and nightlife, and they have a few products around this which they sell to tourism organizations. They also did not tell exactly how they derive the areas; that's part of the startup's secret. But they say they are collecting dozens of open data sources, most probably Flickr photos and other information, instead of human activity based on user tracks as Google does. They have a heat map and also a polygon data set, which they sell to tourism users.

This is more or less the same area where we are here, Città Studi, and there is obviously some commercial area in this region. It differs from category to category, as they differentiate. Here the shopping area is shown, which comes close to the Google definition and to my own. That's the other product from AVUXI; they have in addition parks and waterfront as a separate category, and it looks very similar. So obviously there are those commercial areas, according to their demo page, which is online.

Finally, there is also OpenTripMap, which shows shopping points of interest, but only at point level. That's not the thing I'm looking for; I'm just mentioning it to be more or less complete.

So our definition of an area of interest is: an urban area at city or neighborhood level with a high concentration of points of interest, typically located along a street of high spatial importance. You see that I wanted to have something like a street notion in it, and not only some kind of concentration of points of interest. We focus on the neighborhood level, not the building level, as I said before, and we are also aggregating all categories and putting them all into one, not
yet differentiating between categories, which are difficult to define anyhow: what's the difference between eating and shopping, and things like this? So it's only one category, to keep it simple. It's based purely on OpenStreetMap data, and it should be open, documented and reproducible. That's what I will show you now.

The implementation was part of the master's thesis of a computer science student who finished recently. The goal was to explore areas of interest based on open source software: databases, Python and other tools if needed.

The overview of the processing is like this, in five steps. Step one: filter the points of interest out of OpenStreetMap using a selected list of tags. Second: cluster those points into polygons with a clustering algorithm. Third: create concave hulls around those clusters. Fourth: apply network centrality, a linear notion, to this kind of hull. Based on the hull there is a polygon center, and from this center we retrieved the street network data, the pedestrian street network, calculated a so-called network centrality algorithm, and extended and buffered these hulls if needed, in order to somehow connect nearby clusters. And finally there is some cleanup, like excluding water and sanitizing. That's the whole current implementation.

To visualize those five steps once again, based on this region of Milan: these are the buildings containing the POIs. Of course I'm not only taking nodes: either there is a building with a shop tag, or some museum or other facility, or there is a point inside a building. So the building is shown as an area of interest in the first step. Then those buildings are clustered. One second: this is an extract of the tags we are using from OpenStreetMap, just to be complete, so you get an idea which tags we filtered and retrieved. Then the second step is
the clustering around those polygons. For this we used the PostGIS DBSCAN algorithm, a well-known algorithm, to cluster polygons. The clustering defines some common areas, I mean those polygons which belong together define an area, and around this area a convex or concave hull is calculated, which looks like this. That's the third step.

In the fourth step the linear notion comes in. You see the buffers in red and the street network in blue. The centrality algorithm identified important streets along and inside those polygons, and we allowed a buffer of about 50 meters outside those hulls.

Finally we exclude water areas. There are obviously no water areas in Milan, so I show you an example of the city of Zurich, where there is a lake and a river. We excluded these areas from the areas of interest and sanitized the whole thing, which then looks like this for Milan. And finally the result, now visualized in orange, looks like this. So that's the result.

There is a home page showing this, which I will quickly show you. One second. Here I entered the coordinates of the area around where we are, and on this demonstration page you see the steps I just explained, one by one, which finally end up in these areas of interest, similar to the example shown before. It's just a demonstration page to explain what really happens, for those who are interested.

The evaluation showed that the results are quite interesting, I mean more or less comparable to what Google shows. For example, there was an earlier blog post by Justin O'Beirne about Google Maps's moat and how far ahead Google is, and he said it's not enough to collect data, it also needs to be analyzed. And here it is: we also analyzed OpenStreetMap data and made areas of interest out of
it, and it looks quite interesting and quite similar.

As for further work: for example Jerry Clough, SK53, used it to identify completeness compared to [unclear] in these areas of interest, so he is interested in this kind of further use of the data. We are still enhancing the parameters, because in rural areas, in the touristic areas of Switzerland, we think it could be better. And of course we could also use more input data, like the coordinates of pictures.

It's implemented in Python using a Postgres database. The slowest part is OSMnx, which implements the network centrality; that is unfortunately not available in PostGIS, though it could be done. It is at least ten times slower than all the other calculations. Then there is a Jupyter notebook, a publishing format that looks very much like the visualization demo page I showed you before, but there you can adapt the parameters yourself. Everything is deployed as a Docker container, so it is easy to install on your own.

The resources, which you can memorize; I will of course also publish the slides. There is this demo page, which I don't want to broadcast too much because it's on a small computer, but I can give you access to this hidden URL, which is in fact public. The code will be open source at the end of the month, when the student is back from his well-deserved holidays, and the master's thesis is to be published soon. And as a plus, I registered, for the first time in my life, a digital object identifier, a permanent web resource URL, similar to an ISBN for books, to be referred to in scientific papers. It's a DOI at this data publishing repository.

Now just a few words; I skip those slides. We repeated what we had already done in PostGIS and Python by evaluating one of these Apache Spark projects, which offer parallelization and scaling, to be able to calculate
the whole thing for the whole planet in a few minutes. We did not finish this work, because we would have needed more time to implement DBSCAN and the network centrality algorithm, which are not available inside those packages. That's the short message about the really parallelized big data part.

The lessons learned, and I skip this slide: the available tools with PostGIS are rock-solid. It allowed us to use sophisticated functions like DBSCAN, for example, or ST_Union, and things like this. The approach of parallelizing the whole thing is still very young, and it's much more expensive and time-consuming to implement that kind of scaled-up infrastructure. I'm going fast on this, I know. We didn't finish this implementation, but it was interesting for us to evaluate alternatives to PostGIS, and we did not find any, at least not as ... So we are still using PostGIS, even for the planet, and we take the time to wait a little bit longer until Postgres has finished its calculations, because it would take much more time to implement all those algorithms that are already available in PostGIS.

So I will finish, and I'd like to thank the student for this master's thesis, also the team of my lab, an intern from Singapore who helped to analyze and reverse-engineer Google's implementation, and Jerry Clough, a famous mapper from the UK. Thank you very much.

Thank you, Stefan. [Applause] So, there is a little bit more time for questions, because we can finish at 3:30 for coffee. First question, at the very back.

Hi, I was just wondering, do you have a compiled list of the tags you use that determine whether or not something belongs to an AOI?

A compiled list of the tags I use? Of course, that's part of this web page I will show you. If I start over the calculation, you'll see: this is the complete list of tags. That's what you are asking for?

Yep, that's the list. Okay, cool.

I mean, in the implementation, in the GitHub
repository, in the open source, you will see it in the source.

Thank you for the presentation. What's the processing time, finally, when you are only using PostGIS? You said you wanted to go distributed with Spark to get to a few minutes of computation, but I didn't get how long it was with just a relational database framework.

You see now an area of the city of Zurich, which is not a big area, at around zoom level 14. Now I push the button, and it will take some minutes, as you can observe. So it's some minutes for an area of a few kilometers. It's around half an hour for the whole of Switzerland, and extrapolating this to the whole world would mean, I would say, a few hours. Yeah, so it's about one hour, sorry, for Switzerland: about 45 minutes is just the network centrality algorithm, and only 15 minutes is all the processing within PostGIS, for Switzerland.

I think the result of this is a set of polygons, a kind of data set that you can use for visualization or analysis of locations. That's the idea. My question is: would these algorithms work even on the client side? I mean, I already have my vector tiles, for example, with the data in the browser. So could these three steps be done at vector tile level, dynamically, live, in the renderer in a mobile browser or web browser? It seems the algorithms don't really require global data; local tiles already have everything you need.

It's available as GeoJSON, as these are simply polygons, and on the server side these can be tiled into vector tiles.

Yes, the data is processed on the server side. The problem is that if you process it, it will be outdated right at the moment when you start to use it. But if you have, theoretically at least, live vector tiles, you don't need to transfer anything extra to the client or process anything on the server. You just take the same algorithms, implemented in JavaScript; with vector tiles you have the point of interest data there already, and you do it live at JavaScript level in the client. So
have you considered this scenario? All these algorithms would in principle work at a very local level geographically, within a browser. It could be done on the fly.

Yes, but then you would need the whole Python stack and all the dependencies. You would need PostGIS, which is probably already there, but you would also need this OSMnx library and all these dependencies. Yes, you could, but you would need to re-implement everything in JavaScript. I wouldn't do that. And if you zoom out, it would request the whole of Europe, and that would take hours even on the fastest machine.

Well, you don't have points of interest at Europe level in vector tiles anyway, so at that level this disappears. But yes, there can be these kinds of scalability issues with a live implementation of that. But yeah, I'm thinking of this data as a kind of rendering help, a kind of visualization technique for points, really.

It could be done, in principle, okay. But the interest at the moment is all coming from environmental planners and location allocation analysis. They don't care about the changes of the last few weeks; they are just interested in where the areas of high shopping activity are, and they are using this as another input to decide on a new location, for example. And also for touristic purposes they don't see a need yet to be really that up to date.

Okay, yeah.

Thanks for your talk, Stefan. Google had another approach than yours with OpenStreetMap. Did you compare the two, to see if there are some overlaps?

Yes, as I said before, first we looked at Google and tried to find out how their areas are calculated. Then I did some small ad hoc research together with Jerry Clough, where he looked at New York compared to this blog post. There are differences, and it's interesting that Google is also sometimes wrong, when it tagged a Swiss bank
back office area, a large area, as an area of interest, although it's only a back office area. And probably my algorithm is a little bit too optimistic, and a little bit wrong, in the rural touristic areas, in the smaller ones. So we have to adapt the parameters, for example of the DBSCAN algorithm.

At the beginning of the process you have five different categories of AOIs, and after that, apparently, you merge them. But do you keep the initial information, for whoever would be interested to see the interactions between the five categories?

Once again, the five categories?

You have five categories defining the different AOIs, yes? Did you keep the information during the process, for whoever would be interested to analyze the interaction between these categories?

I'm not sure if I understand you correctly, because when you look at these tags, I directly take these OSM tags and use all of them, and I did not make a mapping. I have already made some experiments to relate them to sightseeing or to shopping, so I could make this relationship, but I have not yet.

Because in the next maps you have different colors.

It was not me who differentiated it, it was AVUXI.

No, no, after that. Yes, the next one. Exactly.

Which one, the previous one?

The previous one, with your colors. Here you have different kinds of AOIs.

The colors just indicate that there are clusters: cluster one, two, three, four. These are random colors. It's one category, and the visualization has just chosen a random color for each different cluster.

Okay, there's no difference, it's just to visualize the difference between each instance.

Okay, we're going to have a coffee, which is definitely an area-of-interest topic. So thank you for your attention, and a round of applause for all of the speakers. [Applause]
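
Editorial illustration of steps two and three from the talk (cluster the POIs, then draw a hull around each cluster): a minimal pure-Python sketch. It is not the thesis code. The talk describes running DBSCAN inside PostGIS on building polygons and computing concave hulls; this sketch swaps in a plain DBSCAN over points and a convex hull so that it runs without a database. The coordinates, `eps` and `min_points` are made-up values, assumed to be in meters in a projected coordinate system.

```python
from math import hypot


def dbscan(points, eps, min_points):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_points:
                queue.extend(reachable)  # core point: keep growing the cluster
        cluster += 1
    return labels


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


# Hypothetical POI coordinates: two groups of shops and one isolated point.
pois = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5),
        (200, 200), (210, 200), (205, 210),
        (1000, 1000)]

labels = dbscan(pois, eps=20, min_points=3)
hulls = {c: convex_hull([p for p, l in zip(pois, labels) if l == c])
         for c in set(labels) if c != -1}
print(labels)
print(hulls)
```

With these toy values the isolated point is labelled as noise and each of the two groups gets its own hull polygon. A convex hull tends to cover more ground than the concave hull mentioned in the talk, so real results would look tighter.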

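Editorial illustration of step four from the talk (network centrality on the pedestrian street network): a minimal pure-Python sketch. The talk says the calculation was done with OSMnx but does not name the centrality measure, so closeness centrality is an assumption here, and the five-node street graph with its edge lengths in meters is invented for the example.

```python
import heapq


def shortest_path_lengths(graph, source):
    """Dijkstra: length of the shortest path from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, length in graph[node].items():
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def closeness_centrality(graph):
    """Closeness = reachable nodes / sum of shortest-path lengths to them."""
    scores = {}
    for node in graph:
        dist = shortest_path_lengths(graph, node)
        total = sum(dist.values())
        scores[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


# Hypothetical street graph: junctions A-B-C-D along a main street,
# plus a short side street from B to E. Edge weights are lengths in meters.
streets = {
    "A": {"B": 100},
    "B": {"A": 100, "C": 100, "E": 50},
    "C": {"B": 100, "D": 100},
    "D": {"C": 100},
    "E": {"B": 50},
}

scores = closeness_centrality(streets)
most_central = max(scores, key=scores.get)
print(most_central, scores)
```

In this toy graph junction B scores highest, since it has the shortest total walking distance to every other junction. In a pipeline like the one described in the talk, streets near such high-scoring junctions would be the candidates for buffering and extending the hulls.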