BITC / Data Analysis – Intro to Biodiversity Informatics 1



So let's talk about biodiversity informatics. Centrally, it's a field of science that really hasn't been around for long. A couple of years ago, some friends and I published this paper, basically just to lay out what the questions behind this field are. It's funny, because usually a field knows what the questions are and evolves towards answering them. If you think about the field of phylogenetics, the question is "what is evolutionary history?", and the techniques have evolved from phenetics to parsimony to likelihood to Bayesian, but all with that same objective.

In the case of biodiversity informatics, it's actually quite the reverse. You had a bunch of people doing databasing, a bunch of people doing information management as far as curation, some people doing data analysis, and some people doing policy, and until relatively recently they really didn't acknowledge one another's existence. They also didn't really know, and I would say a lot of people in the field still don't know, what the questions are. You'll see in the course of this week that at least I, and I hope the other experts who are here, will speak frankly and openly, but yes, there are a lot of people in this field who don't know what the field is trying to do.

So let's start off with some definitions. I just pulled this out of Wikipedia. Informatics: a broad academic field encompassing human-computer interaction, information science, information technology, algorithms, mathematics, and social sciences. That's about as broad a definition as we could ask for, and it doesn't really tell us very much.

Bioinformatics, now this is kind of interesting: an interdisciplinary field that develops and improves on methods for storing, retrieving, organizing, and analyzing biological data. That's a nice general definition. In reality, bioinformatics refers to genomic, genetic, and to some degree protein-based data. And really, when we talk about data on organismal biology,
"bioinformatics," that term has already been grabbed up. So more recently along comes this term, biodiversity informatics: the application of informatics techniques to biodiversity information, for the same purposes. You would think that bioinformatics would be the overarching term that has to do with all biological data, and biodiversity informatics would be a subset of that. In reality they're kind of parallel: bioinformatics is sub-organism and biodiversity informatics is super-organism. I think that's an unfortunate distinction, but in effect that's what we have.

So let's change some of the definitions. I would say that informatics goes beyond just human-computer interaction. In fact, the museum world has been doing biodiversity informatics for centuries: the very complicated card files and indexes and catalogs and things like that. I already mentioned to you that bioinformatics should be broadened to be the more inclusive term. And I would also throw in that biodiversity informatics has to go one step farther, out to how you capture the data, how you make the data exist in the first place. That ends up being a major challenge in this field.

So there are the research areas within bioinformatics according to Wikipedia, and you can see they never get around to anything above the level of the organism. But really we have this whole suite of institutions around the world that have long worked in informatics related to biology, to organismal biology, and to biological diversity: these are natural history museums. They're certainly not the only place that biodiversity informatics is done, but perhaps they are the original seat of biodiversity informatics.

In a museum setting (these are scenes from my own lab), it usually begins with animals or plants. Those are frozen animals and plants, because I took these photographs quickly one afternoon and didn't have time to go out and get some new ones. There are all sorts
of steps involved in preparing the animal or the plant into a specimen, but essentially the idea is long-term preservation. If the curator and collections managers are doing their jobs, then the specimens should last forever. For example, that's a bird skeleton about halfway prepared. You can see the information management: there's a tag that says n HR in 1931. You can see all of the pieces of the element of biodiversity in there. Eventually this will be prepared as a very elegant skeleton, packaged into a little box with a label with all the data on it. And in fact we go so far, and this is a horrible step, as to write the final catalog number on every bone. It's a terrible process, particularly when you get to around 50 and your eyes start to go, too; it's a pot.

So the data get managed initially in field catalogs or temporary catalogs, and those look something like this. This was the best handwriting I could find: field number, species, the collector, the habitat, and the color of the legs. Each organism has its own particular set of ancillary data, but it all looks something like this. That's from an expedition to Equatorial Guinea that my group ran a few years ago.

In the old days, we would then laboriously transcribe the catalog, in India ink on rag paper, into these big catalogs. You can't really see it, but these catalogs are held in a bank safe. Literally, it's a safe inside the bird division of the University of Kansas, and supposedly, if the building burns down, the safe sinks four stories and can be retrieved. I don't believe it. That's what those old ledgers look like, and you can imagine some poor soul sitting and writing another and another and another. We have a hundred twenty thousand birds; imagine writing out a hundred twenty thousand lines in a ledger. We also keep the field notes in a very organized fashion; they get bound, etc. So this is kind of where the field was when my career began, back in the 70s and 80s. There is the safe. Nobody
knows the combination; I hope it never closes. But really the original data reside with the birds, and that's almost always the case. It's not so universal when we're talking about wet collections like fish or amphibia. Here are some bird eggs, and again you can see quite a bit of data, because that's a pretty new specimen. Here's an older specimen that actually has a lot of data also, but that's quite an exception. There are some fluid specimens, and you can see things are quite a bit more abbreviated, just because all of this has to sit in fluid for a century or two.

Eventually the specimens get organized in the final collection. They're sitting on acid-free paper, in drawers and cabinets that essentially exude no acid. If the idea is to make these specimens permanent, then we ought to do everything possible to make them permanent. There are the cabinets; that's my office down there at the end.

So here's my revised definition of biodiversity informatics: the application of informatics techniques to biodiversity information for improved capture, cleaning, management, improvement, analysis, and interpretation. We can use that as the basis of what we're talking about.

In general, it's kind of an exciting time. Lots is happening, there are a lot of people working in this field now, and yet there are massive, massive challenges ahead still. There's a lot happening right now with automated data capture. It's easiest with botanical specimens, because they're two-dimensional and pretty large. Here's an herbarium sheet, and you can see the label right here. This person is developing an image of the specimen and of the label, and then that label gets translated into a structured database by various means. There's more imaging: you can see this is for an insect, and obviously there you have the problem of size. We get to essentially a next step, where we can start doing some of the tasks of museum curation
digitally. One of the best examples is in Brazil, with the Virtual Herbarium of Flora and Fungi. Essentially, one can do searches and see how many records are in the database, but we can also see which ones have images. Unless you need to get in and look at the three-dimensional structure, these images are so detailed that it's essentially like looking at the specimen. So you can do things like: I have a plant that I just collected, and here's a holotype; I want to compare the thing I just collected, or the thing I just observed, to the holotype. Well, for Brazil, which has invested massively in its digital botanical herbarium, it's possible to look at the holotype, which might be sitting in Paris or in Rio de Janeiro. Adolfo and I have done projects where we had a fascinating specimen and had to wait until we had the resources to visit the Smithsonian and the American Museum and the University of Michigan Museum, and it took us two or three years. This is taking 10 or 12 seconds.

None of this really has much to do with this course, but I want to give you the whole panorama. Another big task is that of georeferencing. The specimen label might say "on the east side of Cape Town," but if we want to do essentially any of the analyses that we're going to be talking about in this course, we need something quantitative: we need coordinates. Georeferencing is the procedure of assigning those coordinates. There are now some very nice procedures, totally digitally enabled, that allow us to take text, interpret the text with some degree of intelligence, and turn it into a hypothesis about where on Earth we're talking about. There can be different levels of supervision, so that the human who is overseeing this project, who hopefully has some knowledge of the taxon, of the history of its collection, and of the geography, the human supervisor can say,
"Hmm, well, we have this version, this version, and this version. This one's fairly precise, this one's less precise, this one's pretty vague. I'm going to choose this one, for these reasons." So our human supervisor gets in there, and then we end up with geographically referenced data.

Now, there's still a lot of garbage in there, so we have to go through a data cleaning phase. This is a classic. This is coming out of GBIF, and the only thing I want you to notice here is this big cross. Anybody know what that is?
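
As a small aside on the georeferencing step described in the talk, here is a minimal Python sketch of how several candidate interpretations of one locality string might be compared before a human supervisor makes the final choice. Everything in it is an assumption for illustration: the `Candidate` class, the `is_valid` and `choose` functions, the uncertainty threshold, and the coordinates and radii (invented values loosely placed near Cape Town) do not come from the talk or from any particular georeferencing tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One hypothetical georeference for a single locality string."""
    source: str           # how this interpretation was produced (illustrative label)
    lat: float            # latitude in decimal degrees
    lon: float            # longitude in decimal degrees
    uncertainty_m: float  # uncertainty radius in metres; smaller means more precise


def is_valid(c: Candidate) -> bool:
    """Basic sanity check: coordinates in range and a positive uncertainty."""
    return -90 <= c.lat <= 90 and -180 <= c.lon <= 180 and c.uncertainty_m > 0


def choose(cands: List[Candidate], max_uncertainty_m: float) -> Optional[Candidate]:
    """Suggest the most precise valid candidate within the allowed uncertainty.

    Returns None when nothing qualifies, which leaves the decision to the human.
    """
    usable = [c for c in cands if is_valid(c) and c.uncertainty_m <= max_uncertainty_m]
    return min(usable, key=lambda c: c.uncertainty_m) if usable else None


# Invented candidates for a label such as "on the east side of Cape Town".
candidates = [
    Candidate("gazetteer match", -33.93, 18.50, 2_000),      # fairly precise
    Candidate("district centroid", -33.95, 18.60, 15_000),   # less precise
    Candidate("country centroid", -30.0, 25.0, 800_000),     # pretty vague
]

best = choose(candidates, max_uncertainty_m=20_000)
print(best)
```

The function only proposes a candidate. In the workflow described in the talk, a supervisor who knows the taxon, the collecting history, and the geography would still review the options and record the reasons for the final choice.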

One Comment

  1. Faustin Gashakamba said:

    "the whole panorama" should also include explaining the point of keeping these specimens (digital or otherwise). For biologists, it makes sense. But now you are bringing in computer scientists and others, it would be good investment to let them realize the importance of all this.

    July 12, 2019