May 29, 10:41 AM
An Indicator Experiment
Background
The use of macroinvertebrates as biological indicators of water quality has a long history [1], and variants of the biotic index developed by William Beck in the ’50s [2] are currently in wide use in stream and river monitoring efforts. In [3], for example, macroinvertebrates are divided into four classes: sensitive to pollutants (e.g. mayflies); semi-sensitive to pollutants (e.g. dragonflies); semi-tolerant of pollutants (e.g. right-side opening snails); and tolerant of pollutants (e.g. left-side opening snails, the filthy beasts). The EPA has an excellent collection of pages describing the pollution sensitivity of each taxon [4]. The surveyor counts the number of taxa from each class and plugs the results into a simple formula to come up with the index.
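To make the "simple formula" concrete, here is a minimal sketch of a weighted-average biotic index. The weights (4, 3, 2, 1) and the function name are our assumptions for illustration only; they are not copied from the worksheet in [3], so check them against whichever protocol you actually follow.

```python
# Hypothetical weights per sensitivity class (an assumption, not taken from [3]).
ASSUMED_WEIGHTS = {
    "sensitive": 4,
    "semi_sensitive": 3,
    "semi_tolerant": 2,
    "tolerant": 1,
}


def biotic_index(taxa_counts, weights=ASSUMED_WEIGHTS):
    """Return the weighted average of the class weights.

    taxa_counts maps a sensitivity class to the number of distinct taxa
    found in that class (not the number of individual animals).
    """
    total_taxa = sum(taxa_counts.values())
    if total_taxa == 0:
        raise ValueError("no taxa recorded")
    weighted_sum = sum(weights[cls] * n for cls, n in taxa_counts.items())
    return weighted_sum / total_taxa


# Made-up survey: 2 sensitive taxa, 3 semi-sensitive, 1 semi-tolerant, 2 tolerant.
example_survey = {
    "sensitive": 2,
    "semi_sensitive": 3,
    "semi_tolerant": 1,
    "tolerant": 2,
}
score = biotic_index(example_survey)
print(score)
```

With these assumed weights, a higher score means the sample is dominated by pollution-sensitive taxa.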
Ontology
This seemed like a useful collection of concepts to represent in OWL, and so we created an indicator ontology [5], which defines the following class structure:
<BiologicalIndicator>
<AquaticBiologicalIndicator> <rdfs:subClassOf> <BiologicalIndicator>
<SensitiveAquaticThing> <rdfs:subClassOf> <AquaticBiologicalIndicator>
<SemiSensitiveAquaticThing> <rdfs:subClassOf> <AquaticBiologicalIndicator>
<SemiTolerantAquaticThing> <rdfs:subClassOf> <AquaticBiologicalIndicator>
<TolerantAquaticThing> <rdfs:subClassOf> <AquaticBiologicalIndicator>
Then we asserted each of the assorted indicator taxa to be a subclass of one of SensitiveAquaticThing, SemiSensitiveAquaticThing, SemiTolerantAquaticThing, or TolerantAquaticThing.
Queries
Here are a couple of queries that we’ve run which combine the indicator ontology, our invasives ontology [6], our tree of life ontology [7], our food web data [8], and the RDF representation of some EPA spreadsheet data on macroinvertebrate counts from North Carolina [9]. (The spreadsheet data is converted on the fly via the rdf123 web service [10].)
i. FIND OBSERVATIONS OF BIOLOGICAL INDICATORS.
Of course, this should retrieve almost all the EPA spreadsheet data. Interestingly, when we run it without the tree of life data, it returns only a small number of hits. This is because the ontology assigns taxa to the various sensitivity classes at (typically) the family or order level, whereas the reporting is done (typically) at the genus or family level. Without the tree of life, nothing connects a reported genus to the family or order that carries the sensitivity class. When we do include the tree of life in the dataset, we get thousands of hits (as expected).
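The effect is easy to reproduce in miniature. The sketch below is not our actual query or reasoner; it is a toy under stated assumptions: each class has a single parent, the ontology assigns mayflies at the order level (Ephemeroptera), and the observation counts are invented. It walks the subclass chain upward and keeps only observations whose taxon reaches BiologicalIndicator.

```python
# Indicator ontology links (the Ephemeroptera assignment is an assumed example).
INDICATOR_LINKS = {
    "Ephemeroptera": "SensitiveAquaticThing",
    "SensitiveAquaticThing": "AquaticBiologicalIndicator",
    "AquaticBiologicalIndicator": "BiologicalIndicator",
}

# Tree of life links: genus -> family -> order.
TREE_OF_LIFE_LINKS = {
    "Baetis": "Baetidae",
    "Baetidae": "Ephemeroptera",
}

# Invented observations: (reported taxon, count).
OBSERVATIONS = [("Baetis", 12), ("Ephemeroptera", 3)]


def ancestors(taxon, parent_of):
    """Follow single-parent subclass links upward, guarding against cycles."""
    seen = []
    while taxon in parent_of and parent_of[taxon] not in seen:
        taxon = parent_of[taxon]
        seen.append(taxon)
    return seen


def indicator_observations(observations, parent_of):
    """Keep observations whose taxon is a (transitive) BiologicalIndicator."""
    return [
        (taxon, count)
        for taxon, count in observations
        if "BiologicalIndicator" in ancestors(taxon, parent_of)
    ]


without_tree = indicator_observations(OBSERVATIONS, INDICATOR_LINKS)
with_tree = indicator_observations(
    OBSERVATIONS, {**INDICATOR_LINKS, **TREE_OF_LIFE_LINKS}
)
print(without_tree)
print(with_tree)
```

Without the tree of life links, only the order-level record matches; the genus-level Baetis record is found only once the genus-to-order chain is in the dataset.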
ii. FIND INVASIVE PREDATORS OF THE MACROINVERTEBRATES OF NORTH CAROLINA, WITH LOCATION.
This results in a number of hits [11]. But it’s not really scientifically interesting, since most fish will eat most insects, provided that the fish and insect are co-located. This query is designed more to show off the integration possibilities provided by our approach. Useful queries on the data remain a goal (see below).
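For readers who want to see the shape of the integration, here is a rough sketch of this query as a plain join across three sources. All names and data are hypothetical placeholders (fish_a, insect_x, site_1 and so on), and the real query runs over RDF rather than Python lists.

```python
# Hypothetical stand-ins for the three sources being combined.
INVASIVE_SPECIES = {"fish_a"}  # from the invasives ontology
FOOD_WEB_LINKS = [  # (predator, prey) pairs from the food web data
    ("fish_a", "insect_x"),
    ("fish_b", "insect_x"),
    ("fish_a", "insect_y"),
]
PREY_OBSERVATIONS = [  # (taxon, site) pairs from the spreadsheet data
    ("insect_x", "site_1"),
    ("insect_x", "site_2"),
    ("insect_z", "site_1"),
]


def invasive_predators_with_location(invasive, links, observations):
    """Return sorted (predator, prey, site) triples for invasive predators
    whose prey was observed at some site."""
    sites_by_prey = {}
    for prey, site in observations:
        sites_by_prey.setdefault(prey, set()).add(site)

    hits = set()
    for predator, prey in links:
        if predator in invasive:
            for site in sites_by_prey.get(prey, ()):
                hits.add((predator, prey, site))
    return sorted(hits)


hits = invasive_predators_with_location(
    INVASIVE_SPECIES, FOOD_WEB_LINKS, PREY_OBSERVATIONS
)
print(hits)
```

Note that the location here is where the prey was observed; the sketch does not check that the predator occurs at the same site.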
Near-term Future Work
i. Calculate biotic indices for a variety of un-assessed water bodies. Often, if macroinvertebrate data is collected, it is for the express purpose of calculating a biotic index, and so our approach adds nothing. We do have some Sierra Nevada food web data, soon to be published in RDF, where the macroinvertebrate community data falls out of the food web. So this is probably where we’ll start.
ii. Get data on chemical pollutants. This will enable some interesting correlations, not only with the macroinvertebrate data but also with presence/absence data on invasive fish.
iii. Expand the indicator ontology. If you look at the ontology [5], you’ll see that it has plenty of room to grow. We’d like to add concepts like NonBiologicalAquaticIndicator, AirQualityIndicator, etc. But we won’t add these concepts until we see the actual instance data behind them.
iv. Fix some mistakes in the ontology. For example, bloodworm midges are currently equated with Chironomidae, which is not accurate.
Comments, Suggestions, Better Ideas
Please.
References
1. http://www.uwsp.edu/cnr/research/gshepard/History/History.htm
2. http://www.washjeff.edu/Chartiers/Chartier/BIOTIC.html
3. http://watermonitoring.uwex.edu/pdf/level1/data-Biotic.pdf
4. http://www.epa.gov/bioindicators/html/invertebrate.html
also, e.g. http://www.epa.gov/bioindicators/html/stoneflies.html
5. http://spire.umbc.edu/ontologies/IndicatorOntology.owl
6. http://spire.umbc.edu/ontologies/InvasivesOntology.owl
also, http://spire.umbc.edu/ontologies/lists/ISSG-GISD.owl
7. http://spire.umbc.edu/ont/ethan.php
8. http://spire.umbc.edu/ont/allFoodWebStudies.owl
9. http://rdf123.umbc.edu/server/?src=http://www.csee.umbc.edu/~jsachs/water_bugs_big.csv
10. http://rdf123.umbc.edu/
11. http://cs.umbc.edu/~jsachs/InvasivePredators.html (These are distinct predator/prey combinations, without locations.)
Sep 7, 05:21 PM
Avoiding Islands
I’ve been inspired to start this blog by learning of the Linking Open Data project. This is a Semantic Web project to create an interlinked data commons on the web, using RDF to link across open datasets. The project is still young, but has grown impressively. The figure at right is their diagram of the currently linked datasets. The whole network contains well over 2 billion RDF triples, and the datasets are interlinked by nearly a million RDF links.
Though this network is rich, as of now it contains little in the way of scientific datasets. In the course of the Spire project, we would like to begin extending this network to biodiversity and natural history information sets. There is already a great deal of such content on the web; this catalog from TDWG lists 556 different biodiversity informatics projects to date.
The trouble with this set of biodiversity informatics projects is that the vast majority of them are islands, with little means of networking their data across projects. History is partly to blame here: many of these projects were started before the rise of the Programmable Web, and the notion of supplying open web APIs for data access was simply not part of the developers’ thinking.
In the posts that follow, we will be exploring tools, projects, and other advances that may help to lead to a well-developed semantic network for natural history information.

