May 29, 09:41 AM
An Indicator Experiment
Background
The use of macroinvertebrates as biological indicators of water quality has a long history [1], and variants of the biotic index developed by William Beck in the ’50s [2] are in wide use in stream and river monitoring efforts today. In [3], for example, macroinvertebrates are divided into four classes: those that are sensitive to pollutants (e.g. mayflies), semi-sensitive to pollutants (e.g. dragonflies), semi-tolerant of pollutants (e.g. right-side opening snails), and tolerant of pollutants (e.g. left-side opening snails, the filthy beasts). The EPA has an excellent collection of pages describing the pollution sensitivity of each taxon [4]. The surveyor counts the number of taxa in each class and plugs the results into a simple formula to come up with the index.
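As a rough illustration, the calculation might look like the Python sketch below. It assumes a weighting scheme of the kind used on citizen-monitoring forms (sensitive taxa weighted 4, semi-sensitive 3, semi-tolerant 2, tolerant 1, with the weighted sum divided by the total number of taxa); that weighting is an assumption here, so check [3] for the exact formula and rating thresholds. The names WEIGHTS and biotic_index are hypothetical.

```python
from typing import Mapping

# Assumed weights per sensitivity class (an assumption; see [3] for the real values).
WEIGHTS = {
    "Sensitive": 4,
    "SemiSensitive": 3,
    "SemiTolerant": 2,
    "Tolerant": 1,
}


def biotic_index(taxa_counts: Mapping[str, int]) -> float:
    """Weighted mean sensitivity of the taxa found at a site.

    taxa_counts maps a sensitivity class name to the number of distinct
    taxa observed in that class (not the number of individuals).
    """
    unknown = set(taxa_counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown sensitivity classes: {sorted(unknown)}")
    total_taxa = sum(taxa_counts.values())
    if total_taxa == 0:
        raise ValueError("no taxa observed")
    weighted = sum(WEIGHTS[cls] * n for cls, n in taxa_counts.items())
    return weighted / total_taxa


# Hypothetical survey: 3 sensitive, 2 semi-sensitive, 2 semi-tolerant, 1 tolerant taxa.
print(biotic_index({"Sensitive": 3, "SemiSensitive": 2,
                    "SemiTolerant": 2, "Tolerant": 1}))  # 23 / 8 = 2.875
```

A higher value means the site supports more pollution-sensitive taxa, which is read as better water quality.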
Ontology
This seemed like a useful collection of concepts to represent in OWL, and so we created an indicator ontology [5], which defines the following class structure:
<BiologicalIndicator>
<AquaticBiologicalIndicator> <rdfs:subClassOf> <BiologicalIndicator>
<SensitiveAquaticThing> <rdfs:subClassOf> <AquaticBiologicalIndicator>
<SemiSensitiveAquaticThing> <rdfs:subClassOf> <AquaticBiologicalIndicator>
<SemiTolerantAquaticThing> <rdfs:subClassOf> <AquaticBiologicalIndicator>
<TolerantAquaticThing> <rdfs:subClassOf> <AquaticBiologicalIndicator>
Then we asserted each of the assorted indicator taxa to be a subclass of one of SensitiveAquaticThing, SemiSensitiveAquaticThing, SemiTolerantAquaticThing, or TolerantAquaticThing.
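Because the taxa are asserted as subclasses rather than instances, finding a taxon's sensitivity class means following rdfs:subClassOf transitively, which is what an OWL reasoner does for us. The Python sketch below imitates that step with a plain breadth-first walk over a dictionary. The two taxon assertions (Ephemeroptera, Anisoptera) and all function names are illustrative assumptions, not the contents of [5].

```python
from collections import deque

# subClassOf assertions as child -> parents. The class names mirror the
# hierarchy above; the two taxon assertions are illustrative only and are
# not copied from the published ontology [5].
SUBCLASS_OF = {
    "AquaticBiologicalIndicator": {"BiologicalIndicator"},
    "SensitiveAquaticThing": {"AquaticBiologicalIndicator"},
    "SemiSensitiveAquaticThing": {"AquaticBiologicalIndicator"},
    "SemiTolerantAquaticThing": {"AquaticBiologicalIndicator"},
    "TolerantAquaticThing": {"AquaticBiologicalIndicator"},
    "Ephemeroptera": {"SensitiveAquaticThing"},   # mayflies
    "Anisoptera": {"SemiSensitiveAquaticThing"},  # dragonflies
}

SENSITIVITY_CLASSES = {
    "SensitiveAquaticThing",
    "SemiSensitiveAquaticThing",
    "SemiTolerantAquaticThing",
    "TolerantAquaticThing",
}


def superclasses(cls, subclass_of):
    """All classes reachable from cls via subClassOf (transitive closure)."""
    seen = set()
    queue = deque(subclass_of.get(cls, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(subclass_of.get(parent, ()))
    return seen


def sensitivity_of(taxon, subclass_of=SUBCLASS_OF):
    """The sensitivity classes that taxon falls under, if any."""
    return superclasses(taxon, subclass_of) & SENSITIVITY_CLASSES


print(sensitivity_of("Ephemeroptera"))  # {'SensitiveAquaticThing'}
print("BiologicalIndicator" in superclasses("Anisoptera", SUBCLASS_OF))  # True
```

A dictionary of parent sets is used because OWL allows a class to have several superclasses; the walk handles that without extra work.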
Queries
Here are a couple of queries we’ve run that combine the indicator ontology, our invasives ontology [6], our tree of life ontology [7], our food web data [8], and the RDF representation of some EPA spreadsheet data on macroinvertebrate counts from North Carolina [9]. (The spreadsheet data is converted to RDF on the fly via the rdf123 web service [10].)
i. FIND OBSERVATIONS OF BIOLOGICAL INDICATORS.
Of course, this should retrieve almost all of the EPA spreadsheet data. Interestingly, when we run it without the tree of life data, it returns only a small number of hits. This is because the indicator ontology assigns taxa to the sensitivity classes at (typically) the family or order level, whereas the survey data reports them at (typically) the genus or family level. The tree of life supplies the subclass links from genus to family to order, so once it is included in the dataset, a genus-level observation can be inferred to belong to a sensitivity class, and we get thousands of hits (as expected).
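The effect of leaving out the tree of life can be reproduced in miniature. In this hypothetical Python sketch (the taxa, the axioms, and the helper names are made up for illustration, not taken from [5], [7] or [9]), the indicator axioms sit at the order and family level while most observations are at the genus level, so only the observation that names an indicator taxon directly is matched until the tree of life links are merged in.

```python
from collections import deque

# Illustrative fragments only (hypothetical, not the real datasets).
# The indicator ontology assigns sensitivity at the order/family level...
INDICATOR_AXIOMS = {
    "Ephemeroptera": {"SensitiveAquaticThing"},  # mayfly order
    "Physidae": {"TolerantAquaticThing"},        # left-opening snails
}
# ...the tree of life links lower ranks to higher ones...
TREE_OF_LIFE = {
    "Baetis": {"Baetidae"},
    "Baetidae": {"Ephemeroptera"},
    "Physa": {"Physidae"},
}
# ...and the survey reports taxa mostly at the genus or family level.
OBSERVED_TAXA = ["Baetis", "Baetidae", "Physa", "Physidae"]

SENSITIVITY_CLASSES = {"SensitiveAquaticThing", "SemiSensitiveAquaticThing",
                       "SemiTolerantAquaticThing", "TolerantAquaticThing"}


def merge(*graphs):
    """Union of several child -> parents subclass maps."""
    merged = {}
    for graph in graphs:
        for child, parents in graph.items():
            merged.setdefault(child, set()).update(parents)
    return merged


def is_indicator(taxon, subclass_of):
    """True if some sensitivity class is reachable from taxon."""
    seen, queue = set(), deque([taxon])
    while queue:
        cls = queue.popleft()
        if cls in SENSITIVITY_CLASSES:
            return True
        for parent in subclass_of.get(cls, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def count_hits(subclass_of):
    """Number of observed taxa that resolve to a sensitivity class."""
    return sum(is_indicator(t, subclass_of) for t in OBSERVED_TAXA)


print(count_hits(INDICATOR_AXIOMS))                       # 1 (only Physidae)
print(count_hits(merge(INDICATOR_AXIOMS, TREE_OF_LIFE)))  # 4
```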
ii. FIND INVASIVE PREDATORS OF THE MACROINVERTEBRATES OF NORTH CAROLINA, WITH LOCATION.
This returns a number of hits [11]. But it’s not really scientifically interesting, since most fish will eat most insects, provided that the fish and insects are co-located. The query is designed more to show off the integration possibilities of our approach. Scientifically useful queries on the data remain a goal (see below).
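Under the hood, a query like this is a join across datasets on shared species identifiers. The Python sketch below shows the shape of that join over small in-memory sets. The species names are placeholders, and taking the location from the prey observation is an assumption about how such a query could be written, not a description of the actual query behind [11].

```python
# Hypothetical toy data; the names are placeholders, not records from the
# invasives ontology [6], the food web data [8] or the EPA data [9].
INVASIVE = {"FishA", "FishC"}
EATS = {("FishA", "Baetis"), ("FishB", "Baetis"), ("FishC", "Physa"),
        ("FishC", "FishB")}
MACROINVERTEBRATES = {"Baetis", "Physa"}
# (taxon, site) pairs from the survey.
OBSERVED_AT = {("Baetis", "Site 1"), ("Physa", "Site 2"), ("Baetis", "Site 3")}


def invasive_predators_with_location():
    """Rows of (predator, prey, site): an invasive predator of a
    macroinvertebrate, plus a site where that prey was observed."""
    rows = set()
    for predator, prey in EATS:
        if predator in INVASIVE and prey in MACROINVERTEBRATES:
            for taxon, site in OBSERVED_AT:
                if taxon == prey:  # join on the shared prey identifier
                    rows.add((predator, prey, site))
    return sorted(rows)


for row in invasive_predators_with_location():
    print(row)
```

FishB is dropped because it is not invasive, and the FishC/FishB link is dropped because FishB is not a macroinvertebrate.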
Near-term Future Work
i. Calculate biotic indices for a variety of unassessed water bodies. Often, when macroinvertebrate data is collected, it is for the express purpose of calculating a biotic index, so in those cases our approach adds nothing. However, we do have some Sierra Nevada food web data, soon to be published in RDF, from which the macroinvertebrate community data falls out as a by-product. So this is probably where we’ll start.
ii. Get data on chemical pollutants. This will enable some interesting correlations with not only the macroinvertebrate data but also the presence/absence data on invasive fish.
iii. Expand the indicator ontology. If you look at the ontology [5], you’ll see that it has plenty of room to grow. We’d like to add concepts such as NonBiologicalAquaticIndicator and AirQualityIndicator, but we won’t add these concepts until we see actual instance data behind them.
iv. Fix some mistakes in the ontology. For example, bloodworm midges are currently equated with Chironomidae, which is not accurate: bloodworms are the red larvae of only some chironomid midges, not the whole family.
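A minimal sketch of how community data might fall out of a food web: every taxon that appears as a consumer or a resource in a food web link is part of the community, and those with a known sensitivity class can be tallied per class, which is the input a biotic index formula needs. The links, the taxon-to-class lookup, and the function name are hypothetical and not taken from the Sierra Nevada data.

```python
from collections import Counter

# Hypothetical food web links (consumer, resource) and a taxon -> sensitivity
# class lookup, assumed to be already resolved from the indicator ontology.
FOOD_WEB = [
    ("Trout", "Baetis"),
    ("Trout", "Hydropsyche"),
    ("Hydropsyche", "Algae"),
    ("Baetis", "Algae"),
    ("Trout", "Physa"),
]
SENSITIVITY = {
    "Baetis": "Sensitive",
    "Hydropsyche": "SemiSensitive",
    "Physa": "Tolerant",
}


def community_tally(links, sensitivity):
    """Count distinct indicator taxa per sensitivity class among all food web nodes."""
    nodes = {taxon for link in links for taxon in link}
    return Counter(sensitivity[t] for t in nodes if t in sensitivity)


print(community_tally(FOOD_WEB, SENSITIVITY))
```

Collecting nodes into a set first means a taxon that appears in several links is counted once, matching the "number of taxa" (not individuals) that the index uses.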
Comments, Suggestions, Better Ideas
Please.
References
1. http://www.uwsp.edu/cnr/research/gshepard/History/History.htm
2. http://www.washjeff.edu/Chartiers/Chartier/BIOTIC.html
3. http://watermonitoring.uwex.edu/pdf/level1/data-Biotic.pdf
4. http://www.epa.gov/bioindicators/html/invertebrate.html
also, e.g., http://www.epa.gov/bioindicators/html/stoneflies.html
5. http://spire.umbc.edu/ontologies/IndicatorOntology.owl
6. http://spire.umbc.edu/ontologies/InvasivesOntology.owl
also, http://spire.umbc.edu/ontologies/lists/ISSG-GISD.owl
7. http://spire.umbc.edu/ont/ethan.php
8. http://spire.umbc.edu/ont/allFoodWebStudies.owl
9. http://rdf123.umbc.edu/server/?src=http://www.csee.umbc.edu/~jsachs/water_bugs_big.csv
10. http://rdf123.umbc.edu/
11. http://cs.umbc.edu/~jsachs/InvasivePredators.html (These are distinct predator/prey combinations, without locations.)

