Sep 28, 09:27 AM

Report from TDWG: RDF is not dead

Ontologies and RDF were the big buzzwords at this year’s TDWG meeting in Bratislava, Slovakia. This is surprising because two years ago, at the St. Petersburg meeting, I felt like an outside semantic web agitator, conspiring in a smoky bar with a few like-minded colleagues such as Kathi Schleidt. Also, I had just flown to Slovakia from a wedding at which two people from the IT world independently said, “RDF, isn’t that dead?”

During the last year, TDWG has been exploring the potential of richer semantics, led by Roger Hyam and others. They have outlined a technical architecture implying that all TDWG standards should be enabled for W3C semantic web technologies.

Some background for non-TDWGians. The Taxonomic Databases Working Group (pronounced “TAD wig”), recently renamed Biodiversity Information Standards, is the international body that has been working for several decades to define standards for data exchange among natural history museums. They develop and approve standard schemas (e.g. ABCD) and protocols for exchanging biological specimen data (e.g. TAPIR) and other kinds of related taxonomic information (e.g. descriptive data and literature). Applications that use these standards can then be implemented. TDWG standards make portals like GBIF possible; they are also supposed to underpin the Encyclopedia of Life. There is discussion of enlarging the scope of TDWG to go beyond the museum-oriented information it has focused on in the past. This makes sense to many of us because of the primary importance of biological taxa in many related fields such as ecology.

Many in TDWG are still trying to understand why it may be important to provide RDF-based solutions, especially when years have been spent developing XML schemas. The most compelling argument to me is that over and over again, individual projects end up solving their particular problems (a need for flexible schemas, an inability to effectively map XML schemas) by independently deriving RDF-like solutions. Specific examples of which I am aware are the ALTERNet project in Austria and the Spider Assembling the Tree of Life project in the United States.
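
To make the “RDF-like” point concrete, here is a minimal sketch in plain Python. It does not reflect how ALTERNet or Spider AToL actually built their solutions; it only illustrates the general idea of storing records as subject-predicate-object triples, so that two providers with different fields can be merged without first agreeing on a schema. Every identifier and value in it (`specimen:1`, `hasTaxon`, `collectedIn`, `habitat`) is hypothetical and not taken from any TDWG standard.

```python
# Hypothetical illustration: records as (subject, predicate, object) triples.
# None of these identifiers come from a real TDWG standard.

def to_triples(record_id, record):
    """Flatten a dict-like record into a set of triples."""
    return {(record_id, field, value) for field, value in record.items()}

# Two providers describe the same specimen using different fields.
museum_a = to_triples(
    "specimen:1",
    {"hasTaxon": "Araneus diadematus", "collectedIn": "Slovakia"},
)
museum_b = to_triples(
    "specimen:1",
    {"hasTaxon": "Araneus diadematus", "habitat": "meadow"},
)

# Merging is a set union; no shared schema has to be negotiated first,
# and the duplicated hasTaxon statement collapses into one triple.
merged = museum_a | museum_b

def describe(triples, subject):
    """Collect everything known about one subject as a dict."""
    return {p: o for s, p, o in triples if s == subject}

print(describe(merged, "specimen:1"))
```

A fixed XML schema would need a revision to accept the extra `habitat` field; here it is just one more statement about the same subject.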

Over the next few weeks I’ll flesh out my thoughts coming away from TDWG 2007. At the moment, they fall into two categories of ideas.

  • How do the Spire project’s ETHAN (our Evolutionary Trees and Natural History Ontology) and related products (InvasivesOntology, SpireEcoConcepts, etc.) relate to existing and proposed TDWG standards? What has Spire learned from its modeling and tool-building that might be helpful to the overall effort? We have experience to share as interest groups with complex schemas like SDD (Structured Descriptive Data) and TaXMLit (Taxonomic Literature) contemplate how to proceed.
  • What architecture really makes sense for a semantic TDWG? Spire models its architecture on a highly distributed web document push model, indexed by global semantic search engines like Swoogle. In contrast, TDWG tends to assume that content providers form an organized network where consumers pull data directly from nodes using mutually agreed upon protocols.
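
The contrast in the second point can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of how Swoogle, TAPIR, or any TDWG network works internally: the `Node`, `SearchIndex`, and `pull_search` names and the sample data are all hypothetical. In the first style a consumer contacts every known node directly; in the second, providers simply publish documents and a separate index, standing in for a search engine, is what gets queried.

```python
# Hypothetical sketch of the two architectures; all names are illustrative.

class Node:
    """A content provider holding a few (taxon, text) documents."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents

    def query(self, taxon):
        """Answer a query sent directly to this node."""
        return [text for t, text in self.documents if t == taxon]


def pull_search(nodes, taxon):
    """Networked style: the consumer asks each known node in turn."""
    results = []
    for node in nodes:
        results.extend(node.query(taxon))
    return results


class SearchIndex:
    """Published-document style: an index gathers documents once,
    and consumers query the index instead of the providers."""

    def __init__(self):
        self.by_taxon = {}

    def ingest(self, node):
        for taxon, text in node.documents:
            self.by_taxon.setdefault(taxon, []).append(text)

    def search(self, taxon):
        return list(self.by_taxon.get(taxon, []))


nodes = [
    Node("a", [("Araneus", "specimen record"), ("Pinus", "plot survey")]),
    Node("b", [("Araneus", "food web link")]),
]

index = SearchIndex()
for node in nodes:
    index.ingest(node)

print(pull_search(nodes, "Araneus"))
print(index.search("Araneus"))
```

Both routes return the same answers here. The difference is in who has to know about whom: the first needs a registry of nodes and a shared protocol, while the second only needs documents that an indexer can find.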

For now, I’ll just note that a move to the semantic web is hindered by two significant barriers. First, we don’t have really impressive examples of the power of the semantic approach. Computer scientists talk about wines and FOAF, which is inspiring but not entirely convincing to biologists. Spire has done some proof-of-concept work using real data to answer fake questions. We really need to tackle something where the result is publishable in a scientific venue. Until we get this, even I have to remain skeptical that all this headache-inducing work is worth it. The SEEK project is working on semantic data flows in ecology; maybe they can point to a simple success story that will resonate with the taxonomic community.

Second, we don’t have user-friendly tools that biologists, or even non-ontologist developers, can use to jump into the semantic fray. Bob Morris spoke about this and I agree. We’ve worked a little on this in Spire with Spotter and RDF123, and I’ve done a little on Leptree.net. There is going to be a SWUI workshop at CHI (Computer-Human Interaction) next year, so I like to think the community is working to remedy this.

Posted by Cyndy Parr at 09:27 AM
