1 November 2007
Filling A Niche
There has never been a widely adopted RDF vocabulary for representing geographic shapes. Four years ago the W3C came up with a Basic Geo Vocabulary, which was restricted to representing points via their latitude and longitude and gave no specification for representing line and polygon features.
The W3C Geospatial Incubator Group has just published as their final report a pair of documents on Geospatial Vocabularies and Geospatial Ontologies. In so doing they have come up with a GeoOWL ontology that includes classes for points, lines, polygons, and boxes. It is based largely on the GeoRSS specification for encoding geographical information in RSS feeds. The ontology does not have a model for spatial relationships, e.g. being able to say a feature is contained within another. Nevertheless, being able to associate geographic shapes with any other entity solves many of the semantic modelling problems associated with biodiversity data.
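To make the four shape classes concrete, here is a minimal sketch of how their coordinates might be serialized as GeoRSS-Simple-style strings (space-separated latitude/longitude pairs, with a box given as its lower corner followed by its upper corner). The function names are my own, and the layout reflects my reading of GeoRSS Simple rather than anything quoted from the GeoOWL report, so treat it as an illustration only.

```python
# Hypothetical helpers: format shapes as GeoRSS-Simple-style coordinate strings.
# Assumption: coordinates are WGS84 decimal degrees, written as "lat lon" pairs.

def point(lat, lon):
    """A single position: 'lat lon'."""
    return f"{lat} {lon}"


def line(coords):
    """Two or more (lat, lon) pairs joined by spaces."""
    if len(coords) < 2:
        raise ValueError("a line needs at least two positions")
    return " ".join(point(lat, lon) for lat, lon in coords)


def polygon(coords):
    """A closed ring; the first position is repeated at the end if needed."""
    ring = list(coords)
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])
    if len(ring) < 4:
        raise ValueError("a polygon needs at least three distinct positions")
    return " ".join(point(lat, lon) for lat, lon in ring)


def box(south, west, north, east):
    """Lower corner followed by upper corner."""
    return f"{south} {west} {north} {east}"
```

Strings like these could then be attached to any resource as property values, which is what lets a shape be associated with an arbitrary entity such as a species range or a survey site.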
14 September 2007
TOAD Modelling
A basic query to ask in biogeography is ‘what species live here?’. There are a lot of data resources to answer that question. I can think of at least four different types. First, there are direct records of observations of species, one online example being the citizen science effort eBird, which allows birdwatchers to record on the Web what birds they’ve seen. Second, there are coarse-scale range maps, an example being the map in this species account on the mountain lion. Third, there are species lists collected over a substantial period of time, such as lists from parks and nature reserves. Finally, there are probabilistic distribution models generated by tools such as openModeller.
We have just started work on a Semantic Web application that will return information in RDF on species status and distribution for a selected geographic area. The aim of the application is to provide a framework for amalgamating the four types of data sources above into some sort of uniform species list. (I’m calling this TOAD data, for Taxon Observation And Distribution. A TOAD ontology may be in the not-too-distant future.)
I think the basic data granule here comes down to who-where-what-when? That is, a combination of data source by geographic region by taxonomic entity by time period. For now the idea is to concatenate the who-where-what-when parameters into a single URI, thus designating it as a resource over which one can return an RDF description. Variants of this URI pattern will return information such as species lists for a particular region (either according to one data source or across all data sources handled by the system), sets of observation data for a particular species, or metadata for a data source.
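As a rough illustration of the URI idea, the sketch below concatenates whichever of the who-where-what-when parameters are supplied into a single path. The base address, the key/value path layout, and the function name are all hypothetical; the actual pattern has not been settled.

```python
from urllib.parse import quote

# Hypothetical base address; not a real service.
BASE = "http://example.org/toad"


def toad_uri(who=None, where=None, what=None, when=None, base=BASE):
    """Build a URI from the who-where-what-when parameters that are given.

    Omitted parameters are left out of the path, which yields the broader
    variants (e.g. only `where` for a regional species list).
    """
    parts = [base]
    for key, value in (("who", who), ("where", where),
                       ("what", what), ("when", when)):
        if value is not None:
            # Percent-encode each value so spaces and slashes stay inside one segment.
            parts.append(f"{key}/{quote(str(value), safe='')}")
    return "/".join(parts)
```

Under this layout, supplying all four parameters names a single data granule, while supplying only `where` names a species list for a region across all data sources.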
10 September 2007
Naming The World
Last winter I gave a talk about gazetteers to a geography seminar here. I mentioned some of the important online gazetteer resources (e.g. the Alexandria Digital Library gazetteer or the Getty Thesaurus of Geographic Names), but somehow hadn’t yet run across perhaps the most interesting web-oriented gazetteer project to date, GeoNames.
The GeoNames gazetteer project has been around for a couple of years and currently contains more than eight million geographic names representing 6.4 million unique geographic entities. The project started off by assembling some of the major public domain gazetteer resources such as the USGS Geographic Names Information System for place names in the US and the National Geospatial-Intelligence Agency database for non-US names, but has expanded its content to include many other sources as well. GeoNames provides an elaborate RESTful API to its data and in a wiki-like fashion allows people to enter or correct placenames on their own.
A year ago, Bernard Vatant set about the task of integrating GeoNames into the Semantic Web, and with a little help from Harry Chen put together an ontology for placenames in GeoNames together with a URI scheme for the placenames backed by Semantic Web-friendly content negotiation.
From the point of view of somebody who wants to refer to placenames in a Semantic Web document, this set of URIs is a great resource. As a case in point, we are interested in providing species lists in RDF for various geographic regions. Often such species lists refer to named regions rather than geographic coordinates (for instance “Yolo County, California”). It is now straightforward to come up with a URI for such a region — Yolo County is represented by http://sws.geonames.org/5410882/. Even better, the fact that users can add their own placenames allows the creation of good URIs for locations with species lists that aren’t yet in GeoNames.
GeoNames is getting ever more comprehensive, though. It’s fun to see that the building where I’m writing this now, Wickson Hall, already has a GeoNames URI.
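For anyone who wants to try the content negotiation mentioned above, here is a minimal sketch using only the Python standard library. It asks for RDF/XML through the Accept header; the exact media types the server honours and the shape of its response are assumptions on my part, so check the result before relying on it.

```python
from urllib.request import Request, urlopen

# URI for Yolo County, as given in the post above.
YOLO_COUNTY = "http://sws.geonames.org/5410882/"


def build_rdf_request(uri):
    """Prepare a request that asks the server for an RDF/XML representation."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(uri, timeout=10):
    """Send the request and return the response body as text.

    Assumes the server answers with UTF-8 encoded RDF/XML, possibly after
    a redirect (urlopen follows redirects by default).
    """
    with urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_rdf(YOLO_COUNTY)` should return a description of the place rather than a web page meant for people, provided the service behaves as described.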
7 September 2007
Avoiding Islands
I’ve been inspired to start this blog by learning of the Linking Open Data project. This is a Semantic Web project to create an interlinked data commons on the web, using RDF to link across open datasets. The project is still young, but has grown impressively. The figure at right is their diagram of the currently linked datasets. The whole network contains well over 2 billion RDF triples, and the datasets are connected by nearly a million RDF links.
Though this network is rich, as of now it contains little in the way of scientific datasets. In the course of the Spire project, we would like to begin extending this network to biodiversity and natural history information sets. There is already a great deal of such content on the web; this catalog from TDWG lists 556 different biodiversity informatics projects to date.
The trouble with this set of biodiversity informatics projects is that the vast majority of these are islands, with little means to network these data across projects. History is partly to blame here — many of these projects were started before the rise of the Programmable Web, and the notion of supplying open web APIs for data access was simply not part of the developers’ thinking.
In the posts that follow, we will be exploring tools, projects, and other advances that may help to lead to a well-developed semantic network for natural history information.

