Take 4 code fellows with different levels of exposure to RDF and linked data and different technical skills. Add several datasets of varying size, quality and type. Stir for 5 weeks. It turns out it was actually a good recipe. My role as technical Code Fellow turned into that of overseer, quality control and ontologist. What’s an ontologist? Someone who creates ontologies, obviously! A major part of the project was to fit a model to the datasets so that they could be synchronised across the three councils involved. So that we could re-use this model, we created an ontology for them. In database terms, think of it as a schema: it describes the types of properties that a dataset can have. It’s not an exclusive list, and in RDF you can basically do what you want, but if you stick to the rules we defined in these ontologies then linking the datasets across councils becomes much easier.
Creating a linked data model is not that dissimilar to the way you would model any software project. Armed with an initial ontology and model for streetlights, it was time to create the triples. An RDF-based dataset is basically a set of facts, each of which is described as a triple: something has a property with a value. In RDF speak these translate as subject (the something), predicate (the property) and object (the value of the property); you will often see this written in shorthand as SPO.
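To make the subject, predicate and object idea concrete, here is a minimal sketch in plain Python that holds a couple of facts about a streetlight as triples and writes them out one per line in an N-Triples style. The example.org addresses and the columnHeight property are made up for illustration and are not the ones used in the project; only the rdf:type address is the standard one.

```python
from typing import NamedTuple

# Standard RDF "is a" property.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Triple(NamedTuple):
    subject: str    # the "something" (an IRI)
    predicate: str  # the property (an IRI)
    obj: str        # the value: an IRI or a plain literal


def is_iri(term: str) -> bool:
    # Simplification for this sketch: anything that looks like a web address is an IRI.
    return term.startswith(("http://", "https://"))


def to_ntriples(t: Triple) -> str:
    """Render one triple as a single N-Triples style line."""
    if is_iri(t.obj):
        obj = f"<{t.obj}>"
    else:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Hypothetical identifiers, for illustration only.
light = "http://example.org/id/streetlight/1"
facts = [
    Triple(light, RDF_TYPE, "http://example.org/def/Streetlight"),
    Triple(light, "http://example.org/def/columnHeight", "8"),
]

for fact in facts:
    print(to_ntriples(fact))
```

A real project would normally lean on an RDF library or a tool like Raptor rather than hand-rolled string formatting, which skips datatypes, language tags and most escaping rules.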
There are a few ways to express these triples and each of the code fellows preferred a different format. So we ended up with datasets described in RDF/XML, N3 and Turtle. Thankfully the tools available, mainly Raptor, can handle switching between them, although translating can get a bit confusing. I ended up describing things in Turtle for no better reason than one of the example ontologies I used was written in it. I found myself switching between formats depending on what the source was that I was looking at.
The design process was very organic. As we noticed a new property was needed we would just add it to the dataset and let the ontology catch up later. This agile approach was necessary or else we would have wasted too much time debating things. It’s also one of the real strengths of using RDF.
Communication flew thick and fast, with emails a constant stream as we discussed modelling issues between ourselves and SWIRRL, the technical consultants and quad store providers. Although the RDF tools have improved hugely over the years, the knowledge base seems to have decayed, so it always seemed that we were forging new ground. As we discovered, you should always describe things as simply as possible. If it looks too complicated then it probably is. When you describe something in RDF a good technique is to say what the triples mean as a sentence. If it doesn’t make sense then you probably need to think again and re-model.
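The "say it as a sentence" check can even be roughed out in code. The sketch below is only an illustration of the idea, not something we used on the project: it takes the last part of each address, splits camelCase names into words and strings the three parts together. The example.org addresses and the maintainedBy property are hypothetical.

```python
import re


def local_name(iri: str) -> str:
    """Return the last path or fragment segment of an IRI."""
    return re.split(r"[/#]", iri.rstrip("/#"))[-1]


def words(name: str) -> str:
    """Split camelCase into lower-case words: 'maintainedBy' -> 'maintained by'."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name).lower()


def as_sentence(subject: str, predicate: str, obj: str) -> str:
    """Read a triple aloud as 'subject predicate object.'"""
    # Literal values are used as they are; IRIs are shortened to their local name.
    value = words(local_name(obj)) if obj.startswith("http") else obj
    return f"{words(local_name(subject))} {words(local_name(predicate))} {value}."


# Hypothetical identifiers, for illustration only.
print(as_sentence(
    "http://example.org/id/streetlight-42",
    "http://example.org/def/maintainedBy",
    "http://example.org/id/council-a",
))
```

If the result reads like nonsense, that is usually a hint that the property is pointing the wrong way round or is hanging off the wrong thing.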
The key to linking is to use the same properties (predicates) and values (objects) that other people use. With the ‘somethings’ (subjects) we were often the first to describe them using RDF, so we got the privilege of defining new things like Streetlight. In years to come we may well get the blame as well! The main sources for predicates and values were the Office for National Statistics, Ordnance Survey and data.gov.uk.
In addition to ontologies, we had to get to grips with the ‘data cube’. Basically this lets you publish spreadsheet-like data as RDF. Then you can publish things like payments data or council tax as a set of observations, each of which is specific to a point in time. This was the biggest head scratcher, and Bill Roberts from SWIRRL, the Open Data Communities linked data site and the data.gov.uk payments ontology proved invaluable sources of information.
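As a rough illustration of the data cube idea, the sketch below turns spreadsheet-style rows into observations, one per row, each tied to a date. The qb:Observation and qb:dataSet terms come from the W3C RDF Data Cube vocabulary; the example.org addresses, the date and amount properties and the figures in the rows are all invented for the example and are not the project's real model or data.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def observations(rows, dataset_iri):
    """Yield (subject, predicate, object) triples, one observation per row."""
    for number, row in enumerate(rows, start=1):
        obs = f"{dataset_iri}/obs/{number}"
        yield (obs, RDF_TYPE, QB + "Observation")
        yield (obs, QB + "dataSet", dataset_iri)
        # Dimension: the point in time the observation belongs to.
        yield (obs, EX + "def/date", row["date"])
        # Measure: the value observed at that point.
        yield (obs, EX + "def/amount", row["amount"])


# Made-up rows standing in for a payments spreadsheet.
rows = [
    {"date": "2013-01-15", "amount": "1200.00"},
    {"date": "2013-02-15", "amount": "950.50"},
]

triples = list(observations(rows, EX + "data/payments"))
for triple in triples:
    print(triple)
```

A full data cube also needs a data structure definition that declares which properties are dimensions and which are measures; that part is left out here.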
The timelines were fast and furious; agile development was the key and there were none more agile than us in the week leading up to the coding challenge. It was a veritable frenzy of validation, changes and uploads as we charged towards the hard deadline of the 24 Hour Coding Challenge. At one point Steven had to cycle across the city with his data on a USB stick to hand to SWIRRL because the internet connection from the other side of the city just wasn’t good enough to transfer such a large file. The button was pushed and data.gmdsp.org.uk
The highlight for me has been the people. Seeing the eureka moments as the code fellows understood more and more about linked data. Watching the different tools and techniques. I would never have believed that Open Refine could be so useful. Or that people loved streetlights so much.