Friday, July 31, 2009

SPARQL Federation and Quints

There are currently a couple popular way to federate sparql endpoints together:

1) In Jena the service must be explicitly part of the query, and therefor the model,

2) In Sesame the basic query patterns must be associated with one or more endpoints before evaluating the query, or

3) Hack the remote query into a graph URI: http://gearon.blogspot.com/2009/05/federated-queries-long-time-ago-tks.html

Although both can be used to achieve the same results, Jena's solution puts more responsibility in the data model, and Sesame's put more responsibility in the deployment. Both have their trade offs, but I believe the query is suppose to be abstracted away from underlying services. The domain model (and therefore the queries) should not be aware of how the data is distributed (or stored) across a network. Therefore, I prefer to describe which graph patterns and relationships are available at each endpoint during deployment and make the application model independent of available service endpoints.

Furthermore, I think it is a bit silly to add yet another level of complexity to the basic query pattern. Adding the service level turns the basic query pattern from a quad to a quint.

To fully index a quint (with support for a service variable, which Jena does not support) would take 13 indexes (nearly double what a quad requires). Below is a table of some complexity levels and how many indexes they require to be fully indexed (variables could appear in any position within the pattern). I have included a theoretical sext that would allow you to group services in a network (just as graphs can be grouped in a service).
Level#ofIdxTermData Structure
double2subjectdirected graph
triple3predicatelabelled directed graph
quad7graphmultiple labelled directed graphs
quint13servicereplicated multiple labelled directed graphs
sext25networktrusted replicated multiple labelled directed graphs
Switching from triples to quad provides a big functionality leap (the ability to refer to an entire graph as a single resource). However, I question how much functionality a quint (or a sext) has over a quad. Couldn't the same functionality be put into a property of the graph (or embedded in the graph's URI authority). An inferencing engine/query could also conclude graph relationships like (subGraphOf), which would still allow a large, but precise, collection of graphs to be queried more effectively.

Hopefully, this topic will have more time to mature before the SPARQL working group makes any official decisions on the matter.

Wednesday, July 22, 2009

Enterprise Information Systems and Web Technologies

I recently got back from speaking at the Enterprise Information Systems and Web Technologies conference in Orlando Florida. There I presented my paper on an Object-Oriented rules engine. In the talk I shared examples of when businesses need to coordinate, track data, and policy check between organizations. Such as in transportation, satellite data tracking, and contract management. I outlined the following requirements and went into detail on the various components of the system.

Requirements
Reduce the investment costs and time
Policies understandable by domain experts
Rules must not inadvertently interfer with one another
Model complex domains
Easily adapted to change
Policy rules have access to external services
Track all state changes both their cause and effect

The talk was well received and posed some interesting discussions.

Wednesday, July 8, 2009

Panel: Linked Open Data

The SemTech 2009 Videos have been posted, including the Linked Open Data Panel.

The "data commons" is a cornerstone of the semantic web vision. The Linked and Open Data movements are progressing beyond the early adopter phase and preparing to cross the chasm. Enough experience now exists to reflect on how this data set is being used, how useful it is, and where we can take it from here. Beyond the basics, the panel will discuss issues such as quality of service, stability, and longevity. They'll also explore the evolution of the semantic web with a particular emphasis on modes of data use, reuse and aggregation.

Paul Miller, The Cloud of Data
Jamie Taylor, Metaweb Technologies, Inc.
Leigh Dodds, Talis
James Leigh, James Leigh Services, Inc.
Kingsley Idehen, OpenLink Software, Inc.

http://www.semanticuniverse.com/semtech-panel-linked-open-data.html
Reblog this post [with Zemanta]