Tuesday, June 30, 2009

HTTP Servlet Caching Filter

The nice thing about the HTTP protocol is how easy it is to implement a trivial HTTP server. At some point, however, just responding to HTTP requests is not enough and response caching must be introduced.
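To see just how trivial, here is a minimal sketch in Java: a single-threaded HTTP/1.0 server with no error handling, which is about as small as a working HTTP server gets.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class TrivialHttpServer {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(8080);
        while (true) {
            Socket socket = server.accept();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), "US-ASCII"));
            String requestLine = in.readLine(); // e.g. "GET / HTTP/1.0"
            String line;
            while ((line = in.readLine()) != null && line.length() > 0) {
                // skip the request headers
            }
            String body = "You asked for: " + requestLine + "\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(("HTTP/1.0 200 OK\r\n"
                    + "Content-Type: text/plain\r\n"
                    + "Content-Length: " + body.length() + "\r\n\r\n"
                    + body).getBytes("US-ASCII"));
            out.flush();
            socket.close();
        }
    }
}
```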

If you search for "servlet response caching", you will find advice to ensure you use the correct response headers to facilitate HTTP caching, and suggestions to use a servlet filter to cache the response. If you are like me, you will keep searching for a way to combine the two - a servlet filter that caches based on the correct response headers.

With the HTTP protocol so well supported and J2EE so popular, finding a caching servlet filter that adheres to the HTTP spec should be easy, but it isn't. In fact, it is really hard to find any Java implementation that caches based on the HTTP response headers (servlet filter or otherwise). This seemed like an interesting and fairly common problem, so I spent some time to see how far I could get with a servlet filter that understands HTTP caching.

Despite my general knowledge of the HTTP spec, implementing it proved a lot more difficult than reading it. For example, the If-Modified-Since and If-None-Match headers are fairly easy to understand, but when you try to implement their logic, things get more complicated. Working through this, I realized that there are nearly 20 possible scenarios the server must handle. For If-Modified-Since alone there are three states: the header may be absent, the entity may have been modified, or it may not have been modified. If-None-Match may be absent, it may or may not match, and it may carry a '*' tag, which in turn may or may not correspond to an existing entity. You can't process these headers one at a time either, but once you write down the edge cases, they can all be handled fairly compactly within a single precondition check.
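As a taste of that compactness, here is a rough sketch (in servlet terms) of the GET-side precondition check, assuming the cache already knows the entity's tag and last-modified date; weak tags and the other conditional headers are left out.

```java
import java.util.Arrays;
import javax.servlet.http.HttpServletRequest;

public class Preconditions {
    /**
     * Returns true if a 304 Not Modified may be sent for this GET request.
     * A sketch only: weak tags and the other conditional headers are omitted.
     */
    public static boolean notModified(HttpServletRequest req,
            String etag, long lastModified) {
        String ifNoneMatch = req.getHeader("If-None-Match");
        if (ifNoneMatch != null) {
            // Per RFC 2616 14.26, when If-None-Match does not match, the
            // If-Modified-Since header must be ignored and 304 not sent;
            // "*" matches any existing entity.
            return "*".equals(ifNoneMatch) || Arrays.asList(
                    ifNoneMatch.split("\\s*,\\s*")).contains(etag);
        }
        long since = req.getDateHeader("If-Modified-Since"); // -1 if absent
        // Compare at one-second resolution; HTTP dates carry no milliseconds
        return since >= 0 && since >= lastModified / 1000 * 1000;
    }
}
```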

Another area that surprised me was the request Cache-Control directives. There are five boolean directives and three that take a value. All of these directives are fairly easy to understand and are used to determine whether the cache can be used and whether it needs to be validated. However, that is a lot of variables to manage, and combined with the possible server directives, it gets really hairy tracking their state. On many occasions, adding support for a new header or directive inadvertently broke an earlier unit test (I couldn't have done it without them).
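For reference, a sketch of what parsing those request directives into plain fields might look like. The directive names are from RFC 2616; note that max-stale may appear with or without a value, which is presumably where the fifth boolean comes from.

```java
/** A sketch of parsing the request Cache-Control directives into fields. */
public class RequestCacheControl {
    boolean noCache, noStore, noTransform, onlyIfCached;
    int maxAge = -1, minFresh = -1, maxStale = -1; // -1 means absent

    public RequestCacheControl(String header) {
        if (header == null)
            return;
        for (String directive : header.trim().split("\\s*,\\s*")) {
            String name = directive, value = null;
            int eq = directive.indexOf('=');
            if (eq >= 0) {
                name = directive.substring(0, eq);
                value = directive.substring(eq + 1).replace("\"", "");
            }
            if ("no-cache".equalsIgnoreCase(name)) noCache = true;
            else if ("no-store".equalsIgnoreCase(name)) noStore = true;
            else if ("no-transform".equalsIgnoreCase(name)) noTransform = true;
            else if ("only-if-cached".equalsIgnoreCase(name)) onlyIfCached = true;
            else if ("max-age".equalsIgnoreCase(name)) maxAge = Integer.parseInt(value);
            else if ("min-fresh".equalsIgnoreCase(name)) minFresh = Integer.parseInt(value);
            else if ("max-stale".equalsIgnoreCase(name))
                // max-stale with no value means "accept any staleness"
                maxStale = value == null ? Integer.MAX_VALUE : Integer.parseInt(value);
        }
    }
}
```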

The HTTP spec is fairly clear in some areas, but less so in others. One area that has seen various interpretations is entity tags. It is fairly clear how ETags should be used with static GET requests, although I had to digest their implications for caches before I understood how to use them with content negotiation. Their recommended use with PUT and DELETE, however, is still a bit of a mystery. When an entity has no fixed serialized format (such as a data record), it has many entity tags (one for each serialized variation and version). So which entity tag should be used after a PUT or a DELETE that affects all variations?
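To make "many entity tags" concrete, a minimal sketch, assuming the tag is derived from the record's version and the negotiated media type:

```java
public class VariantTags {
    /**
     * One strong tag per (version, media type) pair: a record serialized
     * as both XML and JSON carries two distinct tags, and both are
     * invalidated together when the version changes.
     */
    public static String variantTag(long version, String mediaType) {
        return "\"" + version + "-"
                + Integer.toHexString(mediaType.hashCode()) + "\"";
    }
}
```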

This gets even more complicated when some URLs are used to represent a property of a data record. If the property is a foreign key, the response has no serializable format; it's a 303 See Other response. What does a PUT look like when you want to reference another resource? Furthermore, a DELETE of a property just deletes the property; the data record still exists and still has a version associated with it, so shouldn't the client be given the new version?

In the end, I have a new appreciation for why there are so many interpretations of the HTTP spec, and a fairly general-purpose HTTP caching servlet filter to show for it.

Wednesday, June 17, 2009

Resource Oriented Framework

What happens when you put an object-oriented rules engine in a resource-oriented framework? After three years of research, I think I have found the answer and have released it as AliBaba.

AliBaba is separated into three primary modules. The Object Repository provides the object-oriented rules engine; it is based on the Elmo codebase, which has been in active development for the past four years. The Metadata Server is a resource-oriented framework built around the Object Repository. Finally, the Federation SAIL gives AliBaba greater scalability.

In AliBaba, every resource is identified by a URL and can be manipulated through common REST operations (GET, PUT, DELETE). Each resource also has one or more types that enable it to take on object-oriented features defined in Java or OWL. Each object's properties and methods can be exposed, via annotations, as HTTP methods or operations. Operations are used with the GET, PUT, and DELETE HTTP methods by suffixing the URL with a '?' and the operation name. Operations are commonly used for object properties, while object methods are commonly exposed as other HTTP methods (such as POST) or as GET operations. This HTTP transparency allows the Metadata Server's API to hide within the HTTP protocol rather than dictate the protocol used, allowing it to implement many existing RESTful protocols.
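To make the pattern concrete, here is a hypothetical sketch; the annotation names below are my own illustrative stand-ins, not AliBaba's documented API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.math.BigDecimal;

// Stand-in annotations for whatever the Metadata Server actually uses
// to map object members onto the HTTP protocol.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Operation { String value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Method { String value(); }

public abstract class InvoiceSupport {

    // GET /invoices/42?balance -> the serialized return value
    @Operation("balance")
    public abstract BigDecimal getBalance();

    // PUT /invoices/42?balance -> invoked with the parsed request body
    @Operation("balance")
    public abstract void setBalance(BigDecimal balance);

    // POST /invoices/42 -> an ordinary object method exposed over HTTP
    @Method("POST")
    public abstract String pay(String payment);
}
```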

AliBaba provides a unique combination of object-oriented programming and a rules engine, available in a resource-oriented framework. I believe it combines some of the most promising design paradigms commonly used in Web applications. Its potential to minimize software maintenance costs and maximize productivity by combining these paradigms is very exciting.


Monday, June 8, 2009

Intersecting Mixins

As software models become increasingly complex, designers seek additional ways to express their domain models in a form that more closely matches their design concepts. One way this is done is through Mixins.

A Mixin is a reusable set of class members that can be applied to more than one class. It is similar to inheritance, but does not interrupt the existing hierarchy.

Suppose we have a class called "Invoice" within our domain model and we want to "enhance" this class with operations to fax, email, and snail-mail it to the customer. To prevent the reduction of Invoice's cohesion, we want to define this behaviour in a separate construct. We could subclass Invoice, but the ability to send a document is common among other classes as well. We could put this logic in a super class, but that only works if there is an appropriate common super class among them. An alternative is to create mixins, called Faxable, Emailable, and Mailable, that are added to all the classes that can be sent.
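Java has no first-class mixin construct, but interface default methods (from later Java versions) approximate the idea well enough for a sketch:

```java
// Reusable sending behaviour applied to classes outside their hierarchy.
interface Faxable {
    String toDocument();
    default void fax(String number) {
        System.out.println("Faxing to " + number + ": " + toDocument());
    }
}

interface Emailable {
    String toDocument();
    default void email(String address) {
        System.out.println("Emailing " + address + ": " + toDocument());
    }
}

// Invoice keeps whatever hierarchy it already has; the sending
// behaviour is mixed in without a common super class.
class Invoice implements Faxable, Emailable {
    public String toDocument() {
        return "Invoice #42";
    }
}
```

With this in place, new Invoice().fax("555-0100") works without Invoice descending from any sending-related super class.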

Suppose some of our documents require a unique header. If this behaviour is common, but no appropriate super class exists, a mixin would be a desirable choice. Mixins allow classes to be extended with new behaviour, but what if you want to alter existing behaviour? Unfortunately, many mixin implementations do not allow calls to the overridden method, and the ones that do require it to be done procedurally (by changing an open class at runtime).

When using inheritance, a subclass can call the overridden method to intersect and alter the existing behaviour. A mixin, however, does not inherit the behaviour of its targets, and multiple mixins may want to alter the same behaviour, so there is no single "super member": there is one for every mixin that implements it.

Most languages allow a mixin to override the target's behaviour, but don't allow that behaviour to be intercepted. Some languages, like Ruby and Python, allow the target class to be altered by renaming and replacing members. This lets the programmer simulate an intersection, but it is a much more complex way of handling it.

In AliBaba's Object Repository, a mixin (also known as a behaviour) can declare precedence relative to other mixins, and behaviours can control how method execution proceeds. For example, if a mixin has the @precedes annotation, it will be executed before any of the given mixin classes. By declaring a method with a @parameterTypes annotation (listing the overridden method's parameter types) and a Message as its sole parameter, the mixin can call msg.proceed() to execute the other behaviours and retrieve their result. This allows mixins to call the overridden methods and provides a way to intersect other methods.
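Putting those pieces together, a sketch of such a behaviour might look like the following; the exact signatures and the DefaultHeaderSupport class are assumptions inferred from the description above, not AliBaba's verified API.

```java
// Sketch only: the annotation and Message signatures are inferred from
// the description above, not checked against AliBaba's actual API.
@precedes(DefaultHeaderSupport.class) // run before the default behaviour
public abstract class UniqueHeaderBehaviour {

    /** Overrides getHeader(); @parameterTypes lists the original signature. */
    @parameterTypes({})
    public String getHeader(Message msg) {
        String header = (String) msg.proceed(); // execute the other behaviours
        return "** " + header + " **";          // then alter their result
    }
}
```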

By extending the basic mixin construct to allow them to co-exist and interact, a mixin can be used to address other aspect oriented problems in an OO way.


Thursday, June 4, 2009

Time to Rethink MVC on the Web

Typical web applications don't do a very good job of separating the model, view, and controller logic. The view is particularly confusing, as its logic is split between the server side and the client side. This is a side effect of trying to support "dumb" (or nearly dumb) clients (browsers that only support static HTML). Furthermore, logic that would be more appropriate in the model often gets put into the controller to avoid unnecessary access to the database.

While these design decisions lead to faster web application servers, they also lead to what I call coincidence coding: an unwanted code separation that has no documentation or declared interface. The worst part is that this separation often happens in the most visible parts of the system: over HTTP and SQL. This prevents the system from becoming a black-box utility, because the internals rear their ugly heads. Any attempt to standardize the protocol (or to use well-documented services) is impossible, because the protocols get stuck between various components of the view or model and cannot be separated cleanly and maintained across versions.

It is not all doom and gloom: the web is a different place than it was when many of these web frameworks were designed. I am pleased to report that all modern desktop web browsers have good support for client-side templating using XSLT (finally!). This means web applications today don't need to support those dumb clients. Furthermore, web proxies have also improved enough that it is now common for production deployments to plan on using a proxy server in front of the web server(s). Accessing the persisted data is not as much of a problem as it used to be either, as database caching is usually planned when systems are designed.

With these (somewhat) recent developments, it's time for a new breed of web frameworks to emerge. These frameworks could implement MVC on the web like we have never seen before, really separating the model from the view and from the controller. This could have a significant impact on the maintainability of web application software.

Yesterday, I introduced what I believe to be the first web application framework that does a decent job of separating MVC. It is called the AliBaba Metadata Server and can be downloaded from OpenRDF.org.

The view logic can only exist in static HTML, JS, CSS, and XSLT files. The model/data is transformed into standard data formats, like RDF and some forms of JSON, without model-specific transformations. You can't put any model logic into the controller, and all service logic must be put into the model. Standalone, it would be slower than most web application servers, but with proxying and caching on both ends, I believe it will perform just as well and, most importantly, yield more maintainable web applications.


Monday, June 1, 2009

Standard RDF Protocol

The SPARQL Working Group's first meeting has come and gone. Part of that meeting discussed the scope of what should be undertaken as part of SPARQL2, including what features should be added to the protocol. SPARQL is a standard query language and protocol.

I am a big fan of standards, as most people in the Semantic Web community are. However, I feel that a database protocol does not provide much value; in fact, it should be tailored to the evaluation and storage mechanism used by the implementation.

Standards are intended to be interoperable across implementations, and SPARQL must abstract away from the storage mechanism used in order to achieve the desired level of interoperability. This makes query and retrieval through SPARQL less efficient than storage-specific mechanisms like SQL. It still has significant advantages: a single query can be used with a wide array of storage mechanisms and remains unchanged across significant storage alterations. But this comes at a cost to both the ability to create efficient queries and the ability to evaluate them effectively using the SPARQL protocol.

The cost of query parsing and optimization can always be improved and can be tailored to specific models, without any loss of interoperability at the query language level. However, a standard protocol cannot be optimized the same way.

When a database-backed application has performance problems, nine times out of ten it is due to excessive communication between the database and its clients. Therefore, I question the value of abstracting this communication protocol away from the storage/evaluation mechanism, because doing so will exacerbate communication overhead.

While I think the SPARQL query language is developing nicely and should be considered in any project that wants to ease the maintainability of their queries, I also think any project that is concerned about the performance of their queries should consider using a proprietary protocol to optimize its communication with a database.
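For example, with a Sesame-based store (which AliBaba builds on), the same SPARQL query can be evaluated in-process through the repository API, skipping the HTTP round trips of the standard protocol entirely; a sketch, assuming Sesame 2.x:

```java
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

public class EmbeddedQuery {
    public static void main(String[] args) throws Exception {
        // An in-process repository: no protocol layer between the
        // application and the store, so no per-request HTTP overhead.
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        try {
            TupleQueryResult result = con.prepareTupleQuery(
                    QueryLanguage.SPARQL,
                    "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10").evaluate();
            while (result.hasNext()) {
                System.out.println(result.next());
            }
            result.close();
        } finally {
            con.close();
        }
    }
}
```

The query text stays standard SPARQL; only the transport is proprietary, which is exactly the trade-off argued for above.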
