Simplefeed

What is Atom?

[Atom](http://en.wikipedia.org/wiki/Atom_(standard\)) is an XML-based syndication format which represents a time-ordered series of events. In Atom terminology, each event is an entry and the series of events is a feed.

...

Atom is a general-purpose format that can be extended to fit a particular domain. By using a well-understood general-purpose format as a basis, we can reuse tools, infrastructure and semantics. The Atom syndication format was formalised in RFC 4287.

Event-driven systems

Atom can be used to implement event-driven systems by adding an entry to the feed every time something happens that consumers might be interested in. Consumers find out about these events by polling the feed. When new consumers arrive, they simply start polling the feed as well, with no change needed to the feed provider.

Depending on the nature of the event and the size of the resource that it applies to, the Atom entry could either include a snapshot of the new state of the resource, or simply link to the resource so that the client can issue a fresh GET.
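
To illustrate the two styles, here is a small sketch in Python (using the third-party requests library and the standard ElementTree parser) of how a consumer might handle an entry. The "alternate" link relation and the use of requests are assumptions for illustration, not part of Simplefeed itself.

    import xml.etree.ElementTree as ET
    import requests  # third-party HTTP client, used here for brevity

    ATOM = "{http://www.w3.org/2005/Atom}"

    def resource_state(entry):
        """Return the state of the resource that an Atom entry describes."""
        content = entry.find(ATOM + "content")
        if content is not None and (content.text or list(content)):
            # The entry carries a snapshot of the new state inline.
            return content
        # Otherwise the entry only links to the resource, so issue a fresh GET.
        link = entry.find(ATOM + "link[@rel='alternate']")
        response = requests.get(link.get("href"))
        response.raise_for_status()
        return ET.fromstring(response.text)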

Because it is based on polling, this approach has higher latency than push-based alternatives, which might make it unsuitable for real-time systems. However, a RESTful polling approach decouples client and server, and provides scalability through the opportunity for heavy caching.

Simplefeed

We will describe an Atom-based protocol suitable for broadcasting events to multiple consumers. To keep things simple, we won't worry about editing or writing to feeds. We will assume that the server providing the feed has some way of recognising and registering new events as Atom entries.

"Atom" can also sometimes refer to AtomPub, a protocol for editing and publishing web resources built on top of the Atom XML format. If you need your feed to be editable over HTTP, AtomPub is a good place to start.

...

  • Entries are ordered by the time they were added to the feed, from newest at the top of the feed to oldest at the bottom.
  • An entry, once published, never changes.
  • The series of entries is paginated over many Atom documents.
  • All documents except the newest will have a link with rel "next-archive" pointing to the next document in the feed.
  • All documents except the oldest will have a link with rel "prev-archive" pointing to the previous document in the feed.
  • Only the newest document, known as the recent document, may change, and it changes only by having newer entries prepended to it.
  • When the feed provider decides the recent document is finished, it archives that document and creates a new document.
  • There is a published URL that always points to the recent document and serves as an entry point to the feed.

Though it is good practice to use intuitive URLs, the protocol in no way depends on the structure of the URLs. Consumers should follow links, and not attempt to construct URLs themselves.

Paginating feeds

As Atom entries are intended to be available indefinitely, the feed of events can grow very large over time. This makes the feed too large for consumers to conveniently GET over a network.

The solution is to break up a single logical feed into many documents. As Atom is designed to follow RESTful conventions, it does this by means of links.

The feed provider breaks up the series of events into separate documents and gives each of them their own URL. The provider might choose to make each document represent a period of time e.g. a day, or might divide the series of events evenly so that each document has e.g. 100 entries.

Similar to a doubly-linked list, each document has prev-archive and next-archive links that can be followed to reach the adjacent documents in the chain.

There are two special cases. Because it is at the head of the list, the recent document cannot have a next-archive link. Similarly, the first document does not have a prev-archive link.

The recent document is the published entry point to the feed. Consumers of the event feed will always be able to retrieve http://example.com/recent to get the most recent entries.

...

    <?xml version="1.0"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
        <id>urn:uuid:ff31a040-75bc-11e2-bcfd-0800200c9a66</id>
        <title type="text">Notifications</title>
        <link rel="self" href="http://example.com/documents/1" />
        <link rel="next-archive" href="http://example.com/documents/2" />
        <!-- There is no "prev-archive" link as this is the first document. -->
        <entry>..</entry>
        <entry>..</entry>
        <entry>..</entry>
    </feed>
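
To make the chaining rules concrete, here is a minimal in-memory sketch in Python of the provider side, assuming a fixed number of entries per document and the illustrative /documents/{n} and /recent URLs used above. A real provider would persist the documents and render them as Atom XML; none of these names are mandated by Simplefeed.

    ENTRIES_PER_DOCUMENT = 100  # assumption: fixed-size documents

    class Feed:
        def __init__(self):
            # documents[0] is the oldest; documents[-1] is the recent document,
            # which is also served at the published URL http://example.com/recent.
            self.documents = [[]]

        def publish(self, entry):
            recent = self.documents[-1]
            recent.insert(0, entry)        # new entries are prepended, newest first
            if len(recent) >= ENTRIES_PER_DOCUMENT:
                self.documents.append([])  # archive the full document, start a new one

        def links_for(self, n):
            """Archive link relations for document n (1-based, oldest first)."""
            links = {"self": f"http://example.com/documents/{n}"}
            if n < len(self.documents):
                # Every document except the newest points forwards in time.
                links["next-archive"] = f"http://example.com/documents/{n + 1}"
            if n > 1:
                # Every document except the oldest points backwards in time.
                links["prev-archive"] = f"http://example.com/documents/{n - 1}"
            return links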

Consuming feeds

A consumer of an Atom feed keeps track of the unique identifier of the most recent entry it has processed. Because the entries are time-ordered and the only change is new entries being added to the front of the feed, consumers can work backwards until they reach the most recent entry they have already processed, and then work forwards through the feed processing each newer entry in turn.

For example, a consumer might know that it has most recently processed the entry with id "urn:uuid:fc374b00-75c7-11e2-bcfd-0800200c9a66".

This consumer wants to check if there are any more recent entries, so it issues a GET request on http://example.com/recent, which is the published entry-point of the feed.
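
As a sketch of that catch-up algorithm in Python (using the third-party requests library; the process callback is a placeholder), the consumer walks backwards through prev-archive links until it sees the last entry it processed, then replays the newer entries oldest-first:

    import xml.etree.ElementTree as ET
    import requests  # third-party HTTP client

    ATOM = "{http://www.w3.org/2005/Atom}"

    def catch_up(last_seen_id, process, url="http://example.com/recent"):
        pending = []
        while url is not None:
            feed = ET.fromstring(requests.get(url).text)
            for entry in feed.findall(ATOM + "entry"):        # newest first
                if entry.find(ATOM + "id").text == last_seen_id:
                    url = None                                # found our place
                    break
                pending.append(entry)
            else:
                # Not found in this document, so follow the link to the older one.
                link = feed.find(ATOM + "link[@rel='prev-archive']")
                url = link.get("href") if link is not None else None
        for entry in reversed(pending):                       # oldest unprocessed first
            process(entry)
        # The newest id becomes the bookmark for the next poll.
        return pending[0].find(ATOM + "id").text if pending else last_seen_id

    # e.g. catch_up("urn:uuid:fc374b00-75c7-11e2-bcfd-0800200c9a66", print)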

...

Notice that the feed provider does not have to keep track of who the consumers are or which entry they are each up to. The guarantee that new events are always added to the front of the list allows consumers to do that for themselves.

Documents are chained using next-archive and prev-archive links

Implementation considerations

One of the biggest advantages to using Atom is caching. Servers should serve archived documents with aggressive Cache-Control headers, because once a document is archived, it does not change. Servers can take advantage of this by writing archived documents out to disk and serving them as static files.

Note that this caching is possible because the pagination of entries into documents is controlled by the feed provider. If every consumer could decide how to break up the series of entries, we would have to cache many different pagination combinations.
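
One way to realise this (a sketch with assumed paths and header values, not anything mandated by Simplefeed) is to write each document to disk at the moment it is archived, so an ordinary web server can serve it as a static file with a long-lived Cache-Control header:

    import os

    STATIC_DIR = "/var/www/feed/documents"  # assumption: directory served by the web server

    def archive_document(number, atom_xml):
        """Write a finished document where the web server can serve it as a static file."""
        path = os.path.join(STATIC_DIR, f"{number}.xml")
        with open(path, "w", encoding="utf-8") as f:
            f.write(atom_xml)
        # Because the archived document will never change, the web server can
        # attach an aggressive header such as: Cache-Control: public, max-age=31536000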

Consumers should not care how pagination is implemented, because they simply follow links and retrieve resources. The provider might choose to break the feed into time periods e.g. a document per day. However, if the flow of events is irregular, that could lead to some documents being very large and others being very small. This can be addressed by keeping a fixed number of entries per document.

Cache-Control headers on the recent document are limited by the freshness requirements of the system. It may still be worthwhile allowing consumers and HTTP caches to cache the recent document for a short period of time, as it can reduce load on the server hosting the feed, but the length of time in the header must take into account how quickly consumers need to find out about new events.

In a system under heavy load, using Cache-Control headers with a short time-to-live in conjunction with a reverse proxy gives an upper bound on how often the server needs to generate the recent document. For example, a time-to-live of 60 seconds means that the origin server has to generate the recent document at most once per minute, regardless of how many consumers there are.
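
A sketch of that header policy in Python (the 60-second figure is illustrative; the right value depends on the system's freshness requirements):

    RECENT_TTL_SECONDS = 60  # assumption: consumers can tolerate a minute of latency

    def recent_document_headers():
        return {
            "Content-Type": "application/atom+xml",
            # A reverse proxy or shared cache in front of the origin serves its
            # cached copy for up to this long, so the origin regenerates the
            # recent document at most once per minute however many consumers poll.
            "Cache-Control": f"public, max-age={RECENT_TTL_SECONDS}",
        }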

Other useful optimisations for the recent document are ETags and Last-Modified headers. Consumers poll the recent document potentially very often, so if they are able to issue a conditional GET accompanied by an ETag or Last-Modified date, the feed provider can simply reply with a 304 Not Modified response and avoid transferring the entire recent document to the consumer.
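
A sketch of conditional polling from the consumer's side (again using the third-party requests library, with the example entry point URL): the consumer remembers the ETag from its previous response and sends it back in If-None-Match, and a 304 reply means nothing has changed and no body is transferred.

    import requests  # third-party HTTP client

    def poll_recent(url, etag=None):
        headers = {"If-None-Match": etag} if etag else {}
        response = requests.get(url, headers=headers)
        if response.status_code == 304:
            return None, etag                      # nothing new since the last poll
        response.raise_for_status()
        return response.text, response.headers.get("ETag")

    # First poll:  body, etag = poll_recent("http://example.com/recent")
    # Later polls: body, etag = poll_recent("http://example.com/recent", etag)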

References

A great reference for understanding Atom's use in RESTful event-driven systems is REST in Practice by Jim Webber, Savas Parastatidis and Ian Robinson. We based our description of Simplefeed on Chapter 7.

...