I am going to implementing a publish/subscribe mechanisms for our recent Web-based research prototype. Therefore, I am interested what others already thought about this pattern. Looking for the pub/sub paradigm, Google returns the corresponding Wikipedia article on the first place. Screening the article, there are a few starting points worth to be remembered:
Looking for more information on the Web I cam across the Publish/Subscribe integration pattern from Microsoft's patterns & practices. The problem statement seems reasonable:
In the context description, the following communication infrastructures are mentioned:
In contrast to to the Wikipedia article, we learn about three different types of mechanisms:
Having a closer look, the List-based Publish/Subscribe mechanism maintains a list of subscribers, similar to the Observer pattern. Attach() and Detach() operations allow to modify the list of subscribers while a Notify() operation is used to send updates to the subscribers. Seems to be well suited if you have one publisher and many subscribers, but does not look suitable if subscribers watch many subjects. The core functionality of List-based Publish/Subscribe can thus be identified as
If we understand subscription lists as named channels, the List-based Publish/Subscribe represents a topic-based subscription mechanism.
The Broadcast-bases Publish/Subscribe mechanism simply dumps messages to the local are network. Each subscriber is responsible for listening and inspecting the subject line of the message. If the subject line matches, the subscriber processes the message. This approach seems to be a optimum in decoupling the system. Clearly, this can be identified as some kind of a topic-based system. If the publisher needs to know about subscribers to a particular topic, a hybrid approach can be chosen, where a additional process requests information about interested subscribers. To establish the hybrid system, however, every subscriber must respond to the request. Another name mentioned in the article is publish/subscribe channel with reactive filtering due to the responsibility of each subscriber to filter the messages on its own.
In difference to the Wikipedia article, in this article the author differentiates between topic-based and content-based mechanisms. In this context, both, List-based Publish/Subscribe and Broadcast-based Publish/Subscribe are understood as topic-based mechanisms.
While topics are considered as a pre-defined set of subjects, each message in a content-based system can be understood as a single dynamic logical channel. This idea was proposed in The Evolution of Publish/Subscribe Communication Systems [pdf]. We will come back to this paper later.
Where to implement the pub/sub functionality depends on your underlying communication structure:
The article also differentiates between fixed subscriptions and dynamic subscriptions. While applications cannot control their subscriptions, dynamic subscriptions allow to modify subscriptions through certain control messages.
Some more keywords are listed in the article:
How to implement a dynamic list-based publish-subscribe pattern is illustrated in the MSDN library.
I found also some article about the way EDA (event-driven architecture) extends SOA including a nice depiction of the idea behind EDA. There, EDA is proposed for a publish/subscribe mechanism rather than a command/control mechanism as provided by SOA. EDA seems especially suitable when you are facing
It is also mentions that some good support for the EDA pub/sub pattern would be a declarative model.
I also came along this article giving a brief overview of Publish-Subscribe Channel pattern from the book Enterprise Integration Patterns by G. Hohpe and B. Woolf. It basically tells that the channel delivers a copy of the message to each of the output channels where each output channels has only one subscriber. After the message is consumed, it is removed from the channel.
Having a look into Exploiting Virtual Synchrony in Distributed Systems, mentioned in the beginning, gives you an insight into several issues in distributed systems. One interesting fact to bear in mind is about synchrony vs. asynchrony. If your publisher requires responses this could be 0, 1 or n for n subscribers. If you expect 0 responses you actually run a asynchronous system. For so-called process groups an interface is provided, allowing to join or leave a group but also to receive updates on the group memberships. Sounds similar? The Observer pattern, I see here. In the described news service, one already realizes the common concepts described before: "Each subscriber receives a copy of any message having a 'subject' for which it has enrolled on the order they were posted.". The overall description is rather abstract, but gives a interesting insight into the development of the mechanism.
Afterwards I ended up directly with The Evolution of Publish/Subscribe Communication Systems, providing a well written summary of the publish/subscribe paradigm. Especially the decoupling fact has been structured into
Again, we see content-based and topic-based mechanisms which makes me think twice of the classification proposed in the Wikipedia article. Back to the paper, the authors state that content-based pub/sub systems cannot rely on
A single server simply cannot deal with a high number of subscribers and the limited number of IP multicast addresses does not fit the large number of logical channels. Rather they propose a application-level realization through a set of event-brokers, exchanging information on a point-to-point basis. For broker interaction the following issues are pointed out:
Maybe its worth to mention, that both papers address communication systems on network and overlay network infrastructure-levels than on application-level. However, the concepts are the same.
Baldoni comes up with the concept of ad-hoc subscription languages, compared to SQL for databases. This, however, requires a-priori knowledge of the structure of the information space. At least, the idea of selecting subscriptions or topics using a query language sounds quite appealing. As future research direction, a potential formal specification of the subscription service, provided by a pub/sub system is proposed.
Another often cited paper I have a look at is The Many Faces of Publish/Subscribe [pdf]. Similar to the paper before, the three decoupling dimensions time, space and synchronization are considered to extract the common concepts of different variants of the pub/sub paradigm. We learn that individual point-to-point and synchronous communication leads to rigid and static applications. Three types of pub/sub mechanisms are introduced:
The basic terms for sending and receiving messages through a software bus/event used here are
The core system should provide a
The events used here are called subscribe(), unsubscribe() and publish() - not that different from the ones we know from the Observer pattern. Some new operation is called advertise() to advertise the nature of future events of an publisher. That way, the event service can adjust to the expected event flows and subscribers can learn when new types of information come available. We also learn about alternative communication paradigms here:
For the three pub/sub forms we find some more detailed information:
Having a closer look at events we learn about the classification into messages (delivery through a single operation e.g. notify) and invocations (event triggers some specific operation on the subscriber). Furthermore, invocations are directed to a certain kind of objects and provide some well-known semantics. You can also differentiate between on-way invocations (COM+ or CORBA Event Service) and those requiring some return value.
We see different kinds of architectures there:
Dissemination of messages is also discussed but relies a lot on the underlying concepts. Efficient multicast in content-based pub/sub systems, however, is pointed out to be still an issue.
Some more points to be considered are related to QoS:
Bearing this information in mind, I now have a look into the Publish-Subscribe Notification for Web Services [pdf] whitepaper as part of the WS-Notification family. The document deals with the notification pattern for notification-based or event-driven systems in the Web service context. Here we see the same pattern as learned before:
The spec defines (among others) some interesting requirements:
In the terminology section we find another interesting statement saying "a Subscription is a WS-Resource" where a WS-Resource is defined as follows:
"A Web service having an association with a stateful resource, where the stateful resource is defined by a resource properties document type and the association is expressed by annotating a WSDL portType with the type definition of the resource properties document"
Got it? At least let us think of subscriptions as resources. This idea lines up well with my current research.
Also the fact of hierarchically structured topics is considered: Especially topic trees are hierarchically structured topics and topic spaces are a set of topic trees grouped together into the same namespace (obviously due to administrative reasons).
It is actually the first document dealing with security aspect, listing the following classes of attacks:
Finally, I found the Distributed Publish/Subscribe Event System on CodePlex: In the whitepaper, the various types of pub/sub are characterized by
The Web Solutions Platform (WSP) is designed as a distributed pub/sub system and works both, intra-machine and inter-machine. Applications here subscribe to event types, so it looks like an event-based pub/sub system. The document provides some more descriptions on the system itself but no more information on publish/subscribe mechanism in general.
That's a lot of stuff and now I have to spend some time in reflecting all these information for my design.
Remember Me
a@href@title, strike