This document is not yet available. Please come back in a few weeks.

Introduction

Message processing involves creating, replicating and filtering data that flows across different logic paths. Creating and manipulating such data can consume a lot of memory if it is not done carefully. Consider a large Internet mailing list with 500 subscribers and a batch job that has to send 10 messages to them. With messages of 100 KB each, an application that builds every copy before sending would hold 500 x 10 x 100 KB, roughly 500 MB, in memory. This is clearly not a good way of doing things.

'distribution' is designed to dispatch messages one by one, thus consuming as little memory as possible. To do this, handlers are designed to return messages as soon as they are processed, allowing each message to reach the end of the processing chain before the next one is handled.
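The one-by-one idea can be illustrated with a minimal sketch. The names used here (StreamingHandler, OneByOneDemo) are hypothetical and are not the distribution API; the sketch only assumes that a handler hands each message downstream as soon as it has produced it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical handler: pushes each result downstream as soon as it is ready. */
interface StreamingHandler {
    void handle(String message, Consumer<String> downstream);
}

/** Hypothetical demo class; not part of the distribution API. */
final class OneByOneDemo {

    /** Sends one message to the given number of recipients and records the order of events. */
    static List<String> run(int recipients) {
        List<String> log = new ArrayList<>();

        // First handler: replicates the incoming message once per recipient.
        StreamingHandler replicate = (msg, out) -> {
            for (int i = 1; i <= recipients; i++) {
                log.add("created copy " + i);
                out.accept(msg + " for recipient " + i); // hand over before creating the next copy
            }
        };

        // Last handler: "sends" the message; here it only records the event.
        StreamingHandler send = (msg, out) -> log.add("sent " + msg);

        // Chain: replicate -> send. Each copy reaches the end before the next one is created.
        replicate.handle("newsletter", copy -> send.handle(copy, ignored -> { }));
        return log;
    }

    public static void main(String[] args) {
        run(3).forEach(System.out::println);
    }
}
```

In this sketch the log alternates between "created copy" and "sent" entries, so no more than one copy needs to be held at a time, instead of one copy per recipient.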

The downside of this approach is that some resources (POP3 connections, SMTP connections, database connections...) may time out while a message is being processed by the handlers. To deal with this, components and handlers are designed to reopen connections whenever possible. For situations where this is not possible, caching strategies are available.
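Reopening a connection on demand could look roughly like the sketch below. ReopeningResource and FakeConnection are hypothetical names introduced for illustration; how distribution components actually detect a timed-out connection is not described here, so the liveness check is passed in as an assumption.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical helper, not part of the distribution API: hands out a
 * connection and reopens it if the previous one is no longer usable.
 */
final class ReopeningResource<T> {
    private final Supplier<T> opener;     // knows how to open a fresh connection
    private final Predicate<T> isUsable;  // checks whether a connection is still alive
    private T current;

    ReopeningResource(Supplier<T> opener, Predicate<T> isUsable) {
        this.opener = opener;
        this.isUsable = isUsable;
    }

    /** Returns a usable connection, opening a new one if none exists or the old one expired. */
    synchronized T get() {
        if (current == null || !isUsable.test(current)) {
            current = opener.get();
        }
        return current;
    }
}

/** Hypothetical stand-in for a POP3, SMTP or database connection. */
final class FakeConnection {
    boolean open = true;
}

final class ReopeningDemo {
    public static void main(String[] args) {
        ReopeningResource<FakeConnection> smtp =
                new ReopeningResource<>(FakeConnection::new, c -> c.open);
        FakeConnection first = smtp.get();
        first.open = false;                   // simulate a server-side timeout
        FakeConnection second = smtp.get();   // a fresh connection is opened
        System.out.println(first != second);  // true
    }
}
```

A handler written this way asks for the connection each time it needs it, rather than keeping a reference that may have expired while earlier messages were travelling through the chain.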

Design Overview

The central part of distribution is the DistributionProcess. It represents a distribution process which is, broadly speaking, defined by a set of nodes and components.

A node holds a MessageHandler, which performs certain operations on the messages that it receives. A node also stores a list of forwards. Each forward references another node that will receive the messages returned by the former node.

The set of nodes and their forwards forms a flow graph that messages go through during the distribution process. In each node, messages may be edited, created, blocked or replicated.
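As a rough illustration of nodes and forwards, the sketch below builds a small flow graph. FlowNode and FlowGraphDemo are hypothetical names, messages are plain strings, and a handler is modelled as a function that returns a list of messages; none of this is the real distribution API. It assumes Java 9 or later for List.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical node type, not the real distribution class: a handler plus its forwards. */
final class FlowNode {
    final String name;
    final Function<String, List<String>> handler; // returns none, one or many messages
    final List<FlowNode> forwards = new ArrayList<>();

    FlowNode(String name, Function<String, List<String>> handler) {
        this.name = name;
        this.handler = handler;
    }

    /** Adds a forward to the given node and returns this node to allow chaining. */
    FlowNode forwardTo(FlowNode target) {
        forwards.add(target);
        return this;
    }
}

final class FlowGraphDemo {
    /** Builds a small graph: filter -> replicate -> { send, archive }. */
    static FlowNode build() {
        // Blocks messages flagged as spam by returning no message at all.
        FlowNode filter = new FlowNode("filter", m -> {
            if (m.startsWith("SPAM")) {
                return List.of();
            }
            return List.of(m);
        });
        // Replicates the message once per (hypothetical) subscriber.
        FlowNode replicate = new FlowNode("replicate",
                m -> List.of(m + " -> alice", m + " -> bob"));
        // Edit the message by appending a marker.
        FlowNode send = new FlowNode("send", m -> List.of(m + " [sent]"));
        FlowNode archive = new FlowNode("archive", m -> List.of(m + " [archived]"));

        filter.forwardTo(replicate);
        replicate.forwardTo(send).forwardTo(archive);
        return filter;
    }
}
```

Blocking, editing and replicating all fall out of the same handler shape here: the handler returns zero, one or several messages.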

When the DistributionProcess is initialized, a DistributionMessage is dropped into the process's start node. Its only purpose is to signal to the first node that the process has begun. From that moment on, messages go through the chain of nodes following the rules that have been configured in the forwards.

Messages are routed by a piece called the MessageFlowController.
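The start signal and the routing step can be sketched as follows. This is only an approximation under stated assumptions: RoutedNode, SimpleFlowController and RoutingDemo are hypothetical names, messages are plain strings, and every forward is followed unconditionally, whereas the real MessageFlowController applies the rules configured in the forwards and may work quite differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical, simplified node: a handler plus the nodes it forwards to. */
final class RoutedNode {
    final Function<String, List<String>> handler;
    final List<RoutedNode> forwards = new ArrayList<>();

    RoutedNode(Function<String, List<String>> handler) {
        this.handler = handler;
    }
}

/** Hypothetical controller; not the real MessageFlowController. */
final class SimpleFlowController {
    static final String START_SIGNAL = "<start>"; // stands in for the initial DistributionMessage

    /** Drops the start signal into the start node. */
    static void start(RoutedNode startNode) {
        route(START_SIGNAL, startNode);
    }

    /** Hands the message to the node, then routes every returned message to each forward. */
    static void route(String message, RoutedNode node) {
        for (String result : node.handler.apply(message)) {
            for (RoutedNode next : node.forwards) {
                route(result, next); // depth first: this result travels on before the next one
            }
        }
    }
}

final class RoutingDemo {
    /** Runs a three-node chain (fetch -> tag -> send) and returns what was "sent". */
    static List<String> run() {
        List<String> delivered = new ArrayList<>();
        // The start node ignores the signal's content and creates the real messages.
        RoutedNode fetch = new RoutedNode(signal -> List.of("msg-1", "msg-2"));
        RoutedNode tag = new RoutedNode(m -> List.of(m + "+tagged"));
        RoutedNode send = new RoutedNode(m -> {
            delivered.add(m);
            return List.of(); // end of the chain: nothing is returned
        });
        fetch.forwards.add(tag);
        tag.forwards.add(send);
        SimpleFlowController.start(fetch);
        return delivered;
    }
}
```

In this sketch the first node treats the incoming message purely as a trigger, which mirrors the role of the initial DistributionMessage described above.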

Figure: diagram of the principal classes of distribution (images/design/distribution-class-diagram.png)

DistributionContext and EL Expressions

Data management

TODO: Explain the mechanics of the data plugins package vs message flow.

Plugins

Plugins are the pieces of distribution where the work is actually done. They provide the necessary components to process messages and data.

A plugin may provide different classes of objects:

  • MessageHandler: objects of this type process messages and return none, one or many messages. MessageHandlers are used to fetch or send mail, edit messages, write messages to files, store information in a database...
  • DistributionComponent: objects that hold a resource or perform an operation, but are not tied to any particular message during message processing. Components are used to hold references to resources such as databases and query results.
  • DistributionFunctions: objects that provide functions to be used inside expressions.
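The three kinds of objects listed above can be pictured with the sketch below. The interfaces and classes (SketchMessageHandler, SketchComponent, SketchFunctions, PluginSketchDemo) are hypothetical shapes invented for illustration, with plain strings as messages; the actual signatures of MessageHandler, DistributionComponent and DistributionFunctions are not shown in this document. It assumes Java 9 or later for List.of.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical shape of a message handler: returns none, one or many messages. */
interface SketchMessageHandler {
    List<String> handle(String message);
}

/** Hypothetical shape of a component: holds a resource shared across messages. */
interface SketchComponent<R> {
    R resource();
}

/** Hypothetical function provider for use inside expressions. */
final class SketchFunctions {
    private SketchFunctions() { }

    /** Example function an expression could call to normalize an address. */
    static String lower(String value) {
        return value.toLowerCase(Locale.ROOT);
    }
}

final class PluginSketchDemo {
    // Component: holds a subscriber list, independent of any single message.
    static final SketchComponent<List<String>> SUBSCRIBERS =
            () -> List.of("Alice@Example.org", "bob@example.org");

    // Handler: replicates one incoming message into one message per subscriber.
    static final SketchMessageHandler REPLICATE = message ->
            SUBSCRIBERS.resource().stream()
                    .map(address -> SketchFunctions.lower(address) + ": " + message)
                    .collect(Collectors.toList());

    public static void main(String[] args) {
        REPLICATE.handle("hello").forEach(System.out::println);
    }
}
```

The split keeps per-message logic in the handler, while the subscriber list lives in a component that outlasts any single message.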