Asynchronous Middleware for Parallel Systems
  AMPS Architecture

Module types in AMPS

AMPS classifies a module as one of two types: non-blocking or blocking.

Non-blocking modules

These modules perform CPU-bound operations and possibly I/O operations, provided those I/O operations can be performed without blocking. AMPS provides non-blocking interfaces for almost all I/O operations for which the underlying operating system provides a non-blocking mode of operation. For example, on Linux and Windows it provides APIs for sending and receiving network messages and for reading and writing files for event logging, accounting and so on. For non-blocking modules, the steps described above are sufficient to integrate the module with the rest of the application.

Blocking modules and I/O agents

There may be instances where blocking is unavoidable. Examples include resolving Domain Name System (DNS) queries (since most libraries available with common operating systems for this purpose use a blocking mode of operation) and interfacing with databases. In both cases, the underlying OS or library may provide non-blocking interfaces in addition to blocking ones, e.g. Windows provides asynchronous DNS library calls and Oracle provides an asynchronous interface to its database. However, commonly used database and DNS interfaces are blocking in nature. When dealing with such modules, AMPS requires the developer to create them as I/O agents. I/O agents are modules that perform blocking I/O and are I/O bound, i.e. they prepare I/O requests and then spend most of their time waiting for I/O operations to complete.

I/O agents are composed of a front-end dispatcher of events and a thread pool. (This is the one exception to the overall paradigm of single-threaded execution, and it is where non-determinism is selectively introduced.) Each thread in the pool has an associated event queue. Threads remain blocked on their event queues when they have no work to do. The front-end dispatcher is registered as the event handler with the main event manager for all events the module has registered for as a listener. The dispatcher selects a thread from the pool and dispatches a received event to the event queue associated with that thread. This design has the advantage that a large burst of I/O requests can be absorbed in the thread pool’s event queues. Each thread runs the same code. Once it wakes up as a result of a newly queued event on its queue, it determines the event type and then calls the event handler for that event. The event handler may block while performing an I/O operation. In that case the thread serving the event blocks, but the application continues processing other events undisturbed.
The application never blocks when there is useful work to do. Since the threads implementing I/O agents are supposed to be I/O bound, they take very little CPU time and then block on I/O. They usually do not use up the time-slice given to them by the operating system, and therefore do not create much disturbance for the progress of the main application. When the results of the I/O operations are available, the threads wake up and generate appropriate events, which in most cases are response events. A fundamental requirement for I/O agent threads is that they must not share any data with other threads or with the rest of the application. They must communicate with other modules via events that carry data over the inter-thread communication mechanisms provided by the OS (AMPS provides APIs for this purpose). It is assumed that I/O agents perform dedicated I/O related tasks, such as formatting a database query and sending it to the database, then processing the response and sending it back to the application via the event system. We do not foresee any requirement for these threads to share data or to synchronize with other threads or with the rest of the application. The designer of an I/O agent has to take all the steps from (a) to (g) given in the previous section. In addition, the designer has to determine how many threads should be created in the thread pool. This depends on how much load is expected on the I/O agent in question. For example, an I/O agent that is involved in each protocol session and sits in the processing path of most messages coming from the network, e.g. a DNS resolver, may require several threads to achieve maximum concurrency under heavy load, whereas an agent whose interaction with the rest of the system is infrequent may require only a single thread.
As for the dispatcher function, the designer may use the dispatcher supplied by AMPS to distribute requests between threads, or write his/her own dispatcher for this purpose. Figure 2 below illustrates the flow of information between a module requesting a service and an I/O agent.

 
 

In figure 2, module 1 requests a service that is provided by module 2, which is an I/O agent. Module 2 is further decomposed into a thread pool and a dispatcher function. The dispatcher function is registered as a callback with the event manager. The thread pool consists of four threads. The request event R1 generated by module 1 is routed by the event scheduler to module 2, and its dispatcher function is invoked. The dispatcher selects a thread from the thread pool (thread 2 in this example) and passes the request to this thread. The thread performs the required I/O for the request. Once the I/O is complete, it returns the results by generating a response event RES1, which is routed back to module 1 because module 1 registered for this event type.
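The dispatcher and thread pool arrangement can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AMPS API: the class name IOAgent, the round-robin thread selection, the post_event callback and the blocking_lookup function are all hypothetical, and a standard thread-safe queue stands in for the inter-thread communication mechanism.

```python
import queue
import threading
import time


class IOAgent:
    """Hypothetical I/O agent: a front-end dispatcher plus a pool of worker threads."""

    def __init__(self, handlers, post_event, num_threads=4):
        self.handlers = handlers      # request type -> (blocking handler, response type)
        self.post_event = post_event  # callable that hands a response event back
        self.queues = [queue.Queue() for _ in range(num_threads)]  # one queue per thread
        self.next_worker = 0
        self.threads = [
            threading.Thread(target=self._worker, args=(q,), daemon=True)
            for q in self.queues
        ]
        for t in self.threads:
            t.start()

    def dispatch(self, event_type, payload):
        """Called by the event manager; only enqueues, so the caller never blocks."""
        q = self.queues[self.next_worker]
        self.next_worker = (self.next_worker + 1) % len(self.queues)  # round robin (assumed)
        q.put((event_type, payload))

    def _worker(self, q):
        while True:
            item = q.get()            # the thread stays blocked here while idle
            if item is None:          # shutdown sentinel
                break
            event_type, payload = item
            handler, response_type = self.handlers[event_type]
            result = handler(payload)                 # may block on I/O
            self.post_event(response_type, result)    # results travel back as an event

    def shutdown(self):
        for q in self.queues:
            q.put(None)
        for t in self.threads:
            t.join()


def demo():
    responses = queue.Queue()   # stands in for the event channel to the main application

    def blocking_lookup(name):  # hypothetical blocking call, e.g. a DNS query
        time.sleep(0.01)
        return name.upper()

    agent = IOAgent({"R1": (blocking_lookup, "RES1")},
                    lambda event_type, data: responses.put((event_type, data)),
                    num_threads=4)
    agent.dispatch("R1", "example.org")
    result = responses.get(timeout=5)
    agent.shutdown()
    return result
```

The worker threads share nothing with the caller except the queues, which mirrors the requirement that I/O agent threads exchange data only through events.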

CPU agents and exploiting multiple CPUs

CPU agents are another type of module with exactly the same internal design as I/O agents: they are also composed of a dispatcher and a thread pool. However, unlike I/O agents they are completely CPU bound and perform no I/O whatsoever. They communicate with the main application via inter-thread communication mechanisms, just like I/O agents. The reason for creating CPU agents might be to exploit parallelism on a hardware platform with multiple CPUs. If the application developer can isolate pieces of code, e.g. procedures or collections of procedures that perform heavy CPU operations on input data, she can create a CPU agent with the number of threads in the pool at least equal to the number of CPUs or cores available. This lets the OS schedule threads across multiple cores, enhancing parallelism and consequently performance. However, the isolated procedures must not share any data with the rest of the application. In other words, they must not need to modify session state or any other global state. As an example, the SIP parser in a SIP server application, or at least some of its core parts, is a good candidate for conversion into a CPU agent. Another example might be a transcoder or encoder/decoder function in a media server. Threads that synchronize with each other only infrequently are usually handled well by a multi-processor OS scheduler in terms of processor affinity and exploitation of parallelism.

Application contexts

In event management systems, state pertaining to sessions is kept in context objects. In multi-threaded systems, context objects are usually kept on the stacks of threads. In event-driven systems, however, these context objects must be kept globally and passed to event handlers as parameters. In AMPS, an application usually creates a hierarchy of contexts:

System context: At the top level is the system context, which is the global context of the complete application. System-wide global state is kept in the system context.
Application context: The system context contains a table of application contexts. This means that AMPS can support several different applications running simultaneously and independently of each other. The application context holds application-level global state, i.e. state information that applies throughout the application.
Session context: The application context also contains a table of session contexts. For any session-oriented protocol, the session context contains state pertaining to a protocol session. The event handlers modify the state in the session context as they implement the protocol session state machine. In the session context they also keep session-related data structures for book-keeping, accounting and any other information they need to perform the protocol functions correctly. Event handlers may also modify the application context in some cases, if there is state that needs to be globally available across all sessions, e.g. statistics. Some event handlers may modify the system context as well, especially those that serve user interface related configuration and provisioning events applicable to the whole system.
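The hierarchy can be pictured as three nested record types. The sketch below is illustrative only: the class names, field names and the on_invite handler are assumptions made for this example and are not taken from AMPS.

```python
from dataclasses import dataclass, field


@dataclass
class SessionContext:
    session_id: str
    state: str = "INIT"                        # position in the protocol state machine
    data: dict = field(default_factory=dict)   # book-keeping, accounting, etc.


@dataclass
class ApplicationContext:
    name: str
    stats: dict = field(default_factory=dict)     # application-level global state
    sessions: dict = field(default_factory=dict)  # session_id -> SessionContext


@dataclass
class SystemContext:
    config: dict = field(default_factory=dict)        # system-wide global state
    applications: dict = field(default_factory=dict)  # name -> ApplicationContext


def on_invite(system, app_name, session_id):
    """Hypothetical event handler: contexts are looked up globally and passed explicitly."""
    app = system.applications[app_name]
    session = app.sessions.setdefault(session_id, SessionContext(session_id))
    session.state = "PROCEEDING"                             # advance the session state machine
    app.stats["invites"] = app.stats.get("invites", 0) + 1   # state shared across sessions
    return session
```

Because the handler receives the contexts as arguments rather than finding them on a thread stack, any handler can resume a session regardless of which event arrived previously.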

 
Event manager

It should be clear from the description of the AMPS architecture so far that the event management system is at the heart of the architecture. The event manager consists of an event registration mechanism, event scheduling and notification. Event registration is simple: a list of registered callbacks is maintained per event type. In addition, the event scheduler maintains a queue per event type. Events are generated using an API provided by AMPS. The API function puts the generated event in its respective queue; it does not call the handlers immediately. The event scheduler calls the registered handlers later, according to the scheduling policy.
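A minimal sketch of this separation between generating and delivering events is given below. The EventManager class and its method names are hypothetical, and the scheduling policy shown (repeated passes until all queues are empty) is the simplest possible one, chosen only for illustration.

```python
from collections import defaultdict, deque


class EventManager:
    """Hypothetical event manager: per-type callback lists and per-type queues."""

    def __init__(self):
        self.handlers = defaultdict(list)   # event type -> registered callbacks
        self.queues = defaultdict(deque)    # event type -> pending events

    def register(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def generate(self, event_type, data):
        """Only enqueues the event; no handler is called here."""
        self.queues[event_type].append(data)

    def run_scheduler(self):
        """Pass over all queues until every queue is empty; return events delivered."""
        delivered = 0
        while any(self.queues.values()):
            for event_type in list(self.queues):   # snapshot: handlers may add new types
                q = self.queues[event_type]
                while q:
                    data = q.popleft()
                    for handler in self.handlers[event_type]:
                        handler(data)               # may call generate() again
                    delivered += 1
        return delivered
```

A handler that calls generate() for a response event simply adds work to another queue, which the scheduler picks up on a later pass.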

Main application
 

The application built on top of AMPS performs the following functions:

    • It performs the initialization required to set up all the I/O and computation modules required by the application. This usually involves creating all configured modules, including both non-blocking modules and I/O agents.
    • It also sets up network transport descriptors (e.g. sockets), file I/O descriptors and so on. AMPS internally puts the descriptors into non-blocking mode and registers them with the operating system so that it can poll for external I/O events arriving on them. As part of this initialization, the application also registers an I/O receive function with each descriptor to perform initial application-level processing on the received data. AMPS provides the necessary APIs for performing these tasks.
    • After setting up the I/O descriptors and creating all the modules, the main application calls an API that polls the operating system for any pending external events on any of the descriptors. If no external event is pending, the user may set up the application to wait indefinitely, to wait until a timeout occurs, or to return immediately.
    • When one or more events occur, the application wakes up and enters a processing loop. Inside the loop, it retrieves each event from the operating system, again using an AMPS API. The API retrieves each event and internally calls the network receiver processing function that was registered during initialization for the descriptor on which the event occurred. Note that the application-level event type is not yet known when a network or other I/O event arrives at a descriptor. It is the job of the initial processing function to determine the application-level event type for each I/O event arriving on the I/O descriptors. Once the application-level event type is known, that event is generated. The event generation mechanism puts the new event into the queue for that event type. In this way, arriving I/O events are processed by the main application and the corresponding application-level events are collected in event queues.
    • Once all external I/O events have been retrieved and their corresponding internal events generated, the main application invokes the event scheduler.

In our Linux implementation, we use the epoll event notification facility instead of the conventional select or poll APIs. epoll is far more scalable than select and poll. We have found it to scale to several thousand network events arriving at a significant rate (higher than 1000 per second) for several hours. In our opinion, therefore, this is the most scalable event notification facility available on Linux today.

Event Scheduling

The event manager calls the event scheduler to deliver the queued events in a certain order. The scheduler goes through each event queue and calls the registered callbacks or event handlers. Event handlers may generate further internal events during their processing; those new events are put in their respective queues. The scheduler could serve the queues in many possible orderings. For example, it may make repeated passes through all queues until it finds no event in any of them. Another possibility is that it distinguishes between queues holding events generated as a result of I/O events arriving from the outside (let’s call them external events) and queues holding events generated by event handlers during their execution (let’s call them internal events). The scheduler may serve one external event queue until it is empty, then all internal event queues to quickly get the internal events out of the system, then move on to the next external event queue, and so on. How the scheduler works depends on the scheduling policy.

AMPS provides a default scheduler that classifies queues into external and internal categories and schedules between them as just described. This also means that events collected in each queue are delivered as a batch. This policy may provide good instruction cache behavior for events whose list of registered event handlers has only one entry, a case that is quite common for request and response type events. It may also result in good data cache behavior if one or more event handlers registered for a particular event access the same data structures. This kind of caching behavior is virtually impossible to achieve via thread scheduling done by the OS. However, this policy may result in increased latency for other events if a large number of events of a particular type arrive simultaneously.
To circumvent this, the default scheduler in AMPS uses a scheduling policy called Deficit Round Robin (DRR). In DRR, each external event queue is assigned a quantum q, which for now is specified as a number of events. In addition, each external queue has an associated credit c with a maximum limit l. The scheduler algorithm is as follows:

  • The scheduler serves q events from each external queue in each round.
  • After each external queue, it serves all internal queues.
  • If a queue has fewer than q events, the difference is added to its credit c, provided c is less than l. If there are more than q events, at most c events are served. The credit c thus serves as the burst size for a particular external event queue under high load.
  • If c reaches its maximum limit l, any additional events arriving for the external queue are dropped.
  • There is no limit or credit for the internal event queues.

There is no possibility of starvation since all queues would get their chance eventually. The user can modify the algorithm as desired. The event scheduler provided with AMPS serves as an example only. The user could serve events in an order of priority, do simple FIFO scheduling, or just do a simple round robin if caching behavior is less important than providing round robin properties.
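One possible reading of these rules is sketched below. It is not the AMPS scheduler: the names ExternalQueue and serve_round are hypothetical, and the burst rule is an interpretation, namely that a backlogged queue serves q events plus up to c extra events paid for from its banked credit. The rule for dropping events when c reaches l is left out of the sketch.

```python
from collections import deque


class ExternalQueue:
    def __init__(self, quantum, credit_limit):
        self.events = deque()
        self.quantum = quantum             # q: events served per round
        self.credit = 0                    # c: banked, unused quantum
        self.credit_limit = credit_limit   # l: cap on c


def drain_internal(internal, deliver):
    """Internal queues have no quantum or credit: serve them until all are empty."""
    while any(internal):
        for iq in internal:
            while iq:
                deliver(iq.popleft())


def serve_round(external, internal, deliver):
    """One scheduler round over all external queues."""
    for q in external:
        backlog = len(q.events)
        if backlog < q.quantum:
            budget = backlog
            # Bank the unused part of the quantum, capped at the limit l.
            q.credit = min(q.credit_limit, q.credit + (q.quantum - backlog))
        else:
            # Assumption: a backlogged queue may spend banked credit as a burst.
            extra = min(q.credit, backlog - q.quantum)
            budget = q.quantum + extra
            q.credit -= extra
        for _ in range(budget):
            deliver(q.events.popleft())
        # After each external queue, serve all internal queues.
        drain_internal(internal, deliver)
```

With q = 2 and l = 3, a queue that sits idle for one round banks 2 credits and can then serve 4 events of a 5-event burst in the next round, leaving one event for the round after.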

As pointed out earlier, complete control over the scheduling of events in AMPS has several advantages over designs that use a multi-threaded model and let the operating system perform the scheduling. The application knows best what behavior is desirable for its purpose. Once events are collected in queues, new algorithms can be applied to the queues and experimented with to improve different performance metrics of the application. Figure 3 illustrates the flow of information between different components of the application.


In figure 3, we have four network I/O descriptors, each with its own network receiver function. When network messages arrive on the descriptors, the receiver functions are called, and they in turn call the initial message parser to determine the internal event type for each message. The receiver functions then generate internal events, which the event manager enqueues into the event queues for the particular message types. Each event queue holds messages for one particular event type. After all messages have been posted into event queues (queues numbered 1 to 4 in the example), the event manager invokes the scheduler to determine event ordering. It then calls the registered event handlers in the order determined by the scheduler. Event handlers 2, 3 and 4 perform the necessary processing on messages and then generate further messages to the network. Event handler 1, on the other hand, processes the event and then generates another event, passing the message along with that new event. This new event is enqueued in queue 5 by the event manager. The event is finally delivered and the registered event handler 5 is invoked. Handler 5 processes the message further and finally generates a network message to complete the processing. Note that the programmer has to break the processing of event type 1 into two handlers, 1 and 5.

Another important thing to note about figure 3 is that network messages are sent by event handlers directly, without the need to register yet another event handler in case network I/O might block. This is because AMPS takes care of sending messages to sockets and other types of descriptors internally within the middleware. If the API finds that a descriptor would block (due to a busy error, for example), it enqueues the message internally and returns success to the application. It then sets up the operating system to notify it when the descriptor is writable again. The writable event occurring on the descriptor is treated just like any other event in AMPS. The event handler for this event (registered internally by AMPS) sends the enqueued messages in FIFO order. If it again finds a blocking condition while the queue is being emptied, it repeats the process, i.e. it re-registers the event handler and sets up OS notification again. If the application sends another message to the same descriptor, the AMPS API for sending messages enqueues the new message in FIFO order if the queue is not empty. Otherwise it tries to send it right away.
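The send path described above can be sketched as a small class. This is a simplified illustration with hypothetical names (QueuedSender, write, watch_writable), not the AMPS implementation: the write callable is assumed to raise BlockingIOError when the descriptor would block, and partial writes are ignored for brevity.

```python
from collections import deque


class QueuedSender:
    """Hypothetical send path that hides would-block conditions behind a FIFO queue."""

    def __init__(self, write, watch_writable):
        self.write = write                    # raises BlockingIOError if it would block
        self.watch_writable = watch_writable  # asks the OS to report when writable again
        self.pending = deque()

    def send(self, message):
        """Always reports success to the calling event handler."""
        if self.pending:                      # keep FIFO order behind queued messages
            self.pending.append(message)
            return True
        try:
            self.write(message)               # queue is empty: try to send right away
        except BlockingIOError:
            self.pending.append(message)
            self.watch_writable()
        return True

    def on_writable(self):
        """Internal handler for the writable event: flush queued messages in order."""
        while self.pending:
            try:
                self.write(self.pending[0])
            except BlockingIOError:
                self.watch_writable()         # still busy: wait for the next notification
                return
            self.pending.popleft()
```

The application-level event handler only ever calls send(); the retry logic stays inside the middleware.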
