AMPS Architecture

Memory management
Efficient memory management is a key issue in any non-trivial software system. Memory handling can become a severe performance bottleneck if allocating and freeing buffers involves expensive searching, sorting and recombining of memory blocks. The AMPS memory management subsystem is designed to meet the following goals:
- The memory management must be fast and add little overhead to the system.
- The memory management should try to minimize memory leaks without incurring the overhead of a garbage collector. Of course, this goal will be less relevant if AMPS is implemented in the future in Java or another language that provides automatic garbage collection.
- The memory manager should acquire memory from the underlying system (e.g. the C library) gradually, on demand. However, once acquired, it may hold memory for a relatively long time to minimize the overhead of freeing memory.
AMPS meets these goals in the following manner:
Application protocol servers usually use memory in bursts. They build up relatively large data structures that are used for the duration of a particular phase of computation, and then discard most or all of them. The surviving data structures represent the results of a phase rather than intermediate values. A prime example is the processing of a protocol message that arrives from the network. The message enters the server and is parsed, processed and possibly modified: new fields or data may be added, while other fields may be removed or updated. Memory is allocated dynamically during every stage of the message's lifetime within the server, and some of it may not be freed until the message leaves the system, e.g. when it is forwarded, replied to, or discarded after processing. The individual memory objects allocated during one processing phase could be freed when processing moves on to the next phase. However, delaying the freeing of these objects until the end of the whole message's lifetime is beneficial for two reasons:
- If we keep accumulating memory allocated in small chunks (for individual objects) and free all objects at once, i.e. at only one place in the code, the number of memory leaks decreases substantially, because free is called in one place instead of many.
- We can pre-allocate a larger buffer at the beginning, i.e. when the message first enters the system, and then build a simple, extremely fast allocator on top of this buffer for our smaller objects. Such an allocator is fast because it does not have to manage free lists or carry other per-object overheads for small, individual objects.
Taking the example of a protocol message: when it arrives at the server, we allocate a large buffer of an appropriate size using the low-level memory manager provided by the programming language, and set a pointer to the start of this buffer. This forms the context, or memory manager object, for that particular message. As the message passes through the various stages of processing, the application allocates memory using a special API that takes this memory manager object as an argument. The allocator simply advances the pointer by the requested number of bytes and returns the address at which the pointer stood before the request. The application can then write to that memory at will through the returned pointer. Each subsequent allocation moves the pointer further forward. We may reach the end of the large buffer before the message's lifetime in the system ends. If that happens, we allocate another large buffer, link it to the previous one, and continue allocating from the new buffer. In this way we build up a linked list of large buffers embedded in the memory manager object. This linked list is freed when the message is no longer needed in the system, i.e. when the manager object is destroyed. The important point is that the higher-level allocator is extremely fast, since in the common case it only adds the requested size to a pointer variable.
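The scheme described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the actual AMPS API: the names `mm_manager`, `mm_create`, `mm_alloc` and `mm_destroy` are hypothetical, requests are rounded up to the platform's maximum alignment (a detail the text does not discuss), and a request larger than one buffer is simply rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not the real AMPS API. */
#define MM_ALIGN (_Alignof(max_align_t))

typedef struct mm_buffer {
    struct mm_buffer *next;  /* next large buffer in the chain */
    size_t used;             /* bump-pointer offset into the payload */
} mm_buffer;

typedef struct mm_manager {
    size_t buffer_size;      /* payload size of every large buffer */
    size_t total_allocated;  /* bytes handed out so far */
    mm_buffer *head;         /* first buffer of the linked list */
    mm_buffer *current;      /* buffer currently being filled */
} mm_manager;

/* Round n up to a multiple of MM_ALIGN so returned pointers stay aligned. */
static size_t mm_round_up(size_t n) {
    return (n + MM_ALIGN - 1) & ~(MM_ALIGN - 1);
}

/* One malloc call obtains the header plus the payload area. */
static mm_buffer *mm_new_buffer(size_t buffer_size) {
    mm_buffer *b = malloc(mm_round_up(sizeof *b) + buffer_size);
    if (b) { b->next = NULL; b->used = 0; }
    return b;
}

mm_manager *mm_create(size_t buffer_size) {
    if (buffer_size == 0 || buffer_size > SIZE_MAX / 2) return NULL;
    mm_manager *m = malloc(sizeof *m);
    if (!m) return NULL;
    m->buffer_size = mm_round_up(buffer_size);
    m->total_allocated = 0;
    m->head = m->current = mm_new_buffer(m->buffer_size);
    if (!m->head) { free(m); return NULL; }
    return m;
}

void *mm_alloc(mm_manager *m, size_t size) {
    assert(m != NULL);
    if (size == 0 || size > m->buffer_size) return NULL;  /* sketch: no oversize requests */
    size = mm_round_up(size);
    if (m->current->used + size > m->buffer_size) {
        /* Current buffer exhausted: chain a fresh one and continue there. */
        mm_buffer *b = mm_new_buffer(m->buffer_size);
        if (!b) return NULL;
        m->current->next = b;
        m->current = b;
    }
    unsigned char *payload =
        (unsigned char *)m->current + mm_round_up(sizeof(mm_buffer));
    void *p = payload + m->current->used;  /* address before the bump */
    m->current->used += size;              /* the common case is just this addition */
    m->total_allocated += size;
    return p;
}

/* The single place where everything allocated for a message is freed. */
void mm_destroy(mm_manager *m) {
    if (!m) return;
    for (mm_buffer *b = m->head; b != NULL; ) {
        mm_buffer *next = b->next;
        free(b);
        b = next;
    }
    free(m);
}
```

In this sketch a message handler would call `mm_create` once when the message arrives, `mm_alloc` for every object it needs while processing, and `mm_destroy` once when the message leaves the system. There is deliberately no per-object free function.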
AMPS provides an API to create a higher-level memory manager object with a given buffer size. This API returns an object that is passed to a second API for allocations. Memory is allocated as described above, i.e. from the linked list of buffers inside the manager object. A third API frees (destroys) the memory manager object.
For the lower-level memory manager, AMPS provides a further optimization. Large buffers are obtained from the lower-level allocation function provided by C, i.e. malloc. When a manager object is destroyed, AMPS does not call the free function of the C library; instead it returns the object's linked list of buffers to internal, size-segregated free lists that act as a cache. When a memory manager object is created, AMPS first checks the free list for the requested size and, if a buffer is available there, takes it instead of calling malloc.
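A size-segregated buffer cache of this kind might look like the following minimal sketch. The names (`buffer_cache`, `cache_acquire`, `cache_release`) and the number of size classes are assumptions made for illustration; the power-of-two classes starting at 1 KB follow the sizes this document gives for the buffer cache. Each cached buffer stores the free-list link in its own first bytes, so the cache needs no extra memory per buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names and limits; not the real AMPS internals. */
#define CACHE_CLASSES  16              /* 1 KB, 2 KB, ... up to 32 MB */
#define CACHE_MIN_SIZE ((size_t)1024)

/* A cached buffer reuses its own first bytes as the free-list link. */
typedef struct cached_buffer {
    struct cached_buffer *next;
} cached_buffer;

typedef struct buffer_cache {
    cached_buffer *free_list[CACHE_CLASSES];  /* one list per size class */
    size_t cached_bytes;                      /* total bytes held in the lists */
} buffer_cache;

void cache_init(buffer_cache *c) {
    for (int i = 0; i < CACHE_CLASSES; i++) c->free_list[i] = NULL;
    c->cached_bytes = 0;
}

/* Index of the smallest power-of-two class that fits size, or -1. */
int cache_class(size_t size) {
    size_t class_size = CACHE_MIN_SIZE;
    for (int i = 0; i < CACHE_CLASSES; i++, class_size <<= 1) {
        if (size <= class_size) return i;
    }
    return -1;
}

/* Take a buffer from the matching free list; fall back to malloc. */
void *cache_acquire(buffer_cache *c, size_t size) {
    int i = cache_class(size);
    if (i < 0) return NULL;
    cached_buffer *b = c->free_list[i];
    if (b != NULL) {
        c->free_list[i] = b->next;
        c->cached_bytes -= CACHE_MIN_SIZE << i;
        return b;
    }
    return malloc(CACHE_MIN_SIZE << i);
}

/* Return a buffer to its free list instead of calling free. */
void cache_release(buffer_cache *c, void *buf, size_t size) {
    int i = cache_class(size);
    assert(buf != NULL && i >= 0);
    cached_buffer *b = buf;
    b->next = c->free_list[i];
    c->free_list[i] = b;
    c->cached_bytes += CACHE_MIN_SIZE << i;
}

/* Hand every cached buffer back to the C library. */
void cache_destroy(buffer_cache *c) {
    for (int i = 0; i < CACHE_CLASSES; i++) {
        while (c->free_list[i] != NULL) {
            cached_buffer *b = c->free_list[i];
            c->free_list[i] = b->next;
            free(b);
        }
    }
    c->cached_bytes = 0;
}
```

Because every request is rounded up to its class size, a buffer released as "3000 bytes" can later satisfy a request for 4000 bytes: both map to the 4 KB class.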
Note that this design implies that AMPS keeps building internal free lists of relatively large buffers and never returns them to the application's heap. This may exhaust memory under heavy load, e.g. when a huge number of protocol sessions are in progress concurrently. To avoid this, AMPS provides a configuration API for its internal memory management. When so configured, AMPS frees the size-segregated free lists once the total size of the buffers in them reaches a certain threshold, specified as a percentage of the total available physical memory. Of course, the application can additionally perform its own admission control by limiting the number of concurrent sessions according to application-specific criteria. The application can also register an event handler for the internal AMPS event that fires when the memory held in the segregated lists reaches the threshold.
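The threshold behaviour can be illustrated with a small sketch. Everything here is hypothetical rather than the real AMPS configuration API: the cache holds a single size class for brevity, the amount of physical memory is passed in by the caller instead of being queried from the operating system, and the "event" is modelled as a plain callback that runs just before the lists are freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names; a one-size-class stand-in for the AMPS buffer cache. */
typedef struct free_node {
    struct free_node *next;
} free_node;

typedef void (*threshold_handler)(size_t cached_bytes, void *ctx);

typedef struct trim_cache {
    free_node *free_list;            /* cached buffers, all buffer_size bytes */
    size_t buffer_size;
    size_t cached_bytes;             /* total bytes currently cached */
    size_t limit_bytes;              /* 0 means "no threshold configured" */
    threshold_handler on_threshold;  /* optional application callback */
    void *handler_ctx;
} trim_cache;

void trim_cache_init(trim_cache *c, size_t buffer_size) {
    assert(buffer_size >= sizeof(free_node));
    c->free_list = NULL;
    c->buffer_size = buffer_size;
    c->cached_bytes = 0;
    c->limit_bytes = 0;
    c->on_threshold = NULL;
    c->handler_ctx = NULL;
}

/* Threshold is given as a percentage of the physical memory size. */
void trim_cache_configure(trim_cache *c, size_t physical_bytes,
                          unsigned percent, threshold_handler handler,
                          void *ctx) {
    assert(percent <= 100);
    c->limit_bytes = physical_bytes / 100 * percent;
    c->on_threshold = handler;
    c->handler_ctx = ctx;
}

/* Free every cached buffer; returns the number of bytes given back. */
size_t trim_cache_trim(trim_cache *c) {
    size_t freed = c->cached_bytes;
    while (c->free_list != NULL) {
        free_node *n = c->free_list;
        c->free_list = n->next;
        free(n);
    }
    c->cached_bytes = 0;
    return freed;
}

/* Cache a buffer; notify the application and trim once the limit is hit. */
void trim_cache_release(trim_cache *c, void *buf) {
    assert(buf != NULL);
    free_node *n = buf;
    n->next = c->free_list;
    c->free_list = n;
    c->cached_bytes += c->buffer_size;
    if (c->limit_bytes != 0 && c->cached_bytes >= c->limit_bytes) {
        if (c->on_threshold) c->on_threshold(c->cached_bytes, c->handler_ctx);
        trim_cache_trim(c);
    }
}

/* Example handler: counts how often the threshold event fired. */
void count_threshold_events(size_t cached_bytes, void *ctx) {
    (void)cached_bytes;
    ++*(unsigned *)ctx;
}
```

An application doing its own admission control could use such a handler to stop accepting new sessions for a while instead of merely counting events.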
Figure 5a shows the memory manager object. It contains a linked list of buffers of a particular size, along with other state information including the buffer size, the total bytes allocated, a pointer to the head of the list, and the currently active buffer. Figure 5b shows the internal structure of a single buffer of the linked list. The pointer moves towards the end of the buffer with each allocation. Figure 6 shows the buffer cache with linked lists for different buffer sizes. These sizes are powers of 2, starting from 1 KB.

Scalability, clustering and distribution
One of the major concerns of server application developers is how the system will scale as load increases. Ideally, the system should scale seamlessly when more hardware, i.e. computing power and memory, is added. This means that if a server running on a single machine becomes congested, adding a second machine should transparently double the system throughput. Given the AMPS design described so far, scalability of a server can be achieved as follows:
The CPU agents and I/O agents described earlier can easily be distributed across different machines, since both types of agents communicate with the main event processor thread via messages. They do not depend on sharing an address space with the main application. If the agents can communicate with the main application via inter-thread communication mechanisms, they can communicate via inter-process mechanisms as well. AMPS chooses the best way of passing messages from one event processor to the other, and in making this decision it can consider whether the two processors are in the same address space or not. If the communicating processes run on different machines, they can communicate over TCP/IP.
An application developer can use this idea to distribute several CPU and I/O agents across different stand-alone machines. The main application running the event loop transparently generates events for CPU and I/O agents as before. The registered event handler acts as a dispatcher as before, and sends each event over a TCP connection to one of possibly several agent instances running on different machines. The dispatcher may select an agent instance according to some load-balancing criterion, and therefore has to keep state about the health and load of the different agent instances. Note that the protocol session and other global state is still maintained in only one place, i.e. the main application. The machines running agents execute a simple application containing only an event loop and one or more CPU or I/O agents, each consisting of a dispatcher and a thread pool. The main loop of this simple application generates an internal event for its local dispatcher when an incoming event arrives. The local dispatcher distributes the event load across its local thread pool. The rest of the operation is the same as described before for CPU and I/O agents.
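The dispatcher's selection step can be sketched as follows. The text leaves the load-balancing criterion open, so this example assumes one simple possibility: pick the healthy agent instance with the fewest events still awaiting a reply. The structure and function names are hypothetical, and the actual transmission of the event over the TCP connection is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-instance state kept by the dispatcher. */
typedef struct agent_instance {
    const char *address;  /* where the remote agent listens (illustrative) */
    int healthy;          /* non-zero while the instance is reachable */
    unsigned pending;     /* events sent but not yet answered */
} agent_instance;

/* Least-loaded healthy instance, or -1 if none is available.
 * Ties are resolved in favour of the lowest index. */
int dispatcher_select(const agent_instance *agents, size_t count) {
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!agents[i].healthy) continue;
        if (best < 0 || agents[i].pending < agents[best].pending) {
            best = (int)i;
        }
    }
    return best;
}

/* Choose an instance and account for the event about to be sent to it.
 * A real dispatcher would write the event to that instance's TCP
 * connection at this point. */
int dispatcher_dispatch(agent_instance *agents, size_t count) {
    int i = dispatcher_select(agents, count);
    if (i >= 0) agents[i].pending++;
    return i;
}

/* Called when an agent's reply arrives, to update the load estimate. */
void dispatcher_on_reply(agent_instance *agents, size_t count, int index) {
    assert(index >= 0 && (size_t)index < count);
    if (agents[index].pending > 0) agents[index].pending--;
}
```

Marking an instance as unhealthy (for example after a failed send or a missed keep-alive) removes it from consideration without touching session state, which stays in the main application.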
This results in the application being transparently distributed across machines, creating a fully distributed application. AMPS currently recommends TCP for inter-machine connections, which provides reliability, congestion control and, where enabled on the socket, keep-alives.
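One practical detail: on POSIX systems TCP keep-alive probing is a per-socket option that is normally off until the application turns it on with `SO_KEEPALIVE`. The sketch below shows how a connection to an agent machine might enable it. The function names are hypothetical and this is not taken from the AMPS sources; probe timing is left at the operating system's defaults.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn on TCP keep-alive probing for an existing socket.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd) {
    assert(fd >= 0);
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* Returns 1 if keep-alive is enabled, 0 if not, -1 on error. */
int keepalive_enabled(int fd) {
    int value = 0;
    socklen_t len = sizeof value;
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0) {
        return -1;
    }
    return value != 0;
}

/* Hypothetical helper: create a TCP socket for an agent connection with
 * keep-alive already enabled. The caller still has to connect() it. */
int open_agent_socket(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (enable_keepalive(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With keep-alive enabled, a dispatcher can treat a connection error reported by the operating system as a signal that an agent instance is no longer healthy.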

News & Events

1st October 2009
A complete Service Delivery Platform (SDP) for telecommunications applications, built entirely on top of AMPS, has been released. The SDP, named Augur, is showcased at http://Augur.biz
1st July 2009
AdvOSS launches a complete Diameter AAA server built on AMPS. The server has been tested under very high load with millions of subscribers and performed well.
1st April 2009
AdvOSS launches a full suite of Diameter applications built on top of AMPS. These include an HSS (Home Subscriber Server), Offline Charging and Online Charging, and together they complete a full suite of AAA applications for IMS (IP Multimedia Subsystem).
1st Jan 2009
Diameter Stack Launched. AdvOSS has launched a full Diameter protocol stack. This protocol is at the heart of next-generation AAA and requires implementations that support high processing loads and scale well. This stack is now an integral part of AMPS.