Designing and implementing IPC mechanisms on an ARM Cortex-M3 Kernel (2/2)

6. PROCESS COMMUNICATION

Process communication refers to schemes or mechanisms that allow processes to exchange information. It can be accomplished in many different ways, all of which depend on process synchronization. [1] ITC or Inter-Task Communication is a common term used on the embedded domain.

Now we have validated our semaphores are able to synchronize tasks running concurrently in our system, it is time to use them to create higher-level communications mechanisms.

Microkernel architecture

The overall design goal is to be a microkernel. That said, there will be tasks which will implement servers, and other tasks are going to be clients. They need to communicate.

Figure 14. The target architecture

Shared Memory, Pipes, and Message Passing are common communication schemes in operating systems [1, 2, 3]. Shared Memory using simple mutexes are effective for direct task communication on the same address space. Pipes are effective to share streams of data. Message Passing is used to exchange messages ready to be processed.

6.1 Synchronous versus Asynchronous mechanisms

Communication can be synchronous or asynchronous. The later means when a client calls a server, it is free to perform other tasks while the client does not answer. The former means after sending it will block until it gets an answer.

Figure 15. General client-server synchronous communication.

Asynchronous communication is intended mainly for loosely-coupled systems, in which interprocess communication is infrequent, i.e., processes do not exchange messages on a planned or regular basis. For such systems, asynchronous communication is more suitable due to its greater flexibility. [1]

Synchronous communication is well suited to tightly-coupled systems in which processes exchange messages on a planned or regular basis. In such a system, processes can expect messages to come when they are needed, and the usage of message buffers is carefully planned. [1]

7. MESSAGE PASSING

Message Passing is a general form of inter-task communication mechanism. On operating systems it is the basis of the so-called microkernel; on distributed systems it is used for RPCs (remote procedure calls), and also the basis for server-client oriented programming on computer networks. [1, 2, 3]

Message Passing can by synchronous or asynchronous. On Asynchronous both send and receive operations are non-blocking. On synchronous, both are blocking.

In this publication I will be showing a synchronous message passing mechanism. As said, Message Passing is the basis of a microkernel, the messages will be conveying requests to be processed by servers.

7.1. Design and Implementation of a Synchronous Message Passing

In a simple client-server communication system, there are two kinds of processes: servers and clients. Clients can only communicate with servers, and servers can only communicate with clients. Each client has either a guaranteed or dedicated message buffer. After sending a request to a server, a client immediately waits for a reply since there is nothing else the client can do. Upon receiving a client request, a server handles the request, which may require some time delay, but it eventually sends a reply to the client. In more complex server-client systems, there are processes that can only be clients, processes that can only be servers and processes that can be both. [1,2]

The design and implementation presented here is based on [1, 2], and processes are either clients or servers, never both.

7.2 Design

We can extend the consumer-producer problem to N messages. [3]

buffer[N], N = number of messages
process producer:
 I=0
 while (TRUE):
  produce contents
  receive buffer[I] from consumer
  put contents on buffer
  send to consumer
  I=(I+1)%N
process consumer:
 for K=0 to K=N-1 send empty messages to producer
 I=0
 while(TRUE)
  receive buffer[I] from producer
  extract contents from message
  put empty item on buffer
  send to producer
  I=(I+1)%N

Based on this algorithm, but for multiple producers and consumers, using a message queue to sequence the message processing, the design is proposed as follows:

  • The message buffers are arranged on a linked list. The next field of the last buffer of the list points to zero, so we know it is the last. The message buffer data structure contains the sender PID, the message contents and the address of the next linked buffer.
  • Every thread has a pointer to a Message Buffer on its Thread Control Block. Since there is a limited number of buffers available on the system, when a client wants to send a message to a server, it must fetch a free buffer from a list. If there is a free buffer, it dequeues its address from the list, writes the message, enqueues it on the receiver message queue and signals the receiver there is a new message. Initially, all buffers are free.
  • When a receiver is signaled, it dequeues the message buffer from its own TCB, reads its contents, and releases the buffer – put it back on the free list. The figure below shows a possible state, for a system with seven message buffers. Three of them are taken, waiting to be read by the receiver threads. It means 3 send operations had ocurred and no receive operations had took place for these messages yet.
  • The number of free buffers is controlled by a counting semaphore, which maximum count is the total number of available buffers on the system. The access to the free buffer list is a critical region (CR), disputed by senders, so it is protected by a mutex. The same reasoning applies to the message queues of every thread control block, each has its own mutex.
Figure 16. Possible state of a Message Passing with three messages waiting to be read by receivers.

Note for asynchronous communication the changes would affect the sending and receiving algorithms, the architecture depicted on Figure 16 would still apply.

7.3 Implementation

Likewise the design, the implementation is simple. The hardest part is the synchronization. Maybe the linked queue need some special attention. The send-receive interface must be a kernel call if the buffers are stored on kernel space (if you got a MMU). I hope the implementation mirrors the design idea:

/******************/
/*@File message.h*/
/*****************/
/*Message Buffer Data Structure*/
#define MSG_SIZE 32
typedef struct mbuff
{
	struct mbuff *next;
	int			 sender_pid;
	char		 contents[MSG_SIZE];
}MBUFF_t;
/*Interface*/
/*initialize free list*/
extern void init_mbuffs(void);
/*send *msg to pid, return OK or NOK*/
extern int sendmsg(char *msg, int pid); 
/*receive msg and store at *msg, return sender PID, 
0 if fails (0 is a reserved PID)*/
extern int recvmsg(char *msg); 
/****************/
/*@File kernel.h*/
/****************/
/*Thread control block with message passing fields*/
struct tcb 
{
	int32_t*		psp;			/*last saved process stack*/
	int32_t*		ksp;			/*last saved kernel stack*/
	struct tcb		*next;			/*next tcb*/
	int32_t			kernel_flag;	/*kernel flag 1=user thread, 0=kernel thread*/
	int32_t			pid;			/*process id*/
	kSemaphore_t*	block_pt;		/*blocked 0=not blocked, semaphore address=blocked*/
	uint32_t		sleeping;		/*sleep counter*/
	kSemaphore_t	nmsg;			/*new msg sema*/
	kSemaphore_t	mlock;			/*thread msg queue sema*/
	MBUFF_t			*mqueue;        /*thread msg queue*/
};
/******************/
/*@File message.c*/
/*****************/
/************************************************************************/
/* Synch Message Passing                                                 */
/************************************************************************/
MBUFF_t mbuff[NMBUF]; /*message buffers*/
MBUFF_t *mbufflist=NULL; /*free message buffers*/
kSemaphore_t nmbuf=NMBUF; /*counting semaphore*/
kSemaphore_t mlock=1; /*sema for mbufflist*/
static inline void copyString_(char *dest, char* src)
{
	for (int i = 0; i<MSG_SIZE; ++i) { dest[i] = src[i] ; }
}
static int menqueue(MBUFF_t **queue, MBUFF_t *p)
{
	MBUFF_t *q  = *queue;
	if (q==NULL)  /*empty list*/
	{
		*queue = p; /* insert buffer address */
		p->next = 0; /*make the last*/
		return OK;
	}
	while (q->next) /*search for the last position*/
	{
		q = q->next;
	}	
	q->next = p; /*take it*/
	p->next = 0; /*make the last*/
	return OK;
}
static MBUFF_t *mdequeue(MBUFF_t **queue)
{
	MBUFF_t *mp = *queue;
	if (mp)	*queue = mp->next;
	return mp;
}
void init_mbuffs(void)
{
	int i;
	for (i=0; i<NMBUF; i++) /*initially all buffers are on the free list*/
	{
		menqueue(&mbufflist, &mbuff[i]);
	}
}
/*Note the System Call API was renamed to kCall (kernel call).*/
static MBUFF_t *get_mbuf() //get a free buffer
{
	MBUFF_t *mp;
	kCall(SemaWait, &nmbuf); /*if no buffers, block*/
	kCall(SemaWait, &mlock); /*CR*/
	mp = mdequeue(&mbufflist);
	kCall(SemaSignal, &mlock); /*CR*/
	return mp;
}
static int put_mbuf(MBUFF_t *mp)
{
	kCall(SemaWait, &mlock); /*CR*/
	menqueue(&mbufflist, mp);
	kCall(SemaSignal, &mlock); /*CR*/
	kCall(SemaSignal, &nmbuf); /*increases counting sema*/
	return OK;
}
int sendmsg(char *msg, int pid)
{
	tcb_t *r_task; 
	r_task = &tcbs[pid]; /*receiver task*/
	MBUFF_t *mp = get_mbuf();
	if (mp == NULL)
	{
		return NOK;
	}
	mp->sender_pid = RunPtr->pid;
	copyString_(mp->contents, msg);
	kCall(SemaWait, &r_task->mlock); /*CR*/
	menqueue(&r_task->mqueue, mp);
	kCall(SemaSignal, &r_task->mlock); /*CR*/
	kCall(SemaSignal, &r_task->nmsg); /*signal a new msg*/
	return OK;
}
int recvmsg(char *msg)
{
	int pid;
	kCall(SemaWait, &RunPtr->nmsg); /*wait for new msg*/
	kCall(SemaWait, &RunPtr->mlock); /*CR*/
	MBUFF_t *mp = mdequeue(&RunPtr->mqueue);
	if (mp == NULL) 
	{
		kCall(SemaSignal, &RunPtr->mlock);
		return 0;
	}
	kCall(SemaSignal, &RunPtr->mlock); /*CR*/
	copyString_(msg, mp->contents);
	pid = mp->sender_pid;
	put_mbuf(mp); /*releases buffer*/
	return pid;
}

7.4 Simple Validation

This small validation shows the proposed Message Passing mechanism is able to send and receive messages synchronously.

Here is the test code:

void Task1(void* args)
{
	init_mbuffs();	/*init buffers*/
	int ret=NOK;
    /*Application Data Units*/
	char apdu2[] = {0xA2, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, 0x02, 0x02};
	char apdu3[] = {0xA3, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF, 0x03, 0x03};
	while(1)
	{
		ret = sendmsg(apdu2, 2);
		if (ret == OK) printline_("Sent to 2\n\r");
		else assert(0);
		ret = sendmsg(apdu3, 3);
		if (ret == OK) printline_("Sent to 3\n\r");
		else assert(0);	
	}
}
void Task2(void* args)
{
	 int32_t sender_pid=0;
	 char msgrcvd[MSG_SIZE]={0};
	while(1)
	{  
		sender_pid = recvmsg(msgrcvd);	
		if (sender_pid == 1) { printline_("2 rcvd from 1\n\r");}
		else assert(0);
		(void)msgrcvd;
	}
}
void Task3(void* args)
{
	int32_t sender_pid=0;
	char msgrcvd[MSG_SIZE]={0};
	while(1)
	{
		sender_pid = recvmsg(msgrcvd);
		if (sender_pid == 1) { printline_("3 rcvd from 1\n\r");}
		else assert(0);
		(void)msgrcvd;
	}
}

Note tasks are sending and receiving in loops and being blocked/preempted. No additional synchronization was added besides those provided by the implemented mechanism.

The total number of Message Buffers is 2, since we are going synchronous. The message size was set to 8 bytes.

Below a picture of the debugging window of Task 2 with the proper message copied from its queue.

Most important are the printed lines showing the expected synchronous behaviour:

Below another demonstration of message passing. An interrupt-driven system receives a 2-byte command from the keyboard via UART. The first byte is the receiver PID, the server. The server job is to blink an LED on certain frequency. The number of blinks is given by the second byte.

In this example, the Message Passing interface is implemented as a Kernel Call.

/*
 Message Passing Usage:
 This example receives a 2-byte message from UART
 The first byte is the PID of the receiving task
 The second byte is the number of led toggles
 Each server toggles the led on a different frequency 
*/
#include <stdio.h>
#include <bsp.h> 
#include "kernel.h"
#include "tasks.h"
MUTEX_t print;
SEMA_t rcvd;
MSG_t uartbuffer[2];
volatile int norb = 0;
volatile char rcvdbyte;
ISR(UART_Handler)
{
	if((uart_get_status(UART) & UART_SR_RXRDY) == UART_SR_RXRDY)
	{
		uart_read(UART, &rcvdbyte);
		uartbuffer[norb] = rcvdbyte;
		norb++;
		if (norb == 2) 
		{
			kSemaSignal(&rcvd);
			norb=0;
		}		
	}
}
static inline void printline_(const char* string)
{
	kCall(LOCK, &print);
	uart_write_line(UART, string);
	kCall(RELEASE, &print);
}
/* Client */
/* init_mbuffs() was called on kernel init */
void Task1(void* args)
{	
    kSemaInit(&rcvd, 0);
    kMutexInit(&print, 0);
	while(1)
	{
		WAIT(&rcvd); /*block until signaled by ISR*/
		if (uartbuffer[0] == 2) 
		{ 
			kCall2(SENDMSG, uartbuffer, (int)uartbuffer[0]);
        }
		else if (uartbuffer[0] == 3) 
		{ 
			kCall2(SENDMSG, uartbuffer, (int)uartbuffer[0]);
        }
		else
		{
			printline_("No route\n\r");
            SUSPEND(); //yield
		}
	}
}
/* Server */
void Task2(void* args)
{
	MSG_t msgrcvd[MSG_SIZE]={0};
	while(1)
	{
		kCall(RECVMSG, msgrcvd);
		if (SYSRET == 1) /*rcvd from PID 1*/
		{
			printline_("2\n\r");
			for (int i = 0; i<msgrcvd[1]; i++)	
			{
				gpio_toggle_pin(LED0_GPIO);
				delay_ms(500);
			}
		}
	}
}
/* Server */
void Task3(void* args)
{
	MSG_t msgrcvd[MSG_SIZE]={0};
	while(1)
	{
		kCall(RECVMSG, msgrcvd);
		if (SYSRET == 1)
		{
			printline_("3\n\r");
			for (int i = 0; i<msgrcvd[1]; i++)
			{
				gpio_toggle_pin(LED0_GPIO);
				delay_ms(1000);
			}
		}
	}
}

This usage demonstrates Message Passing conveying commands to be processed by servers. From this “Hello World” to the design of a system performing more complex tasks on a client-server pattern, it might take some effort but it is pretty doable because the mechanisms are set up.

8. FINAL REMARKS

This article ends a series of four publications that demonstrate the design and implementation of a functional preemptive kernel on an ARM Cortex-M3 MCU, sitting on the top of the CMSIS layer. GNU ARM toolchain was used.

I have started from creating a simple preeemptive round-robin scheduler to preempt concurrent tasks running on kernel space. Then, user and kernel threads were split, and because of that a system call mechanism was needed. Synchronization primitives were designed and implemented, and with them it was possible to create inter-task communication schemes.

Starting by exploring the ARM Cortex-M3 MCU, this series evolved into concepts and practice on embedded operating systems design, and while researching to write it I could learn a lot. Although the result is a useful functional multitasking engine, from this stage to a full-fledged (RT)OS kernel there is a distance. This is an ongoing project, on a very early phase. The full source code can be found on https://github.com/antoniogiacomelli/k0ba.

References

[1] Embedded and Real-Time Operating Systems, (K.C. Wang, 2017)

[2] Design and Implementation of the MTX Operating System, (K.C. Wang, 2015)

[3] Modern Operating Systems (Andrew Taneunbaum, Herbert Boss, 2014)

Author: Antonio Giacomelli de Oliveira

Embedded Systems Engineer

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: