Plan
Phase 1
Initial Requirement: to be able to run and test the 0MQ Guide examples automatically, i.e. without any user intervention. The minimum requirement for this is a timeout on the process.
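A process timeout of this kind can be sketched in a few lines. The example below is Python rather than Felix, and `run_with_timeout` is a hypothetical helper, not part of the manager described on this page; it only illustrates the minimum requirement.

```python
import subprocess
import sys

def run_with_timeout(argv, timeout_s):
    """Run argv and return (exit_code, timed_out).

    If the process is still running after timeout_s seconds it is killed.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed process so returncode is set
        return proc.returncode, True

if __name__ == "__main__":
    # A child that would otherwise run for 30 seconds is cut off after 1.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(sleeper, 1.0))
```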
The next feature needed is to kill servers when the client examples complete, since the server isn't needed any more. This requires some conditional kill operations. Diagnostics will just go to standard output, so they can be redirected and examined after a test run is complete.
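As a rough illustration of the conditional kill, here is a Python sketch (not the Felix implementation). The function name and return convention are assumptions made for the example; diagnostics go to standard output as described above.

```python
import subprocess
import sys

def run_client_then_kill_server(server_argv, client_argv, timeout_s):
    """Start the server, run the client to completion, then kill the server.

    Returns (client_exit_code, server_exit_code). The client code is None
    if the client itself had to be killed on timeout.
    """
    server = subprocess.Popen(server_argv)
    try:
        # subprocess.run kills the child itself if the timeout expires.
        client_code = subprocess.run(client_argv, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        print("client timed out", flush=True)
        client_code = None
    finally:
        if server.poll() is None:  # conditional kill: only if still running
            print(f"killing server pid={server.pid}", flush=True)
            server.kill()
        server.wait()
    return client_code, server.returncode
```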
Next: we want to log the status of the examples, to ensure the programs didn't crash or terminate prematurely or in error. We'll need some kind of logging mechanism.
Next: we want to gather the outputs and compare with what is expected to check for correctness.
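One possible shape for the correctness check, sketched in Python under the assumption that an example's expected output is available as a plain string. `check_output` is a hypothetical name; it compares standard output line by line and also requires a zero exit status.

```python
import difflib
import subprocess
import sys

def check_output(argv, expected, timeout_s=10.0):
    """Run argv, capture stdout, and compare it with the expected text.

    Returns (ok, diff_lines). ok is True only if the exit status is 0
    and the output matches the expected text line for line.
    """
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    diff = list(difflib.unified_diff(
        expected.splitlines(),
        result.stdout.splitlines(),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    ))
    return result.returncode == 0 and not diff, diff

if __name__ == "__main__":
    ok, diff = check_output([sys.executable, "-c", "print('Hello')"], "World\n")
    print(ok)
    print("\n".join(diff))
```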
Phase 2
Here we'll generalise the overall capabilities to provide better monitoring and control, possibly including a way to view the network status in a web browser.
Phase 3
This phase will extend the capabilities to remote process management. We'll use ssh (secure shell) to start daemons on target machines, and 0MQ to communicate with them. This will allow management of a network remotely without using ssh once the daemons are started.
Phase 4
Add security and encryption to the network management.
Status
At present I have a simple process manager which launches a process together with a monitoring thread. The thread polls the process status to see if it is running.
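As a rough illustration of the launch-and-poll arrangement, here is a minimal sketch in Python rather than Felix. The names `launch_monitored` and `status`, and the polling interval, are assumptions made for the example.

```python
import subprocess
import sys
import threading
import time

def launch_monitored(argv, timeout_s, poll_s=0.05):
    """Launch argv together with a thread that polls whether it is running.

    Returns (thread, status). The monitor fills in status["exit_code"]
    when the process ends, and kills the process once timeout_s has passed.
    """
    proc = subprocess.Popen(argv)
    status = {"exit_code": None, "timed_out": False}

    def monitor():
        deadline = time.monotonic() + timeout_s
        while proc.poll() is None:  # poll() returns None while running
            if time.monotonic() >= deadline:
                status["timed_out"] = True
                proc.kill()
                proc.wait()
                break
            time.sleep(poll_s)
        status["exit_code"] = proc.returncode

    thread = threading.Thread(target=monitor, daemon=True)
    thread.start()
    return thread, status
```

Python threads are joinable, so a caller can simply `thread.join()` and then inspect `status`. Felix pthreads are always detached, which is why termination needs the extra machinery discussed further down.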

Here is a sample configuration for the current manager, which runs the Guide's hello world example.
The core routine is:
Urrgg .. the colouring sucks. This part of the Wiki uses an unmaintained PHP script with a fixed set of languages.
The system above is enough to actually run all the guide examples.
The guide examples need to be modified to accept command line arguments so that network addresses can be supplied on the command line instead of being hard coded.
The code above relies solely on a timeout to kill servers. The plan is to use the process names so one can write something like:
Generally: to provide event based triggers. The current process group manager is simple enough
but we need to think now about how to synchronise termination. Felix pthreads are always detached. There are two obvious possibilities. The first is to use a pchannel to communicate termination status to the group manager; this is the intended way to simulate joinable threads.
The second technique is to modify the process manager to manage multiple processes. This reduces the number of threads required, and supports various synchronisation mechanisms such as event based triggers directly, without needing channels or locking. But it also reduces modularity, and has the major disadvantage that great care must be taken never to block, since that would cause monitoring of the processes to fail.
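To make the second technique concrete, here is a hedged Python sketch of a single loop that manages several named processes and applies one kind of event based trigger: when a named process finishes, kill some others. The `kill_when_done` mapping and the function name are inventions for this example, not the planned Felix syntax.

```python
import subprocess
import sys
import time

def run_group(commands, kill_when_done, timeout_s, poll_s=0.05):
    """Manage several processes from one loop, with no extra threads.

    commands:       name -> argv
    kill_when_done: name -> names to kill once that process has exited
    Returns name -> exit code.
    """
    running = {name: subprocess.Popen(argv) for name, argv in commands.items()}
    results = {}
    deadline = time.monotonic() + timeout_s
    while running:
        for name, proc in list(running.items()):
            code = proc.poll()  # non-blocking status check
            if code is None:
                continue
            results[name] = code
            del running[name]
            for victim in kill_when_done.get(name, ()):
                if victim in running:
                    print(f"{name} finished, killing {victim}", flush=True)
                    running[victim].kill()
        if time.monotonic() >= deadline:
            for proc in running.values():  # overall timeout: kill the rest
                proc.kill()
        time.sleep(poll_s)  # the only pause; kept short so monitoring continues
    return results
```

The loop never waits on any one process, which is the "never block" constraint in practice: a single blocking `wait()` here would stop the status checks for every other process in the group.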
The previous account of termination only allowed one process group to run. The mainline would hang until all pthreads were dead. The following code provides proper termination, so the mainline can continue on with the next test group.
First, the process manager interface is changed to add an output pchannel:
Now, when the process is finished, the monitoring pthread writes to this pchannel:
This is the first half of the standard way to implement a join in Felix. The write operation blocks until the message is read by another pthread; in this case that will be the mainline.
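The join-by-channel idea can be imitated in Python, with the caveat that this is only an analogy: `Rendezvous` below is a hypothetical stand-in for a pchannel, built on `queue.Queue`, and is meant for a single writer per channel. The writer blocks until the reader has taken the message, and the reader blocks until there is one.

```python
import queue
import threading

class Rendezvous:
    """A synchronous channel: write() returns only after read() took the message."""

    def __init__(self):
        self._q = queue.Queue()

    def write(self, msg):
        self._q.put(msg)
        self._q.join()  # blocks until the reader calls task_done()

    def read(self):
        msg = self._q.get()  # blocks until a message is available
        self._q.task_done()  # releases the blocked writer
        return msg

def run_and_join(work):
    """Run work() in a detached thread and wait for its result via a channel."""
    done = Rendezvous()
    # First half of the join: the worker writes its status when finished.
    threading.Thread(target=lambda: done.write(work()), daemon=True).start()
    # Second half: the mainline blocks on the read.
    return done.read()

if __name__ == "__main__":
    print(run_and_join(lambda: "DEAD"))
```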
Next we provide a monitor for the channels:
You should note that reading a pchannel is a blocking operation. It blocks the containing pthread.
** This code only works by a fluke: the spawned fthreads don't run until after the main fthread starts waiting. They then block, contrary to my prior belief (it's been a while!)
Finally, a modified process group manager:
You will note there is no locking around the process count: fthreads provide cooperative multi-tasking. No locking is required.
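The same point can be shown with Python's `asyncio`, whose coroutines are also cooperatively scheduled. This is an analogy only, not a model of Felix fthreads: the `lifetimes` mapping and the sleep standing in for the wait on a termination message are assumptions of the sketch.

```python
import asyncio

async def run_group(lifetimes):
    """lifetimes: name -> seconds until that (simulated) process dies.

    Returns the number of processes still counted as running (0 on success).
    """
    running = len(lifetimes)
    if running == 0:
        return 0
    all_dead = asyncio.Event()

    async def wait_dead(name, seconds):
        nonlocal running
        await asyncio.sleep(seconds)  # stands in for waiting on the DEAD message
        # No lock: control only changes hands at an await, and there is
        # none between reading and updating the shared count.
        running -= 1
        print(f"{name} DEAD, {running} still running")
        if running == 0:
            all_dead.set()

    tasks = [asyncio.create_task(wait_dead(n, s)) for n, s in lifetimes.items()]
    await all_dead.wait()
    await asyncio.gather(*tasks)
    return running

if __name__ == "__main__":
    asyncio.run(run_group({"server": 0.2, "client": 0.1}))
```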
Here's the output of an actual run on the hello world client/server from the Guide, Felix version. Note the diagnostics are scrambled a bit, due to asynchronous writing to stdout, which is buffered.
I have found a bug based on my own misunderstanding of my own code. The fthreads that wait on the DEAD signals block. When an fthread is spawned, Felix does not specify if the spawner or spawnee runs next. The current implementation continues the spawner. This is the only reason the above code works.
In fact it may deadlock, because Faio::sleep() is asynchronous: it blocks an fthread, not a pthread. The same is of course true of S-channels. P-channel communication should behave the same way, but it doesn't.
I need to review how Felix spawns pthreads; some stuff is shared and some not. Clearly the stack isn't shared. Also the queues managing f-threads clearly can't be shared either. However procedural stack frames and the global "thread_frame_t" object are shared. The garbage collector (and allocator) are shared.