Table 1: Micro benchmark performance (cost in CPU cycles; operations include a 2-thread wait-set ping pong and a message ping pong)
practical to port existing software written in C++ or other unverifiable languages. We are currently investigating the combination of language-based safety with hardware-protected address spaces in order to isolate unverified code from verified code using hardware sandboxes.
The channel implementation described in this paper has been in use in Singularity for the last 8 months. The code base relies on 43 different channel contracts with roughly 500 distinct messages, and 135 different protocol states.
Passing ownership of memory blocks and endpoints over channels is used frequently. Among the 500 messages, 146 carry pointers into the exchange heap, and 42 messages carry channel endpoints themselves. Thus, the ability to configure the communication network dynamically by transferring endpoints is quite important.
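Singularity expresses endpoint transfer in Sing# channel contracts. The fragment below is not Sing# but a rough Rust analogy (our own sketch, with invented names): one channel's send endpoint is moved over a second channel, after which the compiler statically rejects any further use by the original owner, the same property the ownership verifier enforces.

```rust
use std::sync::mpsc;
use std::thread;

// Send a channel endpoint over another channel. Ownership of `data_tx`
// moves to the receiving thread; any later use of it in this function
// would be rejected at compile time.
fn transfer_and_use() -> String {
    // A control channel over which we transfer another endpoint.
    let (ctrl_tx, ctrl_rx) = mpsc::channel::<mpsc::Sender<String>>();
    // A data channel whose send endpoint we hand off.
    let (data_tx, data_rx) = mpsc::channel::<String>();

    let worker = thread::spawn(move || {
        // Take ownership of the transferred endpoint and use it.
        let tx = ctrl_rx.recv().unwrap();
        tx.send(String::from("hello via transferred endpoint")).unwrap();
    });

    ctrl_tx.send(data_tx).unwrap(); // `data_tx` is moved out here
    let msg = data_rx.recv().unwrap();
    worker.join().unwrap();
    msg
}

fn main() {
    println!("{}", transfer_and_use());
}
```

The move is enforced by the type system rather than by a separate verifier, but the programming model is the same: an endpoint sent over a channel is no longer usable by the sender.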
The static verification of ownership has so far delivered on the promise of eliminating errors related to dangling pointer accesses and leaks of exchange heap blocks. We use two redundant verifiers checking the ownership: one running as part of compilation of the source language to MSIL, and a second verifier checking the MSIL. This redundancy helps shake out errors in the verification itself that would allow ownership errors to pass compilation undetected.
One serious shortcoming of our current implementation is that we do not check exceptional program paths. This shortcoming stems from the fact that we don’t have exception specifications in our code base. Without exception specifications on methods, the analysis of exceptional paths would lead to too many programs being rejected. We are actively working on specifying exceptional behaviors and integrating them with the ownership programming model and verification.
This section contains micro benchmarks comparing the performance of Singularity channel operations against other systems. All systems ran on an AMD Athlon 64 3000+ (1.8 GHz) with an NVIDIA nForce4 Ultra chipset and 1GB RAM. We used Red Hat Fedora Core 4 (kernel version 2.6.11-1.1369 FC4) and Windows XP (SP2). Singularity ran with a concurrent mark-sweep collector in the kernel, a non-concurrent mark-sweep collector in processes (including drivers), and a minimal round-robin scheduler.
Table 1 reports the cost of simple operations in Singularity and two other systems. On the Linux system, the process-kernel call was getpid(); on Windows, it was GetProcessId(); and on Singularity, it was ProcessService.GetCyclesPerSecond(). All these calls operate on a readily available data structure in the respective kernels. The Linux thread test ran on user-space scheduled pthreads; kernel-scheduled threads performed significantly worse. The “wait-set ping pong” test measured the cost of switching between two threads in the same process through a synchronization object. The “message ping pong” test measured the cost of sending a 1-byte message from one process to another and then back to the original process. On Linux we used sockets, on Windows a named pipe, and on Singularity a channel.

Figure 5: Send-receive cost vs. buffer size (cycles per send-receive as message size grows from 1 byte to 65536 bytes; one measured series is Windows XP SP2)
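For a sense of what a ping-pong measurement looks like, here is a minimal Rust sketch of the two-thread case (our own harness, not the benchmark code behind Table 1): a one-byte token bounces between two threads through channels, and the elapsed time is divided by the number of round trips.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

// Mean nanoseconds for one two-thread message round trip.
fn ping_pong_cost(rounds: u32) -> f64 {
    let (ping_tx, ping_rx) = mpsc::channel::<u8>();
    let (pong_tx, pong_rx) = mpsc::channel::<u8>();

    let responder = thread::spawn(move || {
        for _ in 0..rounds {
            // Wait for the token and answer immediately.
            let b = ping_rx.recv().unwrap();
            pong_tx.send(b).unwrap();
        }
    });

    let start = Instant::now();
    for _ in 0..rounds {
        ping_tx.send(1u8).unwrap();
        pong_rx.recv().unwrap();
    }
    let elapsed = start.elapsed().as_nanos() as f64;
    responder.join().unwrap();
    elapsed / rounds as f64
}

fn main() {
    println!("{:.0} ns per round trip", ping_pong_cost(100_000));
}
```

The absolute numbers depend entirely on the scheduler and synchronization primitives in play, which is exactly the difference the table quantifies across the three systems.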
As the numbers show, Singularity is quite competitive with systems that have been tuned for years. Particularly encouraging are the message ping pong numbers on Singularity, as they show that the cost of using a channel between distinct processes is only about 15% higher than the cost of the wait-set ping pong on two threads of the same process. Channels on Singularity outperform the mechanisms available on other systems by factors of 4 to 5. These improvements don’t even factor in the ability to pass pointers rather than copies of memory blocks.
Figure 5 shows the cost in cycles for sending and receiving a block of memory ranging from 1 byte to 64K. As expected, the cost on Singularity is roughly constant at around 1200 cycles. On the other systems, the cost starts to increase around buffer sizes of 1K. These measurements do not show the performance improvements possible if buffers of page size or larger (4K on x86) are remapped rather than copied. Such an optimization would of course require another IPC interface rather than sockets and named pipes.
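The copy-versus-transfer distinction behind these curves can be illustrated in Rust (again our own sketch, not the measured systems): sending a Vec to another thread by move costs roughly the same at any size, while cloning it first, as a socket write must copy the payload, grows with the buffer.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

// Mean ns to transfer one `size`-byte buffer to another thread and get it
// back. With `by_move` the Vec's ownership is transferred (pointer-sized
// work, analogous to Singularity's exchange-heap passing); otherwise the
// buffer is cloned first, modelling the copy a socket write performs.
fn transfer_cost(size: usize, by_move: bool, rounds: u32) -> f64 {
    let (data_tx, data_rx) = mpsc::channel::<Vec<u8>>();
    let (ack_tx, ack_rx) = mpsc::channel::<Vec<u8>>();

    let receiver = thread::spawn(move || {
        for _ in 0..rounds {
            let buf = data_rx.recv().unwrap();
            ack_tx.send(buf).unwrap(); // echo the buffer back
        }
    });

    let mut buf = vec![0u8; size];
    let start = Instant::now();
    for _ in 0..rounds {
        let to_send = if by_move { std::mem::take(&mut buf) } else { buf.clone() };
        data_tx.send(to_send).unwrap();
        let returned = ack_rx.recv().unwrap();
        if by_move {
            buf = returned; // recover ownership for the next round
        }
    }
    let elapsed = start.elapsed().as_nanos() as f64;
    receiver.join().unwrap();
    elapsed / rounds as f64
}

fn main() {
    for &size in &[1usize, 1024, 65536] {
        println!(
            "{:6} bytes: move {:8.0} ns, copy {:8.0} ns",
            size,
            transfer_cost(size, true, 2_000),
            transfer_cost(size, false, 2_000)
        );
    }
}
```

The move variant stays flat as the buffer grows, mirroring the constant Singularity curve in Figure 5, while the clone variant picks up a size-dependent term, mirroring the socket and pipe curves.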
7. RELATED WORK
There is a long history in the systems community of drawing on programming languages and safety guarantees to make systems more reliable, maintainable, and/or more performant. For example, SOLO was written in concurrent Pascal instead of assembly. Early work around the Mesa language focused on memory safety through garbage collection, type safety, and modularity, as well as features such as coroutines and recursion that would be burdensome to implement without runtime and language support [18, 30, 35]. The SPIN micro-kernel used Modula-3 to allow application-specific extensions to run directly in the kernel to optimize performance. These systems had very weak iso-