per-JVM totals of 140.8%, 38.1%, 42.1% and 38.6% (the geometric means shown in Figure 7). This figure shows that, on average, the most expensive part is the accounting itself (Step 3), followed by obtaining the consumption counter (Step 1), and finally the polling (Step 2). On the purely interpreted platform, however, Step 1 is clearly the most expensive, because there we insert an unconditional method invocation (to obtain the reference to the right ThreadCPUAccount), and invocations are relatively costly on that platform. The second most expensive step on that platform is the accounting (Step 3), while polling (Step 2) is a relatively light operation.
Note that these numbers are only approximations. The store of the ThreadCPUAccount reference into a local variable at the beginning of each method may be discarded by the JIT and never executed (hence never measured), unless it is followed by code that reads the variable, e.g., for polling and accounting. The overhead calculated for Step 1 may therefore be slightly underestimated, while the overhead for Step 2 may be slightly overestimated, except on the JVM without a JIT (the Sun JVM in purely interpreted mode).
Based on the numbers of Figure 8, we introduce differentiated optimizations in the next two sections.
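The three steps discussed above can be pictured at source level with the following minimal sketch. Only ThreadCPUAccount and getCurrentAccount() are names taken from the text; the consumption field, the consume() method, the GRANULARITY constant, the ThreadLocal-based lookup, and the per-block instruction counts are assumptions made purely for illustration.

```java
// Illustrative sketch only: field names, constants, and counts are assumptions.
final class StepsSketch {
    static final class ThreadCPUAccount {
        // Hypothetical number of bytecodes between two polls that fire.
        static final int GRANULARITY = 1000;

        // Stand-in for the lookup through the current Thread.
        private static final ThreadLocal<ThreadCPUAccount> CURRENT =
                ThreadLocal.withInitial(ThreadCPUAccount::new);

        // Runs from -GRANULARITY upwards; a value >= 0 triggers consume().
        int consumption = -GRANULARITY;
        long totalReported = 0;

        static ThreadCPUAccount getCurrentAccount() {
            return CURRENT.get();
        }

        // Records the consumption accumulated so far and resets the counter.
        void consume() {
            totalReported += consumption + GRANULARITY;
            consumption = -GRANULARITY;
        }
    }

    // Sums 0..n-1; the increments 4 and 6 are made-up block sizes.
    static int sum(int n) {
        ThreadCPUAccount cpu = ThreadCPUAccount.getCurrentAccount(); // Step 1
        cpu.consumption += 4;                                        // Step 3
        int total = 0;
        for (int i = 0; i < n; i++) {
            cpu.consumption += 6;                                    // Step 3
            if (cpu.consumption >= 0) cpu.consume();                 // Step 2
            total += i;
        }
        return total;
    }
}
```

In this sketch the local variable cpu is only worth keeping because Steps 2 and 3 read it, which is the situation in which a JIT could not discard the store performed in Step 1.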
5. Reducing the Overhead of Finding the Proper Instruction Counter
The per-thread instruction counter is encapsulated inside a ThreadCPUAccount object, which is itself found via a reference to the current Thread. We have explained in Section 3.3 how the getCurrentAccount() method is already optimized for obtaining this information by patching the Thread class. As part of our standard optimization settings, we also decided to inline the contents of getCurrentAccount() directly instead of generating an invocation to it.
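As a rough illustration of the difference between invoking getCurrentAccount() and inlining its contents, consider the sketch below. Since java.lang.Thread cannot be patched in a self-contained example, a hypothetical AccountedThread subclass stands in for the patched Thread class; the cpuAccount and consumption field names and the instruction counts are likewise assumptions.

```java
// Illustrative sketch only: AccountedThread is a stand-in for the patched Thread.
final class LookupSketch {
    static final class ThreadCPUAccount {
        int consumption; // assumed counter field

        // Method form: costs one invocation at each instrumented method entry.
        static ThreadCPUAccount getCurrentAccount() {
            return ((AccountedThread) Thread.currentThread()).cpuAccount;
        }
    }

    // Hypothetical thread class carrying a direct reference to its account.
    static class AccountedThread extends Thread {
        final ThreadCPUAccount cpuAccount = new ThreadCPUAccount();

        AccountedThread(Runnable body) {
            super(body);
        }
    }

    static void viaInvocation() {
        ThreadCPUAccount cpu = ThreadCPUAccount.getCurrentAccount();
        cpu.consumption += 3; // made-up instruction count
    }

    static void viaInlining() {
        // The body of getCurrentAccount() is copied in place of the call.
        ThreadCPUAccount cpu =
                ((AccountedThread) Thread.currentThread()).cpuAccount;
        cpu.consumption += 3; // made-up instruction count
    }

    // Runs both variants on one AccountedThread and returns its counter.
    static int run() {
        AccountedThread t = new AccountedThread(() -> {
            viaInvocation();
            viaInlining();
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return t.cpuAccount.consumption;
    }
}
```

Both variants update the same account; the inlined form merely avoids the invocation, which matters most where invocations are expensive.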
To avoid repeating the getCurrentAccount() lookup at the beginning of every method, we instead pass the ThreadCPUAccount as an additional argument from method to method, changing all method signatures during the transformation process. We describe this approach in the following section, and thereafter we present an enhanced version of it.
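The idea of passing the account along the call chain can be sketched as follows. This is a hand-written approximation, not the output of the actual transformation: the consumption field, the lookups counter (present only to make the effect observable), the ThreadLocal-based lookup, and the instruction count are assumptions, as is the wrapper that keeps the original signature for callers unaware of the extended one.

```java
// Illustrative sketch only: names other than ThreadCPUAccount and
// getCurrentAccount() are assumptions.
final class PassingSketch {
    static final class ThreadCPUAccount {
        int consumption;    // assumed counter field
        static int lookups; // demonstration aid only, not thread-safe

        private static final ThreadLocal<ThreadCPUAccount> CURRENT =
                ThreadLocal.withInitial(ThreadCPUAccount::new);

        static ThreadCPUAccount getCurrentAccount() {
            lookups++;
            return CURRENT.get();
        }
    }

    // Original signature: performs the lookup once and delegates.
    static int fib(int n) {
        return fib(n, ThreadCPUAccount.getCurrentAccount());
    }

    // Extended signature: the account arrives as an extra argument,
    // so no lookup is needed here or in any callee.
    static int fib(int n, ThreadCPUAccount cpu) {
        cpu.consumption += 5; // made-up instruction count
        if (n < 2) {
            return n;
        }
        return fib(n - 1, cpu) + fib(n - 2, cpu);
    }
}
```

Calling fib(10) in this sketch enters the extended method many times but performs the lookup only once, at the entry point with the original signature.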
5.1. Wrapper Rewriting
Figure 9 illustrates how the method shown in Figure 4 is transformed using a CPU accounting scheme that passes the ThreadCPUAccount as an extra argument.