Saturday, August 16, 2008

What is a thread?

Some of my friends are confused by the many concepts around threads, such as system threads, kernel threads, native threads, user-level threads, application threads, Java threads, software threads, hardware threads, simultaneous multithreading (SMT), hyper-threading (HT), helper threads, and so on.

So what is a thread?

A thread is nothing but a control flow of execution. It is a concept valid only in a control-flow machine, because only there does control flow exist. Then what is control flow? Or, in other words, how do we represent a control flow? In my opinion, only two entities are essential to represent a control flow: the program counter and the stack.

The program counter points to the next instruction to execute. The stack stores temporary execution results. To be a meaningful stack, it needs a stack pointer pointing to the next location in which to store a result.

The program counter and the stack uniquely identify a control flow of execution. They cannot be shared with other threads (except in some extreme cases). All other computing resources can be shared between threads, such as the heap, the code, the processor, etc. Hence, the program counter and the stack are called the thread context.

That means, if a system provides threading support, it should at least provide a way to distinguish one thread context from another, be it in software, hardware, or a hybrid. If the processor hardware provides thread context support, we have hardware threads. Different hardware threads can share the same processor pipeline (SMT) or use different pipelines, depending on the design. HT is an implementation of SMT. Any control-flow processor must provide at least one thread context; otherwise, there would be no control flow.

If the processor has only one thread context, threading support can still be provided by software. That is, multiple software threads can multiplex over the same hardware thread context. When a software thread is scheduled to run, its context is loaded from memory into the hardware thread context. When it is scheduled off the processor, its context is saved back into memory.
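This multiplexing can be sketched in Java (the language used elsewhere on this blog). Since Java does not expose the program counter or stack pointer, the sketch below is entirely my own illustration: each software thread is modeled as a resumable task whose saved "context" is just its own state, and a round-robin loop plays the role of the scheduler.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MiniScheduler {
    // A "software thread": the field `step` stands in for a real
    // saved program counter + stack.
    interface Task { boolean step(); } // returns false when finished

    static class Counter implements Task {
        final String name; int step = 0;
        Counter(String name) { this.name = name; }
        public boolean step() {
            System.out.println(name + " at step " + step);
            return ++step < 3; // run for three time slices
        }
    }

    // One hardware context (this Java thread) multiplexed round-robin
    // among several software threads.
    public static void run(Deque<Task> ready) {
        while (!ready.isEmpty()) {
            Task t = ready.pollFirst(); // "load" a context
            if (t.step())               // give it one time slice
                ready.addLast(t);       // "save" the context for later
        }
    }

    public static void main(String[] args) {
        Deque<Task> ready = new ArrayDeque<>();
        ready.add(new Counter("A"));
        ready.add(new Counter("B"));
        run(ready); // A and B alternate on a single control flow
    }
}
```

The two counters interleave on one underlying thread, which is exactly the "multiple software threads over one context" picture, minus the real register save/restore.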

Software thread design has an implication. Since the context loading/storing (the switch) is conducted by software, the design must guarantee that there are chances to conduct the switch operation (i.e., thread scheduling). An easy way to implement this is to leverage hardware interrupts: once the software receives a hardware interrupt (a timer or whatever), it executes the interrupt handler, and within the handler it schedules the threads.

Sometimes the timer is too long to wait for. For example, when a thread goes to sleep, no other thread can be scheduled until the timer handler executes. This is not desirable. A straightforward solution is that a thread that wants to sleep always invokes the scheduler, so the scheduler can switch in another thread.

Now that multiple software threads can share the same hardware context, it is not hard to see that a software thread context can in turn be multiplexed by another level of software threads. This is true. So conceptually, software threads can be built in arbitrarily many levels, with every higher-level thread multiplexing the contexts of the threads at the next level down.

A natural corollary is that a thread is only a thread at your level of discussion: it could contain multiple threads at a higher level of discussion. Although this is true, people do not really build many levels of software threading libraries. Usually there are only two levels: one level shares the hardware contexts, and the other shares the software contexts.

This is reasonable. The most important reason is that all the software threads at one level are treated as a single thread at the next level, so they are scheduled as one thread there. That means they share, in total, only the time slice of a single next-level thread. If that next-level thread is scheduled off the processor, none of them can continue. This is inconvenient.

Even more inconvenient: sometimes only one thread wants to sleep, but all the other threads at its level have to sleep with it, because they are treated as a single thread by the next-level scheduler, which is what sees the sleep operation. This issue can be partially solved with non-blocking sleep. That is, when a thread wants to sleep, it does not really sleep in the sense of the next-level scheduler; it sleeps only in the eyes of its own level's scheduler, which then schedules another thread at the same level. From the next-level scheduler's point of view, the thread just continues without sleeping at all. In threading terminology, all the blocking operations (such as sleeping) at one level are implemented as non-blocking at the next level. (This requires system support for non-blocking operations, such as non-blocking socket I/O, etc.)
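One way to picture level-local sleeping, again as my own hypothetical sketch rather than any real library: a user-level sleep() just records a wakeup deadline and returns to its scheduler, which keeps running the other tasks on the same underlying thread.

```java
import java.util.ArrayList;
import java.util.List;

public class GreenSleep {
    // A user-level "thread" with a wakeup deadline instead of a blocking sleep.
    static class Green {
        final String name;
        long wakeAt = 0;     // 0 = runnable now
        int remaining;       // time slices left before the task finishes
        Green(String name, int slices) { this.name = name; remaining = slices; }

        // Non-blocking sleep: record the deadline and return to the
        // scheduler, instead of blocking the underlying (kernel) thread.
        void sleep(long millis) { wakeAt = System.currentTimeMillis() + millis; }

        boolean step() {
            System.out.println(name + " runs, " + remaining + " slices left");
            return --remaining > 0;
        }
    }

    public static void run(List<Green> tasks) {
        while (!tasks.isEmpty()) {
            long now = System.currentTimeMillis();
            List<Green> next = new ArrayList<>();
            for (Green g : tasks) {
                if (g.wakeAt > now) { next.add(g); continue; } // "sleeping": skip it, don't block
                if (g.step()) next.add(g);
            }
            tasks = next; // a real scheduler would block until the earliest deadline
        }
    }

    public static void main(String[] args) {
        List<Green> tasks = new ArrayList<>();
        Green a = new Green("A", 2);
        a.sleep(50);                   // A sleeps, but only in this scheduler's eyes
        tasks.add(a);
        tasks.add(new Green("B", 2));  // B keeps running meanwhile
        run(tasks);
    }
}
```

While A's deadline has not passed, the scheduler keeps giving slices to B; the kernel thread underneath never blocks, which is the point of the paragraph above.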

Only the operating system kernel really takes control of the execution engine (i.e., the processor or the pipeline), so the time-slice concept is only really meaningful to the kernel. That means only threading at the kernel level can really manipulate all the resources. Higher levels of software threads should always try to leverage the support of kernel threading. This is the fundamental reason why we want at most one additional level of threading above kernel threads.

Kernel threads are exposed to user applications through threading APIs. Applications call them native threads; examples are NPTL and LinuxThreads on Linux, and Win32 threads on Windows. A threading library implemented on top of native threads is called user-level threading. Because of the inconveniences discussed above, not much software today employs user-level threads.

Still, user-level threads have their own advantages in certain scenarios. For example, multiple user-level threads never run in parallel on multiple processors/cores, because they are just a single thread from the OS's point of view; that can simplify reasoning about shared state.

The Java thread is a thread in another dimension: it is actually a language concept. It can be implemented with any of the threading mechanisms discussed above. Before Java, threading support was largely independent of programming languages; it was just a system facility that any language could utilize if it wanted. Java takes a different approach: it builds the threading concept into the language itself. This is important for program semantic correctness; Hans Boehm's PLDI paper is titled "Threads Cannot Be Implemented as a Library" [1]. And people are trying to introduce threading as a language construct into more languages.

[1] Hans-J. Boehm, "Threads Cannot Be Implemented as a Library", PLDI 2005.
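To make the language-level point concrete, here is a minimal, standard example: synchronized is a keyword whose semantics are defined by the Java language and its memory model, not by a library.

```java
public class LanguageThreads {
    static int counter = 0;
    static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < 100_000; i++) {
                    synchronized (lock) {  // a language construct, not a library call
                        counter++;
                    }
                }
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // The language guarantees no increments are lost and that main
        // sees the final value after join(): always 200000.
        System.out.println(counter);
    }
}
```

Without the synchronized block, the language makes no such guarantee, and the printed value would vary from run to run, which is exactly the kind of semantics a bolt-on library cannot pin down.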

Saturday, May 24, 2008

Apache Harmony 5.0 M6 release

The Apache Harmony team are pleased to announce the immediate availability of Apache Harmony 5.0M6. Apache Harmony is the Java platform project of the Apache Software Foundation. This is the latest stable build of the Harmony project's implementation of the Java SE specification, and contains numerous enhancements and bug fixes including:

* new JIT optimizations
* functional and coverage enhancements throughout the class libraries
* improved VM threading design
* support for full hardware addressability on 64 bit platforms
* ...and much more

Source code and binary builds are available from the Harmony download site:

Apache Harmony welcomes your help. For more information on how to report problems, successes, and to get involved in Apache Harmony visit the project website at

The Apache Harmony Team

Monday, March 3, 2008

Apache Harmony 5.0M5 available

The Apache Harmony team are pleased to announce the immediate availability of Apache Harmony 5.0 Milestone 5.

Apache Harmony is the Java platform project of the Apache Software Foundation, working towards a full compliant implementation of the Java SE specification.

Apache Harmony 5.0 Milestone 5 is the latest stable build with numerous enhancements and bug fixes, including a new applet viewer tool and the initial implementation of unpack200. We recommend that everyone update their current version to Apache Harmony 5.0M5.

Source code and binary builds are available from the Harmony download site.

Apache Harmony welcomes your help. For more information on how to report problems, successes, and to get involved in Apache Harmony visit the project website.

The Apache Harmony Team

Thursday, November 22, 2007

Will Apache Harmony succeed?

There was a survey [1] two and a half years ago, when Apache Harmony was started, collecting people's comments on Apache Harmony's fate. Reading through it, I felt there were more pessimistic or negative opinions than optimistic or positive ones. Many people believed Harmony was either useless or going to fail, although for different reasons.

About nine months ago, an article, "How To Tell The Open-Source Winners From The Losers" [2], tried to summarize why an open source project could fail. It gives nine points to check:

  1. A thriving community: A handful of lead developers, a large body of contributors, and a substantial--or at least motivated--user group offering ideas.
  2. Disruptive goals: Does something notably better than commercial code. Free isn't enough.
  3. A benevolent dictator: Leader who can inspire and guide developers, asking the right questions and letting only the right code in.
  4. Transparency: Decisions are made openly, with threads of discussion, active mailing list, and negative and positive comments aired.
  5. Civility: Strong forums police against personal attacks or niggling issues, focus on big goals.
  6. Documentation: What good's a project that can't be implemented by those outside its development?
  7. Employed developers: The key developers need to work on it full time.
  8. A clear license: Some are very business friendly, others clear as mud.
  9. Commercial support: Companies need more than e-mail support from volunteers. Is there a solid company employing people you can call?

Measuring Harmony against this checklist, Harmony scores well on most points; still, Charles doubted "what passionate user community will form around Harmony when open Java is available on the Net?"

I have to say Charles makes very valid points about open source projects in general, but I can't agree that Harmony is losing developers to OpenJDK. I won't elaborate all my arguments; just one point here: Harmony does not necessarily exist only as an alternative Java implementation, so it is not necessarily losing its developers, because they are not just looking for an alternative Java implementation. On this specific point, I have a couple of examples:

  • Google Android uses Apache Harmony for its class libraries;
  • People are porting Harmony GC(s) to other runtime systems;
  • Some Java applications do not care whether Harmony is Java-certified and use Harmony as their default runtime environment.

Let's see how Apache Harmony evolves. It's still young (less than three years old). Stay tuned.


Monday, November 19, 2007

Google Android, Apache Harmony and Java Packaging

Last week Google released the SDK for Android, an open and free mobile platform where you write most of the code in Java. However, you do not call it Java, which frees you from licensing. Stefano Mazzocchi shows how elegant this move by Google really is; he uncovers tricky details about the likely strategies of both companies. Stefano's thoughts are really impressive and made me shout, "Aha! Now I see who is who!"

At the same time, folks working on Apache Harmony are really impressed to see what is behind the release. We are especially glad to meet Android as a big "customer" that uses a significant part of the Apache Harmony class library (as detected by Geir Magnusson, the original chair of the project). I believe there were already people who used the Harmony class library for their proprietary Java ME-based app development, but this time I can joyfully jump and tell all my friends, "Apache Harmony is considered not pointless, wow, wow!"

Now Google is going to give away its largest piece of open source code. The important question to raise here is: how are Google and the other companies in the Open Handset Alliance going to develop the platform? Will they throw new code "over the wall", or are they going to grow a community based on collaborative development?

Not being a Google employee, I cannot describe what the company's open source strategy is. The strategy is obviously too tricky to describe on one page, and surely not specified in detail for projects of that scale. It is especially hard to predict how the other companies in the Alliance are going to cooperate here. I hope that by signing a non-fragmentation agreement they mean it, but it is completely unclear how seriously they take it.

Of special note: Apache Harmony, as a project, has always welcomed companies combining their efforts on the development of class libraries, VMs, and other Java components. We believe the best strategy for everyone would be the open development of platforms for innovation. At the least, this is the easiest way to keep all components in a compatible state.

As for me, there is another really interesting implication behind the announcement: Google thereby showed everybody that it no longer considers Java certification important on mobile phones. Hey, what can Java ME conformance give you except stripped functionality and extra cost for so-called Sun certification? It appears that Java itself does not yet have the nature of an open source software stack. Jilles van Gurp, a member of the research staff at the Nokia Research Center (Helsinki), advocates compatibility and is disappointed about the reinvention of many wheels. Yes, people have been taught by Sun for 10+ years that Sun's certification is important, and many Java people really care about Java compatibility. But does this certification guarantee you freedom from bugs, or some fair level of performance? Obviously, it does not.

Why does this not happen for Linux people? When I run my favorite Linux distribution, I do not care about a full-featured, certified Linux with all-certified libraries. I just get some open source software, and when I need more, I apt-get install it into my system. With broadband Internet it is very convenient.

But Java is not like that. It is huge and always certified. It takes a while to download once you accidentally decide to get a Java plugin for your browser. And what happens when I want to start developing something? I download the same chunk again, with extra bits, called the JDK. Do I need all that stuff twice? In Linux you would just install a package with headers and one with a compiler, right? So Java still lacks a bazaar, a naturally open source way of living.

The JPackage project addresses this to some extent, but it limits itself to providing extra packages and leaves the core Java where it is. Even Sun knows that Java must be leaner, but its solution of downloading all necessary packages on the fly appears to be just another brick in the cathedral's wall.

Hey, I want to play my favorite pacman game on my phone; why do I need this completeness and certification of Java? Leave that story for the enterprise! How about Java packaging and binary distributions? I know, I know, it is out of Harmony's scope for now, but given that no open letter to Sun can make our implementation the official Java SE, maybe we should go the Google way?

A small disclaimer: This is my personal opinion. The views expressed in this article are mine alone and not those of my employer.

Tuesday, October 23, 2007

What language to choose for Harmony GC development?

During Harmony GC development, starting from GCv5, I made a design decision that we should try to keep the code "C"-compatible. Although GCv5 uses a certain C++ style, e.g., the source file names use the .cpp suffix, we have tried to avoid anything C++-specific.

The reasons I chose C for GCv5 development are:

1. We want GCv5 to control all its own memory, i.e., there should be no hidden memory management brought in by the language. Writing a GC in C++ doesn't cause serious problems in this respect, but the problem is obvious when writing a GC in Java, where lots of hidden objects are allocated.

2. We want to keep GCv5 usable by other runtimes written in C, which are common in the open source community, such as the Linux kernel, gcj, Ruby, etc. We expect that one day GCv5 can be applied to some of them, although I don't know when. We successfully ported a version of the GC to Ruby 1.9 last year.

3. We want to make GCv5 self-sufficient, with all its utilities encapsulated, so that it can be easily ported to other languages if we want. For example, this design makes it very easy to write GC runtime helpers in Java. (I will talk about the runtime helpers later.)

4. In the early stage of GCv5 development, I used some C++ data structures such as linked lists, vectors, etc. I then removed them gradually, because we want fine-grained control over synchronized access to the data structures, such as sync-queues, sync-lists, etc.
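The "hidden memory management" of point 1 is easy to demonstrate in Java. The snippet below (my own illustration) performs allocations the source code never spells out: autoboxing of int values and the Iterator object created by a for-each loop.

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenAllocation {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            // Hidden work: the int is autoboxed to an Integer object
            // (small values come from a cache; larger ones are allocated).
            list.add(i);
        }
        int sum = 0;
        // Hidden allocation: the for-each loop creates an Iterator object,
        // and each element is unboxed back to an int.
        for (int x : list) {
            sum += x;
        }
        System.out.println(sum); // prints 3
    }
}
```

None of these objects appear in the program text, which is exactly why a GC written in Java would be allocating behind its own back.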

The arguments above are not that strong, though. I have to say it's also a kind of perfectionism. :)

Monday, October 15, 2007

Debugger for Java*/JNI Environments

Everybody who writes mixed Java*/native applications faces a debugging problem. It is really difficult to debug large applications, since one has to use at least two debuggers: start the application under a Java* debugger, attach to the process with a native one, and switch between them during the debugging session [1]. Until now!

The good news is that Intel has released an integrated debugger for Java*/JNI environments. It makes it easy to debug mixed-mode applications and is pretty easy to install, since it is implemented as a plug-in for the Eclipse platform. The mixed-mode debugger needs an NCAI (Native Code Access Interface) implementation in the VM, which is an extension of the JVMTI interface. Currently the only VM supporting this interface is the Apache Harmony VM.

The debugger has a pretty straightforward installation guide and an intuitive interface. It's a great step toward making Java* easier to use with native libraries.

[1] There are a number of links dedicated to this problem.

Saturday, October 13, 2007

EIOffice with Harmony

It's good news for the Apache Harmony community that a developer testing version (v0.02) of the "EIOffice with Harmony" bundle has been released on SourceForge at

EIOffice is an office suite written in pure Java, based on Java Swing. It has complete functionality for document processing, presentation creation, and spreadsheet generation. EIOffice is developed by Evermore Software Co., a company located in Wuxi city, Jiangsu province, China. EIOffice is the abbreviation of "Evermore Integrated Office", so sometimes it's called EIO as well. It's "integrated" because it is a single application that supports all three types of office document processing (documents, presentations, and spreadsheets). Data can be "linked" between them, for example from a spreadsheet into a report presentation. Once an update is made at one site, all its linked sites are updated accordingly and automatically.

The really appealing feature of EIOffice to a software developer is that it's written in pure Java, yet there is no obvious performance issue in my experience using it. And the memory footprint is acceptable (or surprisingly lower than expected).

Making EIOffice work with Harmony is a serious exercise for Harmony's graphics classlib support (Swing/AWT/Java2D). EIOffice is said to be the world's largest single (desktop) application written in Java. Once Harmony can run all its functionality smoothly, that probably means Harmony is ready for any Java desktop application. The current bundle is version 0.02. There is still a long way to go, but considering the fast progress of Apache Harmony development, I believe a version 1.0 can be expected within a couple of quarters.

Friday, September 21, 2007

VM error handling

This post is dedicated to the behavior of different VM implementations in extreme cases, particularly in the case of stack overflow. The JVM spec says (see

“The class Error and its standard subclasses are exceptions from which ordinary programs are not ordinarily expected to recover.”

but what VM behavior we can rely on is the question. First of all, let's understand why this is actually important. Everybody who writes resource allocation or synchronization code can face this problem. Imagine that you have some code like this:

resource.acquire();  // acquire some resource,
                     // e.g., a file handle or a lock
// ... do some work here (which could include method calls, etc.) ...
resource.release();  // release the resource

This code is actually not as good as it might be. If an exception were thrown in the main block, the resource would not be released, which could lead to a resource leak or a deadlock. To solve this, let's put the resource release into a finally block:

try {
    // ... do some work here ...
} finally {
    resource.release();
}

Of course that doesn’t guarantee that resources are released, but it will be noticed in this case. To investigate the problem I’ve done some experiments on different implementations of Java 1.5 (Sun JRE 1.5 and JRockit JRE 1.5) and Apache Harmony (which is still haven’t got JCK to pass)

I started experiments with simple example:

public class SOETest1 {
    public int i = 0;

    public static void main(String[] args) {
        SOETest1 test = new SOETest1();
        try {
        } finally {
            System.out.println("The final i value is: " + test.i);
        }
    }

    public void foo() {
        i++;            // count the frame in
        try {
            foo();      // infinite recursion until StackOverflowError
        } finally {
            i--;        // count the frame out; i stays 0 only if every finally runs
        }
    }
}
In this example we have infinite recursion, which will produce a StackOverflowError at some point. It was expected that the variable i would be equal to 0 at the end of the test (each frame's finally block undoing its increment), but the situation actually differs between VMs. Sun JRE 1.5 and Harmony behave as expected, whereas with JRockit JRE 1.5 the value of i was 1025. It seems that in case of stack overflow, JRockit silently unwinds some stack frames and thus skips execution of a number of finally blocks. This is definitely not the best way of doing things, since one can't rely on even the simplest actions in finally blocks being done.

Then I tried a more complex example, with a method call in the finally block:

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SOETest2 {
    public static ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

    public static void main(String[] args) {
        try {
            Thread[] t = new Thread[2];
            for (int i = 0; i < t.length; i++) {
                t[i] = new SOEThread();
                t[i].start();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            System.out.println("The end!");
        }
    }

    static class SOEThread extends Thread {
        public int i = 0;   // counts finally blocks that never executed
        public int j = 0;   // counts unlock() calls that never completed

        public void run() {
            System.out.println("Thread: " + this + " started");
            try {
            } finally {
                System.out.println("Thread: " + this
                        + " finished, final [i,j] values are: [" + i + "," + j + "]");
            }
        }

        public void foo() {
            rwl.writeLock().lock();
            i++;
            j++;
            try {
                foo();                      // recurse until StackOverflowError
            } finally {
                i--;                        // reached only if the finally block runs
                rwl.writeLock().unlock();   // the method call under test
                j--;                        // reached only if unlock() completed
            }
        }
    }
}
In this example we have two threads that acquire a lock on the same object; the lock is released in the finally block. There are two counters, i and j: i checks that the finally block was executed for each foo() call, and j shows how many unlock() calls failed to complete. If the lock is released fewer times than it was acquired, the test runs into a deadlock.

The results for this test are quite expected:

  • SUN JRE 1.5 returns [i, j] = [0, 1]. j = 1 is OK, because the VM just got another SOE while trying to call unlock(). The second thread fails to finish due to deadlock (since unlock() was called fewer times than lock()).

  • For JRockit JRE 1.5, [i, j] = [507, 507]. This behavior conforms to the first case: JRockit again skips a number of finally blocks (which produces the deadlock situation) and then successfully executes the unlock() method in the remaining ones.

  • Harmony returns [i, j] = [0, 53], throwing a new SOE for the deepest unlock() calls. IMO this is the most correct behavior, since the finally block was executed exactly once for each foo() and a new SOE was thrown for every unsuccessful call of unlock(). The test still ran into a deadlock, though.

Deeper investigation showed that Harmony's behavior could be improved. Such a large number of failures during unlock() execution is due to method compilation (a method is compiled only at its first call). If compilation is done earlier (e.g., by adding preliminary calls to unlock()), then Harmony works just fine: it returns [i, j] = [0, 0] and both threads finish successfully. Sun's and JRockit's behavior didn't change with this modification.

This simple investigation shows that handling VM errors is not so straightforward. Different VMs have different, and not always predictable, behavior. JRockit JRE 1.5, in the case of StackOverflowError, can skip a number of finally blocks, which leads to unexpected results. Sun JRE 1.5 and Harmony showed predictable behavior, but even that doesn't guarantee that calls in finally blocks will succeed; such cases must be handled in a specific way.

Wednesday, September 5, 2007

Apache Harmony scalability analysis

I’ve spent a while experimenting with Harmony scalability. There was a discussion on the dev mailing list about the quality of Harmony’s Thread Manager, and I decided to make a simple analysis of the current TM behavior. The idea was to compare Harmony’s behavior on multithreaded workloads with different contention levels against other Java 1.5 implementations (Sun JRE 1.5 and JRockit JRE 1.5).

I started with a simple benchmark where several threads operate (trying to get or update a generated random sequence of elements) on a single HashMap object. As the standard HashMap implementation doesn’t provide a synchronization mechanism, I used Collections.synchronizedMap(Map m) to make a synchronized HashMap. Of course this is not the best way of doing things in parallel, but the approach emulates behavior with a very high contention level (actually, only a single thread can operate on the object at any moment, independently of the operation type). For the second benchmark I used the ConcurrentHashMap class, which has internal synchronization mechanisms. In this case we can vary the contention level by changing the operation type (read or update). For reads there are no conflicts, and scalability should be ideal; for updates the contention level depends on the class implementation (hashing quality, number of groups used, etc.) and on the sequence of elements being updated.
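For reference, a benchmark of this kind can be sketched as below. The class name, parameters, and operation counts are my own; this is an illustration of the setup described above, not the actual harness used for the measurements.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

public class MapContention {
    static final int OPS = 100_000;

    // Each thread performs OPS random gets (reads) or puts (updates)
    // on the shared map; returns elapsed wall time in milliseconds.
    static long run(final Map<Integer, Integer> map, int threads, final boolean update)
            throws InterruptedException {
        Thread[] t = new Thread[threads];
        long start = System.nanoTime();
        for (int k = 0; k < threads; k++) {
            t[k] = new Thread(new Runnable() {
                public void run() {
                    Random r = new Random();
                    for (int i = 0; i < OPS; i++) {
                        Integer key = r.nextInt(1024);
                        if (update) map.put(key, i);
                        else map.get(key);
                    }
                }
            });
            t[k].start();
        }
        for (Thread th : t) th.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // High contention: every operation serializes on one monitor.
        Map<Integer, Integer> sync = Collections.synchronizedMap(new HashMap<Integer, Integer>());
        // Lower contention: internal synchronization inside ConcurrentHashMap.
        Map<Integer, Integer> conc = new ConcurrentHashMap<Integer, Integer>();
        System.out.println("synchronizedMap reads:   " + run(sync, 8, false) + " ms");
        System.out.println("ConcurrentHashMap reads: " + run(conc, 8, false) + " ms");
    }
}
```

With synchronizedMap, reads and updates behave alike because every call takes the same monitor; with ConcurrentHashMap, the read case is essentially conflict-free, which matches the contention levels the experiments vary.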

The following environment was used for the benchmarking:

Hardware: dual-processor quad-core Xeon® 5355 (8 cores total), 2.67 GHz, 4 GB of RAM

Software: Windows Server 2003 OS, Harmony r571439, SUN JRE build 1.5.0_06-b05, JRockit JRE build R27.1.0-109-73164-1.5.0_08-20061129-1428-windows-ia32

Java execution options: all the benchmarks were run with the -server -Xms900m -Xmx900m options.

Synchronized HashMap

Results of the benchmark for the synchronized version of HashMap showed that the SUN JRE has a large overhead on thread management (see Chart 1 and Chart 2). Harmony, however, showed a very good result. As you can see, it has a bigger initial overhead on thread management than JRockit (i.e., the overhead of switching from single-threaded execution to 2 threads), but after that it performs just fine, with almost no additional overhead, which is very good when using a large number of threads. Chart 1 and Chart 2 show similar pictures, as there is actually no big difference between the operations used, due to the synchronization model I mentioned above.

Chart 1: Synchronized HashMap: 100% reads

Chart 2: Synchronized HashMap: 100% updates

To check that the difference observed is due to the VM implementation, I made an additional experiment for SUN and JRockit with Harmony's java.util package implementation (except the Vector class, since it triggers an Internal Error in SUN's VM). Charts 3 and 4 show that using a different implementation of the classes doesn't affect the situation for the synchronized HashMap with read operations (for updates the situation is exactly the same).

Chart 3: Synchronized HashMap: 100% reads

Chart 4: Synchronized HashMap: 100% reads

To find the hotspots of the different implementations, I performed VTune sampling for synchronized HashMap get operations in 16 threads. Looking at Table 1 and Table 2, we can make some assumptions about the behaviors. First, we notice a large difference in the number of instructions retired between implementations, so we can guess that the SUN JRE spends much more time waiting than JRockit and Harmony (and this was confirmed by the CPU usage level, which was much lower for SUN). Another interesting observation is that JRockit spends more than half its time (and retires more than half its instructions) in the Other32 module, which is actually JITed code. So our second assumption is that JRockit has some synchronization mechanisms inside the JITed code, which could be an area for deeper analysis in Harmony.

Table 1: Synchronized HashMap clockticks breakdown

Table 2: Synchronized HashMap instructions retired breakdown


Experiments with ConcurrentHashMap showed that for reads, Harmony works 20-30% slower than SUN and JRockit (see Chart 5). Replacing the java.util.concurrent package in SUN and JRockit (note that java.util.concurrent.locks was not replaced, because it needs VM support) showed that the classlib implementation is not the cause (see Chart 6, Chart 7): using Harmony's classes gave a 10% benefit for the SUN JRE and the same results for JRockit.

Chart 5: ConcurrentHashMap: 100% reads

Chart 6: ConcurrentHashMap: 100% reads

Chart 7: ConcurrentHashMap: 100% reads

Collecting sampling data for execution in 16 threads showed that 100% of the time (and of instructions retired) is spent in JITed code, so we can guess that the Harmony code generator produces suboptimal code in this case. This could be caused by the lack of a loop versioning optimization in the Harmony JIT, which is currently under implementation; this is definitely an area for deeper investigation.

Chart 8, for ConcurrentHashMap with update operations, shows that the situation is pretty similar: Harmony works 20% slower than the SUN JRE and 10% slower than JRockit. Replacing the java.util.concurrent package showed that this could be due to the ConcurrentHashMap implementation (see Charts 9 and 10).

Chart 8: ConcurrentHashMap: 100% updates

Chart 9: ConcurrentHashMap: 100% updates

Chart 10: ConcurrentHashMap: 100% updates

Deeper analysis using VTune sampling (for 16 threads) also showed that 100% of the time (and of instructions retired) was spent in JITed code; there is no additional synchronization overhead in the VM.


The experiments showed that Harmony's synchronization mechanism is mature enough: in the high-contention case it outperforms such well-known JREs as SUN's and JRockit. In the low-contention case there are some performance issues, related to the ConcurrentHashMap implementation and to JIT optimizations, that need deeper investigation.