I'm giving another talk tomorrow at the SD West conference:
Here are the slides
Thousands of Threads and Blocking I/O: The Old Way to Write Java Servers Is New Again (and Way Better)
I've encountered some very strong misperceptions in the world that:
1) Java asynchronous NIO has higher throughput than Java IO (false)
It doesn't. It loses by 20-30%. Even with single thread against single thread. If multiple threads enter the equation (and multiple cores) which of course blocking I/O is intent on using - its skews even farther.
2) Thread context switching is expensive (false)
New threading libraries across the board make this negligble. I knew Linux NPTL was fast, but I was quite surprised how well Windows XP did (graphs inside notes).
3) Synchronization is expensive (false, usually)
It is possible for synchronization to be fully optimized away. In cases where it couldn't it did have a cost - however given we have multicore systems now its uncommon to write a fully singly-threaded server (synch or asynch), in other words every server design will pay this cost - but, non-blocking-data-structures ameliorate this cost significantly (again graphs inside show this).
4) Thread per connection servers cannot scale (false)
Thats incorrect at least up to an artificial limit set by JDK 1.6. 15k (or 30k depending on the JVM) threads is really no problem (note linux 2.6 with NPTL using C++ is fully happy with a few hundred-thousand threads running, Java sadly imposes an arbitrary limit). If you need more connections than this (and aren't maxing your CPU or bandwidth) - you can still use blocking IO but must depart from thread-per-connection. Or fall back to NIO.
I'll try to spruce up the benchmarks I used and try to post them. I'd like to point out that writing Java benchmarks is very hard. I spent a great deal of time making sure I warmed up the VM and insured there were no positional biases or other overzealous or skewing optimizations.
I was then *extremely* lucky to get help from Cliff Click of Azul systems (if you want to write a benchmark, a VM engineer is the right kind of person to get help from). He spent half a saturday tweaking my benchmark in ways I never thought of. Then ran them for me on his 768core Azul box (graph inside)!! thanks Cliff !