Ben J. Christensen

Software Development and Other Random Stuff

ARL Radix Sort

Very cool whitepaper on ARL Radix Sort showing how it’s better than Quicksort and normal Radix sort.

http://www.nik.no/2002/Maus.pdf

… (April 2nd) … never was able to find a working implementation to try out though …

Filed under: Code, Performance

High Performance Java Collections

I’ve been using GNU Trove collections very happily for years, but someone just sent me a link to something new …

http://fastutil.dsi.unimi.it/

It looks interesting enough to try out.

Another related project by the same guys is: http://mg4j.dsi.unimi.it/

Filed under: Code, Performance

32-bit versus 64-bit JDK Memory Usage

Summary of analysis performed September 2006.

Testing Strategy

To ensure the tests performed were reliable and behaved the same every time with as few variables as possible, a simple program (a single java class) was written which loops indefinitely, adding objects to a collection until it runs out of memory.

This test was performed with both Integer and String objects, and then executed with 3 different JVMs with varying heap settings.

The results and details of these tests are documented in later chapters.
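For illustration, a test program of the kind described above could look roughly like the following. This is a minimal sketch, not the original MemoryTest.java: the class name MemoryFillSketch, the fill method and the "string" argument are hypothetical, and it uses current Java syntax rather than what was available on JDK 1.4 or 5.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical sketch; the original MemoryTest.java is not reproduced here.
class MemoryFillSketch {

    // Adds objects produced by 'factory' to a list until the heap is exhausted
    // or 'maxObjects' is reached, and returns how many objects were stored.
    static long fill(LongFunction<Object> factory, long maxObjects) {
        List<Object> store = new ArrayList<>();
        long count = 0;
        try {
            while (count < maxObjects) {
                store.add(factory.apply(count));
                count++; // only counted once the add has succeeded
            }
        } catch (OutOfMemoryError e) {
            // Drop the collection so there is room left to report the result.
            store = null;
        }
        return count;
    }

    public static void main(String[] args) {
        // Long.MAX_VALUE effectively means "run until the heap is full".
        boolean strings = args.length > 0 && args[0].equals("string");
        long stored = strings
                ? fill(i -> "item-" + i, Long.MAX_VALUE)
                : fill(i -> Integer.valueOf((int) i), Long.MAX_VALUE);
        System.out.println((strings ? "String" : "Integer") + " objects stored: " + stored);
    }
}
```

Running the same class under each JVM with the same -Xmx setting and comparing the printed counts gives the kind of object-count comparison the findings are based on.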

JVMs tested:
- JDK 1.4.2 32-bit
- JDK 1.5 32-bit
- JDK 1.5 64-bit

Operating systems tested:
- Solaris 10 on Opterons
- Suse Linux 10 on Opterons

The tests are far from exhaustive, but were enough to derive numbers and patterns whereby the JVMs can be sized accurately and recommendations made.

Findings

- Tests performed consistently on both Linux and Solaris (except for maximum heap sizes)
- JDK 5 64-bit takes between 40% and 50% more memory on average than either 32-bit JVM
- JDK 1.4 32-bit is slightly more memory efficient than JDK 5 32-bit
o It is slightly better than JDK 5 32-bit in Integer tests
o It is exactly the same as JDK 5 32-bit in String tests

- Solaris 10 can run a 32-bit JVM up to 3.5GB
- Suse Linux can run a 32-bit JVM up to 2GB
- A 64-bit JVM was successfully run at 20GB on Suse Linux
- A 2GB 32-bit JVM is approximately equivalent to a 3GB 64-bit JVM in number of objects stored
o i.e., on Suse Linux, if more than 2GB of heap is needed, then one must jump to a 3GB 64-bit JVM to achieve the same amount of storage
o i.e., on Solaris, if more than 3.5GB of heap is needed, then one must jump to a 5GB 64-bit JVM to achieve the same amount of storage
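The sizing arithmetic behind these findings can be sketched as follows. The class and method names are hypothetical; the 1.40 and 1.50 factors are simply the 40% to 50% overhead reported above, applied as multipliers.

```java
// Hypothetical helper that applies the measured 40% to 50% overhead as a multiplier.
class HeapSizing {
    static final double LOW_OVERHEAD = 1.40;  // 64-bit used about 40% more memory
    static final double HIGH_OVERHEAD = 1.50; // 64-bit used about 50% more memory

    // Returns the 64-bit heap size (in GB) that stores roughly the same number
    // of objects as a 32-bit heap of 'heap32Gb' at the given overhead factor.
    static double equivalent64BitHeapGb(double heap32Gb, double overhead) {
        return heap32Gb * overhead;
    }

    public static void main(String[] args) {
        for (double heap32 : new double[] {2.0, 3.5}) {
            double low = equivalent64BitHeapGb(heap32, LOW_OVERHEAD);
            double high = equivalent64BitHeapGb(heap32, HIGH_OVERHEAD);
            System.out.printf("%.1fGB 32-bit is roughly %.2fGB to %.2fGB 64-bit%n",
                    heap32, low, high);
        }
    }
}
```

For a 2GB 32-bit heap the upper factor gives 3GB, and for a 3.5GB heap the lower factor gives 4.9GB, which lines up with the 3GB and 5GB figures in the list above.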

Recommendations

Based upon these findings, it is recommended that JDK 5 32-bit be used as the default JVM. The 64-bit JVM should be used only when more memory is needed than can be allocated to a 32-bit JVM, with the heap size increased by the ratio above to account for the change to a 64-bit data model.

The reasons for choosing JDK 5 over JDK 1.4 are:

- JDK 5 is the current recommended production JVM from Sun Microsystems
o It has been out for more than 2 years and is at its 8th maintenance release (JDK 1.5.0_08)

- General performance improvements from JDK 1.4 to JDK 5
o http://java.sun.com/j2se/1.5.0/docs/guide/performance/speed.html
o http://java.sun.com/j2se/1.5.0/docs/guide/vm/index.html

- Improved garbage collection algorithms, particularly for multi-processor machines
o http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

- 64-bit JVM available on AMD Opterons when larger heap sizes required (not available for JDK 1.4)
o http://java.sun.com/j2se/1.5.0/docs/relnotes/features.html#platform_proc64

Summary

The tests confirmed that the 64-bit JVM does indeed use more memory than the 32-bit JVMs, but the overhead is consistent and can be calculated, and therefore planned for.
It is recommended to use JDK 5 32-bit as the default JVM, and the 64-bit variant when larger heap sizes are required.

Java Source File: MemoryTest.java

Filed under: Code, Performance

Sun x4200 vs Sun T1000 => inQuireServer & inQuireWeb

inQuireServer Tests

A test was run with JMeter driving an inQuire Client that accessed an inQuire server running on the same local machine.

This is done over JINI using JERI.

Configuration was:
- Catalog: AIDC
- Server Heap: 1GB
- Client Heap: 512MB
- Server JVM: 1.5
- Client JVM: 1.5
- inQuire Build: 4th Jan 07
- Client side caching turned on: Yes

The following image shows the results (times in ms):


First Test – DC, Keyword and Complex

The T1000 performs very poorly when the thread count is low.

Interestingly, however, even though its gap with the X4200 does narrow, it never gets better than, or even close to as good as, the X4200.

And surprisingly, it crashes and burns when we try to hit 5000 threads, a level that is ridiculously slow on the X4200, but at least it runs.

Second Test – DC Only

The second round of tests showed an interesting result: the T1000 performs MUCH better when IO-intensive tasks, such as keyword search which hits the filesystem, are avoided.

It’s still not as good as the X4200 at low thread counts, but much more reasonable, and it does end up scaling better than the X4200.

However, in my opinion, its scaling advantage over the X4200 is not nearly large enough to justify sacrificing performance at low or single thread counts, and especially not the poor filesystem IO performance.

inQuireWeb Tests

The next round of testing was against Tomcat running inQuireWeb backed by inQuireServer instances as already tested above.

Client side caching was enabled in inQuireWeb so as to reduce hits to the servers and test Tomcat specifically.

The following image shows that when Tomcat and the inQuireServer were on the same machine, the X4200 was superior, while with Tomcat on the T1000 and inQuireServer on the X4200, the T1000 came out on top once it started to scale.

So the conclusion from these tests is that the T1000 does indeed have certain niches where it can perform and scale better than the Opteron-based X4200, but they are hard to find, and if you don’t fit that exact niche, a huge performance hit will be incurred.

At this point, databases and the inQuireServer are out of the question for the T1000, relegating it to a glorified webserver … able to host Apache and/or Tomcat, as long as the heavy lifting of searches and IO is done elsewhere, such as on the X4200.

Ben

Filed under: Performance

Sun x4200 vs Sun T1000 => Load Server

Testing the Sun T1000 and Sun X4200 continues …

… with the databases loaded, I’m now starting the inQuire Server with various types of configuration involving indexing in MEMORY, FILESYSTEM or DATABASE.

I understand that these processes are still primarily single-threaded … but I’m so far not very impressed by the T1000.

If it takes 7+ times longer to initialize a process before I can even get to a point where parallel processing can occur, it’s questionable if it’s worth it.

And … the inQuire Keyword FILESYSTEM implementation is really slow and needs to be replaced.


Filed under: Performance

Sun x4200 vs Sun T1000 => Importing Content

Here are some numbers showing performance of our inQuire Importer against a MySQL 5.x database using our IT North American catalog with 500k+ products.

Basically it seems the T1000 is HORRIBLE for this type of application.

Of its 32 available CPU cores, generally only 1 is being used during these processes, and maybe 4 or 5 when multiprocessing.

The X4200 basically ends up being 7x faster than the T1000.

Next tests are using Tomcat on the T1000 to try its multiprocessing.

Filed under: Performance

Web Service Performance with Netli/Akamai

Web Service tests performed against Netli and Akamai.

First column is origin, second is Netli, third is Akamai.

Each test executed the same request/response 25 times and gave the average.

It alternated between each of the 3 sources, doing 1 request at a time, so a total of 75 in a single test.
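The alternating measurement loop described above can be sketched roughly like this. It is an illustration only, not the harness actually used: the RoundRobinTimer class is hypothetical, and the three Runnable placeholders stand in for the real web service calls against origin, Netli and Akamai.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a round-robin timing loop over several sources.
class RoundRobinTimer {

    // Runs each named request once per round, for 'rounds' rounds, and returns
    // the average elapsed time in milliseconds for each source.
    static Map<String, Double> averageMillis(Map<String, Runnable> sources, int rounds) {
        Map<String, Long> totalNanos = new LinkedHashMap<>();
        for (int round = 0; round < rounds; round++) {
            // One request per source per round, so the sources are interleaved.
            for (Map.Entry<String, Runnable> source : sources.entrySet()) {
                long start = System.nanoTime();
                source.getValue().run();
                long elapsed = System.nanoTime() - start;
                totalNanos.merge(source.getKey(), elapsed, Long::sum);
            }
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        for (Map.Entry<String, Long> total : totalNanos.entrySet()) {
            averages.put(total.getKey(), total.getValue() / 1_000_000.0 / rounds);
        }
        return averages;
    }

    public static void main(String[] args) {
        Map<String, Runnable> sources = new LinkedHashMap<>();
        // Placeholders: replace each body with the real web service request.
        sources.put("origin", () -> {});
        sources.put("netli", () -> {});
        sources.put("akamai", () -> {});
        System.out.println(averageMillis(sources, 25)); // 3 x 25 = 75 requests
    }
}
```

Interleaving the sources rather than running 25 requests against one before moving to the next means a temporary network hiccup affects all three roughly equally.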

NOTE: Servers are located in Los Angeles — so the times for the client also being in Los Angeles are deceiving.

November Web Service Tests

Dec 5/6 Web Service Tests

Filed under: Performance

WebApp Performance Optimizations

Playing around with real-world performance optimizations, I found JSMin to work quite well … with very big gains from doing the following:

- combining multiple Javascript files into a single file
- using a JSMin servlet filter to minimize and cache in memory the Javascript file to return
- adding a filter to add HTTP Header caching to static files
- reducing file sizes for product images
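As a rough idea of what the caching filter has to produce, here is a sketch of the header calculation on its own. It is not the filter that was actually used: the StaticCacheHeaders class, the list of file extensions and the max-age value are all assumptions, and a real servlet filter would copy these values onto the response with setHeader.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that works out caching headers for static files.
class StaticCacheHeaders {

    // HTTP date format, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    // Returns caching headers for a static file path, or an empty map for
    // anything else. The extension list is an assumption for illustration.
    static Map<String, String> forPath(String path, long maxAgeSeconds, Instant now) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (path.matches(".*\\.(js|css|png|gif|jpg)$")) {
            headers.put("Cache-Control", "public, max-age=" + maxAgeSeconds);
            headers.put("Expires", HTTP_DATE.format(now.plusSeconds(maxAgeSeconds)));
        }
        return headers;
    }

    public static void main(String[] args) {
        // One week of caching for a script file, nothing for a dynamic page.
        System.out.println(forPath("/js/app.js", 7 * 24 * 3600, Instant.now()));
        System.out.println(forPath("/search.do", 7 * 24 * 3600, Instant.now()));
    }
}
```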

Filed under: Performance, User Interface

Javascript optimization?

From what I can tell … the best bets are:

Use this to minimize file size: http://alex.dojotoolkit.org/shrinksafe/

Then enable mod_deflate rather than mod_gzip

… make sure Apache is doing compression in memory, and not on the filesystem.

Then, actually change JS and CSS filenames each time they change so we can forcefully set very long cache refresh times so they never expire.
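One way to get filenames that change whenever the file changes is to put a hash of the content into the name. The following is only a sketch of that idea: the VersionedName class and the choice of an 8-character SHA-256 prefix are assumptions, and a build number or timestamp would work just as well.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper that derives a content-based filename for JS and CSS files.
class VersionedName {

    // Inserts a short content hash before the extension: app.js -> app.<hash>.js
    static String of(String fileName, byte[] content) {
        String hash = shortHash(content);
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return fileName + "." + hash;
        }
        return fileName.substring(0, dot) + "." + hash + fileName.substring(dot);
    }

    // First 4 bytes of the SHA-256 digest as 8 lowercase hex characters.
    static String shortHash(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] script = "function hello() {}".getBytes(StandardCharsets.UTF_8);
        System.out.println(of("app.js", script));
    }
}
```

Because the name only changes when the content does, the files can be served with a very long cache lifetime and browsers still pick up new versions as soon as the pages reference the new name.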

Filed under: Performance, User Interface

Javascript and CSS Optimization

Some good links on the subject …

http://yuiblog.com/blog/2006/10/16/pageweight-yui0114/

http://www.crockford.com/javascript/jsmin.html

http://dojotoolkit.org/docs/compressor_system.html

http://www.thinkvitamin.com/features/webapps/serving-javascript-fast

http://www.die.net/musings/page_load_time/

Filed under: Performance, User Interface
