Ben J. Christensen

Software Development and Other Random Stuff

Statistics for Why Web Performance Matters

Following are some good posts with further evidence of the impact of webpage/webapp responsiveness to the user experience, amount of usage and ultimately revenue.

Good introductory quote to the external links and images shown below:

There’s no longer any debate. There’s reliable, reproducible evidence that web page latency is directly tied to the bottom line. At Velocity, Microsoft, Google and Shopzilla made this abundantly clear in a series of awesome presentations: detailed, controlled testing proves that slower pages hurt the bottom line. In Google’s case, adding delay reduces the average number of searches a visitor does each day even after the delay is removed.

Regarding “smaller scale” sites more typical than Google and Bing:

The results of their analysis show how significant a reduction in page latency can be. In addition to reducing bounce rates, and increasing pages per visit & time on site, they found a 16.07% increase in conversion rates and a 5.50% increase in average order value.http://radar.oreilly.com/2009/10/watching-websites.html

External Links:

http://radar.oreilly.com/2009/10/watching-websites.html
http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html
http://www.watchingwebsites.com/archives/proof-that-speeding-up-websites-improves-online-business

Good summary PDF showing metrics of performance impact.

conversion-rate-and-order-value

bing-delayimpact

Filed under: Performance, Production, User Interface

Initial Impressions on Ruby Performance

I’ve spent the past day playing with Ruby and decided to test some basic performance – iterating and string parsing – to get an idea of what the performance is really like.

Ruby is not doing so well in my tests.

Note to Ruby Experts: If anyone can demonstrate what I’m doing wrong in my code or testing, I would love to be corrected.

My approach was to write the exact same code in Java and Ruby that loads up a file, reads each lines, tokenizes it into words using whitespace as the delimiter and counts up the tokens.

This avoids network or database IO and other external resources – except the filesystem which I don’t consider a significant variable in the test.

Further testing I’m planning on doing will test Rails, multi-threading and other common things I do in my apps.

Picture 3

On my laptop, a MacBook Pro 2.53Ghz Core 2 Duo with 4GB memory, the average times are:

  • Ruby 1.8.6 app: 8022ms
  • Java 5 app: 2986ms
  • Java 6 app: 1443ms

Source Code

After downloading, strip the .doc ending off of the files.

Program Output

macbook-pro:src benjc$ /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin/java FileReadParse
Starting to read file…
The number of tokens is: 7764115
It took 1502 ms
macbook-pro:performance benjc$ ruby file_read_parse.rb
Starting to read file …
The number of tokens is: 7764115
It took 7999.955 ms

macbook-pro:src benjc$ /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin/java FileReadParse

Starting to read file…

The number of tokens is: 7764115

It took 1502 ms

macbook-pro:performance benjc$ ruby file_read_parse.rb

Starting to read file …

The number of tokens is: 7764115

It took 7999.955 ms


Filed under: Code, Performance

Speed of Thought

I’ve focused on performance for several years in my server-side and web application development – as much as I’ve been able to fit into the timelines. It has involved digging into minute details of Java and JVM tuning that rarely get explored by most java developers (from what I can tell anyways) and focusing on tuning the CSS, images, caching, GZIP and other settings of the front-end. It has generally paid off. Today my team operates servers processing millions of complex, dynamic, uncacheable web service transactions completing on average in around 250ms each (server side, not including network transport to client). I believe with further investment we could improve that even more.

I have read comments from companies such as Google and Amazon how the performance of an application can dramatically affect how much people use it. I agree. The slightest friction in searching makes me search less, or shop less, etc.

This past week I’ve been using the new iPhone 3GS which is at least 2x faster than the previous iPhone 2G I had. In some cases it’s 4x and 6x faster.

I already used the iPhone a lot. The increase in speed has further reduced the “friction” of use to the point that if I even have a thought of quickly looking something up or performing some other action, I am much more likely to do it.

On my last iPhone, I consciously chose to not bother at certain times because of the time it would take. Yes, I’m talking in seconds and even milliseconds here — but when it’s a “thought”, if the tool doesn’t work at the same speed, then it’s friction. Same goes for another application I use which involves looking up reference materials and documents. Before I kind of had to avoid “flipping around’ while someone was referring to things. It was actually faster to use the paper documents. Now, I can keep up or be faster with my iPhone than the paper version ‘users’. Therefore it encourages use.

The new user experience of using the iPhone 3GS, so significantly improved just by the performance improvement, has reminded me as a developer and architect how critical it is to design, plan for and develop to achieve high performance. Functionality isn’t enough — we should be aiming for the “speed of thought”.

Interestingly, Google has just launched a new site just for “speeding up the web“.

The following video shows “the experts” talking about how the human mind perceives changes of 100ms (one tenth of a second).

It’s my belief that this isn’t just a “nice to have” feature. If a product, service or application wants to be adopted and deemed “necessary” by its users, its performance must reduce friction as much as technically feasible to the point where it approaches or achieves “speed of thought”.

Filed under: Architecture, Performance, Production, User Interface

Java JDK 1.5 vs 1.6 Performance

In a benchmarking of our server software I found about a 34% speed improvement (if my math is correct) on using JDK 1.6.0_12 over JDK 1.5.0_17.

picture-8

Filed under: Performance

ConcurrentHashMap vs HashMap

A simple test of performance for ConcurrentHashMap.

The insert is only a little slower, but the retrieval is 3x slower in this quick and dirty single-threaded test.

 

TimeUtil timer = new TimeUtil();

        timer.start();

        HashMap map = new HashMap();

        for (int i = 0; i < 1000000; i++) {

            map.put(i, i);

        }

        timer.printCurrent();

        

        timer.start();

        ConcurrentHashMap cmap = new ConcurrentHashMap();

        for (int i = 0; i < 1000000; i++) {

            cmap.put(i, i);

        }

        timer.printCurrent();

        

 

        

        timer = new TimeUtil();

        timer.start();

        map = new HashMap();

        for (int i = 0; i < 1000000; i++) {

            map.get(i);

        }

        timer.printCurrent();

        

        timer.start();

        cmap = new ConcurrentHashMap();

        for (int i = 0; i < 1000000; i++) {

            cmap.get(i);

        }

        timer.printCurrent();

 

Action completed in: 458 ms 0.458 seconds

Action completed in: 574 ms 0.574 seconds

Action completed in: 30 ms 0.03 seconds

Action completed in: 113 ms 0.113 seconds

Filed under: Code, Performance

MySQL JDBC Memory Usage on Large ResultSet

I recently came across the problem of large resultsets being pulled into a java app via MySQL JDBC. I had dealt with this years ago but forgotten about it.

The test case below shows how the entire ResultSet is buffered in memory by default — which can be a “very bad thing” when dealing with hundreds or thousands of megabytes of data when it’s intended to be processed row by row.

Using mysql-connector-java-3.1.12-bin.jar and a JDK 5 with 32MB heap:

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32

Retrieving data …

Ran out of memory at row: 0

java.lang.OutOfMemoryError: Java heap space

at com.mysql.jdbc.ByteArrayBuffer.getBytes(ByteArrayBuffer.java:128)

at com.mysql.jdbc.ByteArrayBuffer.readLenByteArray(ByteArrayBuffer.java:248)

at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1304)

at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2272)

at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:423)

at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:1962)

at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1385)

at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1728)

at com.mysql.jdbc.Connection.execSQL(Connection.java:2988)

at com.mysql.jdbc.Connection.execSQL(Connection.java:2917)

at com.mysql.jdbc.Statement.executeQuery(Statement.java:824)

at JDBCTest.main(JDBCTest.java:26)

 

Using mysql-connector-java-5.1.6-bin.jar and the same JDK 5 with 32MB heap:

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32

Retrieving data …

Ran out of memory at row: 0

java.lang.OutOfMemoryError: Java heap space

at com.mysql.jdbc.ByteArrayBuffer.getBytes(ByteArrayBuffer.java:128)

at com.mysql.jdbc.ByteArrayBuffer.readLenByteArray(ByteArrayBuffer.java:248)

at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1304)

at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2272)

at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:423)

at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:1962)

at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1385)

at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1728)

at com.mysql.jdbc.Connection.execSQL(Connection.java:2988)

at com.mysql.jdbc.Connection.execSQL(Connection.java:2917)

at com.mysql.jdbc.Statement.executeQuery(Statement.java:824)

at JDBCTest.main(JDBCTest.java:26)

 

Thus we see that both the old and new versions of the MySQL JDBC driver by default attempt to load the entire resultset into memory.

 

I now increase the heap to 1GB to allow it to grow and find that the test query uses up > 500MB of heap before it even starts the rs.next() loop.

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32

Retrieving data …

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 298  Used: 183  Free: 115

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 527  Used: 381  Free: 146

Starting to retrieve data. Memory Used: 517

Done retrieving data => 2318284   Memory Used: 551

 

Here is the code for this:

 

            ResultSet rs = conn.createStatement().executeQuery(“<sql query that returns lots of data>”);

            System.out.println(“Starting to retrieve data. Memory Used: “ + getUsedMemory());

            while (rs.next()) {

                rs.getString(1);

                rowsReturned++;

            }

            System.out.println(“Done retrieving data => “ + rowsReturned + ”   Memory Used: “  

+ getUsedMemory());

 

Thus you can see that the “executeQuery()” method loads up 500MB of data before it passes on the “rs.next()” loop. The full ResultSet is being buffered in memory.

 

Solution

To make the JDBC driver stream the results instead of buffer them all first we do the following:

            stmt.setFetchSize(Integer.MIN_VALUE);

Then we get this result instead:

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32

Retrieving data …

Starting to retrieve data. Memory Used: 2

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 2  Free: 31

Done retrieving data => 2318284   Memory Used: 2

 

Now it behaves like we expect it to … only 2MB used instead of > 500MB.

 

There are some caveats:

  • http://javaquirks.blogspot.com/2007/12/mysql-streaming-result-set.html
  • http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
In the second link of official documentation we read (emphasis in red added by myself):
———————————————————————-

ResultSet

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and can not allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.

To enable this functionality, you need to create a Statement instance in the following manner:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this any result sets created with the statement will be retrieved row-by-row.

There are some caveats with this approach. You will have to read all of the rows in the result set (or close it) before you can issue any other queries on the connection, or an exception will be thrown.

The earliest the locks these statements hold can be released (whether they be MyISAM table-level locks or row-level locks in some other storage engine such as InnoDB) is when the statement completes.

If the statement is within scope of a transaction, then locks are released when the transaction completes (which implies that the statement needs to complete first). As with most other databases, statements are not complete until all the results pending on the statement are read or the active result set for the statement is closed.

Therefore, if using streaming results, you should process them as quickly as possible if you want to maintain concurrent access to the tables referenced by the statement producing the result set.

 

 

Filed under: Code, Performance

Java Memory Usage – Ints

As I work on another set of indexes with large numbers of ints I’ve revisited how different storage mechanisms behave.

I’ve set the JVM to just over 64MB (70MB to be precise).

An int being 4 bytes means that 64MB of ints is as follows:

       int sixtyFourMB = 64 * 1024 * 1024 / 4;

That is 16,777,216 ints.

An int array can successfully store 64MB in the JVM meaning it is properly assigning each int to 4 bytes without overhead.

      int[] vals = new int[sixtyFourMB];

However, if you use ArrayList<Integer> you run out of memory after 3,392,918 ints being added.

That is 1/5 of what can be stored in the int[]!

Now, assigning the space to ArrayList is fine:

                ArrayList<Integer> ints = new ArrayList<Integer>(sixtyFourMB);

It’s when you add the Integer objects (int is cast to Integer behind the scenes in JDK 5) that it dies 1/5 in.

So the overhead is obviously in the Integer object.

I next try the Trove library and use the TIntArrayList.

If I leave it to assign its memory space by default, it runs out of memory at 10,485,760 (62%).

If however I tell it how many ints to expect it sizes itself properly and does not try to grow too large and fits the entire 64MB.

     TIntArrayList ints = new TIntArrayList(sixtyFourMB);

Thus, when backed by an int[] the memory can be efficiently stored, but when using an Object[] memory is very inefficient when attempting to store primitive ints.

Out of curiosity for what an Object() really does with memory I did the following:

            ArrayList<Object> ints = new ArrayList<Object>(sixtyFourMB);

            for (i = 0; i < sixtyFourMB; i++) {

                ints.add(null);

            }

That works, it all fits in memory.

If however I change the “null” to “new Object()” such as:

                ints.add(new Object());

It fails at 702065 … which means that the object[] has already taken up the space and there is no room for the objects on the heap.

Thus, for storage of simple data primitive arrays is MUCH more efficient.

If the flexibility of the Collections API is needed for dynamic manipulation of those arrays then use GNU Trove … but carefully or the dynamic sizing of the arrays in the background can cause a lot of waste.

 

An old but interesting article specifies memory usage by normal objects and classes:

http://www.roseindia.net/javatutorials/determining_memory_usage_in_java.shtml

Here are two points from that article of note:

  1. The class takes up at least 8 bytes. So, if you say new Object(); you will allocate 8 bytes on the heap.
  2. Each data member takes up 4 bytes, except for long and double which take up 8 bytes. Even if the data member is a byte, it will still take up 4 bytes! In addition, the amount of memory used is increased in 8 byte blocks. So, if you have a class that contains one byte it will take up 8 bytes for the class and 8 bytes for the data, totalling 16 bytes (groan!).

Filed under: Code, Performance

Sort Performance

We did some simple testing on various sort algorithms we found and compared them with the Java Arrays.sort QuickSort implementation.

This is rudimentary testing since it’s not attempting to test all the variety of distributions of numbers that could exist and should be tested.

That being said, we are creating a collection of 50k, 500k, or 5million numbers randomly generated, and then cloning that collection to be sorted by each of the algorithms.

This first test had 1/2 the numbers as 4 digits, the other half as 5 digits meaning that in the larger collects, there should be a lot of duplicates.

The second test below had 1/2 the numbers as 4 digits, the other half as 8 digits.

This change in number of digits makes dramatic changes for FlashSort and completely breaks it.

Radix is just bad all the time … I have yet to figure out where it’s strength lies.

This is not thorough enough to make any conclusion, but I found it interesting nonetheless.

***Test 1 with 4 & 5 digit numbers***

—————————–
Run with 50k
—————————–
Arrays.sort(int) => 15ms
FlashSort.sort(int) => 11ms Valid: true
fastqs.sort(int) => 10ms Valid: true
eqsort.sort(int) => 16ms Valid: true
radix.sort(int) => 175ms Valid: true
radix.sort(int, 5) => 130ms Valid: true
—————————–
Run with 500k
—————————–
Arrays.sort(int) => 101ms
FlashSort.sort(int) => 56ms Valid: true
fastqs.sort(int) => 72ms Valid: true
eqsort.sort(int) => 96ms Valid: true
radix.sort(int) => 2164ms Valid: true
radix.sort(int, 5) => 1203ms Valid: true
—————————–
Run with 5million
—————————–
Arrays.sort(int) => 748ms
FlashSort.sort(int) => 717ms Valid: true
fastqs.sort(int) => 780ms Valid: true
eqsort.sort(int) => 1625ms Valid: true
radix.sort(int) => 82218ms Valid: true
radix.sort(int, 5) => 11387ms Valid: true
—————————–

***Test 1 with 4 & 8 digit numbers***

—————————–
Run with 50k
—————————–
Arrays.sort(int) => 14ms
FlashSort.sort(int) => 662ms Valid: true
fastqs.sort(int) => 10ms Valid: true
eqsort.sort(int) => 11ms Valid: true
radix.sort(int) => 294ms Valid: true
radix.sort(int, 8 ) => 225ms Valid: true
—————————–
Run with 500k
—————————–
Arrays.sort(int) => 91ms
FlashSort.sort(int) => 17832ms Valid: true
fastqs.sort(int) => 72ms Valid: true
eqsort.sort(int) => 95ms Valid: true
radix.sort(int) => 2066ms Valid: true
radix.sort(int, 8 ) => 2003ms Valid: true
—————————–
Run with 5million
—————————–
Arrays.sort(int) => 909ms
FlashSort.sort(int) => 197796ms Valid: true
fastqs.sort(int) => 825ms Valid: true
eqsort.sort(int) => 1494ms Valid: true
radix.sort(int) => 22957ms Valid: true
radix.sort(int, 8 ) => 20041ms Valid: true
—————————–

Filed under: Code, Performance

ARL Radix Sort

Very cool whitepaper on ARL Radix Sort showing how it’s better than Quicksort and normal Radix sort.

http://www.nik.no/2002/Maus.pdf

… (April 2nd) … never was able to find a working implementation to try out though …

Filed under: Code, Performance

High Performance Java Collections

I’ve been using Gnu Trove collections for years very happily, but someone just sent me a link to something new …

http://fastutil.dsi.unimi.it/

It looks quite interesting to go try out.

Another related project by the same guys is: http://mg4j.dsi.unimi.it/

Filed under: Code, Performance

Twitter Updates

  • I *really* wish iBooks and Kindle would let me copy/paste text so I can quote a sentence or paragraph! Ridiculous that I can't. 22 hours ago
  • Great weekend (and ‘food tourism’) in Los Angeles. "Sooo fun!" as described by a certain short person when asked on the drive home. 23 hours ago
  • Small world! Just ran into a colleague from work - a 7hr drive from the office! 1 day ago
  • We made it to LA! Feels like returning home :-) 3 days ago
  • Peter Pan Baby http://twitpic.com/2kg6n4 5 days ago
View Ben Christensen's profile on LinkedIn