Ben J. Christensen

Software Development and Other Random Stuff

Technical Debt Quadrant

Martin Fowler wrote a blog entry this week that explains the concept of “technical debt” and classifies its different forms very well.

[Image: Technical Debt Quadrant]

Some favorite portions:

“A mess is a reckless debt which results in crippling interest payments or a long period of paying down the principal.”

“The prudent debt to reach a release may not be worth paying down if the interest payments are sufficiently small – such as if it were in a rarely touched part of the code-base.”

“Not just is there a difference between prudent and reckless debt, there’s also a difference between deliberate and inadvertent debt. The prudent debt example is deliberate because the team knows they are taking on a debt, and thus puts some thought as to whether the payoff for an earlier release is greater than the costs of paying it off. A team ignorant of design practices is taking on its reckless debt without even realizing how much hock it’s getting into.”

“while you’re programming, you are learning. It’s often the case that it can take a year of programming on a project before you understand what the best design approach should have been. Perhaps one should plan projects to spend a year building a system that you throw away and rebuild, but that’s a tricky plan to sell. Instead what you find is that the moment you realize what the design should have been, you also realize that you have an inadvertent debt.”

Filed under: Architecture, Code, Management & Leadership

Mac OSX 10.6 Java – java.io.tmpdir

Migrating from OS X 10.5 to 10.6 (Snow Leopard) caused all of my Java projects using embedded MySQL MXJ to fail.

I found that this was because Java returns a very odd path for the “java.io.tmpdir” property:

/private/var/folders/b4/b44x97M0GFydt3jCKcowsU+++TI/-Tmp-/

This breaks MySQL MXJ, which converts the + signs into spaces, so the path becomes:

/private/var/folders/b4/b44x97M0GFydt3jCKcowsU   TI/-Tmp-/

The original code was:

   private static File tmpDir = new File(System.getProperty("java.io.tmpdir"));

To fix it I put in the following hack so that MySQL MXJ will now work again and I can still use the “java.io.tmpdir” property on other systems such as Windows:

   private static File ourAppDir;

   static {
      String tempPath = System.getProperty("java.io.tmpdir");
      // a fix to handle the crazy path the Mac JVM returns
      // (check both forms: /var is a symlink to /private/var on OS X)
      if (tempPath.startsWith("/var/folders/") || tempPath.startsWith("/private/var/folders/")) tempPath = "/tmp/";
      ourAppDir = new File(tempPath);
   }
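The check above keys off one specific path prefix. A slightly more general sketch, assuming the real trigger is the + characters in the path rather than the prefix itself, falls back to /tmp/ whenever the reported directory contains a +. The class and method names are hypothetical, and this has not been tried against MySQL MXJ.

```java
import java.io.File;

// Hypothetical helper, not part of MySQL MXJ or the original project.
class SafeTmpDir {

    // Fall back to /tmp/ when the reported temp path is missing or contains '+',
    // the character that ends up converted to a space.
    static File resolve(String tempPath) {
        if (tempPath == null || tempPath.indexOf('+') >= 0) {
            return new File("/tmp/");
        }
        return new File(tempPath);
    }

    public static void main(String[] args) {
        System.out.println(resolve(System.getProperty("java.io.tmpdir")));
    }
}
```

The /tmp/ fallback only makes sense on Unix-like systems; a Windows temp path containing a + would need a different fallback.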

Filed under: Code

Initial Impressions on Ruby Performance

I’ve spent the past day playing with Ruby and decided to test some basic performance – iterating and string parsing – to get an idea of what the performance is really like.

Ruby is not doing so well in my tests.

Note to Ruby Experts: If anyone can demonstrate what I’m doing wrong in my code or testing, I would love to be corrected.

My approach was to write the exact same code in Java and Ruby: load a file, read each line, tokenize it into words using whitespace as the delimiter, and count up the tokens.

This avoids network or database IO and other external resources – except the filesystem which I don’t consider a significant variable in the test.
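For readers without the downloads, the Java side of the test has roughly the shape below. This is a sketch rather than the benchmarked source: the class name and the use of StringTokenizer are assumptions.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.StringTokenizer;

// Sketch only: not the benchmarked FileReadParse source.
class TokenCountSketch {

    // Reads line by line and counts whitespace-delimited tokens.
    static long countTokens(Reader source) {
        long tokens = 0;
        try {
            BufferedReader reader = new BufferedReader(source);
            String line;
            while ((line = reader.readLine()) != null) {
                // Default delimiters are space, tab, newline, carriage return and form feed.
                tokens += new StringTokenizer(line).countTokens();
            }
            reader.close();
        } catch (IOException e) {
            throw new IllegalStateException("could not read input", e);
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Pass a file path as the first argument, or run without one for a tiny built-in sample.
        Reader source;
        if (args.length > 0) {
            source = new FileReader(args[0]);
        } else {
            source = new StringReader("one two  three\nfour\tfive\n");
        }
        System.out.println("Starting to read file...");
        long start = System.currentTimeMillis();
        long tokens = countTokens(source);
        System.out.println("The number of tokens is: " + tokens);
        System.out.println("It took " + (System.currentTimeMillis() - start) + " ms");
    }
}
```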

Further testing I’m planning on doing will test Rails, multi-threading and other common things I do in my apps.


On my laptop, a MacBook Pro 2.53GHz Core 2 Duo with 4GB memory, the average times are:

  • Ruby 1.8.6 app: 8022ms
  • Java 5 app: 2986ms
  • Java 6 app: 1443ms

Source Code

After downloading, strip the .doc ending off of the files.

Program Output

macbook-pro:src benjc$ /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin/java FileReadParse
Starting to read file…
The number of tokens is: 7764115
It took 1502 ms
macbook-pro:performance benjc$ ruby file_read_parse.rb
Starting to read file …
The number of tokens is: 7764115
It took 7999.955 ms

Filed under: Code, Performance

ConcurrentHashMap vs HashMap

A simple test of performance for ConcurrentHashMap.

The insert is only a little slower, but the retrieval is 3x slower in this quick and dirty single-threaded test.

 

        TimeUtil timer = new TimeUtil();

        // insert: 1,000,000 puts into each map
        timer.start();
        HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
        for (int i = 0; i < 1000000; i++) {
            map.put(i, i);
        }
        timer.printCurrent();

        timer.start();
        ConcurrentHashMap<Integer, Integer> cmap = new ConcurrentHashMap<Integer, Integer>();
        for (int i = 0; i < 1000000; i++) {
            cmap.put(i, i);
        }
        timer.printCurrent();

        // retrieval: 1,000,000 gets from each map
        // NOTE: both maps are re-created here, so these lookups run against empty maps
        timer = new TimeUtil();
        timer.start();
        map = new HashMap<Integer, Integer>();
        for (int i = 0; i < 1000000; i++) {
            map.get(i);
        }
        timer.printCurrent();

        timer.start();
        cmap = new ConcurrentHashMap<Integer, Integer>();
        for (int i = 0; i < 1000000; i++) {
            cmap.get(i);
        }
        timer.printCurrent();

 

Action completed in: 458 ms 0.458 seconds

Action completed in: 574 ms 0.574 seconds

Action completed in: 30 ms 0.03 seconds

Action completed in: 113 ms 0.113 seconds
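Note that the retrieval half of the test calls get() on freshly created maps, so every lookup is a miss. A variant that populates each map before timing successful lookups might look like the sketch below. It uses System.nanoTime() in place of TimeUtil (whose source is not shown), the class and method names are hypothetical, and no timings are claimed for it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical variant of the test; TimeUtil is replaced by System.nanoTime().
class MapLookupSketch {

    // Puts the keys 0..n-1 into the given map and returns it.
    static Map<Integer, Integer> fill(Map<Integer, Integer> map, int n) {
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        return map;
    }

    // Times n successful lookups; the checksum guards against missed lookups.
    static long timeLookupsNanos(Map<Integer, Integer> map, int n) {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += map.get(i);
        }
        long elapsed = System.nanoTime() - start;
        if (sum != (long) n * (n - 1) / 2) {
            throw new IllegalStateException("unexpected checksum: " + sum);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 1000000;
        Map<Integer, Integer> map = fill(new HashMap<Integer, Integer>(), n);
        Map<Integer, Integer> cmap = fill(new ConcurrentHashMap<Integer, Integer>(), n);
        System.out.println("HashMap get: " + timeLookupsNanos(map, n) / 1000000 + " ms");
        System.out.println("ConcurrentHashMap get: " + timeLookupsNanos(cmap, n) / 1000000 + " ms");
    }
}
```

A single pass like this is still quick and dirty: JIT warm-up and garbage collection can easily dominate a run this short.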

Filed under: Code, Performance

Complex and Simple XML Types

Excellent article which clearly explains how they relate to each other:

http://www.xml.com/pub/a/2001/08/22/easyschema.html

Filed under: Code

MySQL JDBC Memory Usage on Large ResultSet

I recently came across the problem of large result sets being pulled into a Java app via MySQL JDBC. I had dealt with this years ago but had forgotten about it.

The test case below shows how the entire ResultSet is buffered in memory by default — which can be a “very bad thing” when dealing with hundreds or thousands of megabytes of data when it’s intended to be processed row by row.

Using mysql-connector-java-3.1.12-bin.jar and a JDK 5 with 32MB heap:

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32
Retrieving data …
Ran out of memory at row: 0
java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.ByteArrayBuffer.getBytes(ByteArrayBuffer.java:128)
    at com.mysql.jdbc.ByteArrayBuffer.readLenByteArray(ByteArrayBuffer.java:248)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1304)
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2272)
    at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:423)
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:1962)
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1385)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1728)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:2988)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:2917)
    at com.mysql.jdbc.Statement.executeQuery(Statement.java:824)
    at JDBCTest.main(JDBCTest.java:26)

 

Using mysql-connector-java-5.1.6-bin.jar and the same JDK 5 with 32MB heap:

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32
Retrieving data …
Ran out of memory at row: 0
java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.ByteArrayBuffer.getBytes(ByteArrayBuffer.java:128)
    at com.mysql.jdbc.ByteArrayBuffer.readLenByteArray(ByteArrayBuffer.java:248)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1304)
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2272)
    at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:423)
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:1962)
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1385)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1728)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:2988)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:2917)
    at com.mysql.jdbc.Statement.executeQuery(Statement.java:824)
    at JDBCTest.main(JDBCTest.java:26)

 

Thus we see that both the old and new versions of the MySQL JDBC driver by default attempt to load the entire resultset into memory.

 

I now increase the heap to 1GB to allow it to grow and find that the test query uses up > 500MB of heap before it even starts the rs.next() loop.

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32
Retrieving data …
ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 298  Used: 183  Free: 115
ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 527  Used: 381  Free: 146
Starting to retrieve data. Memory Used: 517
Done retrieving data => 2318284   Memory Used: 551

 

Here is the code for this:

 

            ResultSet rs = conn.createStatement().executeQuery("<sql query that returns lots of data>");
            System.out.println("Starting to retrieve data. Memory Used: " + getUsedMemory());
            while (rs.next()) {
                rs.getString(1);
                rowsReturned++;
            }
            System.out.println("Done retrieving data => " + rowsReturned + "   Memory Used: " + getUsedMemory());
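The getUsedMemory() helper is not shown in the snippet. A minimal sketch, assuming it reports the used heap in whole megabytes (which matches the scale of the numbers in the output), could be:

```java
// Hypothetical stand-in for the getUsedMemory() helper used above.
class MemoryUtilSketch {

    // Used heap in whole megabytes: heap currently claimed by the JVM minus its free part.
    static long getUsedMemory() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Memory Used: " + getUsedMemory());
    }
}
```

The figure is only approximate, since garbage that has not yet been collected still counts as used.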

 

Thus you can see that the “executeQuery()” method loads about 500MB of data before control ever reaches the “rs.next()” loop. The full ResultSet is being buffered in memory.

 

Solution

To make the JDBC driver stream the results instead of buffering them all first, we do the following:

            stmt.setFetchSize(Integer.MIN_VALUE);

Then we get this result instead:

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32
Retrieving data …
Starting to retrieve data. Memory Used: 2
ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32
ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 2  Free: 31
Done retrieving data => 2318284   Memory Used: 2

 

Now it behaves like we expect it to … only 2MB used instead of > 500MB.

 

There are some caveats:

  • http://javaquirks.blogspot.com/2007/12/mysql-streaming-result-set.html
  • http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html

In the second link, the official documentation, we read:
———————————————————————-

ResultSet

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and can not allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.

To enable this functionality, you need to create a Statement instance in the following manner:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this any result sets created with the statement will be retrieved row-by-row.

There are some caveats with this approach. You will have to read all of the rows in the result set (or close it) before you can issue any other queries on the connection, or an exception will be thrown.

The earliest the locks these statements hold can be released (whether they be MyISAM table-level locks or row-level locks in some other storage engine such as InnoDB) is when the statement completes.

If the statement is within scope of a transaction, then locks are released when the transaction completes (which implies that the statement needs to complete first). As with most other databases, statements are not complete until all the results pending on the statement are read or the active result set for the statement is closed.

Therefore, if using streaming results, you should process them as quickly as possible if you want to maintain concurrent access to the tables referenced by the statement producing the result set.

 

 

Filed under: Code, Performance

Java Memory Usage – Ints

As I work on another set of indexes with large numbers of ints I’ve revisited how different storage mechanisms behave.

I’ve set the JVM heap to just over 64MB (70MB to be precise).

An int being 4 bytes means that 64MB of ints is as follows:

       int sixtyFourMB = 64 * 1024 * 1024 / 4;

That is 16,777,216 ints.

An int array of that size fits in the JVM, meaning each int really is stored in 4 bytes with no per-element overhead.

      int[] vals = new int[sixtyFourMB];

However, if you use ArrayList<Integer> you run out of memory after adding 3,392,918 ints.

That is about 1/5 of what can be stored in the int[]!

Now, assigning the space to ArrayList is fine:

                ArrayList<Integer> ints = new ArrayList<Integer>(sixtyFourMB);

It’s when you add the Integer objects (each int is autoboxed to an Integer behind the scenes as of JDK 5) that it dies 1/5 of the way in.

So the overhead is obviously in the Integer object.

I next try the Trove library and use the TIntArrayList.

If I leave it to assign its memory space by default, it runs out of memory at 10,485,760 (62%).

If however I tell it how many ints to expect it sizes itself properly and does not try to grow too large and fits the entire 64MB.

     TIntArrayList ints = new TIntArrayList(sixtyFourMB);
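The 10,485,760 figure is consistent with a backing array that starts at 10 elements and doubles each time it fills: 10 × 2^20 = 10,485,760, and the next doubling would need 20,971,520 ints (80MB), more than the heap allows. The sketch below replays that arithmetic. The starting capacity of 10 and the doubling policy are assumptions about Trove's defaults, not something checked against its source, and the class name is hypothetical.

```java
// Replays capacity doubling; the initial capacity and growth factor are assumptions.
class GrowthSketch {

    // Returns the largest capacity reached before the next doubling would exceed maxInts.
    static int lastCapacityThatFits(int initialCapacity, int maxInts) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        int capacity = initialCapacity;
        while ((long) capacity * 2 <= maxInts) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        int heapInts = 70 * 1024 * 1024 / 4; // 70MB heap expressed as a count of 4-byte ints
        System.out.println(lastCapacityThatFits(10, heapInts)); // prints 10485760
    }
}
```

Pre-sizing the list sidesteps the problem because the backing array is allocated once and never has to be copied into a larger one.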

Thus, ints are stored efficiently when the list is backed by an int[], but very inefficiently when primitive ints have to be stored as objects in an Object[].

Out of curiosity about what a plain Object really costs in memory, I did the following:

            ArrayList<Object> ints = new ArrayList<Object>(sixtyFourMB);
            for (int i = 0; i < sixtyFourMB; i++) {
                ints.add(null);
            }

That works, it all fits in memory.

If however I change the “null” to “new Object()” such as:

                ints.add(new Object());

It fails at 702065 … which means that the Object[] has already taken up most of the space and there is very little room left for the objects themselves on the heap.

Thus, for storage of simple data, primitive arrays are MUCH more efficient.

If the flexibility of the Collections API is needed for dynamic manipulation of those arrays then use GNU Trove … but carefully or the dynamic sizing of the arrays in the background can cause a lot of waste.

 

An old but interesting article specifies memory usage by normal objects and classes:

http://www.roseindia.net/javatutorials/determining_memory_usage_in_java.shtml

Here are two points from that article of note:

  1. The class takes up at least 8 bytes. So, if you say new Object(); you will allocate 8 bytes on the heap.
  2. Each data member takes up 4 bytes, except for long and double which take up 8 bytes. Even if the data member is a byte, it will still take up 4 bytes! In addition, the amount of memory used is increased in 8 byte blocks. So, if you have a class that contains one byte it will take up 8 bytes for the class and 8 bytes for the data, totalling 16 bytes (groan!).

Filed under: Code, Performance

How to strip invalid XML characters

http://cse-mjmcl.cse.bris.ac.uk/blog/2007/02/14/1171465494443.html   

    /**
     * This method ensures that the output String has only
     * valid XML unicode characters as specified by the
     * XML 1.0 standard. For reference, please see
     * <a href="http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char">the
     * standard</a>. This method will return an empty
     * String if the input is null or empty.
     *
     * @param in The String whose non-valid characters we want to remove.
     * @return The in String, stripped of non-valid characters.
     */
    public String stripNonValidXMLCharacters(String in) {
        StringBuilder out = new StringBuilder(); // Used to hold the output.
        int current; // Used to reference the current code point.

        if (in == null || ("".equals(in))) return ""; // vacancy test.
        // Walk the String by code point rather than by char: a char never exceeds
        // 0xFFFF, so the 0x10000-0x10FFFF range below could otherwise never match.
        for (int i = 0; i < in.length(); ) {
            current = in.codePointAt(i);
            i += Character.charCount(current);
            if ((current == 0x9) ||
                (current == 0xA) ||
                (current == 0xD) ||
                ((current >= 0x20) && (current <= 0xD7FF)) ||
                ((current >= 0xE000) && (current <= 0xFFFD)) ||
                ((current >= 0x10000) && (current <= 0x10FFFF)))
                out.appendCodePoint(current);
        }
        return out.toString();
    }
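The same filter can also be expressed as a single regular expression over code points. This is a hypothetical alternative, not the code from the linked post; it assumes Java 7 or later for the \x{...} escape, and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical regex-based variant; needs Java 7+ for the \x{...} escape.
class XmlCleanSketch {

    // Matches any code point outside the XML 1.0 Char production.
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Returns the input with invalid characters removed, or "" for null or empty input.
    static String strip(String in) {
        if (in == null || in.length() == 0) return "";
        return INVALID_XML_CHARS.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("a\u0000b\u0008c")); // prints abc
    }
}
```

Because java.util.regex matches by code point, a valid surrogate pair is treated as one character above 0xFFFF and kept, while an unpaired surrogate is removed.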

Filed under: Code

Websphere Multi-JVM jsessionid

Works just like it should :-)

IBM Support Link

Two JVMs with different contexts but the same domain now use the same jsessionid so they can talk back and forth in the same browser without jsessionid schizophrenia.

Filed under: Architecture, Code, Production

C3P0 Configuration

http://www.hibernate.org/214.html

testConnectionOnCheckout Must be set in c3p0.properties, C3P0 default: false

Don’t use it, this feature is very expensive. If set to true, an operation will be performed at every connection checkout to verify that the connection is valid. A better choice is to verify connections periodically using c3p0.idleConnectionTestPeriod.
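
Put together, the advice above corresponds to a c3p0.properties along these lines. The 300-second test period is an arbitrary illustrative value, not a recommendation taken from the linked page.

```properties
# c3p0.properties (illustrative values)
# skip the per-checkout connection test
c3p0.testConnectionOnCheckout=false
# instead, test idle connections periodically (value in seconds)
c3p0.idleConnectionTestPeriod=300
```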

Filed under: Code
