Ben J. Christensen

Software Development and Other Random Stuff

MySQL JDBC Memory Usage on Large ResultSet

I recently came across the problem of large resultsets being pulled into a java app via MySQL JDBC. I had dealt with this years ago but forgotten about it.

The test case below shows how the entire ResultSet is buffered in memory by default — which can be a “very bad thing” when dealing with hundreds or thousands of megabytes of data when it’s intended to be processed row by row.

Using mysql-connector-java-3.1.12-bin.jar and a JDK 5 with 32MB heap:

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32

Retrieving data …

Ran out of memory at row: 0

java.lang.OutOfMemoryError: Java heap space

at com.mysql.jdbc.ByteArrayBuffer.getBytes(ByteArrayBuffer.java:128)

at com.mysql.jdbc.ByteArrayBuffer.readLenByteArray(ByteArrayBuffer.java:248)

at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1304)

at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2272)

at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:423)

at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:1962)

at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1385)

at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1728)

at com.mysql.jdbc.Connection.execSQL(Connection.java:2988)

at com.mysql.jdbc.Connection.execSQL(Connection.java:2917)

at com.mysql.jdbc.Statement.executeQuery(Statement.java:824)

at JDBCTest.main(JDBCTest.java:26)

 

Using mysql-connector-java-5.1.6-bin.jar and the same JDK 5 with 32MB heap:

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32

Retrieving data …

Ran out of memory at row: 0

java.lang.OutOfMemoryError: Java heap space

at com.mysql.jdbc.ByteArrayBuffer.getBytes(ByteArrayBuffer.java:128)

at com.mysql.jdbc.ByteArrayBuffer.readLenByteArray(ByteArrayBuffer.java:248)

at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1304)

at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2272)

at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:423)

at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:1962)

at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1385)

at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1728)

at com.mysql.jdbc.Connection.execSQL(Connection.java:2988)

at com.mysql.jdbc.Connection.execSQL(Connection.java:2917)

at com.mysql.jdbc.Statement.executeQuery(Statement.java:824)

at JDBCTest.main(JDBCTest.java:26)

 

Thus we see that both the old and new versions of the MySQL JDBC driver by default attempt to load the entire resultset into memory.

 

I now increase the heap to 1GB to allow it to grow and find that the test query uses up > 500MB of heap before it even starts the rs.next() loop.

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32

Retrieving data …

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 298  Used: 183  Free: 115

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 527  Used: 381  Free: 146

Starting to retrieve data. Memory Used: 517

Done retrieving data => 2318284   Memory Used: 551

 

Here is the code for this:

 

            ResultSet rs = conn.createStatement().executeQuery(“<sql query that returns lots of data>”);

            System.out.println(“Starting to retrieve data. Memory Used: “ + getUsedMemory());

            while (rs.next()) {

                rs.getString(1);

                rowsReturned++;

            }

            System.out.println(“Done retrieving data => “ + rowsReturned + ”   Memory Used: “  

+ getUsedMemory());

 

Thus you can see that the “executeQuery()” method loads up 500MB of data before it passes on the “rs.next()” loop. The full ResultSet is being buffered in memory.

 

Solution

To make the JDBC driver stream the results instead of buffer them all first we do the following:

            stmt.setFetchSize(Integer.MIN_VALUE);

Then we get this result instead:

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32

Retrieving data …

Starting to retrieve data. Memory Used: 2

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 1  Free: 32

ET-COMMONS INFO: JVM MEMORY MONITOR => Total: 33  Used: 2  Free: 31

Done retrieving data => 2318284   Memory Used: 2

 

Now it behaves like we expect it to … only 2MB used instead of > 500MB.

 

There are some caveats:

  • http://javaquirks.blogspot.com/2007/12/mysql-streaming-result-set.html
  • http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
In the second link of official documentation we read (emphasis in red added by myself):
———————————————————————-

ResultSet

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and can not allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.

To enable this functionality, you need to create a Statement instance in the following manner:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this any result sets created with the statement will be retrieved row-by-row.

There are some caveats with this approach. You will have to read all of the rows in the result set (or close it) before you can issue any other queries on the connection, or an exception will be thrown.

The earliest the locks these statements hold can be released (whether they be MyISAM table-level locks or row-level locks in some other storage engine such as InnoDB) is when the statement completes.

If the statement is within scope of a transaction, then locks are released when the transaction completes (which implies that the statement needs to complete first). As with most other databases, statements are not complete until all the results pending on the statement are read or the active result set for the statement is closed.

Therefore, if using streaming results, you should process them as quickly as possible if you want to maintain concurrent access to the tables referenced by the statement producing the result set.

 

 

Filed under: Code, Performance

31 Responses

  1. TT says:

    Hi, thanks for your blog post. It was useful for me. I was wondering why the hell my code was throwing an out of memory exception when the javadoc for resultset explicitly says that it uses a cursor.

  2. Ovi says:

    Thanks for the info!

  3. Ben Christensen says:

    Another note … if you are writing code that is to be used by different databases, you’ll have to put the MySQL “fix” in a conditional block otherwise it will cause the code to blow up on other databases such as Oracle.

  4. Nicole says:

    Hi,

    Thanks for this info. I have a question, though.

    Do you mean that such a limit on the fetch size cannot be done for Oracle? What can we do, then, to avoid the out of memory error?
    Or, do you mean that, for Oracle, ResultSets are NOT completely retrieved and stored in memory, and so the fix is not needed?

    Thanks in advance,
    Nicole

  5. Ben Christensen says:

    I have spent more time working with MySQL than Oracle, but in my research and experience with the Oracle JDBC drivee, this fix is not needed because Oracle seems to stream the results by default and not load the entire resultset like MySQL does.

    I can not say with absolute certainty that this is the case since I have not played with Oracle in the same way, but I have not had the issue with Oracle like I did with MySQL.

  6. SunnyFr says:

    Hi guys,

    I’ve got the same problem with MySql so I added the parameter BatchSize=-1 in my dataconfig to pass it to the connector jdbc.
    -1 for Integer.MIN_VALUE. How can I know what is the amount of this value, because if it’s row by row, it should be one?? And I tried to put 1 or either 10 or 100, how come even 1 drive me Out Of Memory??
    I’ve definitely a lot of data 8,5M rows bring back, but I just don’t get why when I change -1 or Integer.MIN_VALUE by one it doesn’t work? I’m using mysql-connector-java-5.1.6-bin.jar either.

    Thanks to let me know :)
    Wish you guys a very nice day,

    I’m using

  7. Ben Christensen says:

    SunnyFr,

    It’s not BatchSize, but FetchSize you need to set.

    stmt.setFetchSize(Integer.MIN_VALUE);

    Ben

  8. Zunil says:

    But from 5.0.2 onwards mysql supports server-side cursor by setting the property • useCursorFetch=true and defaultFetchSize.
    This means that all rows wil not be pushed to client . Only defaultFetchSize number of rows will be send to client at a time.

    In that case why we need this??

  9. [...] MySQL JDBC Memory Usage on Large ResultSet (tags: jdbc java mysql) Posted in Personal | Leave a Comment [...]

  10. Mark Matthews says:

    Zunil,

    Because cursor fetching is experimental (on the server side), and it *always* fully materializes your query into a temporary table (often not what you want).

  11. thanks a lot for this useful info I was searching exactly the same…
    well I do have another question here… if suppose my query fetches very large data say 10,000 recs and as per your note resultset will try to dump entire data in memory and will give out of memory error…
    if this is the case then how does database manages to fetch that many record in cursor/query result which returned to application why don’t it fail…?
    how they do it?…

    Thank you in advance for such an important note.

    • Ben Christensen says:

      I don’t know MySQL’s internals, but my understanding is that it either streams it or caches the entire resultset in a temporary table — not memory.

      The JDBC driver on the other hand will by default cache the entire resultset in memory before turning it over to the application. In small resultsets this default behavior makes sense — but in large results it does not, hence the ability to change the default behavior to stream.

      In my original post I linked to the official documentation at: http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html

      It states:

      “By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and can not allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.”

      This to me confirms that it is purely a JDBC design to cache the whole resultset, whereas the MySQL server will use one of it’s many mechanisms for handling a resultset which can include in memory temporary tables, or if needed it will use temporary tables on the filesystem.

      Ben

  12. umar says:

    Hello Ben,

    am still confused about this isue(Java heap space),
    i have more then 1 milion record (MySQL), i use jdbc driver 5.1.7, when i try retrieve all, showing error:
    java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:1624)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1401)
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840)
    at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:468)
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2534)
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1749)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2159)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2554)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1761)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1912)
    at com.dao.TestDaoImpl.getAllData(TestDaoImpl.java:1225)
    at com.test.source.TestGetLotData.testGetAll(TestGetLotData.java:32)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at org.springframework.test.ConditionalTestCase.runBare(ConditionalTestCase.java:76)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:232)
    at junit.framework.TestSuite.run(TestSuite.java:227)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

    this my code:
    PreparedStatement st = con.prepareStatement(“select * from table”);
    ResultSet rs = st.executeQuery();
    while (rs.next()) {
    Data rd = new RealData();
    rd.setUserName(rs.getString(“USERNAME”));
    rd.setPassword(rs.getString(“PASSWORD”));
    list.add(rd);
    }
    st.close();

    And i try change “PreparedStatement” to “Statement” :
    Statement st = con.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
    java.sql.ResultSet.CONCUR_READ_ONLY);
    st.setFetchSize(Integer.MIN_VALUE);
    ResultSet rs = st.executeQuery(“select * from table”);

    result same error:
    java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.lang.StringValue.from(Unknown Source)
    at java.lang.String.(Unknown Source)
    at com.mysql.jdbc.SingleByteCharsetConverter.toString(SingleByteCharsetConverter.java:327)
    at com.mysql.jdbc.ResultSetRow.getString(ResultSetRow.java:787)
    at com.mysql.jdbc.BufferRow.getString(BufferRow.java:539)
    at com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5614)
    at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5491)
    at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5531)
    at com.dao.TestDaoImpl.getAllData(TestDaoImpl.java:1234)
    at com.test.source.TestGetLotData.testGetAll(TestGetLotData.java:32)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at org.springframework.test.ConditionalTestCase.runBare(ConditionalTestCase.java:76)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:232)
    at junit.framework.TestSuite.run(TestSuite.java:227)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

    thx for advice

    • Ben Christensen says:

      Making MySQL stream the results instead of loading into memory is only half the problem.

      The other issue is that you are adding every row you retrieve into a list — so it’s still ending up in memory.

      list.add(rd);

      If you plan on storing all of the RealData() objects in memory, then you’ll need to increase your memory and recognize that your system will only scale to a certain point.

      If you need to be able to handle scaling to larger list sizes, or can’t increase your memory, you’ll need to change your design to act on each data element as needed without storing the entire list in memory.

      • umar says:

        Hello again Ben,

        Thank you for the advice, that’s work if i remove object and throw away list.add(rd).
        I decided not to show data’s, but I print directly to a file for efficiency,
        Another question, i have to compare every record of 3 tables and each table have more than 1 million data, with my query finish more then 2 hour, how to optimized time comparison(example, finish in less 1 hour :) )??

        Thax for advice

  13. Ben Christensen says:

    Umar,

    Glad it helped.

    On the new question, can you please provide:

    - schema for the 3 tables
    - a row of sample data for each table
    - what logic you’re trying to perform

    In other words, give me everything I need to replicate what you’re trying to accomplish from a business perspective. What is the end result you need to accomplish?

    • umar says:

      Ben,
      Thanks for your help.
      I want to display records that have a different value on field, for the purpose of data synchronization.

      Ok, this schema represent other tables
      CREATE TABLE `table_a` (
      `ID` int(11) NOT NULL auto_increment,
      `USERNAME` varchar(30) default NULL,
      `PASSWORD` varchar(15) default NULL,
      `STATUS` varchar(10) default NULL,
      `IPADDRESS` varchar(15) default NULL,
      `PHONE` varchar(20) default NULL,
      `PORT` varchar(70) default NULL,
      `IDCARD` varchar(15) default NULL,
      PRIMARY KEY (`ID`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

      Sample data:
      INSERT INTO `table_a` (`ID`,`USERNAME`,`PASSWORD`,`STATUS`,`IPADDRESS`,`PHONE`,`PORT`,`IDCARD`) VALUES (1,’asdfgh@google.com’,'blabla’,'enable’,'192.168.1.2′,’098278379837′,’1234′,’jh3hhi138908′);

      this a sample query:
      select A.USERNAME, A.STATUS ‘STATUS_A’, B.STATUS ‘STATUS_B’, C.STATUS ‘STATUS_C’, A.IPADDRESS ‘IPADDRESS_A’, B.IPADDRESS ‘IPADDRESS_B’, C.IPADDRESS ‘IPADDRESS_C’
      from table_a A inner join table_b B inner join table_c C
      on (A.STATUS B.STATUS or A.STATUS C.STATUS or B.STATUS C.STATUS OR A.IPADDRESS B.IPADDRESS or A.IPADDRESS C.IPADDRESS or B.IPADDRESS C.IPADDRESS )
      and (A.USERNAME = B.USERNAME and A.USERNAME = C.USERNAME and B.USERNAME = C.USERNAME)

      Thanks for your advice.

      • Ben Christensen says:

        What indexes do you have on your tables?

        To confirm, all 3 tables are the same schema, and you’re just trying to find all rows that are the same between the 3 tables?

      • umar says:

        I use ID and USERNAME as an index, cause USERNAME is unique too.
        Yupz, 3 table are the same schema, i’m trying to find all record that have different value such a STATUS and IPADDRESS.

        Sorry, my english not good :)

      • umar says:

        Ben,
        Finally, i found my mistake, when i create index for all field (coz all field will be use by another query) result less then 1 minute on 1 million records.

        That’s amazing using index.
        Thx a lot Ben, for direct me about index.

        Umar

      • Ben Christensen says:

        Yep, indexes make a big difference and are particularly necessary when you’re doing joins.

        Glad you figured it out Umar.

        Have a nice day.

  14. [...] Zdrojový článek, který mě navedl na řešení OOM v případě rozsáhlých ResultSetů [...]

  15. Emm says:

    Nice suggestion.

  16. Marc says:

    I have been using fetchSize(Integer.MIN_VALUE) for a long time and it was working perfect until now. I use MySQL, java 1.6, mysql-connector-java-5.1.7-bin.jar and InoDB tables.
    Since a month ago when the query is executed it will take a long time untill it starts processing the results from the resultSet. The query matches about 2M rows. It use to take 10 min untill rows processing started. Now it’s taking about 2 hours.
    Parallelly I have another process wich is doing lots of inserts to that table (I also had it before but with less number of inserts). Could this be causing some bloking that makes the query take that long?
    Thanks in advance.

    • Ben Christensen says:

      It could be many things, row locking, new indexes, thread-thrashing in MySQL, CPU or IO constraints.

      Anytime you’re doing something in parallell on a shared resource any of these or other issues can cause performance issues.

      If you do the query without the parallel inserts does it perform in the 10 minute range?

      If I was in your position I’d start profiling and testing the different individual components and then combining them to see the impact of parallel execution to isolate what the issue really is.

  17. Amit says:

    Great article, Thanks!
    So is there no away to allow concurrent access to the tables while streaming the result set? Why this limitation if i dont care about dirty reads?

    • Ben Christensen says:

      I don’t know anything that would prevent you from doing this as long as each resultset has a separate connection. Are you experiencing table locks when doing so?

  18. Thanks a lot for this post.

  19. I need to read from one huge table, do some processing and insert into another small table.
    I opened 2 connections, one for reading and another for updating.
    My program starts correctly but very soon encounters a deadlock.

Leave a Reply

Twitter Updates

View Ben Christensen's profile on LinkedIn