Question

I've built a crawler that is currently leaking memory at a rate of about 600 MB/day. I suspect the cause is the database connection. Currently, I'm creating a single database connection in a static initializer block as follows:

static
{       
    try
    {
        String hostname = PropertyReader.getProperty("hostname");
        String port = PropertyReader.getProperty("port");
        String dbname = PropertyReader.getProperty("dbname");
        String username = PropertyReader.getProperty("username");
        String password = PropertyReader.getProperty("password");

        Class.forName("com.mysql.jdbc.Driver");
        String url = "jdbc:mysql://"+hostname+":"+port+"/"+dbname+"";
        conn = DriverManager.getConnection(url, username, password);
        System.out.println("conn built");
    }
    catch (SQLException e)
    {
        e.printStackTrace();
    } catch (ClassNotFoundException e)
    {
        e.printStackTrace();
    }
}

and I'm using the conn variable in multiple methods, such as:

public static void getSeed()

public static void processPage(String URL)

for retrieving and inserting data into database.

What is the best alternative to avoid memory leaks?


Solution 2

Here are my thoughts:

  1. I would not guess about a memory leak. The only reliable way to figure it out is to profile your code. Be a scientist: get data. Use VisualVM with all the plugins installed. It's a great tool, and it's free.
  2. I would not set up a database connection this way. Connections are not thread safe, and a single shared connection doesn't scale well, either. It's better to have a connection pool that you check connections out of and back into in the narrowest scope possible.
  3. You don't show any of the database code, but you need to close all Statement, Connection, and ResultSet instances in a finally block in the scope of the method that creates them. Each one should be closed in its own try/catch block, so that a failure while closing one does not prevent the others from being closed. Skipping this causes problems beyond memory leaks in your crawler: your database server will eventually run out of cursors. Failing to clean up properly hurts both the client and the database server.
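To illustrate point 3, here is a minimal sketch, not the asker's actual code. It uses try-with-resources (Java 7 and later) in place of hand-written finally blocks: the resources are closed in reverse order of declaration, even if the query or one of the close calls throws. The class name SeedDao, the seeds table, and the url column are hypothetical, and the DataSource is assumed to be supplied by whatever pool you configure.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

// Hypothetical data-access class; names are illustrative only.
class SeedDao {
    private final DataSource dataSource;

    SeedDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Reads all seed URLs. Every JDBC resource is opened and closed
    // inside this method, so nothing outlives the call.
    List<String> loadSeeds() throws SQLException {
        List<String> urls = new ArrayList<>();
        // Closed automatically in reverse order: ResultSet, statement, connection.
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement("SELECT url FROM seeds");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                urls.add(rs.getString("url"));
            }
        }
        return urls;
    }
}
```

With a pooled DataSource, closing the Connection here typically returns it to the pool instead of tearing down the physical connection.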

Different threads should NOT be sharing a single connection. This will lead to grief. Set up a connection pool and then profile. You might be surprised at where the leak shows up.
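To make the check-out/check-in idea concrete, here is a deliberately tiny pool sketch built on a BlockingQueue. It is an illustration only: the class SimplePool and its method names are made up for this example, and it does no validation or replacement of broken connections. For real use, an established pooling library such as HikariCP or Apache Commons DBCP is the safer choice.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical fixed-size pool; T would be java.sql.Connection in the crawler.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    // Eagerly creates `size` resources using the supplied factory.
    SimplePool(int size, Callable<T> factory) throws Exception {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.call());
        }
    }

    // Checks a resource out, waiting up to timeoutMillis for one to be free.
    T borrow(long timeoutMillis) throws InterruptedException {
        T resource = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (resource == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return resource;
    }

    // Checks a resource back in so another thread can use it.
    void release(T resource) {
        idle.offer(resource);
    }

    // Number of resources currently idle in the pool.
    int available() {
        return idle.size();
    }
}
```

Each worker thread would call borrow, do its work inside a try block, and call release in the matching finally block, so that no two threads ever use the same connection at once. A pool of connections could be built with something like new SimplePool<Connection>(4, () -> DriverManager.getConnection(url, username, password)).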

OTHER TIPS

One of the biggest sources of memory leaks in JDBC code is forgetting to close ResultSets and PreparedStatements. I can't tell from the question whether you close them elsewhere in your code, but if you don't, that is where I would start.

Licensed under: CC-BY-SA with attribution