I'm writing an application that analyzes a MySQL database, and I need to execute several DML statements simultaneously; for example:

// rsA holds the result of: SELECT * FROM A
rsA.beforeFirst();
while (rsA.next()) {
   int id = rsA.getInt("id");
   // Retrieve data from table B: "SELECT * FROM B WHERE B.Id = " + id
   // Crunch some numbers using the data from B
   // Close ResultSet B
}

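I'm declaring an array of data objects, each with its own Connection to the database, and each of which in turn calls several methods for data analysis. The problem is that all threads use the same connection, so every task throws an exception: "Lock wait timeout exceeded; try restarting transaction"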

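I believe there is a way to write the code so that any given object has its own connection and performs the required tasks independently of every other object. For example: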

// One DataObject per id: dataObject[0] ... dataObject[N]
DataObject[] dataObject = new DataObject[N + 1];
for (int i = 0; i <= N; i++) {
   dataObject[i] = new DataObject(id[i]);
}
// The 'DataObject' class has its own connection to the database, 
// so each instance of the object should use its own connection. 
// It also has a "run" method, which contains all the tasks required.
Executor ex = Executors.newFixedThreadPool(10);

for(i=0;i<=N;i++) {
   ex.execute(dataObject[i]);
}
// Here is where the problem is: each instance creates a new connection,
// but every DML statement from any of the objects is funneled through just one connection
// (in the MySQL command line, "SHOW PROCESSLIST;" lists every connection, and all but
// one are idle).

Can you point me in the right direction?

Thanks


Solution

After racking my brain for a while, I figured out my own mistakes. I want to share what I learned, so here goes.

I made a very big mistake by declaring the Connection object as static in my code. So even though I created a new Connection for each new data object, every transaction went through a single, static connection.
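The effect of the static field can be reproduced without a database. The sketch below is only an illustration: FakeConnection, SharedDataObject and IsolatedDataObject are hypothetical names standing in for java.sql.Connection and the DataObject class from the question, whose real code is not shown.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for java.sql.Connection so the sketch runs without a database.
final class FakeConnection {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    final int id = NEXT_ID.incrementAndGet();
}

// Buggy variant: the field is static, so every constructor call overwrites
// the single reference shared by all instances.
final class SharedDataObject {
    private static FakeConnection conn;

    SharedDataObject() {
        conn = new FakeConnection();
    }

    int connectionId() {
        return conn.id;
    }
}

// Fixed variant: the field belongs to the instance.
final class IsolatedDataObject {
    private final FakeConnection conn = new FakeConnection();

    int connectionId() {
        return conn.id;
    }
}

final class StaticConnectionDemo {
    public static void main(String[] args) {
        SharedDataObject s1 = new SharedDataObject();
        SharedDataObject s2 = new SharedDataObject();
        // Both objects see the connection that was created last.
        System.out.println("shared:   " + (s1.connectionId() == s2.connectionId()));

        IsolatedDataObject i1 = new IsolatedDataObject();
        IsolatedDataObject i2 = new IsolatedDataObject();
        // Each object keeps the connection it created.
        System.out.println("isolated: " + (i1.connectionId() != i2.connectionId()));
    }
}
```

This would also be consistent with the SHOW PROCESSLIST observation in the question: every constructor call opens a connection, but only the one assigned last is ever used.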

With that first issue corrected, I went back to the drawing board and realized that my process was:

  1. Read an Id from an input table
  2. Take a block of data related to the Id read in step 1, stored in other input tables
  3. Crunch numbers: Read the related input tables and process the data stored in them
  4. Save the results in one or more output tables
  5. Repeat the process while I have pending Ids in the input table

Just by using a dedicated connection for input reading and a dedicated connection for output writing, the performance of my program increased... but I needed a lot more!

My original approach for steps 3 and 4 was to save into the output each one of the results as soon as I had them... But I found a better approach:

  • Read the input data
  • Crunch the numbers, and put the results in a bunch of queues (one for each output table)
  • A separate thread checks every second whether there's data in any of the queues. If there is, it writes it to the tables.

So, by dividing input and output tasks using different connections, and by redirecting the core process output to a queue, and by using a dedicated thread for output storage tasks, I finally achieved what I wanted: Multithreaded DML execution!
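The queue-plus-writer-thread design described above might look roughly like the following. This is a sketch under assumptions, not the author's actual code: QueuedWriter and its sink parameter are hypothetical, and the sink stands in for whatever performs the batched INSERT on the dedicated output connection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: buffers results in a queue and flushes them in batches
// from one dedicated thread. Use one instance per output table.
final class QueuedWriter<T> implements AutoCloseable {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor();
    private final Consumer<List<T>> sink; // assumed to write one batch to the table

    QueuedWriter(Consumer<List<T>> sink, long period, TimeUnit unit) {
        this.sink = sink;
        flusher.scheduleWithFixedDelay(this::flush, period, period, unit);
    }

    // Called by the number-crunching threads; never touches the database.
    void submit(T result) {
        queue.add(result);
    }

    // Drains whatever is queued and hands it to the sink as one batch.
    private synchronized void flush() {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch);
        if (!batch.isEmpty()) {
            sink.accept(batch);
        }
    }

    // Stops the periodic task, then writes anything still queued.
    @Override
    public void close() {
        flusher.shutdown();
        try {
            flusher.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        flush();
    }
}

final class QueuedWriterDemo {
    public static void main(String[] args) throws InterruptedException {
        // In-memory list standing in for an output table.
        List<String> outputTable = Collections.synchronizedList(new ArrayList<>());
        QueuedWriter<String> writer =
                new QueuedWriter<>(outputTable::addAll, 1, TimeUnit.SECONDS);

        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int id = 1; id <= 8; id++) {
            final int inputId = id;
            workers.execute(() -> writer.submit("result for id " + inputId));
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);

        writer.close();
        System.out.println(outputTable.size() + " results written");
    }
}
```

One caveat with this sketch: if the sink throws, the scheduled executor cancels the periodic task, so a real writer would probably need to catch and handle failures inside flush.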


I know there are better approaches to this particular problem, but this one works quite fine.

So... if anyone is stuck with a problem like this... I hope this helps.

Other tips

I think the problem is that you've mixed a lot of middle-tier, transactional, and persistence logic into one class.

If you're dealing directly with ResultSet, you're not thinking about things in a very object-oriented fashion.

If you can figure out how to get the database to do some of the calculations for you, that's the smart move.

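If not, I'd recommend keeping Connections open for the shortest time possible. Open a Connection, get the ResultSet, map it into an object or data structure, close the ResultSet and Connection in local scope, and return the mapped object or data structure for processing.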
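A minimal sketch of that open, map, close pattern using try-with-resources follows. The names BRow and BRepository, the column names id and amount, and the connection details are all assumptions made for illustration; only table B comes from the question.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical value object for one row of table B; the columns are assumed.
final class BRow {
    final int id;
    final double amount;

    BRow(int id, double amount) {
        this.id = id;
        this.amount = amount;
    }
}

final class BRepository {
    private final String url;      // placeholder, e.g. a jdbc:mysql:// URL
    private final String user;
    private final String password;

    BRepository(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    // Opens a connection, maps the rows, and closes everything before returning.
    List<BRow> findById(int id) throws SQLException {
        String sql = "SELECT id, amount FROM B WHERE id = ?";
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return mapRows(rs);
            }
        }
    }

    // Copies each row into a plain object so no JDBC resource escapes this class.
    static List<BRow> mapRows(ResultSet rs) throws SQLException {
        List<BRow> rows = new ArrayList<>();
        while (rs.next()) {
            rows.add(new BRow(rs.getInt("id"), rs.getDouble("amount")));
        }
        return rows;
    }
}
```

The number-crunching code then works only with List<BRow>, and each worker thread can call findById without sharing a Connection. Opening a fresh connection per call has a cost, so in practice this is often paired with a connection pool behind javax.sql.DataSource.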

This way you keep persistence and processing logic separate. You'll save yourself a lot of grief by keeping connections short-lived.

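If a stored procedure solution is slow, it could be due to poor indexing. Another solution will perform equally poorly, if not worse. Try running EXPLAIN on your queries and see whether any of them is doing a full table scan. If so, you have some indexes to add. It could also be due to large rollback logs if your transactions are long-running. There's a lot you can and should do to make sure you've gotten everything possible out of the solution you have before switching. Otherwise you could go to a great deal of effort and still not address the root cause.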

Licensed under: CC-BY-SA with attribution