Problem

I'm new to GeoTools and facing this issue: I'm injecting about 2 MB of shapefile data (about 5,800 entries) into PostGIS, and surprisingly it takes more or less 6 minutes to complete! Quite annoying, because my "real" data set might be up to 25 MB per shapefile group (shp, dbf, ...), with 100 groups needed.

I was told that it might be an index issue, because PostgreSQL updates the table's indexes on each INSERT. Is there a way to "disable" these indexes during my mass INSERTs and tell the database to create all the indexes at the end? Or is there a better way to do this?

Here is my code snippet:

Map<String, Object> shpparams = new HashMap<String, Object>();
shpparams.put("url", "file://" + path);
FileDataStore shpStore = (FileDataStore) shpFactory.createDataStore(shpparams);
SimpleFeatureCollection features = shpStore.getFeatureSource().getFeatures();
if (schema == null) {
    // Copy schema and change name in order to refer to the same
    // global schema for all files
    SimpleFeatureType originalSchema = shpStore.getSchema();
    Name originalName = originalSchema.getName();
    NameImpl theName = new NameImpl(originalName.getNamespaceURI(), originalName.getSeparator(), POSTGIS_TABLENAME);
    schema = factory.createSimpleFeatureType(theName, originalSchema.getAttributeDescriptors(), originalSchema.getGeometryDescriptor(),
            originalSchema.isAbstract(), originalSchema.getRestrictions(), originalSchema.getSuper(), originalSchema.getDescription());
    pgStore.createSchema(schema);
}
// String typeName = shpStore.getTypeNames()[0];
SimpleFeatureStore featureStore = (SimpleFeatureStore) pgStore.getFeatureSource(POSTGIS_TABLENAME);

// Add the shapefile's features to the PostGIS table
DefaultTransaction transaction = new DefaultTransaction("create");
featureStore.setTransaction(transaction);
try {
    featureStore.addFeatures(features);
    transaction.commit();
} catch (Exception problem) {
    LOGGER.error(problem.getMessage(), problem);
    transaction.rollback();
} finally {
    transaction.close();
}
shpStore.dispose();

Thank you for your help!


So I tested your solutions, but nothing helped: the completion time is still the same. Here is my table definition (column name, type, reported size):

  • fid serial 10
  • the_geom geometry 2147483647
  • xxx varchar 10
  • xxx int4 10
  • xxx varchar 3
  • xxx varchar 2
  • xxx float8 17
  • xxx float8 17
  • xxx float8 17

So I do not think that the problem is directly linked to my code or the database; maybe it is due to system limitations (RAM, buffers, ...). I will have a look at this in the next few days.
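One inexpensive check before digging into system limits: time a handful of single-row statements from the application host, which separates raw database speed from per-statement network overhead. Below is a minimal sketch in plain JDBC (the connection URL, credentials, and scratch table are hypothetical placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class InsertLatencyCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection settings -- replace with your own.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://dbhost:5432/mydb", "user", "password")) {
            conn.createStatement().execute(
                    "CREATE TEMP TABLE latency_check (id int, payload text)");
            conn.setAutoCommit(false);
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO latency_check VALUES (?, ?)");
            int n = 100;
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row " + i);
                ps.executeUpdate(); // one network round trip per statement
            }
            conn.commit();
            long avgMs = (System.nanoTime() - start) / n / 1_000_000;
            System.out.println("average per single-row INSERT: " + avgMs + " ms");
        }
    }
}

If the average is tens of milliseconds or more per statement, the time is going into round trips rather than into indexes or the table definition.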

Do you have more ideas?


Solution

I'm back with the solution to this problem. After much investigation, I found that the physical network was the issue: with a local database (local to the GeoTools app) there was no problem. The network added 200 or 300 ms to each INSERT statement, and with the large amount of data injected into the database, that added up to the very long response time!

So there was no problem with the original PostGIS config or my code snippet...
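For readers who hit the same situation and cannot move the database closer: when round-trip latency dominates, batching many rows per round trip amortizes the cost. A minimal sketch with plain JDBC rather than the GeoTools API (the table, columns, and connection details are illustrative placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchedInsert {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection and table -- adjust to your schema.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://dbhost:5432/mydb", "user", "password")) {
            conn.setAutoCommit(false); // one transaction for the whole load
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO my_table (name, value) VALUES (?, ?)");
            for (int i = 0; i < 5800; i++) {
                ps.setString(1, "feature " + i);
                ps.setDouble(2, i * 0.5);
                ps.addBatch();
                if ((i + 1) % 500 == 0) {
                    ps.executeBatch(); // 500 rows share the round trips
                }
            }
            ps.executeBatch(); // flush the remainder
            conn.commit();
        }
    }
}

The PostgreSQL JDBC driver sends a batch in far fewer round trips than statement-by-statement execution, and recent driver versions can additionally rewrite batches into multi-row INSERTs with the reWriteBatchedInserts=true URL parameter. Depending on your GeoTools version, the JDBC datastore may also expose a "Batch insert size" connection parameter; both are worth checking.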

Thank you all for your participation.

Other tips

You can check whether indexes or PK/FK constraints in the database are really the bottleneck with the following steps:

1) Make sure the data is inserted in a single transaction (disable autocommit)

2) Drop all indexes and re-create them after the data import (you cannot disable an index)

DROP INDEX my_index;
CREATE INDEX my_index ON my_table (my_column);
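If you drive the import from Java, this drop-and-recreate step can be scripted with plain JDBC around the GeoTools import. A minimal sketch (the connection details, index name, and table name are placeholders, and the import itself is elided):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ImportWithIndexRebuild {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection, index, and table names.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://dbhost:5432/mydb", "user", "password");
             Statement st = conn.createStatement()) {
            st.execute("DROP INDEX IF EXISTS my_index"); // before the bulk load
            // ... run the bulk import here (e.g. featureStore.addFeatures) ...
            st.execute("CREATE INDEX my_index ON my_table (my_column)"); // rebuild once
        }
    }
}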

3) Drop or disable PK/FK constraints and re-create or re-enable them after the data import. You can skip the checking of PK/FK constraints during the data import, without dropping them, with:

ALTER TABLE my_table DISABLE TRIGGER ALL;
-- data import
ALTER TABLE my_table ENABLE TRIGGER ALL;

The downside of this approach is that the PK/FK constraints are not checked for the data that was inserted or updated while the check was disabled. (Note that DISABLE TRIGGER ALL also disables the internal triggers that enforce FK constraints, which requires superuser privileges.) Of course, the PK/FK constraints are enforced for existing data as well when you re-create them after the data import.

You can also defer the check of PK/FK constraints to the end of a transaction. This is possible if and only if the PK/FK constraint is defined as deferrable (not the default):

ALTER TABLE my_table ADD PRIMARY KEY (id) DEFERRABLE INITIALLY DEFERRED;

START TRANSACTION;
-- data import
COMMIT; -- constraints are checked here

or

ALTER TABLE my_table ADD PRIMARY KEY (id) DEFERRABLE INITIALLY IMMEDIATE;

START TRANSACTION;
SET CONSTRAINTS ALL DEFERRED;
-- data import
COMMIT; -- constraints are checked here

EDIT:

To narrow down the cause of the problem, you can import the data with your application, make a database dump (with INSERT statements), and import that dump again. This should give you an idea of how long the plain import takes and what the overhead of the application is.

Create a data-only dump of the database with INSERT statements (COPY statements would be faster, but your application also uses single INSERTs, so this makes for a better comparison):

pg_dump <database> --data-only --column-inserts -f data.sql

Create the empty database schema again and import the data (with basic timing):

date; psql <database> --single-transaction -f data.sql > /dev/null; date

Maybe you can get a little more insight into the problem with this.
