Question

I want to be able to do a bulk import from an LDIF file to an LDAP server. I have a working implementation (below) that uses the UnboundID LDAP SDK. The problem is that it loops through each entry in the LDIF one at a time, which will be very slow for large files (millions of entries). Are there any tools/SDKs available for a high-speed import? I need to be able to achieve this programmatically (preferably in Java).

import java.io.IOException;

import com.unboundid.ldap.sdk.Entry;
import com.unboundid.ldap.sdk.LDAPConnection;
import com.unboundid.ldap.sdk.LDAPException;
import com.unboundid.ldif.LDIFException;
import com.unboundid.ldif.LDIFReader;

public static void importLdif() {
    LDAPConnection connection = null;
    LDIFReader ldifReader = null;
    try {
        connection = new LDAPConnection("ldapserver.com", 389,
                "uid=admin,ou=system", "secret");
        ldifReader = new LDIFReader("C:/Users/ejamcud/Desktop/LDAP/ldifs/Sailors.ldif");

        int entriesRead = 0;
        int entriesAdded = 0;
        int errorsEncountered = 0;
        Entry entry;
        while (true) {
            try {
                entry = ldifReader.readEntry();
                if (entry == null) {
                    System.out.println("All entries have been read.");
                    break;
                }
                entriesRead++;
            } catch (LDIFException le) {
                errorsEncountered++;
                if (le.mayContinueReading()) {
                    // A recoverable error occurred while attempting to read an
                    // entry, at or near line number le.getLineNumber().
                    // The entry will be skipped, but we'll try to keep reading
                    // from the LDIF file.
                    continue;
                } else {
                    // An unrecoverable error occurred while attempting to read
                    // an entry at or near line number le.getLineNumber().
                    // No further LDIF processing will be performed.
                    break;
                }
            } catch (IOException ioe) {
                // An I/O error occurred while attempting to read from the LDIF
                // file. No further LDIF processing will be performed.
                errorsEncountered++;
                break;
            }

            try {
                connection.add(entry);
                // If we got here, then the entry was added successfully.
                System.out.println(entry.toLDIFString());
                entriesAdded++;
            } catch (LDAPException le) {
                // If we got here, then the add attempt failed.
                le.printStackTrace();
                errorsEncountered++;
            }
        }
    } catch (IOException ioe) {
        ioe.printStackTrace();
    } catch (LDAPException lde) {
        lde.printStackTrace();
    } finally {
        if (ldifReader != null) {
            try {
                ldifReader.close();
            } catch (IOException ioe) {
                ioe.printStackTrace();
            }
        }
        if (connection != null) {
            connection.close();
        }
    }
}

Solution

It really depends on the directory server that you're using. Some servers support bulk adds under certain conditions, so it's worth checking whether yours does. But if you want something that's standard LDAP, then your best option is to use multiple threads to parallelize the process of adding entries to the server.

If all the parent entries already exist for the entries you're adding (i.e., you're not adding hierarchical structures but only leaf entries), then it would be very simple to use the UnboundID LDAP SDK to parallelize the process across multiple threads. The LDIF reader already supports using multiple threads to parallelize the process of reading and decoding LDIF records (use an LDIFReader constructor that allows you to specify the number of parse threads), and you can use that in conjunction with an LDIFReaderEntryTranslator that performs an LDAP add of the entry that was read.
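For the flat (leaf-entries-only) case, a sketch of that approach might look like the following. It uses the UnboundID LDAP SDK's LDIFReader constructor that takes a parse-thread count and an LDIFReaderEntryTranslator, so each parse thread performs the add itself through a connection pool. The host, port, credentials, and file name are taken from the question; the thread and pool counts of 8 are arbitrary assumptions you'd want to tune.

```java
import java.io.File;

import com.unboundid.ldap.sdk.Entry;
import com.unboundid.ldap.sdk.LDAPConnection;
import com.unboundid.ldap.sdk.LDAPConnectionPool;
import com.unboundid.ldap.sdk.LDAPException;
import com.unboundid.ldif.LDIFException;
import com.unboundid.ldif.LDIFReader;
import com.unboundid.ldif.LDIFReaderEntryTranslator;

public class ParallelLdifImport {
    public static void main(String[] args) throws Exception {
        // A connection pool so the parse threads can add entries concurrently.
        // Pool size of 8 is an assumption; tune it for your server.
        LDAPConnection conn = new LDAPConnection("ldapserver.com", 389,
                "uid=admin,ou=system", "secret");
        final LDAPConnectionPool pool = new LDAPConnectionPool(conn, 8);

        // The translator is invoked on each parse thread, so performing the
        // add here parallelizes both LDIF decoding and the LDAP adds.
        LDIFReaderEntryTranslator addTranslator = new LDIFReaderEntryTranslator() {
            @Override
            public Entry translate(Entry original, long firstLineNumber)
                    throws LDIFException {
                try {
                    pool.add(original);
                } catch (LDAPException le) {
                    // Log and skip failed entries rather than aborting the import.
                    le.printStackTrace();
                }
                return original;
            }
        };

        // 8 parse threads, each running the translator above.
        LDIFReader reader = new LDIFReader(
                new File("C:/Users/ejamcud/Desktop/LDAP/ldifs/Sailors.ldif"),
                8, addTranslator);
        try {
            // The adds happen inside the translator; the main thread just
            // drains the reader until the file is exhausted.
            while (reader.readEntry() != null) {
                // no-op
            }
        } finally {
            reader.close();
            pool.close();
        }
    }
}
```

Because the adds run on the parse threads, ordering between entries is not preserved, which is why this only works cleanly when every parent already exists.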

If you need to add data with hierarchy, then it's more complicated to parallelize the process because you can't add a child until its parent has been added. However, you can still achieve pretty good parallelism by keeping track of the entries you're currently adding and using some kind of locking mechanism so that you can't add a child until you're done adding its parent. This probably won't be as fast as if you don't need any locking, but you can still parallelize adds in different subtrees.
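One way to sketch that locking mechanism, independent of any LDAP library, is to keep a latch per DN and have each worker wait on its parent's latch before adding. Everything here (class name, the naive comma-based parent parsing, the list standing in for connection.add) is illustrative; real code should parse DNs with the SDK's DN class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class HierarchicalAddCoordinator {
    // One latch per DN we plan to add; a child blocks on its parent's latch.
    private final ConcurrentHashMap<String, CountDownLatch> latches =
            new ConcurrentHashMap<>();
    // Stand-in for the directory: records the order entries were "added" in.
    private final List<String> addedOrder =
            Collections.synchronizedList(new ArrayList<String>());

    HierarchicalAddCoordinator(Collection<String> dns) {
        for (String dn : dns) {
            latches.put(normalize(dn), new CountDownLatch(1));
        }
    }

    private static String normalize(String dn) {
        return dn.toLowerCase().replace(", ", ",");
    }

    // Naive parent extraction: everything after the first comma. Real code
    // should use the UnboundID DN class instead of string slicing.
    private static String parentDn(String dn) {
        int idx = dn.indexOf(',');
        return (idx < 0) ? null : dn.substring(idx + 1);
    }

    // Only DNs passed to the constructor may be added.
    void add(String dn) throws InterruptedException {
        String parent = parentDn(normalize(dn));
        CountDownLatch parentLatch = (parent == null) ? null : latches.get(parent);
        if (parentLatch != null) {
            // Parent is part of this import; wait until it has been added.
            parentLatch.await();
        }
        addedOrder.add(dn); // stand-in for connection.add(entry)
        latches.get(normalize(dn)).countDown(); // release any waiting children
    }

    List<String> order() {
        return addedOrder;
    }

    public static void main(String[] args) throws InterruptedException {
        final HierarchicalAddCoordinator c = new HierarchicalAddCoordinator(
                Arrays.asList("ou=people,dc=example,dc=com",
                              "uid=a,ou=people,dc=example,dc=com"));
        // Deliberately start the child first; it still waits for its parent.
        Thread child = new Thread(() -> {
            try { c.add("uid=a,ou=people,dc=example,dc=com"); }
            catch (InterruptedException e) { }
        });
        Thread parent = new Thread(() -> {
            try { c.add("ou=people,dc=example,dc=com"); }
            catch (InterruptedException e) { }
        });
        child.start();
        parent.start();
        child.join();
        parent.join();
        System.out.println(c.order());
    }
}
```

Entries in unrelated subtrees never wait on each other, so adds in different branches still proceed fully in parallel; only parent/child pairs serialize.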

Licensed under: CC-BY-SA with attribution