Question

I have a series of txt files with info for around 200 people. This info is generated and exported 5 or 6 times a day. Each txt file averages about 800 lines.

I set up a cron that calls (from php command line) a codeigniter controller that makes this process:

  • the constructor loads the model
  • a method gets the txt files from a folder, removes blanks and special chars from the filenames, and renames them
  • the files' paths are returned in an array
  • another method loops through the files array and calls $this->process($file)
  • process() reads each line from the file
  • it ignores blank lines and builds one array of values from each line read: array_filter(preg_split('/\s+/',$line));
  • finally, it calls model->insert_line($line) for each line (see the sketch below)
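In code, the flow is roughly this (a simplified sketch; names other than process() and insert_line() are placeholders, and the folder path is just an example):

<?php
class Import extends CI_Controller {

    public function __construct()
    {
        parent::__construct();
        $this->load->model('line_model'); // placeholder model name
    }

    public function run()
    {
        // get the txt files (already renamed to strip blanks/special chars)
        foreach ($this->get_files() as $file) {
            $this->process($file);
        }
    }

    private function process($file)
    {
        foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            $values = array_filter(preg_split('/\s+/', $line));
            if (!empty($values)) {
                // one INSERT per line -- this runs ~800 times per file
                $this->line_model->insert_line($values);
            }
        }
    }

    private function get_files()
    {
        // returns the cleaned-up file paths as an array
        return glob('/path/to/txt/*.txt'); // folder path is a placeholder
    }
}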

How could I:

1- Optimize the code so I can lower the 2 min (avg.) execution time of each cron call? Each execution processes 5 or 6 txt files of about 800 lines each.

2- Set up the MySQL table so it can hold a very large quantity of records without trouble? Only two fields are stored: "code" int(2) and "fecha" timestamp, both set in a unique index (code, fecha).

I have a fast PC, and the table uses the InnoDB engine.


Solution 2

First approach

Have you tried:

$this->db->insert_batch('table', $data);

Where $data is an array with the rows/information you want to insert. I don't know the internals of that method (although looking at the code shouldn't be hard), but I'm almost sure it does the whole insertion in a single transaction.

The way you are doing it right now, calling an insert for each line, means opening a connection, doing checks, and everything else each statement needs in order to run. A bulk insert is the way to go in those cases, and that CI function does exactly that: it generates a single (multi-row) INSERT command that is executed in the same transaction.

You even have the advantage of being able to roll it back if one of the inserts fails, so the people who generate those files can massage or fix the data.
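For example, process() could collect the parsed lines and hand them over in one call. This is only a rough sketch: the table name 'lines' and the column mapping are assumptions based on the question, so adjust them to your schema:

private function process($file)
{
    $data = array();

    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $values = array_values(array_filter(preg_split('/\s+/', $line)));
        if (count($values) >= 2) {
            $data[] = array(
                'code'  => $values[0],  // assumed column mapping
                'fecha' => $values[1],
            );
        }
    }

    if (!empty($data)) {
        // one multi-row INSERT instead of ~800 single-row INSERTs
        $this->db->insert_batch('lines', $data); // 'lines' is a placeholder table name
    }
}

If you want the explicit rollback behaviour, you can also wrap the call between CodeIgniter's $this->db->trans_start() and $this->db->trans_complete().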

Second approach

If you know that those files have a specific format, you could use MySQL's LOAD DATA INFILE statement, which will perform better than any tool you can write yourself.

The beauty of it is that you might be able to call it with:

$this->db->query($bulk_insert_command);

Where $bulk_insert_command is actually a string with something like:

LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [CHARACTER SET charset_name]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
    [(col_name_or_user_var,...)]
    [SET col_name = expr,...]

As shown in the syntax above. Of course you'd have a function to build and sanitize this string, substituting the file name, options, and whatever else you need.
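Put together for this case, it might look something like the sketch below. The table name, column list and delimiters are assumptions; also note that LOCAL requires local_infile to be enabled, while without LOCAL the file must be readable by the MySQL server itself:

private function bulk_load($file)
{
    // escape() wraps the path in quotes for us
    $path = $this->db->escape(realpath($file));

    $bulk_insert_command = "LOAD DATA LOCAL INFILE {$path}
        IGNORE INTO TABLE lines
        FIELDS TERMINATED BY ' '
        LINES TERMINATED BY '\\n'
        (code, fecha)";

    $this->db->query($bulk_insert_command);
}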

And finally, make sure that whatever user you set up in database.php in your CI app has the FILE privilege:

GRANT FILE ON *.* TO user@localhost IDENTIFIED BY 'password';

So that the CI app does not generate an error when running such a query.

OTHER TIPS

You should profile your code to determine where the bottleneck(s) are.
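For a rough first pass, CodeIgniter's Benchmark class (or plain microtime()) is enough to see whether the time goes into reading/parsing or into the inserts. A sketch that could live inside process($file); the marker names are just examples:

$this->benchmark->mark('parse_start');
$rows = array();
foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
    $rows[] = array_filter(preg_split('/\s+/', $line));
}
$this->benchmark->mark('parse_end');

$this->benchmark->mark('insert_start');
// ... do the inserts here ...
$this->benchmark->mark('insert_end');

log_message('info',
    'parse: '.$this->benchmark->elapsed_time('parse_start', 'parse_end').'s, '.
    'insert: '.$this->benchmark->elapsed_time('insert_start', 'insert_end').'s');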

You can probably speed things up by splitting up the IO and the CPU tasks. There's no point in having multiple processes doing IO unless you've saved the files to multiple disks or something along those lines, so dedicate one IO process to reading the files into memory and putting them in a queue; then you can have multiple CPU processes pull files from the queue and process them.

If possible (i.e. if you have enough RAM), add the processed data to an in-memory queue, and when the IO process has finished reading all of the files into memory, have it write the processed data back to disk; if you don't have enough RAM to hold your files plus the processed data in memory, have the IO process alternate between reading and writing.

You should run enough CPU processes to utilize your hardware threads, which is probably the number of cores you've got on your CPU, or the number of cores * 2 if your CPU and OS support hyperthreading - run a few timing experiments with various numbers of processes to arrive at a good number.
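If you go that route, one simple way to get several worker processes from the CLI is pcntl_fork(). This is only a sketch: it assumes the pcntl extension is available (CLI only), the folder path and worker count are placeholders, and each child must open its own database connection:

$files   = glob('/path/to/txt/*.txt');   // placeholder path
$workers = 4;                            // tune this to your core count
$chunks  = array_chunk($files, max(1, (int) ceil(count($files) / $workers)));

$pids = array();
foreach ($chunks as $chunk) {
    $pid = pcntl_fork();
    if ($pid === 0) {
        // child: open a fresh DB connection here, then handle its share of files
        foreach ($chunk as $file) {
            process_file($file);         // stands in for the parse + insert step
        }
        exit(0);
    }
    $pids[] = $pid;                      // parent keeps track of the children
}

foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);        // wait for every child to finish
}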

If you profile the code and find that IO is the problem, then see if you can do something like save the files to a couple of zip files when they're first generated - this will lessen the amount of data you're reading from disk and will also make it more contiguous, at the cost of additional CPU processing when you unzip the data.
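If the generator can drop the txt files into a single zip, reading them back out is straightforward with PHP's ZipArchive (again a sketch; the zip path is a placeholder):

$zip = new ZipArchive();
if ($zip->open('/path/to/batch.zip') === TRUE) {
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if (substr($name, -4) === '.txt') {
            // read the whole entry as a string, then split it into lines
            $contents = $zip->getFromIndex($i);
            $lines = preg_split('/\R/', $contents, -1, PREG_SPLIT_NO_EMPTY);
            // hand $lines to the same parsing/insert step as before
        }
    }
    $zip->close();
}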

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow