As others have said, profile your code to see why it is slow. The cProfile
module in conjunction with the gprof2dot
tool can produce nice readable information
Without seeing your slow code, I can guess a few things that might help:
First is you can probably get away with using the builtin string methods instead of a regex - this might be marginally quicker. If you need to use regex's, it's worthwhile precompiling outside the main loop using re.compile
Second is to not do one insert query per line, instead do the insertions in batches, e.g add the parsed info to a list, then when it reaches a certain size, perform one INSERT query with executemany
method.
Some incomplete code, as an example of the above:
import fileinput
parsed_info = []
for linenum, line in enumerate(fileinput.input()):
if not line.startswith("#DEBUG"):
continue # Skip line
msg = line.partition("MSG")[1] # Get everything after MSG
words = msg.split() # Split on words
info = {}
for w in words:
k, _, v = w.partition(":") # Split each word on first :
info[k] = v
parsed_info.append(info)
if linenum % 10000 == 0: # Or maybe if len(parsed_info) > 500:
# Insert everything in parsed_info to database
...
parsed_info = [] # Clear