Sounds like (I may be misunderstanding your question) you just need to capture runs of lowercase letters, rather than each individual lowercase letter. This is easy: just add the + quantifier to your regular expression.
for seq in sequences:
    lower_output.append(re.findall("[a-z]+", seq))  # add substrings
The + quantifier specifies that you want "at least one, and as many as you can find in a row" of the preceding expression (in this case '[a-z]'). So this will capture each full run of lowercase letters as a single match, which should cause them to appear as you want them to in your output lists.
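To see the difference the quantifier makes, here is a minimal sketch on a made-up string (the sample value is an assumption for illustration, not taken from your data):

```python
import re

sample = "ABCdefGHIjk"  # hypothetical input, for illustration only

# Without the quantifier, each lowercase letter is its own match.
singles = re.findall("[a-z]", sample)

# With +, each consecutive run of lowercase letters is one match.
runs = re.findall("[a-z]+", sample)

print(singles)  # ['d', 'e', 'f', 'j', 'k']
print(runs)     # ['def', 'jk']
```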
It gets a little bit uglier if you want to preserve your list-of-lists structure and get the indices as well, but it's still very simple:
for seq in sequences:
    # finditer returns a one-shot iterator, so wrap it in list() to be able to loop over it twice.
    matches = list(re.finditer("[a-z]+", seq))  # List of Match objects.
    lower_output.append([match.group(0) for match in matches])  # add substrings
    lower_indx.append([match.start(0) for match in matches])  # add indices
print(lower_output)
# [['defgdefgdefg'], ['wowhello', 'onemore'], []]
print(lower_indx)
# [[9], [9, 23], []]
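If you'd rather try this outside your own script, here is a self-contained sketch that collects substrings and start indices in a single pass over each iterator. The sample sequences are made up for illustration (they are not your data), and the variable names just mirror the ones above:

```python
import re

# Hypothetical sample data, for illustration only.
sequences = ["ABCdefGHI", "abXYZcd", "NOLOWER"]

lower_output = []
lower_indx = []
for seq in sequences:
    substrings = []
    starts = []
    # Walk the match iterator once, recording both pieces per match.
    for match in re.finditer("[a-z]+", seq):
        substrings.append(match.group(0))  # the matched run
        starts.append(match.start(0))      # index where the run begins
    lower_output.append(substrings)
    lower_indx.append(starts)

print(lower_output)  # [['def'], ['ab', 'cd'], []]
print(lower_indx)    # [[3], [0, 5], []]
```

Looping once like this avoids having to store the Match objects at all. If you also need the end of each run, match.end(0) or match.span(0) gives you that.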