One problem I see with your code is the use of the index method:
ini = dicionario1[chave][1][0].index(i) + 2
fim = dicionario1[chave][1][0].index(')')
index returns the position of the first occurrence of the substring you pass it. So if there are two ( characters in your string, both times it will give you the position of the first one. That (together with your break statement) is why, in your example, you get ['2.1', '2.2', '2.3'] correctly but also '(#5.1', '5.2', '5.3)'.
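Here is a minimal illustration using a made-up string rather than your actual data. Calling index twice with the same argument returns the same position, while the optional start argument of str.index lets a loop resume after the previous group. The variable names are hypothetical.

```python
# Made-up input with two parenthesized groups (not the original data).
s = "(#2.1,#2.2),(#5.1,#5.2)"

# str.index always returns the FIRST match, so repeating the call
# yields the same position both times.
first = s.index('(')
again = s.index('(')
print(first, again)  # 0 0

# Passing a start position resumes the search after the previous hit.
second = s.index('(', first + 1)
print(second)  # 12

# The same idea in a loop: collect the contents of every (...) group.
groups = []
pos = 0
while True:
    try:
        start = s.index('(', pos)
    except ValueError:
        break  # no '(' left at or after pos
    end = s.index(')', start)
    groups.append(s[start + 1:end])
    pos = end + 1
print(groups)  # ['#2.1,#2.2', '#5.1,#5.2']
```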
You can get around this by passing a starting position as the second argument to the index method, but I'd suggest a different strategy. If none of the fields outside the parentheses contain commas, you can use a fairly simple regex to find all your groups:
r'\([^)]*\)|[^,]+'
This finds every parenthesized group (including the parentheses themselves) as well as every run of characters that doesn't contain a comma. For example:
>>> import re
>>> teststr = "'1',$,#41,(#10,#5)"
>>> re.findall(r'\([^)]*\)|[^,]+', teststr)
["'1'", '$', '#41', '(#10,#5)']
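To spell out how the pattern works, here is the same regex compiled with re.VERBOSE, which ignores whitespace and # comments inside the pattern so each alternative can be annotated. The order of the alternatives matters: if [^,]+ came first, it would match '(#10' on its own and split the group apart. Note that this pattern does not handle nested parentheses.

```python
import re

# Same pattern as above; re.VERBOSE ignores the spaces and '#' comments
# inside the pattern string so each part can be annotated.
pattern = re.compile(r"""
    \( [^)]* \)    # '(' then any characters except ')', then ')'
  | [^,]+          # otherwise: a run of one or more non-comma characters
""", re.VERBOSE)

teststr = "'1',$,#41,(#10,#5)"
print(pattern.findall(teststr))  # ["'1'", '$', '#41', '(#10,#5)']
```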
This leaves you with everything grouped appropriately. You still have to do a little processing on each entry, but it should be fairly straightforward.
During your processing, the startswith method should be helpful. For example:
>>> '(something)'.startswith('(')
True
>>> '(something)'.startswith('(#')
False
>>> '(#1,#2,#3)'.startswith('(#')
True
This makes it easy to distinguish between (...) and (#...). Just check for '(#' before '(', since every string that starts with '(#' also starts with '('. If there are commas inside a (...) group, you can always split it on commas after you've applied the regex.
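Putting the pieces together, here is a sketch of the per-entry processing. parse_line is a hypothetical helper; it assumes there are no nested parentheses and that the # markers inside (#...) groups should be dropped, which matches the ['2.1', '2.2', '2.3'] result mentioned above. Adjust each branch to whatever your data actually needs.

```python
import re

# Grouping regex: a parenthesized group, or a run of non-comma characters.
TOKEN_RE = re.compile(r'\([^)]*\)|[^,]+')

def parse_line(line):
    """Hypothetical helper: turn one line into a list of parsed fields.

    Assumes no nested parentheses and that '#' markers inside (#...)
    groups should be dropped, e.g. '(#2.1,#2.2)' -> ['2.1', '2.2'].
    """
    fields = []
    for token in TOKEN_RE.findall(line):
        if token.startswith('(#'):
            # (#...) group: drop the parentheses, split on commas,
            # and strip the leading '#' from each entry.
            fields.append([part.lstrip('#') for part in token[1:-1].split(',')])
        elif token.startswith('('):
            # (...) group: drop the parentheses and split on commas.
            fields.append(token[1:-1].split(','))
        else:
            # Plain field such as "'1'", '$' or '#41': keep as-is.
            fields.append(token)
    return fields

print(parse_line("'1',$,#41,(#10,#5),(a,b)"))
# ["'1'", '$', '#41', ['10', '5'], ['a', 'b']]
```

If your (...) groups never contain commas, you could keep them as plain strings instead of splitting them.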