Edit content of a list with split and find

https://stackoverflow.com/questions/23614042

20-07-2023

Question

I have a dictionary named dicionario1. I need to replace the content of dicionario1[chave][1], which is a list, with the list lista_atributos. lista_atributos is built from the content of dicionario1[chave][1] as follows:

All the information is separated by "," except where the characters "(#" and ")" appear. In that case, the content between those characters should become its own list (also split on ","). There can be one or more occurrences of "(#", and I need to handle every one of them.

Although this might be easy, I'm stuck with the following code:

dicionario1 = {'#998' : [['IFCPROPERTYSET'],["'0siSrBpkjDAOVD99BESZyg',#41,'Geometric Position',$,(#977,#762,#768,#754,#753,#980,#755,#759,#757)"]],
               '#1000' : [['IFCRELDEFINESBYPROPERTIES'],["'1dEWu40Ab8zuK7fuATUuvp',#41,$,$,(#973,#951),#998"]]}



for chave in dicionario1:
    lista_atributos = []
    ini = 0
    for i in dicionario1[chave][1][0][ini:]:
        if i == '(' and dicionario1[chave][1][0][dicionario1[chave][1][0].index(i) + 1] == '#':
            ini = dicionario1[chave][1][0].index(i) + 1
            fim = dicionario1[chave][1][0].index(')')  
            lista_atributos.append(dicionario1[chave][1][0][:ini-2].split(','))
            lista_atributos.append(dicionario1[chave][1][0][ini:fim].split(','))
            lista_atributos.append(dicionario1[chave][1][0][fim+2:].split(','))

            print lista_atributos

Result:

[["'1dEWu40Ab8zuK7fuATUuvp'", '#41', '$', '$'], ['#973', '#951'], ['#998']]
[["'0siSrBpkjDAOVD99BESZyg'", '#41', "'Geometric Position'", '$'], ['#977', '#762', '#768', '#754', '#753', '#980', '#755', '#759', '#757'], ['']]

Unfortunately I can't figure out how to iterate over dicionario1[chave][1][0] to get this result:

[["'1dEWu40Ab8zuK7fuATUuvp'"], ['#41'], ['$'], ['$'], ['#973', '#951'], ['#998']]
[["'0siSrBpkjDAOVD99BESZyg'"], ['#41'], ["'Geometric Position'"], ['$'], ['#977', '#762', '#768', '#754', '#753', '#980', '#755', '#759', '#757']]

I also need the first part of the result, ["'1dEWu40Ab8zuK7fuATUuvp'", '#41', '$', '$'], to become ["'1dEWu40Ab8zuK7fuATUuvp'"], ['#41'], ['$'], ['$'], and so on.

Also, if I modify "Geometric Position" to "(Geometric Position)", the result becomes:

[["'1dEWu40Ab8zuK7fuATUuvp'", '#41', '$', '$'], ['#973', '#951'], ['#998']]

SOLUTION: (thanks to Rob Watts)

import re

dicionario1 =["'0siSrBpkjDAOVD99BESZyg',#41,'(Geometric) (Position)',$,(#977,#762,#768,#754,#753,#980,#755,#759,#757)"]

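# Raw string so the backslashes reach the regex engine unchanged.
dicionario1 = re.findall(r'\([^)]*\)|[^,]+', dicionario1[0])

# Expand each "(#...)" group into its own list; leave other entries as strings.
for i in range(len(dicionario1)):
    if dicionario1[i].startswith('(#'):
        dicionario1[i] = dicionario1[i][1:-1].split(',')

print(dicionario1)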

["'0siSrBpkjDAOVD99BESZyg'", '#41', "'(Geometric) (Position)'", '$', ['#977', '#762', '#768', '#754', '#753', '#980', '#755', '#759', '#757']]

Solution

One problem I see with your code is the use of index:

ini = dicionario1[chave][1][0].index(i) + 2
fim = dicionario1[chave][1][0].index(')')

index returns the index of the first occurrence of the character. So if you have two ('s in your string, then both times it will give you the index of the first one. That (and your break statement) is why in your example you've got ['2.1', '2.2', '2.3'] correctly but also have '(#5.1', '5.2', '5.3)'.
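To make the first-occurrence behaviour concrete, here is a small sketch. The string `text` is made up for illustration and is not taken from the question's data. It also shows the optional start argument of `str.index`, which lets the search resume after the previous hit.

```python
# Hypothetical sample string with two "(#...)" groups.
text = "a,(#1,#2),b,(#3,#4)"

# index() always reports the first match in the whole string.
first = text.index('(')
print(first)  # 2

# Passing a start position resumes the search after the previous hit.
second = text.index('(', first + 1)
print(second)  # 12

# The matching ")" has to be searched from the same position as well.
close = text.index(')', second)
print(text[second + 1:close].split(','))  # ['#3', '#4']
```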

You can get around this by specifying a starting index to the index method, but I'd suggest a different strategy. If you don't have any commas in the parsed strings, you can use a fairly simple regex to find all your groups:

'\([^)]*\)|[^,]+'

This will find everything inside parentheses, and also every run of characters that doesn't contain a comma. For example:

>>> import re
>>> teststr = "'1',$,#41,(#10,#5)"
>>> re.findall('\([^)]*\)|[^,]+', teststr)
["'1'", '$', '#41', '(#10,#5)']

This leaves you with everything grouped appropriately. You still have to do a little bit of processing on each entry, but it should be fairly straightforward.
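The caveat about commas inside the parsed strings can be checked with a short sketch. Both input strings are invented for illustration. Parentheses inside a quoted value survive, while a comma inside a quoted value splits it in two.

```python
import re

pattern = r'\([^)]*\)|[^,]+'

# Parentheses inside a quoted value are harmless: the token starts with a
# quote, so the second alternative consumes it up to the next comma.
ok = re.findall(pattern, "'(Geometric) (Position)',$,(#1,#2)")
print(ok)  # ["'(Geometric) (Position)'", '$', '(#1,#2)']

# A comma inside a quoted value is not: the value is cut in two.
broken = re.findall(pattern, "'Geometric, Position',$")
print(broken)  # ["'Geometric", " Position'", '$']
```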

During your processing, the startswith method should be helpful. For example:

>>> '(something)'.startswith('(')
True
>>> '(something)'.startswith('(#')
False
>>> '(#1,#2,#3)'.startswith('(#')
True

This will make it easy for you to distinguish between (...) and (#...). If there are commas in the (...), you could always split on comma after you've used the regex.
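Putting the regex and startswith together, one possible sketch applied to the dictionary from the question looks like this. The helper name `split_attributes` and the constant `TOKEN_RE` are made up for illustration, and the sketch assumes that no quoted value contains a comma.

```python
import re

# Pattern from the answer: a parenthesized group, or a run of non-comma characters.
TOKEN_RE = re.compile(r'\([^)]*\)|[^,]+')

def split_attributes(text):
    """Split on commas, expanding each '(#...)' group into its own list."""
    result = []
    for token in TOKEN_RE.findall(text):
        if token.startswith('(#'):
            # Drop the surrounding parentheses, then split the references.
            result.append(token[1:-1].split(','))
        else:
            result.append(token)
    return result

dicionario1 = {
    '#998': [['IFCPROPERTYSET'], ["'0siSrBpkjDAOVD99BESZyg',#41,'Geometric Position',$,(#977,#762,#768,#754,#753,#980,#755,#759,#757)"]],
    '#1000': [['IFCRELDEFINESBYPROPERTIES'], ["'1dEWu40Ab8zuK7fuATUuvp',#41,$,$,(#973,#951),#998"]],
}

# Replace each raw attribute string with its parsed form.
for chave in dicionario1:
    dicionario1[chave][1] = split_attributes(dicionario1[chave][1][0])

print(dicionario1['#1000'][1])
# ["'1dEWu40Ab8zuK7fuATUuvp'", '#41', '$', '$', ['#973', '#951'], '#998']
```

Because findall returns every match, this handles any number of "(#...)" groups in one string, which the index-based loop could not.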

Licensed under: CC-BY-SA with attribution
