Question

I have a list of strings. If any of these strings contains a 4-digit year, I want to truncate the string at the end of the year. Otherwise I leave the string alone.

I tried using:

    for x in my_strings:   
      m=re.search("\D\d\d\d\d\D",x)  
      if m: x=x[:m.end()]  

I also tried:

my_strings=[x[:re.search("\D\d\d\d\d\D",x).end()] if re.search("\D\d\d\d\d\D",x) for x in my_strings]  

Neither of these is working.

Can you tell me what I am doing wrong?

Solution

Something like this seems to work on trivial data:

>>> import re
>>> regex = re.compile(r'^(.*(?<=\D)\d{4}(?=\D))(.*)')
>>> strings = ['foo', 'bar', 'baz', 'foo 1999', 'foo 1999 never see this', 'bar 2010 n 2015', 'bar 20156 see this']
>>> [regex.sub(r'\1', s) for s in strings]
['foo', 'bar', 'baz', 'foo 1999', 'foo 1999', 'bar 2010', 'bar 20156 see this']
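
The pattern above needs a non-digit character on both sides of the year, so a year at the very start of a string is not matched. A minimal sketch of one alternative, assuming it is acceptable to truncate after the first standalone 4-digit number: negative lookarounds only require that the neighbouring characters are not digits, and they also succeed at the string boundaries. The helper name `truncate_after_year` and the sample strings are illustrative, not part of the original answer.

```python
import re

# Exactly four digits with no digit directly before or after them.
# Negative lookarounds also succeed at the start and end of the string.
YEAR = re.compile(r'(?<!\d)\d{4}(?!\d)')

def truncate_after_year(s):
    """Return s cut off right after its first standalone 4-digit number."""
    m = YEAR.search(s)
    return s[:m.end()] if m else s

strings = ['foo', 'foo 1999', 'foo 1999 never see this',
           'bar 2010 n 2015', 'bar 20156 see this', '1999 at the start']
result = [truncate_after_year(s) for s in strings]
print(result)
```

Note one behavioural difference: this sketch cuts after the first year it finds, whereas the greedy `.*` in the pattern above backtracks from the end of the string and therefore prefers the last year that is followed by a non-digit.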

Other tips

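It looks like the only bound on the result string is the match's end(), so you should use re.match() instead and let the regex consume the leading text lazily. A lookahead for the trailing non-digit keeps that character out of the match, so the result stops right after the year:

my_expr = r".*?\D\d{4}(?=\D)"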

Then, in your code, do:

regex = re.compile(my_expr)
my_new_strings = []
for string in my_strings:
    match = regex.match(string)
    if match:
        my_new_strings.append(match.group())
    else:
        my_new_strings.append(string)

Or as a list comprehension:

regex = re.compile(my_expr)
matches = ((regex.match(string), string) for string in my_strings)
my_new_strings = [match.group() if match else string for match, string in matches]

Alternatively, you could use re.sub, provided the pattern also consumes everything after the year:

regex = re.compile(r'(\D\d{4})\D.*')
new_strings = [regex.sub(r'\1', string) for string in my_strings]

I am not entirely sure of your use case, but the following code may give you some hints:

import re

my_strings = ['abcd', 'ab12cd34', 'ab1234', 'ab1234cd', '1234cd', '123cd1234cd']

for index, string in enumerate(my_strings):
    match = re.search(r'\d{4}', string)
    if match:
        my_strings[index] = string[0:match.end()]

print(my_strings)

# ['abcd', 'ab12cd34', 'ab1234', 'ab1234', '1234', '123cd1234']
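
Writing back through the index is what makes the change stick. The first loop in the question assigns to `x`, which only rebinds the loop variable and never touches the list. A small sketch contrasting the two, using made-up sample data:

```python
import re

my_strings = ['foo 1999 bar', 'baz']

# Rebinding the loop variable: x points at a new string,
# but the list still holds the original one.
for x in my_strings:
    m = re.search(r'\d{4}', x)
    if m:
        x = x[:m.end()]
after_rebinding = list(my_strings)

# Assigning through the index replaces the list element itself.
for i, x in enumerate(my_strings):
    m = re.search(r'\d{4}', x)
    if m:
        my_strings[i] = x[:m.end()]
after_index_assignment = list(my_strings)

print(after_rebinding)
print(after_index_assignment)
```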

You were actually pretty close with the list comprehension, but your syntax is off: the first expression has to be a "conditional expression", i.e. x if <condition> else y:

[x[:re.search(r"\D\d\d\d\d\D",x).end()] if re.search(r"\D\d\d\d\d\D",x) else x for x in my_strings]

This is pretty ugly and hard to read, though. There are better ways to split your string around a 4-digit year, such as:

[re.split(r'(?<=\D\d{4})\D', x)[0] for x in my_strings]
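
If the conditional-expression form is preferred, the duplicated search can be avoided on Python 3.8 or later with an assignment expression. This is only a sketch: the lookahead `(?=\D)` is an assumption made here so that the cut lands exactly at the end of the year, and the sample strings are invented.

```python
import re

# A non-digit, four digits, then a lookahead for another non-digit
# (the lookahead is not part of the match, so m.end() is the end of the year).
pattern = re.compile(r'\D\d{4}(?=\D)')

my_strings = ['foo', 'foo 1999 never see this', 'bar 20156 see this']

# The condition is evaluated first, so m is bound before x[:m.end()] runs.
truncated = [x[:m.end()] if (m := pattern.search(x)) else x
             for x in my_strings]
print(truncated)
```

Compiling the pattern once also avoids repeating the regex literal inside the comprehension.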

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow