Question

I am trying to replace variable-length items in a list using a regex. For example, the item "HD479659" should be replaced by "HD0000000479659"; I just need to insert seven 0s in between. I wrote the following program, but every time I run it I get the following error: "TypeError: object of type '_sre.SRE_Pattern' has no len()". Can you please help me solve this error?

Thank you very much.

Here is the program:

import xlrd
import re
import string

wb = xlrd.open_workbook("3_1.xls")
sh = wb.sheet_by_index(0)
outfile=open('out.txt','w')

s_pat=r"HD[1-9]{1}[0-9]{5}"
s_pat1=r"HD[0]{7}[0-9]{6}"
pat = re.compile(s_pat)
pat1 = re.compile(s_pat1)

for rownum1 in range(sh.nrows):
  str1= str(sh.row_values(rownum1))
  m1=[]
  m1 = pat.findall(str1)
  m1=list(set(m1))
  for a in m1:
      a=re.sub(pat,pat1,a)
  print >> outfile, m1

Solution

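I think your solution is far too complicated. The error occurs because the second argument to re.sub must be a replacement string or a function, not a compiled pattern such as pat1. This one should do the job and is much simpler: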

import re

def repl(match):
    # Keep the "HD" prefix, insert seven zeros, then append the original digits.
    return match.group(1) + ("0" * 7) + match.group(2)

print(re.sub(r"(HD)([1-9][0-9]{5})", repl, "HD479659"))

See also: http://docs.python.org/library/re.html#re.sub

Update:

To transform a list of values, you have to iterate over all of them. You don't have to search for the matching values first, because re.sub leaves non-matching strings unchanged:

import re

values_to_transform = [
    'HD479659',
    'HD477899',
    'HD423455',
    'does not match',
    'but does not matter'
]

def repl(match):
    # Keep the "HD" prefix, insert seven zeros, then append the original digits.
    return match.group(1) + ("0" * 7) + match.group(2)

for value in values_to_transform:
    print(re.sub(r"(HD)([1-9][0-9]{5})", repl, value))

The result is:

HD0000000479659
HD0000000477899
HD0000000423455
does not match
but does not matter
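
If no per-match logic is needed, the replacement can probably also be written as a plain replacement string with group references instead of a function. This is only a minimal sketch of that variant; the names ID_PATTERN and pad_ids are made up for illustration and are not part of the answer above:

```python
import re

# Compile once; group 1 is the "HD" prefix, group 2 the six-digit number.
ID_PATTERN = re.compile(r"(HD)([1-9][0-9]{5})")

def pad_ids(values):
    """Return a new list with seven zeros inserted after each 'HD' prefix."""
    # \g<1> and \g<2> refer to the captured groups. The \g<...> form is used
    # because "\1" followed directly by zeros would not be read as
    # "group 1, then literal zeros".
    return [ID_PATTERN.sub(r"\g<1>0000000\g<2>", value) for value in values]

values_to_transform = ["HD479659", "HD477899", "does not match"]
print(pad_ids(values_to_transform))
```

Values that are already padded are left alone, since the pattern requires a non-zero digit right after "HD".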

Other tips

What you need to do is extract the variable-length portion of the ID explicitly, then pad it with zeros. The number of zeros to add is the desired length minus the matched length.

If I understand the pattern correctly, you want to use the regex:

r"HD(?P<zeroes>0*)(?P<num>\d+)"

At that point you can do

results = re.search(...bla...).groupdict()

Which returns the dict {'zeroes': '', 'num':'479659'} in this case. From there you can pad as necessary.
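
The padding step might look like the sketch below. It assumes that each value is a bare ID and that the target is 13 digits after the "HD" prefix (seven zeros plus six digits, as in the question's example). TARGET_DIGITS and normalize_id are hypothetical names, and fullmatch (Python 3) is used here instead of search so that values with extra text are left untouched:

```python
import re

TARGET_DIGITS = 13  # assumption: 7 padding zeros + 6 digits, per the question's example
ID_PATTERN = re.compile(r"HD(?P<zeroes>0*)(?P<num>\d+)")

def normalize_id(text):
    """Pad the numeric part of a bare 'HD...' ID with leading zeros."""
    match = ID_PATTERN.fullmatch(text)
    if match is None:
        return text  # leave anything that is not a bare ID untouched
    groups = match.groupdict()
    # Existing leading zeros were captured in 'zeroes' and are discarded;
    # zfill then left-pads the remaining digits to the desired width.
    return "HD" + groups["num"].zfill(TARGET_DIGITS)

print(normalize_id("HD479659"))
print(normalize_id("HD00479659"))
```

Because the width is fixed rather than the number of inserted zeros, IDs that already carry some leading zeros end up with the same total length.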

It's 5am at the moment or I'd have a better solution for you, but I hope this helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow