Problem

I am trying to replace variable-length items in a list using a regex. For example, the item "HD479659" should be replaced by "HD0000000479659"; I just need to insert seven 0s between "HD" and the digits. I wrote the following program, but every time I run it I get this error: "TypeError: object of type '_sre.SRE_Pattern' has no len()". Can you please help me solve this error?

Thank you very much.

Here is the program:

import xlrd
import re
import string

wb = xlrd.open_workbook("3_1.xls")
sh = wb.sheet_by_index(0)
outfile = open('out.txt', 'w')

s_pat = r"HD[1-9]{1}[0-9]{5}"
s_pat1 = r"HD[0]{7}[0-9]{6}"
pat = re.compile(s_pat)
pat1 = re.compile(s_pat1)

for rownum1 in range(sh.nrows):
    str1 = str(sh.row_values(rownum1))
    m1 = []
    m1 = pat.findall(str1)
    m1 = list(set(m1))
    for a in m1:
        a = re.sub(pat, pat1, a)
    print >> outfile, m1

Solution

I think your solution is more complicated than it needs to be. This one should do the job and is much simpler:

import re

def repl(match):
    return match.group(1) + ("0"*7) + match.group(2)

print(re.sub(r"(HD)([1-9]{1}[0-9]{5})", repl, "HD479659"))

See also: http://docs.python.org/library/re.html#re.sub
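As far as I can tell, the TypeError in the question comes from passing the compiled pattern pat1 as the second argument of re.sub; that argument has to be a replacement string or a function. As a possible alternative to the callback above, here is a minimal sketch using a replacement string with group references. It is written for Python 3, and the names ID_PATTERN and pad_id are hypothetical, not from the question.

```python
import re

# Two groups: the "HD" prefix and the six-digit number (first digit non-zero).
ID_PATTERN = re.compile(r"(HD)([1-9][0-9]{5})")


def pad_id(text):
    # \g<1> and \g<2> refer to the captured groups; the seven zeros go between them.
    # The \g<N> form is used so the zeros are not read as part of the group number.
    return ID_PATTERN.sub(r"\g<1>0000000\g<2>", text)


print(pad_id("HD479659"))  # HD0000000479659
```

The \g<1> syntax matters here: a plain \1 followed by zeros would be read as a reference to a different group number.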

Update:

To transform a list of values, you have to iterate over all values. You don't have to search for the matching values first, because re.sub returns a string unchanged when the pattern does not match:

import re

values_to_transform = [
    'HD479659',
    'HD477899',
    'HD423455',
    'does not match',
    'but does not matter'
]

def repl(match):
    return match.group(1) + ("0"*7) + match.group(2)

for value in values_to_transform:
    print(re.sub(r"(HD)([1-9]{1}[0-9]{5})", repl, value))

The result is:

HD0000000479659
HD0000000477899
HD0000000423455
does not match
but does not matter
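If you still want the unique IDs per row, as in the question, note that the loop there only rebinds the name a and never changes the list m1, so the padded values have to be collected into a new list. Below is a rough sketch for Python 3 under the assumption that each row is a list of cell values (as xlrd's row_values returns); the function name transform_rows and the sample rows are made up for illustration.

```python
import re

ID_PATTERN = re.compile(r"(HD)([1-9][0-9]{5})")


def repl(match):
    # Rebuild the ID with seven zeros between the prefix and the digits.
    return match.group(1) + "0" * 7 + match.group(2)


def transform_rows(rows):
    """For each row, return the sorted unique padded IDs found in it."""
    result = []
    for row in rows:
        text = str(row)  # same trick as in the question: search the row's string form
        # finditer yields match objects, so repl can be applied to each one directly;
        # the set comprehension removes duplicates within the row.
        padded = sorted({repl(m) for m in ID_PATTERN.finditer(text)})
        result.append(padded)
    return result


# Hypothetical sample data standing in for the spreadsheet rows.
rows = [["HD479659", 12.5, "HD479659"], ["note", "HD423455 and HD477899"]]
for ids in transform_rows(rows):
    print(ids)
# ['HD0000000479659']
# ['HD0000000423455', 'HD0000000477899']
```

Each resulting list can then be written to the output file instead of being printed.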

Other tips

What you need to do is extract the variable-length portion of the ID explicitly, then pad it with 0s; the number of zeros to add is the desired length minus the matched length.

If I understand the pattern correctly, you want to use the regex

r"HD(?P<zeroes>0*)(?P<num>\d+)"

At that point you can do

results = re.search(...bla...).groupdict()

This returns the dict {'zeroes': '', 'num': '479659'} in this case. From there you can pad as necessary.
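To make the padding step concrete, here is a small sketch for Python 3. It assumes the target is 13 digits after "HD" (seven zeros plus six digits, taken from the example in the question); TOTAL_DIGITS and pad_to_width are hypothetical names, and str.zfill does the "desired length minus matched length" arithmetic.

```python
import re

ID_PATTERN = re.compile(r"HD(?P<zeroes>0*)(?P<num>\d+)")
TOTAL_DIGITS = 13  # assumption: 7 zeros + 6 digits, as in "HD0000000479659"


def pad_to_width(value, width=TOTAL_DIGITS):
    def _pad(match):
        # Drop any existing leading zeros, then left-pad the number to the full width.
        return "HD" + match.group("num").zfill(width)

    return ID_PATTERN.sub(_pad, value)


print(ID_PATTERN.search("HD479659").groupdict())  # {'zeroes': '', 'num': '479659'}
print(pad_to_width("HD479659"))                   # HD0000000479659
```

Because the existing zeros are captured separately, running this on an ID that is already padded leaves it unchanged, and shorter numbers still end up at the same total width.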

It's 5am at the moment or I'd have a better solution for you, but I hope this helps.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow