Replacing variable-length items in a list using regex in Python
Problem
I am trying to replace variable-length items in a list using a regex. For example, the item "HD479659" should be replaced by "HD0000000479659"; I just need to insert seven 0s between the "HD" prefix and the digits. I wrote the following program, but every time I run it I get this error: "TypeError: object of type '_sre.SRE_Pattern' has no len()". Can you please help me fix this error?
Thank you very much.
Here is the program:
import xlrd
import re
import string
wb = xlrd.open_workbook("3_1.xls")
sh = wb.sheet_by_index(0)
outfile=open('out.txt','w')
s_pat=r"HD[1-9]{1}[0-9]{5}"
s_pat1=r"HD[0]{7}[0-9]{6}"
pat = re.compile(s_pat)
pat1 = re.compile(s_pat1)
for rownum1 in range(sh.nrows):
    str1= str(sh.row_values(rownum1))
    m1=[]
    m1 = pat.findall(str1)
    m1=list(set(m1))
    for a in m1:
        a=re.sub(pat,pat1,a)
    print >> outfile, m1
Solution
I think your solution is more complicated than it needs to be. This one should do the job and is much simpler:
import re

def repl(match):
    # Keep the "HD" prefix, insert seven zeros, then append the six digits.
    return match.group(1) + ("0"*7) + match.group(2)

print(re.sub(r"(HD)([1-9]{1}[0-9]{5})", repl, "HD479659"))
See also: http://docs.python.org/library/re.html#re.sub
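As for the error in the question: the second argument of re.sub has to be a replacement string or a function, not a compiled pattern, which is why passing pat1 there fails with a TypeError. For comparison, here is a minimal sketch of the same substitution using a replacement string instead of a function. The names ID_PATTERN and pad_id are made up for this example, and \g<1> is used rather than \1 so that the zeros that follow are not read as part of the group number:

```python
import re

# Hypothetical names for illustration; the pattern is equivalent to the one above.
ID_PATTERN = re.compile(r"(HD)([1-9][0-9]{5})")

def pad_id(text):
    # \g<1> and \g<2> refer to the two groups; seven literal zeros go in between.
    return ID_PATTERN.sub(r"\g<1>0000000\g<2>", text)

print(pad_id("HD479659"))  # HD0000000479659
```

Whether a function or a replacement string reads better is a matter of taste; the function form is easier to extend if the padding ever needs to depend on the match.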
Update:
To transform a list of values, you have to iterate over all of them. You don't have to search for the matching values first, because re.sub returns strings that do not match unchanged:
import re

values_to_transform = [
    'HD479659',
    'HD477899',
    'HD423455',
    'does not match',
    'but does not matter'
]

def repl(match):
    # Keep the "HD" prefix, insert seven zeros, then append the six digits.
    return match.group(1) + ("0"*7) + match.group(2)

for value in values_to_transform:
    print(re.sub(r"(HD)([1-9]{1}[0-9]{5})", repl, value))
The result is:
HD0000000479659
HD0000000477899
HD0000000423455
does not match
but does not matter
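If the transformed values are needed as a list rather than printed one by one, the same loop can be wrapped in a small function. This is only a sketch: transform_all and ID_PATTERN are hypothetical names, and compiling the pattern once up front is optional.

```python
import re

# Hypothetical names for illustration; same pattern and repl as in the answer.
ID_PATTERN = re.compile(r"(HD)([1-9][0-9]{5})")

def repl(match):
    # Prefix, seven zeros, then the six digits.
    return match.group(1) + "0" * 7 + match.group(2)

def transform_all(values):
    # Values that do not match are returned unchanged by sub().
    return [ID_PATTERN.sub(repl, value) for value in values]

values_to_transform = ['HD479659', 'HD477899', 'does not match']
for value in transform_all(values_to_transform):
    print(value)
```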
Other tips
What you need to do is extract the variable-length portion of the ID explicitly, then pad it with 0s; the number of zeros to add is the desired length minus the matched length.
If I understand the pattern correctly, you want to use the regex
r"HD(?P<zeroes>0*)(?P<num>\d+)"
At that point you can do
results = re.search(...bla...).groupdict()
This returns the dict {'zeroes': '', 'num': '479659'} in this case. From there you can pad as necessary.
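The padding step could be sketched like this. It assumes the target is always 13 digits after "HD" (7 zeros plus 6 digits, inferred from the example "HD0000000479659"); TOTAL_DIGITS, ID_RE and pad_to_width are hypothetical names. Note that this pads to a fixed width, which is not quite the same as always inserting seven zeros:

```python
import re

ID_RE = re.compile(r"HD(?P<zeroes>0*)(?P<num>\d+)")
TOTAL_DIGITS = 13  # assumption: 7 zeros + 6 digits, as in "HD0000000479659"

def pad_to_width(item):
    match = ID_RE.search(item)
    if match is None:
        return item  # nothing that looks like an ID; leave the item alone
    parts = match.groupdict()
    # zfill left-pads the significant digits with zeros up to the target width.
    padded = "HD" + parts["num"].zfill(TOTAL_DIGITS)
    return item[:match.start()] + padded + item[match.end():]

print(pad_to_width("HD479659"))         # HD0000000479659
print(pad_to_width("HD0000000479659"))  # HD0000000479659 (already padded)
print(pad_to_width("HD12345"))          # HD0000000012345
```

Because the existing zeros are captured separately, running it on an ID that is already padded leaves it unchanged.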
It's 5am at the moment or I'd have a better solution for you, but I hope this helps.