Question

I have to open an XML file, trim it of whitespace (other than newlines), remove all lines that match a regular expression, and then remove all lines that match another regular expression. Right now this uses 3 separate temp files, which I know is unnecessary.

# Trim whitespace from xml
import subprocess

# No shell is involved, so the character set is passed to tr without extra quotes
with open(fname + '.xml', 'r') as f2, open(fname + 'temp.xml', 'w') as f3:
    subprocess.call(["tr", "-d", "\t\r\f"], stdin=f2, stdout=f3)

# Remove the page numbers from the file
with open(fname + 'temp2.xml', 'w') as f4:
    # Raw string, so the backslashes reach sed unchanged
    subprocess.call(["sed",
                     r'/<attr key="phc.line_number"><integer>[0-9]*<\/integer><\/attr>/d',
                     fname + 'temp.xml'], stdout=f4)

# Remove references to filename from the file
--not implemented--

Is there a way for me to do all of this with one file?

Solution

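$ sed -i -e 's/[\r\t\f]//g' -e '/pattern1/d' -e '/pattern2/d' x.xml

Note the multiple -e arguments: each one adds a command, and sed applies them in order to every line. -i writes the result back to x.xml, so no temp files are needed. The character class leaves out the space so that it deletes the same characters as the tr call in the question; deleting spaces would also stop a pattern such as <attr key=...> from matching. GNU sed understands the \r, \t and \f escapes; other sed implementations may not.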
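
Since the question drives tr and sed from Python, the same steps can also be done in memory, with no temp files and no external commands. What follows is only a minimal sketch, assuming the file fits comfortably in memory. The names clean_xml_text, clean_xml_file and LINE_NUMBER are made up for illustration, and the pattern for the filename references is left to the caller because the question does not give it.

```python
import re

# Mirrors the sed expression from the question (name is hypothetical).
LINE_NUMBER = re.compile(
    r'<attr key="phc\.line_number"><integer>[0-9]*</integer></attr>'
)


def clean_xml_text(text, drop_patterns):
    """Delete tabs, carriage returns and form feeds, then drop every
    line that matches any of the compiled patterns."""
    text = text.translate({ord(c): None for c in "\t\r\f"})
    kept = [
        line
        for line in text.split("\n")
        if not any(p.search(line) for p in drop_patterns)
    ]
    return "\n".join(kept)


def clean_xml_file(path, drop_patterns):
    """Rewrite the file at `path` with the cleaned text."""
    # newline="" turns off newline translation, so \r is seen and removed here.
    with open(path, "r", newline="") as f:
        text = f.read()
    with open(path, "w", newline="") as f:
        f.write(clean_xml_text(text, drop_patterns))
```

Usage would look like clean_xml_file(fname + '.xml', [LINE_NUMBER, other_pattern]), where other_pattern is whatever compiled regex matches the filename references.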

Licensed under: CC-BY-SA with attribution