Question

I am parsing a file that contains HTML tags and converting them to LaTeX commands.

cat text

  <Text>A &lt;strong&gt;ASDFF&lt;/strong&gt; is a &lt;em&gt;cerebrovafdfasfscular&lt;/em&gt; condifasdftion caufadfsed fasdfby tfdashe l
 ocfsdafalised &lt;span style="text-decoration: underline;"&gt;ballooning&lt;/span&gt; or difdaslation of an arfdatery in thdfe bfdasrai
 n. Smadfsall aasdneurysms may dadisplay fdasno ofadsbvious sdfasigns (&lt;span style="text-decoration: underline;"&gt;&lt;em&gt;&lt;str
 ong&gt;asymptomatic&lt;/strong&gt;&lt;/em&gt;&lt;/span&gt;) bfdasut lfdsaarger afdasneurysms maydas besda asfdsasociated widfth sdsfudd

  sed -e 's|&lt;strong&gt;\(.*\)&lt;/strong&gt;|\\textbf{\1}|g' test

cat out

 <Text>A \textbf{ASDFF&lt;/strong&gt; is a &lt;em&gt;cerebrovafdfasfscular&lt;/em&gt;    condifasdftion caufadfsed fasdfby tfdashe locfsda
    falised &lt;span style="text-decoration: underline;"&gt;ballooning&lt;/span&gt; or    difdaslation of an arfdatery in thdfe bfdasrain. Sma
      dfsall aasdneurysms may dadisplay fdasno ofadsbvious sdfasigns (&lt;span style="text-decoration: underline;"&gt;&lt;em&gt;&lt;strong&gt
      ;asymptomatic}&lt;/em&gt;&lt;/span&gt;) bfdasut lfdsaarger afdasneurysms maydas besda   asfdsasociated widfth sdsfudd

The expected output is \textbf{ASDFF}, but I get \textbf{ASDFF .........} instead: the match runs on to the last closing tag. How can I get the expected result?

Regards


Solution

sed has no non-greedy quantifier, so .* runs to the last &lt;/strong&gt; on the line. Use a Perl regex with the non-greedy .*? instead:

perl -pe 's|&lt;strong&gt;(.*?)&lt;/strong&gt;|\\textbf{$1}|g'

Your problem is the same as the one in "Non-greedy regex matching in sed". Next time, simplify your case so the real problem stands out. For example, instead of pasting the raw HTML, use something like this:

fooTEXT1barfooTEXT2bar

Update

If you actually want the greedy match, you can ignore this answer.

Licensed under: CC-BY-SA with attribution