Question

I have a demo.html file with the following content:

<html>
<header><header>
<body><table>
   some content here 
</table>
<body>
</html>

I want to use the Linux sed command to extract the content between <table> and </table>, like this:

<table>
  some content here 
</table>

Which command would be simplest for this requirement?

Solution

I am not sure about sed, but it is doable in awk:

awk '/<table>/ { cf=1; print "<table>"; getline } /<\/table>/ {cf = 0 } {if(cf == 1){print $0}}END{ print "</table>" }' demo.html

I tried this on the demo.html and it seems to work as expected.

Some assumptions:

1) All the content begins on the line following the opening tag.

2) The closing tag appears on its own line, i.e. there is no partial content on that line before the closing tag.

The code might be more apparent in this readable format:

awk ' /<table>/    { cf=1 
                     print "<table>" 
                     getline 
                   }

      /<\/table>/  { cf = 0 
                   } 

                   { if(cf == 1)
                     {
                       print $0
                     }
                   }

      END          { print "</table>" 
                   }' demo.html

Assuming you are familiar with awk: when the first pattern, '<table>', is matched on a line, it sets 'cf' (the content flag) to 1 (by default, all variables are initialized to 0). It then prints the opening '<table>' tag and reads the next line via 'getline'.

Now the second-to-last action (the one before 'END', which has no pattern and therefore runs for every line) becomes active because 'cf' is set: it simply prints the whole line. Before that point 'cf' is 0, so none of the earlier HTML gets printed.

Once the closing '</table>' tag is seen, it flips the 'cf' variable back to 0, so none of the following HTML is printed by the 'match all' action.

The special pattern 'END' is only invoked after all the lines have been processed, and all it does is print out the closing '</table>' tag.
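
The flag logic described above can also be sketched without 'getline' or 'END'. This is not the command from the answer, just a minimal variant that assumes each tag appears at most once per line. It trims anything before the opening tag and after the closing tag on the same line, which relaxes the two assumptions listed earlier. The temporary directory and the 'result' variable are hypothetical names used only for illustration, and the sample file is recreated here so the snippet runs on its own.

```shell
#!/bin/sh
# Recreate the sample file from the question in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/demo.html" <<'EOF'
<html>
<header><header>
<body><table>
   some content here
</table>
<body>
</html>
EOF

# cf is the content flag: 1 while the current line is inside the table.
result=$(awk '
  # Opening tag: raise the flag and drop any text that precedes the tag.
  /<table>/   { cf = 1; sub(/.*<table>/, "<table>") }

  # While the flag is set, print the line; on the closing line,
  # drop any text that follows the tag.
  cf          { line = $0
                if (line ~ /<\/table>/) sub(/<\/table>.*/, "</table>", line)
                print line }

  # Closing tag: lower the flag after the line has been printed.
  /<\/table>/ { cf = 0 }
' "$tmp/demo.html")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the closing tag is printed by the flagged action itself, nothing is emitted when the file contains no table at all, whereas an 'END' block always prints its closing tag.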

Hope this is clear.

Other tips

I resolved this with awk as follows:

awk '/^<table>/,/<\/table>$/ { print }' demo.html
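
Since the question asked about sed, the same range idea can be sketched with a sed address range. Note that the '^' anchor in the awk command above only matches when '<table>' starts the line, and in the demo.html shown in the question the tag follows '<body>' on the same line. The sketch below therefore drops the anchors and trims the surrounding text instead. It assumes the opening and closing tags sit on different lines; the scratch directory and the 'sed_result' variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Recreate the sample file from the question in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/demo.html" <<'EOF'
<html>
<header><header>
<body><table>
   some content here
</table>
<body>
</html>
EOF

# -n suppresses automatic printing; the braces group commands that run
# only for lines from the first <table> through the next </table>.
sed_result=$(sed -n '/<table>/,/<\/table>/{
  s/.*<table>/<table>/
  s/<\/table>.*/<\/table>/
  p
}' "$tmp/demo.html")

printf '%s\n' "$sed_result"
rm -rf "$tmp"
```

If the tags are already on lines of their own, the two substitutions can be dropped and the shorter form sed -n '/<table>/,/<\/table>/p' demo.html is enough.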

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow