How to remove the HTML tags using sed

https://stackoverflow.com/questions/23548205

18-07-2023

Question

Input is:

<h1>This is heading 1</h1>
<h2>This is heading 2</h2>
<h3>This is heading 3</h3>
<h4>This is heading 4</h4>
<h5>This is heading 5</h5>
<h6>This is heading 6</h6>

</body>
</html>

Expected output:

This is heading 1
This is heading 2
This is heading 3
This is heading 4
This is heading 5
This is heading 6

I tried sed -n 's/<[^>].*>//gp' example.html but got nothing on the screen; it seems the regular expression is not right.

Solution 2

sed -n 's/<[^>]*>//gp' example.html | sed '/^$/d'

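You are almost there. The dot (.) you used can match a ">" character, so .* runs greedily to the last ">" on the line and the whole line is deleted. Remove the dot from your command so that [^>]* stops at the first ">" and each tag is removed separately.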
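A minimal check on one line of the sample (plain sh and sed; the variable names are only for illustration) shows the difference between the two patterns:

```shell
#!/bin/sh
line='<h1>This is heading 1</h1>'

# Question's pattern: '.*' is greedy, so the match spans from the first '<'
# to the last '>' on the line and the whole line is removed.
greedy=$(printf '%s\n' "$line" | sed 's/<[^>].*>//g')

# Fixed pattern: '[^>]*' cannot cross a '>', so each tag is matched on its own.
fixed=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g')

printf 'greedy: [%s]\n' "$greedy"   # greedy: []
printf 'fixed:  [%s]\n' "$fixed"    # fixed:  [This is heading 1]
```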

The command after the pipe deletes all blank lines, such as the ones left by </body> and </html> once their tags are stripped.
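A self-contained sketch of the whole pipeline, assuming a POSIX shell; sample() is a hypothetical helper that just prints the question's input so no file is needed. The second command is not from the answer: it is an alternative that runs a single sed with two expressions and gives the same result on this sample.

```shell
#!/bin/sh
# Hypothetical helper: prints the question's sample input.
sample() {
  printf '%s\n' \
    '<h1>This is heading 1</h1>' \
    '<h2>This is heading 2</h2>' \
    '<h3>This is heading 3</h3>' \
    '<h4>This is heading 4</h4>' \
    '<h5>This is heading 5</h5>' \
    '<h6>This is heading 6</h6>' \
    '' \
    '</body>' \
    '</html>'
}

# The answer's pipeline: strip tags, print only modified lines, drop empty ones.
piped=$(sample | sed -n 's/<[^>]*>//gp' | sed '/^$/d')

# Alternative (not from the answer): one sed process, two expressions.
single=$(sample | sed -e 's/<[^>]*>//g' -e '/^$/d')

printf '%s\n' "$piped"
```

In general the two forms differ on lines that contain no tag at all: the answer's -n/p combination skips them, while the single sed prints them unchanged.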

Other tips

grep should be enough for this if your version supports the -P option for PCRE (Perl-compatible regular expressions).

$ grep -oP '(?<=>)(.[^<]+)(?=<)' file
This is heading 1
This is heading 2
This is heading 3
This is heading 4
This is heading 5
This is heading 6
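A self-contained sketch of the grep approach on a shortened copy of the sample. It assumes GNU grep built with PCRE support, since -P is a GNU extension that BSD/macOS grep lacks. The second pattern is a simplification, not part of the answer: [^<]+ on its own already matches the text between > and <, so the leading . is not needed for this input.

```shell
#!/bin/sh
# Requires GNU grep with PCRE support (-P); -o prints each match on its own line.
# Hypothetical helper: prints a shortened copy of the question's input.
sample() {
  printf '%s\n' '<h1>This is heading 1</h1>' '<h2>This is heading 2</h2>' '' '</body>' '</html>'
}

# The answer's pattern: (?<=>) and (?=<) are zero-width assertions, so only
# the text between a '>' and the next '<' is printed.
orig=$(sample | grep -oP '(?<=>)(.[^<]+)(?=<)')

# Simplified variant (an assumption, not from the answer).
simple=$(sample | grep -oP '(?<=>)[^<]+(?=<)')

printf '%s\n' "$orig"
```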

This works on your sample:

sed -n 's|</\{0,1\}h[0-9]>||gp' YourFile

Replace any <hN> and </hN> tags on the line and, if there was a modification, print the line.
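A sketch of this command on a shortened copy of the sample, assuming a POSIX shell and sed; sample() is a hypothetical helper that prints the input:

```shell
#!/bin/sh
# Hypothetical helper: prints a shortened copy of the question's input.
sample() {
  printf '%s\n' '<h1>This is heading 1</h1>' '<h6>This is heading 6</h6>' '' '</body>' '</html>'
}

# Remove every <hN> or </hN> tag ('\{0,1\}' makes the '/' optional).
# -n plus the p flag prints only lines where a substitution happened, so
# the blank line, '</body>' and '</html>' are not printed at all.
out=$(sample | sed -n 's|</\{0,1\}h[0-9]>||gp')

printf '%s\n' "$out"
```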

To be more exact (assuming each heading tag opens at the start of the line and closes on the same line):

sed -n 's|^[[:space:]]*<\(h[0-9]>\)\(.*\)</\1|\2|p' YourFile
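A sketch comparing the two sed commands from this answer on hypothetical indented input (the leading spaces and tab are invented for illustration): the first strips only the tags and keeps the indentation, while the anchored version also drops the leading whitespace because [[:space:]]* is part of the match.

```shell
#!/bin/sh
# Hypothetical indented input: two headings plus two non-heading lines.
# printf interprets \t and \n in its format string.
input() {
  printf '  <h1>This is heading 1</h1>\n\t<h2>This is heading 2</h2>\n<p>not a heading</p>\n</body>\n'
}

# First command: removes <hN>/</hN> tags but keeps leading whitespace.
loose=$(input | sed -n 's|</\{0,1\}h[0-9]>||gp')

# Anchored command: optional leading whitespace, the opening "hN>" captured
# in \1, the text captured in \2, and the matching </hN> via back-reference.
exact=$(input | sed -n 's|^[[:space:]]*<\(h[0-9]>\)\(.*\)</\1|\2|p')

printf '%s\n' "$exact"
```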

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow