How to parse HTML?

Question

I have a table

id  txt
1   <html> ... a lot of different html tags
2   <html> ... a lot of different html tags
3   <html> ... a lot of different html tags

How can I parse txt so that I get plain text without all these tags?

Solution

If you're on Teradata 14 (TD14), you can use REGEXP_REPLACE:

REGEXP_REPLACE(txt, '<[^>]*>', ' ', 1, 0, 'i')

This returns wrong results if a literal '<' or '>' appears inside a tag or in the text itself (for example, inside an attribute value). In that case you will need a more robust regular expression.
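To see where the pattern holds up and where it breaks, here is a minimal sketch that applies the same regular expression in Python rather than Teradata. The function name `strip_tags` and both sample strings are made up for illustration, and Python's `re` module is only assumed to behave like Teradata's regex engine for this simple pattern.

```python
import re

# Same pattern as in the REGEXP_REPLACE call: '<', then anything that is not '>', then '>'.
TAG_PATTERN = re.compile(r"<[^>]*>", re.IGNORECASE)


def strip_tags(txt: str) -> str:
    """Replace every '<...>' run with a single space (hypothetical helper)."""
    return TAG_PATTERN.sub(" ", txt)


# Well-behaved markup: only the text between the tags survives.
simple = "<html><b>Hello</b> world</html>"

# A '>' inside an attribute value ends the match too early,
# so part of the tag leaks into the result.
tricky = '<a title="a > b">link</a>'

print(repr(strip_tags(simple)))
print(repr(strip_tags(tricky)))
```

In the second case the match stops at the '>' inside the attribute, so the fragment `b">` is left in the output along with the link text. Collapsing the runs of spaces left behind by adjacent tags would need a second pass.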

Other tips

If you are on TD 14, it has a built-in REPLACE function for this purpose. (www.info.teradata.com/eDownload.cfm?itemid=113480017)

License: CC-BY-SA with attribution