Question

I'm new to HTML, and I know that HTML reserves some characters for its own use and displays others by their character code. For example:

&#140;  is   Œ
&copy;  is   ©
&reg;  is   ®

I have the HTML source in a std::string. How can I decode these entities into their actual characters and replace them in the std::string? Is there a library with source available, or can it be done using preprocessor macros?


Solution

I would recommend using an HTML/XML parser that can do the conversion for you automatically. Parsing HTML correctly by hand is extremely difficult. If you insist on doing it yourself, the Boost String Algorithms library provides useful replacement functions such as boost::replace_all.

OTHER TIPS

&#140;  is   Œ

No it isn't. &#140; is 'PARTIAL LINE BACKWARD'. The correct numeric entities for Œ are &#338; and &#x152;.

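One method for the numeric entities would be to use a regular expression like &#([0-9]+);, grab the numeric value and convert it to the corresponding character. Note that the value is a Unicode code point, not an ASCII code, so anything above 127 has to be encoded in whatever encoding your std::string uses (UTF-8, for example); sprintf alone won't do that for you. The hexadecimal form needs a second pattern such as &#[xX]([0-9A-Fa-f]+);.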
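
As a rough illustration of the regex approach, here is a minimal sketch using std::regex from C++11. It assumes the output string should be UTF-8; the function names decode_numeric_entities and append_utf8 are made up for this example, and references that are out of range are simply left untouched.

```cpp
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

// Append the UTF-8 encoding of code point cp (assumed <= 0x10FFFF) to out.
void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Replace decimal (&#338;) and hexadecimal (&#x152;) references with UTF-8.
std::string decode_numeric_entities(const std::string& in) {
    static const std::regex ref("&#([0-9]+|[xX][0-9A-Fa-f]+);");
    std::string out;
    std::size_t last = 0;  // end of the previous match in the input
    for (std::sregex_iterator it(in.begin(), in.end(), ref), end; it != end; ++it) {
        const std::smatch& m = *it;
        const std::size_t start = static_cast<std::size_t>(m.position());
        out.append(in, last, start - last);  // copy text before the match

        const std::string digits = m[1].str();
        const bool hex = (digits[0] == 'x' || digits[0] == 'X');
        unsigned long cp = 0;
        bool ok = true;
        try {
            cp = hex ? std::stoul(digits.substr(1), nullptr, 16)
                     : std::stoul(digits, nullptr, 10);
        } catch (const std::out_of_range&) {
            ok = false;  // absurdly long number
        }
        // Reject NUL, surrogates and values beyond the Unicode range.
        if (ok && cp != 0 && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF)) {
            append_utf8(out, cp);
        } else {
            out += m.str();  // leave the reference as it was
        }
        last = start + static_cast<std::size_t>(m.length());
    }
    out.append(in, last, std::string::npos);  // copy the tail
    return out;
}
```

Calling decode_numeric_entities("&#338;") yields the two bytes 0xC5 0x92, which is Œ in UTF-8. This sketch does not apply the Windows-1252 remapping that browsers use for values like &#140;.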

For the named entities you would need to build a mapping. You could probably do a simple string replace to convert to the numbers, then use the method above. W3C has a table here: http://www.w3.org/TR/WD-html40-970708/sgml/entities.html
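
A sketch of that mapping step might look like the following. The function name named_to_numeric is hypothetical, and the table holds only a handful of entries for illustration; a real one would be filled in from the W3C list. It rewrites each known named entity as a decimal numeric reference and leaves anything it doesn't recognise alone.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Rewrite known named entities (e.g. &copy;) as decimal numeric
// references (e.g. &#169;). Unknown names and stray '&' are kept as-is.
std::string named_to_numeric(const std::string& in) {
    // Deliberately tiny table; entity names are case-sensitive.
    static const std::unordered_map<std::string, unsigned> table = {
        {"amp", 38},   {"lt", 60},    {"gt", 62},   {"quot", 34},
        {"nbsp", 160}, {"copy", 169}, {"reg", 174}, {"OElig", 338},
    };

    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        const std::size_t amp = in.find('&', pos);
        if (amp == std::string::npos) {
            out.append(in, pos, std::string::npos);  // no more entities
            break;
        }
        out.append(in, pos, amp - pos);  // copy text before '&'

        const std::size_t semi = in.find(';', amp + 1);
        if (semi != std::string::npos) {
            const auto it = table.find(in.substr(amp + 1, semi - amp - 1));
            if (it != table.end()) {
                out += "&#" + std::to_string(it->second) + ";";
                pos = semi + 1;
                continue;
            }
        }
        out += '&';  // not a known entity: emit '&' and move on
        pos = amp + 1;
    }
    return out;
}
```

For example, named_to_numeric("&copy; 2010") gives "&#169; 2010", which can then be passed through whatever numeric decoding you use.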

But if you're trying to read or parse a bunch of HTML in a string, you should use an HTML parser. Search for the many questions on SO.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow