The solution I posted is naive.
Plus:
On random pages, it seems to handle about 9 out of 10 XHTML web pages on the internet. It can handle ordinary stock XHTML files, but it can fail on more unusual features (such as DTDs). If another program generated your XHTML output, it may work all the time.
The learning curve here is about 1/10 of that for real DOM parsing.
The code here is about 1/10 the size of what real DOM parsing needs.
You can reuse the Perl regex knowledge you already have.
Be prepared for this tool to be rather limited. If you outgrow its capabilities, you may have to learn a proper DOM parser anyway.
Minus:
It is completely unsuitable if perfect DOM parsing is required. This code is breakable. It follows the Berkeley rather than the AT&T approach.
But perfect DOM parsers can also fail on bad HTML documents.
And if you already know DOM parsing, there is little time cost to doing it right: use Mojolicious or XML::LibXML. You may as well stick to the better solution then.
Giving this code a reflexive -1 vote ignores that it has its uses. Sometimes an ordinary screwdriver can do a job where a Phillips would be better. This code is an ordinary screwdriver for a Phillips screw. Stack Overflow is a site that novices in need of quick solutions come to as well, not just experts. This is why I posted it to begin with.
Simple improvements and fixes are appreciated, though the goal here is explicitly not to deal with all possible valid and invalid, sane and insane, correct and incorrect permutations of XML and XHTML.
/iaw