Question

I'm using HTML Tidy, and it turns something like

<a href="http://www.äöü.com/">Link</a>

into

<a href="http://www.%C3%A4%C3%B6%C3%BC.com/">Link</a>

How do I tell Tidy to keep the links the way they are?

My configuration:

'output-xhtml' => true,
'numeric-entities' => true,
'hide-comments' => false,
'show-body-only' => true,
'doctype' => 'transitional',
'wrap' => 0,
'alt-text' => '',
'word-2000' => true,
'drop-proprietary-attributes' => true

I've also tried to disable fix-uri and to set char-encoding to utf8, but to no avail.

I want to do this because Firefox has recently had trouble with this kind of URL encoding. Try visiting www.v%C3%A4terwiderstand.de in the latest version of Firefox (I can't make an actual link out of it, since Stack Overflow also seems unable to handle domains with umlauts), and you'll see a "server not found" error. Simply hitting Enter in the URL bar afterwards opens the website fine, though. As far as I'm aware, this does not happen in other browsers.

Any help would be appreciated!

Solution

I'm not sure what your environment looks like, but this works for me: setting fix-uri to no (or 0) was all that was required. I added the char-encoding option for good measure. Perl's HTML::Tidy wraps the tidy library, so this should carry over to most other tidy bindings once the options are translated.

#!/usr/bin/env perl
use strictures;
use utf8;
use open qw( :std :utf8 );
use HTML::Tidy;

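# Build a Tidy instance and clean the fragment in one chained call.
print HTML::Tidy
    ->new({ "show-body-only" => 1,       # emit only the body content
            "char-encoding" => "utf8",   # treat input and output as UTF-8
            "fix-uri" => 0, })           # do not escape characters in URI attributes
    ->clean(q{<a href="http://www.äöü.com/">Link</a>});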

__END__
<a href="http://www.äöü.com/">Link</a>
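
For reference, the escaped form shown in the question is simply each non-ASCII character replaced by its UTF-8 bytes in %XX notation. The sketch below reproduces that visible effect and reverses it, using only core modules. The helper names escape_non_ascii and unescape_utf8 are made up for illustration, and this is not Tidy's actual implementation, just a way to see what the two forms of the URL have in common.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                      # this source file is assumed to be saved as UTF-8
use open qw( :std :utf8 );
use Encode qw( encode decode );

# Hypothetical helper: replace each non-ASCII character with the
# percent-encoded form of its UTF-8 bytes.
sub escape_non_ascii {
    my ($text) = @_;
    $text =~ s{([^\x00-\x7F])}{
        my $char = $1;
        join "", map sprintf("%%%02X", $_), unpack("C*", encode("UTF-8", $char));
    }ge;
    return $text;
}

# Hypothetical helper: turn %XX sequences back into bytes, then decode
# the resulting byte string as UTF-8.
sub unescape_utf8 {
    my ($text) = @_;
    my $bytes = encode("UTF-8", $text);
    $bytes =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return decode("UTF-8", $bytes);
}

my $href    = "http://www.äöü.com/";
my $escaped = escape_non_ascii($href);

print "$escaped\n";                     # http://www.%C3%A4%C3%B6%C3%BC.com/
print unescape_utf8($escaped), "\n";    # http://www.äöü.com/
```

If markup has already been run through Tidy with fix-uri enabled, something along the lines of unescape_utf8 could be applied to the href values afterwards, but switching the option off as shown above is the simpler route.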
Licensed under: CC-BY-SA with attribution