Question

The following small program should take a formatted XML file and print it to another file with no newlines or tabs. However, I can't figure out why the resulting file always contains tabs and newlines instead of a single line of XML.

When I print to the console, the newlines and tabs are removed, but the file always contains them.

# Stop early if the output file cannot be opened.
open FH, ">", "tst.out" or die "Cannot open tst.out: $!";
MakeSourceFile($ARGV[0]);
close FH;

sub MakeSourceFile
{
    my $sourceFile  = shift;

    eval { require XML::Parser; import XML::Parser; };
    return if $@;

    my $parser = new XML::Parser();
    $parser->setHandlers(
        Start   => \&start,
        End     => \&end,
        Char    => \&data
    );
    $parser->parsefile($sourceFile);
}

sub start
{
    my ($parseinst, $element, %attrs) = @_;
    print FH "<$element";
    my $attrStr = "";
    map { $attrStr .= " $_=\"$attrs{$_}\""; } keys %attrs;
    print FH "$attrStr>";
}

sub data
{
    my ($parseinst, $data) = @_;
    print FH $data;
}

sub end
{
    # The End handler only receives the parser instance and the element name.
    my ($parseinst, $element) = @_;
    print FH "</$element>";
}

input file (test.xml):

<stuff>
    <Profile id="a"></Profile>
    <Profile id="b"></Profile>
    <Profile id="theprofile" extends="a"></Profile>
    <Group>
        <Group>
            <elem stuff="st">stuff here</elem>
        </Group>
    </Group>
</stuff>

output file (tst.out):

<stuff>
    <Profile id="a"></Profile>
    <Profile id="b"></Profile>
    <Profile id="theprofile" extends="a"></Profile>
    <Group>
        <Group>
            <elem stuff="st">stuff here</elem>
        </Group>
    </Group>
</stuff>

expected file output (tst.out):

<stuff><Profile id="a"></Profile><Profile id="b"></Profile><Profile id="theprofile" extends="a"></Profile><Group><Group><elem stuff="st">stuff here</elem></Group></Group></stuff>

I wondered whether vi applies some kind of auto-formatting when I open the file, but that isn't the case. When I have Perl write output to a file without XML::Parser involved, it is not formatted either. What is going on here?

Solution

Whitespace is character data just the same as any other text content.

If you want to remove whitespace-only nodes then write

print FH $data if $data =~ /\S/;

You may want to go further and remove leading and trailing whitespace from $data.
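A minimal sketch of that extra trimming step, using only core Perl. The helper name clean_chunk is hypothetical and not part of XML::Parser; it takes one chunk of character data and returns what should be written out. Note that XML::Parser may deliver a single run of text to the Char handler in several calls, so trimming every chunk can also remove a space that sat in the middle of the text; treat this as a starting point rather than a complete solution.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Parser): returns the text to emit
# for one chunk of character data, or '' for a whitespace-only chunk.
sub clean_chunk {
    my ($data) = @_;
    return '' unless $data =~ /\S/;   # skip whitespace-only chunks
    $data =~ s/^\s+|\s+$//g;          # trim leading and trailing whitespace
    return $data;
}

print "[", clean_chunk("\n    "), "]\n";            # prints []
print "[", clean_chunk("  stuff here \n"), "]\n";   # prints [stuff here]
```

Inside the data handler from the question this would be used as print FH clean_chunk($data);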

Other suggestions

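The whitespace between elements is treated as character data. The XML specification requires a parser to pass every non-markup character, whitespace included, through to the application, so XML::Parser hands the indentation and newlines to your Char handler. Print only the chunks that contain at least one non-whitespace character: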

if ($data =~ /\S/) {
    print FH $data;
}

That fixes your specific issue.
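To see the fix end to end, here is a small sketch that parses a string instead of a file and collects the result in a variable. It assumes XML::Parser is installed; the function name flatten_xml and the sample input are made up for illustration, and attributes are sorted only to keep the output stable between runs.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: parse an XML string and return it with
# whitespace-only text chunks dropped.
sub flatten_xml {
    my ($xml) = @_;
    my $out = '';

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $element, %attrs) = @_;
                $out .= "<$element";
                # Sorted so the attribute order is stable between runs.
                $out .= qq{ $_="$attrs{$_}"} for sort keys %attrs;
                $out .= ">";
            },
            End => sub {
                my ($expat, $element) = @_;
                $out .= "</$element>";
            },
            Char => sub {
                my ($expat, $data) = @_;
                # Indentation and newlines arrive here as ordinary text.
                $out .= $data if $data =~ /\S/;
            },
        },
    );
    $parser->parse($xml);
    return $out;
}

print flatten_xml(qq{<stuff>\n    <elem stuff="st">stuff here</elem>\n</stuff>}), "\n";
# prints <stuff><elem stuff="st">stuff here</elem></stuff>
```

The handlers receive already-decoded text, so input containing entities such as &amp; would need to be escaped again before being written out; the sketch leaves that out.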

XML::Twig will automatically strip extraneous whitespace when parsing and printing an XML file.

use strict;
use warnings;

use XML::Twig;

my $data = do { local $/; <DATA> };

my $t = XML::Twig->new();
$t->parse( $data );
$t->print;

__DATA__
<stuff>
    <Profile id="a"></Profile>
    <Profile id="b"></Profile>
    <Profile id="theprofile" extends="a"></Profile>
    <Group>
        <Group>
            <elem stuff="st">stuff here</elem>
        </Group>
    </Group>
</stuff>

Outputs:

<stuff><Profile id="a"></Profile><Profile id="b"></Profile><Profile extends="a" id="theprofile"></Profile><Group><Group><elem stuff="st">stuff here</elem></Group></Group></stuff>

In fact, to make XML::Twig emit indentation and newlines you have to ask for them explicitly by passing pretty_print => 'indented' to the constructor.

Licensed under: CC-BY-SA with attribution