OK, there are some problems with your command:
less test.txt | grep -Po 'java.lang.String value="\K[^"]*' | awk -F: '{print "<animal>" $1 "</animal>"}'
To begin with, there's a useless use of less, since grep can take a file as a parameter:
grep -Po 'java.lang.String value="\K[^"]*' test.txt | awk -F: '{print "<animal>" $1 "</animal>"}'
Then, you're using grep to select the lines that match a string, so your sequence of commands explicitly keeps only the lines that contain the java.lang... string and drops everything else. A simpler solution would be to use sed:
sed -r 's,<java.lang.String value="([^"]*)"\s*/>,<animal>\1</animal>,g' test.txt
This uses sed's substitution syntax to replace the match, while capturing what's inside the parentheses ( and ) so that it can be reused as \1 in the replacement part. The [^"] part matches any character that is not a ", and the * operator repeats that match 0 or more times. The \s matches a whitespace character, and the * after it again means 0 or more times.
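To see the substitution at work, here is a minimal sketch on a throwaway file. The sample input is an assumption, reconstructed from the result shown further down, and I've swapped in a slightly stricter pattern: the dots are escaped so they only match a literal dot, [[:space:]] stands in for \s, and -E is used as the portable spelling of GNU sed's -r.

```shell
# Hypothetical sample input; the real test.txt is not shown in the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<test time="60" id="01">
<java.lang.String value="cat"/>
<java.lang.String value="dog" />
</test>
EOF

# Capture the attribute value in group 1 and reuse it as \1 in the replacement.
# Lines that do not match are printed unchanged, so the <test> wrapper survives.
out=$(sed -E 's,<java\.lang\.String value="([^"]*)"[[:space:]]*/>,<animal>\1</animal>,g' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

Unlike the grep pipeline, sed prints the lines it doesn't match as-is, so the surrounding XML is kept.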
A regex is matched by an automaton that uses states and transitions to consume a given string. Here's a visual of how the regex works:
[image: demo of the regex on an example]
Though that simple regex works in your particular case, keep in mind that it is only a hack. You should instead use an XML parser and rewrite the nodes to match your needs, using XSLT/XSL-FO, tools designed to transform an XML document into another one (or into something else).
To do that, you could use a tool such as xsltproc. Look at this Q for an example that transforms all foo nodes into bar nodes in an XML tree. Here's how to do it in your case:
test.xsl:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Identity template: copies every node and attribute as-is. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Change each "java.lang.String" element into an "animal" element. -->
  <xsl:template match="java.lang.String">
    <animal>
      <!-- Copy the attributes of java.lang.String (here: value) onto animal. -->
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </animal>
  </xsl:template>
</xsl:stylesheet>
run:
xsltproc test.xsl test.xml
result:
<?xml version="1.0"?>
<test time="60" id="01">
<animal value="cat"/>
<animal value="dog"/>
<animal value="mouse"/>
<animal value="cow"/>
</test>
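Note that this result keeps the value as an attribute, whereas the sed one-liner puts it in the element's text. If you'd rather get the latter from XSLT too, something along these lines might do. This is only a sketch: the input file is an assumption reconstructed from the result above, text.xsl is a hypothetical file name, and it needs xsltproc to be installed.

```shell
# Work in a throwaway directory so no existing file gets overwritten.
dir=$(mktemp -d)

# Assumed input, reconstructed from the result shown above.
cat > "$dir/test.xml" <<'EOF'
<test time="60" id="01">
  <java.lang.String value="cat"/>
  <java.lang.String value="dog"/>
</test>
EOF

# Same identity template as before; only the java.lang.String template differs:
# it writes the 'value' attribute as the text content of <animal>.
cat > "$dir/text.xsl" <<'EOF'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="java.lang.String">
    <animal><xsl:value-of select="@value"/></animal>
  </xsl:template>
</xsl:stylesheet>
EOF

# Run the transformation only if xsltproc is available.
if command -v xsltproc >/dev/null 2>&1; then
  result=$(xsltproc "$dir/text.xsl" "$dir/test.xml")
  printf '%s\n' "$result"
else
  result=''
  echo 'xsltproc is not installed' >&2
fi

rm -rf "$dir"
```

The only real change is xsl:value-of select="@value" in place of xsl:copy-of select="@*", which emits the attribute's string value instead of copying the attribute itself.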
By the way, your XML looks like it was generated by Java, and there are multiple ways to apply that XSL from within your code, before you ever need to handle it with command-line tools.