Grep and Sed Equivalent for XML Command Line Processing

https://stackoverflow.com/questions/91791

01-07-2019

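Question

When doing shell scripting, the data is typically in files of single-line records, such as CSV. Handling that data with grep and sed is simple. But I often have to deal with XML, so I would really like a way to script access to XML data from the command line. What are the best tools?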

Solution

I have found xmlstarlet to be very good at this sort of thing.

http://xmlstar.sourceforge.net/

It should also be available in most distribution repositories. An introductory tutorial is here:

http://www.ibm.com/developerworks/library/x-starlet.html

Other tips

Some promising tools:

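nokogiri: parses HTML/XML DOMs in Ruby using XPath and CSS selectors.
hpricot: deprecated.
fxgrep: uses its own XPath-like syntax to query documents. Written in SML, so installation may be difficult.
LT XML: XML toolkit derived from SGML tools, including sggrep, sgsort, xmlnorm and others. Uses its own query syntax. The documentation is very formal. Written in C. LT XML 2 claims support for XPath, XInclude and other W3C standards.
xmlgrep2: simple and powerful searching with XPath. Written using the Perl module XML::LibXML and libxml2.
XQSharp: supports XQuery, the extension to XPath. Written for the .NET Framework.
xml-coreutils: Laird Breyer's toolkit, an equivalent of GNU coreutils for XML. Discussed in an interesting essay on what the ideal toolkit should include.
xmldiff: a simple tool for comparing two XML files.
xmltk: there seems to be no package in Debian, Ubuntu, Fedora, or MacPorts; it has had no release since 2007 and uses non-portable build automation.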

xml-coreutils seems to be the best documented and the most UNIX-oriented.

There is also the xml2 and 2xml pair. It lets the usual string-editing tools process XML.

Example. q.xml:

<?xml version="1.0"?>
<foo>
    text
    more text
    <textnode>ddd</textnode><textnode a="bv">dsss</textnode>
    <![CDATA[ asfdasdsa <foo> sdfsdfdsf <bar> ]]>
</foo>

xml2 < q.xml

/foo=
/foo=   text
/foo=   more text
/foo=   
/foo/textnode=ddd
/foo/textnode
/foo/textnode/@a=bv
/foo/textnode=dsss
/foo=
/foo=    asfdasdsa <foo> sdfsdfdsf <bar> 
/foo=

xml2 < q.xml | grep textnode | sed 's!/foo!/bar/baz!' | 2xml

<bar><baz><textnode>ddd</textnode><textnode a="bv">dsss</textnode></baz></bar>

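P.S. There are also html2 / 2html.

To Joseph Holsten's excellent list, I would add the xpath command-line script that comes with the Perl library XML::XPath. It is a good way to extract information from XML files: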
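To make the xml2 idea concrete, here is a minimal Python sketch of the same kind of flattening: each element becomes a `path=value` line that grep and sed can work on. It is not how xml2 is implemented, only an illustration. The `flatten` helper is a hypothetical name, and the sketch ignores mixed content, CDATA, and the bare separator lines that xml2 emits between repeated elements.

```python
import xml.etree.ElementTree as ET


def flatten(elem, prefix=""):
    """Yield 'path=value' lines for elem and its descendants.

    Hypothetical helper for illustration only; real xml2 output
    also covers mixed content and emits separator lines.
    """
    path = f"{prefix}/{elem.tag}"
    # Attributes become /path/@name=value lines.
    for name, value in elem.attrib.items():
        yield f"{path}/@{name}={value}"
    # Non-blank text directly inside the element.
    text = (elem.text or "").strip()
    if text:
        yield f"{path}={text}"
    # Recurse into child elements in document order.
    for child in elem:
        yield from flatten(child, path)


# A cut-down version of q.xml from the example above.
doc = ET.fromstring(
    '<foo><textnode>ddd</textnode><textnode a="bv">dsss</textnode></foo>'
)
lines = list(flatten(doc))
print("\n".join(lines))
```

Once the document is in this line-oriented form, ordinary text filters apply, which is the point of the xml2 / 2xml pair.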

 xpath -q -e '/entry[@xml:lang="fr"]' *xml

You can use xmllint:

xmllint --xpath //title books.xml

It should be bundled with most distributions, and it is also bundled with Cygwin.

$ xmllint --version
xmllint: using libxml version 20900

See:

$ xmllint
Usage : xmllint [options] XMLfiles ...
        Parse the XML files and output the result of the parsing
        --version : display the version of the XML library used
        --debug : dump a debug tree of the in-memory document
        ...
        --schematron schema : do validation against a schematron
        --sax1: use the old SAX1 interfaces for processing
        --sax: do not build a tree but work just at the SAX level
        --oldxml10: use XML-1.0 parsing rules before the 5th edition
        --xpath expr: evaluate the XPath expression, inply --noout
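For readers unfamiliar with the `//title` query used with xmllint above: it selects every `title` element at any depth in the document. The following Python sketch shows the same selection with the standard library. The sample document is an assumption, since the original answer does not show the contents of books.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml; the original answer does not show the file.
BOOKS = """<catalog>
  <book><title>First</title></book>
  <shelf><book><title>Second</title></book></shelf>
</catalog>"""

root = ET.fromstring(BOOKS)

# iter("title") walks the whole tree in document order,
# which corresponds to the XPath expression //title.
titles = [t.text for t in root.iter("title")]
print(titles)
```

Both titles are found even though they sit at different nesting levels, which is what distinguishes `//title` from a fixed path such as `/catalog/book/title`.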

There are also xmlsed and xmlgrep from the NetBSD xmltools!

http://blog.huoc.org/xmltools-not-dead.html

If you are looking for a solution on Windows, PowerShell has built-in functionality for reading and writing XML.

test.xml:

<root>
  <one>I like applesauce</one>
  <two>You sure bet I do!</two>
</root>

PowerShell script:

# load XML file into local variable and cast as XML type.
$doc = [xml](Get-Content ./test.xml)

$doc.root.one                                   #echoes "I like applesauce"
$doc.root.one = "Who doesn't like applesauce?"  #replace inner text of <one> node

# create new node...
$newNode = $doc.CreateElement("three")
$newNode.set_InnerText("And don't you forget it!")

# ...and position it in the hierarchy
$doc.root.AppendChild($newNode)

# write results to disk
$doc.save("./testNew.xml")

testNew.xml:

<root>
  <one>Who doesn't like applesauce?</one>
  <two>You sure bet I do!</two>
  <three>And don't you forget it!</three>
</root>

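Source: https://serverfault.com/questions/26976/update-xml-from-the-command-line-windows

It depends on exactly what you want to do.

XSLT may be the way to go, but there is a learning curve. Try xsltproc, and note that you can pass in parameters.

There is also saxon-lint, which can use XPath 3.0/XQuery 3.0 from the command line. (Other command-line tools use XPath 1.0.)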
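The read, modify, append cycle of the PowerShell script can be sketched outside Windows as well. This is a rough Python equivalent using only the standard library, not part of the original answer. It works on an in-memory copy of test.xml; for files on disk, `ET.parse` and `ElementTree.write` would take the place of `fromstring` and `tostring`.

```python
import xml.etree.ElementTree as ET

# In-memory copy of test.xml from the example above.
root = ET.fromstring(
    "<root><one>I like applesauce</one><two>You sure bet I do!</two></root>"
)

# Replace the inner text of the <one> node.
root.find("one").text = "Who doesn't like applesauce?"

# Create a new node and append it as the last child of <root>.
three = ET.SubElement(root, "three")
three.text = "And don't you forget it!"

# Serialize back to a string instead of writing to disk.
result = ET.tostring(root, encoding="unicode")
print(result)
```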

Examples:

http/html:

$ saxon-lint --html --xpath 'count(//a)' http://stackoverflow.com/q/91791
328

xml:

$ saxon-lint --xpath '//a[@class="x"]' file.xml

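XQuery might be a good solution. It is (relatively) easy to learn and is a W3C standard.

I would recommend XQSharp as a command-line processor.

I started with xmlstarlet and still use it. When a query gets tough and I need XPath 2 and XQuery feature support, I turn to xidel http://www.videlibri.de/xidel.html

jEdit has a plugin called "XQuery" that provides querying functionality for XML documents.

Not quite the command line, but it works!

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow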
