Question

A colleague of mine needs to develop an Eclipse plugin that parses multiple XML files to check for programming rules imposed by a client (for example, no xsl:for-each, or no namespaces declared but not used). About 1,000 files have to be parsed regularly, each containing about 300-400 lines.

We were wondering which approach would be faster. I'm thinking JDOM, and he's thinking RegEx.

Can anyone help us decide which is best?

Thanks


Solution

If all checks are as simple as "no xsl:for-each" or "no unused namespace", a StAX parser would be best: you just stream the documents through it, receive a start-element event for every element, and do your checking there. For this, the parser needs relatively little memory.
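As a rough illustration of that streaming approach, here is a minimal sketch using the StAX API that ships with the JDK (`javax.xml.stream`). The class name `ForEachCheck`, the method `findForEach`, and the message format are made up for this example; only the "no xsl:for-each" rule comes from the question.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical rule checker: reports every xsl:for-each element in a document.
public class ForEachCheck {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    /** Returns one message per xsl:for-each start element found in the input. */
    static List<String> findForEach(Reader xml) throws XMLStreamException {
        // StAX readers are namespace-aware by default.
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        List<String> violations = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // Only start-element events matter for this rule; everything else is skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && XSL_NS.equals(reader.getNamespaceURI())
                        && "for-each".equals(reader.getLocalName())) {
                    violations.add("xsl:for-each near line "
                            + reader.getLocation().getLineNumber());
                }
            }
        } finally {
            reader.close();
        }
        return violations;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xslt =
              "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n"
            + "  <xsl:template match=\"/\">\n"
            + "    <xsl:for-each select=\"item\"><p/></xsl:for-each>\n"
            + "  </xsl:template>\n"
            + "</xsl:stylesheet>";
        findForEach(new StringReader(xslt)).forEach(System.out::println);
    }
}
```

Matching on the namespace URI plus local name, rather than on the literal prefix `xsl`, keeps the check working when a file binds the XSLT namespace to a different prefix.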

If you need to do referential checking, DOM may be better, as you can easily walk the tree (perhaps via XPath).
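To make "referential checking" concrete, here is a sketch of one such rule using the JDK's DOM and XPath APIs. The rule itself (every `xsl:call-template` must name a template defined in the same file) is an assumption chosen for illustration, not one of the client's rules, and the names `CallTemplateCheck` and `findDanglingCalls` are hypothetical. It also ignores templates pulled in through `xsl:import` or `xsl:include`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical cross-reference check on a single stylesheet.
public class CallTemplateCheck {
    // XPath predicate selecting elements in the XSLT namespace, whatever prefix is used.
    static final String IN_XSL = "namespace-uri()='http://www.w3.org/1999/XSL/Transform'";

    /** Returns the names used by xsl:call-template that no xsl:template in the file defines. */
    static List<String> findDanglingCalls(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for namespace-uri()/local-name() to be meaningful
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // First pass: collect every named template.
        Set<String> defined = new HashSet<>();
        NodeList templates = (NodeList) xpath.evaluate(
                "//*[local-name()='template' and " + IN_XSL + "]/@name",
                doc, XPathConstants.NODESET);
        for (int i = 0; i < templates.getLength(); i++) {
            defined.add(templates.item(i).getNodeValue());
        }

        // Second pass: every call must point at one of the collected names.
        List<String> dangling = new ArrayList<>();
        NodeList calls = (NodeList) xpath.evaluate(
                "//*[local-name()='call-template' and " + IN_XSL + "]/@name",
                doc, XPathConstants.NODESET);
        for (int i = 0; i < calls.getLength(); i++) {
            String name = calls.item(i).getNodeValue();
            if (!defined.contains(name)) {
                dangling.add(name);
            }
        }
        return dangling;
    }

    public static void main(String[] args) throws Exception {
        String xslt =
              "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template name=\"header\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:call-template name=\"header\"/>"
            + "<xsl:call-template name=\"footer\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println(findDanglingCalls(xslt));
    }
}
```

A check like this needs to look at two different parts of the document at once, which is awkward with a pure streaming parser and straightforward once the whole tree is in memory.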

Other tips

DOM, hands down. RegEx would be madness. Use the tool that was intended for the job.

You can't parse recursive structures with RegEx. So unless you have really simple XML files, XML parsing will be much faster, and the code will be somewhat sane (so you won't spend endless hours locating bugs).

Since the files are pretty small, JDOM will make your job much easier. For larger files, you would have to use SAX or a similar parser (so you don't have to keep the whole file in RAM).

If you try to parse XML using regular expressions, you are entering a world of pain. If speed is important, using an event-based API might be a tad faster than DOM/JDOM.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow