Question

Is it possible to use HTML Tidy to just indent HTML code?

Sample Code

<form action="?" method="get" accept-charset="utf-8">

<ul>
<li>
<label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q" />
</li>
<li><input class="submit" type="submit" value="Search" /></li>
</ul>


</form>

Desired Result

<form action="?" method="get" accept-charset="utf-8">
    <ul>
        <li>
        <label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q"/>
        </li>
        <li><input class="submit" type="submit" value="Search"/></li>
    </ul>
</form>

If I run it with the standard command, tidy -f errs.txt -m index.html, I get this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 15.3.6), see www.w3.org">
<title></title>
</head>
<body>
<form action="?" method="get" accept-charset="utf-8">
<ul>
<li><label class="screenReader" for=
"q">Keywords</label><input type="text" name="q" value="" id=
"q"></li>
<li><input class="submit" type="submit" value="Search"></li>
</ul>
</form>
</body>
</html>

How can I omit all the extra stuff and actually get it to indent the code?

Forgive me if that's not a feature it's supposed to support. If it isn't, what library or tool am I looking for?


Solution

Use the indent, tidy-mark, and quiet options:

tidy \
  -indent \
  --indent-spaces 2 \
  -quiet \
  --tidy-mark no \
  index.html

Or, using a config file rather than command-line options:

indent: auto
indent-spaces: 2
quiet: yes
tidy-mark: no

Name it tidy_config.txt and save it in the same directory as the .html file. Run it like this:

tidy -config tidy_config.txt index.html
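If you prefer to keep this setup in a script, here is a minimal sketch that writes the same four options to tidy_config.txt with a heredoc and then runs tidy on index.html. The check that tidy is installed and that index.html exists is my addition, not part of the answer above.

```shell
#!/bin/sh
# Write the config file described above (same four options).
cat > tidy_config.txt <<'EOF'
indent: auto
indent-spaces: 2
quiet: yes
tidy-mark: no
EOF

# Run tidy with that config, but only if tidy is installed and the input exists.
if command -v tidy >/dev/null 2>&1 && [ -f index.html ]; then
  tidy -config tidy_config.txt index.html
fi
```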

For more customization, use the tidy man page to find other relevant options such as markup: no or force-output: yes.

OTHER TIPS

I didn't find a way to "only re-indent, without any changes". The following config file "repairs" as little as possible and (mostly) only re-indents the HTML. Tidy still corrects some error conditions, such as duplicated (repeated) attributes.

#based on http://tidy.sourceforge.net/docs/quickref.html
#HTML, XHTML, XML Options Reference
anchor-as-name: no  #?
doctype: omit
drop-empty-paras: no
fix-backslash: no
fix-bad-comments: no
fix-uri: no
hide-endtags: yes   #?
#input-xml: yes     #?
join-styles: no
literal-attributes: yes
lower-literals: no
merge-divs: no
merge-spans: no
output-html: yes
preserve-entities: yes
quote-ampersand: no
quote-nbsp: no
show-body-only: auto

#Diagnostics Options Reference
show-errors: 0
show-warnings: 0

#Pretty Print Options Reference
break-before-br: yes
indent: yes
indent-attributes: no   #default
indent-spaces: 4
tab-size: 4
wrap: 132
wrap-asp: no
wrap-jste: no
wrap-php: no
wrap-sections: no

#Character Encoding Options Reference
char-encoding: utf8

#Miscellaneous Options Reference
force-output: yes
quiet: yes
tidy-mark: no
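To apply this config to a whole site, one possible sketch is a small shell function around find. It assumes you saved the config above as reindent.conf (a hypothetical file name) and calls tidy once per file. Back up or commit your files first, because -m overwrites them.

```shell
# reindent_tree [CONF] (hypothetical helper): re-indent every .html file below
# the current directory in place, using the given tidy config file.
reindent_tree() {
  conf=${1:-reindent.conf}
  # One tidy invocation per file; -m writes the result back to that file.
  find . -type f -name '*.html' -exec tidy -config "$conf" -m {} \;
}
```

Usage: reindent_tree reindent.conf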

For example, the following HTML fragment

<div>
<div>
<p>
not closed para
<h1>
h1 head
</h1>
<ul>
<li>not closed li
<li>closed li</li>
</ul>
some text
</div>
</div>

is changed to

<div>
    <div>
        <p>
            not closed para
        <h1>
            h1 head
        </h1>
        <ul>
            <li>not closed li
            <li>closed li
            </ul>some text
    </div>
</div>

As you can see, hide-endtags: yes hides the closing </li> of the second bullet in the input. Setting hide-endtags: no produces the following instead:

<div>
    <div>
        <p>
            not closed para
        </p>
        <h1>
            h1 head
        </h1>
        <ul>
            <li>not closed li
            </li>
            <li>closed li
            </li>
        </ul>some text
    </div>
</div>

So tidy adds a closing </p> and a closing </li> to the first bullet.

I didn't find a way to preserve everything in the input and only re-indent the file.

You need the following options:

tidy --show-body-only yes -i --indent-spaces 4 -w 80 -m file.html

http://tidy.sourceforge.net/docs/quickref.html#show-body-only

-i - indent element content
--indent-spaces 4 - use 4 spaces per indentation level (by default tidy indents with spaces)
or
--indent-with-tabs yes - indent with tabs instead (--tab-size may affect wrapping)

-w 80 - wrap at column 80 (default on my system: 68, very narrow)

-m - modify the file in place

(you may want to leave out the last option, and examine the output first)
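Following that advice, here is a sketch of two hypothetical helper functions: preview_tidy shows tidy's changes as a unified diff without touching the file, and apply_tidy rewrites the file in place with the same options plus -m.

```shell
# preview_tidy FILE (hypothetical helper): print a unified diff of what tidy
# would change, without modifying FILE. tidy's diagnostics are discarded.
preview_tidy() {
  tidy --show-body-only yes -i --indent-spaces 4 -w 80 -quiet "$1" 2>/dev/null |
    diff -u "$1" -
}

# apply_tidy FILE (hypothetical helper): same options plus -m, which rewrites
# FILE in place. Run it only once the preview looks right.
apply_tidy() {
  tidy --show-body-only yes -i --indent-spaces 4 -w 80 -quiet -m "$1"
}
```

Usage: preview_tidy file.html | less, then apply_tidy file.html once you are happy with the diff.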

Showing only the body naturally leaves out the tidy-mark (the generator meta tag).

Another useful option is --quiet yes, which suppresses the W3C advertisements and other unnecessary output (errors are still reported).

To answer the poster's original question, using Tidy to just indent HTML code, here's what I use:

tidy --indent auto --quiet yes --show-body-only auto --show-errors 0 --wrap 0 input.html

input.html

<form action="?" method="get" accept-charset="utf-8">

<ul>
<li>
<label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q" />
</li>
<li><input class="submit" type="submit" value="Search" /></li>
</ul>


</form>

Output:

<form action="?" method="get" accept-charset="utf-8">
  <ul>
    <li><label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q"></li>
    <li><input class="submit" type="submit" value="Search"></li>
  </ul>
</form>

No extra HTML code added. Errors are suppressed. To find out what each option does, it's best to refer to the official reference.
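If you call this from scripts, note that tidy exits with status 1 when it only found warnings and 2 when it found errors, which aborts a script running under set -e. Here is a minimal sketch of a wrapper (indent_html is a hypothetical name) that reads HTML on standard input, writes the indented result to standard output, and treats statuses 0 to 2 as success. I've added --force-output yes, an option mentioned elsewhere on this page, so that output is still produced when tidy reports errors.

```shell
# indent_html (hypothetical helper): indent HTML from stdin to stdout.
indent_html() {
  # Options from the command above, plus --force-output yes.
  # "|| [ $? -le 2 ]" turns tidy's warning (1) and error (2) statuses into
  # success, while still failing on any other status.
  tidy --indent auto --quiet yes --show-body-only auto \
       --show-errors 0 --wrap 0 --force-output yes 2>/dev/null || [ $? -le 2 ]
}
```

Usage: indent_html < input.html > output.html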

I am very late to the party :)

But in your tidy config file, set

tidy-mark: no

By default this is set to yes.

With it set to no, tidy will not add the generator meta tag to your HTML.

If you'd like to simply format whatever HTML you receive, ignore errors, and indent the code nicely, this is a good one-liner using tidy (it reads from standard input):

tidy --show-body-only yes -i --indent-spaces 4 -quiet --force-output y -wrap 0 2>/dev/null

You can use it with curl too:

curl -s someUrl | tidy --show-body-only yes -i --indent-spaces 4 -quiet --force-output y -wrap 0 2>/dev/null
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow