Question

Simple question - I've got a bucketload of cruddy HTML pages to clean up and I'm looking for an open source or freeware script/utility to remove any junk and reformat them into nicely laid out, consistent code. Any recommendations?

If it's relevant, I generally manipulate HTML inside Dreamweaver - but by editing the code and using the WYSIWYG window as a preview rather than vice versa - so a Dreamweaver-compatible script would be a plus.

Solution

I don't think it plugs into Dreamweaver, but whenever I need HTML cleaned up, HTML Tidy is my go-to guy.
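For a whole folder of pages, something like the following Python sketch could drive the command-line `tidy` over every file. It assumes a `tidy` executable is on your PATH; the function names, the 120-column wrap and the choice of flags (`-quiet`, `-indent`, `-wrap`, `-modify`) are just my picks for illustration, so check them against your Tidy version and back up the files first, since `-modify` overwrites them in place.

```python
import shutil
import subprocess
from pathlib import Path


def tidy_command(path, wrap=120):
    """Build the argument list for cleaning one file in place with HTML Tidy."""
    return [
        "tidy",
        "-quiet",            # suppress the summary banner
        "-indent",           # indent nested elements
        "-wrap", str(wrap),  # wrap long lines at this column
        "-modify",           # overwrite the input file with the cleaned markup
        str(path),
    ]


def tidy_tree(root, wrap=120):
    """Run tidy on every .html/.htm file under root.

    Returns the list of files for which tidy reported errors.
    """
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    failed = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix.lower() not in {".html", ".htm"}:
            continue
        result = subprocess.run(tidy_command(path, wrap),
                                capture_output=True, text=True)
        # tidy exits with 0 (clean), 1 (warnings only) or 2 (errors)
        if result.returncode >= 2:
            failed.append(path)
    return failed
```

Calling `tidy_tree("site/")` (a hypothetical folder name) would then clean everything below that folder and hand back the files that need a manual look.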

OTHER TIPS

I second HTML Tidy.
I just wanted to add that it is a library with various ports and bindings. As such, it is also integrated into some editors like HTML-Kit or NoteTab, and it has a GUI front end. All these are linked in the page given above.
Note also that the W3C Markup Validation Service has an option to "Clean up Markup with HTML Tidy" (after validation result display).

Dreamweaver CS3 has a built-in "Clean Up HTML" choice under the "Commands" menu item. I don't think it is nearly as comprehensive as HTML Tidy, though.

From the Adobe site:

Clean up code

You can automatically remove empty tags, combine nested font tags, and otherwise improve messy or unreadable HTML or XHTML code.

For information on how to clean up HTML generated from a Microsoft Word document, see Open and edit existing documents.

  1. Open a document:

    • If the document is in HTML, select Commands > Clean Up HTML.
    • If the document is in XHTML, select Commands > Clean Up XHTML. -- For an XHTML document, the Clean Up XHTML command fixes XHTML syntax errors, sets the case of tag attributes to lowercase, and adds or reports the missing required attributes for a tag in addition to performing the HTML cleanup operations.
  2. In the dialog box that appears, select any of the options, and click OK. -- Note: Depending on the size of your document and the number of options selected, it may take several seconds to complete the cleanup.

Remove Empty Container Tags Removes any tags that have no content between them. For example, <b></b> and <font color="#FF0000"></font> are empty tags, but the <b> tag in <b>some text</b> is not.

Remove Redundant Nested Tags Removes all redundant instances of a tag. For example, in the code <b>This is what I <b>really</b> wanted to say</b>, the b tags surrounding the word really are redundant and would be removed.

Remove Non-Dreamweaver HTML Comments Removes all comments that were not inserted by Dreamweaver. For example, <!--begin body text--> would be removed, but <!-- TemplateBeginEditable name="doctitle" --> wouldn’t, because it’s a Dreamweaver comment that marks the beginning of an editable region in a template.

Remove Dreamweaver Special Markup Removes comments that Dreamweaver adds to code to allow documents to be automatically updated when templates and library items are updated. If you select this option when cleaning up code in a template-based document, the document is detached from the template. For more information, see Detach a document from a template.

Remove Specific Tag(s) Removes the tags specified in the adjacent text box. Use this option to remove custom tags inserted by other visual editors and other tags that you don’t want to appear on your site (for example, blink). Separate multiple tags with commas (for example, font,blink).

Combine Nested <font> Tags When Possible Consolidates two or more font tags when they control the same range of text. For example, <font size="7"><font color="#FF0000">big red</font></font> would be changed to <font size="7" color="#FF0000">big red</font>.

Show Log On Completion Displays an alert box with details about the changes made to the document as soon as the cleanup is finished.
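To make a couple of those options concrete, here is a rough Python sketch of what "Remove Empty Container Tags" and "Combine Nested <font> Tags" amount to. This is not Dreamweaver's code: the function names, the regexes and the short list of inline tags are my own assumptions, and regexes are a blunt tool for HTML, so treat it as an illustration of the rules rather than a replacement for the command.

```python
import re

# Hypothetical list of formatting tags that are safe to drop when empty.
INLINE_TAGS = "b|i|u|em|strong|font|span"

# An inline element with nothing between its open and close tag.
EMPTY_TAG = re.compile(
    rf"<({INLINE_TAGS})(?:\s[^<>]*)?></\1\s*>", re.IGNORECASE
)

# A <font> whose only child is another <font> with no further font tags inside.
NESTED_FONT = re.compile(
    r"<font((?:\s[^<>]*)?)><font((?:\s[^<>]*)?)>"
    r"((?:(?!</?font\b).)*)</font></font>",
    re.IGNORECASE | re.DOTALL,
)


def remove_empty_tags(html):
    """Delete empty inline elements, repeating until nothing changes."""
    previous = None
    while previous != html:
        previous = html
        html = EMPTY_TAG.sub("", html)
    return html


def combine_nested_fonts(html):
    """Merge directly nested <font> tags into one tag carrying both attribute sets."""
    previous = None
    while previous != html:
        previous = html
        html = NESTED_FONT.sub(r"<font\1\2>\3</font>", html)
    return html
```

Fed the examples from the Adobe text, `remove_empty_tags` drops `<b></b>` and `<font color="#FF0000"></font>` but leaves `<b>some text</b>` alone, and `combine_nested_fonts` turns the nested "big red" example into a single font tag.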

I use the HTML Formatter...it does exactly what you are looking for.

I definitely think the best tool out there is the HTML Formatter from Logichammer.com. It does exactly what you need and is dead simple to use. Worth checking out...the guy even has a video on his site showing how easy it is to use. I've been using it for two years now and couldn't live without it...I get lots of messy code.

I use Cleanup HTML. It does the job well, cleaning and formatting HTML.

I would suggest purehtml.in...it beautifies HTML, style and JavaScript tags...

You can even buffer your existing HTML through HTML Tidy before it reaches the browser - if it's a low traffic site, then this will make things neat without any effort.
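As a minimal sketch of that idea in Python: pipe the finished page through the `tidy` command line just before sending it. The function name `tidy_filter` is made up for illustration, it assumes a `tidy` executable on the PATH, and it deliberately falls back to the untouched markup if Tidy is missing or gives up on the page. Spawning a process per request is only sensible on a low-traffic site.

```python
import shutil
import subprocess


def tidy_filter(html):
    """Return html cleaned by HTML Tidy, or the original string if that fails."""
    if shutil.which("tidy") is None:
        return html  # tidy not installed: serve the page unchanged
    result = subprocess.run(
        ["tidy", "-quiet", "-indent", "-utf8"],
        input=html,
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    # Exit code 2 means tidy hit errors; without forcing output, stdout is then empty.
    if result.returncode >= 2 or not result.stdout.strip():
        return html
    return result.stdout
```

You would call `tidy_filter(page)` on the rendered page string at whatever point your framework hands you the final response body.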

I too recommend HTML Tidy. Whilst it's not maintained by Dave Raggett anymore, the tool is definitely being updated frequently with tweaks.

I use HTML Trim, which is a Win32 app, to clean up some awful autogenerated blobs of code that some of our devs knock up.

You can also grab the command line version, which you may be able to integrate into Dreamweaver.

Sorry, I can't post more than one hyperlink - still a n00b here.

I've been using Polystyle for a long time, and I'm quite happy. It's fairly flexible about formatting rules and costs around $15. A trial version is available.

I would recommend vim. You can format a block of code by pressing v to select the block in visual mode and then = to re-indent it.

Licensed under: CC-BY-SA with attribution