Question

I'm currently building a comprehensive open source documentation website that spans multiple distros, projects, sources, and operating systems.

At the moment I am at 15GB of text and growing, largely because of the conversion process needed to get content out of the arcane format of standard man pages (no disrespect intended; man pages rock).

There have to be more modern source formats I can get the man page content from. Devs aren't still writing documentation in the 50-year-old *roff format, are they?

I desperately need to eliminate as many man page sources as I can, so here's the list so far. If any devs contributing to these projects can clarify whether the man pages are written raw or generated from another text format as part of the build process, that would be a huge help.

Places I hope aren't still using ancient *roff-based doc files:

  • FreeBSD
  • NetBSD
  • OpenBSD
  • Debian
  • Ubuntu
  • Gentoo
  • Arch
  • Slackware (I could actually use a better source for Slackware docs; docs.slackware.com is rather anemic compared to other distros)
  • SUSE
  • Red Hat Enterprise Linux

So, as a programmer who's spent far too much time wrestling with groff -mandoc 'file' -T html errors in OS doc file sets, I need help and am turning to SO.

Option 1 is alternative, far more parse-friendly sources for the doc content.

Option 2 is better ways to reformat man page content than running

find . -type f -exec sh -c 'cd "$(dirname "$0")" && groff -mandoc "$(basename "$0")" -T html > "$(basename "$0").html"' {} \;

or

find . -type f -exec sh -c 'cd "$(dirname "$0")" && man2html "$(basename "$0")" > "$(basename "$0").html"' {} \;

and hoping I don't get a single error in several thousand files. Both approaches also pretty much throw away Unicode multilingual support.

If searching Google for technical content involving the words 'man' and 'documentation' weren't a nearly useless endeavour, I might have found a solution myself, but it is, so here I am.

Solution

I'm not sure I understand your actual problem.

But today the man pages of many actively maintained projects are generated from DocBook (XML) sources, which can be found in the projects' source tarballs. In those projects the documentation is written and maintained in XML, not in *roff.
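
Generated man pages usually say so in their leading roff comments, which gives a rough way to tell which pages in an existing collection have a friendlier upstream source. The following is a minimal sketch, not a definitive classifier: the function name man_page_origin is hypothetical, the three patterns are assumptions based on the header comments that the DocBook XSL stylesheets, Pod::Man, and help2man commonly write, and it expects uncompressed files.

```shell
#!/bin/sh
# Guess which tool produced a man page by scanning its first lines.
# Assumption: generators leave an identifying roff comment near the top.
# Hypothetical helper; prints docbook, pod, help2man, or unknown.
man_page_origin() {
    header=$(head -n 20 "$1")
    case $header in
        *"DocBook XSL Stylesheets"*) echo docbook ;;
        *"generated by Pod::Man"*)   echo pod ;;
        *"generated by help2man"*)   echo help2man ;;
        *)                           echo unknown ;;
    esac
}
```

A result of unknown only means none of these markers was found; the page may be hand-written or come from a generator not listed here.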

Those DocBook files are converted by tools such as xmlto (or docbook2man) into man pages, HTML pages, PDF, and other formats.
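
As a rough illustration of going from the XML straight to HTML and skipping the *roff step entirely, here is a sketch under stated assumptions. The function name docbook_tree_to_html and the XMLTO override variable are hypothetical, and the script assumes man page sources can be recognised by the DocBook refentry element appearing literally as "<refentry" in .xml files.

```shell
#!/bin/sh
# Convert every DocBook refentry found under a source tree to single-file HTML.
# XMLTO may be overridden, for example XMLTO=echo for a dry run.
XMLTO=${XMLTO:-xmlto}

# Hypothetical helper: $1 is the unpacked source tree, $2 the output directory.
docbook_tree_to_html() {
    src=$1
    out=$2
    mkdir -p "$out" || return 1
    # List .xml files that contain a <refentry> element, then convert each one.
    find "$src" -type f -name '*.xml' -exec grep -l '<refentry' {} + |
    while IFS= read -r xml; do
        # Report a failure and keep going instead of aborting the whole run.
        "$XMLTO" -o "$out" html-nochunks "$xml" || echo "failed: $xml" >&2
    done
}
```

Called as docbook_tree_to_html ./some-project ./html, this writes one HTML file per refentry source. Files that share a base name would overwrite each other in the output directory, so a real run would probably want one output directory per project.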

Licensed under: CC-BY-SA with attribution