Question

I'm using an NSTask to grab the output from /usr/bin/man. I'm getting the output but without formatting (bold, underline). Something that should appear like this:

Bold text with underline

(note the italic text is actually underlined, there's just no formatting for it here)

Instead gets returned like this:

BBoolldd text with _u_n_d_e_r_l_i_n_e

I have a minimal test project at http://cl.ly/052u2z2i2R280T3r1K3c that you can download and run; note the window does nothing; the output gets logged to the Console.

I presume I need to somehow interpret the NSData object manually but I have no idea where to start on that. I'd ideally like to translate it to an NSAttributedString but the first order of business is actually eliminating the duplicates and underscores. Any thoughts?

Was it helpful?

Solution

What is your actual purpose? If you want to show a man page, one option is to convert it to HTML and render it with a Web view.

Parsing man’s output can be tricky because it is processed by groff using a terminal processor by default. This means that the output is tailored to be shown on terminal devices.

One alternative solution is to determine the actual location of the man page source file, e.g.

$ man -w bash
/usr/share/man/man1/bash.1.gz

and manually invoke groff on it with -a (ASCII approximation) and -c (disable colour output), e.g.

$ gunzip -c /usr/share/man/man1/bash.1.gz | groff -c -a -Tascii -man

This will result in an ASCII file without most of the formatting. To generate HTML output,

$ gunzip -c /usr/share/man/man1/bash.1.gz | groff -Thtml -man

You can also specify these options in a custom configuration file for man, e.g. parseman.conf, and tell man to use that configuration file with the -C option instead of invoking man -w, gunzip, and groff. The default configuration file is /private/etc/man.conf.

Also, you can probably tailor the output of the terminal device processor by passing appropriate options to grotty.

OTHER TIPS

Okay, here's the start of my solution, though I would be interested in any additional (easier?) ways to do this.

The output returned from the Terminal is UTF-8 encoding, but the NSUTF8StringEncoding doesn't interpret the string properly. The reason is the way NSTask output is formatted.

The letter N is 0x4e in UTF-8. But the NSData corresponding to that is 0x4e 0x08 0x4e. 0x08 corresponds to a Backspace. So for a bold letter, Terminal prints letter-backspace-letter.

For an italic c, it's 0x63 in UTF-8. The NSData contains 0x5f 0x08 0x63, with 0x5f corresponding to an underscore. So for italics, Terminal prints underscore-backspace-letter.

I really don't see any way around this at this point besides just scanning the raw NSData for these sequences. I'll probably post the source to my parser here once I finish it, unless anybody has any existing code. As the common programming phrase goes, never write yourself what you can copy. :)

Follow-Up:

I've got a good, fast parser together for taking man output and replacing the bold/underlined output with bold/underlined formatting in an NSMutableAttributedString. Here's the code if anybody else needs to solve the same problem:

NSMutableIndexSet *boldChars = [[NSMutableIndexSet alloc] init];
NSMutableIndexSet *underlineChars = [[NSMutableIndexSet alloc] init];

char* bBytes = malloc(1);
bBytes[0] = (char)0x08;
NSData *bData = [NSData dataWithBytes:bBytes length:1];
free(bBytes); bBytes = nil;
NSRange testRange = NSMakeRange(1, [inputData length] - 1);
NSRange bRange = NSMakeRange(0, 0);

do {
    bRange = [inputData rangeOfData:bData options:(NSDataSearchOptions)NULL range:testRange];
    if (bRange.location == NSNotFound || bRange.location > [inputData length] - 2) break;
    const char * buff = [inputData bytes];

    if (buff[bRange.location - 1] == 0x5f) {

        // it's an underline
        //NSLog(@"Undr %c\n", buff[bRange.location + 1]);
        [inputData replaceBytesInRange:NSMakeRange(bRange.location - 1, 2) withBytes:NULL length:0];
        [underlineChars addIndex:bRange.location - 1];
        testRange = NSMakeRange(bRange.location, [inputData length] - (bRange.location));

    } else if (buff[bRange.location - 1] == buff[bRange.location + 1]) {

        // It's a bold
        //NSLog(@"Bold %c\n", buff[bRange.location + 1]);
        [inputData replaceBytesInRange:NSMakeRange(bRange.location - 1, 2) withBytes:NULL length:0];
        [boldChars addIndex:bRange.location - 1];
        testRange = NSMakeRange(bRange.location, [inputData length] - (bRange.location));

    } else {

        testRange.location = bRange.location + 1;
        testRange.length = [inputData length] - testRange.location;
    }
} while (testRange.location <= [inputData length] - 3);

NSMutableAttributedString *str = [[NSMutableAttributedString alloc] initWithString:[[NSString alloc] initWithData:inputData encoding:NSUTF8StringEncoding]];

NSFont *font = [NSFont fontWithDescriptor:[NSFontDescriptor fontDescriptorWithName:@"Menlo" size:12] size:12];
NSFont *boldFont = [[NSFontManager sharedFontManager] convertFont:font toHaveTrait:NSBoldFontMask];

[str addAttribute:NSFontAttributeName value:font range:NSMakeRange(0, [str length])];

__block NSUInteger begin = [underlineChars firstIndex];
__block NSUInteger end = begin;
[underlineChars enumerateIndexesUsingBlock:^(NSUInteger idx, BOOL *stop) {
    if (idx - end < 2) {
        // it's the next item to the previous one
        end = idx;
    } else {
        // it's a split, so drop in the accumulated range and reset
        [str addAttribute:NSUnderlineStyleAttributeName value:[NSNumber numberWithInt:NSSingleUnderlineStyle] range:NSMakeRange(begin, (end-begin)+1)];
        begin = idx;
        end = begin;
    }
    if (idx == [underlineChars lastIndex]) {
        [str addAttribute:NSUnderlineStyleAttributeName value:[NSNumber numberWithInt:NSSingleUnderlineStyle] range:NSMakeRange(begin, (end-begin)+1)];
    }
}];

begin = [boldChars firstIndex];
end = begin;
[boldChars enumerateIndexesUsingBlock:^(NSUInteger idx, BOOL *stop) {
    if (idx - end < 2) {
        // it's the next item to the previous one
        end = idx;
    } else {
        // it's a split, so drop in the accumulated range and reset
        [str addAttribute:NSFontAttributeName value:boldFont range:NSMakeRange(begin, (end-begin)+1)];
        begin = idx;
        end = begin;
    }
    if (idx == [underlineChars lastIndex]) {
        [str addAttribute:NSFontAttributeName value:boldFont range:NSMakeRange(begin, (end-begin)+1)];
    }
}];

Another method would be to convert the man page to PostScript source code, run that through the PostScript-to-PDF converter, and put that into a PDFView.

The implementation would be similar to Bavarious's answer, just with different arguments to groff (-Tps instead of -Thtml).

This would be the slowest solution, but also probably the best for printing.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top