Problem

Mac OS X comes with dictionaries stored in /Library/Dictionaries. I would like to parse them to obtain dictionary results programmatically (via Terminal, AppleScript, or Automator). The dictionaries are Mac OS X packages, and each one has a Contents folder containing a file called Body.data. I would like to search that file for a UTF-8 string (for example, a Chinese character, which takes several bytes in UTF-8) and return the lines where the string is found.

I've tried the following, which is not returning any results:

find . -name 'Body.data' -exec grep -li '我' {} \;

When I search through the dictionary using the app interface I can find the appropriate text. My objective is to create a workflow service to translate selected Chinese text into the pinyin equivalents which are stored in the system/user dictionaries.

Update

The following worked for me based on the accepted answer:

Created and archived a command-line utility called rdef in Xcode with this code:

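#import <Foundation/Foundation.h>
#import <CoreServices/CoreServices.h> // declares DCSCopyTextDefinition (Dictionary Services)

int main(int argc, const char * argv[])
{
    @autoreleasepool {

        if (argc < 2)
        {
            fprintf(stderr, "Usage: rdef <word to define>\n");
            return -1;
        }

        // Interpret the argument as UTF-8 so Chinese characters survive.
        NSString * search =
        [NSString stringWithCString: argv[1] encoding: NSUTF8StringEncoding];

        // DCSCopyTextDefinition follows the Core Foundation "Copy" rule, so
        // CFBridgingRelease hands ownership to ARC (the original leaked it).
        // Apple's docs say to pass NULL for the dictionary argument.
        NSString * def = CFBridgingRelease(
            DCSCopyTextDefinition(NULL,
                                  (__bridge CFStringRef)search,
                                  CFRangeMake(0, [search length])));

        // No entry found: report it instead of printing "(null)".
        if (def == nil)
        {
            fprintf(stderr, "No definition found for <%s>\n", argv[1]);
            return 1;
        }

        NSString * output =
        [NSString stringWithFormat: @"Definition of <%@>: %@", search, def];

        printf("%s\n", [output UTF8String]);
    }
    return 0;
}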

Added the required frameworks to the project's linked frameworks (DCSCopyTextDefinition is part of the CoreServices framework).

Built the project and then deployed it manually:

Right-clicked the archived package and chose Show in Finder, then Show Package Contents, drilled down into the Products folder, and copied the executable to /usr/local/bin. Now I can run the utility from a command prompt like so:

rdef 我|awk -F '\|' '{ gsub(/^ +| +$/, "", $2); print $2 }'

Please see the accepted answer below for extended references.

NB: The GitHub repository for the utility is at https://github.com/mingsai/rdef.git

Next, I'll create an Automator service that runs the utility on the selected text.

Service Solution

To pay it forward to the folks who helped, especially @mklement0, here is how to turn the command-line utility into a Mac OS X service that converts Chinese characters to pinyin.

Create a new Automator Service and make sure "Output replaces selected text" is checked.

Automator script details (a Run Shell Script action using bash, with input passed as arguments so the selected text arrives in $1):

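PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin
export PATH
# UTF-8 locale so ${#x} and ${x:i:1} count characters, not bytes (exported for awk too).
export LC_CTYPE=UTF-8
x=$1

# Look up each character on its own and print the pinyin field from the first
# line of rdef's output ("Definition of <我>: | wǒ | I me my"), space-separated.
for ((i = 0; i < ${#x}; i++)); do
  rdef "${x:i:1}" | awk -F ' *[|] *' 'BEGIN { ORS = " " } { if (length($2) > 0) print $2; exit }'
done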
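If you want to try the per-character loop outside Automator, here is a minimal sketch that wraps it in a function. It assumes bash (for ${#text} and ${text:i:1}) and takes the lookup command as a parameter; to_pinyin and fake_rdef are hypothetical names, with fake_rdef standing in for the real rdef so the loop can be exercised without the utility. Unlike the service script, it joins results with single spaces and leaves no trailing space.

```bash
#!/usr/bin/env bash
# to_pinyin LOOKUP TEXT  (hypothetical helper, not part of rdef)
# Runs LOOKUP on each character of TEXT and keeps the second '|'-separated
# field of the first output line, i.e. the pinyin in the format
# "Definition of <我>: | wǒ | I me my".
to_pinyin() {
  local lookup=$1 text=$2 result="" py i
  for ((i = 0; i < ${#text}; i++)); do
    py=$("$lookup" "${text:i:1}" | awk -F ' *[|] *' 'NR == 1 { print $2; exit }')
    # Skip characters for which no pinyin field came back.
    if [ -n "$py" ]; then
      result="${result:+$result }$py"
    fi
  done
  printf '%s\n' "$result"
}

# Hypothetical stand-in for rdef, so the function can be tried anywhere.
fake_rdef() {
  case $1 in
    a) echo 'Definition of <a>: | wǒ | I me my' ;;
    b) echo 'Definition of <b>: | nǐ | you' ;;
    *) echo 'no entry' ;;
  esac
}
```

With the real utility you would export LC_CTYPE to a UTF-8 locale first (as the service script does) and then call to_pinyin rdef with the selected text.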

To make the service live, delete the "Ask for Text" action and save the service under a name of your choice (e.g. "Convert to Pinyin").

To use the service, highlight some Chinese characters, right-click to open the context menu, and choose "Convert to Pinyin" from the Services submenu at the bottom.

Hope that helps anyone with this problem.

Solution

grep searches for the raw bytes of your pattern, so it only helps when the text is stored as-is; unfortunately, the Body.data files are binary rather than plain text.

Your best bet is probably to create your own command-line utility in Xcode, as suggested here (sample code): https://discussions.apple.com/thread/2679911

Here's Apple's dictionary API documentation: https://developer.apple.com/library/mac/documentation/UserExperience/Conceptual/DictionaryServicesProgGuide/access/access.html#//apple_ref/doc/uid/TP40006152-CH5-SW1

Update:

Assuming you've created a utility named rdef that returns something like 'Definition of <我>: | wǒ | I me my', use the following awk command to parse out the pinyin:

rdef "我" | awk -F ' *[|] *' '{ print $2 }'
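Because the -F value is longer than one character, awk treats it as a regular expression: ' *[|] *' matches a literal pipe together with any spaces around it, so $2 comes out already trimmed and no gsub step is needed. A quick way to check the parse without rdef is to feed it a sample line in the format quoted above; extract_pinyin is a hypothetical wrapper name.

```bash
#!/usr/bin/env bash
# extract_pinyin (hypothetical wrapper around the awk one-liner above).
# Splits each stdin line on "a pipe plus surrounding spaces" and prints
# field 2, which is already trimmed as a side effect of the split.
extract_pinyin() {
  awk -F ' *[|] *' '{ print $2 }'
}

# Sample line in the format rdef is assumed to produce.
printf '%s\n' 'Definition of <我>: | wǒ | I me my' | extract_pinyin
```

Like the original command, this prints $2 for every input line, so a multi-line definition yields extra (possibly empty) lines; stopping after the first line with exit, as the service script does, avoids that.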

Alternatively, if an online solution is an option, you could try Google Translate.

At least in interactive use, you get a pinyin transcription below the input field.

For instance, your example symbol is transcribed as "Wǒ":

http://translate.google.com/?text=%E6%88%91#zh-CN/en/%E6%88%91
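To build that link for arbitrary text from a script, the characters have to be percent-encoded as UTF-8 bytes; %E6%88%91 is simply the three bytes E6 88 91 that encode 我. Here is a minimal sketch using only od, tr and sed. urlencode and translate_url are hypothetical helper names, and it assumes Google still accepts this URL layout, which may have changed.

```bash
#!/usr/bin/env bash
# urlencode (hypothetical helper): percent-encode every byte of $1.
# Encoding unreserved ASCII bytes too is valid, just verbose.
urlencode() {
  printf '%s' "$1" |
    od -An -v -tx1 |   # one hex pair per byte, e.g. " e6 88 91"
    tr -d ' \n' |      # -> "e68891"
    sed 's/../%&/g' |  # -> "%e6%88%91"
    tr 'a-f' 'A-F'     # -> "%E6%88%91"
}

# translate_url (hypothetical): a link in the same shape as the example above.
translate_url() {
  local enc
  enc=$(urlencode "$1")
  printf 'http://translate.google.com/?text=%s#zh-CN/en/%s\n' "$enc" "$enc"
}

translate_url '我'
```

On a Mac you could pass the result to open to show the page in the default browser.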

Other tips

I had a look at the Simplified Chinese and Oxford English dictionaries, and both have a Contents folder and a Body.data file, as you say. However, if I run

file Body.data

it just says "data" (rather than ASCII text or UTF-8 Unicode text), meaning the file is binary rather than text, so grep and its friends are not going to work well on it at all.

In case anyone is good at spotting a file type from a hex dump, the files start off like this:

0000000      0000    0000    0000    0000    0000    0000    0000    0000
          \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0000100      c9a8    0106    0000    0000    ffff    ffff    0020    0000
         250 311 006 001  \0  \0  \0  \0 377 377 377 377      \0  \0  \0
0000120      0000    0000    0207    0000    ffff    ffff    ffff    ffff
          \0  \0  \0  \0  \a 002  \0  \0 377 377 377 377 377 377 377 377
0000140      8009    0000    8005    0000    8c22    0004    9c78    bddc
          \t 200  \0  \0 005 200  \0  \0   " 214 004  \0   x 234   ܽ  **
0000160      6c6b    db1b    2f7e    e416    49a6    349a    c5b8    902d
           k   l 033 333   ~   / 026 344 246   I 232   4 270 305   - 220
0000200      fda2    7134    7880    d4ef    2cb6    96d9    9dad    f673
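One observation on the dump above: at octal offset 0000154 (byte 108) the bytes are 78 9C (od -x shows them byte-swapped as 9c78), and 0x78 0x9C is the usual header of a zlib stream at default compression. That suggests, though it does not prove, that the entries in Body.data are zlib-compressed, which would also explain why grep never finds the UTF-8 bytes. The sketch below lists every offset where that byte pair occurs; find_zlib_headers is a hypothetical name, and each hit is only a candidate, since 78 9C can also appear by chance inside compressed data.

```bash
#!/usr/bin/env bash
# find_zlib_headers FILE (hypothetical helper)
# Prints the decimal byte offset of every 0x78 0x9C pair in FILE.
find_zlib_headers() {
  od -An -v -tx1 "$1" |
    tr -s ' \n' '\n\n' |  # one hex byte per line (plus a stray blank line)
    awk 'NF {
      if (prev == "78" && $1 == "9c") print n - 1  # pair starts one byte back
      prev = $1
      n++
    }'
}

# Example (substitute a real dictionary bundle name):
# find_zlib_headers "/Library/Dictionaries/<name>.dictionary/Contents/Body.data"
```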

License: CC BY-SA, with attribution
Not affiliated with Stack Overflow