Looking for a terminal command to parse a Mac OS X dictionary data file

Answer 1

grep operates on text files, but the Body.data files are not text files, unfortunately.

Your best bet is probably to write your own command-line utility in Xcode, as suggested in this thread (which includes sample code): https://discussions.apple.com/thread/2679911

Here is Apple's Dictionary Services documentation: https://developer.apple.com/library/mac/documentation/UserExperience/Conceptual/DictionaryServicesProgGuide/access/access.html#//apple_ref/doc/uid/TP40006152-CH5-SW1
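As a rough sketch of such a utility, here is a Python version of rdef in place of an Xcode project. It assumes macOS with the PyObjC bindings for Dictionary Services installed (the pyobjc-framework-DictionaryServices package) and calls DCSCopyTextDefinition from the API documented above. The script name rdef.py and the helper names are hypothetical, and the exact text returned depends on which dictionaries are active.

```python
import sys


def lookup(word):
    """Return the plain-text definition of `word` from the macOS dictionaries, or None.

    Assumes macOS with PyObjC's DictionaryServices bindings installed
    (e.g. pip install pyobjc-framework-DictionaryServices).
    """
    # Imported lazily so the rest of this file still loads without PyObjC.
    from DictionaryServices import DCSCopyTextDefinition

    # CFRange is passed as (location, length); the length is in UTF-16 code units.
    length = len(word.encode("utf-16-le")) // 2
    # Apple's docs say to pass NULL (None) as the dictionary; all active
    # dictionaries are then searched.
    definition = DCSCopyTextDefinition(None, word, (0, length))
    return None if definition is None else str(definition)


def format_definition(word, definition):
    """Format a result like the rdef output quoted in this answer."""
    if definition is None:
        return f"No definition found for <{word}>"
    return f"Definition of <{word}>: {definition}"


def rdef(word):
    return format_definition(word, lookup(word))


if __name__ == "__main__":
    # Hypothetical usage: python3 rdef.py 我
    for arg in sys.argv[1:]:
        print(rdef(arg))
```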

Update:

Assuming you've created a utility named rdef that prints something like 'Definition of <我>: | wǒ | I me my', the following awk command extracts the pinyin:

rdef "我" | awk -F ' *[|] *' '{ print $2 }'
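If you would rather do the parsing inside a script than in awk, the same split can be sketched in Python: the regular expression mirrors the -F ' *[|] *' separator, and the function returns what awk prints as $2. The function name pinyin_field is hypothetical.

```python
import re

# Same field separator as the awk -F argument: a '|' plus any spaces around it.
FIELD_SEP = re.compile(r" *\| *")


def pinyin_field(line):
    """Return the second '|'-separated field (awk's $2), or '' if there is none."""
    fields = FIELD_SEP.split(line.rstrip("\n"))
    return fields[1] if len(fields) > 1 else ""


print(pinyin_field("Definition of <我>: | wǒ | I me my"))  # prints: wǒ
```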

Alternatively, if an online solution is acceptable, you could use Google Translate.

At least in interactive use, it shows a pinyin transcription below the input field.

For instance, your example character is transcribed as "Wǒ":

http://translate.google.com/?text=%E6%88%91#zh-CN/en/%E6%88%91
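The link above is just the character's UTF-8 percent-encoding placed in both the query string and the fragment, so a link for any other character can be built the same way. This minimal sketch copies the URL layout from the example (Google may have changed it since); it only builds the link, and the pinyin itself is still rendered by the page in a browser. The function name translate_url is hypothetical.

```python
from urllib.parse import quote


def translate_url(text, source="zh-CN", target="en"):
    """Build a Google Translate link in the same shape as the example URL."""
    encoded = quote(text, safe="")  # UTF-8 percent-encoding, e.g. 我 -> %E6%88%91
    return f"http://translate.google.com/?text={encoded}#{source}/{target}/{encoded}"


print(translate_url("我"))
# prints: http://translate.google.com/?text=%E6%88%91#zh-CN/en/%E6%88%91
```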

Answer 2

I had a look at the Simplified Chinese and the Oxford English dictionaries, and both have a Contents directory and a Body.data file, as you say. However, if I run

file Body.data

it just reports "data" (rather than ASCII text or UTF-8 Unicode text), meaning the file is binary rather than text, so grep and its friends will not work well on it at all.

In case anyone is good at identifying a file type from a hex dump, the files start like this:

0000000      0000    0000    0000    0000    0000    0000    0000    0000
          \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0000100      c9a8    0106    0000    0000    ffff    ffff    0020    0000
         250 311 006 001  \0  \0  \0  \0 377 377 377 377      \0  \0  \0
0000120      0000    0000    0207    0000    ffff    ffff    ffff    ffff
          \0  \0  \0  \0  \a 002  \0  \0 377 377 377 377 377 377 377 377
0000140      8009    0000    8005    0000    8c22    0004    9c78    bddc
          \t 200  \0  \0 005 200  \0  \0   " 214 004  \0   x 234   ܽ  **
0000160      6c6b    db1b    2f7e    e416    49a6    349a    c5b8    902d
           k   l 033 333   ~   / 026 344 246   I 232   4 270 305   - 220
0000200      fda2    7134    7880    d4ef    2cb6    96d9    9dad    f673
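One clue in the dump: on the 0000140 line, the character column shows "x 234", i.e. the bytes 0x78 0x9C at offset 0x6C. That pair is the standard zlib header for the default compression level, which suggests (but does not prove) that Body.data is made of zlib-compressed chunks, and would explain why file reports "data" and grep finds nothing. Under that assumption, the Python sketch below scans for the header, tries to decompress each candidate, and writes whatever decompresses cleanly to standard output so it can be piped into grep. The script name dump_body.py and the function names are hypothetical.

```python
import sys
import zlib

ZLIB_HEADER = b"\x78\x9c"  # zlib header at the default compression level


def iter_zlib_chunks(data):
    """Yield the decompressed contents of each zlib stream embedded in `data`.

    Assumption: Body.data holds independent zlib streams (hinted at by the
    78 9c bytes in the dump). No record layout is assumed; every occurrence
    of the header is simply tried.
    """
    view = memoryview(data)  # zero-copy slicing of a large file
    pos = data.find(ZLIB_HEADER)
    while pos != -1:
        decomp = zlib.decompressobj()
        try:
            out = decomp.decompress(view[pos:])
        except zlib.error:
            out = None  # the two bytes only looked like a header
        if out is not None and decomp.eof:
            yield out
            # Continue searching right after the end of this stream.
            pos = len(data) - len(decomp.unused_data)
        else:
            pos += 1  # false positive or truncated stream: keep scanning
        pos = data.find(ZLIB_HEADER, pos)


def main(paths):
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        for chunk in iter_zlib_chunks(data):
            sys.stdout.buffer.write(chunk)
            sys.stdout.buffer.write(b"\n")


if __name__ == "__main__":
    # Hypothetical usage: python3 dump_body.py Body.data | grep -a 'wǒ'
    main(sys.argv[1:])
```

Skipping past each successfully decoded stream keeps the scan from matching header-like bytes inside compressed data, and no assumption is made about the length fields that appear to precede each stream. A stray false positive could in principle still decode, so treat odd fragments in the output with suspicion.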