Question

I downloaded the source package of kakasi from: http://packages.ubuntu.com/precise/kakasi

plee@sos:~/Japanese/kakasi$ l
total 0
plee@sos:~/Japanese/kakasi$ wget http://archive.ubuntu.com/ubuntu/pool/universe/k/kakasi/kakasi_2.3.5~pre1+cvs20071101.orig.tar.gz
--2012-10-08 11:01:00--  http://archive.ubuntu.com/ubuntu/pool/universe/k/kakasi/kakasi_2.3.5~pre1+cvs20071101.orig.tar.gz
Resolving archive.ubuntu.com... 91.189.92.183, 91.189.92.184, 91.189.92.188, ...
Connecting to archive.ubuntu.com|91.189.92.183|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1329263 (1.3M) [application/x-gzip]
Saving to: 'kakasi_2.3.5~pre1+cvs20071101.orig.tar.gz'

100%[======================================================================================================================================================================>] 1,329,263    705K/s   in 1.8s    

2012-10-08 11:01:27 (705 KB/s) - 'kakasi_2.3.5~pre1+cvs20071101.orig.tar.gz' saved [1329263/1329263]

Then I uncompressed it:

plee@sos:~/Japanese/kakasi$ tar zxvf kakasi_2.3.5~pre1+cvs20071101.orig.tar.gz 
kakasi-2.3.5pre1/
kakasi-2.3.5pre1/maintMakefile
kakasi-2.3.5pre1/INSTALL-ja
kakasi-2.3.5pre1/ONEWS
kakasi-2.3.5pre1/kakasi.spec.in
kakasi-2.3.5pre1/src/
kakasi-2.3.5pre1/src/a2.c
kakasi-2.3.5pre1/src/k2.c
kakasi-2.3.5pre1/src/jj2.c
kakasi-2.3.5pre1/src/itaiji.c
kakasi-2.3.5pre1/src/getopt1.c
kakasi-2.3.5pre1/src/getopt.h
kakasi-2.3.5pre1/src/dict.c
kakasi-2.3.5pre1/src/kakasi.c
kakasi-2.3.5pre1/src/kk2.c
kakasi-2.3.5pre1/src/mkkanwa.c
kakasi-2.3.5pre1/src/atoc-conv.c
kakasi-2.3.5pre1/src/conv-util.h
kakasi-2.3.5pre1/src/78_83.c
kakasi-2.3.5pre1/src/conv-util.c
kakasi-2.3.5pre1/src/level.h
kakasi-2.3.5pre1/src/rdic-conv.c
kakasi-2.3.5pre1/src/ee2.c
kakasi-2.3.5pre1/src/Makefile.am
kakasi-2.3.5pre1/src/Makefile.in
kakasi-2.3.5pre1/src/g2.c
kakasi-2.3.5pre1/src/j2.c
kakasi-2.3.5pre1/src/hh2.c
kakasi-2.3.5pre1/src/kakasi.h
kakasi-2.3.5pre1/src/wx2-conv.c
kakasi-2.3.5pre1/src/level.c
kakasi-2.3.5pre1/src/kanjiio.c
kakasi-2.3.5pre1/src/getopt.c
kakasi-2.3.5pre1/config.guess
kakasi-2.3.5pre1/config.rpath
kakasi-2.3.5pre1/INSTALL
kakasi-2.3.5pre1/configure.in
kakasi-2.3.5pre1/AUTHORS
kakasi-2.3.5pre1/config.sub
kakasi-2.3.5pre1/NEWS
kakasi-2.3.5pre1/configure
kakasi-2.3.5pre1/tests/
kakasi-2.3.5pre1/tests/kakasi-6
kakasi-2.3.5pre1/tests/kakasi-5
kakasi-2.3.5pre1/tests/env.sh
kakasi-2.3.5pre1/tests/kakasi-2
kakasi-2.3.5pre1/tests/kakasi-1
kakasi-2.3.5pre1/tests/kakasi-7
kakasi-2.3.5pre1/tests/kakasi-4
kakasi-2.3.5pre1/tests/Makefile.am
kakasi-2.3.5pre1/tests/Makefile.in
kakasi-2.3.5pre1/tests/kakasi-3
kakasi-2.3.5pre1/ltmain.sh
kakasi-2.3.5pre1/THANKS
kakasi-2.3.5pre1/man/
kakasi-2.3.5pre1/man/kakasi.1.ja
kakasi-2.3.5pre1/man/kakasi.cat
kakasi-2.3.5pre1/man/kakasi.1
kakasi-2.3.5pre1/man/kakasi.cat.ja
kakasi-2.3.5pre1/man/Makefile.am
kakasi-2.3.5pre1/man/Makefile.in
kakasi-2.3.5pre1/aclocal.m4
kakasi-2.3.5pre1/kakasi-config.in
kakasi-2.3.5pre1/kakasi.spec
kakasi-2.3.5pre1/install-sh
kakasi-2.3.5pre1/missing
kakasi-2.3.5pre1/COPYING
kakasi-2.3.5pre1/README
kakasi-2.3.5pre1/kakasidict
kakasi-2.3.5pre1/README-ja
kakasi-2.3.5pre1/doc/
kakasi-2.3.5pre1/doc/README.BeOS
kakasi-2.3.5pre1/doc/README.lib
kakasi-2.3.5pre1/doc/JISYO
kakasi-2.3.5pre1/doc/CVS/
kakasi-2.3.5pre1/doc/CVS/Repository
kakasi-2.3.5pre1/doc/CVS/Entries
kakasi-2.3.5pre1/doc/CVS/Root
kakasi-2.3.5pre1/doc/README.wakati
kakasi-2.3.5pre1/doc/README.level
kakasi-2.3.5pre1/doc/ChangeLog.lib
kakasi-2.3.5pre1/doc/README.OS2
kakasi-2.3.5pre1/itaijidict
kakasi-2.3.5pre1/Makefile.am
kakasi-2.3.5pre1/TODO
kakasi-2.3.5pre1/lib/
kakasi-2.3.5pre1/lib/kakasi.def
kakasi-2.3.5pre1/lib/libee2.c
kakasi-2.3.5pre1/lib/libkanjiio.c
kakasi-2.3.5pre1/lib/libkakasi.c
kakasi-2.3.5pre1/lib/libg2.c
kakasi-2.3.5pre1/lib/libhh2.c
kakasi-2.3.5pre1/lib/libjj2.c
kakasi-2.3.5pre1/lib/libdict.c
kakasi-2.3.5pre1/lib/lib78_83.c
kakasi-2.3.5pre1/lib/libj2.c
kakasi-2.3.5pre1/lib/liba2.c
kakasi-2.3.5pre1/lib/libkakasi.h
kakasi-2.3.5pre1/lib/libkk2.c
kakasi-2.3.5pre1/lib/libk2.c
kakasi-2.3.5pre1/lib/Makefile.am
kakasi-2.3.5pre1/lib/Makefile.in
kakasi-2.3.5pre1/lib/libitaiji.c
kakasi-2.3.5pre1/lib/makefile.msc.in
kakasi-2.3.5pre1/Makefile.in
kakasi-2.3.5pre1/magic-kakasi
kakasi-2.3.5pre1/ChangeLog
kakasi-2.3.5pre1/config.h.in
plee@sos:~/Japanese/kakasi$ l
total 1304
drwxr-xr-x 7 plee plee    4096 2010-03-21 19:36 kakasi-2.3.5pre1
-rw-r--r-- 1 plee plee 1329263 2010-05-09 09:06 kakasi_2.3.5~pre1+cvs20071101.orig.tar.gz
plee@sos:~/Japanese/kakasi$ cd kakasi-2.3.5pre1/
plee@sos:~/Japanese/kakasi/kakasi-2.3.5pre1$ l
total 3520
-rw-r--r-- 1 plee plee  365083 2010-03-21 19:35 aclocal.m4
-rw-r--r-- 1 plee plee     356 2001-04-12 02:36 AUTHORS
-rw-r--r-- 1 plee plee   19779 2007-11-01 00:00 ChangeLog
-rwxr-xr-x 1 plee plee   44959 2010-03-21 19:35 config.guess
-rw-r--r-- 1 plee plee    2131 2010-03-21 19:35 config.h.in
-rwxr-xr-x 1 plee plee   14987 2004-03-01 23:01 config.rpath
-rwxr-xr-x 1 plee plee   34597 2010-03-21 19:35 config.sub
-rwxr-xr-x 1 plee plee  417371 2010-03-21 19:35 configure
-rw-r--r-- 1 plee plee    2461 2004-09-30 23:03 configure.in
-rw-r--r-- 1 plee plee   35147 2010-03-21 19:35 COPYING
drwxr-xr-x 3 plee plee    4096 2010-03-21 19:35 doc
-rw-r--r-- 1 plee plee   15578 2010-03-21 19:35 INSTALL
-rw-r--r-- 1 plee plee    9618 2000-03-03 22:37 INSTALL-ja
-rwxr-xr-x 1 plee plee   13663 2010-03-21 19:35 install-sh
-rw-r--r-- 1 plee plee    2820 2000-03-03 22:37 itaijidict
-rw-r--r-- 1 plee plee    1058 2000-12-27 01:15 kakasi-config.in
-rw-r--r-- 1 plee plee 2237449 2002-10-02 00:32 kakasidict
-rw-r--r-- 1 plee plee    2789 2010-03-21 19:36 kakasi.spec
-rw-r--r-- 1 plee plee    2789 2001-04-12 23:53 kakasi.spec.in
drwxr-xr-x 2 plee plee    4096 2010-03-21 19:36 lib
-rwxr-xr-x 1 plee plee  243455 2010-03-21 19:35 ltmain.sh
-rw-r--r-- 1 plee plee     113 2003-03-12 06:46 magic-kakasi
-rw-r--r-- 1 plee plee    1632 2001-01-04 09:14 maintMakefile
-rw-r--r-- 1 plee plee     811 2004-03-01 23:01 Makefile.am
-rw-r--r-- 1 plee plee   27109 2010-03-21 19:35 Makefile.in
drwxr-xr-x 2 plee plee    4096 2010-03-21 19:36 man
-rwxr-xr-x 1 plee plee   11419 2010-03-21 19:35 missing
-rw-r--r-- 1 plee plee    3038 2004-07-26 22:57 NEWS
-rw-r--r-- 1 plee plee    5632 2000-03-03 22:37 ONEWS
-rw-r--r-- 1 plee plee    1727 2000-04-26 20:16 README
-rw-r--r-- 1 plee plee    1505 2000-04-26 20:16 README-ja
drwxr-xr-x 2 plee plee    4096 2010-03-21 19:36 src
drwxr-xr-x 2 plee plee    4096 2010-03-21 19:36 tests
-rw-r--r-- 1 plee plee     783 2006-09-21 02:30 THANKS
-rw-r--r-- 1 plee plee     441 2001-04-13 03:02 TODO

The encoding of the dictionary file looked wrong when I opened it, so I converted it from EUC-JP to UTF-8 with iconv:

plee@sos:~/Japanese/kakasi/kakasi-2.3.5pre1$ vim kakasidict 
plee@sos:~/Japanese/kakasi/kakasi-2.3.5pre1$ iconv -f "EUC-JP" -t "UTF8" kakasidict > kakasidict.UTF8
plee@sos:~/Japanese/kakasi/kakasi-2.3.5pre1$ vim kakasidict.UTF8 

Now the file displays correctly, but some entries look odd: their reading ends in a Latin letter instead of a kana:

173 きづk 気付k
174 つk 付
368 いk 行
653 おりr 下り

What do these trailing letters mean?

Solution

This is a special syntax that lets a single entry match several inflected forms of the same verb.
For instance, いk matches いく (iku), いけない (ikenai), いかせる (ikaseru), いきたい (ikitai), and so on. In each case the kana that follows the stem い starts with k.
It is not limited to k: the same syntax is used for every verb stem (see, for instance, the line たべt 食べ), potentially with any of these letters: w, e, r, t, y, u, i, o, p, a, s, d, f, g, h, j, k, z, b, n, m.
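
To make the matching rule concrete, here is a minimal Python sketch of how such a key could be compared against a reading. It is not taken from Kakasi's source: the function name `matches_key` and the `KANA_ROWS` table are hypothetical, and the table rests on the assumption that the trailing letter is the first romaji letter of the kana following the stem. It covers only the basic kana rows, so letters such as f and j are left out.

```python
# Hypothetical table (an assumption, not Kakasi's own data): for each trailing
# letter, the kana that may directly follow the stem.
KANA_ROWS = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "k": "かきくけこ", "g": "がぎぐげご",
    "s": "さしすせそ", "z": "ざじずぜぞ",
    "t": "たちつてと", "d": "だぢづでど",
    "n": "なにぬねの", "h": "はひふへほ",
    "b": "ばびぶべぼ", "p": "ぱぴぷぺぽ",
    "m": "まみむめも", "y": "やゆよ",
    "r": "らりるれろ", "w": "わを",
}


def matches_key(key: str, reading: str) -> bool:
    """Return True if `reading` starts with the stem of `key`, followed by a
    kana from the row named by the key's trailing Latin letter."""
    if not key:
        return False
    stem, letter = key[:-1], key[-1]
    if letter not in KANA_ROWS:
        # No trailing Latin letter: treat the key as a plain reading.
        return reading == key
    if not reading.startswith(stem) or len(reading) <= len(stem):
        return False
    # The kana right after the stem must belong to the letter's row.
    return reading[len(stem)] in KANA_ROWS[letter]


if __name__ == "__main__":
    for word in ["いく", "いけない", "いかせる", "いきたい", "いる"]:
        print(word, matches_key("いk", word))
```

Under these assumptions, the key いk accepts いく, いけない, いかせる and いきたい but rejects いる, which is consistent with the examples above.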

I have no reference for this, but after inspecting the file I am pretty sure it works this way.
I maintain the Java version of Kakasi at https://github.com/nicolas-raoul/kakasi-java and I know that documentation for Kakasi is very scarce.

Licensed under: CC-BY-SA with attribution