Question

I am looking for a dictionary source file format that is convenient to edit in a general-purpose text editor (I use Emacs) and that can be stored under version control (and merged easily).

That way I can share my work on a dictionary with other people.

A file format is not usable without supporting tools that convert the dictionary source to an indexed form for use with dictionary software.

As client dictionary software I use GoldenDict (http://goldendict.org/dictionaries.php), which understands many indexed dictionary file formats, but I will switch to another program if you recommend one.

Previously I used the TAB format (the original source format for StarDict, which is converted to .dict.dz/.idx/.ifo). It is UGLY! Each word definition must stay on a single line, and the word is separated from the definition by a TAB. Look:

be  be [] (inf.)\n was/were [] (past ind.)\n been [] (past part.)\n     (infinitive) быть, нести, родить
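The single-line constraint can at least be hidden behind a small converter, so that the source stays multi-line and only the generated file is ugly. A minimal sketch in Python, where the function name `to_tab_line` is hypothetical; the `\n` escape is visible in the sample above, while the `\\` and `\t` escapes are an assumption about what the StarDict tools accept:

```python
def to_tab_line(headword: str, definition: str) -> str:
    """Fold one multi-line article into a single TAB-format line."""
    if "\t" in headword or "\n" in headword:
        raise ValueError(f"headword must not contain TAB or newline: {headword!r}")
    escaped = (
        definition.strip("\n")
        .replace("\\", "\\\\")  # escape literal backslashes first
        .replace("\t", "\\t")   # assumed escape for an embedded TAB
        .replace("\n", "\\n")   # line breaks become a literal \n, as in the sample
    )
    return f"{headword}\t{escaped}"


article = "[ˈæʤɪktɪv]\nadj.\nadjective\nприлагательное"
print(to_tab_line("adjective", article))
```

This keeps the editable source readable, but merges are still done on whatever multi-line format feeds the converter, not on the TAB file itself.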

Next I tried the dictd file format. It is much better:

  $ cat my.txt
...
adj.
adjective

   [ˈæʤɪktɪv]
   adj.
   adjective
   прилагательное

  $ dictfmt -f \
      --utf8 --allchars --case-sensitive \
      -s 'Usual dictionary abbreviations.' \
      my <my.txt 

I also know about http://xdxf.sourceforge.net/. But a closer look suggests that this is a dead project. It also has two noticeable drawbacks:

  • it is hard to write an article amid the noisy XML markup
  • there are no tools to convert the source to usable dictionaries (you can only convert to another source format and then produce usable dictionaries from that...)

I am looking for a better solution...


Why? I am learning a foreign language and want to create a dictionary of the words I have learned, to revise them and for further use.

I am also involved in translating Open Source software manuals and free books, and I want to maintain a list of terms, so that the resulting dictionary can be shared between translators to keep the translations consistent.


Solution

As I said, the TAB file format for StarDict is DUMB.

Currently I use the -c5 format from dict (I mailed the developer of http://sourceforge.net/projects/dict/ and he said that it is the more suitable and recommended format):

  $ dictfmt -c5 --headword-separator '; ' --break-headwords \
  --utf8 --allchars --case-sensitive \
  --mime-header 'Content-Type: text/html' \
  -s 'Usual dictionary abbreviations.' -u http://gavenkoa.users.sourceforge.net/ \
  ${prefix_for_output} <${input}

for articles like these:

_____

adj.; adjective

  [ˈæʤɪktɪv]
  adj.
  adjective
  прилагательное

_____

adv.; adverb

  []
  adv.
  adverb
  наречие
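Because every article is a separate block, the source is easy to check before running dictfmt, for example after a merge. A rough sketch in Python, assuming the layout shown above (a line of at least five underscores, then a headword line split on '; ', then the definition); `parse_c5` and `duplicate_headwords` are hypothetical helper names, not part of dictd:

```python
import re
from collections import Counter

SEPARATOR = re.compile(r"^_{5,}\s*$")  # a line of at least five underscores


def parse_c5(text, headword_sep="; "):
    """Return a list of (headwords, definition_lines) tuples."""
    blocks, current = [], None
    for line in text.splitlines():
        if SEPARATOR.match(line):
            current = []          # every separator starts a new article
            blocks.append(current)
        elif current is not None:
            current.append(line)  # text before the first separator is ignored
    entries = []
    for block in blocks:
        lines = [l.strip() for l in block if l.strip()]
        if not lines:
            continue  # separator with nothing after it
        headwords = [h.strip() for h in lines[0].split(headword_sep) if h.strip()]
        entries.append((headwords, lines[1:]))
    return entries


def duplicate_headwords(entries):
    """List headwords that appear in more than one place."""
    counts = Counter(h for headwords, _ in entries for h in headwords)
    return sorted(h for h, n in counts.items() if n > 1)


sample = "_____\n\nadj.; adjective\n\n  adj.\n  adjective\n\n_____\n\nadv.; adverb\n\n  adv.\n  adverb\n"
for headwords, definition in parse_c5(sample):
    print(headwords, definition)
```

A check like `duplicate_headwords` is useful when several translators add the same term independently and version control merges both articles without a conflict.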

The author of GoldenDict recommends the ABBYY Lingvo DSL rich format, which allows formatting such as bold/italic, colors and more. For a description of the format, see Lingvo0.chm from the Lingvo distribution. For example:

chat
  [m1][p][i][c][com]m[/com][/c][/i][/p][/m]
  [m1][trn]кот, кошка;[/trn][/m]
  [m2][*][ex][lang id=4]Chat botté[/lang] — Кот в сапогах;[/ex][/*][/m]
  [m2][*][ex][lang id=4]jouer à chat perché[/lang] — играть в кошки-мышки[/ex][/*][/m]
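The tags make DSL sources harder to read in a plain editor or in a diff. A small sketch in Python for previewing the text without markup; `strip_dsl_tags` is a hypothetical helper, and treating a bracket preceded by a backslash as escaped literal text is an assumption about the DSL syntax:

```python
import re

# Matches [tag], [/tag] and [tag attr=value]. A bracket preceded by a
# backslash is left alone (assumed to be an escaped literal bracket).
DSL_TAG = re.compile(r"(?<!\\)\[[^\[\]]*\]")


def strip_dsl_tags(line: str) -> str:
    """Remove DSL markup tags from one line and trim surrounding whitespace."""
    return DSL_TAG.sub("", line).strip()


print(strip_dsl_tags("  [m1][trn]кот, кошка;[/trn][/m]"))
```

This only removes the markup; it does not interpret it, so indentation levels such as [m1] and [m2] are lost in the preview.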
Licensed under: CC-BY-SA with attribution