Question

While exploring globalsight.com, I came across the segmentation rules (link). They use the full stop (.) as a delimiter. Which segmentation rules can we use to segment non-Latin languages in which a dot (.) means something other than a delimiter, or languages that have no such delimiter at all, for example Chinese, Japanese and Korean?

What segmentation rules are used for these "non-Latin" languages (Chinese, Japanese)? How are the segmentation rules developed?

Thanks in advance, Manjushree

No correct solution

OTHER TIPS

Japanese uses kinsoku shori (the line-breaking rules that govern which characters may begin or end a line). Not sure about the other two, though.

Trados, the leading translation memory application, uses the following segmentation rules:

For Japanese and Chinese:

Full Stop: 。

Colons: : :

Punctuation: ? ! ? !
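
For illustration, here is a minimal sketch of how punctuation-based rules like these could be applied in Python. It is not Trados's or GlobalSight's actual implementation. The terminator set (the ideographic full stop plus fullwidth and ASCII question and exclamation marks) and the function name `segment` are assumptions made for the example. Colons are left out to keep it short, and the ASCII dot is deliberately not treated as a delimiter.

```python
import re

# Assumed sentence terminators for Chinese/Japanese text:
# ideographic full stop, fullwidth ? and !, and their ASCII forms.
# The ASCII dot is intentionally excluded, so "3.5" is never split.
TERMINATORS = "。?!?!"

_T = re.escape(TERMINATORS)
# Either: a run of non-terminators followed by one or more terminators,
# or: trailing text that has no terminator at all.
_SEGMENT_RE = re.compile(rf"[^{_T}]*[{_T}]+|[^{_T}]+")


def segment(text: str) -> list[str]:
    """Split text into segments that end at a terminator (hypothetical helper)."""
    pieces = (m.group(0).strip() for m in _SEGMENT_RE.finditer(text))
    return [p for p in pieces if p]


if __name__ == "__main__":
    for s in segment("バージョン3.5を公開しました。詳細は後日。"):
        print(s)
```

A real rule set would also need exceptions, for example for terminators inside quotation marks, which this sketch does not handle.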

Licensed under: CC-BY-SA with attribution