Segmentation rules for non-Latin-based languages like Chinese and Japanese
While exploring globalsight.com, I came across its segmentation rules (link). They use the full stop (.) as a sentence delimiter. Which segmentation rules can we use to segment non-Latin-based languages in which a dot (.) means something other than a delimiter, or languages that don't have any delimiters at all, for example Chinese, Japanese, and Korean?
What segmentation rules are used for these "non-Latin" languages (Chinese, Japanese)? How are such segmentation rules developed?
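To make the problem concrete, here is a minimal Python sketch of what I mean. The function names, the regular expressions, and the sample sentences are my own illustration and are not taken from GlobalSight or any other tool. It assumes the CJK text ends sentences with the ideographic full stop (。) or a fullwidth ! or ?, which is only one possible rule.

```python
import re

def segment_latin(text):
    # Illustrative rule: break after an ASCII full stop followed by whitespace.
    return [s for s in re.split(r"(?<=\.)\s+", text) if s]

def segment_cjk(text):
    # Illustrative rule: a segment is a run of non-terminator characters
    # followed by any ideographic full stop or fullwidth ! / ? marks.
    # No whitespace is expected between sentences.
    return re.findall(r"[^。！？]+[。！？]*", text)

english = "It rains. It pours."
japanese = "今日は晴れです。明日は雨です。"

print(segment_latin(english))   # two segments
print(segment_latin(japanese))  # one segment: the dot-based rule finds no break
print(segment_cjk(japanese))    # two segments
```

The dot-based rule returns the whole Japanese string as a single segment, which is exactly the behaviour I am trying to avoid.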
Thanks in advance, Manjushree
No correct solution
Japanese uses kinsoku shori. Not sure about the other two though.
Trados, the leading translation memory application, uses the following segmentation rules:
For Japanese and Chinese: