Class LaoBreakIterator

  • All Implemented Interfaces:
    Cloneable

    public class LaoBreakIterator
    extends com.ibm.icu.text.BreakIterator
    Syllable iterator for Lao text.

    This breaks Lao text into syllables according to "Syllabification of Lao Script for Line Breaking" by Phonpasit Phissamay, Valaxay Dalolay, Chitaphone Chanhsililath, Oulaiphone Silimasak, Sarmad Hussain, and Nadir Durrani (Science Technology and Environment Agency; CRULP):

    • http://www.panl10n.net/english/final%20reports/pdf%20files/Laos/LAO06.pdf
    • http://www.panl10n.net/Presentations/Cambodia/Phonpassit/LineBreakingAlgo.pdf

    Most of the work is accomplished with RBBI rules; however, some additional special logic is needed that cannot be coded in a grammar, and it is implemented here.

    For example, what appears to be a final consonant might instead be part of the next syllable. Because rules match greedily, the first match can consume that consonant and leave behind an illegal sequence that matches no rule.

    Take for instance the text ກວ່າດອກ. The first rule greedily matches ກວ່າດ, but then ອກ is encountered, which is illegal. Following the paper, LaoBreakIterator then does this:

    1. Backtrack and remove the ດ from the previous syllable, placing it on the current syllable.
    2. Verify that the modified previous syllable (ກວ່າ) is still legal.
    3. Verify that the modified current syllable (ດອກ) is now legal.
    4. If step 2 or step 3 fails, restore the ດ to the previous syllable and skip the current character.
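
    The four steps above can be sketched in isolation. The following is a minimal illustration, not the class's actual implementation: the names BacktrackSketch, LEGAL, longestMatch and syllables are hypothetical, and a three-entry set of strings stands in for the RBBI syllable rules. The Lao text is written with Unicode escapes: KWA is ກວ່າ, KWAD is ກວ່າດ, and DOK is ດອກ. The sketch also simply drops skipped characters from its result, which is an assumption made to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class BacktrackSketch {
    // Hypothetical stand-ins for the rule grammar: a syllable is "legal" if it is in LEGAL.
    static final String KWA = "\u0E81\u0EA7\u0EC8\u0EB2";   // syllable without the final consonant
    static final String KWAD = KWA + "\u0E94";              // same syllable with a final consonant
    static final String DOK = "\u0E94\u0EAD\u0E81";         // syllable that starts with that consonant
    static final Set<String> LEGAL = Set.of(KWA, KWAD, DOK);

    // Greedy match: length of the longest legal syllable starting at 'start', or 0 if none.
    static int longestMatch(String text, int start) {
        for (int end = text.length(); end > start; end--) {
            if (LEGAL.contains(text.substring(start, end))) {
                return end - start;
            }
        }
        return 0;
    }

    static List<String> syllables(String text) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int len = longestMatch(text, pos);
            if (len > 0) {
                out.add(text.substring(pos, pos + len));
                pos += len;
                continue;
            }
            // Illegal sequence at 'pos': try moving the previous syllable's last character forward.
            if (!out.isEmpty() && out.get(out.size() - 1).length() > 1) {
                String prev = out.get(out.size() - 1);
                String shortened = prev.substring(0, prev.length() - 1);   // step 1
                int newLen = longestMatch(text, pos - 1);
                boolean prevStillLegal = LEGAL.contains(shortened);        // step 2
                boolean currentNowLegal = newLen > 1;                      // step 3
                if (prevStillLegal && currentNowLegal) {
                    out.set(out.size() - 1, shortened);
                    out.add(text.substring(pos - 1, pos - 1 + newLen));
                    pos = pos - 1 + newLen;
                    continue;
                }
            }
            // Step 4: the previous syllable is left untouched and the current character is skipped.
            pos++;
        }
        return out;
    }
}
```

    Calling BacktrackSketch.syllables(KWA + DOK) first matches KWAD greedily, fails on the remaining two characters, and then backtracks to produce the two syllables KWA and DOK.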

    Finally, LaoBreakIterator also handles the second concern mentioned in the paper: combining marks that appear in the wrong order (typos).

    WARNING: This API is experimental and might change in incompatible ways in the next release.
    • Field Summary

      • Fields inherited from class com.ibm.icu.text.BreakIterator

        DONE, KIND_CHARACTER, KIND_LINE, KIND_SENTENCE, KIND_TITLE, KIND_WORD
    • Constructor Summary

      Constructors 
      Constructor Description
      LaoBreakIterator​(com.ibm.icu.text.RuleBasedBreakIterator rules)  
    • Constructor Detail

      • LaoBreakIterator

        public LaoBreakIterator​(com.ibm.icu.text.RuleBasedBreakIterator rules)
    • Method Detail

      • current

        public int current()
        Specified by:
        current in class com.ibm.icu.text.BreakIterator
      • first

        public int first()
        Specified by:
        first in class com.ibm.icu.text.BreakIterator
      • following

        public int following​(int offset)
        Specified by:
        following in class com.ibm.icu.text.BreakIterator
      • getText

        public CharacterIterator getText()
        Specified by:
        getText in class com.ibm.icu.text.BreakIterator
      • last

        public int last()
        Specified by:
        last in class com.ibm.icu.text.BreakIterator
      • next

        public int next()
        Specified by:
        next in class com.ibm.icu.text.BreakIterator
      • next

        public int next​(int n)
        Specified by:
        next in class com.ibm.icu.text.BreakIterator
      • previous

        public int previous()
        Specified by:
        previous in class com.ibm.icu.text.BreakIterator
      • setText

        public void setText​(CharacterIterator text)
        Specified by:
        setText in class com.ibm.icu.text.BreakIterator
      • setText

        public void setText​(String newText)
        Overrides:
        setText in class com.ibm.icu.text.BreakIterator
      • clone

        public Object clone()
        Clone method. Creates another LaoBreakIterator with the same behavior and current state as this one.
        Overrides:
        clone in class com.ibm.icu.text.BreakIterator
        Returns:
        The clone.
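
    The methods listed above follow the usual BreakIterator boundary protocol: setText, then first, then repeated calls to next until DONE is returned. The sketch below shows that loop. To stay self-contained it uses the JDK's java.text.BreakIterator as a stand-in, on the assumption that com.ibm.icu.text.BreakIterator is driven the same way; BoundaryLoopSketch and segments are hypothetical names, and constructing a real LaoBreakIterator (which needs a RuleBasedBreakIterator with Lao syllable rules) is not shown.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BoundaryLoopSketch {
    // Collects the text between each pair of consecutive boundaries reported by the iterator.
    static List<String> segments(BreakIterator it, String text) {
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // next() returns the following boundary, or DONE once the end of the text has been passed.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    // Stand-in iterator for demonstration only; a syllable iterator would be passed in the same way.
    static List<String> demo() {
        return segments(BreakIterator.getWordInstance(Locale.ROOT), "hello world");
    }
}
```

    With a syllable iterator in place of the word iterator, each returned segment would be one syllable rather than one word or space.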