Package com.ibm.icu.text
Class UnicodeSetSpanner
java.lang.Object
    com.ibm.icu.text.UnicodeSetSpanner
public class UnicodeSetSpanner extends Object
A helper class used to count, replace, and trim CharSequences based on UnicodeSet matches. An instance is immutable (and thus thread-safe) iff the source UnicodeSet is frozen.
Note: The counting, deletion, and replacement depend on alternating a UnicodeSet.SpanCondition with its inverse. That is, the code spans, then spans for the inverse, then spans, and so on. For the inverse, the following mapping is used:
UnicodeSet.SpanCondition.SIMPLE → UnicodeSet.SpanCondition.NOT_CONTAINED
UnicodeSet.SpanCondition.CONTAINED → UnicodeSet.SpanCondition.NOT_CONTAINED
UnicodeSet.SpanCondition.NOT_CONTAINED → UnicodeSet.SpanCondition.SIMPLE
The span that each condition selects in the string xxxabcyyy (brackets mark the span):
SIMPLE          xxx[ab]cyyy
CONTAINED       xxx[abc]yyy
NOT_CONTAINED   [xxx]ab[cyyy]
So here is what happens when you alternate (| marks the current position):
start           |xxxabcyyy
NOT_CONTAINED   xxx|abcyyy
CONTAINED       xxxabc|yyy
NOT_CONTAINED   xxxabcyyy|
The entire string is traversed.
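The alternation described above can be sketched without ICU at all. The following is a simplified model, not the library's implementation: it assumes the set holds only single characters (so SIMPLE and CONTAINED behave the same), and the class AlternatingSpanSketch and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class AlternatingSpanSketch {
    /**
     * Returns the end index of the run that starts at start and whose
     * characters all have the requested membership in members.
     */
    static int span(CharSequence s, int start, String members, boolean wantContained) {
        int i = start;
        while (i < s.length() && (members.indexOf(s.charAt(i)) >= 0) == wantContained) {
            i++;
        }
        return i;
    }

    /** Alternates not-contained and contained spans, recording each boundary reached. */
    static List<Integer> boundaries(CharSequence s, String members) {
        List<Integer> result = new ArrayList<>();
        boolean wantContained = false; // begin with the inverse (not-contained) span
        int pos = 0;
        while (pos < s.length()) {
            pos = span(s, pos, members, wantContained);
            result.add(pos);
            wantContained = !wantContained; // switch to the inverse condition
        }
        return result;
    }

    public static void main(String[] args) {
        // Boundaries fall after "xxx", after "abc", and at the end of the string.
        System.out.println(boundaries("xxxabcyyy", "abc")); // [3, 6, 9]
    }
}
```

Because each span starts exactly where the previous one ended, no position is skipped, which is why the whole string is traversed.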
-
-
Nested Class Summary
Nested Classes
static class UnicodeSetSpanner.CountMethod
Options for replaceFrom and countIn to control how to treat each matched span.
static class UnicodeSetSpanner.TrimOption
Options for the trim() method.
-
Constructor Summary
Constructors
UnicodeSetSpanner(UnicodeSet source)
Create a spanner from a UnicodeSet.
-
Method Summary
int countIn(CharSequence sequence)
Returns the number of matching characters found in a character sequence, counting by CountMethod.MIN_ELEMENTS using SpanCondition.SIMPLE.
int countIn(CharSequence sequence, UnicodeSetSpanner.CountMethod countMethod)
Returns the number of matching characters found in a character sequence, using SpanCondition.SIMPLE.
int countIn(CharSequence sequence, UnicodeSetSpanner.CountMethod countMethod, UnicodeSet.SpanCondition spanCondition)
Returns the number of matching characters found in a character sequence.
String deleteFrom(CharSequence sequence)
Delete all the matching spans in sequence, using SpanCondition.SIMPLE. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
String deleteFrom(CharSequence sequence, UnicodeSet.SpanCondition spanCondition)
Delete all matching spans in sequence, according to the spanCondition.
boolean equals(Object other)
UnicodeSet getUnicodeSet()
Returns the UnicodeSet used for processing.
int hashCode()
String replaceFrom(CharSequence sequence, CharSequence replacement)
Replace all matching spans in sequence by the replacement, counting by CountMethod.MIN_ELEMENTS using SpanCondition.SIMPLE.
String replaceFrom(CharSequence sequence, CharSequence replacement, UnicodeSetSpanner.CountMethod countMethod)
Replace all matching spans in sequence by replacement, according to the CountMethod, using SpanCondition.SIMPLE.
String replaceFrom(CharSequence sequence, CharSequence replacement, UnicodeSetSpanner.CountMethod countMethod, UnicodeSet.SpanCondition spanCondition)
Replace all matching spans in sequence by replacement, according to the countMethod and spanCondition.
CharSequence trim(CharSequence sequence)
Returns a trimmed sequence (using CharSequence.subSequence()) that omits matching elements at the start and end of the string, using TrimOption.BOTH and SpanCondition.SIMPLE.
CharSequence trim(CharSequence sequence, UnicodeSetSpanner.TrimOption trimOption)
Returns a trimmed sequence (using CharSequence.subSequence()) that omits matching elements at the start or end of the string, using the trimOption and SpanCondition.SIMPLE.
CharSequence trim(CharSequence sequence, UnicodeSetSpanner.TrimOption trimOption, UnicodeSet.SpanCondition spanCondition)
Returns a trimmed sequence (using CharSequence.subSequence()) that omits matching elements at the start or end of the string, depending on the trimOption and spanCondition.
-
-
-
Constructor Detail
-
UnicodeSetSpanner
public UnicodeSetSpanner(UnicodeSet source)
Create a spanner from a UnicodeSet. For speed and safety, the UnicodeSet should be frozen. However, this class can be used with a non-frozen version to avoid the cost of freezing.
Parameters:
source - the original UnicodeSet
-
-
Method Detail
-
getUnicodeSet
public UnicodeSet getUnicodeSet()
Returns the UnicodeSet used for processing. It is frozen iff the original was.
Returns:
the construction set.
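As a minimal sketch of the frozen/non-frozen distinction, assuming ICU4J is on the classpath: the class FrozenSourceExample and its helper method are hypothetical names, while freeze() and isFrozen() are the standard UnicodeSet freezing methods.

```java
import com.ibm.icu.text.UnicodeSet;
import com.ibm.icu.text.UnicodeSetSpanner;

class FrozenSourceExample {
    /** Builds a spanner from source and reports whether its set is frozen. */
    static boolean spannerSetIsFrozen(UnicodeSet source) {
        UnicodeSetSpanner spanner = new UnicodeSetSpanner(source);
        return spanner.getUnicodeSet().isFrozen();
    }

    public static void main(String[] args) {
        // Frozen source: the spanner is immutable and safe to share across threads.
        System.out.println(spannerSetIsFrozen(new UnicodeSet("[ab]").freeze()));
        // Non-frozen source: allowed, but the spanner is then not guaranteed immutable.
        System.out.println(spannerSetIsFrozen(new UnicodeSet("[ab]")));
    }
}
```

Per the description above, the first call should report a frozen set and the second a non-frozen one.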
-
countIn
public int countIn(CharSequence sequence)
Returns the number of matching characters found in a character sequence, counting by CountMethod.MIN_ELEMENTS using SpanCondition.SIMPLE. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
Parameters:
sequence - the sequence to count characters in
Returns:
the count. Zero if there are none.
-
countIn
public int countIn(CharSequence sequence, UnicodeSetSpanner.CountMethod countMethod)
Returns the number of matching characters found in a character sequence, using SpanCondition.SIMPLE. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
Parameters:
sequence - the sequence to count characters in
countMethod - whether to treat an entire span as a match, or individual elements as matches
Returns:
the count. Zero if there are none.
-
countIn
public int countIn(CharSequence sequence, UnicodeSetSpanner.CountMethod countMethod, UnicodeSet.SpanCondition spanCondition)
Returns the number of matching characters found in a character sequence. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
Parameters:
sequence - the sequence to count characters in
countMethod - whether to treat an entire span as a match, or individual elements as matches
spanCondition - the spanCondition to use. SIMPLE or CONTAINED means only count the elements in the span; NOT_CONTAINED is the reverse.
WARNING: when a UnicodeSet contains strings, there may be unexpected behavior in edge cases.
Returns:
the count. Zero if there are none.
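The difference between the two count methods is easiest to see side by side. This is a minimal sketch assuming ICU4J is on the classpath; the class CountInExample and its helper methods are hypothetical names, and the set [ab] contains no strings, so the edge cases mentioned in the warning do not arise.

```java
import com.ibm.icu.text.UnicodeSet;
import com.ibm.icu.text.UnicodeSetSpanner;
import com.ibm.icu.text.UnicodeSetSpanner.CountMethod;

class CountInExample {
    // Frozen set, so the spanner is immutable and can be shared.
    static final UnicodeSetSpanner SPANNER =
            new UnicodeSetSpanner(new UnicodeSet("[ab]").freeze());

    /** Counts each matching element (CountMethod.MIN_ELEMENTS, SpanCondition.SIMPLE). */
    static int countElements(CharSequence s) {
        return SPANNER.countIn(s);
    }

    /** Counts each maximal run of matching elements once (CountMethod.WHOLE_SPAN). */
    static int countSpans(CharSequence s) {
        return SPANNER.countIn(s, CountMethod.WHOLE_SPAN);
    }

    public static void main(String[] args) {
        // "abacatbab" has three matching runs: "aba", "a", "bab".
        System.out.println(countElements("abacatbab")); // 7
        System.out.println(countSpans("abacatbab"));    // 3
        System.out.println(countElements("xyz"));       // 0
    }
}
```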
-
deleteFrom
public String deleteFrom(CharSequence sequence)
Delete all the matching spans in sequence, using SpanCondition.SIMPLE. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
Parameters:
sequence - character sequence to delete matching spans from.
Returns:
modified string.
-
deleteFrom
public String deleteFrom(CharSequence sequence, UnicodeSet.SpanCondition spanCondition)
Delete all matching spans in sequence, according to the spanCondition. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
Parameters:
sequence - character sequence to delete matching spans from.
spanCondition - specify whether to modify the matching spans (CONTAINED or SIMPLE) or the non-matching (NOT_CONTAINED)
Returns:
modified string.
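A short sketch of both deleteFrom overloads, assuming ICU4J is on the classpath; DeleteFromExample and its helper methods are hypothetical names used only for illustration.

```java
import com.ibm.icu.text.UnicodeSet;
import com.ibm.icu.text.UnicodeSet.SpanCondition;
import com.ibm.icu.text.UnicodeSetSpanner;

class DeleteFromExample {
    static final UnicodeSetSpanner SPANNER =
            new UnicodeSetSpanner(new UnicodeSet("[ab]").freeze());

    /** Removes every span of characters that are in the set (SpanCondition.SIMPLE). */
    static String deleteMatching(CharSequence s) {
        return SPANNER.deleteFrom(s);
    }

    /** Removes every span of characters that are NOT in the set. */
    static String deleteNonMatching(CharSequence s) {
        return SPANNER.deleteFrom(s, SpanCondition.NOT_CONTAINED);
    }

    public static void main(String[] args) {
        System.out.println(deleteMatching("abacatbab"));    // ct
        System.out.println(deleteNonMatching("abacatbab")); // abaabab
    }
}
```

Passing NOT_CONTAINED inverts the operation, so it acts as a "keep only the set's characters" filter.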
-
replaceFrom
public String replaceFrom(CharSequence sequence, CharSequence replacement)
Replace all matching spans in sequence by the replacement, counting by CountMethod.MIN_ELEMENTS using SpanCondition.SIMPLE. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
Parameters:
sequence - character sequence to replace matching spans in.
replacement - replacement sequence. To delete, use ""
Returns:
modified string.
-
replaceFrom
public String replaceFrom(CharSequence sequence, CharSequence replacement, UnicodeSetSpanner.CountMethod countMethod)
Replace all matching spans in sequence by replacement, according to the CountMethod, using SpanCondition.SIMPLE. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
Parameters:
sequence - character sequence to replace matching spans in.
replacement - replacement sequence. To delete, use ""
countMethod - whether to treat an entire span as a match, or individual elements as matches
Returns:
modified string.
-
replaceFrom
public String replaceFrom(CharSequence sequence, CharSequence replacement, UnicodeSetSpanner.CountMethod countMethod, UnicodeSet.SpanCondition spanCondition)
Replace all matching spans in sequence by replacement, according to the countMethod and spanCondition. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
Parameters:
sequence - character sequence to replace matching spans in.
replacement - replacement sequence. To delete, use ""
countMethod - whether to treat an entire span as a match, or individual elements as matches
spanCondition - specify whether to modify the matching spans (CONTAINED or SIMPLE) or the non-matching (NOT_CONTAINED)
Returns:
modified string.
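The effect of countMethod on replacement can be sketched as follows, assuming ICU4J is on the classpath; ReplaceFromExample and its helper methods are hypothetical names.

```java
import com.ibm.icu.text.UnicodeSet;
import com.ibm.icu.text.UnicodeSetSpanner;
import com.ibm.icu.text.UnicodeSetSpanner.CountMethod;

class ReplaceFromExample {
    static final UnicodeSetSpanner SPANNER =
            new UnicodeSetSpanner(new UnicodeSet("[ab]").freeze());

    /** Inserts one replacement per matching element (the default, MIN_ELEMENTS). */
    static String replaceEachElement(CharSequence s, CharSequence replacement) {
        return SPANNER.replaceFrom(s, replacement);
    }

    /** Inserts one replacement per whole matching span. */
    static String replaceEachSpan(CharSequence s, CharSequence replacement) {
        return SPANNER.replaceFrom(s, replacement, CountMethod.WHOLE_SPAN);
    }

    public static void main(String[] args) {
        System.out.println(replaceEachElement("abacatbab", "-")); // ---c-t---
        System.out.println(replaceEachSpan("abacatbab", "-"));    // -c-t-
    }
}
```

WHOLE_SPAN is the natural choice for collapsing runs (for example, squeezing repeated separators into one), while MIN_ELEMENTS preserves the number of replaced elements.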
-
trim
public CharSequence trim(CharSequence sequence)
Returns a trimmed sequence (using CharSequence.subSequence()) that omits matching elements at the start and end of the string, using TrimOption.BOTH and SpanCondition.SIMPLE. For example:
    new UnicodeSetSpanner(new UnicodeSet("[ab]")).trim("abacatbab")
returns "cat".
Parameters:
sequence - the sequence to trim
Returns:
a subsequence
-
trim
public CharSequence trim(CharSequence sequence, UnicodeSetSpanner.TrimOption trimOption)
Returns a trimmed sequence (using CharSequence.subSequence()) that omits matching elements at the start or end of the string, using the trimOption and SpanCondition.SIMPLE. For example:
    new UnicodeSetSpanner(new UnicodeSet("[ab]")).trim("abacatbab", TrimOption.LEADING)
returns "catbab".
Parameters:
sequence - the sequence to trim
trimOption - LEADING, TRAILING, or BOTH
Returns:
a subsequence
-
trim
public CharSequence trim(CharSequence sequence, UnicodeSetSpanner.TrimOption trimOption, UnicodeSet.SpanCondition spanCondition)
Returns a trimmed sequence (using CharSequence.subSequence()) that omits matching elements at the start or end of the string, depending on the trimOption and spanCondition. For example:
    new UnicodeSetSpanner(new UnicodeSet("[ab]")).trim("abacatbab", TrimOption.LEADING, SpanCondition.SIMPLE)
returns "catbab".
Parameters:
sequence - the sequence to trim
trimOption - LEADING, TRAILING, or BOTH
spanCondition - SIMPLE, CONTAINED or NOT_CONTAINED
Returns:
a subsequence
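The three trim overloads can be exercised together in a small sketch, assuming ICU4J is on the classpath; TrimExample and its helper method are hypothetical names. The result is a CharSequence, so the sketch converts it with toString() before comparing or printing.

```java
import com.ibm.icu.text.UnicodeSet;
import com.ibm.icu.text.UnicodeSet.SpanCondition;
import com.ibm.icu.text.UnicodeSetSpanner;
import com.ibm.icu.text.UnicodeSetSpanner.TrimOption;

class TrimExample {
    static final UnicodeSetSpanner SPANNER =
            new UnicodeSetSpanner(new UnicodeSet("[ab]").freeze());

    /** Trims according to the given option and span condition, returning a String. */
    static String trim(CharSequence s, TrimOption option, SpanCondition condition) {
        return SPANNER.trim(s, option, condition).toString();
    }

    public static void main(String[] args) {
        // Default overload: TrimOption.BOTH with SpanCondition.SIMPLE.
        System.out.println(SPANNER.trim("abacatbab"));                                  // cat
        System.out.println(SPANNER.trim("abacatbab", TrimOption.LEADING));              // catbab
        System.out.println(SPANNER.trim("abacatbab", TrimOption.TRAILING));             // abacat
        // NOT_CONTAINED trims the characters that are outside the set instead.
        System.out.println(trim("xxabyy", TrimOption.BOTH, SpanCondition.NOT_CONTAINED)); // ab
    }
}
```

Only the ends of the sequence are affected: the "a" inside "cat" survives because trimming stops at the first non-matching element from each side.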
-
-