Package net.sf.saxon.regex
Class UnicodeString
java.lang.Object
    net.sf.saxon.regex.UnicodeString
All Implemented Interfaces:
java.lang.CharSequence, java.lang.Comparable<UnicodeString>, AtomicMatchKey
Direct Known Subclasses:
BMPString, EmptyString, GeneralUnicodeString, LatinString
public abstract class UnicodeString extends java.lang.Object implements java.lang.CharSequence, java.lang.Comparable<UnicodeString>, AtomicMatchKey
An abstract class that efficiently handles Unicode strings, including strings that contain non-BMP characters. Three of its subclasses (LatinString, BMPString, and GeneralUnicodeString) handle strings whose maximum character code is 255, 65535, or 1114111 respectively.
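The choice between the three widths can be illustrated with plain JDK calls. The following is only a sketch: the class name WidthSketch, the enum Width, and the method narrowestWidth are hypothetical and are not part of Saxon. It shows how the largest codepoint in a CharSequence determines the narrowest storage that could hold it.

```java
class WidthSketch {
    /** Hypothetical labels for the three storage widths; these are not Saxon types. */
    enum Width { LATIN, BMP, GENERAL }

    /** Report the narrowest width able to hold every codepoint of the input. */
    static Width narrowestWidth(CharSequence in) {
        // codePoints() combines well-formed surrogate pairs into single codepoints
        int max = in.codePoints().max().orElse(0);
        if (max <= 255) {
            return Width.LATIN;      // fits in 8 bits
        }
        if (max <= 65535) {
            return Width.BMP;        // fits in 16 bits
        }
        return Width.GENERAL;        // needs the full range up to 1114111
    }

    public static void main(String[] args) {
        System.out.println(narrowestWidth("abc"));
        System.out.println(narrowestWidth("\u20AC"));        // euro sign, U+20AC
        System.out.println(narrowestWidth("\uD83D\uDE00"));  // U+1F600, outside the BMP
    }
}
```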
Field Summary
-
Fields inherited from interface net.sf.saxon.expr.sort.AtomicMatchKey
NaN_MATCH_KEY
-
-
Constructor Summary
Constructor
UnicodeString()
-
Method Summary
Modifier and Type, Method: Description
AtomicValue asAtomic(): Get an atomic value that encapsulates this match key.
int compareTo(UnicodeString other): Compare two Unicode strings in codepoint collating sequence.
static boolean containsSurrogatePairs(java.lang.CharSequence value): Test whether a CharSequence contains Unicode codepoints outside the BMP range.
boolean equals(java.lang.Object obj): Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
int hashCode(): Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
abstract boolean isEnd(int pos): Ask whether a given position is at (or beyond) the end of the string.
static UnicodeString makeUnicodeString(int[] in): Make a UnicodeString for a given array of codepoints.
static UnicodeString makeUnicodeString(java.lang.CharSequence in): Make a UnicodeString for a given CharSequence.
abstract int uCharAt(int pos): Get the character at a specified position.
abstract int uIndexOf(int search, int start): Get the first match for a given character.
abstract int uLength(): Get the length of the string, in Unicode codepoints.
abstract UnicodeString uSubstring(int beginIndex, int endIndex): Get a substring of this string.
Method Detail
-
makeUnicodeString
public static UnicodeString makeUnicodeString(java.lang.CharSequence in)
Make a UnicodeString for a given CharSequence.
Parameters:
in - the input CharSequence
Returns:
a UnicodeString using an appropriate implementation class
-
makeUnicodeString
public static UnicodeString makeUnicodeString(int[] in)
Make a UnicodeString for a given array of codepoints.
Parameters:
in - the input array of codepoints
Returns:
a UnicodeString using an appropriate implementation class
-
containsSurrogatePairs
public static boolean containsSurrogatePairs(java.lang.CharSequence value)
Test whether a CharSequence contains Unicode codepoints outside the BMP range.
Parameters:
value - the string to be tested
Returns:
true if the string contains non-BMP codepoints
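A test of this kind can be sketched with the JDK alone. The class SurrogateScanSketch and its method hasSurrogates below are hypothetical names, and the code is not Saxon's implementation. It assumes that finding any surrogate code unit is enough evidence of a non-BMP codepoint, so an unpaired surrogate in malformed input would also be reported.

```java
class SurrogateScanSketch {
    /**
     * Return true if any 16-bit code unit in the sequence is a surrogate.
     * In well-formed UTF-16, surrogates appear only as the two halves of a non-BMP codepoint.
     */
    static boolean hasSurrogates(CharSequence value) {
        for (int i = 0; i < value.length(); i++) {
            if (Character.isSurrogate(value.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSurrogates("plain text"));
        System.out.println(hasSurrogates("smile \uD83D\uDE00"));
    }
}
```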
-
uSubstring
public abstract UnicodeString uSubstring(int beginIndex, int endIndex)
Get a substring of this string.
Parameters:
beginIndex - the index of the first character to be included (counting codepoints, not 16-bit characters)
endIndex - the index of the first character NOT to be included (counting codepoints, not 16-bit characters)
Returns:
a substring
Throws:
java.lang.IndexOutOfBoundsException - if the selection goes off the start or end of the string (this method follows the semantics of String.substring(), not the XPath semantics)
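To show what counting codepoints rather than 16-bit characters means for a substring, here is a sketch over a plain java.lang.String. The class CodepointSubstringSketch and the method substringByCodepoints are hypothetical and only mimic the behaviour described above; they are not the Saxon code.

```java
class CodepointSubstringSketch {
    /**
     * Substring with both indexes counted in codepoints.
     * Like String.substring(), it throws IndexOutOfBoundsException for a bad range.
     */
    static String substringByCodepoints(String s, int beginIndex, int endIndex) {
        if (beginIndex > endIndex) {
            throw new IndexOutOfBoundsException("beginIndex > endIndex");
        }
        // Translate codepoint indexes into char offsets; offsetByCodePoints
        // throws IndexOutOfBoundsException if the string is too short.
        int from = s.offsetByCodePoints(0, beginIndex);
        int to = s.offsetByCodePoints(from, endIndex - beginIndex);
        return s.substring(from, to);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";  // three codepoints, four chars
        System.out.println(substringByCodepoints(s, 2, 3));  // the codepoint after U+1F600
    }
}
```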
-
uIndexOf
public abstract int uIndexOf(int search, int start)
Get the first match for a given character.
Parameters:
search - the character to look for
start - the first position to look
Returns:
the position of the first occurrence of the sought character, or -1 if not found
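The search contract can be sketched over an int[] of codepoints. The names CodepointSearchSketch and indexOf are hypothetical, and treating a negative start as 0 is an assumption of this sketch, not something the documentation above states.

```java
class CodepointSearchSketch {
    /**
     * Find the first position at or after start that holds the sought codepoint.
     * Assumption: a negative start is treated as 0.
     */
    static int indexOf(int[] codepoints, int search, int start) {
        for (int i = Math.max(start, 0); i < codepoints.length; i++) {
            if (codepoints[i] == search) {
                return i;
            }
        }
        return -1;  // not found
    }

    public static void main(String[] args) {
        int[] cps = "a\uD83D\uDE00ba".codePoints().toArray();
        // Positions are codepoint positions, so U+1F600 occupies one slot, not two
        System.out.println(indexOf(cps, 'a', 1));
    }
}
```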
-
uCharAt
public abstract int uCharAt(int pos)
Get the character at a specified position.
Parameters:
pos - the index of the required character (counting codepoints, not 16-bit characters)
Returns:
the character (Unicode codepoint) at the specified position
-
uLength
public abstract int uLength()
Get the length of the string, in Unicode codepoints.
Returns:
the number of codepoints in the string
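The difference between codepoint positions and 16-bit char positions is easy to see with a plain String. The class CodepointIndexSketch and its two methods below are hypothetical helpers built on standard JDK calls; they illustrate the indexing convention only and are not how Saxon stores its strings.

```java
class CodepointIndexSketch {
    /** Length counted in codepoints rather than 16-bit chars. */
    static int lengthInCodepoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Codepoint at a position that is itself counted in codepoints. */
    static int codepointAt(String s, int pos) {
        return s.codePointAt(s.offsetByCodePoints(0, pos));
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());              // counts 16-bit chars
        System.out.println(lengthInCodepoints(s));   // counts codepoints
        System.out.println(Integer.toHexString(codepointAt(s, 1)));
    }
}
```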
-
isEnd
public abstract boolean isEnd(int pos)
Ask whether a given position is at (or beyond) the end of the string.
Parameters:
pos - the index of the required character (counting codepoints, not 16-bit characters)
Returns:
true if and only if the specified index is at or beyond the end of the string
-
hashCode
public int hashCode()
Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
Overrides:
hashCode in class java.lang.Object
Returns:
a hashCode that distinguishes this UnicodeString from others
-
equals
public boolean equals(java.lang.Object obj)
Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
Overrides:
equals in class java.lang.Object
Parameters:
obj - the object to be compared
Returns:
true if obj is a UnicodeString containing the same codepoints
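The equality rule can be illustrated with a small stand-in class. CodepointKeySketch below is hypothetical and much simpler than UnicodeString: it stores codepoints in an int[] and treats two instances as equal when they hold the same codepoints, while never being equal to a String or any other CharSequence.

```java
import java.util.Arrays;

final class CodepointKeySketch {
    private final int[] codepoints;

    CodepointKeySketch(CharSequence in) {
        this.codepoints = in.codePoints().toArray();
    }

    @Override
    public boolean equals(Object obj) {
        // Only another CodepointKeySketch can be equal; a String never is
        return obj instanceof CodepointKeySketch
                && Arrays.equals(codepoints, ((CodepointKeySketch) obj).codepoints);
    }

    @Override
    public int hashCode() {
        // Derived from the codepoints so that equal objects share a hash code
        return Arrays.hashCode(codepoints);
    }

    public static void main(String[] args) {
        CodepointKeySketch a = new CodepointKeySketch("abc");
        CodepointKeySketch b = new CodepointKeySketch(new StringBuilder("abc"));
        System.out.println(a.equals(b));
        System.out.println(a.equals("abc"));
    }
}
```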
-
compareTo
public int compareTo(UnicodeString other)
Compare two Unicode strings in codepoint collating sequence.
Specified by:
compareTo in interface java.lang.Comparable<UnicodeString>
Parameters:
other - the object to be compared
Returns:
less than 0, 0, or greater than 0 depending on the ordering of the two strings
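Codepoint order can differ from the UTF-16 code unit order used by String.compareTo() when non-BMP characters are involved. The sketch below uses the hypothetical names CodepointOrderSketch and compareByCodepoint to show one way of comparing by codepoint with JDK calls only; it is not the Saxon implementation.

```java
class CodepointOrderSketch {
    /** Compare two sequences codepoint by codepoint; a proper prefix sorts first. */
    static int compareByCodepoint(CharSequence a, CharSequence b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";           // U+FF5E, inside the BMP
        String astral = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair
        // By codepoint U+FF5E sorts before U+1F600, but by UTF-16 code unit
        // 0xFF5E sorts after the high surrogate 0xD83D.
        System.out.println(compareByCodepoint(bmp, astral) < 0);
        System.out.println(bmp.compareTo(astral) > 0);
    }
}
```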
-
asAtomic
public AtomicValue asAtomic()
Get an atomic value that encapsulates this match key. Needed to support the collation-key() function.
Specified by:
asAtomic in interface AtomicMatchKey
Returns:
an atomic value that encapsulates this match key