java.lang.Object
  +--sunlabs.brazil.util.regexp.Regexp
The Regexp class can be used to match a pattern against a
 string and optionally replace the matched parts with new strings.
 
Regular expressions were implemented by translating Henry Spencer's regular expression package for tcl8.0. Much of the description below is copied verbatim from the tcl8.0 regsub manual entry.
 A regular expression is zero or more branches, separated by
 "|".  It matches anything that matches one of the branches.
 
 A branch is zero or more pieces, concatenated.
 It matches a match for the first piece, followed by a match for the
 second piece, etc.
 
 A piece is an atom, possibly followed by "*", "+", or
 "?". 
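 An atom is a regular expression in parentheses, a range (see below),
 ".", "^", "$", a "\" followed by a single character, or a single
 character with no other significance.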
 
 A range is a sequence of characters enclosed in "[]".
 The range normally matches any single character from the sequence.
 If the sequence begins with "^", the range matches any single character
 not from the rest of the sequence.
 If two characters in the sequence are separated by "-", this is shorthand
 for the full list of characters between them (e.g. "[0-9]" matches any
 decimal digit).  To include a literal "]" in the sequence, make it the
 first character (following a possible "^").  To include a literal "-",
 make it the first or last character.
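
The range rules above can be tried out with a short program. The Brazil Regexp class is not assumed to be on the classpath here, so this sketch substitutes java.util.regex.Pattern, which follows the same conventions for the cases shown. The class name RangeDemo and the helper matchesRange are made up for illustration.

```java
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for sunlabs.brazil.util.regexp.Regexp.
class RangeDemo {
    // Hypothetical helper: true if the single character c is matched by the range.
    static boolean matchesRange(String range, char c) {
        return Pattern.matches(range, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(matchesRange("[0-9]", '7'));   // a digit inside the range
        System.out.println(matchesRange("[^0-9]", '7'));  // the negated range rejects digits
        System.out.println(matchesRange("[^0-9]", 'x'));  // and accepts any other character
        System.out.println(matchesRange("[-+]", '-'));    // a leading "-" is taken literally
    }
}
```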
 
In general there may be more than one way to match a regular expression to an input string. For example, consider the command
 String[] match = new String[2];
 new Regexp("(a*)b*").match("aabaaabb", match);
 
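 Considering only the rules given so far, match[0] and
 match[1] could end up with any of several combinations of values.
 The ambiguity is resolved by a set of numbered rules. Two of them are
 used below: Rule 2 says that when a pattern contains "|", the leftmost
 matching sub-expression is chosen, and Rule 4 says that the components
 of a sequence are considered from left to right.
 
 In the example from above, "(a*)b*" therefore matches exactly "aab"; the "(a*)" portion of the pattern is matched first and it consumes the leading "aa", then the "b*" portion of the pattern consumes the next "b". Or, consider the following example:
 String[] match = new String[3];
 new Regexp("(ab|a)(b*)c").match("abc", match);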
 
 After this command, match[0] will be "abc",
 match[1] will be "ab", and match[2] will be an
 empty string.
 Rule 4 specifies that the "(ab|a)" component gets first shot at the input
 string and Rule 2 specifies that the "ab" sub-expression
 is checked before the "a" sub-expression.
 Thus the "b" has already been claimed before the "(b*)"
 component is checked and therefore "(b*)" must match an empty string.
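
The two outcomes described above can be checked with a small program. This is a sketch that uses java.util.regex as a stand-in for the Brazil Regexp class. It is a different engine with its own matching rules, although it produces the same groups for these two inputs. The class AmbiguityDemo and the helper groups are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for the Brazil Regexp class.
class AmbiguityDemo {
    // Hypothetical helper: returns the whole match followed by each
    // parenthesized group, or null if the pattern is not found in str.
    static String[] groups(String pattern, String str) {
        Matcher m = Pattern.compile(pattern).matcher(str);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Whole match "aab", group 1 "aa".
        System.out.println(String.join("|", groups("(a*)b*", "aabaaabb")));
        // Whole match "abc", group 1 "ab", group 2 "".
        System.out.println(String.join("|", groups("(ab|a)(b*)c", "abc")));
    }
}
```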
 Regular expression substitution matches a string against a regular expression, transforming the string by replacing the matched region(s) with new substring(s).
 What gets substituted into the result is controlled by a
 subspec.  The subspec is a formatting string that specifies
 what portions of the matched region should be substituted into the
 result.
 
 "\n", where n is a digit from 1 to 9,
 is replaced with a copy of the nth subexpression.
 
 In the above, a string like "\2" represents the two characters
 backslash and "2", not the Unicode character 0002.
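
To make the "\n" rule concrete, here is a simplified sketch of subspec expansion. It is not the Brazil implementation: it handles only the backslash-digit form described above and copies every other character through. Dropping a reference to a missing or unmatched subexpression is an assumption of this sketch. SubspecDemo and expand are hypothetical names.

```java
// Simplified sketch of subspec expansion; not the Brazil implementation.
class SubspecDemo {
    // Hypothetical helper. groups[0] is the whole match and groups[n] is
    // the nth subexpression (null if it did not take part in the match).
    static String expand(String[] groups, String subspec) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < subspec.length(); i++) {
            char c = subspec.charAt(i);
            boolean isRef = c == '\\' && i + 1 < subspec.length()
                    && subspec.charAt(i + 1) >= '1' && subspec.charAt(i + 1) <= '9';
            if (isRef) {
                int n = subspec.charAt(++i) - '0';
                // Assumption: missing or unmatched subexpressions expand to nothing.
                if (n < groups.length && groups[n] != null) {
                    sb.append(groups[n]);
                }
            } else {
                sb.append(c); // every other character is copied through in this sketch
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] groups = {"abc,def,ghi,klm", "abc", "def", "ghi", "klm"};
        // In Java source, "\\3" is the two characters backslash and "3".
        System.out.println(expand(groups, "\\3,\\1,\\4"));
    }
}
```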
 
    public static void
    main(String[] args)
	throws Exception
    {
	Regexp re;
	String[] matches;
	String s;
	/*
	 * A regular expression to match the first line of an HTTP request.
	 *
	 * 1. ^               - starting at the beginning of the line
	 * 2. ([A-Z]+)        - match and remember some upper case characters
	 * 3. [ \t]+          - skip blank space
	 * 4. ([^ \t]+)       - match and remember up to the next blank space
	 * 5. [ \t]+          - skip more blank space
	 * 6. (HTTP/1\\.[01]) - match and remember HTTP/1.0 or HTTP/1.1
	 * 7. $               - end of string - no chars left.
	 */
	s = "GET http://a.b.com:1234/index.html HTTP/1.1";
	re = new Regexp("^([A-Z]+)[ \t]+([^ \t]+)[ \t]+(HTTP/1\\.[01])$");
	matches = new String[4];
	if (re.match(s, matches)) {
	    System.out.println("METHOD  " + matches[1]);
	    System.out.println("URL     " + matches[2]);
	    System.out.println("VERSION " + matches[3]);
	}
	/*
	 * A regular expression to extract some simple comma-separated data,
	 * reorder some of the columns, and discard column 2.
	 */
	s = "abc,def,ghi,klm,nop,pqr";
	re = new Regexp("^([^,]+),([^,]+),([^,]+),(.*)");
	System.out.println(re.sub(s, "\\3,\\1,\\4"));
    }
 
See Also: Regsub, Serialized Form

| Nested Class Summary | |
| static interface | Regexp.Filter: This interface is used by the Regexp class to generate the replacement string for each pattern match found in the source string. |
| Constructor Summary | |
| Regexp(String pat) | Compiles a new Regexp object from the given regular expression pattern. |
| Regexp(String pat, boolean ignoreCase) | Compiles a new Regexp object from the given regular expression pattern. |
| Method Summary | | |
| static void | applySubspec(Regsub rs, String subspec, StringBuffer sb) | Utility method to give access to the standard substitution algorithm used by sub and subAll. |
| static void | main(String[] args) | |
| String | match(String str) | Matches the given string against this regular expression. |
| boolean | match(String str, int[] indices) | Matches the given string against this regular expression, and computes the set of substrings that matched the parenthesized subexpressions. |
| boolean | match(String str, String[] substrs) | Matches the given string against this regular expression, and computes the set of substrings that matched the parenthesized subexpressions. |
| String | sub(String str, Regexp.Filter rf) | |
| String | sub(String str, String subspec) | Matches a string against a regular expression and replaces the first match with the string generated from the substitution parameter. |
| String | subAll(String str, String subspec) | Matches a string against a regular expression and replaces all matches with the string generated from the substitution parameter. |
| int | subspecs() | Returns the number of parenthesized subexpressions in this regular expression, plus one more for this expression itself. |
| String | toString() | Returns a string representation of this compiled regular expression. |
| Methods inherited from class java.lang.Object |
| equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
public Regexp(String pat)
       throws IllegalArgumentException
Compiles a new Regexp object from the given regular expression pattern.

It takes a certain amount of time to parse and validate a regular expression pattern before it can be used to perform matches or substitutions. If the caller caches the new Regexp object, that parsing time is saved, because the same Regexp can be used to match many different strings.
Parameters:
pat - The string holding the regular expression pattern.
Throws:
IllegalArgumentException - if the pattern is malformed.
		The detail message for the exception will be set to a
		string indicating how the pattern was malformed.
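
The caching advice above amounts to compiling the pattern once and reusing the compiled object. The sketch below shows that pattern of use with java.util.regex.Pattern standing in for Regexp, which is assumed not to be available here. CachedPatternDemo and countHttpLines are hypothetical names.

```java
import java.util.regex.Pattern;

// Illustration only: java.util.regex.Pattern stands in for Regexp.
class CachedPatternDemo {
    // Compiled once when the class is loaded, then shared by every call.
    private static final Pattern HTTP_VERSION = Pattern.compile("HTTP/1\\.[01]$");

    // Hypothetical helper: counts the lines that end in an HTTP/1.x version.
    static int countHttpLines(String[] lines) {
        int count = 0;
        for (String line : lines) {
            if (HTTP_VERSION.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = {"GET / HTTP/1.1", "HELO example", "POST /form HTTP/1.0"};
        System.out.println(countHttpLines(lines));
    }
}
```

With the Brazil class, the same idea would be a single "new Regexp(pat)" stored in a field and reused for each call to match.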
public Regexp(String pat,
              boolean ignoreCase)
       throws IllegalArgumentException
Compiles a new Regexp object from the given regular expression pattern.
Parameters:
pat - The string holding the regular expression pattern.
ignoreCase - If true then this regular expression will
		do case-insensitive matching.  If false, then
		the matches are case-sensitive.  Regular expressions
		generated by Regexp(String) are case-sensitive.
Throws:
IllegalArgumentException - if the pattern is malformed.
		The detail message for the exception will be set to a
		string indicating how the pattern was malformed.

| Method Detail |
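public static void main(String[] args)
                 throws Exception
Throws:
Exception

public int subspecs()
Returns the number of parenthesized subexpressions in this regular expression, plus one more for this expression itself.

public String match(String str)
Matches the given string against this regular expression.
Parameters:
str - The string to match.
Returns:
The substring of str that matched the entire
		regular expression, or null if the string did not
		match this regular expression.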
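
public boolean match(String str,
                     String[] substrs)
Matches the given string against this regular expression, and computes the set of substrings that matched the parenthesized subexpressions.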
 substrs[0] is set to the range of str
 that matched the entire regular expression.
 
 substrs[1] is set to the range of str
 that matched the first (leftmost) parenthesized subexpression.
 substrs[n] is set to the range that matched the
 nth subexpression, and so on.
 
 If subexpression n did not match, then
 substrs[n] is set to null.  Not to
 be confused with "", which is a valid value for a
 subexpression that matched 0 characters.
 
 The length that the caller should use when allocating the
 substrs array is the return value of
 Regexp.subspecs.  The array
 can be shorter (in which case not all the information will
 be returned), or longer (in which case the remainder of the
 elements are initialized to null), or
 null (to ignore the subexpressions).
Parameters:
str - The string to match.
substrs - An array of strings allocated by the caller, and filled in
		with information about the portions of str that
		matched the regular expression.  May be null.
Returns:
true if str matched this
		regular expression, false otherwise.
		If false is returned, then the contents of
		substrs are unchanged.
See Also:
subspecs()
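
The array contract described above (shorter, longer, or null arrays, and an untouched array on failure) can be sketched as follows. This is not the Brazil code: java.util.regex is used as a stand-in, with groupCount() + 1 playing the role of subspecs(). SubstrsDemo and matchInto are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the substrs contract using java.util.regex; not the Brazil code.
class SubstrsDemo {
    // Hypothetical helper mirroring the documented behavior of
    // match(String, String[]).
    static boolean matchInto(Pattern re, String str, String[] substrs) {
        Matcher m = re.matcher(str);
        if (!m.find()) {
            return false; // substrs is left untouched
        }
        if (substrs != null) {
            for (int n = 0; n < substrs.length; n++) {
                // Slots past the last subexpression are set to null;
                // group(n) is also null when subexpression n did not match.
                substrs[n] = (n <= m.groupCount()) ? m.group(n) : null;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Pattern re = Pattern.compile("(a)|(b)");
        // groupCount() + 1 plays the role of Regexp.subspecs().
        String[] substrs = new String[re.matcher("").groupCount() + 1];
        if (matchInto(re, "b", substrs)) {
            System.out.println(Arrays.toString(substrs));
        }
    }
}
```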
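
public boolean match(String str,
                     int[] indices)
Matches the given string against this regular expression, and computes the set of substrings that matched the parenthesized subexpressions.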
For the indices specified below, the range extends from the character at the starting index up to, but not including, the character at the ending index.
 indices[0] and indices[1] are set to
 starting and ending indices of the range of str
 that matched the entire regular expression.
 
 indices[2] and indices[3] are set to the
 starting and ending indices of the range of str that
 matched the first (leftmost) parenthesized subexpression.
 indices[n * 2] and indices[n * 2 + 1]
 are set to the range that matched the nth
 subexpression, and so on.
 
 If subexpression n did not match, then
 indices[n * 2] and indices[n * 2 + 1]
 are both set to -1.
 
 The length that the caller should use when allocating the
 indices array is twice the return value of
 Regexp.subspecs.  The array
 can be shorter (in which case not all the information will
 be returned), or longer (in which case the remainder of the
 elements are initialized to -1), or
 null (to ignore the subexpressions).
Parameters:
str - The string to match.
indices - An array of integers allocated by the caller, and filled in
		with information about the portions of str that
		matched all the parts of the regular expression.
		May be null.
Returns:
true if the string matched the regular expression,
		false otherwise.  If false is
		returned, then the contents of indices are
		unchanged.
See Also:
subspecs()
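
The index layout described above, with a start and end pair per subexpression and -1 for anything unmatched, can be sketched in the same way. Again java.util.regex stands in for the Brazil class, and IndicesDemo and matchInto are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the indices contract using java.util.regex; not the Brazil code.
class IndicesDemo {
    // Hypothetical helper mirroring the documented behavior of
    // match(String, int[]).
    static boolean matchInto(Pattern re, String str, int[] indices) {
        Matcher m = re.matcher(str);
        if (!m.find()) {
            return false; // indices is left untouched
        }
        if (indices != null) {
            for (int i = 0; i < indices.length; i++) {
                int n = i / 2; // subexpression number for this slot
                if (n > m.groupCount()) {
                    indices[i] = -1; // slot beyond the last subexpression
                } else {
                    // start(n) and end(n) are -1 when group n did not match.
                    indices[i] = (i % 2 == 0) ? m.start(n) : m.end(n);
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] indices = new int[6]; // deliberately longer than needed
        if (matchInto(Pattern.compile("(a*)b*"), "aabaaabb", indices)) {
            System.out.println(Arrays.toString(indices));
        }
    }
}
```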
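
public String sub(String str,
                  String subspec)
Matches a string against a regular expression and replaces the first match with the string generated from the substitution parameter.
Parameters:
str - The string to match against this regular expression.
subspec - The substitution parameter, described in
		REGULAR EXPRESSION SUBSTITUTION.
Returns:
The result of replacing the first match in
		str with the string generated from
		subspec.  If no matches were found, then
		the return value is null.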
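
public String subAll(String str,
                     String subspec)
Matches a string against a regular expression and replaces all matches with the string generated from the substitution parameter.
Parameters:
str - The string to match against this regular expression.
subspec - The substitution parameter, described in
		REGULAR EXPRESSION SUBSTITUTION.
Returns:
The result of replacing all the matches in
		str with the strings generated from
		subspec.  If no matches were found, then
		the return value is a copy of str.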
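
The different no-match results of sub (null) and subAll (a copy of str) are easy to overlook. The sketch below reproduces both behaviors on top of java.util.regex, which is only a stand-in: its replacement strings use "$n" rather than the "\n" form of a subspec. SubDemo and its two helpers are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the sub/subAll return-value contract; not the Brazil code.
class SubDemo {
    // Replaces the first match, or returns null when nothing matches.
    static String sub(Pattern re, String str, String replacement) {
        Matcher m = re.matcher(str);
        return m.find() ? m.replaceFirst(replacement) : null;
    }

    // Replaces every match; an unmatched input comes back unchanged.
    static String subAll(Pattern re, String str, String replacement) {
        return re.matcher(str).replaceAll(replacement);
    }

    public static void main(String[] args) {
        Pattern pair = Pattern.compile("([^,]+),([^,]+)");
        System.out.println(sub(pair, "abc,def,ghi", "$2,$1"));
        System.out.println(subAll(pair, "a,b,c,d", "$2,$1"));
        System.out.println(sub(Pattern.compile("x"), "abc", "y"));    // no match
        System.out.println(subAll(Pattern.compile("x"), "abc", "y")); // no match
    }
}
```

A caller of sub therefore needs a null check before using the result, while a caller of subAll does not.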

public static void applySubspec(Regsub rs,
                                String subspec,
                                StringBuffer sb)
Utility method to give access to the standard substitution algorithm
 used by sub and subAll.  Appends to the
 string buffer the string generated by applying the substitution
 parameter to the matched region.
Parameters:
rs - Information about the matched region.
subspec - The substitution parameter.
sb - StringBuffer to which the generated string is appended.

public String sub(String str,
                  Regexp.Filter rf)

public String toString()
Returns a string representation of this compiled regular expression.
Overrides:
toString in class Object

Version 2.1, Generated 12/30/04. Copyright (c) 2001-2004, Sun Microsystems.