java.lang.Object
  andyr.jtokeniser.Tokeniser
    andyr.jtokeniser.RegexTokeniser
public class RegexTokeniser extends Tokeniser
The RegexTokeniser class uses regular expressions to define a word, and tokenises according to that expression. All matching is performed via Java's Pattern and Matcher classes.
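To illustrate what tokenising by match (rather than by delimiter) means, here is a minimal sketch built directly on java.util.regex. The class MatchTokeniserSketch and its tokenise method are hypothetical names introduced for illustration; this is not the source of RegexTokeniser, only an assumption about the general technique the description refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of andyr.jtokeniser.
final class MatchTokeniserSketch {

    // Collects every non-overlapping match of regex in input, left to right.
    // Each match is treated as one token; text between matches is dropped.
    static List<String> tokenise(String input, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> tokens = new ArrayList<>();
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenise("the cat sat on the mat", "\\w+")) {
            System.out.println(token);
        }
    }
}
```

The regular expression here describes the tokens themselves, which is the opposite of String.split, where the expression describes the separators.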
The following is one example of the use of the tokeniser. The code:

    RegexTokeniser ret = new RegexTokeniser("the cat sat on the mat", "\\w+");
    while (ret.hasMoreTokens()) {
        System.out.println(ret.nextToken());
    }

prints the following output:

    the
    cat
    sat
    on
    the
    mat

It is also possible to keep the strings in between tokens should it be necessary. By default these are discarded. Note that nothing before the first match or after the last match is kept. For example, take the string "123abc456def789" and the regular expression "\\D+" (one or more non-digits):

    RegexTokeniser ret = new RegexTokeniser("123abc456def789", "\\D+", true);
    while (ret.hasMoreTokens()) {
        System.out.println(ret.nextToken());
    }

With keepDelim set to true, this prints the following output:

    abc
    456
    def

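The behaviour described above (in-between strings kept, but nothing before the first match or after the last) can be sketched with Matcher.start() and Matcher.end(). GapKeepingSketch and the keepGaps parameter are hypothetical names; the real class may be implemented differently, and skipping empty gaps between adjacent matches is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of andyr.jtokeniser.
final class GapKeepingSketch {

    // Returns the matches of regex in input. When keepGaps is true, the text
    // lying between two consecutive matches is also returned, in order.
    // Text before the first match and after the last match is never returned.
    static List<String> tokenise(String input, String regex, boolean keepGaps) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> tokens = new ArrayList<>();
        int previousEnd = -1; // stays -1 until the first match has been seen
        while (matcher.find()) {
            // Assumption: an empty gap between adjacent matches is skipped.
            if (keepGaps && previousEnd >= 0 && matcher.start() > previousEnd) {
                tokens.add(input.substring(previousEnd, matcher.start()));
            }
            tokens.add(matcher.group());
            previousEnd = matcher.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenise("123abc456def789", "\\D+", true)) {
            System.out.println(token);
        }
    }
}
```

Because the gap is only emitted once a previous match exists, the leading "123" is dropped, and because nothing is emitted after the loop ends, the trailing "789" is dropped too.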
Field Summary |
---|
Fields inherited from class andyr.jtokeniser.Tokeniser |
---|
currentTokenPosition, tokens |
Constructor Summary | |
---|---|
RegexTokeniser(java.lang.String input) | Creates a RegexTokeniser that tokenises the input. |
RegexTokeniser(java.lang.String input, java.lang.String regex) | Creates a RegexTokeniser that tokenises the input according to a regular expression that defines a "word" or token. |
RegexTokeniser(java.lang.String input, java.lang.String regex, boolean keepDelim) | Creates a RegexTokeniser that tokenises the input according to a regular expression that defines a "word" or token. |
Method Summary |
---|
Methods inherited from class andyr.jtokeniser.Tokeniser |
---|
countTokens, getTokens, hasMoreTokens, nextToken, numberOfTokens |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public RegexTokeniser(java.lang.String input, java.lang.String regex, boolean keepDelim)
Creates a RegexTokeniser that tokenises the input according to a regular expression that defines a "word" or token. If keepDelim is true then all the strings in between the tokens are kept as tokens too.
Parameters:
input - a string from which the tokens will be extracted.
regex - the regular expression.
keepDelim - flag indicating whether to return the delimiters as tokens.
See Also:
Pattern
public RegexTokeniser(java.lang.String input, java.lang.String regex)
Creates a RegexTokeniser that tokenises the input according to a regular expression that defines a "word" or token.
Parameters:
input - a string from which the tokens will be extracted.
regex - the regular expression.
See Also:
Pattern
public RegexTokeniser(java.lang.String input)
Creates a RegexTokeniser that tokenises the input.
Parameters:
input - a string from which the tokens will be extracted.