java.lang.Object
  sem.tokeniser.Tokeniser
public class Tokeniser
A very simple tokeniser implementation.
Separates text into tokens and sentences using heuristic rules. It handles sentences surrounded by quotes poorly, so it is best to remove the quotes from the input before tokenising.
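The advice about quotes can be followed with a small pre-processing step. The sketch below is only an illustration: `QuoteStripper` and `stripQuotes` are hypothetical names that are not part of `sem.tokeniser`, and it assumes that "quotes" means straight and curly double quotation marks.

```java
// Hypothetical helper, not part of sem.tokeniser: removes double quotation
// marks from raw text before the text is handed to a tokeniser.
final class QuoteStripper {
    private QuoteStripper() {}

    // Removes straight (") and curly (\u201C, \u201D) double quotes.
    // Apostrophes are left alone so contractions such as "don't" survive.
    static String stripQuotes(String text) {
        return text.replaceAll("[\"\u201C\u201D]", "");
    }

    public static void main(String[] args) {
        String raw = "\"It works.\" she said.";
        System.out.println(stripQuotes(raw)); // prints: It works. she said.
    }
}
```

The cleaned string could then be passed to `Tokeniser.tokeniseAndSplit(java.lang.String)`.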
Constructor Summary |
---|
Tokeniser() |
Method Summary | | |
---|---|---|
static void | main(java.lang.String[] args) | |
static java.util.ArrayList<java.lang.String> | sentenceSplit(java.lang.String tokenisedText) | Take tokenised text and split it into separate sentences. |
static java.lang.String | tokenise(java.lang.String text) | Split the text into tokens and sentences. |
static java.util.ArrayList<java.lang.String> | tokeniseAndSplit(java.lang.String text) | First tokenise, then sentence-split the text. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Tokeniser()
Method Detail |
---|
public static java.lang.String tokenise(java.lang.String text)
Parameters: text - Text to be tokenised.
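The heuristic rules used by `tokenise` are not documented on this page. As a rough illustration of what rule-based tokenisation can look like, the sketch below separates common punctuation marks from words. `TokeniseSketch` is a hypothetical stand-in, not the library source, and the space-separated output format is an assumption.

```java
// Hypothetical stand-in, NOT the sem.tokeniser.Tokeniser implementation.
final class TokeniseSketch {
    private TokeniseSketch() {}

    // Puts spaces around common punctuation marks, then collapses runs of
    // whitespace so that tokens are separated by single spaces.
    static String tokenise(String text) {
        return text
                .replaceAll("([.,!?;:()])", " $1 ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(tokenise("Hello, world! It works."));
        // prints: Hello , world ! It works .
    }
}
```

A rule this simple also splits decimals and abbreviations (for example `3.14` or `e.g.`), which is the kind of case a real heuristic tokeniser has to treat specially.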
public static java.util.ArrayList<java.lang.String> sentenceSplit(java.lang.String tokenisedText)
Parameters: tokenisedText - Tokenised text.
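How `sentenceSplit` decides on sentence boundaries is not stated here. One plausible approach, sketched below under the assumption that tokens are separated by single spaces and that a stand-alone `.`, `!` or `?` token ends a sentence, is to accumulate tokens until a terminator is seen. `SentenceSplitSketch` is a hypothetical name and this is not the library's code.

```java
import java.util.ArrayList;

// Hypothetical stand-in, NOT the sem.tokeniser.Tokeniser implementation.
final class SentenceSplitSketch {
    private SentenceSplitSketch() {}

    // Collects space-separated tokens into sentences, closing a sentence
    // whenever a stand-alone ".", "!" or "?" token is encountered.
    static ArrayList<String> sentenceSplit(String tokenisedText) {
        ArrayList<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : tokenisedText.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(token);
            if (token.equals(".") || token.equals("!") || token.equals("?")) {
                sentences.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            sentences.add(current.toString()); // trailing text without a terminator
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(sentenceSplit("Hello , world ! It works ."));
        // prints: [Hello , world !, It works .]
    }
}
```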
public static java.util.ArrayList<java.lang.String> tokeniseAndSplit(java.lang.String text)
Parameters: text - Input text.
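According to the method summary, `tokeniseAndSplit` first tokenises and then sentence-splits, so it can be read as a composition of the two other methods. The sketch below shows that composition with `java.util.function.Function`. `PipelineSketch` and both stand-in stages are hypothetical and deliberately trivial; they are not the library's rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical illustration of "tokenise, then sentence-split" as a composition.
final class PipelineSketch {
    private PipelineSketch() {}

    // Chains the two stages: the tokeniser's output feeds the sentence splitter.
    static Function<String, ArrayList<String>> tokeniseAndSplit(
            Function<String, String> tokenise,
            Function<String, ArrayList<String>> sentenceSplit) {
        return tokenise.andThen(sentenceSplit);
    }

    // Trivial stand-in stage: separates full stops from the preceding word.
    static String standInTokenise(String text) {
        return text.replace(".", " .");
    }

    // Trivial stand-in stage: splits at a space that follows a " ." token.
    static ArrayList<String> standInSplit(String tokenised) {
        return new ArrayList<>(Arrays.asList(tokenised.split("(?<= \\.) ")));
    }

    public static void main(String[] args) {
        Function<String, ArrayList<String>> pipeline =
                tokeniseAndSplit(PipelineSketch::standInTokenise, PipelineSketch::standInSplit);
        System.out.println(pipeline.apply("A b. C d."));
        // prints: [A b ., C d .]
    }
}
```

With the real class, the equivalent call would presumably be `Tokeniser.sentenceSplit(Tokeniser.tokenise(text))`, although this page does not confirm that the two forms return identical results.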
public static void main(java.lang.String[] args)