public class ConcordanceToken extends Token implements IConcordanceToken
An implementation of IConcordanceToken.
This class requires JSemcor to be on the classpath.
Modifier and Type | Field and Description |
---|---|
static java.util.regex.Pattern | semcorTokenPattern - A compiled regular expression pattern that captures the string representation of tagged tokens. |
static java.util.regex.Pattern | whitespaceDelimited - A compiled regular expression pattern that matches a non-empty run of whitespace. |
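A pattern that matches a non-empty run of whitespace is typically used to split a line into chunks. The sketch below illustrates the idea only: the regular expression `\s+` is an assumption standing in for whitespaceDelimited, whose actual expression is not shown in this documentation, and WhitespaceSplitSketch and chunks are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

final class WhitespaceSplitSketch {

    // Assumption: "\\s+" stands in for the library's whitespaceDelimited
    // pattern, whose actual regular expression is not documented here.
    static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Splits a line into the chunks separated by non-empty runs of whitespace.
    static List<String> chunks(String line) {
        // Trim first so that leading whitespace does not yield an empty chunk.
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(WHITESPACE_RUN.split(trimmed));
    }

    public static void main(String[] args) {
        System.out.println(chunks("alpha  beta\tgamma"));
    }
}
```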
Constructor and Description |
---|
ConcordanceToken(java.lang.String text, java.lang.String tag, int tokenNum, int partNum, java.lang.String... stems) - Constructs a new Semcor token object with the specified text, tag, token number, part number and stems. |
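The constructor's documented argument rules (null text is rejected with a NullPointerException; blank text or a negative token or part number is rejected with an IllegalArgumentException; stems may be null or empty) can be sketched with a stand-alone class. TokenArgsSketch is a hypothetical stand-in written for illustration, not the library class, and the order in which it performs the checks is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in: it only mirrors the documented argument checks.
final class TokenArgsSketch {
    final String text;
    final String tag;          // null when no tag has been assigned
    final int tokenNum;
    final int partNum;
    final List<String> stems;  // null = no stems assigned, empty = unstemmable

    TokenArgsSketch(String text, String tag, int tokenNum, int partNum, String... stems) {
        if (text == null) {
            throw new NullPointerException("text is null");
        }
        if (text.trim().isEmpty()) {
            throw new IllegalArgumentException("text is empty or all whitespace");
        }
        if (tokenNum < 0 || partNum < 0) {
            throw new IllegalArgumentException("token and part numbers must be >= 0");
        }
        this.text = text;
        this.tag = tag;
        this.tokenNum = tokenNum;
        this.partNum = partNum;
        // Copy the array so later changes by the caller do not leak in.
        this.stems = (stems == null)
                ? null
                : Collections.unmodifiableList(Arrays.asList(stems.clone()));
    }

    public static void main(String[] args) {
        TokenArgsSketch token = new TokenArgsSketch("dogs", "NNS", 3, 0, "dog");
        System.out.println(token.text + " " + token.tag + " " + token.stems);
    }
}
```

Because stems is a varargs parameter, omitting it in this sketch produces an empty list, while passing `(String[]) null` produces null.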
Modifier and Type | Method and Description |
---|---|
boolean | equals(java.lang.Object obj) |
int | getPartNumber() - Returns the index of the part in the Semcor token from which this part was extracted. |
int | getTokenNumber() - Returns the index of the token in the Semcor sentence from which it was extracted. |
int | hashCode() |
static ConcordanceToken | parse(java.lang.String str) - Parses a string of the form "test_NN_stem1_stem2_..._stemN_1_0" into a ConcordanceToken instance. |
static java.util.List<ConcordanceToken> | parseList(java.lang.String str) - Parses a string formed from the concatenation of strings of the form "test-1-0-NN-stem1:stem2 " into a list of corresponding ConcordanceToken instances. |
java.lang.String | toString() |
static java.lang.String | toString(IConcordanceToken token) - Returns the String representation of the given token. |
static ConcordanceToken | toToken(int tokenNum, int partNum, edu.mit.jsemcor.element.IWordform wf, edu.mit.jsemcor.element.ISentence sent) - Constructs a Semcor token object from the given token number, part number, IWordform, and sentence drawn from the Semcor corpus. |
static java.util.List<ConcordanceToken> | toTokens(edu.mit.jsemcor.element.IToken t, int tokenNum, edu.mit.jsemcor.element.ISentence sent) - Returns a list of ConcordanceToken objects if the token specified by the token number in the sentence is a continuous multi-word expression (MWE). |
Methods inherited from class Token: checkStems, checkString, getForm, getStems, getTag
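The single-token string form documented for parse, "test_NN_stem1_stem2_..._stemN_1_0", can be illustrated with a small stand-alone parser. This is a sketch under stated assumptions, not the library's implementation: it assumes the fields are, in order, text, tag, zero or more stems, token number and part number; it splits on underscores rather than using semcorTokenPattern; and it does not handle text that itself contains an underscore. TaggedTokenSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration; not the library's ConcordanceToken.
final class TaggedTokenSketch {
    final String text;
    final String tag;
    final String[] stems;
    final int tokenNum;
    final int partNum;

    private TaggedTokenSketch(String text, String tag, String[] stems,
                              int tokenNum, int partNum) {
        this.text = text;
        this.tag = tag;
        this.stems = stems;
        this.tokenNum = tokenNum;
        this.partNum = partNum;
    }

    // Assumed field order, read off the documented example
    // "test_NN_stem1_stem2_..._stemN_1_0":
    // text, tag, zero or more stems, token number, part number.
    static TaggedTokenSketch parse(String str) {
        if (str == null) {
            throw new NullPointerException("str is null");
        }
        // The -1 limit keeps trailing empty fields, so "a_b_1_" is rejected below.
        String[] fields = str.split("_", -1);
        int n = fields.length;
        if (n < 4 || fields[0].trim().isEmpty()) {
            throw new IllegalArgumentException("unexpected format: " + str);
        }
        int tokenNum;
        int partNum;
        try {
            tokenNum = Integer.parseInt(fields[n - 2]);
            partNum = Integer.parseInt(fields[n - 1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("unexpected format: " + str, e);
        }
        if (tokenNum < 0 || partNum < 0) {
            throw new IllegalArgumentException("negative index: " + str);
        }
        // Everything between the tag and the two trailing numbers is a stem.
        String[] stems = Arrays.copyOfRange(fields, 2, n - 2);
        return new TaggedTokenSketch(fields[0], fields[1], stems, tokenNum, partNum);
    }

    public static void main(String[] args) {
        TaggedTokenSketch t = parse("test_NN_stem1_stem2_1_0");
        System.out.println(t.text + " " + t.tag + " " + Arrays.toString(t.stems)
                + " " + t.tokenNum + " " + t.partNum);
    }
}
```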
public static final java.util.regex.Pattern semcorTokenPattern
public static final java.util.regex.Pattern whitespaceDelimited
public ConcordanceToken(java.lang.String text, java.lang.String tag, int tokenNum, int partNum, java.lang.String... stems)
The stems may be null or empty. If null, no stems have been assigned. If empty, the
token is unstemmable.
Parameters:
text - the surface form of the token as it appears in the sentence,
capitalization intact
tag - the tag of the token, if assigned, otherwise null
tokenNum - the token number. Must be greater than or equal to 0.
partNum - the part number representing the index of the token in a
multi-word expression, 0 if it is not part of one. Must be
greater than or equal to 0.
stems - the list of stems, possibly empty or null
Throws:
java.lang.NullPointerException - if the text is null
java.lang.IllegalArgumentException - if the text is empty or all whitespace or if the token number
or part number is less than 0.
public int getTokenNumber()
Specified by:
getTokenNumber in interface IConcordanceToken
public int getPartNumber()
Specified by:
getPartNumber in interface IConcordanceToken
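How token numbers and part numbers relate can be pictured with a short sketch. It assumes, based on the constructor's parameter descriptions, that every part of a multi-word expression shares the token number of the Semcor token it came from, that parts are numbered from 0 in order, and that a single-word token has part number 0. PartNumberingSketch and label are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

final class PartNumberingSketch {

    // Labels each part of one sentence token as "part(tokenNum,partNum)".
    // A single-word token has exactly one part, which gets part number 0.
    static List<String> label(int tokenNum, String... parts) {
        List<String> labels = new ArrayList<>();
        for (int partNum = 0; partNum < parts.length; partNum++) {
            labels.add(parts[partNum] + "(" + tokenNum + "," + partNum + ")");
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(label(0, "She"));
        System.out.println(label(1, "looked", "up"));
    }
}
```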
public int hashCode()
Overrides:
hashCode in class java.lang.Object
public boolean equals(java.lang.Object obj)
Overrides:
equals in class java.lang.Object
public static java.lang.String toString(IConcordanceToken token)
Parameters:
token - the token to be represented as a string
public static ConcordanceToken parse(java.lang.String str)
Parses a string of the form "test_NN_stem1_stem2_..._stemN_1_0" into a
ConcordanceToken instance.
Parameters:
str - the string representing the tagged token
Throws:
java.lang.NullPointerException - if the specified string is null
java.lang.IllegalArgumentException - if the specified string does not match the expected format
public static java.util.List<ConcordanceToken> parseList(java.lang.String str)
Parses a string formed from the concatenation of strings of the form
"test-1-0-NN-stem1:stem2 " into a list of corresponding
ConcordanceToken instances.
Parameters:
str - the concatenated string representing the tagged tokens
Throws:
java.lang.NullPointerException - if the specified string is null
java.lang.IllegalArgumentException - if the specified string does not conform to the expected
format
public static java.util.List<ConcordanceToken> toTokens(edu.mit.jsemcor.element.IToken t, int tokenNum, edu.mit.jsemcor.element.ISentence sent)
Parameters:
t - the token specified by the token number in the given sentence
tokenNum - the token number of the token to be translated into a
concordance token object
sent - the sentence
public static ConcordanceToken toToken(int tokenNum, int partNum, edu.mit.jsemcor.element.IWordform wf, edu.mit.jsemcor.element.ISentence sent)
Parameters:
tokenNum - the token number
partNum - the part number
wf - the word form
sent - the JSemcor sentence
Copyright © 2011 Massachusetts Institute of Technology. All Rights Reserved.