T
- the type of token used by this scorer
public class LeskScore<T extends IToken> extends AbstractScorer<IMWE<T>>
Modifier and Type | Field and Description |
---|---|
protected java.util.Set<java.lang.String> | contextWords |
protected edu.mit.jwi.IDictionary | dict |
protected static java.util.regex.Pattern | punctuation |
protected edu.mit.jwi.morph.IStemmer | stemmer |
protected static java.util.regex.Pattern | whitespace |
Constructor and Description |
---|
LeskScore(java.util.List<T> sentence, edu.mit.jwi.IDictionary dict): Constructs a new Lesk scorer for the specified sentence and dictionary. |
Modifier and Type | Method and Description |
---|---|
protected java.util.List<java.lang.String> | getContentWords(java.lang.String str): Given a string representation of a sentence, removes all punctuation and stop words. |
protected java.util.List<java.lang.String> | getGlosses(java.lang.String lemma, MWEPOS pos): Returns a list of the glosses of a word or MWE by looking up its lemma and part of speech in the dictionary. |
protected java.util.Set<java.lang.String> | getStemmedWords(java.util.Collection<java.lang.String> words): Returns a set of strings containing all the strings in the specified collection, as well as the stemmed versions of those strings. |
protected java.util.Set<java.lang.String> | getStopWords(): Returns the set of stop words for this scorer. |
protected int | overlap(java.lang.String gloss): Returns the number of elements the gloss has in common with the stemmed word list. |
double | score(IMWE<T> mwe): Scores the specified object. |
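The summaries of getStemmedWords and overlap describe a Lesk-style comparison: a set of context words (plus their stems) is intersected with the words of a gloss. The following is a minimal sketch of that idea only, not the class's actual implementation. The class name `LeskOverlapSketch`, the `toyStem` helper (a stand-in for edu.mit.jwi.morph.IStemmer), the tokenization of the gloss, and the example glosses are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class LeskOverlapSketch {

    // Hypothetical toy stemmer: strips one trailing "s" from longer words.
    // The real class uses an edu.mit.jwi.morph.IStemmer instead.
    static String toyStem(String word) {
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Returns every input word together with its (toy) stemmed form.
    public static Set<String> stemmedWords(Collection<String> words) {
        Set<String> result = new HashSet<>();
        for (String word : words) {
            result.add(word);
            result.add(toyStem(word));
        }
        return result;
    }

    // Counts the distinct gloss words that also occur in the context set.
    // Assumption: the gloss is lowercased and split on non-word characters.
    public static int overlap(Set<String> contextWords, String gloss) {
        String[] tokens = gloss.toLowerCase(Locale.ROOT).split("\\W+");
        Set<String> glossWords = new HashSet<>(Arrays.asList(tokens));
        glossWords.retainAll(contextWords);
        return glossWords.size();
    }

    public static void main(String[] args) {
        Set<String> context = stemmedWords(Arrays.asList("banks", "river", "water"));
        // Invented glosses, used only to exercise the overlap count.
        System.out.println(overlap(context, "sloping land beside a body of water, such as a river"));
        System.out.println(overlap(context, "a financial institution that accepts deposits"));
    }
}
```

With these inputs the first gloss shares "water" and "river" with the context set, so it scores higher than the second gloss, which shares nothing. How the real score method aggregates overlaps across several glosses is not stated on this page.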
compare
protected final java.util.Set<java.lang.String> contextWords
protected final edu.mit.jwi.IDictionary dict
protected final edu.mit.jwi.morph.IStemmer stemmer
protected static final java.util.regex.Pattern whitespace
protected static final java.util.regex.Pattern punctuation
public LeskScore(java.util.List<T> sentence, edu.mit.jwi.IDictionary dict)
sentence
- the sentence for the scorer
dict
- the dictionary to be used by the scorer; may not be null
java.lang.NullPointerException
- if either argument is null
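The null contract above (a NullPointerException if either argument is null) is commonly enforced with java.util.Objects.requireNonNull. The sketch below shows that pattern on a hypothetical class, `NullContractSketch`; it uses a plain Object as a placeholder for edu.mit.jwi.IDictionary and is not taken from the LeskScore source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NullContractSketch<T> {

    private final List<T> sentence;
    private final Object dict; // placeholder for edu.mit.jwi.IDictionary

    // Rejects null arguments up front, matching the documented contract.
    public NullContractSketch(List<T> sentence, Object dict) {
        this.sentence = Objects.requireNonNull(sentence, "sentence");
        this.dict = Objects.requireNonNull(dict, "dict");
    }

    public int sentenceLength() {
        return sentence.size();
    }

    public static void main(String[] args) {
        NullContractSketch<String> ok =
                new NullContractSketch<>(Arrays.asList("kick", "the", "bucket"), new Object());
        System.out.println(ok.sentenceLength());
        try {
            new NullContractSketch<String>(null, new Object());
        } catch (NullPointerException e) {
            System.out.println("rejected null sentence");
        }
    }
}
```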
public double score(IMWE<T> mwe)
IScorer
null, depending on the implementation.
mwe
- the object to be scored
protected java.util.List<java.lang.String> getContentWords(java.lang.String str)
str
- the string from which the content words will be extracted
protected java.util.Set<java.lang.String> getStopWords()
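As an illustration of what getContentWords and getStopWords describe, the sketch below strips punctuation, splits on whitespace, and drops stop words. It is an assumption-laden example, not the library's code: the class name `ContentWordsSketch`, the two regular expressions, the lowercasing step, and the stop-word list are all hypothetical, since this page documents the `punctuation` and `whitespace` fields only as java.util.regex.Pattern instances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

public class ContentWordsSketch {

    // Assumed patterns; the actual expressions used by LeskScore are not shown here.
    static final Pattern PUNCTUATION = Pattern.compile("\\p{Punct}+");
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and", "to", "in", "is"));

    // Replaces punctuation with spaces, splits on whitespace, and keeps
    // the lowercased tokens that are not stop words.
    public static List<String> contentWords(String str) {
        String cleaned = PUNCTUATION.matcher(str).replaceAll(" ").trim();
        List<String> result = new ArrayList<>();
        if (cleaned.isEmpty()) {
            return result;
        }
        for (String token : WHITESPACE.split(cleaned)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                result.add(lower);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [bank, river, spring]
        System.out.println(contentWords("The bank of the river, in spring."));
    }
}
```

Note that `\p{Punct}` also matches apostrophes and hyphens, so a token such as "don't" would be split in two under this particular choice of pattern.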
protected java.util.List<java.lang.String> getGlosses(java.lang.String lemma, MWEPOS pos)
lemma
- the lemma of the word or MWE
pos
- the part of speech of the word. If it is a proper noun, this method will try looking up the word as a noun, just in case it is listed as such in the dictionary.
protected int overlap(java.lang.String gloss)
gloss
- the gloss
protected java.util.Set<java.lang.String> getStemmedWords(java.util.Collection<java.lang.String> words)
words
- the collection of strings to be stemmed
Copyright © 2011 Massachusetts Institute of Technology. All Rights Reserved.