public class LeskScore<T extends IToken> extends AbstractScorer<IMWE<T>>

Type Parameters:
T - the type of token used by this scorer

| Modifier and Type | Field and Description |
|---|---|
| protected java.util.Set<java.lang.String> | contextWords |
| protected edu.mit.jwi.IDictionary | dict |
| protected static java.util.regex.Pattern | punctuation |
| protected edu.mit.jwi.morph.IStemmer | stemmer |
| protected static java.util.regex.Pattern | whitespace |

| Constructor and Description |
|---|
| LeskScore(java.util.List<T> sentence, edu.mit.jwi.IDictionary dict): Constructs a new Lesk scorer for the specified sentence and dictionary. |

| Modifier and Type | Method and Description |
|---|---|
| protected java.util.List<java.lang.String> | getContentWords(java.lang.String str): Given a string representation of a sentence, removes all punctuation and stop words. |
| protected java.util.List<java.lang.String> | getGlosses(java.lang.String lemma, MWEPOS pos): Returns a list of the glosses of a word or MWE by looking up its lemma and part of speech in the dictionary. |
| protected java.util.Set<java.lang.String> | getStemmedWords(java.util.Collection<java.lang.String> words): Returns a set of strings containing all the strings in the specified collection, as well as the stemmed versions of those strings. |
| protected java.util.Set<java.lang.String> | getStopWords(): Returns the set of stop words for this scorer. |
| protected int | overlap(java.lang.String gloss): Returns the number of elements the gloss has in common with the stemmed word list. |
| double | score(IMWE<T> mwe): Score the specified object. |

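
The relationship between getStemmedWords and overlap summarized above can be illustrated with a small standalone sketch. It is not the LeskScore source: the class name OverlapSketch, the toy stemmer (a stand-in for edu.mit.jwi.morph.IStemmer), and the choice to count each matching gloss word once are all assumptions made for illustration.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative stand-in only; not the actual LeskScore implementation. */
class OverlapSketch {

    /** Toy stemmer (assumption): strips a trailing "s" from words longer than three letters. */
    static String toyStem(String word) {
        return (word.length() > 3 && word.endsWith("s"))
                ? word.substring(0, word.length() - 1)
                : word;
    }

    /** Returns the given words together with their stemmed forms. */
    static Set<String> stemmedWords(Collection<String> words) {
        Set<String> result = new LinkedHashSet<>();
        for (String w : words) {
            result.add(w);
            result.add(toyStem(w));
        }
        return result;
    }

    /** Counts the distinct gloss words whose surface form or stem occurs in the context set. */
    static int overlap(String gloss, Set<String> contextWords) {
        Set<String> matched = new HashSet<>();
        for (String token : gloss.toLowerCase().split("\\s+")) {
            String stem = toyStem(token);
            if (contextWords.contains(token) || contextWords.contains(stem)) {
                matched.add(stem); // key on the stem so "bank" and "banks" count once
            }
        }
        return matched.size();
    }
}
```

With a context built from "river", "banks" and "water", a gloss that mentions a river and water overlaps in two words, while a gloss about a financial institution overlaps in none. A Lesk-style scorer would typically prefer the sense with the larger overlap; how LeskScore turns overlaps into its final score is not stated on this page.
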
Inherited methods: compare

protected final java.util.Set<java.lang.String> contextWords
protected final edu.mit.jwi.IDictionary dict
protected final edu.mit.jwi.morph.IStemmer stemmer
protected static final java.util.regex.Pattern whitespace
protected static final java.util.regex.Pattern punctuation

public LeskScore(java.util.List<T> sentence, edu.mit.jwi.IDictionary dict)

Parameters:
sentence - the sentence for the scorer
dict - the dictionary to be used by the scorer; may not be null

Throws:
java.lang.NullPointerException - if either argument is null

public double score(IMWE<T> mwe)

IScorer
... null, depending on the implementation.

Parameters:
mwe - the object to be scored

protected java.util.List<java.lang.String> getContentWords(java.lang.String str)

Parameters:
str - the string from which the content words will be extracted

protected java.util.Set<java.lang.String> getStopWords()
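
The content-word extraction documented for getContentWords and getStopWords might look roughly like the following standalone sketch. The regular expressions, the lowercasing step, and the stop-word list are assumptions; this page does not give the actual values of the punctuation and whitespace fields or the contents of the stop-word set, and ContentWordsSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Illustrative stand-in only; not the actual LeskScore implementation. */
class ContentWordsSketch {

    // Assumed patterns; the real static fields of LeskScore may be defined differently.
    static final Pattern PUNCTUATION = Pattern.compile("\\p{Punct}");
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "to", "in", "and", "or"));

    /** Replaces punctuation with spaces, splits on whitespace, and drops stop words. */
    static List<String> contentWords(String str) {
        String cleaned = PUNCTUATION.matcher(str.toLowerCase()).replaceAll(" ").trim();
        List<String> result = new ArrayList<>();
        if (cleaned.isEmpty()) {
            return result; // nothing left once punctuation is removed
        }
        for (String token : WHITESPACE.split(cleaned)) {
            if (!STOP_WORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }
}
```

Under these assumptions, the sentence "The bank of the river, in spring." reduces to the three content words bank, river and spring. Replacing punctuation with a space rather than deleting it keeps words separated by a comma or dash from being fused together, at the cost of splitting contractions such as "don't".
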

protected java.util.List<java.lang.String> getGlosses(java.lang.String lemma, MWEPOS pos)

Parameters:
lemma - the lemma of the word or MWE
pos - the part of speech of the word. If it is a proper noun, this method will try looking up the word as a noun, just in case it is listed as such in the dictionary.

protected int overlap(java.lang.String gloss)

Parameters:
gloss - the gloss

protected java.util.Set<java.lang.String> getStemmedWords(java.util.Collection<java.lang.String> words)

Parameters:
words - the collection of strings to be stemmed

Copyright © 2011 Massachusetts Institute of Technology. All Rights Reserved.