edu.mit.jmwe.detect.score
Class LeskScore<T extends IToken>

java.lang.Object
  extended by edu.mit.jmwe.detect.score.AbstractScorer<IMWE<T>>
      extended by edu.mit.jmwe.detect.score.LeskScore<T>
Type Parameters:
T - the type of token used by this scorer
All Implemented Interfaces:
IScorer<IMWE<T>>, Comparator<IMWE<T>>

public class LeskScore<T extends IToken>
extends AbstractScorer<IMWE<T>>

Scores an MWE by its Lesk score: the overlap between its dictionary glosses and the words of the surrounding sentence.

Since:
jMWE 1.0.0
Version:
$Id: LeskScore.java 620 2011-05-08 21:13:58Z markaf $
Author:
M.A. Finlayson

Field Summary
protected  Set<String> contextWords
           
protected  edu.mit.jwi.IDictionary dict
           
protected static Pattern punctuation
           
protected  edu.mit.jwi.morph.IStemmer stemmer
           
protected static Pattern whitespace
           
 
Constructor Summary
LeskScore(List<T> sentence, edu.mit.jwi.IDictionary dict)
          Constructs a new lesk scorer for the specified sentence and dictionary.
 
Method Summary
protected  List<String> getContentWords(String str)
          Given a string representation of a sentence, removes all punctuation and stop words.
protected  List<String> getGlosses(String lemma, MWEPOS pos)
          Returns a list of the glosses of a word or MWE by looking up its lemma and part of speech in the dictionary.
protected  Set<String> getStemmedWords(Collection<String> words)
          Returns a set of strings containing all the strings in the specified collection, as well as the stemmed versions of those strings.
protected  Set<String> getStopWords()
          Returns the set of stop words for this scorer.
protected  int overlap(String gloss)
          Returns the number of elements the gloss has in common with the stemmed word list.
 double score(IMWE<T> mwe)
          Score the specified object.
 
Methods inherited from class edu.mit.jmwe.detect.score.AbstractScorer
compare
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface java.util.Comparator
equals
 

Field Detail

contextWords

protected final Set<String> contextWords

dict

protected final edu.mit.jwi.IDictionary dict

stemmer

protected final edu.mit.jwi.morph.IStemmer stemmer

whitespace

protected static final Pattern whitespace

punctuation

protected static final Pattern punctuation

Constructor Detail

LeskScore

public LeskScore(List<T> sentence,
                 edu.mit.jwi.IDictionary dict)
Constructs a new lesk scorer for the specified sentence and dictionary.

Parameters:
sentence - the sentence that provides the context for the scorer; may not be null
dict - the dictionary to be used by the scorer; may not be null
Throws:
NullPointerException - if either argument is null
Since:
jMWE 1.0.0

Method Detail

score

public double score(IMWE<T> mwe)
Description copied from interface: IScorer
Score the specified object. The object may be null, depending on the implementation.

Parameters:
mwe - the object to be scored
Returns:
the score

getContentWords

protected List<String> getContentWords(String str)
Given a string representation of a sentence, removes all punctuation and stop words. Returns a list of the remaining content words (assuming words are delimited by whitespace).

Parameters:
str - the string from which the content words will be extracted
Returns:
a list of all the content words in the string, in lower case.
Since:
jMWE 1.0.0
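
The filtering described for getContentWords can be sketched roughly as follows. This is a minimal illustration, not the jMWE source: the class name ContentWordsSketch, the two regular expressions, and the tiny stop list are all assumptions, since this page does not show the values of the whitespace and punctuation fields or the contents of getStopWords().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

final class ContentWordsSketch {

    // Assumed patterns; the actual 'whitespace' and 'punctuation' fields may differ.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");
    private static final Pattern PUNCTUATION = Pattern.compile("\\p{Punct}+");

    // Hypothetical stop list, standing in for whatever getStopWords() returns.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "of", "in", "to", "is");

    /** Strips punctuation, splits on whitespace, lower-cases, and drops stop words. */
    static List<String> contentWords(String str) {
        List<String> result = new ArrayList<>();
        String cleaned = PUNCTUATION.matcher(str).replaceAll("").trim();
        if (cleaned.isEmpty()) {
            return result; // nothing left once punctuation is removed
        }
        for (String token : WHITESPACE.split(cleaned)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                result.add(lower);
            }
        }
        return result;
    }
}
```

Deleting punctuation outright, rather than replacing it with a space, keeps a token such as "sat," as the single word "sat"; it also fuses hyphenated words, which may or may not match the real implementation.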

getStopWords

protected Set<String> getStopWords()
Returns the set of stop words for this scorer.

Returns:
the set of stop words for this scorer
Since:
jMWE 1.0.0

getGlosses

protected List<String> getGlosses(String lemma,
                                  MWEPOS pos)
Returns a list of the glosses of a word or MWE by looking up its lemma and part of speech in the dictionary.

Parameters:
lemma - the lemma of the word or MWE
pos - the part of speech of the word or MWE. If it is a proper noun, this method also tries looking up the lemma as a noun, in case the dictionary lists it as such.
Returns:
a list of the glosses of a word or MWE, empty if none were found.
Since:
jMWE 1.0.0
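
The proper-noun fallback described for getGlosses might look roughly like the sketch below. To stay self-contained it does not call the JWI dictionary at all: the Pos enum is a hypothetical stand-in for MWEPOS, and the in-memory map with its two entries is an invented toy dictionary. Whether the real method tries the proper-noun lookup first is also an assumption.

```java
import java.util.List;
import java.util.Map;

final class GlossLookupSketch {

    // Hypothetical stand-in for MWEPOS; only the values needed here.
    enum Pos { NOUN, PROPER_NOUN, VERB }

    // Toy dictionary keyed by "lemma#POS"; entries are illustrative only.
    private static final Map<String, List<String>> GLOSSES = Map.of(
            "world_war_ii#NOUN", List.of("a war between the Allies and the Axis"),
            "look_up#VERB", List.of("seek information from"));

    private static List<String> lookup(String lemma, Pos pos) {
        return GLOSSES.getOrDefault(lemma + "#" + pos, List.of());
    }

    /** Returns the glosses for the lemma, or an empty list if none are found. */
    static List<String> glosses(String lemma, Pos pos) {
        List<String> found = lookup(lemma, pos);
        if (found.isEmpty() && pos == Pos.PROPER_NOUN) {
            // Fallback: the dictionary may list the proper noun as a plain noun.
            found = lookup(lemma, Pos.NOUN);
        }
        return found;
    }
}
```

Returning an empty list instead of null matches the documented contract ("empty if none were found") and lets callers iterate without a null check.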

overlap

protected int overlap(String gloss)
Returns the number of elements the gloss has in common with the stemmed word list.

Parameters:
gloss - the gloss
Returns:
the number of elements in common
Since:
jMWE 1.0.0
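
One plausible reading of the overlap count is a set intersection between the words of the gloss and the context words. The sketch below assumes exactly that; the class name OverlapSketch and the explicit contextWords parameter are illustrative (the real method reads the contextWords field), and counting each shared word once rather than per occurrence is an assumption this page does not confirm.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class OverlapSketch {

    /** Counts the distinct gloss words that also appear in the context set. */
    static int overlap(String gloss, Set<String> contextWords) {
        Set<String> glossWords = new HashSet<>();
        for (String token : gloss.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Strip punctuation so that "book," matches "book".
            String word = token.replaceAll("\\p{Punct}", "");
            if (!word.isEmpty()) {
                glossWords.add(word);
            }
        }
        glossWords.retainAll(contextWords); // keep only the shared words
        return glossWords.size();
    }
}
```

For instance, the gloss "seek information from a book, or a person" against the context words book, information, and library shares two words.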

getStemmedWords

protected Set<String> getStemmedWords(Collection<String> words)
Returns a set of strings containing all the strings in the specified collection, as well as the stemmed versions of those strings.

Parameters:
words - the collection of strings to be stemmed
Returns:
all the words and all of their stems
Since:
jMWE 1.0.0
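
The union of words and stems described for getStemmedWords can be illustrated as below. The stemmer is passed in as a plain function so the sketch runs without JWI; toyStems is a deliberately crude, hypothetical stand-in for the IStemmer held in the stemmer field, and the class and method names are invented for this example.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class StemmedWordsSketch {

    /** Toy stemmer: strips one trailing "s" from words longer than three letters. */
    static List<String> toyStems(String word) {
        if (word.length() > 3 && word.endsWith("s")) {
            return List.of(word.substring(0, word.length() - 1));
        }
        return List.of();
    }

    /** Returns every input word together with every stem the stemmer yields for it. */
    static Set<String> stemmedWords(Collection<String> words,
                                    Function<String, List<String>> stemmer) {
        Set<String> result = new LinkedHashSet<>();
        for (String word : words) {
            result.add(word);                   // keep the surface form
            result.addAll(stemmer.apply(word)); // add zero or more stems
        }
        return result;
    }
}
```

Keeping the surface forms alongside the stems means a gloss word can match either variant, which is presumably why the method returns both.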


Copyright © 2011 Massachusetts Institute of Technology. All Rights Reserved.