edu.mit.jmwe.harness
Class ConcordanceAnswerKey

java.lang.Object
  extended by edu.mit.jmwe.harness.ConcordanceAnswerKey
All Implemented Interfaces:
IAnswerKey

public class ConcordanceAnswerKey
extends Object
implements IAnswerKey

Default implementation of the IAnswerKey interface. Looks up the answer multi-word expressions for an IConcordanceSentence in a Semcor corpus, in which multi-word expressions are annotated.

This class requires JSemcor to be on the classpath.

Since:
jMWE 1.0.0
Version:
$Id: ConcordanceAnswerKey.java 620 2011-05-08 21:13:58Z markaf $
Author:
M.A. Finlayson, N. Kulkarni

Field Summary
static Pattern condordanceSentenceIDPattern
          A compiled regular expression pattern that captures the string representation of a Semcor sentence ID.
static Pattern lexSensePattern
          A compiled regular expression pattern that captures the string representation of a sense key.
 
Constructor Summary
ConcordanceAnswerKey(edu.mit.jsemcor.main.IConcordance c)
          Constructs an answer key from a single concordance
ConcordanceAnswerKey(Iterable<? extends edu.mit.jsemcor.main.IConcordance> i)
          Constructs an answer key from the given semcor concordance set.
ConcordanceAnswerKey(Map<String,edu.mit.jsemcor.main.IConcordance> concords)
          Constructs an answer key from the given semcor concordance set.
 
Method Summary
protected  MWEPOS disambiguatePOS(List<edu.mit.jsemcor.element.IWordform> mwe)
          Attempts to disambiguate the part of speech of a multi-word expression that does not have a semantic tag and whose parts are labeled with different part of speech tags.
<T extends IToken>
List<IMWE<T>>
getAnswers(IMarkedSentence<T> sent)
          Gets the answer multi-word expressions from the given sentence.
<T extends IToken>
List<IMWE<T>>
getAnswers(IMarkedSentence<T> sent, edu.mit.jsemcor.element.ISentence answers)
          Extracts a set of MWE answers from a sentence and its corresponding answer sentence.
protected
<T extends IToken>
List<IMWE<T>>
getContinuousMWEs(IMarkedSentence<T> sent, edu.mit.jsemcor.element.ISentence answer, Set<edu.mit.jsemcor.element.IWordform> used)
          Gets the multi-word expressions from the given sentence that are marked as single tokens.
protected  MWEPOS getMWEPOS(String lexSense)
          Given the lexical sense of a word form, extracts the one digit decimal integer representing the synset type of the sense and returns the corresponding part of speech.
protected
<T extends IToken>
List<IMWE<T>>
getNonContinuousMWEs(IMarkedSentence<T> sent, edu.mit.jsemcor.element.ISentence answer, Set<edu.mit.jsemcor.element.IWordform> used)
          Gets the multi-word expressions from the given sentence that are non-contiguous (i.e., have a distance value not equal to zero).
static edu.mit.jsemcor.element.ISentence getSentence(Map<String,edu.mit.jsemcor.main.IConcordance> concords, IMarkedSentence<?> sent)
          Returns the concordance sentence that corresponds to the specified marked sentence
 boolean isIgnoringProperNouns()
          Returns true if this answer key ignores proper nouns, that is, excludes them from its results; false otherwise
protected static boolean isIllformattedLemma(edu.mit.jsemcor.element.ISemanticTag tag)
          Returns true if the semantic tag of a multi-word expression is null, tags a proper noun, or if the lemma encoded in the semantic tag is not formatted properly, that is, with underscores separating the parts of the multi-word expression.
 void setIgnoreProperNouns(boolean ignoreProperNouns)
          Sets the flag that, if true, causes the answer key to ignore proper nouns, that is, to exclude them from its results.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

condordanceSentenceIDPattern

public static final Pattern condordanceSentenceIDPattern
A compiled regular expression pattern that captures the string representation of a Semcor sentence ID. Pattern: (\\S+?)/(\\S+?)/(\\d+)
  1. (\\S+?)/ group 1, concordance name
  2. (\\S+?)/ group 2, context name
  3. (\\d+) group 3, sentence number

Since:
jMWE 1.0.0
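
As a rough illustration of how the three groups can be read back, the sketch below recompiles the documented expression with plain java.util.regex instead of referencing the jMWE field. The class name SentenceIdSketch, the parse helper, and the sample ID brown1/br-a01/12 are made up for illustration; real Semcor sentence IDs may look different.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceIdSketch {
    // The expression documented above, written as a Java string literal.
    static final Pattern ID = Pattern.compile("(\\S+?)/(\\S+?)/(\\d+)");

    /**
     * Splits an ID into {concordance name, context name, sentence number},
     * or returns null if the whole string does not match the pattern.
     */
    static String[] parse(String id) {
        Matcher m = ID.matcher(id);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }
}
```

The doubled backslashes are Java string escaping; the regular expression itself uses single backslashes.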

lexSensePattern

public static final Pattern lexSensePattern
A compiled regular expression pattern that captures the string representation of a sense key.
Format: ss_type:lex_filenum:lex_id:head_word:head_id
Pattern: (\\d):(\\d\\d):(\\d\\d):?:((\\S+):(\\d\\d))?
  1. (\\d): group 1, ss_type, a one-digit decimal integer representing the synset type of the sense
  2. (\\d\\d): group 2, lex_filenum, a two-digit decimal integer representing the name of the lexicographer file containing the synset for the sense
  3. (\\d\\d): group 3, lex_id, a two-digit decimal integer that, when appended to the lemma, uniquely identifies a sense within a lexicographer file
  4. ?:((\\S+):(\\d\\d))? groups 4 and 5, head_word and head_id, which may or may not occur

Since:
jMWE 1.0.0
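
To make the field layout concrete, here is a minimal sketch that pulls the synset type and head word out of a lex_sense string. It deliberately uses a simplified expression of its own, not the library's lexSensePattern, and assumes the five colon-separated fields described above with head_word and head_id possibly empty. The class name, the helper methods, and the sample strings are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexSenseSketch {
    // Simplified stand-in (an assumption, not the jMWE expression):
    // ss_type:lex_filenum:lex_id:head_word:head_id, last two fields optional.
    static final Pattern LEX_SENSE =
            Pattern.compile("(\\d):(\\d\\d):(\\d\\d):(\\S*?):(\\d\\d)?");

    /** Returns the synset type digit, or -1 if the string is not a lex_sense. */
    static int synsetType(String lexSense) {
        Matcher m = LEX_SENSE.matcher(lexSense);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Returns the head word, or null when the sense does not carry one. */
    static String headWord(String lexSense) {
        Matcher m = LEX_SENSE.matcher(lexSense);
        if (!m.matches() || m.group(4).isEmpty()) {
            return null;
        }
        return m.group(4);
    }
}
```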
Constructor Detail

ConcordanceAnswerKey

public ConcordanceAnswerKey(edu.mit.jsemcor.main.IConcordance c)
Constructs an answer key from a single concordance

Parameters:
c - the concordance that backs this answer key. May not be null.
Since:
jMWE 1.0.0

ConcordanceAnswerKey

public ConcordanceAnswerKey(Iterable<? extends edu.mit.jsemcor.main.IConcordance> i)
Constructs an answer key from the given semcor concordance set.

Parameters:
i - the set of concordances that backs this answer key. May not be null.
Throws:
NullPointerException - if the specified concordance set is null
Since:
jMWE 1.0.0

ConcordanceAnswerKey

public ConcordanceAnswerKey(Map<String,edu.mit.jsemcor.main.IConcordance> concords)
Constructs an answer key from the given semcor concordance set.

Parameters:
concords - the semcor concordance that backs this answer key. May not be null.
Throws:
NullPointerException - if the specified concordance set is null
Since:
jMWE 1.0.0
Method Detail

isIgnoringProperNouns

public boolean isIgnoringProperNouns()
Returns true if this answer key ignores proper nouns, that is, excludes them from its results; false otherwise

Returns:
true if this answer key ignores proper nouns; false otherwise
Since:
jMWE 1.0.0

setIgnoreProperNouns

public void setIgnoreProperNouns(boolean ignoreProperNouns)
Sets the flag that, if true, causes the answer key to ignore proper nouns, that is, to exclude them from its results.

Parameters:
ignoreProperNouns - true if this answer key should ignore proper nouns; false if it should include them in its results
Since:
jMWE 1.0.0

getAnswers

public <T extends IToken> List<IMWE<T>> getAnswers(IMarkedSentence<T> sent)
Description copied from interface: IAnswerKey
Gets the answer multi-word expressions from the given sentence. If there are no answers, should return the empty list. Should never return null.

Specified by:
getAnswers in interface IAnswerKey
Type Parameters:
T - type of tokens that are contained in the sentence.
Parameters:
sent - the sentence for which the answers should be retrieved. May not be null.
Returns:
a non-null, possibly empty list of answer multi-word expressions for the given sentence

getAnswers

public <T extends IToken> List<IMWE<T>> getAnswers(IMarkedSentence<T> sent,
                                                   edu.mit.jsemcor.element.ISentence answers)
Extracts a set of MWE answers from a sentence and its corresponding answer sentence.

Type Parameters:
T - the token type
Parameters:
sent - the sentence for which answers are needed
answers - the answers
Returns:
a list of MWEs that are ground truth for this sentence
Since:
jMWE 1.0.0

getNonContinuousMWEs

protected <T extends IToken> List<IMWE<T>> getNonContinuousMWEs(IMarkedSentence<T> sent,
                                                                edu.mit.jsemcor.element.ISentence answer,
                                                                Set<edu.mit.jsemcor.element.IWordform> used)
Gets the multi-word expressions from the given sentence that are non-contiguous (i.e., have a distance value not equal to zero).

Parameters:
sent - the unit for which the answers are being constructed
answer - the semcor sentence from which the multi-token MWEs should be extracted
Returns:
a non-null, possibly empty list of multi-word expressions found in the given unit that are marked by distance coordinates
Throws:
NullPointerException - if either argument is null
Since:
jMWE 1.0.0

getContinuousMWEs

protected <T extends IToken> List<IMWE<T>> getContinuousMWEs(IMarkedSentence<T> sent,
                                                             edu.mit.jsemcor.element.ISentence answer,
                                                             Set<edu.mit.jsemcor.element.IWordform> used)
Gets the multi-word expressions from the given sentence that are marked as single tokens.

Parameters:
sent - the unit for which the answers are being constructed
answer - the semcor sentence from which the single-token MWEs should be extracted
Returns:
a non-null, possibly empty list of multi-word expressions found in the given unit that are marked as a single token
Throws:
NullPointerException - if either argument is null
Since:
jMWE 1.0.0

getMWEPOS

protected MWEPOS getMWEPOS(String lexSense)
Given the lexical sense of a word form, extracts the one digit decimal integer representing the synset type of the sense and returns the corresponding part of speech.

Parameters:
lexSense - the lexical sense of a word form.
Returns:
the part of speech corresponding to the synset type of the given sense
Since:
jMWE 1.0.0
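
The mapping from synset type to part of speech can be sketched as follows. The digit values follow the WordNet sense key convention (1 noun, 2 verb, 3 adjective, 4 adverb, 5 adjective satellite). The Pos enum is a hypothetical stand-in for MWEPOS, and folding satellites into ADJECTIVE and returning null for unrecognized input are assumptions of this sketch, not documented behavior of getMWEPOS.

```java
class SynsetTypeSketch {
    // Hypothetical stand-in for jMWE's MWEPOS enum.
    enum Pos { NOUN, VERB, ADJECTIVE, ADVERB }

    /** Maps the leading ss_type digit of a lex_sense string to a part of speech. */
    static Pos fromLexSense(String lexSense) {
        if (lexSense == null || lexSense.isEmpty()) {
            return null;
        }
        switch (lexSense.charAt(0)) {
            case '1': return Pos.NOUN;
            case '2': return Pos.VERB;
            case '3': // adjective
            case '5': // adjective satellite, folded into ADJECTIVE (assumption)
                return Pos.ADJECTIVE;
            case '4': return Pos.ADVERB;
            default:  return null; // unrecognized synset type (assumption)
        }
    }
}
```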

disambiguatePOS

protected MWEPOS disambiguatePOS(List<edu.mit.jsemcor.element.IWordform> mwe)
Attempts to disambiguate the part of speech of a multi-word expression that does not have a semantic tag and whose parts are labeled with different part of speech tags. Checks for the case in which the first part of the multi-word expression is a verb and the second part is a preposition; in that case, returns MWEPOS.VERB. Otherwise, returns null.

Parameters:
mwe - the set of wordforms in the MWE
Returns:
The best guess of the method, or null if none
Since:
jMWE 1.0.0
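
The verb-plus-preposition heuristic described above might look roughly like the sketch below. It works on a plain list of tag strings instead of IWordform objects, and it assumes Penn Treebank-style tags (verb tags starting with VB, IN for prepositions). Both the tag set and the Pos enum are assumptions made for illustration.

```java
import java.util.List;

class DisambiguateSketch {
    // Hypothetical stand-in for the single MWEPOS value this heuristic can return.
    enum Pos { VERB }

    /** Returns Pos.VERB when a verb is followed by a preposition; null otherwise. */
    static Pos guess(List<String> posTags) {
        if (posTags == null || posTags.size() < 2) {
            return null;
        }
        boolean firstIsVerb = posTags.get(0).startsWith("VB");
        boolean secondIsPreposition = posTags.get(1).equals("IN");
        return (firstIsVerb && secondIsPreposition) ? Pos.VERB : null;
    }
}
```

Callers have to handle the null result, which signals that no guess could be made.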

getSentence

public static edu.mit.jsemcor.element.ISentence getSentence(Map<String,edu.mit.jsemcor.main.IConcordance> concords,
                                                            IMarkedSentence<?> sent)
Returns the concordance sentence that corresponds to the specified marked sentence

Parameters:
concords - the concordances which should be searched for the sentence
sent - the sentence corresponding to the concordance sentence that should be retrieved
Returns:
the retrieved sentence
Throws:
IllegalArgumentException - if unable to find the sentence
Since:
jMWE 1.0.0

isIllformattedLemma

protected static boolean isIllformattedLemma(edu.mit.jsemcor.element.ISemanticTag tag)
Returns true if the semantic tag of a multi-word expression is null, tags a proper noun, or if the lemma encoded in the semantic tag is not formatted properly, that is, with underscores separating the parts of the multi-word expression.

Parameters:
tag - the semantic tag of a wordform that is a part of a multi-word expression.
Returns:
true if the semantic tag of a multi-word expression is null, tags a proper noun, or if the lemma encoded in the semantic tag is not formatted with underscores separating the parts of the multi-word expression.
Since:
jMWE 1.0.0
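
The three conditions can be sketched as a small predicate. This version takes the lemma string and a proper-noun flag directly instead of an ISemanticTag, with a null lemma standing in for a null tag. How jMWE actually detects proper nouns is not covered here, and all names in the sketch are hypothetical.

```java
class LemmaCheckSketch {
    /**
     * @param lemma      the lemma carried by the semantic tag, or null when there is no tag
     * @param properNoun whether the tag marks a proper noun
     * @return true if the lemma should be treated as ill-formatted
     */
    static boolean isIllFormatted(String lemma, boolean properNoun) {
        if (lemma == null) {
            return true;              // no semantic tag
        }
        if (properNoun) {
            return true;              // proper nouns are rejected
        }
        return !lemma.contains("_");  // parts must be joined by underscores
    }
}
```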


Copyright © 2011 Massachusetts Institute of Technology. All Rights Reserved.