Package | Description |
---|---|
edu.mit.jmwe.data | Provides the basic data structures used by the library and their default implementations. |
edu.mit.jmwe.data.concordance | Provides interfaces and classes for accessing data taken from Semcor-formatted concordances, useful for benchmarking detectors. |
edu.mit.jmwe.detect | Provides the MWE detector API, a baseline detector, and numerous other detector implementations. |
edu.mit.jmwe.detect.score | Provides various scoring mechanisms that can be used by subclasses of the FilterByScore and ResolveByScore detectors. |
edu.mit.jmwe.harness | Provides the testing harness infrastructure. |
edu.mit.jmwe.harness.result | Provides objects that encapsulate the results of a test harness run. |
edu.mit.jmwe.harness.result.error | Provides error detectors to evaluate the results of a test harness run. |
edu.mit.jmwe.index | Provides the MWE index interfaces and default implementations, which allow one to look up an MWE given one of its parts. |

Modifier and Type | Interface and Description |
---|---|
interface | IMarkedSentence<T extends IToken>: A marked sentence is a sentence (i.e., a list of tokens) that has been tagged with a unique id. |
interface | IMWE<T extends IToken>: A multi-word expression found in a list of tokens. |
class | MWE<T extends IToken>: Default implementation of the IMWE interface. |
class | MWEComparator<T extends IToken>: A comparator that compares IMWEs by checking which MWE starts earlier in the list of tokens used to construct this comparator. |

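
The ordering that MWEComparator is described as providing can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: an MWE is modeled as a plain list of its tokens, and the class name `StartOrderComparatorSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: an MWE is modeled simply as the list of its tokens.
final class StartOrderComparatorSketch<T> implements Comparator<List<T>> {

    // Sentence position of each token object. The map is identity-based, so
    // distinct token objects with equal text keep separate positions.
    private final Map<T, Integer> position = new IdentityHashMap<>();

    StartOrderComparatorSketch(List<T> sentence) {
        for (int i = 0; i < sentence.size(); i++) {
            position.put(sentence.get(i), i);
        }
    }

    // Orders MWEs by the sentence index of their earliest token.
    @Override
    public int compare(List<T> one, List<T> two) {
        return Integer.compare(start(one), start(two));
    }

    // Smallest sentence index among the MWE's tokens; tokens that do not
    // occur in the sentence are ignored.
    private int start(List<T> mwe) {
        int min = Integer.MAX_VALUE;
        for (T token : mwe) {
            Integer idx = position.get(token);
            if (idx != null && idx < min) {
                min = idx;
            }
        }
        return min;
    }

    public static void main(String[] args) {
        List<String> sentence =
                Arrays.asList("she", "looked", "the", "word", "up", "in", "no", "time");
        List<String> lookUp = Arrays.asList(sentence.get(1), sentence.get(4));
        List<String> inNoTime =
                Arrays.asList(sentence.get(5), sentence.get(6), sentence.get(7));
        Comparator<List<String>> cmp = new StartOrderComparatorSketch<>(sentence);
        // "looked ... up" starts at index 1, "in no time" at index 5.
        System.out.println(cmp.compare(lookUp, inNoTime) < 0);
    }
}
```

Binding the comparator to one sentence at construction time mirrors the description above, which says the comparison is made relative to the list of tokens used to construct the comparator.
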
Modifier and Type | Class and Description |
---|---|
class | Token: Default implementation of the IToken interface. |

Modifier and Type | Method and Description |
---|---|
static boolean | AbstractMWEDesc.isFillerForSlot(IToken token, IMWEDesc.IPart part): Returns true if the part's lemma matches either the surface form of the given token or any of the token's stems, regardless of case. |

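
The matching rule described for isFillerForSlot can be sketched as below. This is an illustration only: `SketchToken` is a hypothetical stand-in for IToken, the part is reduced to its lemma string, and nothing here is taken from the library's source.

```java
import java.util.Arrays;
import java.util.List;

final class FillerForSlotSketch {

    // Hypothetical stand-in for a token: a surface form plus its known stems.
    static final class SketchToken {
        final String form;
        final List<String> stems;

        SketchToken(String form, List<String> stems) {
            this.form = form;
            this.stems = stems;
        }
    }

    // True if the part's lemma equals the token's surface form or any of
    // its stems, ignoring case.
    static boolean isFillerForSlot(SketchToken token, String partLemma) {
        if (partLemma.equalsIgnoreCase(token.form)) {
            return true;
        }
        for (String stem : token.stems) {
            if (partLemma.equalsIgnoreCase(stem)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SketchToken looked = new SketchToken("Looked", Arrays.asList("look"));
        System.out.println(isFillerForSlot(looked, "look")); // matches a stem
        System.out.println(isFillerForSlot(looked, "up"));   // matches nothing
    }
}
```
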
Modifier and Type | Interface and Description |
---|---|
interface | IConcordanceToken: A token from a Semcor sentence. |

Modifier and Type | Class and Description |
---|---|
protected static class | ConcordanceTagger.TaggerToken: Represents a Semcor token that is not yet tagged. |
class | ConcordanceToken: Default implementation of IConcordanceToken. |

Modifier and Type | Class and Description |
---|---|
class | MWEBuilder<T extends IToken>: A record that is used to hold tokens as the detector passes over a sentence. |

Modifier and Type | Method and Description |
---|---|
protected <T extends IToken> | Exhaustive.containsDuplicate(java.util.Collection<? extends IMWE<T>> results, IMWE<T> mwe): Returns true if the given collection of MWEs already contains a particular MWE. |
<T extends IToken> | StopWords.detect(java.util.List<T> sentence) |
<T extends IToken> | ResolveByScore.detect(java.util.List<T> sentence) |
<T extends IToken> | ProperNouns.detect(java.util.List<T> sentence) |
<T extends IToken> | Perfect.detect(java.util.List<T> sentence) |
<T extends IToken> | NoProperNouns.detect(java.util.List<T> sentence) |
<T extends IToken> | NoInflection.detect(java.util.List<T> sentence) |
<T extends IToken> | LMLR.detect(java.util.List<T> s) |
<T extends IToken> | InOrder.detect(java.util.List<T> sentence) |
<T extends IToken> | InflectionPattern.detect(java.util.List<T> sentence) |
<T extends IToken> | InflectionLookup.detect(java.util.List<T> sentence) |
<T extends IToken> | IMWEDetector.detect(java.util.List<T> sentence): Given a list of tokens, the detector searches for the MWEs in the list. |
<T extends IToken> | HasMWEDetector.detect(java.util.List<T> sentence) |
<T extends IToken> | FilterByScore.detect(java.util.List<T> sentence) |
<T extends IToken> | Exhaustive.detect(java.util.List<T> sentence) |
<T extends IToken> | Continuous.detect(java.util.List<T> sentence) |
<T extends IToken> | Consecutive.detect(java.util.List<T> sent) |
<T extends IToken> | CompositeDetector.detect(java.util.List<T> sentence) |
static <T extends IToken> | MWEBuilder.fillNextSlot(MWEBuilder<T> builder, T t): Fills the first empty (null) slot in the given builder. |
protected <T extends IToken> | Consecutive.fillNextSlot(MWEBuilder<T> builder, T t): Fills the first empty (null) slot in the given builder. |
static <T extends IToken> | MWEBuilder.fillSlots(java.util.Set<MWEBuilder<T>> records, T token): Given a set of MWE builders, fills all the slots in the records that can be filled by the given token. |
static <T extends IToken> | LMLR.getFirstToken(java.lang.Iterable<? extends T> tokens, java.util.Comparator<T> c): Returns the token that is the first in a given iterable collection of tokens. |
protected <T extends IToken> | SmallestVariance.getScorer(java.util.List<T> sentence) |
protected abstract <T extends IToken> | ResolveByScore.getScorer(java.util.List<T> sentence): Returns the scoring function for this filter. |
protected <T extends IToken> | MoreFrequentAsMWE.getScorer(java.util.List<T> sentence) |
protected <T extends IToken> | Longest.getScorer(java.util.List<T> scorer) |
protected <T extends IToken> | LeskAtLeast.getScorer(java.util.List<T> sentence) |
protected <T extends IToken> | Leftmost.getScorer(java.util.List<T> sentence) |
protected abstract <T extends IToken> | FilterByScore.getScorer(java.util.List<T> sentence): Returns a scoring function for the specified sentence. |
protected <T extends IToken> | ConstrainLength.getScorer(java.util.List<T> sentence) |
static <T extends IToken> | InflectionLookup.getSurfaceFormDescription(IRootMWEDesc root, IMWE<T> mwe): Returns a multi-word expression description with a lemma that is constructed by concatenating the tokens of the MWE exactly as they appear in the sentence with underscores. |
<T extends IToken> | InflectionRule.getTagPattern(IMWE<T> mwe): Concatenates the tags of each token in the MWE, separating each by underscores. |
static <T extends IToken> | InflectionRule.inflects(T token, IMWE<T> mwe): Returns true if the text of a token from an MWE does not equal the corresponding part lemma. |
static <T extends IToken> | Continuous.isDiscontinuous(IMWE<T> mwe, java.util.List<T> sentence): Determines whether the specified MWE is discontinuous, i.e., whether there are interstitial tokens inside its boundaries that are not a part of the MWE. |
static <T extends IToken> | Continuous.isDiscontinuous(IMWE<T> mwe, java.util.Map<T,java.lang.Integer> indexMap): Determines whether the specified MWE is discontinuous, i.e., whether there are interstitial tokens inside its boundaries that are not a part of the MWE. |
static <T extends IToken> | InOrder.isOutOfOrder(IMWE<T> mwe): Determines if the constituents of the specified MWE are out of order. |
static <T extends IToken> | ProperNouns.isProperNoun(T token): Checks if the token represents a proper noun by checking its part of speech tag. |
<T extends IToken> | InflectionRule.isValid(IMWE<T> mwe) |
<T extends IToken> | IInflectionRule.isValid(IMWE<T> mwe): Returns true if this MWE follows the rule; false otherwise. |
protected <T extends IToken> | ProperNouns.isValidInterstitial(T token, java.util.LinkedList<T> tokens): Checks if a token that is not a proper noun may still be a part of a proper noun MWE. |
static <T extends IToken> | LMLR.longest(IMWE<T> one, IMWE<T> two, java.util.Comparator<T> c): Compares two MWEs and returns the longer one. |
<T extends IToken> | InflectionRule.matches(IMWE<T> mwe) |
<T extends IToken> | IInflectionRule.matches(IMWE<T> mwe): Returns true if the given MWE has the same syntax as this rule. |
protected <T extends IToken> | ProperNouns.removeIncorrectInterstitials(java.util.LinkedList<T> cs): Removes all the tokens from the end of the given list that are not proper nouns. |

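
The two isDiscontinuous overloads listed above differ only in whether the caller supplies the sentence or a precomputed token-to-index map. The sketch below shows one way such a check could work. It is written under assumptions and is not the library's code: an MWE is modeled as a list of its tokens, tokens are assumed to be distinct under `equals()`, and `ContinuitySketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ContinuitySketch {

    // Convenience overload: builds the token-to-index map from the sentence.
    // Assumes every token in the sentence is distinct under equals().
    static <T> boolean isDiscontinuous(List<T> mwe, List<T> sentence) {
        Map<T, Integer> indexMap = new HashMap<>();
        for (int i = 0; i < sentence.size(); i++) {
            indexMap.put(sentence.get(i), i);
        }
        return isDiscontinuous(mwe, indexMap);
    }

    // An MWE is discontinuous when the span from its first to its last
    // token covers more positions than the MWE's own tokens occupy.
    static <T> boolean isDiscontinuous(List<T> mwe, Map<T, Integer> indexMap) {
        if (mwe.isEmpty()) {
            return false;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        Set<Integer> occupied = new HashSet<>();
        for (T token : mwe) {
            Integer idx = indexMap.get(token);
            if (idx == null) {
                throw new IllegalArgumentException("token not in sentence: " + token);
            }
            min = Math.min(min, idx);
            max = Math.max(max, idx);
            occupied.add(idx);
        }
        return (max - min + 1) > occupied.size();
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("she", "looked", "the", "word", "up");
        // "looked ... up" has two interstitial tokens; "the word" has none.
        System.out.println(isDiscontinuous(Arrays.asList("looked", "up"), sentence));
        System.out.println(isDiscontinuous(Arrays.asList("the", "word"), sentence));
    }
}
```

Passing a prebuilt index map avoids rebuilding it for every candidate MWE in the same sentence, which is a plausible reason for offering both overloads.
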
Modifier and Type | Method and Description |
---|---|
protected java.util.Set<? extends IMWEDesc> | Consecutive.getMWEDescs(IToken token): Returns all the MWE entries in the index that contain the given token or one of its stems as a part. |

Modifier and Type | Class and Description |
---|---|
class | FractionAsMWEScore<T extends IToken>: A scorer that scores an MWE by the fraction of times it appears marked as an MWE, as opposed to a run of unmarked tokens. |
class | LengthScore<T extends IToken>: Scores an MWE by its length. |
class | LeskScore<T extends IToken>: Scores an object by its Lesk-score overlap with dictionary glosses. |
class | StartingIndexScore<T extends IToken>: Scores an MWE by its starting index. |
class | VarianceScore<T extends IToken>: Scores each MWE by its index variance. |

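
To make the position-based scorers above more concrete, here is a rough sketch of three of them. Everything in it is an assumption made for illustration: an MWE is reduced to the array of sentence indices of its tokens, `ScoreSketch` is a hypothetical name, and "index variance" is interpreted as the population variance of those indices, which the descriptions above do not confirm.

```java
final class ScoreSketch {

    // Length score: the number of tokens in the MWE.
    static double lengthScore(int[] positions) {
        return positions.length;
    }

    // Starting-index score: the sentence index of the MWE's earliest token.
    static double startingIndexScore(int[] positions) {
        requireNonEmpty(positions);
        int min = positions[0];
        for (int p : positions) {
            min = Math.min(min, p);
        }
        return min;
    }

    // ASSUMPTION: "index variance" is taken to be the population variance
    // of the token indices; the library may define it differently.
    static double varianceScore(int[] positions) {
        requireNonEmpty(positions);
        double mean = 0;
        for (int p : positions) {
            mean += p;
        }
        mean /= positions.length;
        double sumSquares = 0;
        for (int p : positions) {
            sumSquares += (p - mean) * (p - mean);
        }
        return sumSquares / positions.length;
    }

    private static void requireNonEmpty(int[] positions) {
        if (positions.length == 0) {
            throw new IllegalArgumentException("an MWE needs at least one token");
        }
    }

    public static void main(String[] args) {
        int[] lookUp = {1, 4}; // e.g. "looked" at index 1, "up" at index 4
        System.out.println(lengthScore(lookUp));
        System.out.println(startingIndexScore(lookUp));
        System.out.println(varianceScore(lookUp));
    }
}
```

Under this reading, a tightly packed MWE gets a lower variance than one spread across the sentence, which is the kind of signal a detector named SmallestVariance could plausibly use.
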
Modifier and Type | Method and Description |
---|---|
static <T extends IToken> | LengthScore.getInstance(): Returns the singleton instance of this class, instantiating if necessary. |
static <T extends IToken> | FractionAsMWEScore.getInstance(): Returns the singleton instance of this class, instantiating if necessary. |

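
A generic static getInstance() that returns a shared singleton is a common Java idiom. The sketch below shows one way it can be written. It is not taken from the library: `SingletonScoreSketch` is a hypothetical class, and the lazy, synchronized initialization is an assumption based only on the phrase "instantiating if necessary".

```java
import java.util.Arrays;
import java.util.List;

final class SingletonScoreSketch<T> {

    // One shared, stateless instance serves every type argument.
    private static SingletonScoreSketch<?> instance;

    private SingletonScoreSketch() {
    }

    // Lazily creates the instance on first use. The unchecked cast is safe
    // only because the class holds no state that depends on T.
    @SuppressWarnings("unchecked")
    static synchronized <T> SingletonScoreSketch<T> getInstance() {
        if (instance == null) {
            instance = new SingletonScoreSketch<Object>();
        }
        return (SingletonScoreSketch<T>) instance;
    }

    // Scores an MWE (modeled here as its token list) by its length.
    int score(List<T> mweTokens) {
        return mweTokens.size();
    }

    public static void main(String[] args) {
        SingletonScoreSketch<String> scorer = SingletonScoreSketch.getInstance();
        System.out.println(scorer.score(Arrays.asList("look", "up")));
    }
}
```

The method-level type parameter lets callers obtain the same object typed for whatever token type they use, without creating one scorer per type.
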
Modifier and Type | Method and Description |
---|---|
<T extends IToken> | IAnswerKey.getAnswers(IMarkedSentence<T> sentence): Gets the answer multi-word expressions from the given sentence. |
<T extends IToken> | ConcordanceAnswerKey.getAnswers(IMarkedSentence<T> sent) |
<T extends IToken> | ConcordanceAnswerKey.getAnswers(IMarkedSentence<T> sent, edu.mit.jsemcor.element.ISentence answers): Extracts a set of MWE answers from a sentence and its corresponding answer sentence. |
protected <T extends IToken> | ConcordanceAnswerKey.getContinuousMWEs(IMarkedSentence<T> sent, edu.mit.jsemcor.element.ISentence answer, java.util.Set<edu.mit.jsemcor.element.IWordform> used): Gets the multi-word expressions from the given sentence that are marked as single tokens. |
protected <T extends IToken> | ConcordanceAnswerKey.getNonContinuousMWEs(IMarkedSentence<T> sent, edu.mit.jsemcor.element.ISentence answer, java.util.Set<edu.mit.jsemcor.element.IWordform> used): Gets the multi-word expressions from the given sentence that are non-contiguous (e.g., have a distance value not equal to zero). |
<T extends IToken,S extends IMarkedSentence<T>> | TestHarness.run(IMWEDetector detector, IResultBuilder<T,S> result, java.util.Iterator<S> itr, IAnswerKey answers, IProgressBar pb) |
<T extends IToken,S extends IMarkedSentence<T>> | ITestHarness.run(IMWEDetector detector, IResultBuilder<T,S> results, java.util.Iterator<S> itr, IAnswerKey answers, IProgressBar pb): Runs the detector in the test harness and stores the results in the provided result builder. |
<T extends IToken,S extends IMarkedSentence<T>> | TestHarness.run(java.util.Map<IMWEDetector,IResultBuilder<T,S>> detectors, java.util.Iterator<S> itr, IAnswerKey answers, IProgressBar pb) |
<T extends IToken,S extends IMarkedSentence<T>> | ITestHarness.run(java.util.Map<IMWEDetector,IResultBuilder<T,S>> detectors, java.util.Iterator<S> itr, IAnswerKey answers, IProgressBar pb): Runs the detectors in the test harness and stores the results in the associated result builder. |
protected <T extends IToken,S extends IMarkedSentence<T>> | TestHarness.runDetector(IMWEDetector detector, IResultBuilder<T,S> builder, S sent, java.util.List<IMWE<T>> answers): Runs the detector over a single sentence, storing the result as an ISentenceResult in the given result builder. |
protected <T extends IToken,S extends IMarkedSentence<T>> | TestHarness.runDetectors(java.util.Map<IMWEDetector,IResultBuilder<T,S>> detectors, S sent, java.util.List<IMWE<T>> answers): Runs a set of detectors on the specified sentence, comparing the results to the specified answers. |

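
The comparison step that runDetector is described as performing, running a detector on one sentence and checking its output against the answers, might look roughly like the sketch below. All names are hypothetical stand-ins: a detector is modeled as a `Function` from a sentence to the MWEs it finds, and `Outcome` replaces the library's sentence result types, whose actual contents are not shown in this listing.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class HarnessSketch {

    // Hypothetical per-sentence result: found MWEs split by correctness.
    static final class Outcome<M> {
        final Set<M> correct = new LinkedHashSet<>();
        final Set<M> falsePositives = new LinkedHashSet<>();
        final Set<M> falseNegatives = new LinkedHashSet<>();
    }

    // Runs one detector over one sentence and compares what it found
    // against the answer key for that sentence.
    static <S, M> Outcome<M> runDetector(
            Function<S, ? extends Collection<M>> detector,
            S sentence,
            Collection<M> answers) {
        Outcome<M> outcome = new Outcome<>();
        Collection<M> found = detector.apply(sentence);
        for (M mwe : found) {
            if (answers.contains(mwe)) {
                outcome.correct.add(mwe);
            } else {
                outcome.falsePositives.add(mwe);
            }
        }
        for (M mwe : answers) {
            if (!found.contains(mwe)) {
                outcome.falseNegatives.add(mwe);
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        // A toy detector that always reports the same two expressions.
        Function<String, List<String>> detector =
                s -> Arrays.asList("look_up", "the_word");
        Outcome<String> outcome = runDetector(
                detector,
                "she looked the word up in no time",
                Arrays.asList("look_up", "in_no_time"));
        System.out.println("correct: " + outcome.correct);
        System.out.println("false positives: " + outcome.falsePositives);
        System.out.println("false negatives: " + outcome.falseNegatives);
    }
}
```
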
Modifier and Type | Class and Description |
---|---|
class | ErrorResult<T extends IToken>: Default implementation of the IErrorResult interface. |
static class | ErrorResult.ErrorResultBuilder<T extends IToken>: An object that builds an error result. |
interface | IErrorResult<T extends IToken>: Stores MWEs under the type of error they make. |
interface | IOverallResult<T extends IToken,S extends IMarkedSentence<T>>: Contains results collected from running a test harness over a group of IMarkedSentence objects. |
interface | IResultBuilder<T extends IToken,S extends IMarkedSentence<T>>: Classes implementing this interface build an IOverallResult object. |
interface | ISentenceResult<T extends IToken,S extends IMarkedSentence<T>>: Contains results for one IMarkedSentence object. |
class | MWEResult<T extends IToken,S extends IMarkedSentence<T>>: Default implementation of the IOverallResult interface. |
class | MWEResultBuilder<T extends IToken,S extends IMarkedSentence<T>>: Builds an MWEResult by processing the data in ISentenceResult objects. |
class | SentenceResult<T extends IToken,S extends IMarkedSentence<T>>: Default implementation of the ISentenceResult interface. |
class | TokenResultBuilder<T extends IToken,U extends IMarkedSentence<T>>: A result builder that keeps track of token-level results. |

Modifier and Type | Method and Description |
---|---|
static <T extends IToken,S extends IMarkedSentence<T>> | SentenceResult.printTable(java.lang.StringBuilder sb, ISentenceResult<T,S> result, java.util.Formatter f): Prints a table of the correct, false negative and false positive expressions found by the detector in columns. |
static <T extends IToken> | ErrorResult.toString(IErrorResult<T> result): Creates a table displaying the number of instances of each error class. |
static <T extends IToken,S extends IMarkedSentence<T>> | SentenceResult.toString(ISentenceResult<T,S> result, S sentence, boolean table): Creates a graphical representation of the multi-word expressions found by the detector for a given sentence. |
static <T extends IToken,U extends IMarkedSentence<T>> | SentenceResult.toString(ISentenceResult<T,U> result, U sentence): Creates a graphical representation of the multi-word expressions found by the detector for a given sentence. |

Modifier and Type | Method and Description |
---|---|
<T extends IToken,S extends IMarkedSentence<T>> | VBDVBN.detect(ISentenceResult<T,S> result) |
<T extends IToken,S extends IMarkedSentence<T>> | UntaggedPNoun.detect(ISentenceResult<T,S> result) |
<T extends IToken,S extends IMarkedSentence<T>> | MissingFromIndex.detect(ISentenceResult<T,S> result) |
<T extends IToken,S extends IMarkedSentence<T>> | InflectionPatternError.detect(ISentenceResult<T,S> result) |
<T extends IToken,S extends IMarkedSentence<T>> | InflectionError.detect(ISentenceResult<T,S> result) |
<T extends IToken,S extends IMarkedSentence<T>> | IErrorDetector.detect(ISentenceResult<T,S> result): Identifies the multi-word expressions in a unit result that fall under the specific error class this detector identifies. |
<T extends IToken,S extends IMarkedSentence<T>> | ExtraPrep.detect(ISentenceResult<T,S> result) |
<T extends IToken,S extends IMarkedSentence<T>> | ExtraPOS.detect(ISentenceResult<T,S> result) |
<T extends IToken,S extends IMarkedSentence<T>> | DetectorDisagreement.detect(ISentenceResult<T,S> result) |
<T extends IToken,S extends IMarkedSentence<T>> | CompositeErrorDetector.detect(ISentenceResult<T,S> result) |
<T extends IToken,S extends IMarkedSentence<T>> | AllStopWords.detect(ISentenceResult<T,S> result) |
<T extends IToken,U extends IMarkedSentence<T>> | WrongPOS.detect(ISentenceResult<T,U> result) |
<T extends IToken,U extends IMarkedSentence<T>> | PNounShort.detect(ISentenceResult<T,U> result) |
<T extends IToken,U extends IMarkedSentence<T>> | PNounLong.detect(ISentenceResult<T,U> result) |
<T extends IToken,U extends IMarkedSentence<T>> | OutOfOrder.detect(ISentenceResult<T,U> result) |
<T extends IToken,U extends IMarkedSentence<T>> | InterstitialTokens.detect(ISentenceResult<T,U> result) |
protected static <T extends IToken> | ExtraPrep.findTag(IMWE<T> test, java.lang.String tag): Returns the index of the first token in the MWE with the specified tag. |
static <T extends IToken> | InterstitialTokens.hasParticle(IMWE<T> mwe, java.util.List<T> sentence): Returns true if the given MWE contains a token that is a particle and is separated from the previous token in the MWE by one or more non-MWE tokens in the sentence. |
static <T extends IToken> | InterstitialTokens.isParticle(T token): Returns true if the specified token is tagged as a particle; false otherwise. |
static <T extends IToken> | VBDVBN.isProblem(IMWE<T> mwe): Determines if the specified MWE is a problem according to this error class. |
static <T extends IToken> | MissingFromIndex.isProblem(IMWE<T> mwe, IMWEIndex index): Determines if the specified MWE is a problem, relative to the specified index, according to this error class. |
static <T extends IToken,S extends IMarkedSentence<T>> | DetectorDisagreement.isProblem(IMWE<T> mwe, ISentenceResult<T,S> result, IMWEDetector detector): Determines if the specified MWE is a problem relative to the specified sentence according to this error class. |

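
The hasParticle check described above can be sketched as follows. This is an illustration under several assumptions rather than the library's logic: `Tok` is a hypothetical stand-in for a tagged token, particles are assumed to carry the Penn Treebank tag `RP`, and the MWE's tokens are assumed to be listed in sentence order.

```java
import java.util.Arrays;
import java.util.List;

final class ParticleGapSketch {

    // Hypothetical stand-in for a tagged token. equals() is not overridden,
    // so List.indexOf below locates tokens by object identity.
    static final class Tok {
        final String text;
        final String tag;

        Tok(String text, String tag) {
            this.text = text;
            this.tag = tag;
        }
    }

    // ASSUMPTION: particles carry the Penn Treebank tag "RP".
    static boolean isParticle(Tok token) {
        return "RP".equals(token.tag);
    }

    // True if some particle in the MWE is separated from the preceding MWE
    // token by at least one other sentence token. Assumes the MWE's tokens
    // are listed in sentence order.
    static boolean hasParticle(List<Tok> mwe, List<Tok> sentence) {
        for (int i = 1; i < mwe.size(); i++) {
            Tok current = mwe.get(i);
            if (!isParticle(current)) {
                continue;
            }
            int prevIdx = sentence.indexOf(mwe.get(i - 1));
            int curIdx = sentence.indexOf(current);
            if (prevIdx >= 0 && curIdx >= 0 && curIdx - prevIdx > 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Tok she = new Tok("she", "PRP");
        Tok looked = new Tok("looked", "VBD");
        Tok the = new Tok("the", "DT");
        Tok word = new Tok("word", "NN");
        Tok up = new Tok("up", "RP");
        List<Tok> split = Arrays.asList(she, looked, the, word, up);
        List<Tok> adjacent = Arrays.asList(she, looked, up);
        System.out.println(hasParticle(Arrays.asList(looked, up), split));
        System.out.println(hasParticle(Arrays.asList(looked, up), adjacent));
    }
}
```
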
Modifier and Type | Method and Description |
---|---|
<T extends IToken> | IndexBuilder.findMissingMWEs(java.util.List<IMWE<T>> mwes, java.util.Map<IMWEDescID,IndexBuilder.MutableRootMWEDesc> index, java.util.Set<IndexBuilder.MutableRootMWEDesc> missing): Finds MWEs that are marked in the specified list, but not in the index. |

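
The findMissingMWEs signature above passes in a set that collects the missing entries. A simplified sketch of that out-parameter pattern follows. It is hypothetical throughout: plain strings stand in for IMWEDescID keys and for the MutableRootMWEDesc values, and how the library actually derives an ID from an MWE is not shown in this listing.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FindMissingSketch {

    // Adds to 'missing' every marked MWE id that has no entry in the index.
    // The caller supplies the set, so results can accumulate across calls.
    static void findMissing(List<String> markedIds,
                            Map<String, ?> index,
                            Set<String> missing) {
        for (String id : markedIds) {
            if (!index.containsKey(id)) {
                missing.add(id);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> index = new HashMap<>();
        index.put("look_up", "placeholder entry");
        Set<String> missing = new LinkedHashSet<>();
        findMissing(Arrays.asList("look_up", "in_no_time"), index, missing);
        System.out.println(missing);
    }
}
```
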
Copyright © 2011 Massachusetts Institute of Technology. All Rights Reserved.