P - the type of the part for this MWE description
public abstract class AbstractMWEDesc<P extends IMWEDesc.IPart> extends java.lang.Object implements IMWEDesc
Modifier and Type | Class and Description |
---|---|
protected class | AbstractMWEDesc.AbstractPart Default implementation of the IPart interface. |
IMWEDesc.IPart
Modifier and Type | Field and Description |
---|---|
protected int[] | counts |
boundaryUnderscores, comma, underscore, underscores
Constructor and Description |
---|
AbstractMWEDesc(java.util.List<java.lang.String> parts) Constructs a new MWE description object from the list of parts. |
AbstractMWEDesc(java.util.List<java.lang.String> parts, int... counts) Constructs a new MWE description object from the list of parts and counts relating to the MWE's appearance in a reference concordance. |
AbstractMWEDesc(java.lang.String surfaceForm) Constructs a new MWE description object from the specified surface form that has no inflected forms. |
AbstractMWEDesc(java.lang.String surfaceForm, int... counts) Constructs a new MWE description object that has no inflected forms from the specified surface form and counts relating to the MWE's appearance in a reference concordance. |
Modifier and Type | Method and Description |
---|---|
protected static int | checkCount(int count) Checks that the passed-in count is non-negative. |
int | compareTo(IMWEDesc id) |
static java.lang.String | concatenate(java.lang.Iterable<java.lang.String> parts, java.lang.String separator) Utility method for concatenating collections of strings into a single string using a specified separator. |
static boolean | equalsRoots(IMWEDesc one, IMWEDesc two) Returns true if the root descriptions associated with each of these MWE descriptions are the same; false otherwise. |
int[] | getCounts() Returns an array containing the marked split, marked continuous, unmarked exact, and unmarked pattern occurrences of this MWE in the reference concordance. |
protected abstract int | getExpectedCountLength() Subclasses should implement this method to return the number of counts relating to the MWE's appearance in a reference concordance that are expected in the implementation. |
java.lang.String | getForm() Returns the object's surface form text, exactly as it appears in its original context, with capitalization intact. |
int | getMarkedContinuous() Returns the number of times this MWE was marked on a continuous run of tokens in the reference concordance. |
int | getMarkedSplit() Returns the number of times this MWE was marked on a non-continuous run of tokens in the reference concordance. |
java.util.List<P> | getParts() Returns an unmodifiable list of parts that comprise the MWE. |
static IRootMWEDesc | getRoot(IMWEDesc desc) Returns the root MWE description associated with this object. |
int | getUnmarkedExact() Returns the number of times the exact surface form of this MWE description occurs in the reference concordance without being marked as an occurrence of the MWE. |
int | getUnmarkedPattern() Returns the number of times this MWE description occurs in the reference concordance without being marked as an occurrence of the MWE, and whose form matches a known inflection pattern. |
static boolean | isFillerForSlot(IToken token, IMWEDesc.IPart part) Returns true if the part's lemma matches either the surface form of the given token or any of the token's stems, regardless of case. |
protected boolean | isStopWord(java.lang.String text) Helper method that calculates, for efficiency's sake, whether this MWE part is a stop word. |
protected abstract P | makePart(java.lang.String form, int index) Subclasses should implement this method to construct an IMWEDesc.IPart given the form and index of a part of an MWE. |
static java.util.List<java.lang.String> | splitOnUnderscores(java.lang.String str) Splits a specified string into constituent strings that are separated by underscores. |
java.lang.String | toString() |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
getPOS
public AbstractMWEDesc(java.lang.String surfaceForm)
surfaceForm - a string representing the MWE with its words separated by underscores
java.lang.NullPointerException - if the argument is null
java.lang.IllegalArgumentException - if the surface form does not contain underscores
public AbstractMWEDesc(java.lang.String surfaceForm, int... counts)
surfaceForm - a string representing the MWE with its words separated by underscores
counts - the implementation-specific counts relating to the MWE's appearance in a reference concordance
java.lang.NullPointerException - if either argument is null
java.lang.IllegalArgumentException - if the surface form does not contain underscores
public AbstractMWEDesc(java.util.List<java.lang.String> parts)
parts - the list of parts that will make up this MWE description; may be neither null nor empty, and may not contain any nulls, empty or all-whitespace strings, or strings that contain the underscore character
java.lang.NullPointerException - if the specified list of parts is null, or contains a null
java.lang.IllegalArgumentException - if the specified list has fewer than two elements, or any trimmed string in the list contains an underscore, is empty, or contains whitespace
public AbstractMWEDesc(java.util.List<java.lang.String> parts, int... counts)
parts - the list of parts that will make up this MWE description; may be neither null nor empty, and may not contain any nulls, empty or all-whitespace strings, or strings that contain the underscore character
counts - the implementation-specific counts relating to the MWE's appearance in a reference concordance
java.lang.NullPointerException - if the specified list of parts is null, or contains a null
java.lang.IllegalArgumentException - if the specified list has fewer than two elements, or any trimmed string in the list contains an underscore, is empty, or contains whitespace
protected abstract int getExpectedCountLength()
protected static int checkCount(int count)
count - the count to be checked
java.lang.IllegalArgumentException - if the count is less than zero
protected abstract P makePart(java.lang.String form, int index)
Subclasses should implement this method to construct an IMWEDesc.IPart given the form and index of a part of an MWE.
form - the text of the part
index - the index of the part in the MWE
null
public java.lang.String getForm()
IHasForm
null.
public int getMarkedContinuous()
IMWEDesc
getMarkedContinuous in interface IMWEDesc
public int getMarkedSplit()
IMWEDesc
getMarkedSplit in interface IMWEDesc
public int getUnmarkedExact()
IMWEDesc
getUnmarkedExact in interface IMWEDesc
public int getUnmarkedPattern()
IMWEDesc
getUnmarkedPattern in interface IMWEDesc
public java.util.List<P> getParts()
IMWEDesc
public int[] getCounts()
IMWEDesc
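The getCounts() summary lists four counts in a fixed order (marked split, marked continuous, unmarked exact, unmarked pattern), and checkCount rejects negative values. The following is a minimal sketch of how such a count array might be validated and read. The class CountsSketch, the index constants, and the length check against an expected length are illustrative assumptions, not the library's implementation.

```java
import java.util.Arrays;

/** Illustrative stand-in for the counts handling; not the library's code. */
final class CountsSketch {
    // Assumed index layout, following the order given in the getCounts() summary.
    static final int MARKED_SPLIT = 0;
    static final int MARKED_CONTINUOUS = 1;
    static final int UNMARKED_EXACT = 2;
    static final int UNMARKED_PATTERN = 3;

    private final int[] counts;

    CountsSketch(int expectedLength, int... counts) {
        // Assumption: a length mismatch is treated as an illegal argument.
        if (counts.length != expectedLength) {
            throw new IllegalArgumentException(
                "expected " + expectedLength + " counts, got " + counts.length);
        }
        for (int c : counts) {
            checkCount(c);
        }
        this.counts = counts.clone(); // defensive copy
    }

    /** Returns the count unchanged, or throws if it is negative. */
    static int checkCount(int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count is negative: " + count);
        }
        return count;
    }

    int[] getCounts() { return counts.clone(); }
    int getMarkedSplit() { return counts[MARKED_SPLIT]; }
    int getMarkedContinuous() { return counts[MARKED_CONTINUOUS]; }
    int getUnmarkedExact() { return counts[UNMARKED_EXACT]; }
    int getUnmarkedPattern() { return counts[UNMARKED_PATTERN]; }

    public static void main(String[] args) {
        CountsSketch c = new CountsSketch(4, 3, 10, 2, 1);
        System.out.println(Arrays.toString(c.getCounts()));
    }
}
```

Returning a copy from getCounts() keeps callers from mutating the stored array; whether the real class does the same is not stated in this page.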
public int compareTo(IMWEDesc id)
compareTo in interface java.lang.Comparable<IMWEDesc>
public java.lang.String toString()
toString in class java.lang.Object
protected boolean isStopWord(java.lang.String text)
Exhaustive.getStopWords() method. Subclasses may override this method to use a different set of stop words.
text - text to be checked for being a stop word
true if the verbatim text is a stop word; false otherwise
public static boolean equalsRoots(IMWEDesc one, IMWEDesc two)
Returns true if the root descriptions associated with each of these MWE descriptions are the same; false otherwise.
one - the first MWE description
two - the second MWE description
true if the root descriptions associated with each of these MWE descriptions are the same; false otherwise
java.lang.NullPointerException - if either argument is null
public static IRootMWEDesc getRoot(IMWEDesc desc)
desc - the MWE object from which to extract the root
java.lang.NullPointerException - if the argument is null
public static java.util.List<java.lang.String> splitOnUnderscores(java.lang.String str)
str - a string to be split into underscore-delimited parts
java.lang.NullPointerException - if the specified string is null
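As a rough illustration of the splitting behavior described for splitOnUnderscores, the sketch below splits a surface form such as kick_the_bucket into its parts. The class SplitSketch is hypothetical, and skipping empty segments (from leading, trailing, or doubled underscores) is an assumption; this page does not say how the real method treats them.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in; not the library's implementation. */
final class SplitSketch {

    static List<String> splitOnUnderscores(String str) {
        if (str == null) {
            throw new NullPointerException("str");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= str.length(); i++) {
            // Close a segment at each underscore and at the end of the string.
            if (i == str.length() || str.charAt(i) == '_') {
                // Assumption: empty segments are skipped rather than kept.
                if (i > start) {
                    parts.add(str.substring(start, i));
                }
                start = i + 1;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnUnderscores("kick_the_bucket"));
    }
}
```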
public static java.lang.String concatenate(java.lang.Iterable<java.lang.String> parts, java.lang.String separator)
parts - list of parts to be concatenated, may not be null
separator - string used to separate the parts in the result, may be null
java.lang.NullPointerException - if the specified iterable is null
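The contract above (non-null iterable, separator that may be null) can be sketched as follows. ConcatSketch is a hypothetical class, and treating a null separator as the empty string is an assumption; the page only says that null is permitted.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Illustrative stand-in; not the library's implementation. */
final class ConcatSketch {

    static String concatenate(Iterable<String> parts, String separator) {
        if (parts == null) {
            throw new NullPointerException("parts");
        }
        // Assumption: a null separator behaves like an empty one.
        String sep = (separator == null) ? "" : separator;
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = parts.iterator();
        while (it.hasNext()) {
            sb.append(it.next());
            if (it.hasNext()) {
                sb.append(sep); // separator only between elements
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatenate(Arrays.asList("kick", "the", "bucket"), "_"));
    }
}
```

With an underscore separator this is the inverse of splitting a surface form on underscores, which is presumably why the two utilities sit side by side.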
public static boolean isFillerForSlot(IToken token, IMWEDesc.IPart part)
token - the token to be compared to the part's lemma
part - the part whose lemma is to be compared to the token
java.lang.NullPointerException - if either argument is null
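The method summary describes isFillerForSlot as a case-insensitive match of the part's lemma against the token's surface form or any of its stems. The sketch below illustrates that comparison with stand-in types. The Token class and the plain String lemma are hypothetical simplifications of IToken and IMWEDesc.IPart, whose actual accessors are not shown on this page.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative stand-in; not the library's implementation. */
final class FillerSketch {

    /** Hypothetical simplification of IToken: a surface form plus its stems. */
    static final class Token {
        final String form;
        final List<String> stems;

        Token(String form, List<String> stems) {
            this.form = form;
            this.stems = stems;
        }
    }

    /** The part is reduced to its lemma string for this sketch. */
    static boolean isFillerForSlot(Token token, String partLemma) {
        if (token == null || partLemma == null) {
            throw new NullPointerException();
        }
        // Match against the surface form first, ignoring case.
        if (partLemma.equalsIgnoreCase(token.form)) {
            return true;
        }
        // Otherwise accept a case-insensitive match on any stem.
        for (String stem : token.stems) {
            if (partLemma.equalsIgnoreCase(stem)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Token kicked = new Token("Kicked", Arrays.asList("kick"));
        System.out.println(isFillerForSlot(kicked, "kick"));
    }
}
```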
Copyright © 2011 Massachusetts Institute of Technology. All Rights Reserved.