edu.mit.jmwe.data
Class AbstractMWEDesc<P extends IMWEDesc.IPart>

java.lang.Object
  extended by edu.mit.jmwe.data.AbstractMWEDesc<P>
Type Parameters:
P - the type of the part for this MWE description
All Implemented Interfaces:
IHasForm, IHasMWEPOS, IMWEDesc, Comparable<IMWEDesc>
Direct Known Subclasses:
InfMWEDesc, RootMWEDesc

public abstract class AbstractMWEDesc<P extends IMWEDesc.IPart>
extends Object
implements IMWEDesc

A base class for MWE descriptions that can be used to construct a description from some combination of: a surface form, a list of parts, and counts relating to the MWE's appearance in a reference concordance.

Since:
jMWE 1.0.0
Version:
$Id: AbstractMWEDesc.java 620 2011-05-08 21:13:58Z markaf $
Author:
M.A. Finlayson

Nested Class Summary
protected  class AbstractMWEDesc.AbstractPart
          Default implementation of the IPart interface.
 
Nested classes/interfaces inherited from interface edu.mit.jmwe.data.IMWEDesc
IMWEDesc.IPart
 
Field Summary
protected  int[] counts
           
 
Fields inherited from interface edu.mit.jmwe.data.IMWEDesc
boundaryUnderscores, comma, underscore, underscores
 
Constructor Summary
AbstractMWEDesc(List<String> parts)
          Constructs a new MWE description object from the list of parts.
AbstractMWEDesc(List<String> parts, int... counts)
          Constructs a new MWE description object from the list of parts and counts relating to the MWE's appearance in a reference concordance.
AbstractMWEDesc(String surfaceForm)
          Constructs a new MWE description object from the specified surface form that has no inflected forms.
AbstractMWEDesc(String surfaceForm, int... counts)
          Constructs a new MWE description object that has no inflected forms from the specified surface form and counts relating to the MWE's appearance in a reference concordance.
 
Method Summary
protected static int checkCount(int count)
          Checks that the passed-in count is non-negative.
 int compareTo(IMWEDesc id)
           
static String concatenate(Iterable<String> parts, String separator)
          Utility method for concatenating collections of strings into a single string using a specified separator.
static boolean equalsRoots(IMWEDesc one, IMWEDesc two)
          Returns true if the root descriptions associated with the two specified MWE descriptions are the same; false otherwise.
 int[] getCounts()
          Returns an array containing the marked split, marked continuous, unmarked exact, and unmarked pattern occurrences of this MWE in the reference concordance.
protected abstract  int getExpectedCountLength()
          Subclasses should implement this method to return the number of counts relating to the MWE's appearance in a reference concordance that are expected in the implementation.
 String getForm()
          Returns the object's surface form text, exactly as it appears in its original context, with capitalization intact.
 int getMarkedContinuous()
          Returns the number of times this MWE was marked on a continuous run of tokens in the reference concordance.
 int getMarkedSplit()
          Returns the number of times this MWE was marked on a non-continuous run of tokens in the reference concordance.
 List<P> getParts()
          Returns an unmodifiable list of parts that comprise the MWE.
static IRootMWEDesc getRoot(IMWEDesc desc)
          Returns the root MWE description associated with the specified description.
 int getUnmarkedExact()
          Returns the number of times the exact surface form of this MWE description occurs in the reference concordance without being marked as an occurrence of the MWE.
 int getUnmarkedPattern()
          Returns the number of times this MWE description occurs in the reference concordance without being marked as an occurrence of the MWE, and whose form matches a known inflection pattern.
static boolean isFillerForSlot(IToken token, IMWEDesc.IPart part)
          Returns true if the part's lemma matches either the surface form of the given token or any of the token's stems, regardless of case.
protected  boolean isStopWord(String text)
          Helper method that calculates, for efficiency's sake, whether this MWE part is a stop word.
protected abstract  P makePart(String form, int index)
          Subclasses should implement this method to construct an IMWEDesc.IPart given the form and index of a part of an MWE.
static List<String> splitOnUnderscores(String str)
          Splits a specified string into constituent strings that are separated by underscores.
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface edu.mit.jmwe.data.IMWEDesc
getID
 
Methods inherited from interface edu.mit.jmwe.data.IHasMWEPOS
getPOS
 

Field Detail

counts

protected final int[] counts
Constructor Detail

AbstractMWEDesc

public AbstractMWEDesc(String surfaceForm)
Constructs a new MWE description object from the specified surface form that has no inflected forms.

Parameters:
surfaceForm - A string representing the MWE with its words separated by underscores
Throws:
NullPointerException - if the argument is null
IllegalArgumentException - if the surface form does not contain underscores
Since:
jMWE 1.0.0

AbstractMWEDesc

public AbstractMWEDesc(String surfaceForm,
                       int... counts)
Constructs a new MWE description object that has no inflected forms from the specified surface form and counts relating to the MWE's appearance in a reference concordance.

Parameters:
surfaceForm - A string representing the MWE with its words separated by underscores
counts - the implementation-specific counts relating to the MWE's appearance in a reference concordance.
Throws:
NullPointerException - if either argument is null
IllegalArgumentException - if the surface form does not contain underscores
Since:
jMWE 1.0.0

AbstractMWEDesc

public AbstractMWEDesc(List<String> parts)
Constructs a new MWE description object from the list of parts. This constructor allocates a new internal list, and so subsequent changes to the source list will not affect this object.

Parameters:
parts - the list of parts that will make up this MWE description; may neither be null nor empty, and may not contain any nulls, empty or all-whitespace strings, or strings that contain the underscore character.
Throws:
NullPointerException - if the specified list of parts is null, or contains a null
IllegalArgumentException - if the specified list has fewer than two elements, or any trimmed string in the list contains an underscore, is empty, or contains whitespace
Since:
jMWE 1.0.0

AbstractMWEDesc

public AbstractMWEDesc(List<String> parts,
                       int... counts)
Constructs a new MWE description object from the list of parts and counts relating to the MWE's appearance in a reference concordance. This constructor allocates a new internal list, and so subsequent changes to the source list will not affect this object.

Parameters:
parts - the list of parts that will make up this MWE description; may neither be null nor empty, and may not contain any nulls, empty or all-whitespace strings, or strings that contain the underscore character.
counts - the implementation-specific counts relating to the MWE's appearance in a reference concordance.
Throws:
NullPointerException - if the specified list of parts is null, or contains a null
IllegalArgumentException - if the specified list has fewer than two elements, or any trimmed string in the list contains an underscore, is empty, or contains whitespace
Since:
jMWE 1.0.0
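
The validation rules listed for the parts argument can be illustrated with a small stand-alone sketch. This is not the jMWE source: the class name PartsCheckSketch and the method copyAndCheckParts are hypothetical, and the order in which the checks run (and the choice to store the trimmed strings) is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not part of jMWE.
final class PartsCheckSketch {

    /** Copies the list and applies the documented checks to each part. */
    static List<String> copyAndCheckParts(List<String> parts) {
        if (parts == null) {
            throw new NullPointerException("parts may not be null");
        }
        if (parts.size() < 2) {
            throw new IllegalArgumentException("an MWE needs at least two parts");
        }
        // A new list is allocated, so later changes to the source are not seen.
        List<String> copy = new ArrayList<String>(parts.size());
        for (String part : parts) {
            if (part == null) {
                throw new NullPointerException("parts may not contain null");
            }
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                throw new IllegalArgumentException("empty or all-whitespace part");
            }
            if (trimmed.indexOf('_') >= 0) {
                throw new IllegalArgumentException("part contains an underscore: " + trimmed);
            }
            for (int i = 0; i < trimmed.length(); i++) {
                if (Character.isWhitespace(trimmed.charAt(i))) {
                    throw new IllegalArgumentException("part contains whitespace: " + trimmed);
                }
            }
            copy.add(trimmed); // assumption: the trimmed form is what gets stored
        }
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<String>(Arrays.asList("look", "up"));
        List<String> checked = copyAndCheckParts(source);
        source.add("later"); // does not affect the checked copy
        System.out.println(checked);
    }
}
```

Copying before storing is what makes the "subsequent changes to the source list will not affect this object" guarantee possible.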
Method Detail

getExpectedCountLength

protected abstract int getExpectedCountLength()
Subclasses should implement this method to return the number of counts relating to the MWE's appearance in a reference concordance that are expected in the implementation.

Returns:
the number of counts relating to the MWE's appearance in a reference concordance.
Since:
jMWE 1.0.0

checkCount

protected static int checkCount(int count)
Checks that the passed-in count is non-negative.

Parameters:
count - the count to be checked
Returns:
the given count if it is non-negative.
Throws:
IllegalArgumentException - if the count is less than zero
Since:
jMWE 1.0.0
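
Because the method returns its argument, a check of this kind can be used inline while assigning or copying counts. The following is a minimal sketch of that contract, assuming nothing beyond what is documented above; the class CheckCountSketch and the helper checkCounts are hypothetical and are not taken from jMWE.

```java
// Hypothetical illustration; not part of jMWE.
final class CheckCountSketch {

    /** Returns the count unchanged, or throws if it is negative. */
    static int checkCount(int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count may not be negative: " + count);
        }
        return count;
    }

    /** Hypothetical helper: validates every count while making a copy. */
    static int[] checkCounts(int... counts) {
        int[] copy = new int[counts.length];
        for (int i = 0; i < counts.length; i++) {
            copy[i] = checkCount(counts[i]); // validation and assignment in one step
        }
        return copy;
    }

    public static void main(String[] args) {
        int[] counts = checkCounts(3, 0, 12, 1);
        System.out.println(counts.length);
    }
}
```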

makePart

protected abstract P makePart(String form,
                              int index)
Subclasses should implement this method to construct an IMWEDesc.IPart given the form and index of a part of an MWE.

Parameters:
form - the text of the part
index - the index of the part in the MWE
Since:
jMWE 1.0.0

getForm

public String getForm()
Description copied from interface: IHasForm
Returns the object's surface form text, exactly as it appears in its original context, with capitalization intact. May be a single word or punctuation. The surface form may not contain whitespace or underscores. This method will never return null.

Specified by:
getForm in interface IHasForm
Returns:
the original text, never null.

getMarkedContinuous

public int getMarkedContinuous()
Description copied from interface: IMWEDesc
Returns the number of times this MWE was marked on a continuous run of tokens in the reference concordance. Will always be zero or a positive number.

Specified by:
getMarkedContinuous in interface IMWEDesc
Returns:
the number of times this MWE was marked on an unbroken run of tokens in the reference concordance.

getMarkedSplit

public int getMarkedSplit()
Description copied from interface: IMWEDesc
Returns the number of times this MWE was marked on a non-continuous run of tokens in the reference concordance. Will always be zero or a positive number.

Specified by:
getMarkedSplit in interface IMWEDesc
Returns:
the number of times this MWE was marked on a non-continuous run of tokens in the reference concordance.

getUnmarkedExact

public int getUnmarkedExact()
Description copied from interface: IMWEDesc
Returns the number of times the exact surface form of this MWE description occurs in the reference concordance without being marked as an occurrence of the MWE. To be counted as an exact unmarked occurrence, there must be a continuous run of tokens whose forms match, in order, the forms of the parts (ignoring case) of this MWE description. Will always be zero or a positive number.

Specified by:
getUnmarkedExact in interface IMWEDesc
Returns:
the number of exact unmarked occurrences of this MWE in the reference concordance.

getUnmarkedPattern

public int getUnmarkedPattern()
Description copied from interface: IMWEDesc
Returns the number of times this MWE description occurs in the reference concordance without being marked as an occurrence of the MWE, and whose form matches a known inflection pattern. To be counted as a pattern-inflected occurrence, there must be a continuous run of tokens whose forms or stems match, in order, the forms of the parts (ignoring case) of this MWE description, and whose inflection pattern matches one of the reference inflection patterns. Will always be zero or a positive number.

Specified by:
getUnmarkedPattern in interface IMWEDesc
Returns:
the number of inflected unmarked occurrences of this MWE in the reference concordance.

getParts

public List<P> getParts()
Description copied from interface: IMWEDesc
Returns an unmodifiable list of parts that comprise the MWE.

Specified by:
getParts in interface IMWEDesc
Returns:
an unmodifiable list of parts that comprise the MWE.

getCounts

public int[] getCounts()
Description copied from interface: IMWEDesc
Returns an array containing the marked split, marked continuous, unmarked exact, and unmarked pattern occurrences of this MWE in the reference concordance.

Specified by:
getCounts in interface IMWEDesc
Returns:
an array containing the counts relating to the MWE's appearance in the reference concordance.

compareTo

public int compareTo(IMWEDesc id)
Specified by:
compareTo in interface Comparable<IMWEDesc>

toString

public String toString()
Overrides:
toString in class Object

isStopWord

protected boolean isStopWord(String text)
Helper method that calculates, for efficiency's sake, whether this MWE part is a stop word. This implementation uses a standard set of stop words used by the Exhaustive.getStopWords() method. Subclasses may override this method to use a different set of stop words.

Parameters:
text - the text to be checked for being a stop word
Returns:
true if the verbatim text is a stop word; false otherwise
Since:
jMWE 1.0.0
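
The override point described above can be sketched as follows. The stop word list here is a short placeholder and is not the set used by Exhaustive.getStopWords(); the class names StopWordSketch and NoStopWordsSketch are hypothetical, and the exact-match (case-sensitive) lookup is an assumption based on the phrase "verbatim text".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not part of jMWE.
class StopWordSketch {

    // Placeholder list for illustration only.
    private static final Set<String> DEFAULT_STOP_WORDS = Collections.unmodifiableSet(
            new HashSet<String>(Arrays.asList("a", "an", "the", "of", "in", "to")));

    /** Returns true if the verbatim text is in the default stop word set. */
    protected boolean isStopWord(String text) {
        return DEFAULT_STOP_WORDS.contains(text);
    }
}

// A subclass may swap in a different policy, here: treat nothing as a stop word.
class NoStopWordsSketch extends StopWordSketch {
    @Override
    protected boolean isStopWord(String text) {
        return false;
    }
}
```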

equalsRoots

public static boolean equalsRoots(IMWEDesc one,
                                  IMWEDesc two)
Returns true if the root descriptions associated with the two specified MWE descriptions are the same; false otherwise.

Parameters:
one - the first MWE description
two - the second MWE description
Returns:
true if the root descriptions associated with the two specified MWE descriptions are the same; false otherwise.
Throws:
NullPointerException - if either argument is null
Since:
jMWE 1.0.0

getRoot

public static IRootMWEDesc getRoot(IMWEDesc desc)
Returns the root MWE description associated with the specified description.

Parameters:
desc - the MWE description object from which to extract the root
Returns:
the root for the specified description
Throws:
NullPointerException - if the argument is null
Since:
jMWE 1.0.0

splitOnUnderscores

public static List<String> splitOnUnderscores(String str)
Splits a specified string into constituent strings that are separated by underscores. This method strips leading and trailing whitespace, leading and trailing runs of underscores, and treats runs of underscores as a single delimiter.

Parameters:
str - a string to be split into underscore-delimited parts
Returns:
an unmodifiable list of strings that were delimited by underscores in the original string
Throws:
NullPointerException - if the specified string is null
Since:
jMWE 1.0.0
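
The splitting behavior described above (whitespace trimmed, leading and trailing underscores dropped, runs of underscores treated as one delimiter) can be reproduced with a short stand-alone sketch. It is one possible way to get that behavior, not the jMWE implementation; the class name SplitSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not part of jMWE.
final class SplitSketch {

    static List<String> splitOnUnderscores(String str) {
        if (str == null) {
            throw new NullPointerException("string may not be null");
        }
        List<String> parts = new ArrayList<String>();
        // "_+" treats any run of underscores as a single delimiter.
        for (String piece : str.trim().split("_+")) {
            // A leading underscore run yields an empty first piece; skip it.
            if (!piece.isEmpty()) {
                parts.add(piece);
            }
        }
        return Collections.unmodifiableList(parts);
    }

    public static void main(String[] args) {
        System.out.println(splitOnUnderscores("  __look__up_  "));
    }
}
```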

concatenate

public static String concatenate(Iterable<String> parts,
                                 String separator)
Utility method for concatenating collections of strings into a single string using a specified separator.

Parameters:
parts - List of parts to be concatenated, may not be null
separator - String used to separate the parts in the result, may be null.
Returns:
a single string resulting from the concatenation of the parts with the separator in between each
Throws:
NullPointerException - if the specified iterable is null
Since:
jMWE 1.0.0
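
A minimal sketch of this kind of concatenation is shown below. The documentation says the separator may be null but does not say how a null separator is handled; treating it as "no separator" is an assumption of this sketch, and the class name ConcatSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical illustration; not part of jMWE.
final class ConcatSketch {

    static String concatenate(Iterable<String> parts, String separator) {
        if (parts == null) {
            throw new NullPointerException("parts may not be null");
        }
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = parts.iterator();
        while (it.hasNext()) {
            sb.append(it.next());
            // Assumption: a null separator means the parts are simply joined.
            if (separator != null && it.hasNext()) {
                sb.append(separator);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatenate(Arrays.asList("look", "up"), "_"));
    }
}
```

Joining with the underscore separator is the inverse of splitting a surface form on underscores.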

isFillerForSlot

public static boolean isFillerForSlot(IToken token,
                                      IMWEDesc.IPart part)
Returns true if the part's lemma matches either the surface form of the given token or any of the token's stems, regardless of case.

Parameters:
token - the token to be compared to the part's lemma
part - the part whose lemma is compared to the token
Returns:
true if the part's lemma matches either the surface form of the given token or any of the token's stems, regardless of case.
Throws:
NullPointerException - if either argument is null
Since:
jMWE 1.0.0
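
The matching rule can be sketched with plain strings standing in for the IToken and IMWEDesc.IPart arguments. The class FillerMatchSketch, the method matchesLemma, and its parameters are hypothetical simplifications; how the real token exposes its form and stems is not shown here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of jMWE.
final class FillerMatchSketch {

    /**
     * Returns true if the lemma equals the token's surface form or any of
     * its stems, ignoring case.
     */
    static boolean matchesLemma(String lemma, String tokenForm, List<String> tokenStems) {
        if (lemma == null || tokenForm == null || tokenStems == null) {
            throw new NullPointerException("arguments may not be null");
        }
        if (lemma.equalsIgnoreCase(tokenForm)) {
            return true; // the surface form itself fills the slot
        }
        for (String stem : tokenStems) {
            if (lemma.equalsIgnoreCase(stem)) {
                return true; // one of the stems fills the slot
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesLemma("look", "Looked", Arrays.asList("look")));
    }
}
```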


Copyright © 2011 Massachusetts Institute of Technology. All Rights Reserved.