edu.mit.jwi.morph
Class SimpleStemmer

java.lang.Object
  extended by edu.mit.jwi.morph.SimpleStemmer
All Implemented Interfaces:
IStemmer
Direct Known Subclasses:
WordnetStemmer

public class SimpleStemmer
extends java.lang.Object
implements IStemmer

Provides a simple pattern-based stemming facility based on the "Rules of Detachment" described in the morphy man page of the Wordnet distribution, which can be found at http://wordnet.princeton.edu/man/morphy.7WN.html. It also attempts to strip "ful" endings. It does not search Wordnet to see if stems actually exist. Quoting from that man page:

Rules of Detachment

The following table shows the rules of detachment used by Morphy. If a word ends with one of the suffixes, it is stripped from the word and the corresponding ending is added. ... No rules are applicable to adverbs.

POS Suffix Ending
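
The quoted rule description can be sketched as a small lookup table. This is a minimal, self-contained illustration and not the JWI implementation: the class DetachmentRulesSketch, its detach method, and the string keys for parts of speech are hypothetical, and the suffix/ending pairs are reproduced from the morphy man page as best recalled, so they should be checked against the linked page. Like this stemmer, the sketch only generates candidates and does not check that they exist in Wordnet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in; not the JWI StemmingRule class or its ruleMap.
final class DetachmentRulesSketch {

    // Each rule is a {suffix, ending} pair: strip the suffix, then append the ending.
    // Assumption: pairs follow the morphy man page; verify against the linked page.
    static final Map<String, String[][]> RULES = new LinkedHashMap<>();
    static {
        RULES.put("noun", new String[][] {
            {"s", ""}, {"ses", "s"}, {"xes", "x"}, {"zes", "z"},
            {"ches", "ch"}, {"shes", "sh"}, {"men", "man"}, {"ies", "y"}});
        RULES.put("verb", new String[][] {
            {"s", ""}, {"ies", "y"}, {"es", "e"}, {"es", ""},
            {"ed", "e"}, {"ed", ""}, {"ing", "e"}, {"ing", ""}});
        RULES.put("adjective", new String[][] {
            {"er", ""}, {"est", ""}, {"er", "e"}, {"est", "e"}});
        // No rules are applicable to adverbs.
        RULES.put("adverb", new String[0][]);
    }

    // Applies every matching rule and returns the candidates in rule order, without duplicates.
    static List<String> detach(String word, String pos) {
        Set<String> candidates = new LinkedHashSet<>();
        for (String[] rule : RULES.getOrDefault(pos, new String[0][])) {
            String suffix = rule[0];
            // Require at least one character to remain after stripping the suffix.
            if (word.length() > suffix.length() && word.endsWith(suffix)) {
                candidates.add(word.substring(0, word.length() - suffix.length()) + rule[1]);
            }
        }
        return new ArrayList<>(candidates);
    }
}
```

With these assumed rules, DetachmentRulesSketch.detach("boxes", "noun") yields the candidates "boxe" and "box", in that order. Nothing here decides which candidate is a real word.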

Special Processing for nouns ending with 'ful'

Morphy contains code that searches for nouns ending with ful and performs a transformation on the substring preceding it. It then appends 'ful' back onto the resulting string and returns it. For example, if passed the nouns "boxesful", it will return "boxful".
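
The "ful" handling described above can be sketched as follows. This is an illustration under stated assumptions, not Morphy's or JWI's code: FulSuffixSketch and both of its methods are hypothetical, and stemNoun is a deliberately tiny stand-in for a real noun stemmer so that the example runs on its own.

```java
// Illustrative only; not Morphy's or JWI's code.
final class FulSuffixSketch {

    static final String FUL = "ful";

    // Hypothetical, deliberately tiny noun stemmer used only to make the example run.
    static String stemNoun(String noun) {
        if (noun.endsWith("xes") || noun.endsWith("ches") || noun.endsWith("shes")) {
            return noun.substring(0, noun.length() - 2); // drop "es"
        }
        if (noun.length() > 1 && noun.endsWith("s")) {
            return noun.substring(0, noun.length() - 1); // drop "s"
        }
        return noun;
    }

    // Stems the substring preceding "ful", then appends "ful" back on.
    static String stemFulNoun(String noun) {
        if (!noun.endsWith(FUL) || noun.length() == FUL.length()) {
            return stemNoun(noun); // no "ful" ending, or nothing precedes it
        }
        String head = noun.substring(0, noun.length() - FUL.length());
        return stemNoun(head) + FUL;
    }
}
```

Under these assumptions, FulSuffixSketch.stemFulNoun("boxesful") returns "boxful", matching the example in the quoted text.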

Since:
JWI 1.0
Version:
2.4.0
Author:
Mark A. Finlayson

Field Summary
static java.lang.String ENDING_ch
           
static java.lang.String ENDING_e
           
static java.lang.String ENDING_man
           
static java.lang.String ENDING_null
           
static java.lang.String ENDING_s
           
static java.lang.String ENDING_sh
           
static java.lang.String ENDING_x
           
static java.lang.String ENDING_y
           
static java.lang.String ENDING_z
           
static java.util.Map<POS,java.util.List<StemmingRule>> ruleMap
           
static java.lang.String SUFFIX_ches
           
static java.lang.String SUFFIX_ed
           
static java.lang.String SUFFIX_er
           
static java.lang.String SUFFIX_es
           
static java.lang.String SUFFIX_est
           
static java.lang.String SUFFIX_ful
           
static java.lang.String SUFFIX_ies
           
static java.lang.String SUFFIX_ing
           
static java.lang.String SUFFIX_men
           
static java.lang.String SUFFIX_s
           
static java.lang.String SUFFIX_ses
           
static java.lang.String SUFFIX_shes
           
static java.lang.String SUFFIX_ss
           
static java.lang.String SUFFIX_xes
           
static java.lang.String SUFFIX_zes
           
static java.lang.String underscore
           
 
Constructor Summary
SimpleStemmer()
           
 
Method Summary
 java.util.List<java.lang.String> findStems(java.lang.String word, POS pos)
          Takes the surface form of a word, as it appears in the text, and the assigned Wordnet part of speech.
protected  java.util.List<java.lang.String> getNounCollocationRoots(java.lang.String composite)
          Handles stemming noun collocations.
 java.util.Map<POS,java.util.List<StemmingRule>> getRuleMap()
          Returns the stemming rules used by this stemmer, keyed by part of speech.
protected  java.util.List<java.lang.String> getVerbCollocationRoots(java.lang.String composite)
          Handles stemming verb collocations.
protected  java.lang.String normalize(java.lang.String word)
          Converts all whitespace runs to single underscores.
protected  java.util.List<java.lang.String> stripAdjectiveSuffix(java.lang.String adj)
          Strips suffixes from the specified word according to the adjective rules.
protected  java.util.List<java.lang.String> stripNounSuffix(java.lang.String noun)
          Strips suffixes from the specified word according to the noun rules.
protected  java.util.List<java.lang.String> stripVerbSuffix(java.lang.String verb)
          Strips suffixes from the specified word according to the verb rules.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

underscore

public static final java.lang.String underscore
See Also:
Constant Field Values

SUFFIX_ches

public static final java.lang.String SUFFIX_ches
See Also:
Constant Field Values

SUFFIX_ed

public static final java.lang.String SUFFIX_ed
See Also:
Constant Field Values

SUFFIX_es

public static final java.lang.String SUFFIX_es
See Also:
Constant Field Values

SUFFIX_est

public static final java.lang.String SUFFIX_est
See Also:
Constant Field Values

SUFFIX_er

public static final java.lang.String SUFFIX_er
See Also:
Constant Field Values

SUFFIX_ful

public static final java.lang.String SUFFIX_ful
See Also:
Constant Field Values

SUFFIX_ies

public static final java.lang.String SUFFIX_ies
See Also:
Constant Field Values

SUFFIX_ing

public static final java.lang.String SUFFIX_ing
See Also:
Constant Field Values

SUFFIX_men

public static final java.lang.String SUFFIX_men
See Also:
Constant Field Values

SUFFIX_s

public static final java.lang.String SUFFIX_s
See Also:
Constant Field Values

SUFFIX_ss

public static final java.lang.String SUFFIX_ss
See Also:
Constant Field Values

SUFFIX_ses

public static final java.lang.String SUFFIX_ses
See Also:
Constant Field Values

SUFFIX_shes

public static final java.lang.String SUFFIX_shes
See Also:
Constant Field Values

SUFFIX_xes

public static final java.lang.String SUFFIX_xes
See Also:
Constant Field Values

SUFFIX_zes

public static final java.lang.String SUFFIX_zes
See Also:
Constant Field Values

ENDING_null

public static final java.lang.String ENDING_null
See Also:
Constant Field Values

ENDING_ch

public static final java.lang.String ENDING_ch
See Also:
Constant Field Values

ENDING_e

public static final java.lang.String ENDING_e
See Also:
Constant Field Values

ENDING_man

public static final java.lang.String ENDING_man
See Also:
Constant Field Values

ENDING_s

public static final java.lang.String ENDING_s
See Also:
Constant Field Values

ENDING_sh

public static final java.lang.String ENDING_sh
See Also:
Constant Field Values

ENDING_x

public static final java.lang.String ENDING_x
See Also:
Constant Field Values

ENDING_y

public static final java.lang.String ENDING_y
See Also:
Constant Field Values

ENDING_z

public static final java.lang.String ENDING_z
See Also:
Constant Field Values

ruleMap

public static final java.util.Map<POS,java.util.List<StemmingRule>> ruleMap

Constructor Detail

SimpleStemmer

public SimpleStemmer()

Method Detail

getRuleMap

public java.util.Map<POS,java.util.List<StemmingRule>> getRuleMap()
Returns the stemming rules used by this stemmer, keyed by part of speech. The returned map is never null, but it may be empty. The lists in the map are likewise never null, but may be empty.

Returns:
the rule map for this stemmer
Since:
JWI 3.5.1

findStems

public java.util.List<java.lang.String> findStems(java.lang.String word,
                                                  POS pos)
Description copied from interface: IStemmer
Takes the surface form of a word, as it appears in the text, and the assigned Wordnet part of speech. The surface form may or may not contain whitespace or underscores, and may be in mixed case. The part of speech may be null, which means that all parts of speech should be considered. Returns a list of stems, in preferred order. No stem should be repeated in the list. If no stems are found, this call returns an empty list. It will never return null.

Specified by:
findStems in interface IStemmer
Parameters:
word - the surface form of which to find the stem
pos - the part of speech to find stems for; if null, find stems for all parts of speech
Returns:
the list of stems found for the surface form and part of speech combination, in preferred order
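
The return contract above (preferred order, no repeated stems, never null) can be met in several ways. The following is a minimal sketch of one of them, assuming candidates are gathered separately for each part of speech when pos is null; StemListContractSketch and merge are hypothetical names and are not part of JWI.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only; shows one way to honor the findStems return contract.
final class StemListContractSketch {

    // Merges candidate lists into one list that keeps first-seen order,
    // contains no duplicates, and is never null.
    static List<String> merge(List<List<String>> candidateLists) {
        Set<String> ordered = new LinkedHashSet<>(); // preserves insertion order, drops repeats
        for (List<String> candidates : candidateLists) {
            if (candidates != null) {
                ordered.addAll(candidates);
            }
        }
        if (ordered.isEmpty()) {
            return Collections.emptyList(); // empty list rather than null
        }
        return new ArrayList<>(ordered);
    }
}
```

A LinkedHashSet is used here because it removes duplicates while keeping the order in which stems were first produced, which is what "preferred order" requires of the merge step.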

normalize

protected java.lang.String normalize(java.lang.String word)
Converts all whitespace runs to single underscores. Tests first to see if there is any whitespace before converting.

Parameters:
word - the string to be normalized
Returns:
a normalized string
Throws:
java.lang.NullPointerException - if the specified string is null
java.lang.IllegalArgumentException - if the specified string is empty or all whitespace
Since:
JWI 2.1.1
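
The documented behavior of normalize can be approximated with a short, self-contained sketch. NormalizeSketch is a hypothetical class, not the JWI source. Trimming leading and trailing whitespace and leaving letter case unchanged are assumptions made for this example; the description above does not state either.

```java
import java.util.regex.Pattern;

// Illustrative only; approximates the documented contract of normalize.
final class NormalizeSketch {

    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    static String normalize(String word) {
        if (word == null) {
            throw new NullPointerException("word is null");
        }
        String trimmed = word.trim(); // assumption: outer whitespace is discarded
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("word is empty or all whitespace");
        }
        // Test first: if there is no whitespace, return the string unchanged.
        if (!WHITESPACE_RUN.matcher(trimmed).find()) {
            return trimmed;
        }
        // Collapse each run of whitespace into a single underscore.
        return WHITESPACE_RUN.matcher(trimmed).replaceAll("_");
    }
}
```

For example, NormalizeSketch.normalize("ice  cream") returns "ice_cream", and NormalizeSketch.normalize("   ") throws IllegalArgumentException.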

stripNounSuffix

protected java.util.List<java.lang.String> stripNounSuffix(java.lang.String noun)
Strips suffixes from the specified word according to the noun rules.

Parameters:
noun - the word to be modified
Returns:
a list of modified forms that were constructed, or the empty list if none
Throws:
java.lang.NullPointerException - if the specified word is null
Since:
JWI 1.0

getNounCollocationRoots

protected java.util.List<java.lang.String> getNounCollocationRoots(java.lang.String composite)
Handles stemming noun collocations.

Parameters:
composite - the word to be modified
Returns:
a list of modified forms that were constructed, or the empty list if none
Throws:
java.lang.NullPointerException - if the specified word is null
Since:
JWI 1.1.1
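
This page does not describe how collocations are stemmed. The sketch below shows one plausible approach, which may differ from what JWI actually does: split the collocation on underscores, stem each part, and recombine every combination. CollocationRootsSketch, collocationRoots, and the partStemmer parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Illustrative only; one plausible way to stem a collocation, not the JWI algorithm.
final class CollocationRootsSketch {

    static List<String> collocationRoots(String composite,
                                         Function<String, List<String>> partStemmer) {
        if (composite == null) {
            throw new NullPointerException("composite is null");
        }
        String[] parts = composite.split("_");
        if (parts.length < 2) {
            return Collections.emptyList(); // a single word is not a collocation
        }
        List<String> roots = new ArrayList<>();
        roots.add("");
        for (int i = 0; i < parts.length; i++) {
            // Options for this position: the part itself, then its stems.
            List<String> options = new ArrayList<>();
            options.add(parts[i]);
            for (String stem : partStemmer.apply(parts[i])) {
                if (!options.contains(stem)) {
                    options.add(stem);
                }
            }
            // Extend every partial result with every option for this position.
            List<String> extended = new ArrayList<>();
            for (String prefix : roots) {
                for (String option : options) {
                    extended.add(i == 0 ? option : prefix + "_" + option);
                }
            }
            roots = extended;
        }
        roots.remove(composite); // the unchanged input is not a modified form
        return roots;
    }
}
```

With a part stemmer that merely drops a trailing "s", collocationRoots("attorneys_general", stemmer) returns a list containing only "attorney_general". The number of combinations grows multiplicatively with the number of parts, which is acceptable for short collocations but is a design trade-off to keep in mind.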

stripVerbSuffix

protected java.util.List<java.lang.String> stripVerbSuffix(java.lang.String verb)
Strips suffixes from the specified word according to the verb rules.

Parameters:
verb - the word to be modified
Returns:
a list of modified forms that were constructed, or the empty list if none
Throws:
java.lang.NullPointerException - if the specified word is null
Since:
JWI 1.0

getVerbCollocationRoots

protected java.util.List<java.lang.String> getVerbCollocationRoots(java.lang.String composite)
Handles stemming verb collocations.

Parameters:
composite - the word to be modified
Returns:
a list of modified forms that were constructed, or an empty list if none
Throws:
java.lang.NullPointerException - if the specified word is null
Since:
JWI 1.1.1

stripAdjectiveSuffix

protected java.util.List<java.lang.String> stripAdjectiveSuffix(java.lang.String adj)
Strips suffixes from the specified word according to the adjective rules.

Parameters:
adj - the word to be modified
Returns:
a list of modified forms that were constructed, or an empty list if none
Throws:
java.lang.NullPointerException - if the specified word is null
Since:
JWI 1.0


Copyright © 2007-2013 Massachusetts Institute of Technology. All Rights Reserved.