java.lang.Object
  edu.mit.jwi.morph.SimpleStemmer
public class SimpleStemmer
Provides a simple pattern-based stemming facility based on the "Rules
of Detachment" as described in the morphy man page in the Wordnet
distribution, which can be found at
http://wordnet.princeton.edu/man/morphy.7WN.html. It also attempts to
strip "ful" endings. It does not search Wordnet to see if stems actually
exist. In particular, quoting from that man page:
The following table shows the rules of detachment used by Morphy. If a word ends with one of the suffixes, it is stripped from the word and the corresponding ending is added. ... No rules are applicable to adverbs.
POS Suffix Ending
Morphy contains code that searches for nouns ending with ful and performs a transformation on the substring preceding it. It then appends 'ful' back onto the resulting string and returns it. For example, if passed the nouns "boxesful", it will return "boxful".
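The quoted rules can be illustrated with a small, self-contained sketch. It is not the JWI implementation: the class and method names (`DetachmentSketch`, `applyRules`, `stemNoun`) are hypothetical, and the noun suffix/ending pairs are an assumption taken from the morphy man page and the `SUFFIX_*`/`ENDING_*` constant names listed below. Like this class, the sketch never checks whether a candidate stem exists in Wordnet, so it can return non-words.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DetachmentSketch {

    // Assumed noun rules, each written as {suffix, ending}.
    static final String[][] NOUN_RULES = {
        {"s", ""}, {"ses", "s"}, {"xes", "x"}, {"zes", "z"},
        {"ches", "ch"}, {"shes", "sh"}, {"men", "man"}, {"ies", "y"}
    };

    // Strips every matching suffix and appends the corresponding ending.
    static List<String> applyRules(String word, String[][] rules) {
        Set<String> stems = new LinkedHashSet<>(); // keeps rule order, drops repeats
        for (String[] rule : rules) {
            String suffix = rule[0];
            String ending = rule[1];
            if (word.length() > suffix.length() && word.endsWith(suffix)) {
                String root = word.substring(0, word.length() - suffix.length());
                stems.add(root + ending);
            }
        }
        return new ArrayList<>(stems);
    }

    // Handles the "ful" case: stem the part before "ful", then put "ful" back.
    static List<String> stemNoun(String noun) {
        if (noun.length() > 3 && noun.endsWith("ful")) {
            String head = noun.substring(0, noun.length() - 3);
            List<String> out = new ArrayList<>();
            for (String stem : applyRules(head, NOUN_RULES)) {
                out.add(stem + "ful");
            }
            return out;
        }
        return applyRules(noun, NOUN_RULES);
    }

    public static void main(String[] args) {
        System.out.println(stemNoun("boxes"));    // [boxe, box]
        System.out.println(stemNoun("boxesful")); // [boxeful, boxful]
    }
}
```

Because no dictionary lookup is performed, "boxe" appears alongside "box"; a caller would typically filter the candidates against Wordnet afterwards.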
Field Summary | |
---|---|
static java.lang.String | ENDING_ch |
static java.lang.String | ENDING_e |
static java.lang.String | ENDING_man |
static java.lang.String | ENDING_null |
static java.lang.String | ENDING_s |
static java.lang.String | ENDING_sh |
static java.lang.String | ENDING_x |
static java.lang.String | ENDING_y |
static java.lang.String | ENDING_z |
static java.util.Map<POS,java.util.List<StemmingRule>> | ruleMap |
static java.lang.String | SUFFIX_ches |
static java.lang.String | SUFFIX_ed |
static java.lang.String | SUFFIX_er |
static java.lang.String | SUFFIX_es |
static java.lang.String | SUFFIX_est |
static java.lang.String | SUFFIX_ful |
static java.lang.String | SUFFIX_ies |
static java.lang.String | SUFFIX_ing |
static java.lang.String | SUFFIX_men |
static java.lang.String | SUFFIX_s |
static java.lang.String | SUFFIX_ses |
static java.lang.String | SUFFIX_shes |
static java.lang.String | SUFFIX_ss |
static java.lang.String | SUFFIX_xes |
static java.lang.String | SUFFIX_zes |
static java.lang.String | underscore |
Constructor Summary | |
---|---|
SimpleStemmer() |
Method Summary | |
---|---|
java.util.List<java.lang.String> | findStems(java.lang.String word, POS pos): Takes the surface form of a word, as it appears in the text, and the assigned Wordnet part of speech. |
protected java.util.List<java.lang.String> | getNounCollocationRoots(java.lang.String composite): Handles stemming noun collocations. |
java.util.Map<POS,java.util.List<StemmingRule>> | getRuleMap(): Returns a set of stemming rules used by this stemmer. |
protected java.util.List<java.lang.String> | getVerbCollocationRoots(java.lang.String composite): Handles stemming verb collocations. |
protected java.lang.String | normalize(java.lang.String word): Converts all whitespace runs to single underscores. |
protected java.util.List<java.lang.String> | stripAdjectiveSuffix(java.lang.String adj): Strips suffixes from the specified word according to the adjective rules. |
protected java.util.List<java.lang.String> | stripNounSuffix(java.lang.String noun): Strips suffixes from the specified word according to the noun rules. |
protected java.util.List<java.lang.String> | stripVerbSuffix(java.lang.String verb): Strips suffixes from the specified word according to the verb rules. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String underscore
public static final java.lang.String SUFFIX_ches
public static final java.lang.String SUFFIX_ed
public static final java.lang.String SUFFIX_es
public static final java.lang.String SUFFIX_est
public static final java.lang.String SUFFIX_er
public static final java.lang.String SUFFIX_ful
public static final java.lang.String SUFFIX_ies
public static final java.lang.String SUFFIX_ing
public static final java.lang.String SUFFIX_men
public static final java.lang.String SUFFIX_s
public static final java.lang.String SUFFIX_ss
public static final java.lang.String SUFFIX_ses
public static final java.lang.String SUFFIX_shes
public static final java.lang.String SUFFIX_xes
public static final java.lang.String SUFFIX_zes
public static final java.lang.String ENDING_null
public static final java.lang.String ENDING_ch
public static final java.lang.String ENDING_e
public static final java.lang.String ENDING_man
public static final java.lang.String ENDING_s
public static final java.lang.String ENDING_sh
public static final java.lang.String ENDING_x
public static final java.lang.String ENDING_y
public static final java.lang.String ENDING_z
public static final java.util.Map<POS,java.util.List<StemmingRule>> ruleMap
Constructor Detail |
---|
public SimpleStemmer()
Method Detail |
---|
public java.util.Map<POS,java.util.List<StemmingRule>> getRuleMap()
Returns a set of stemming rules used by this stemmer.
public java.util.List<java.lang.String> findStems(java.lang.String word, POS pos)
Description copied from interface: IStemmer
Takes the surface form of a word, as it appears in the text, and the
assigned Wordnet part of speech. The part of speech may be null, which
means that all parts of speech should be considered. Returns a list of
stems, in preferred order. No stem should be repeated in the list. If no
stems are found, this call returns an empty list. It will never return null.
Specified by:
findStems in interface IStemmer
Parameters:
word - the surface form of which to find the stem
pos - the part of speech to find stems for; if null, find stems for all parts of speech
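The findStems contract (a null part of speech means "all parts of speech", stems come back in order without repeats, and the result is never null) can be sketched as follows. This is an illustration under stated assumptions, not JWI code: `Pos` is a hypothetical stand-in for JWI's `POS`, `RULES` is a hypothetical stand-in for `ruleMap`, and the rule lists are deliberately abbreviated.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FindStemsSketch {

    // Hypothetical stand-in for JWI's POS type.
    enum Pos { NOUN, VERB, ADJECTIVE, ADVERB }

    // Hypothetical stand-in for ruleMap; each rule is {suffix, ending}.
    static final Map<Pos, String[][]> RULES = new EnumMap<>(Pos.class);
    static {
        RULES.put(Pos.NOUN, new String[][] {{"s", ""}, {"ies", "y"}});
        RULES.put(Pos.VERB, new String[][] {{"s", ""}, {"ies", "y"}, {"ing", ""}, {"ing", "e"}});
        RULES.put(Pos.ADJECTIVE, new String[][] {{"er", ""}, {"est", ""}});
        RULES.put(Pos.ADVERB, new String[][] {}); // no rules apply to adverbs
    }

    static List<String> findStems(String word, Pos pos) {
        Set<String> stems = new LinkedHashSet<>(); // preserves order, rejects repeats
        // A null part of speech means every part of speech is considered.
        Pos[] targets = (pos == null) ? Pos.values() : new Pos[] {pos};
        for (Pos p : targets) {
            for (String[] rule : RULES.get(p)) {
                String suffix = rule[0];
                if (word.length() > suffix.length() && word.endsWith(suffix)) {
                    String root = word.substring(0, word.length() - suffix.length());
                    stems.add(root + rule[1]);
                }
            }
        }
        return new ArrayList<>(stems); // possibly empty, never null
    }

    public static void main(String[] args) {
        System.out.println(findStems("flies", null));         // [flie, fly]
        System.out.println(findStems("quickly", Pos.ADVERB)); // []
    }
}
```

A `LinkedHashSet` is one simple way to satisfy both the ordering and the no-repeats requirements at once: the noun and verb rules both produce "fly" for "flies", but it appears only once in the result.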
protected java.lang.String normalize(java.lang.String word)
Converts all whitespace runs to single underscores.
Parameters:
word - the string to be normalized
Throws:
java.lang.NullPointerException - if the specified string is null
java.lang.IllegalArgumentException - if the specified string is empty or all whitespace
protected java.util.List<java.lang.String> stripNounSuffix(java.lang.String noun)
Strips suffixes from the specified word according to the noun rules.
Parameters:
noun - the word to be modified
Throws:
java.lang.NullPointerException - if the specified word is null
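The normalize contract documented above (whitespace runs become single underscores, null raises NullPointerException, an empty or all-whitespace string raises IllegalArgumentException) might look roughly like the sketch below. The class name `NormalizeSketch` is hypothetical, and trimming leading and trailing whitespace before the replacement is an assumption; the documentation here does not say whether the real method does so.

```java
import java.util.Objects;

class NormalizeSketch {

    static String normalize(String word) {
        // Throws NullPointerException if word is null.
        Objects.requireNonNull(word, "word");
        // Assumption: surrounding whitespace is dropped rather than converted.
        String trimmed = word.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("word is empty or all whitespace");
        }
        // Collapse each run of whitespace into one underscore.
        return trimmed.replaceAll("\\s+", "_");
    }

    public static void main(String[] args) {
        System.out.println(normalize("new   york\tcity")); // new_york_city
    }
}
```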
protected java.util.List<java.lang.String> getNounCollocationRoots(java.lang.String composite)
Handles stemming noun collocations.
Parameters:
composite - the word to be modified
Throws:
java.lang.NullPointerException - if the specified word is null
protected java.util.List<java.lang.String> stripVerbSuffix(java.lang.String verb)
Strips suffixes from the specified word according to the verb rules.
Parameters:
verb - the word to be modified
Throws:
java.lang.NullPointerException - if the specified word is null
protected java.util.List<java.lang.String> getVerbCollocationRoots(java.lang.String composite)
Handles stemming verb collocations.
Parameters:
composite - the word to be modified
Throws:
java.lang.NullPointerException - if the specified word is null
protected java.util.List<java.lang.String> stripAdjectiveSuffix(java.lang.String adj)
Strips suffixes from the specified word according to the adjective rules.
Parameters:
adj - the word to be modified
Throws:
java.lang.NullPointerException - if the specified word is null