java.lang.Object
  edu.mit.jmwe.util.AbstractFileSelector
    edu.mit.jmwe.data.concordance.ConcordanceTagger
public class ConcordanceTagger extends AbstractFileSelector
Tags every word in every context of a given concordance set with its part of speech.
This class depends on three external libraries: JWI, JSemcor, and the Stanford POS Tagger.
Use the main method of this class for its default functionality.
See Also: TaggedConcordanceIterator
Nested Class Summary | |
---|---|
protected static class | ConcordanceTagger.TaggerToken: Represents a semcor token that is not yet tagged.
Constructor Summary |
---|
ConcordanceTagger() |
Method Summary | |
---|---|
protected void | addWords(edu.mit.jsemcor.element.IWordform wf, int tokenNum, List<ConcordanceTagger.TaggerToken> result, edu.mit.jwi.morph.IStemmer stemmer): Stems each of the words in the provided wordform, adding the tagger tokens created from these stems, words and token number to the given results list.
protected File | getLocation(Class<?> key): Utility method for getting a location that has a default stored in the Java preferences.
protected edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> | getPOSTagger(): Returns a maximum entropy tagger using a Stanford NLP tagging model selected by the user.
protected edu.mit.jsemcor.main.IConcordanceSet | getSemcor(): Returns the Semcor concordance set or null if the directory cannot be found.
protected edu.mit.jwi.morph.IStemmer | getStemmer(): Returns a stemmer that requires Wordnet or null if the Wordnet directory cannot be found.
protected Writer | getWriter(): Returns a writer for the file to which the tagged concordance will be written.
static void | main(String[] args): Tags the Semcor corpus.
protected ArrayList<edu.stanford.nlp.ling.HasWord> | makeSentence(edu.mit.jsemcor.element.ISentence s, edu.mit.jwi.morph.IStemmer stemmer): Returns a Stanford parser sentence that contains all the tokens from the specified JSemcor sentence, with MWEs broken into their constituent tokens.
void | process(edu.mit.jsemcor.element.IContextID startContext, int startSent, Iterable<? extends edu.mit.jsemcor.main.IConcordance> cs, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, Writer writer, IProgressBar pb): Tags all contexts provided by the concordance set, using the specified tagger, writing the data to the specified writer.
protected void | process(edu.mit.jsemcor.element.IContextID cid, edu.mit.jsemcor.element.ISentence s, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, Writer writer): Tags the provided sentence, using the specified tagger, writing the data to the specified writer.
void | process(Iterable<? extends edu.mit.jsemcor.main.IConcordance> cs, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, Writer writer, IProgressBar pb): Tags all contexts provided by the concordance set, using the specified tagger, writing the data to the specified writer.
void | run()
protected void | setLocation(Class<?> key, File loc): Sets a default location into the Java Preferences.
protected List<String> | stem(String token, edu.mit.jsemcor.element.IWordform wf, edu.mit.jwi.morph.IStemmer stemmer): Stems the given token.
Methods inherited from class edu.mit.jmwe.util.AbstractFileSelector |
---|
choose, chooseDirectory, chooseFile, chooseFileForWriting, getFileChooser |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public ConcordanceTagger()
Method Detail |
---|
public static void main(String[] args)
Tags the Semcor corpus. See also the TaggedConcordanceIterator class.
Parameters:
args - standard main method arguments; ignored
public void run()
Specified by:
run in interface Runnable
protected edu.mit.jsemcor.main.IConcordanceSet getSemcor()
Returns the Semcor concordance set or null if the directory cannot be found.
Returns:
the Semcor concordance set, or null if the directory cannot be found
protected edu.mit.jwi.morph.IStemmer getStemmer()
Returns a stemmer that requires Wordnet or null if the Wordnet directory cannot be found.
Returns:
a stemmer that requires Wordnet, or null if the Wordnet directory cannot be found
protected edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> getPOSTagger() throws Exception
Returns a maximum entropy tagger using a Stanford NLP tagging model selected by the user, or null if no model is selected or found.
Returns:
a MaxentTagger using a Stanford NLP tagging model selected by the user; null if no model is selected or found
Throws:
Exception
protected Writer getWriter() throws IOException
Returns a writer for the file to which the tagged concordance will be written, or null if no output file is selected.
Returns:
a writer for the output file, or null if no output file is selected
Throws:
IOException - if an exception occurs when constructing the file writer
protected File getLocation(Class<?> key)
Utility method for getting a location that has a default stored in the Java preferences.
getLocation in class AbstractFileSelector
Parameters:
key - the class that serves as key for this location
Returns:
the location, or null if none
protected void setLocation(Class<?> key, File loc)
Sets a default location into the Java Preferences.
setLocation in class AbstractFileSelector
Parameters:
key - the class that serves as key for this location
loc - the location to be saved to the preferences
public void process(Iterable<? extends edu.mit.jsemcor.main.IConcordance> cs, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, Writer writer, IProgressBar pb) throws IOException
Tags all contexts provided by the concordance set, using the specified tagger, writing the data to the specified writer.
Parameters:
cs - the concordance set from which contexts should be drawn; may not be null
posTagger - the part of speech tagger to be used to tag the sentences; may not be null
stemmer - a stemmer used to stem words
writer - the writer to which results should be written; may not be null
pb - the progress bar to which progress is to be reported; may be null
Throws:
IOException - if there is a problem writing to the provided writer
NullPointerException - if any argument is null
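The parameter notes above mark the concordance set, tagger, and writer as required and the progress bar as optional. A minimal sketch of that kind of contract, using only the JDK, might look like the following. The names NullContractSketch, Progress, and writeAll are hypothetical stand-ins and are not part of jMWE; plain strings replace the concordance contexts.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical illustration only; this is not the jMWE implementation.
class NullContractSketch {

    /** Hypothetical progress callback standing in for IProgressBar. */
    interface Progress {
        void step();
    }

    /**
     * Writes each item on its own line and returns the number written.
     * Required arguments are checked up front so a null fails fast with a
     * NullPointerException; the progress callback may be null.
     */
    static int writeAll(Iterable<String> items, Writer writer, Progress pb) throws IOException {
        Objects.requireNonNull(items, "items");
        Objects.requireNonNull(writer, "writer");
        int count = 0;
        for (String item : items) {
            writer.write(item);
            writer.write('\n');
            if (pb != null) {
                pb.step(); // report progress only when a callback was supplied
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // Passing null for the progress callback is allowed in this sketch.
        int written = writeAll(Arrays.asList("first", "second"), out, null);
        System.out.print(out);
        System.out.println(written + " items written");
    }
}
```

Checking the required arguments before the loop means nothing is written when a caller passes null by mistake.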
public void process(edu.mit.jsemcor.element.IContextID startContext, int startSent, Iterable<? extends edu.mit.jsemcor.main.IConcordance> cs, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, Writer writer, IProgressBar pb) throws IOException
Tags all contexts provided by the concordance set, using the specified tagger, writing the data to the specified writer.
Parameters:
startContext - the context where the tagging should begin. If null, the tagging will begin with the first context.
startSent - the sentence number past which tagging should begin. If the number is non-positive, no sentences in the specified context are skipped.
cs - the concordance set from which contexts should be drawn; may not be null
posTagger - the part of speech tagger to be used to tag the sentences; may not be null
stemmer - a stemmer used to stem words
writer - the writer to which results should be written; may not be null
pb - the progress bar to which progress is to be reported; may be null
Throws:
IOException - if there is a problem writing to the provided writer
NullPointerException - if any of the concordance set, tagger, or writer is null
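The startContext and startSent parameters describe a resume point. One way to read them is sketched below with plain JDK collections. ResumeSketch and select are hypothetical names, the context ids are made up, and the sketch assumes that sentence numbers are 1-based and that startSent is ignored when startContext is null; the source states neither.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the jMWE implementation.
class ResumeSketch {

    /**
     * Returns "contextId:sentenceNumber" for every sentence that would be
     * processed. The map gives the number of sentences in each context and
     * is iterated in insertion order. Sentence numbers are assumed 1-based.
     */
    static List<String> select(Map<String, Integer> sentenceCounts, String startContext, int startSent) {
        List<String> selected = new ArrayList<>();
        boolean started = (startContext == null); // null: begin with the first context
        for (Map.Entry<String, Integer> entry : sentenceCounts.entrySet()) {
            int firstSent = 1;
            if (!started) {
                if (!entry.getKey().equals(startContext)) {
                    continue; // still before the starting context
                }
                started = true;
                if (startSent > 0) {
                    firstSent = startSent + 1; // begin past the given sentence
                }
            }
            for (int n = firstSent; n <= entry.getValue(); n++) {
                selected.add(entry.getKey() + ":" + n);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("ctx-a", 3);
        counts.put("ctx-b", 2);
        // Resume in ctx-a after sentence 2.
        System.out.println(select(counts, "ctx-a", 2));
    }
}
```

A resume point of this kind lets a long tagging run continue after an interruption without reprocessing sentences that were already written.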
protected void process(edu.mit.jsemcor.element.IContextID cid, edu.mit.jsemcor.element.ISentence s, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, Writer writer) throws IOException
Tags the provided sentence, using the specified tagger, writing the data to the specified writer.
Parameters:
cid - the context containing the sentence
s - the sentence being tagged
posTagger - the part of speech tagger to be used to tag the sentences; may not be null
stemmer - the stemmer used to stem the tokens; may not be null
writer - the writer to which results should be written; may not be null
Throws:
IOException - if there is a problem writing to the provided writer
NullPointerException - if any of the sentence, tagger, or writer is null
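protected ArrayList<edu.stanford.nlp.ling.HasWord> makeSentence(edu.mit.jsemcor.element.ISentence s, edu.mit.jwi.morph.IStemmer stemmer)
Returns a Stanford parser sentence that contains all the tokens from the specified JSemcor sentence, with MWEs broken into their constituent tokens.
... IToken object in the original semcor sentence.
Parameters:
s - a JSemcor ISentence object to be transformed
Throws:
NullPointerException - if the specified sentence is null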
protected void addWords(edu.mit.jsemcor.element.IWordform wf, int tokenNum, List<ConcordanceTagger.TaggerToken> result, edu.mit.jwi.morph.IStemmer stemmer)
Stems each of the words in the provided wordform, adding the tagger tokens created from these stems, words and token number to the given results list.
Parameters:
wf - the wordform whose constituent words are to be stemmed
tokenNum - the number of the token to be tagged, inside the wordform
result - the list to which the tagger tokens will be added
stemmer - the stemmer used to stem the tokens; may not be null
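The descriptions of makeSentence and addWords suggest a flattening step: each multi-word expression is split into its constituent words, and each word is recorded together with its stems and a token number. The sketch below illustrates that idea with JDK types only. TokenSketch and Token are hypothetical names, the stemmer is a toy function that lower-cases its input, and two points are assumptions not confirmed by the source: that multi-word expressions are joined with underscores, and that the token number is the index of the wordform in the sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; this is not the jMWE implementation.
class TokenSketch {

    /** Hypothetical analogue of an untagged token: word, stems, and source position. */
    static final class Token {
        final String word;
        final List<String> stems;
        final int tokenNum;

        Token(String word, List<String> stems, int tokenNum) {
            this.word = word;
            this.stems = stems;
            this.tokenNum = tokenNum;
        }

        @Override
        public String toString() {
            return word + stems + "@" + tokenNum;
        }
    }

    /** Splits one wordform on underscores (an assumption) and appends a token per word. */
    static void addWords(String wordform, int tokenNum, List<Token> result,
                         Function<String, List<String>> stemmer) {
        for (String word : wordform.split("_")) {
            result.add(new Token(word, stemmer.apply(word), tokenNum));
        }
    }

    /** Flattens a sentence; words from one wordform share that wordform's index. */
    static List<Token> makeSentence(List<String> wordforms, Function<String, List<String>> stemmer) {
        List<Token> result = new ArrayList<>();
        for (int i = 0; i < wordforms.size(); i++) {
            addWords(wordforms.get(i), i, result, stemmer);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy stemmer: returns the lower-cased word as its only stem.
        Function<String, List<String>> toyStemmer =
                w -> Collections.singletonList(w.toLowerCase());
        System.out.println(makeSentence(Arrays.asList("She", "looked_up", "it"), toyStemmer));
    }
}
```

Keeping the source index on every token is what would allow a tag assigned to a constituent word to be mapped back to the wordform it came from.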
protected List<String> stem(String token, edu.mit.jsemcor.element.IWordform wf, edu.mit.jwi.morph.IStemmer stemmer)
Stems the given token.
Parameters:
token - the token to be stemmed
wf - the wordform from which the token is drawn
stemmer - the stemmer used to stem the tokens; may not be null
Returns:
... null.