public class ConcordanceTagger extends AbstractFileSelector implements java.lang.Runnable
This class depends on three external libraries: JWI, JSemcor, and the Stanford POS Tagger.
Use the main method of this class for its default functionality.
See Also: TaggedConcordanceIterator
Modifier and Type | Class and Description |
---|---|
protected static class | ConcordanceTagger.TaggerToken: Represents a semcor token that is not yet tagged. |
Constructor and Description |
---|
ConcordanceTagger() |
Modifier and Type | Method and Description |
---|---|
protected void | addWords(edu.mit.jsemcor.element.IWordform wf, int tokenNum, java.util.List<ConcordanceTagger.TaggerToken> result, edu.mit.jwi.morph.IStemmer stemmer): Stems each of the words in the provided wordform, adding the tagger tokens created from these stems, words and token number to the given results list. |
protected java.io.File | getLocation(java.lang.Class<?> key): Utility method for getting a location that has a default stored in the Java preferences. |
protected edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> | getPOSTagger(): Returns a maximum entropy tagger using a Stanford NLP tagging model selected by the user. |
protected edu.mit.jsemcor.main.IConcordanceSet | getSemcor(): Returns the Semcor concordance set, or null if the directory cannot be found. |
protected edu.mit.jwi.morph.IStemmer | getStemmer(): Returns a stemmer that requires Wordnet, or null if the Wordnet directory cannot be found. |
protected java.io.Writer | getWriter(): Returns a writer for the file to which the tagged concordance will be written. |
static void | main(java.lang.String[] args): Tags the Semcor corpus. |
protected java.util.ArrayList<edu.stanford.nlp.ling.HasWord> | makeSentence(edu.mit.jsemcor.element.ISentence s, edu.mit.jwi.morph.IStemmer stemmer): Returns a Stanford parser sentence that contains all the tokens from the specified JSemcor sentence, with multi-word expressions (MWEs) broken into their constituent tokens. |
void | process(edu.mit.jsemcor.element.IContextID startContext, int startSent, java.lang.Iterable<? extends edu.mit.jsemcor.main.IConcordance> cs, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, java.io.Writer writer, IProgressBar pb): Tags all the contexts provided by the concordance set, using the specified tagger, writing the data to the specified writer. |
protected void | process(edu.mit.jsemcor.element.IContextID cid, edu.mit.jsemcor.element.ISentence s, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, java.io.Writer writer): Tags the provided sentence, using the specified tagger, writing the data to the specified writer. |
void | process(java.lang.Iterable<? extends edu.mit.jsemcor.main.IConcordance> cs, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, java.io.Writer writer, IProgressBar pb): Tags all the contexts provided by the concordance set, using the specified tagger, writing the data to the specified writer. |
void | run() |
protected void | setLocation(java.lang.Class<?> key, java.io.File loc): Sets a default location into the Java Preferences. |
protected java.util.List<java.lang.String> | stem(java.lang.String token, edu.mit.jsemcor.element.IWordform wf, edu.mit.jwi.morph.IStemmer stemmer): Stems the given token. |
Methods inherited from class AbstractFileSelector: choose, chooseDirectory, chooseFile, chooseFileForWriting, getFileChooser
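The getLocation and setLocation summaries above describe default locations kept in the Java preferences and keyed by a class. The following is a minimal sketch of that pattern using java.util.prefs.Preferences, not the class's actual implementation. The class name LocationPrefsSketch, the node name, and the choice of the class name as the preference key are all assumptions made for illustration.

```java
import java.io.File;
import java.util.prefs.Preferences;

/** Hypothetical helper; not part of ConcordanceTagger. */
class LocationPrefsSketch {

    // Assumption: a dedicated user-preferences node for this sketch.
    private static final Preferences PREFS =
            Preferences.userRoot().node("concordance-tagger-sketch");

    /** Returns the stored default location for the key, or null if none. */
    static File getLocation(Class<?> key) {
        // Assumption: the fully qualified class name is the preference key
        // (Preferences keys are limited to 80 characters).
        String path = PREFS.get(key.getName(), null);
        return path == null ? null : new File(path);
    }

    /** Stores the location under the key's class name; null clears the entry. */
    static void setLocation(Class<?> key, File loc) {
        if (loc == null) {
            PREFS.remove(key.getName());
        } else {
            PREFS.put(key.getName(), loc.getAbsolutePath());
        }
    }

    public static void main(String[] args) {
        setLocation(String.class, new File("semcor"));
        System.out.println(getLocation(String.class)); // absolute path of "semcor"
        setLocation(String.class, null);               // clean up the stored entry
        System.out.println(getLocation(String.class)); // null
    }
}
```

Keying by class lets each external resource (for example the Semcor directory or the tagger model) remember its own last-used location independently.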
public static void main(java.lang.String[] args)
Tags the Semcor corpus. [...] TaggedConcordanceIterator class.
Parameters:
args - standard main method arguments; ignored
public void run()
Specified by:
run in interface java.lang.Runnable
protected edu.mit.jsemcor.main.IConcordanceSet getSemcor()
Returns the Semcor concordance set, or null if the directory cannot be found.
Returns:
the Semcor concordance set, or null if the directory cannot be found
protected edu.mit.jwi.morph.IStemmer getStemmer()
Returns a stemmer that requires Wordnet, or null if the Wordnet directory cannot be found.
Returns:
a stemmer that requires Wordnet, or null if the Wordnet directory cannot be found
protected edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> getPOSTagger() throws java.lang.Exception
Returns a maximum entropy tagger using a Stanford NLP tagging model selected by the user. Will return null if no model is selected or found.
Returns:
a MaxentTagger using a Stanford NLP tagging model selected by the user. Will return null if no model is selected or found.
Throws:
java.lang.Exception - if there is a problem instantiating the maximum entropy tagger
protected java.io.Writer getWriter() throws java.io.IOException
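Returns a writer for the file to which the tagged concordance will be written. Will return null if no output file is selected.
Returns:
a writer for the file to which the tagged concordance will be written, or null if no output file is selected
Throws:
java.io.IOException - if an exception occurs when constructing the file writer
protected java.io.File getLocation(java.lang.Class<?> key)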
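Utility method for getting a location that has a default stored in the Java preferences.
getLocation in class AbstractFileSelector
Parameters:
key - the class that serves as key for this location
Returns:
the stored location, or null if none
protected void setLocation(java.lang.Class<?> key, java.io.File loc)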
Sets a default location into the Java Preferences.
setLocation in class AbstractFileSelector
Parameters:
key - the class that serves as key for this location
loc - the location to be saved to the preferences
public void process(java.lang.Iterable<? extends edu.mit.jsemcor.main.IConcordance> cs, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, java.io.Writer writer, IProgressBar pb) throws java.io.IOException
Tags all the contexts provided by the concordance set, using the specified tagger, writing the data to the specified writer.
Parameters:
cs - the concordance set from which contexts should be drawn, may not be null
posTagger - the part of speech tagger to be used to tag the sentences, may not be null
stemmer - a stemmer used to stem words
writer - the writer to which results should be written, may not be null
pb - the progress bar to which progress is to be reported; may be null
Throws:
java.io.IOException - if there is a problem writing to the provided writer
java.lang.NullPointerException - if any required argument is null
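The parameter notes above say that the concordance set, tagger, and writer may not be null, while the progress bar may be. A minimal sketch of how such a contract is commonly enforced follows. It is an illustration under stated assumptions, not the method's real body: the Progress interface is a hypothetical stand-in for IProgressBar, the tagger is reduced to a plain Object, and contexts are reduced to strings.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical illustration of the null contract; not the real process method. */
class NullContractSketch {

    /** Hypothetical stand-in for IProgressBar, whose API is not shown here. */
    interface Progress {
        void step();
    }

    static void process(Iterable<String> contexts, Object posTagger, Writer writer, Progress pb)
            throws IOException {
        // Required arguments fail fast with a NullPointerException.
        Objects.requireNonNull(contexts, "contexts");
        Objects.requireNonNull(posTagger, "posTagger");
        Objects.requireNonNull(writer, "writer");
        for (String context : contexts) {
            writer.write(context);
            writer.write('\n');
            if (pb != null) { // the progress bar is optional
                pb.step();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // Illustrative context names; passing null for the progress bar is allowed.
        process(Arrays.asList("context-1", "context-2"), new Object(), out, null);
        System.out.print(out);
    }
}
```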
public void process(edu.mit.jsemcor.element.IContextID startContext, int startSent, java.lang.Iterable<? extends edu.mit.jsemcor.main.IConcordance> cs, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, java.io.Writer writer, IProgressBar pb) throws java.io.IOException
Tags all the contexts provided by the concordance set, using the specified tagger, writing the data to the specified writer.
Parameters:
startContext - the context where the tagging should begin. If null, the tagging will begin with the first context.
startSent - the sentence number past which tagging should begin. If the number is non-positive, no sentences in the specified context are skipped
cs - the concordance set from which contexts should be drawn, may not be null
posTagger - the part of speech tagger to be used to tag the sentences, may not be null
stemmer - a stemmer used to stem words
writer - the writer to which results should be written, may not be null
pb - the progress bar to which progress is to be reported; may be null
Throws:
java.io.IOException - if there is a problem writing to the provided writer
java.lang.NullPointerException - if any of the concordance set, tagger, or writer is null
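The startContext and startSent parameters describe a resume point: contexts before startContext are skipped, and within startContext only sentences numbered past startSent are tagged. The sketch below shows one way to read that rule. It is not the library's code, and it assumes that sentences are numbered from 1, that startSent applies only to the start context, and that nothing is selected when startContext is not found. Contexts are modeled as a hypothetical ordered map from context name to sentence count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the resume rule; not the real process method. */
class ResumeSketch {

    /** Returns "context:sentenceNumber" labels for the sentences that would be tagged. */
    static List<String> select(String startContext, int startSent,
                               Map<String, Integer> sentenceCounts) {
        List<String> selected = new ArrayList<String>();
        boolean started = (startContext == null); // null: begin with the first context
        for (Map.Entry<String, Integer> e : sentenceCounts.entrySet()) {
            boolean isStart = !started && e.getKey().equals(startContext);
            if (isStart) {
                started = true;
            }
            if (!started) {
                continue; // still before the start context
            }
            for (int n = 1; n <= e.getValue(); n++) {
                if (isStart && n <= startSent) {
                    continue; // at or before the resume point
                }
                selected.add(e.getKey() + ":" + n);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("a", 2);
        counts.put("b", 3);
        counts.put("c", 1);
        System.out.println(select("b", 2, counts));  // [b:3, c:1]
        System.out.println(select(null, 0, counts)); // [a:1, a:2, b:1, b:2, b:3, c:1]
    }
}
```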
protected void process(edu.mit.jsemcor.element.IContextID cid, edu.mit.jsemcor.element.ISentence s, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, java.io.Writer writer) throws java.io.IOException
Tags the provided sentence, using the specified tagger, writing the data to the specified writer.
Parameters:
cid - the context containing the sentence
s - the sentence being tagged
posTagger - the part of speech tagger to be used to tag the sentences, may not be null
stemmer - the stemmer used to stem the tokens, may not be null
writer - the writer to which results should be written, may not be null
Throws:
java.io.IOException - if there is a problem writing to the provided writer
java.lang.NullPointerException - if any of the sentence, tagger, or writer is null
protected java.util.ArrayList<edu.stanford.nlp.ling.HasWord> makeSentence(edu.mit.jsemcor.element.ISentence s, edu.mit.jwi.morph.IStemmer stemmer)
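Returns a Stanford parser sentence that contains all the tokens from the specified JSemcor sentence, with multi-word expressions (MWEs) broken into their constituent tokens. [...] IToken object in the original semcor sentence.
Parameters:
s - a JSemcor ISentence object to be transformed
stemmer - the stemmer to use when making the words
Throws:
java.lang.NullPointerException - if the specified sentence is null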
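Breaking multi-word expressions into constituent tokens while remembering which original token each piece came from can be sketched as follows. This is an illustration only: it assumes that a multi-word expression is written as a single token joined by underscores (for example New_York), and the Token class and its sourceIndex field are hypothetical stand-ins rather than the real ConcordanceTagger.TaggerToken.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of MWE splitting; not the real makeSentence method. */
class MweSplitSketch {

    /** Hypothetical stand-in for a tagger token. */
    static final class Token {
        final String word;
        final int sourceIndex; // position of the originating token in the input sentence

        Token(String word, int sourceIndex) {
            this.word = word;
            this.sourceIndex = sourceIndex;
        }

        @Override
        public String toString() {
            return word + "@" + sourceIndex;
        }
    }

    /** Splits underscore-joined tokens and records where each piece came from. */
    static List<Token> split(List<String> sentence) {
        List<Token> result = new ArrayList<Token>();
        for (int i = 0; i < sentence.size(); i++) {
            for (String part : sentence.get(i).split("_")) {
                if (!part.isEmpty()) { // ignore empty pieces from stray underscores
                    result.add(new Token(part, i));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split(Arrays.asList("the", "New_York", "mayor")));
        // [the@0, New@1, York@1, mayor@2]
    }
}
```

Keeping the source index alongside each piece is what allows tags assigned to the split tokens to be mapped back onto the original sentence.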
protected void addWords(edu.mit.jsemcor.element.IWordform wf, int tokenNum, java.util.List<ConcordanceTagger.TaggerToken> result, edu.mit.jwi.morph.IStemmer stemmer)
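Stems each of the words in the provided wordform, adding the tagger tokens created from these stems, words and token number to the given results list.
Parameters:
wf - the wordform whose constituent words are to be stemmed
tokenNum - the number of the token to be tagged, inside the wordform
result - the list to which the tagger tokens will be added
stemmer - the stemmer used to stem the tokens, may not be null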
protected java.util.List<java.lang.String> stem(java.lang.String token, edu.mit.jsemcor.element.IWordform wf, edu.mit.jwi.morph.IStemmer stemmer)
Stems the given token.
Parameters:
token - the token to be stemmed
wf - the wordform from which the token is drawn
stemmer - the stemmer used to stem the tokens, may not be null
Returns:
[...] null.
Copyright © 2011 Massachusetts Institute of Technology. All Rights Reserved.