JSemcor 1.0.1
(MIT JSemcor Library)

JSemcor was designed to be an easy-to-use, easy-to-extend Java library for interfacing with the Semcor electronic concordance.


Packages
edu.mit.jsemcor.data: Provides an interface, and a default implementation of that interface, that governs transforming character data into IContext objects.
edu.mit.jsemcor.detokenize: Provides interfaces and default implementations for classes that generate free text from Semcor files.
edu.mit.jsemcor.element: Provides interfaces and default implementations for objects representing paragraphs, sentences, and tokens.
edu.mit.jsemcor.main: Provides the main concordance interfaces and default implementations.
edu.mit.jsemcor.tags: Provides interfaces and default implementations for objects that allow searching of a concordance via taglist files.
edu.mit.jsemcor.term: Provides interfaces and default implementations for objects representing annotation information on tokens.

 

JSemcor provides API calls to retrieve context objects, paragraphs, sentences and tokens from the Semcor data files. It also has classes that interface with the taglist index files (if available) and that permit ‘detokenization’ of the texts back into human-readable form. The library includes no GUI elements.
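To make the idea of detokenization concrete, here is a minimal sketch that joins a list of token strings back into running text using a few spacing rules for punctuation. SimpleDetokenizer and its rules are hypothetical and are not the API of the edu.mit.jsemcor.detokenize package; the library's own detokenizers may treat quotes, contractions and other cases differently.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of detokenization: joins tokens back into running text.
 * NOT the API of edu.mit.jsemcor.detokenize; it only illustrates the idea
 * with a few simple spacing rules.
 */
final class SimpleDetokenizer {

    // Tokens that attach to the preceding token (no space before them).
    private static final Set<String> NO_SPACE_BEFORE =
            Set.of(".", ",", ";", ":", "!", "?", ")");

    // Tokens that attach to the following token (no space after them).
    private static final Set<String> NO_SPACE_AFTER = Set.of("(");

    private SimpleDetokenizer() {
    }

    static String detokenize(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        String previous = null;
        for (String token : tokens) {
            // Insert a space unless this is the first token, the token is
            // closing punctuation, or the previous token was an opening bracket.
            boolean needsSpace = previous != null
                    && !NO_SPACE_BEFORE.contains(token)
                    && !NO_SPACE_AFTER.contains(previous);
            if (needsSpace) {
                sb.append(' ');
            }
            sb.append(token);
            previous = token;
        }
        return sb.toString();
    }
}
```

For example, detokenize(List.of("The", "jury", "said", ",", "however", ",", "that", "(", "1", ")", "it", "was", "fine", ".")) returns "The jury said, however, that (1) it was fine."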

JSemcor supports all currently available versions of Semcor. No version of Semcor is included with the JSemcor distribution; Semcor must be downloaded separately from Rada Mihalcea’s website at http://www.cs.unt.edu/~rada/downloads.html.

The freely available version of JSemcor is licensed for non-commercial use only, provided that proper acknowledgment is made. Details can be found in the license, which is included with the distribution. The copyright on the software is owned by MIT; if you wish to use the software for commercial purposes, please contact the MIT Technology Licensing Office for more information on how to obtain a commercial license.

The main interface for accessing concordance data is the IConcordanceSet interface. The distribution comes with a single default implementation of this interface, the Semcor class. In the simplest case, where you are using data files on the same filesystem as your Java program, you can instantiate the Semcor class with a single argument, a Java URL object that points to the directory where the Semcor concordance data files are located.
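As a sketch of that setup step, the helper below converts a local directory into a file: URL with File.toURI().toURL(), an alternative to the URL constructor used in the sample code, and reports which of the expected subdirectories (brown1, brown2 and brownv, the layout described in the example below) are missing. SemcorPaths is a hypothetical helper, not part of JSemcor, and it assumes that layout; File.toURI() percent-encodes characters such as spaces, so for unusual paths it is worth checking that the resulting URL is accepted by the Semcor constructor.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of JSemcor): builds a file URL for a Semcor
 * base directory and checks for the subdirectories assumed to be there.
 */
final class SemcorPaths {

    // Assumed layout of the Semcor base directory.
    static final String[] EXPECTED_SUBDIRS = {"brown1", "brown2", "brownv"};

    private SemcorPaths() {
    }

    /** Converts a local directory into a file: URL. */
    static URL toUrl(File dir) {
        try {
            return dir.toURI().toURL();
        } catch (MalformedURLException e) {
            // Not expected for a URI built from a File, but wrap it anyway.
            throw new IllegalArgumentException("Cannot convert " + dir + " to a URL", e);
        }
    }

    /** Returns the names of the expected subdirectories missing from dir. */
    static List<String> missingSubdirs(File dir) {
        List<String> missing = new ArrayList<>();
        for (String name : EXPECTED_SUBDIRS) {
            if (!new File(dir, name).isDirectory()) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

A caller might check that missingSubdirs(new File("C:/Semcor")) is empty before passing SemcorPaths.toUrl(new File("C:/Semcor")) to the Semcor constructor.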

An example of this is given below in the form of a Java method, testSemcor(). In that method, the first block of two lines (4-5) constructs a URL object that points to the Semcor data files. The base Semcor directory is the directory that contains the subdirectories brown1, brown2, and brownv. The example assumes that the Semcor zip file was unzipped to “C:\Semcor\”; this may differ on your system, depending on where you put your Semcor files. The second block (8-9) constructs an instance of the default Semcor object and opens it by calling the open() method. The next block (12-14) retrieves the brown1 concordance and its first context file, br-a01, by name, and then retrieves the first sentence of that context file by direct access. Lines 17-23 comprise a simple for loop that searches for the first wordform in that sentence that has an assigned sense. Once that wordform is found, lines 26-28 print some of its salient characteristics to the console. The text following the example code shows the console output of the method.

Sample Code:

1   public void testSemcor() throws IOException {
2
3     // construct the URL to the Semcor directory
4     String path = "C:/Semcor/";
5     URL url = new URL("file", null, path);
6
7     // construct the Semcor object and open it
8     IConcordanceSet semcor = new Semcor(url);
9     semcor.open();
10
11    // look up first sentence of first context
12    IConcordance concord = semcor.get("brown1");
13    IContext context = concord.getContext("br-a01");
14    ISentence sentence = context.getSentences().get(0);
15
16    // find first word with non-null sense
17    IWordform wordform = null;
18    for (IWordform wf : sentence.getWordList()) {
19      if (wf.getSemanticTag() != null) {
20        wordform = wf;
21        break;
22      }
23    }
24    if (wordform == null) return; // no sense-tagged word in this sentence
25    // print it
26    System.out.println("Text = " + wordform.getText());
27    System.out.println("POS = " + wordform.getPOSTag().getValue());
28    System.out.println("Sense Key = " + wordform.getSemanticTag().getSenseKeys().get(0));
29  }

Sample Code Output:

1  Text = Fulton County Grand Jury
2  POS = NNP
3  Sense Key = group%1:03:00::
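The sense key printed on the last line, group%1:03:00::, follows WordNet's sense key format, lemma%ss_type:lex_filenum:lex_id:head_word:head_id, where ss_type 1 denotes a noun and head_word and head_id are filled in only for adjective satellites. The sketch below splits such a string into its fields. ParsedSenseKey is a hypothetical type for illustration, not part of JSemcor, and it only performs the basic well-formedness checks shown.

```java
/**
 * Hypothetical sketch (not part of JSemcor): parses a WordNet sense key of
 * the form lemma%ss_type:lex_filenum:lex_id:head_word:head_id.
 */
record ParsedSenseKey(String lemma, int ssType, int lexFileNum, int lexId,
                      String headWord, String headId) {

    static ParsedSenseKey parse(String key) {
        int percent = key.indexOf('%');
        if (percent <= 0) {
            throw new IllegalArgumentException("Missing lemma or '%': " + key);
        }
        // Limit -1 keeps trailing empty fields (head_word and head_id are
        // usually empty), which split(":") would silently drop.
        String[] fields = key.substring(percent + 1).split(":", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException("Expected 5 lex_sense fields: " + key);
        }
        return new ParsedSenseKey(
                key.substring(0, percent),
                Integer.parseInt(fields[0]),
                Integer.parseInt(fields[1]),
                Integer.parseInt(fields[2]),
                fields[3],
                fields[4]);
    }

    /** Maps the WordNet ss_type code (1-5) to a readable part of speech. */
    String partOfSpeech() {
        return switch (ssType) {
            case 1 -> "noun";
            case 2 -> "verb";
            case 3 -> "adjective";
            case 4 -> "adverb";
            case 5 -> "adjective satellite";
            default -> "unknown";
        };
    }
}
```

Applied to the output above, ParsedSenseKey.parse("group%1:03:00::") yields lemma "group", ssType 1 (noun), lexFileNum 3, lexId 0 and empty head fields.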



Copyright © 2008-2011 Massachusetts Institute of Technology. All Rights Reserved.