sem.graphreader
Class RaspXmlGraphReader

java.lang.Object
  extended by sem.graphreader.RaspXmlGraphReader
All Implemented Interfaces:
GraphReader

public class RaspXmlGraphReader
extends java.lang.Object
implements GraphReader

Graph reader for the RASP XML format.

RASP toolkit: http://ilexir.co.uk/2011/open-source-rasp-release/

It supports multiple tags and multiple parses, and reads weighted GR (grammatical relation) information into the metadata field. It can also read GZIP-compressed files directly.
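How the reader recognises compressed input is not described on this page. As a minimal sketch, assuming detection by the two-byte GZIP magic number (0x1f 0x8b), a plain or gzipped stream could be opened transparently as follows. The class TransparentGzip and its methods are hypothetical and are not part of sem.graphreader.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TransparentGzip {

    // Wrap the stream in a GZIPInputStream only if it starts with the GZIP magic bytes.
    static InputStream open(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);               // remember the start so the peeked bytes can be re-read
        int first = in.read();
        int second = in.read();
        in.reset();
        boolean gzipped = (first == 0x1f && second == 0x8b);
        return gzipped ? new GZIPInputStream(in) : in;
    }

    // Read the whole stream as UTF-8 text.
    static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Helper for the demo: compress a string in memory.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "example payload";
        byte[] plain = text.getBytes(StandardCharsets.UTF_8);
        // Both the plain and the compressed input decode to the same text.
        System.out.println(readAll(open(new ByteArrayInputStream(plain))));
        System.out.println(readAll(open(new ByteArrayInputStream(gzip(text)))));
    }
}
```

With RaspXmlGraphReader itself none of this is needed: according to the description above, the path to the gzipped file can be passed to the constructor directly.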

The small example files were parsed using the following command: ./rasp.sh -m -p'-mgi -pr -n10' < input.txt > output.xml

The big example file was parsed with the following command and then gzipped, to save time and space: ./rasp.sh -m -p'-mg -pr' < input.txt > output.xml

There are two cases where the output from RASP does not exactly correspond to a graph. First, the passive property for verbs is represented as an edge where the verb is the head but the dependent does not exist. This is solved by adding a new 'null' node to the graph that acts as a dependent for that edge. Second, the head of a relation can be marked as an ellipsis, which doesn't correspond to any lemma in the sentence. In such a case, a new 'ellip' node is added to the graph that acts as the head. If these nodes/edges are not needed, they should be removed in post-processing.
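The post-processing mentioned above could look roughly like the following sketch. It does not use the library's Graph class, whose API is not shown on this page. Instead it assumes a hypothetical Edge type and assumes that the placeholder nodes can be recognised by the lemma strings "null" and "ellip". Both assumptions should be checked against the actual graph objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PlaceholderFilter {

    // Hypothetical stand-in for a dependency edge: relation label, head lemma, dependent lemma.
    static final class Edge {
        final String label;
        final String head;
        final String dep;

        Edge(String label, String head, String dep) {
            this.label = label;
            this.head = head;
            this.dep = dep;
        }
    }

    // Assumption: placeholder nodes carry exactly these lemma strings.
    // A real token whose lemma is "null" would also be dropped by this simple check.
    static final Set<String> PLACEHOLDERS = Set.of("null", "ellip");

    // Keep only the nodes that are not placeholders.
    static List<String> keepRealNodes(List<String> nodes) {
        List<String> kept = new ArrayList<>();
        for (String node : nodes) {
            if (!PLACEHOLDERS.contains(node)) {
                kept.add(node);
            }
        }
        return kept;
    }

    // Keep only the edges whose head and dependent are both real nodes.
    static List<Edge> keepRealEdges(List<Edge> edges) {
        List<Edge> kept = new ArrayList<>();
        for (Edge edge : edges) {
            if (!PLACEHOLDERS.contains(edge.head) && !PLACEHOLDERS.contains(edge.dep)) {
                kept.add(edge);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative data only, not actual RASP output.
        List<String> nodes = List.of("cake", "eat", "null");
        List<Edge> edges = List.of(
                new Edge("passive", "eat", "null"),
                new Edge("ncsubj", "eat", "cake"));
        System.out.println(keepRealNodes(nodes));
        System.out.println(keepRealEdges(edges).size());
    }
}
```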

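There are also two possible ways of selecting nodes for the node list of each graph:

NODES_ALL: the node list contains all the lemmas given by RASP, including the cases where two lemmas correspond to the same token.
NODES_TOKENS: the system tries to construct the most likely set of lemmas, so that the number of nodes matches the number of tokens in the sentence.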

If the multiple tags option is not activated in RASP, both of these modes should give the same output.


Field Summary
static int NODES_ALL
          The nodes list will contain all the lemmas given by RASP, including the cases where two lemmas correspond to the same token.
static int NODES_TOKENS
          The system tries to construct the most likely set of lemmas, given the sentence and the graph.
 
Constructor Summary
RaspXmlGraphReader(java.lang.String inputPath, int nodeSelectionMode, boolean getAllParses, boolean getMetaData)
          Create a new reader for RASP XML.
 
Method Summary
 void close()
          Close the reader.
 boolean hasNext()
          Check whether there are more graphs available.
 Graph next()
          Get the next graph from the corpus.
 java.util.ArrayList<Graph> nextSentence()
          Read a sentence from the corpus.
 void reset()
          Reset the whole reading process to the beginning.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NODES_ALL

public static final int NODES_ALL
The nodes list will contain all the lemmas given by RASP, including the cases where two lemmas correspond to the same token.

See Also:
Constant Field Values

NODES_TOKENS

public static final int NODES_TOKENS
The system tries to construct the most likely set of lemmas, given the sentence and the graph. A node is included if it is used in one of the edges. Also, if a word is not used in the edges at all, its first lemma is included. This way the number of tokens in the sentence and the number of nodes in the graph should match.

See Also:
Constant Field Values
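
The NODES_TOKENS rule described above can be illustrated with a small self-contained sketch. It is not the library's implementation. It assumes each token comes with an ordered list of candidate lemmas and that the lemmas used in edges are available as a set of strings. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TokenNodeSelection {

    // For each token, keep every candidate lemma that occurs in an edge;
    // if none of its candidates occurs in any edge, fall back to the first candidate.
    // Simplification: lemmas are compared as plain strings, so a real implementation
    // would also need token positions to tell repeated words apart.
    static List<String> selectNodes(List<List<String>> candidatesPerToken, Set<String> usedInEdges) {
        List<String> nodes = new ArrayList<>();
        for (List<String> candidates : candidatesPerToken) {
            boolean anyUsed = false;
            for (String candidate : candidates) {
                if (usedInEdges.contains(candidate)) {
                    nodes.add(candidate);
                    anyUsed = true;
                }
            }
            if (!anyUsed && !candidates.isEmpty()) {
                nodes.add(candidates.get(0));
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Illustrative data only: the second token has two candidate lemmas.
        List<List<String>> candidates = List.of(
                List.of("he"),
                List.of("saw", "see"),
                List.of("her"));
        Set<String> usedInEdges = Set.of("he", "see");
        // "see" is chosen because it is used in an edge; "her" is kept as a first lemma.
        System.out.println(selectNodes(candidates, usedInEdges)); // prints [he, see, her]
    }
}
```

When exactly one candidate per token is used in the edges, the result has one node per token, which matches the intent stated above.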
Constructor Detail

RaspXmlGraphReader

public RaspXmlGraphReader(java.lang.String inputPath,
                          int nodeSelectionMode,
                          boolean getAllParses,
                          boolean getMetaData)
                   throws GraphFormatException
Create a new reader for RASP XML.

Parameters:
inputPath - Path to the file or directory.
nodeSelectionMode - Set the way that nodes are added to the list of nodes in the graph. NODES_ALL includes all lemmas that RASP outputs. NODES_TOKENS includes one lemma for each token in the sentence.
getAllParses - Whether to include alternative parses for each sentence (if available).
getMetaData - Whether to read metadata (sentence ID and weighted GRs).
Throws:
GraphFormatException
Method Detail

next

public Graph next()
           throws GraphFormatException
Get the next graph from the corpus.

Specified by:
next in interface GraphReader
Returns:
The next graph.
Throws:
GraphFormatException

hasNext

public boolean hasNext()
Check whether there are more graphs available.

Specified by:
hasNext in interface GraphReader
Returns:
True if there are more graphs available.

reset

public void reset()
           throws GraphFormatException
Reset the whole reading process to the beginning.

Specified by:
reset in interface GraphReader
Throws:
GraphFormatException

close

public void close()
Close the reader.

Specified by:
close in interface GraphReader

nextSentence

public java.util.ArrayList<Graph> nextSentence()
                                        throws GraphFormatException
Read a sentence from the corpus. If getAllParses is set to true, the returned list contains the alternative parses of the sentence.

Specified by:
nextSentence in interface GraphReader
Returns:
List of graphs
Throws:
GraphFormatException
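
The methods above suggest a simple read loop: call next() while hasNext() is true, use reset() to start another pass, and call close() when finished. The sketch below shows that pattern against a hypothetical stand-in interface and an in-memory implementation, so that it runs without the library. The names Reader and ListReader are not part of sem.graphreader, and the real next() and reset() additionally declare GraphFormatException.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class ReaderLoopSketch {

    // Hypothetical stand-in mirroring the four reader methods documented above.
    interface Reader<G> {
        boolean hasNext();
        G next();
        void reset();
        void close();
    }

    // In-memory implementation, used only to make the sketch runnable.
    static final class ListReader<G> implements Reader<G> {
        private final List<G> items;
        private Iterator<G> iterator;

        ListReader(List<G> items) {
            this.items = items;
            this.iterator = items.iterator();
        }

        public boolean hasNext() { return iterator.hasNext(); }
        public G next() { return iterator.next(); }
        public void reset() { iterator = items.iterator(); }
        public void close() { iterator = Collections.emptyIterator(); }
    }

    // Make two passes over the corpus: reset() rewinds after each pass,
    // and close() runs even if reading fails part-way through.
    static <G> int[] countTwice(Reader<G> reader) {
        int[] counts = new int[2];
        try {
            for (int pass = 0; pass < 2; pass++) {
                while (reader.hasNext()) {
                    reader.next();
                    counts[pass]++;
                }
                reader.reset();
            }
        } finally {
            reader.close();
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countTwice(new ListReader<>(List.of("g1", "g2", "g3")));
        System.out.println(counts[0] + " " + counts[1]); // prints 3 3
    }
}
```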