RaspXmlGraphReader

sem.graphreader
Class RaspXmlGraphReader

java.lang.Object
  sem.graphreader.RaspXmlGraphReader

All Implemented Interfaces: GraphReader

public class RaspXmlGraphReader
extends java.lang.Object
implements GraphReader

Graph reader for the RASP XML format.

RASP toolkit: http://ilexir.co.uk/2011/open-source-rasp-release/

The reader supports multiple POS tags per token and multiple parses per sentence, and reads weighted GR (grammatical relation) information into the metadata field. It can also read GZIP-compressed files directly.
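How the reader recognises a compressed input file is not stated here. One common approach, shown as a minimal sketch, is to peek at the first two bytes of the stream and check for the GZIP magic number (0x1f 0x8b). The class name `MaybeGzip` and its `open` method are hypothetical and are not part of this library.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of sem.graphreader.
final class MaybeGzip {
    /**
     * Returns a stream of uncompressed bytes. If the input starts with the
     * GZIP magic bytes 0x1f 0x8b it is wrapped in a GZIPInputStream,
     * otherwise it is returned (buffered) unchanged.
     */
    static InputStream open(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);            // remember the start so the peeked bytes can be re-read
        int b1 = in.read();
        int b2 = in.read();
        in.reset();            // rewind to the marked position
        boolean gzipped = (b1 == 0x1f && b2 == 0x8b);
        return gzipped ? new GZIPInputStream(in) : in;
    }
}
```

With a helper like this, the same XML parsing code can be used for both `output.xml` and a gzipped copy of it.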

The small example files were parsed using the following command: `./rasp.sh -m -p'-mgi -pr -n10' < input.txt > output.xml`

The big example file was parsed with this command and then gzipped, to save time and space: `./rasp.sh -m -p'-mg -pr' < input.txt > output.xml`

There are two cases where the output from RASP does not correspond exactly to a graph. First, the passive property of a verb is represented as an edge whose head is the verb but whose dependent does not exist. The reader handles this by adding a new 'null' node to the graph that acts as the dependent of that edge. Second, the head of a relation can be marked as an ellipsis, which does not correspond to any lemma in the sentence. In that case, a new 'ellip' node is added to the graph and acts as the head. If these nodes and edges are not needed, they should be removed in post-processing.
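The post-processing step mentioned above could look roughly like the following sketch. It does not use the library's own `Graph` classes, whose API is not shown on this page. Instead it uses a hypothetical `SimpleEdge` stand-in with plain lemma strings, and it assumes the synthetic nodes can be recognised by the lemma strings "null" and "ellip". Note that a filter based on the lemma string alone would also drop a genuine token whose lemma happens to be "null".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an edge; the library's real classes may differ.
final class SimpleEdge {
    final String label;
    final String head;
    final String dep;

    SimpleEdge(String label, String head, String dep) {
        this.label = label;
        this.head = head;
        this.dep = dep;
    }
}

// Hypothetical post-processing helper, not part of sem.graphreader.
final class SyntheticNodeFilter {
    // Assumption: the added nodes carry exactly these lemma strings.
    private static final Set<String> SYNTHETIC =
            new HashSet<>(Arrays.asList("null", "ellip"));

    static boolean isSynthetic(String lemma) {
        return SYNTHETIC.contains(lemma);
    }

    /** Returns the lemmas that are not synthetic, in their original order. */
    static List<String> keepRealNodes(List<String> lemmas) {
        List<String> kept = new ArrayList<>();
        for (String lemma : lemmas) {
            if (!isSynthetic(lemma)) {
                kept.add(lemma);
            }
        }
        return kept;
    }

    /** Drops every edge whose head or dependent is a synthetic node. */
    static List<SimpleEdge> keepRealEdges(List<SimpleEdge> edges) {
        List<SimpleEdge> kept = new ArrayList<>();
        for (SimpleEdge e : edges) {
            if (!isSynthetic(e.head) && !isSynthetic(e.dep)) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

Removing the edges together with the nodes keeps the graph consistent, at the cost of losing the passive and ellipsis information those edges carried.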

There are also two possible ways of selecting nodes for the node list of each graph:

NODES_ALL - Include all lemmas that RASP outputs for a given sentence. If the option of multiple POS tags is activated, this can result in multiple nodes that correspond to a single token.
NODES_TOKENS - If a lemma is used in the edges of the graph, that lemma is included. Otherwise, if none of a token's lemmas participate in any edge, the first lemma of that token is added to the list of nodes. This should result in matching numbers of nodes and tokens.

If the multiple tags option is not activated in RASP, both of these modes should give the same output.
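The NODES_TOKENS rule described above can be illustrated with a small standalone sketch. It is not the reader's actual implementation. The class `TokenNodeSelector`, its `select` method and the string lemma identifiers are hypothetical, and each token is modelled simply as an ordered list of its candidate lemmas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the NODES_TOKENS rule, not part of sem.graphreader.
final class TokenNodeSelector {
    /**
     * @param candidatesPerToken for each token, its candidate lemmas in RASP output order
     * @param usedInEdges        lemmas that appear as head or dependent of some edge
     * @return the selected node list, in token order
     */
    static List<String> select(List<List<String>> candidatesPerToken,
                               Set<String> usedInEdges) {
        List<String> nodes = new ArrayList<>();
        for (List<String> candidates : candidatesPerToken) {
            boolean anyUsed = false;
            for (String lemma : candidates) {
                if (usedInEdges.contains(lemma)) {
                    nodes.add(lemma);   // a lemma used in the edges is included
                    anyUsed = true;
                }
            }
            if (!anyUsed && !candidates.isEmpty()) {
                nodes.add(candidates.get(0));   // fall back to the token's first lemma
            }
        }
        return nodes;
    }
}
```

In this sketch, a token with two lemmas that both occur in edges contributes two nodes, so the node and token counts match only when at most one lemma per token is used in the edges.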

Field Summary
`static int`	`NODES_ALL` The nodes list will contain all the lemmas given by RASP, including the cases where two lemmas correspond to the same token.
`static int`	`NODES_TOKENS` The system tries to construct the most likely set of lemmas, given the sentence and the graph.

Constructor Summary
`RaspXmlGraphReader(java.lang.String inputPath, int nodeSelectionMode, boolean getAllParses, boolean getMetaData)` Create a new reader for RASP XML.

Method Summary
`void`	`close()` Close the reader.
`boolean`	`hasNext()` Check whether there are more graphs available.
`Graph`	`next()` Get the next graph from the corpus.
`java.util.ArrayList<Graph>`	`nextSentence()` Read a sentence from the corpus.
`void`	`reset()` Reset the whole reading process to the beginning.

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

NODES_ALL

public static final int NODES_ALL

The nodes list will contain all the lemmas given by RASP, including the cases where two lemmas correspond to the same token.

See Also: Constant Field Values

NODES_TOKENS

public static final int NODES_TOKENS

The system tries to construct the most likely set of lemmas, given the sentence and the graph. A node is included if it is used in one of the edges. Also, if a word is not used in the edges at all, its first lemma is included. This way the number of tokens in the sentence and the number of nodes in the graph should match.

See Also: Constant Field Values

Constructor Detail