java.lang.Object
    sem.graphreader.RaspXmlGraphReader
All Implemented Interfaces:
GraphReader

public class RaspXmlGraphReader
extends java.lang.Object
implements GraphReader
Graph reader for the RASP XML format.
RASP toolkit: http://ilexir.co.uk/2011/open-source-rasp-release/
It supports multiple tags and multiple parses per sentence, and reads the weighted grammatical relation (GR) information into the metadata field. It can also read GZIP-compressed files directly.
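How RaspXmlGraphReader detects compressed input is not documented here. The sketch below shows one common approach, assuming detection by the two-byte GZIP magic number (0x1f 0x8b) rather than by file extension, and assuming UTF-8 encoded XML. GzipAwareOpener and its methods are hypothetical names; only standard java.io and java.util.zip classes are used.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of sem.graphreader.
final class GzipAwareOpener {

    // GZIP data always starts with the magic bytes 0x1f 0x8b.
    static boolean looksGzipped(InputStream in) throws IOException {
        in.mark(2);            // requires a stream that supports mark/reset
        int b1 = in.read();
        int b2 = in.read();
        in.reset();            // rewind so the caller still sees the full stream
        return b1 == 0x1f && b2 == 0x8b;
    }

    // Wraps the raw stream in a GZIPInputStream only if it is compressed.
    static Reader open(InputStream raw) throws IOException {
        InputStream in = new BufferedInputStream(raw);
        if (looksGzipped(in)) {
            in = new GZIPInputStream(in);
        }
        // Assumption: the RASP XML output is UTF-8 encoded.
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    static Reader open(String path) throws IOException {
        return open(new FileInputStream(path));
    }

    // Reads all remaining characters and closes the reader.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader reader = r) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

Sniffing the content rather than trusting a ".gz" suffix means a renamed or uncompressed file is still read correctly.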
The small example files were parsed using the following command:
./rasp.sh -m -p'-mgi -pr -n10' < input.txt > output.xml
The big example file was parsed with this command and then gzipped, to save time and space:
./rasp.sh -m -p'-mg -pr' < input.txt > output.xml
There are two cases where the output from RASP does not exactly correspond to a graph. First, the passive property for verbs is represented as an edge where the verb is the head but the dependent does not exist. This is solved by adding a new 'null' node to the graph that acts as a dependent for that edge. Second, the head of a relation can be marked as an ellipsis, which doesn't correspond to any lemma in the sentence. In such a case, a new 'ellip' node is added to the graph that acts as the head. If these nodes/edges are not needed, they should be removed in post-processing.
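A post-processing step of this kind could look like the following sketch. It uses a deliberately simplified, hypothetical graph model (a list of lemma strings plus labelled head/dependent index pairs) instead of the library's Graph class. It also assumes that the placeholder nodes can be recognised by their lemma text 'null' and 'ellip'. A real sentence containing those words would need a more robust marker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified graph model for illustration only; it is not the
// library's Graph class.
final class PlaceholderCleanup {

    // A labelled edge between two node indices.
    static final class Edge {
        final String label;
        final int head;
        final int dep;

        Edge(String label, int head, int dep) {
            this.label = label;
            this.head = head;
            this.dep = dep;
        }
    }

    // Result of cleanup: surviving nodes and re-indexed edges.
    static final class Cleaned {
        final List<String> nodes;
        final List<Edge> edges;

        Cleaned(List<String> nodes, List<Edge> edges) {
            this.nodes = nodes;
            this.edges = edges;
        }
    }

    // Assumption: placeholder nodes are identified by their lemma text.
    static boolean isPlaceholder(String lemma) {
        return lemma.equals("null") || lemma.equals("ellip");
    }

    // Drops placeholder nodes and every edge touching one, then renumbers
    // the remaining edges to match the shortened node list.
    static Cleaned removePlaceholders(List<String> nodes, List<Edge> edges) {
        int[] newIndex = new int[nodes.size()];
        List<String> keptNodes = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            if (isPlaceholder(nodes.get(i))) {
                newIndex[i] = -1;
            } else {
                newIndex[i] = keptNodes.size();
                keptNodes.add(nodes.get(i));
            }
        }
        List<Edge> keptEdges = new ArrayList<>();
        for (Edge e : edges) {
            int h = newIndex[e.head];
            int d = newIndex[e.dep];
            if (h >= 0 && d >= 0) {
                keptEdges.add(new Edge(e.label, h, d));
            }
        }
        return new Cleaned(keptNodes, keptEdges);
    }
}
```

Edges are renumbered rather than left pointing at old indices, so the cleaned node list and edge list stay consistent with each other.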
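There are also two possible ways of selecting nodes for the node list of each graph:
- NODES_ALL: the node list contains all the lemmas output by RASP, including cases where two lemmas correspond to the same token.
- NODES_TOKENS: the node list contains one lemma for each token in the sentence; the system tries to construct the most likely set of lemmas, given the sentence and the graph.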
Field Summary | |
---|---|
static int | NODES_ALL: The nodes list will contain all the lemmas given by RASP, including the cases where two lemmas correspond to the same token. |
static int | NODES_TOKENS: The system tries to construct the most likely set of lemmas, given the sentence and the graph. |
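The difference between the two modes can be illustrated on a toy sentence where the tagger proposes two lemmas for one token (for example "saw" read both as "see" and as "saw"). NODES_ALL keeps both. For NODES_TOKENS, the sketch simply keeps, per token, the lemma with the most graph edges. That rule is an assumption standing in for the reader's own "most likely set of lemmas" heuristic, and NodeSelectionSketch and Lemma are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the two node-selection modes.
final class NodeSelectionSketch {

    static final class Lemma {
        final String text;
        final int token;   // index of the input token the lemma belongs to
        final int degree;  // number of graph edges touching this lemma

        Lemma(String text, int token, int degree) {
            this.text = text;
            this.token = token;
            this.degree = degree;
        }
    }

    // Like NODES_ALL: every lemma becomes a node, even if several share a token.
    static List<Lemma> selectAll(List<Lemma> lemmas) {
        return new ArrayList<>(lemmas);
    }

    // Rough stand-in for NODES_TOKENS: one lemma per token, preferring the
    // lemma with the most edges (the first one seen wins ties). The reader's
    // real heuristic may differ.
    static List<Lemma> selectOnePerToken(List<Lemma> lemmas) {
        Map<Integer, Lemma> best = new LinkedHashMap<>(); // first-seen token order
        for (Lemma l : lemmas) {
            Lemma current = best.get(l.token);
            if (current == null || l.degree > current.degree) {
                best.put(l.token, l);
            }
        }
        return new ArrayList<>(best.values());
    }
}
```

With lemmas he/0, see/1, saw/1 and it/2, where "see" carries the subject and object edges, selectAll returns four nodes and selectOnePerToken returns he, see and it.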
Constructor Summary | |
---|---|
RaspXmlGraphReader(java.lang.String inputPath, int nodeSelectionMode, boolean getAllParses, boolean getMetaData) | Create a new reader for RASP XML. |
Method Summary | |
---|---|
void | close(): Close the reader. |
boolean | hasNext(): Check whether there are more graphs available. |
Graph | next(): Get the next graph from the corpus. |
java.util.ArrayList<Graph> | nextSentence(): Read a sentence from the corpus. |
void | reset(): Reset the whole reading process to the beginning. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int NODES_ALL
public static final int NODES_TOKENS
Constructor Detail |
---|
public RaspXmlGraphReader(java.lang.String inputPath, int nodeSelectionMode, boolean getAllParses, boolean getMetaData) throws GraphFormatException
Parameters:
inputPath - Path to the file or directory.
nodeSelectionMode - Set the way that nodes are added to the list of nodes in the graph. NODES_ALL includes all lemmas that RASP outputs. NODES_TOKENS includes one lemma for each token in the sentence.
getAllParses - Whether to include alternative parses for each sentence (if available).
getMetaData - Whether to read metadata (sentence id and weighted GRs).
Throws:
GraphFormatException
Method Detail |
---|
public Graph next() throws GraphFormatException
Specified by:
next in interface GraphReader
Throws:
GraphFormatException

public boolean hasNext()
Specified by:
hasNext in interface GraphReader

public void reset() throws GraphFormatException
Specified by:
reset in interface GraphReader
Throws:
GraphFormatException

public void close()
Specified by:
close in interface GraphReader

public java.util.ArrayList<Graph> nextSentence() throws GraphFormatException
Specified by:
nextSentence in interface GraphReader
Throws:
GraphFormatException
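The methods above suggest the usual iterator-style consumption pattern: loop on hasNext()/next(), optionally call reset() to start a second pass, and call close() when done. So that it compiles without the library, the sketch uses a hypothetical SimpleReader interface with the same method names (generic in the graph type, without GraphFormatException) plus an in-memory implementation. It assumes that, when alternative parses are enabled, next() returns each parse as a separate graph while nextSentence() returns all parses of one sentence. The actual semantics of RaspXmlGraphReader may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for GraphReader: same method names, generic graph
// type, no checked exceptions.
interface SimpleReader<G> {
    boolean hasNext();
    G next();
    List<G> nextSentence();
    void reset();
    void close();
}

// In-memory reader over sentences, each holding one or more parses.
// Assumption: next() yields every parse as its own graph, while
// nextSentence() yields the remaining parses of the current sentence.
final class InMemoryReader<G> implements SimpleReader<G> {
    private final List<List<G>> sentences; // each inner list must be non-empty
    private int sentence = 0;
    private int parse = 0;

    InMemoryReader(List<List<G>> sentences) {
        this.sentences = sentences;
    }

    public boolean hasNext() {
        return sentence < sentences.size();
    }

    public G next() {
        if (!hasNext()) throw new NoSuchElementException();
        List<G> parses = sentences.get(sentence);
        G graph = parses.get(parse++);
        if (parse == parses.size()) { // move on to the next sentence
            sentence++;
            parse = 0;
        }
        return graph;
    }

    public List<G> nextSentence() {
        if (!hasNext()) throw new NoSuchElementException();
        List<G> parses = sentences.get(sentence);
        List<G> rest = new ArrayList<>(parses.subList(parse, parses.size()));
        sentence++;
        parse = 0;
        return rest;
    }

    public void reset() {
        sentence = 0;
        parse = 0;
    }

    public void close() {
        // Nothing to release for an in-memory source.
    }
}

final class ReadingLoop {
    // Typical consumption pattern; the caller remains responsible for close().
    static <G> List<G> readAll(SimpleReader<G> reader) {
        List<G> graphs = new ArrayList<>();
        while (reader.hasNext()) {
            graphs.add(reader.next());
        }
        return graphs;
    }
}
```

With the real class, the reader would be created with the documented constructor, for example new RaspXmlGraphReader(inputPath, RaspXmlGraphReader.NODES_TOKENS, true, true), and the loop wrapped in a try/finally block that calls close().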