Question

I am working on my final-year university project. My task is to extract causalities from crime news, and I also need to extract the location of each crime. I used the OpenNLP named entity recognizer to extract locations: I trained a model for location names, and it has worked fine for my scenario so far. Now I am looking for a way or a library to extract causality. Below are samples of the news items I am using; the bold and italic text is what I want to extract. I need a library similar to OpenNLP for this purpose, or anything else that could help me perform this task.

News 1:

KARACHI: At least 12 people were gunned down in the city on Monday, two of them apparently killed in sectarian attacks and one of the other victims a Muttahida Qaumi Movement activist.

News 2:

KARACHI: Police on Tuesday arrested three accused in different raids at Gulistan-e-Jauhar and Brigade areas, Geo News reported.

News 3:

KARACHI: Five members of a family were found dead inside their house in Baldia Town here on Monday, Geo News reported.

News 4:

KARACHI: Sindh Rangers in their continued targeted operation in the city last night rounded up eight professional criminals and recovered weapons from them, Geo News reported.


Solution

You may be able to pull out causality by using the parser, chunker, or part-of-speech tagger to identify verb phrases and noun phrases, for example by extracting consecutive noun and verb phrases. Here is how to use the parser, which gives you the entire sentence structure to work with. You will need to download the parser model first.

Use this class (I put one of your sentences in):

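import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;

public class ParseMap {

  public static void main(String[] args) throws IOException {

    // Load the pre-trained parser model; adjust the path to your download location.
    ParserModel model;
    try (InputStream is = new FileInputStream("c:\\temp\\opennlpmodels\\en-parser-chunking.bin")) {
      model = new ParserModel(is);
    }
    Parser parser = ParserFactory.create(model);
    String sentence = "KARACHI: At least 12 people were gunned down in the city on Monday, two of them apparently killed in sectarian attacks and one of the other victims a Muttahida Qaumi Movement activist.";
    // Ask for the single best parse of the sentence.
    Parse[] topParses = ParserTool.parseLine(sentence, parser, 1);
    Parse p = topParses[0];
    // Print the parse in code-tree form to stdout.
    p.showCodeTree();
    // Render the bracketed parse into a buffer and print it.
    StringBuffer sb = new StringBuffer(sentence.length() * 4);
    p.show(sb);
    System.out.println(sb);
  }
}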

The output looks like this (held in the StringBuffer):

(TOP (S (`` KARACHI:) (S (NP (QP (IN At) (JJS least) (CD 12)) (NNS people)) (VP (VBD were) (VP (VBN gunned) (ADVP (RB down)) (PP (IN in) (NP (NP (DT the) (NN city)) (PP (IN on) (NP (NP (NNP Monday,) (CD two)) (PP (IN of) (NP (PRP them))))))) (ADVP (RB apparently)) (VP (VBD killed) (PP (IN in) (NP (JJ sectarian) (NNS attacks))))))) (CC and) (S (NP (NP (CD one)) (PP (IN of) (NP (DT the) (JJ other) (NNS victims)))) (NP (DT a) (NNP Muttahida) (NNP Qaumi) (NNP Movement))) (. activist.)))

Notice how the causality you are looking for is a noun-verb combination following one of your named entities (Karachi). With some tinkering, you may be able to get decent results.
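As a rough illustration of that tinkering, the sketch below reads the bracketed parse string shown above and collects every noun phrase (NP) that is immediately followed by a verb phrase (VP) under the same parent. It uses plain Java only and does not call OpenNLP. The class name NounVerbPairs, its methods, and the " | " separator are made up for this example, and it assumes the input is well-formed bracketed output. Treat it as a starting point, not as a finished causality extractor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of OpenNLP): finds NP-VP sibling pairs
// in a bracketed parse string such as the one printed by the parser.
class NounVerbPairs {

    /** Minimal tree node: a label plus either child nodes or a single word. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        String word; // set only on part-of-speech leaves

        Node(String label) { this.label = label; }

        /** Joins the words under this node, in sentence order. */
        String text() {
            if (word != null) return word;
            StringBuilder sb = new StringBuilder();
            for (Node c : children) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(c.text());
            }
            return sb.toString();
        }
    }

    private final String s;
    private int pos = 0;

    private NounVerbPairs(String s) { this.s = s; }

    /** Parses "(LABEL child child ...)" or "(TAG word)"; assumes well-formed input. */
    static Node parse(String bracketed) {
        return new NounVerbPairs(bracketed.trim()).node();
    }

    private Node node() {
        pos++; // skip '('
        int start = pos;
        while (s.charAt(pos) != ' ') pos++; // the label runs up to the first space
        Node n = new Node(s.substring(start, pos));
        while (s.charAt(pos) != ')') {
            char c = s.charAt(pos);
            if (c == ' ') {
                pos++;
            } else if (c == '(') {
                n.children.add(node());
            } else {
                start = pos;
                while (s.charAt(pos) != ')') pos++; // a leaf word runs up to ')'
                n.word = s.substring(start, pos);
            }
        }
        pos++; // skip ')'
        return n;
    }

    /** Returns "NP text | VP text" for every NP directly followed by a VP sibling. */
    static List<String> extract(String bracketed) {
        List<String> out = new ArrayList<>();
        collect(parse(bracketed), out);
        return out;
    }

    private static void collect(Node n, List<String> out) {
        for (int i = 0; i + 1 < n.children.size(); i++) {
            Node a = n.children.get(i);
            Node b = n.children.get(i + 1);
            if (a.label.equals("NP") && b.label.equals("VP")) {
                out.add(a.text() + " | " + b.text());
            }
        }
        for (Node c : n.children) collect(c, out);
    }

    public static void main(String[] args) {
        // A shortened tree in the same format as the parser output above.
        String tree = "(TOP (S (NP (QP (IN At) (JJS least) (CD 12)) (NNS people)) "
                + "(VP (VBD were) (VP (VBN gunned) (ADVP (RB down))))))";
        for (String pair : extract(tree)) {
            System.out.println(pair); // prints: At least 12 people | were gunned down
        }
    }
}
```

Passing the contents of the StringBuffer from ParseMap to NounVerbPairs.extract should give the same kind of pairs for a full sentence. Checking siblings only is a deliberate simplification: it misses verbless clauses such as "one of the other victims a Muttahida Qaumi Movement activist", which the parser tags as two noun phrases.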

EDIT: Just to be clear, what I wrote is a suggestion for getting something quick. You should look into linguistic heuristics for this, and make sure that what you want is actually causality and not just event extraction, which you may be able to achieve by training an NER model.

Licensed under: CC-BY-SA with attribution