Question

Is there an algorithmic approach to identifying which dates in a paragraph correspond to particular events (phrases) in that paragraph?

For example, consider the following paragraph:

In June 1970, the great leader took the oath. But it was only after May 1972, post the death of the Minister of State, that he took over the reins of the country. While he enjoyed popular support until Mid-1980, his influence began to fall thereafter.

Is there an algorithm (deterministic or stochastic)# that can generate 2-tuples (date, event), where the paragraph implies that the event occurred on that date? In the above case:

  • (June 1970, great leader took oath)
  • (May 1972, took over the reins)

    or better yet

  • (May 1972, the great leader took over the reins)
  • (1980, fall in influence)

#Later addition


Solution

In general, identifying dates and other temporal markers in text is known as extracting temporal references. Searching for that term will lead you to papers on the topic.
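To make the term concrete, here is a minimal Python sketch of the date-finding half of the problem. It is an illustration under narrow assumptions, not a real temporal tagger: the names `DATE_RE` and `extract_dates` are hypothetical, and the pattern only covers the shapes that occur in the question ("June 1970", "May 1972", "Mid-1980"). Real temporal-reference extraction also has to cope with relative expressions such as "the following year" and with normalizing matches to calendar values; this sketch does neither.

```python
import re

# Hypothetical pattern covering only the date shapes in the example
# paragraph: "<Month> <year>" and "Early-/Mid-/Late-<year>".
MONTHS = (r"January|February|March|April|May|June|July|August|"
          r"September|October|November|December")
DATE_RE = re.compile(
    rf"\b(?:(?:{MONTHS})\s+\d{{4}}|(?:[Ee]arly|[Mm]id|[Ll]ate)-\d{{4}})\b"
)

def extract_dates(text):
    """Return (surface form, start offset, end offset) for each date found."""
    return [(m.group(0), m.start(), m.end()) for m in DATE_RE.finditer(text)]

PARAGRAPH = (
    "In June 1970, the great leader took the oath. But it was only after "
    "May 1972, post the death of the Minister of State, that he took over "
    "the reins of the country. While he enjoyed popular support until "
    "Mid-1980, his influence began to fall thereafter."
)

print([d for d, _, _ in extract_dates(PARAGRAPH)])
# ['June 1970', 'May 1972', 'Mid-1980']
```

Keeping the character offsets lets a later step inspect the text around each date, which is where the associated event would have to be found.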

Other tips

Since you ask for an algorithmic approach, I will be as stubborn as an algorithm. I'm sorry to treat the question this way, but since it doesn't look like a complex theoretical problem, I will just summarize the possible approaches.

Question: can you give me an algorithmic definition of a date and of a particular event?

If you can: since your definition is algorithmic, it is probably some kind of formal grammar, and your task will be to tune that grammar until it catches every case you need to handle. (I'd be interested to see an exact definition that isn't a formal grammar.)
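As a rough illustration of the grammar route, the Python sketch below encodes one sentence shape as a regular expression (that is, a regular grammar): a lead-in clause containing a date, optional comma-separated asides, and a final clause, optionally introduced by "that", which is taken to be the event. The rule itself and the names `SENTENCE_RE` and `date_event_pairs` are assumptions made for this example; the rule happens to fit the three sentences in the question and nothing more.

```python
import re

# Hypothetical date pattern: "<Month> <year>" or "Early-/Mid-/Late-<year>".
MONTHS = (r"January|February|March|April|May|June|July|August|"
          r"September|October|November|December")
DATE = rf"\b(?:(?:{MONTHS})\s+\d{{4}}|(?:[Ee]arly|[Mm]id|[Ll]ate)-\d{{4}})\b"

# One hand-written sentence rule (a regular grammar):
#   SENTENCE -> LEAD-IN DATE ("," ASIDE)* "," ["that"] EVENT ["."]
SENTENCE_RE = re.compile(
    rf"[^,]*?(?P<date>{DATE})"                # lead-in clause with the date
    r"(?:,[^,]*)*?"                           # optional comma-separated asides
    r",\s*(?:that\s+)?(?P<event>[^,]+?)\.?$"  # final clause = the event
)

def date_event_pairs(text):
    """Return (date, event) tuples for sentences that fit the rule above."""
    pairs = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        m = SENTENCE_RE.match(sentence)
        if m:  # sentences the rule does not cover are skipped
            pairs.append((m.group("date"), m.group("event")))
    return pairs

PARAGRAPH = (
    "In June 1970, the great leader took the oath. But it was only after "
    "May 1972, post the death of the Minister of State, that he took over "
    "the reins of the country. While he enjoyed popular support until "
    "Mid-1980, his influence began to fall thereafter."
)

for date, event in date_event_pairs(PARAGRAPH):
    print(f"({date}, {event})")
# (June 1970, the great leader took the oath)
# (May 1972, he took over the reins of the country)
# (Mid-1980, his influence began to fall thereafter)
```

The rule pairs "Mid-1980" with the fall in influence even though "until Mid-1980" really marks the end of the period of popular support; catching distinctions like that is exactly the kind of grammar tuning described above.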

If you can't: then at least you can come up with examples. Alright then. The best approach, and the only one I can think of, is machine learning: you will have to train a model, on a corpus of sentences annotated by hand, to recognize first your dates and then your events. However, this is overkill compared to a big handmade regexp, which will probably do the job. If you really, really want to go the learning route, I think the most efficient option is to give that kind of regexp to the learning algorithm as a parameter, but you'd better ask machine learning experts.
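One possible reading of "the regexp given as a parameter to the learning algorithm" is to use the regexp's verdict as a feature. The Python sketch below does that for the date-recognition step only, with a plain binary perceptron (a swapped-in choice; other classifiers would work too) trained on a single hand-annotated paragraph. The pattern, the feature names and the helper functions are all hypothetical; a usable system would need a real annotated corpus and a second model for events.

```python
import re
from collections import defaultdict

# Hypothetical hand-written pattern, reused as a feature: month names,
# four-digit years and "Early-/Mid-/Late-<year>" tokens.
TOKEN_DATE_RE = re.compile(
    r"January|February|March|April|May|June|July|August|September|"
    r"October|November|December|\d{4}|(?:Early|Mid|Late)-\d{4}"
)

def tokenize(text):
    return re.findall(r"[\w-]+", text)

def features(token):
    feats = ["bias", "word=" + token.lower()]
    if TOKEN_DATE_RE.fullmatch(token):
        feats.append("regex_match")  # the regexp's verdict as a feature
    return feats

def train_perceptron(tokens, labels, epochs=5):
    """Binary perceptron: a True label means the token is part of a date."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for token, is_date in zip(tokens, labels):
            feats = features(token)
            predicted = sum(weights[f] for f in feats) > 0
            if predicted != is_date:
                step = 1.0 if is_date else -1.0
                for f in feats:
                    weights[f] += step
    return weights

def predict(weights, tokens):
    return [sum(weights[f] for f in features(t)) > 0 for t in tokens]

PARAGRAPH = (
    "In June 1970, the great leader took the oath. But it was only after "
    "May 1972, post the death of the Minister of State, that he took over "
    "the reins of the country. While he enjoyed popular support until "
    "Mid-1980, his influence began to fall thereafter."
)
# Hand annotation, written as a set of gold date tokens for brevity.
GOLD = {"June", "1970", "May", "1972", "Mid-1980"}

tokens = tokenize(PARAGRAPH)
weights = train_perceptron(tokens, [t in GOLD for t in tokens])

new = tokenize("In March 1985, the minister resigned.")
print([t for t, d in zip(new, predict(weights, new)) if d])  # ['March', '1985']
```

Because the regexp feature carries most of the weight, the model labels "March" and "1985" as dates even though neither occurs in the training paragraph; the word-identity features are what would let it learn exceptions to the regexp from a larger corpus.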

Good luck with this. In both cases, it's much easier to talk about than to do.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange