Pergunta

So, this is how the problem goes: I am trying to extract information from scanned receipts like this,

Receipt from New York, USA

What I have been told is that I would get the textual data from a OCR software, so in short I will be working with a textual version of the image directly.

Problem:

The problem in hand here is that I have to extract certain information here, namely,

  1. The Location. (for eg: New York, United States)
  2. The complete Total (after all the discounts, tips, etc. have been involved) (Eg: 1033.42)
  3. The Currency. ($, £, €, etc)
  4. The Date. (Easier to guess)

The reason I want to extract the Location information is that, if for example, the currency is not explicitly mentioned here, then I can infer from the location where the receipt is generated.

Challenges:

The challenge here is that, information like Total Due could be anything like Grand Total or Total (only), or something semantically similar to Total, as I am not going to get the same receipt from the same restaurant only. (The restaurant could be anywhere in the world, but the problem is limited to only English speaking countries for now.)

The other challenge is to actually get the total information. It's very easy for us to see that 1033.42 is the total in the above receipt. But how do I get the software to know that? The way I see it, is that 1033.42 is near to the total(proximity). But there could be other numbers near to it as well.

Where I have tried and failed:

I have been told to start from NLTK(NER) but NER doesn't work for everything here. I can get the date information through it but the problem isn't just only to identify what the named entities are, imo.

What I think would work

The way I see it, I think I need to use a machine/deep learning model where the machine would be able to understand the proximity match between anything which semantically says Total and the number near to it (most probably on the right side).

Any help regarding which model would work best in terms of speed(first and foremost) and accuracy would be greatly appreciated.
I would also appreciate any help regarding where I could find any dataset or existing model which I could use for Transfer Learning.

Foi útil?

Solução

There is already an ML engine that does these extractions, here is a general process layout: There

Hre is the orginal paper describing the architecture, features, approaches etc. read it up, instead of me copying it here. cloudscan

Licenciado em: CC-BY-SA com atribuição
scroll top