Question

I want to split this text file into lines and classify each line. If a line starts with "Qty", the lines that follow are order items, up to the line that starts with "GST".

If a line starts with "Total Amount", it is the total amount line.

Business me . ' l
Address "rwqagePnnter Pro DemcRafifilp
Address "mfgr Eva|uat|on Only
Contact line 1
Transaction Number 10006
Issue Date 27/02/201
Time 10:36:55
Salesperson orsa orsa
Qty Description Unit Price Total
1 test $120.00 $120.00
GST $10.91
Total Amount $120.00
Cash $120.00
Please contact us for more information about
this receipt.
Thank you for your business.
d
.
test

Please show me how to do this with PEG.js: http://pegjs.majda.cz/

Solution

Here's a quick-and-dirty sample solution:

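{
  // State shared by the rule actions below
  var in_quantity = false; // Track whether or not we are in a quantity block
  var quantity    = [];    // Order lines collected between "Qty" and "GST"
  var gst         = null;  // Text following "GST"
  var total       = null;  // Text following "Total Amount"
}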

start =
  // look for a quantity, then GST, then a total and finally anything else
  (quantity / gst / total / line)+
  {
    return {quantity: quantity, gst: gst, total: total}
  }

chr = [^\n]
eol = "\n"?

quantity   = "Qty" chr+ eol        { in_quantity = true; }
gst        = "GST" g:chr+ eol      { in_quantity = false; gst = g.join('').trim(); }
total      = "Total Amount" t:line { in_quantity = false; total = t.trim(); }

line =
  a:chr+ eol
  {
    if( in_quantity ){
      // break quantities into columns based on tabs
      quantity.push( a.join('').split(/[\t]/) );
    }
    return a.join('');
  }
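
The action in the line rule splits each order line on tabs. If the receipt text separates columns with spaces instead, as the pasted sample appears to, a regular expression may be a better fit. The following is a minimal sketch in plain JavaScript, not part of the grammar above. It assumes every order line has the layout quantity, description, unit price, total, with both prices prefixed by "$". The name splitOrderLine is hypothetical.

```javascript
// Hypothetical helper: split one order line into its four columns.
// Assumes the layout "<qty> <description> $<unit price> $<total>".
function splitOrderLine(line) {
  const match = /^(\d+)\s+(.+?)\s+\$([\d.,]+)\s+\$([\d.,]+)\s*$/.exec(line);
  if (match === null) {
    return null; // the line does not follow the assumed layout
  }
  return {
    qty: Number(match[1]),
    description: match[2], // may contain spaces
    unitPrice: match[3],   // kept as a string, without the "$"
    total: match[4],       // kept as a string, without the "$"
  };
}

console.log(splitOrderLine("1 test $120.00 $120.00"));
```

A function like this could be defined in the grammar's initializer block and called from the line action in place of the tab split, since code in the initializer is visible to rule actions.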

OTHER TIPS

Here is another possible solution:

{
  var result = [];
}

start
  = (!QTY AnyLine /
      set:(Quantities TotalAmount)
        {result.push({orders:set[0], total:set[1]})}
    )+ (Chr+)?
  {return result;}

QTY = "Qty"
GST = "GST"

Quantities
  = QtyLine order:(OrderLine*) GSTLine {return order;}

QtyLine
  = QTY Chr* _

OrderLine
  = !GST ch:(Chr+) _ {return ch.join('');}

GSTLine
  = GST Chr* _

TotalAmount
  = "Total Amount" total:(Chr*) _ {return total.join('');}

AnyLine
  = ch:(Chr*) _ {return ch.join('');}

Chr
  = [^\n]

_
  = "\n"

You could use XML, or you could end every line with a "/" and then split the text on that character using the split function.

mytext = mytext.split("/");

Then work with the resulting array. I don't know why you wouldn't just use SQL or something similar.
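
If a parser generator is not required, the same split idea can be applied to newlines directly, with no extra delimiter. The following is a minimal sketch in plain JavaScript. It assumes the receipt is already in a string and that "Qty", "GST", and "Total Amount" appear at the start of their lines, as in the question. The name classifyReceipt is hypothetical.

```javascript
// Hypothetical helper: classify receipt lines by their prefix.
function classifyReceipt(text) {
  const result = { items: [], gst: null, total: null };
  let inItems = false; // true between the "Qty" header and the "GST" line

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("Qty")) {
      inItems = true;
    } else if (line.startsWith("GST")) {
      inItems = false;
      result.gst = line.slice("GST".length).trim();
    } else if (line.startsWith("Total Amount")) {
      inItems = false;
      result.total = line.slice("Total Amount".length).trim();
    } else if (inItems && line !== "") {
      result.items.push(line); // an order item, kept as a raw string
    }
  }
  return result;
}

const sample = [
  "Salesperson orsa orsa",
  "Qty Description Unit Price Total",
  "1 test $120.00 $120.00",
  "GST $10.91",
  "Total Amount $120.00",
  "Cash $120.00",
].join("\n");

console.log(classifyReceipt(sample));
```

Lines outside the "Qty" to "GST" range that carry neither prefix are ignored in this sketch.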

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow