Question

I'm using the grammar from this site with JavaCC. It works fine apart from some PICTURE strings, for example ----,---,---.99 or --9.

http://mapage.noos.fr/~bpinon/cobol.jj

It doesn't seem to like more than one dash.

What do I need to change in this grammar to support my picture examples?

I've messed about with

void NumericConstant() :
{}
{
  (<PLUSCHAR>|<MINUSCHAR>)? IntegerConstant() [ <DOTCHAR> IntegerConstant() ]
} 

but nothing seems to work. Any help is much appreciated.

EDIT:

<COBOL_WORD: ((["0"-"9"])+ (<MINUSCHAR>)*)*
    (["0"-"9"])* ["a"-"z"] ( ["a"-"z","0"-"9"] )*
    ( (<MINUSCHAR>)+ (["a"-"z","0"-"9"])+)*
>

Is this the regular expression that matches this whole line?

07 STRINGFIELD2 PIC AAAA.

If I want to accept 05 TEST3 REDEFINES TEST2 PIC X(10). would I change the regex to be:

<COBOL_WORD: ((["0"-"9"])+ (<MINUSCHAR>)*)*
(<REDEFINES> (["0"-"9"])* ["a"-"z"] ( ["a"-"z","0"-"9"] )*)?
    (["0"-"9"])* ["a"-"z"] ( ["a"-"z","0"-"9"] )*
    ( (<MINUSCHAR>)+ (["a"-"z","0"-"9"])+)*

Thanks a lot for the help so far

Solution

Why are you messing around with NumericConstant() when you are trying to parse a COBOL PICTURE string?

According to the JavaCC source you have, a COBOL PICTURE should parse with:

void DataPictureClause() :
{}
{
  ( <PICTURE> | <PIC> ) [ <IS> ] PictureString()
}

The --9 part is a picture string and should be parsed by the PictureString() production:

void PictureString() :
{}
{
    [ PictureCurrency() ]
    ( ( PictureChars() )+ [ <LPARENCHAR> IntegerConstant() <RPARENCHAR> ] )+
    [ PicturePunctuation() ( ( PictureChars() )+ [ <LPARENCHAR> IntegerConstant() <RPARENCHAR> ] )+ ]
}

PictureCurrency() matches nothing here, so parsing moves on to PictureChars():

void PictureChars() :
{}
{
    <INTEGER> | <COBOL_WORD>
}

But COBOL_WORD does not appear to support many "interesting" valid PICTURE clause definitions:

<COBOL_WORD: ((["0"-"9"])+ (<MINUSCHAR>)*)*
    (["0"-"9"])* ["a"-"z"] ( ["a"-"z","0"-"9"] )*
    ( (<MINUSCHAR>)+ (["a"-"z","0"-"9"])+)*
>
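
To see concretely where COBOL_WORD falls short, the token can be hand-translated into a java.util.regex pattern and tried against a few strings. This is only a rough sketch: the class and method names are made up for illustration, and case-insensitive matching is an assumption about how the grammar is configured.

```java
import java.util.regex.Pattern;

class CobolWordCheck {
    // Hand translation of the COBOL_WORD token into java.util.regex syntax.
    // CASE_INSENSITIVE is an assumption; the grammar file may be set up differently.
    static final Pattern COBOL_WORD = Pattern.compile(
        "(?:[0-9]+-*)*"             // ((["0"-"9"])+ (<MINUSCHAR>)*)*
        + "[0-9]*[a-z][a-z0-9]*"    // (["0"-"9"])* ["a"-"z"] (["a"-"z","0"-"9"])*
        + "(?:-+[a-z0-9]+)*",       // ((<MINUSCHAR>)+ (["a"-"z","0"-"9"])+)*
        Pattern.CASE_INSENSITIVE);

    // True when the whole string is a single COBOL_WORD.
    static boolean isCobolWord(String s) {
        return COBOL_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"STRINGFIELD2", "AAAA", "DISP-NBR-1", "--9", "----,---,---.99"};
        for (String s : samples) {
            System.out.println(s + " -> " + isCobolWord(s));
        }
    }
}
```

Under these assumptions, names such as STRINGFIELD2 and simple picture strings such as AAAA match, but --9 and ----,---,---.99 do not: the pattern requires at least one letter, cannot start with a hyphen, and has no place for a comma or period.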

Parsing COBOL is not easy. In fact, it is probably one of the most difficult languages in existence to build a quality parser for. I can tell you right now that the JavaCC source you are working from is not going to cut it, except for some very simple and probably totally artificial COBOL program examples.

Answer to comment

COBOL Picture strings tend to mess up the best of parsers. The minus sign you are having trouble with is only the tip of the iceberg! Picture strings are difficult to parse because the period and comma may be part of a Picture string but serve as separators outside of the string. This means that a parser cannot unambiguously classify a period or comma in a context-free manner. It needs to be "aware" of the context in which the character is encountered. This may sound trivial, but it isn't.

Technically, the separator period and comma must be followed by a space (or end of line). This little fact could make determining the role of a period or comma very simple, because a Picture string cannot contain a space. However, many commercial COBOL compilers are "smart" enough to correctly recognize separator periods and commas that are not followed by a space. Consequently, a lot of COBOL programmers code illegal separator periods and commas, which means you will probably have to deal with them.

The bottom line is that no matter what you do, those little Picture strings are going to haunt you. They will take quite a bit of effort to deal with.

Just a hint of things to come, how would you parse the following:

01 DISP-NBR-1 PIC -99,999.
01 DISP-NBR-2 PIC -99,999..
01 DISP-NBR-3 PIC -99,999, .
01 DISP-NBR-4 PIC -99,999,. 

In DISP-NBR-1, the final period terminates the Picture string: it is a separator period. In DISP-NBR-2, the first period is part of the string and the second period is the separator. In DISP-NBR-3, the comma is a separator and is not part of the Picture string. In DISP-NBR-4, however, the comma is part of the Picture string because it is not followed by a space.
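
The strict rule used in these four examples (a period or comma is a separator only when followed by a space or end of line) can be sketched in a few lines of Java. This is a minimal illustration, not a full solution: the class and helper names are hypothetical, the picture string is located by a naive search for " PIC ", and the sketch does not handle the "illegal" separators without a trailing space mentioned above.

```java
class PictureScan {
    // Collects the picture string that starts at index `start` of `line`.
    // Scanning stops at whitespace, or at a period/comma that is followed by
    // whitespace or end of line (the strict separator rule).
    static String pictureString(String line, int start) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < line.length(); i++) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) {
                break;
            }
            boolean lastOrBeforeSpace =
                i + 1 == line.length() || Character.isWhitespace(line.charAt(i + 1));
            if ((c == '.' || c == ',') && lastOrBeforeSpace) {
                break; // separator, not part of the picture string
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical convenience helper: finds " PIC " and scans what follows.
    static String afterPic(String line) {
        int p = line.indexOf(" PIC ");
        if (p < 0) {
            throw new IllegalArgumentException("no PIC keyword in: " + line);
        }
        return pictureString(line, p + " PIC ".length());
    }

    public static void main(String[] args) {
        String[] lines = {
            "01 DISP-NBR-1 PIC -99,999.",
            "01 DISP-NBR-2 PIC -99,999..",
            "01 DISP-NBR-3 PIC -99,999, .",
            "01 DISP-NBR-4 PIC -99,999,. "
        };
        for (String line : lines) {
            System.out.println("[" + afterPic(line) + "]");
        }
    }
}
```

For the four lines above, this yields -99,999 then -99,999. then -99,999 and finally -99,999, (with the trailing comma), which agrees with the classification given in the text.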

Welcome to COBOL!

OTHER TIPS

I found that I had to switch the lexer into another mode when I got PICTURE. A COBOL PICTURE string has completely different 'lexics' from the rest of the language, and you must discourage the lexer from doing anything with periods, commas, etc., other than accumulating them into the picture string. See NealB's answer for some examples of knowing when to stop picture-scanning.
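
The mode switch can be illustrated with a small hand-written tokenizer in Java. This is only a sketch of the idea, not the lexer described in this answer and not JavaCC output: the class name, the token labels (WORD, PICTURE, PUNCT) and the two-mode design are assumptions made for illustration, and the stop condition is the strict "period or comma followed by a space or end of line" rule.

```java
import java.util.ArrayList;
import java.util.List;

class ModalLexerSketch {
    enum Mode { DEFAULT, PICTURE }

    // True when line[i] is a period or comma followed by whitespace or end of line.
    static boolean isSeparatorAt(String line, int i) {
        char c = line.charAt(i);
        if (c != '.' && c != ',') {
            return false;
        }
        return i + 1 == line.length() || Character.isWhitespace(line.charAt(i + 1));
    }

    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-';
    }

    // Splits one source line into labelled tokens such as "WORD:PIC".
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Mode mode = Mode.DEFAULT;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (mode == Mode.PICTURE && !isSeparatorAt(line, i)) {
                // Picture mode: accumulate everything up to whitespace or a separator.
                while (i < line.length()
                        && !Character.isWhitespace(line.charAt(i))
                        && !isSeparatorAt(line, i)) {
                    i++;
                }
                String text = line.substring(start, i);
                if (text.equalsIgnoreCase("IS")) {
                    tokens.add("WORD:" + text); // optional IS, stay in picture mode
                } else {
                    tokens.add("PICTURE:" + text);
                    mode = Mode.DEFAULT;
                }
            } else if (isWordChar(c)) {
                // Default mode: words are runs of letters, digits and hyphens.
                while (i < line.length() && isWordChar(line.charAt(i))) {
                    i++;
                }
                String text = line.substring(start, i);
                tokens.add("WORD:" + text);
                if (text.equalsIgnoreCase("PIC") || text.equalsIgnoreCase("PICTURE")) {
                    mode = Mode.PICTURE;
                }
            } else {
                // Any other character is a one-character token and ends picture mode.
                mode = Mode.DEFAULT;
                tokens.add("PUNCT:" + c);
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("01 DISP-NBR-4 PIC -99,999,. "));
        System.out.println(tokenize("05 TEST3 REDEFINES TEST2 PIC X(10)."));
    }
}
```

In this sketch the picture string reaches the parser as a single PICTURE token, and REDEFINES arrives as an ordinary word that the parser can handle in its own production.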

I have no idea why you want to incorporate the REDEFINES phrase into the word. Just parse it normally in the parser.

Licensed under: CC-BY-SA with attribution