Question

I am trying to write an Xtext grammar for configuration files (the kind usually given the .ini extension).

For instance, I'd like to successfully parse

[Section1]
a = Easy123
b = This *is* valid too

[Section_2]
c = Voilà # inline comments are ignored

My problem is matching the property value (what's on the right of the '=').

My current grammar works only if the property value matches the ID terminal (e.g. a = Easy123).

PropertyFile hidden(SL_COMMENT, WS):
    sections+=Section*;

Section:
    '[' name=ID ']'
    (NEWLINE properties+=Property)+
    NEWLINE+;

Property:
    name=ID (':' | '=') value=ID ';'?;

terminal WS:
    (' ' | '\t')+;

terminal NEWLINE:
// New line on DOS or Unix 
    '\r'? '\n';

terminal ID:
    ('A'..'Z' | 'a'..'z') ('A'..'Z' | 'a'..'z' | '_' | '-' | '0'..'9')*;

terminal SL_COMMENT:
// Single line comment
    '#' !('\n' | '\r')*;

I don't know how to generalize the grammar to match any text (e.g. c = Voilà).

I presumably need to introduce a new terminal and use it in the rule: Property: name=ID (':' | '=') value=TEXT ';'?;

The question is: how should I define this TEXT terminal?

I have tried

  • terminal TEXT: ANY_OTHER+; This raises a warning

    The following token definitions can never be matched because prior tokens match the same input: RULE_INT,RULE_STRING,RULE_ML_COMMENT,RULE_ANY_OTHER

    (I think it doesn't matter).

    Parsing Fails with

    Required loop (...)+ did not match anything at input 'à'

  • terminal TEXT: !('\r'|'\n'|'#')+; This raises a warning

    The following token definitions can never be matched because prior tokens match the same input: RULE_INT

    (I think it doesn't matter).

    Parsing Fails with

    Missing EOF at [Section1]

  • terminal TEXT: ('!'|'$'..'~'); (which covers most printable ASCII characters, except # and "). This raises no warning when the lexer/parser is generated. However, parsing fails with

    Mismatch input 'Easy123' expecting RULE_TEXT

    Extraneous input 'This' expecting RULE_TEXT

    Required loop (...)+ did not match anything at 'is'

Thanks for your help (and I hope this grammar can be useful for others too)

Solution

This grammar does the trick:

grammar org.xtext.example.mydsl.MyDsl hidden(SL_COMMENT, WS)

generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
import "http://www.eclipse.org/emf/2002/Ecore"

PropertyFile:
    sections+=Section*;

Section:
    '[' name=ID ']' 
    (NEWLINE+ properties+=Property)+
    NEWLINE+;

Property:
    name=ID value=PROPERTY_VALUE;

terminal PROPERTY_VALUE: (':' | '=') !('\n' | '\r')*;

terminal WS:
    (' ' | '\t')+;

terminal NEWLINE:
// New line on DOS or Unix 
    '\r'? '\n';

terminal ID:
    ('A'..'Z' | 'a'..'z') ('A'..'Z' | 'a'..'z' | '_' | '-' | '0'..'9')*;

terminal SL_COMMENT:
// Single line comment
    '#' !('\n' | '\r')*;

The key is not to try to cover the complete semantics in the grammar alone, but to take other services into account as well. The terminal rule PROPERTY_VALUE consumes the rest of the line: the leading assignment character, the value, any optional trailing semicolon and, as the test case further down shows, any inline comment.

Now register a value converter service for the language and strip the insignificant parts of the input there:

import org.eclipse.xtext.conversion.IValueConverter;
import org.eclipse.xtext.conversion.ValueConverter;
import org.eclipse.xtext.conversion.ValueConverterException;
import org.eclipse.xtext.conversion.impl.AbstractDeclarativeValueConverterService;
import org.eclipse.xtext.conversion.impl.AbstractIDValueConverter;
import org.eclipse.xtext.conversion.impl.AbstractLexerBasedConverter;
import org.eclipse.xtext.nodemodel.INode;
import org.eclipse.xtext.util.Strings;

import com.google.inject.Inject;

public class PropertyConverters extends AbstractDeclarativeValueConverterService {
    @Inject
    private AbstractIDValueConverter idValueConverter;

    @ValueConverter(rule = "ID")
    public IValueConverter<String> ID() {
        return idValueConverter;
    }

    @Inject
    private PropertyValueConverter propertyValueConverter;

    @ValueConverter(rule = "PROPERTY_VALUE")
    public IValueConverter<String> PropertyValue() {
        return propertyValueConverter;
    }

    public static class PropertyValueConverter extends AbstractLexerBasedConverter<String> {

        @Override
        protected String toEscapedString(String value) {
            return " = " + Strings.convertToJavaString(value, false);
        }

        public String toValue(String string, INode node) {
            if (string == null)
                return null;
            try {
                String value = string.substring(1).trim();
                if (value.endsWith(";")) {
                    value = value.substring(0, value.length() - 1);
                }
                return value;
            } catch (IllegalArgumentException e) {
                throw new ValueConverterException(e.getMessage(), node, e);
            }
        }
    }
}
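
The string handling inside toValue can be tried out without an Xtext runtime. The following is a minimal standalone sketch, assuming the raw token text looks like "= Easy123" or ": This *is* valid too;". The class name PropertyValueStripDemo and the method stripAssignment are hypothetical and not part of Xtext.

```java
// Standalone sketch of the trimming done in PropertyValueConverter.toValue.
// PropertyValueStripDemo and stripAssignment are illustrative names only.
class PropertyValueStripDemo {

    // raw is the full PROPERTY_VALUE token text, e.g. "= Easy123".
    static String stripAssignment(String raw) {
        if (raw == null) {
            return null;
        }
        // Drop the leading ':' or '=' and any surrounding whitespace.
        String value = raw.substring(1).trim();
        // Drop one optional trailing semicolon.
        if (value.endsWith(";")) {
            value = value.substring(0, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(stripAssignment("= Easy123"));
        System.out.println(stripAssignment(": This *is* valid too;"));
    }
}
```

Because the terminal always starts with ':' or '=', substring(1) is safe for any text the lexer actually produces.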

The following test case will succeed once you have registered the service in the runtime module like this:

@Override
public Class<? extends IValueConverterService> bindIValueConverterService() {
    return PropertyConverters.class;
}

Test case:

import org.junit.runner.RunWith
import org.eclipse.xtext.junit4.XtextRunner
import org.xtext.example.mydsl.MyDslInjectorProvider
import org.eclipse.xtext.junit4.InjectWith
import org.junit.Test
import org.eclipse.xtext.junit4.util.ParseHelper
import com.google.inject.Inject
import org.xtext.example.mydsl.myDsl.PropertyFile
import static org.junit.Assert.*

@RunWith(typeof(XtextRunner))
@InjectWith(typeof(MyDslInjectorProvider))
class ParserTest {

    @Inject
    ParseHelper<PropertyFile> helper

    @Test
    def void testSample() {
        val file = helper.parse('''
            [Section1]
            a = Easy123
            b : This *is* valid too;

            [Section_2]
            # comment
            c = Voilà # inline comments are ignored
        ''')
        assertEquals(2, file.sections.size)
        val section1 = file.sections.head
        assertEquals(2, section1.properties.size)
        assertEquals("a", section1.properties.head.name)
        assertEquals("Easy123", section1.properties.head.value)
        assertEquals("b", section1.properties.last.name)
        assertEquals("This *is* valid too", section1.properties.last.value)

        val section2 = file.sections.last
        assertEquals(1, section2.properties.size)
        assertEquals("Voilà # inline comments are ignored", section2.properties.head.value)
    }

}
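
As the last assertion shows, the PROPERTY_VALUE terminal swallows the rest of the line, so an inline comment ends up in the value. If that is not wanted, the converter could cut the value at the first '#'. A minimal sketch in plain Java, assuming '#' never occurs inside a legitimate value; stripInlineComment is a hypothetical helper, not an Xtext API.

```java
// Sketch only: removes a trailing "# ..." comment from an already extracted value.
// Assumes '#' is never part of a real value.
class InlineCommentDemo {

    static String stripInlineComment(String value) {
        int hash = value.indexOf('#');
        // Keep everything before the first '#', or the whole value if there is none.
        String withoutComment = hash >= 0 ? value.substring(0, hash) : value;
        return withoutComment.trim();
    }

    public static void main(String[] args) {
        System.out.println(stripInlineComment("Easy123 # inline comments are ignored"));
        System.out.println(stripInlineComment("This *is* valid too"));
    }
}
```

Such a helper could be called at the end of toValue, in which case the expected value in the final assertion of the test would have to be changed to match.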

OTHER TIPS

The problem (or one problem anyway) with parsing a format like that is that, since the text part may contain = characters, a line like foo = bar will be interpreted as a single TEXT token, not an ID, followed by a '=', followed by a TEXT. I can see no way to avoid that without disallowing (or requiring escaping of) = characters in the text part.
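
The effect can be reproduced outside Xtext. The sketch below mimics a longest-match lexer with java.util.regex. It is only an approximation of how the generated lexer picks tokens, and the ID and TEXT patterns are simplified assumptions based on the terminals in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: compares how much of a line each candidate terminal would match.
class LongestMatchDemo {
    static final Pattern ID = Pattern.compile("[A-Za-z][A-Za-z0-9_-]*");
    static final Pattern TEXT = Pattern.compile("[^\\r\\n#]+");

    // Length of the prefix of input matched by the pattern, or 0 if none.
    static int prefixLength(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.lookingAt() ? m.end() : 0;
    }

    // Name of the terminal that matches the longer prefix (ID wins ties).
    static String winner(String input) {
        return prefixLength(TEXT, input) > prefixLength(ID, input) ? "TEXT" : "ID";
    }

    public static void main(String[] args) {
        // ID only covers "foo", TEXT covers the whole line.
        System.out.println(winner("foo = bar"));
    }
}
```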

If that is not an option, I think, the only solution would be to make a token type LINE that matches an entire line and then take that apart yourself. You'd do that by removing TEXT and ID from your grammar and replacing them with a token type LINE that matches everything up to the next line break or comment sign and must start with a valid ID. So something like this:

terminal LINE :
    ('A'..'Z' | 'a'..'z') ('A'..'Z' | 'a'..'z' | '_' | '-' | '0'..'9')*
    WS* '=' WS*
    !('\r' | '\n' | '#')+
;

This token would basically replace your Property rule.

Of course this is a rather unsatisfactory solution, as it gives you the entire line as a string and you still have to pick it apart yourself to separate the ID from the text part. It also prevents you from highlighting the ID part or the = sign, because the entire line is one token and you can't highlight part of a token (as far as I know). Overall this does not buy you all that much over not using Xtext at all, but I don't see a better way.
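
Picking the line apart by hand might look like the following plain-Java sketch. It assumes the token text has the shape matched by the LINE terminal above (identifier, optional blanks, '=', value). LineSplitDemo and split are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: splits the text of one LINE token into name and value.
class LineSplitDemo {
    // Group 1: the identifier, group 2: everything after the first '='.
    static final Pattern LINE =
        Pattern.compile("([A-Za-z][A-Za-z0-9_-]*)[ \\t]*=[ \\t]*(.*)");

    // Returns {name, value}, or null if the text is not a property line.
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] parts = split("b = This *is* valid too");
        System.out.println(parts[0] + " -> " + parts[1]);
    }
}
```

Only the first '=' acts as the separator, so later '=' characters stay part of the value.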

As a workaround, I have changed the rule to

Property:
    name=ID ':' value=ID ';'?;

Now, of course, = is no longer in conflict, but this is certainly not a good solution, because properties are usually defined as name=value.

Edit: Actually, my input is a specific property file, and the properties are known in advance.

My code now looks like

Section:
    '[' name=ID ']'
    (NEWLINE (properties+=AbstractProperty)?)+;

AbstractProperty:
    ADef | BDef;

ADef:
    'A' (':'|'=') ID;

BDef:
    'B' (':'|'=') Float;

There is an extra benefit: the property names are known as keywords and colored as such. However, autocompletion only suggests '[' :(

Licensed under: CC-BY-SA with attribution