Lexer issues, in particular: Java backends do not accept/require whitespace between consecutive tokens #322

@andreasabel

The following grammar should parse ⟦ ab c.

Whatever. Main ::= Uni Foo Bar;

token Uni '⟦' ;
token Foo letter letter;
token Bar (char - 'a');

This is the situation in the different backends:

  • Haskell: yes
  • OCaml: ocamllex rejects the generated lexer definition with the error
    File "Lextest.mll", line 42, character 11: illegal escape sequence \1.
    
  • C: parsing fails with error: 1,1: syntax error at ?
  • CPP: parsing fails with Parse error on line 1
  • Java: parsing fails with
    Syntax Error, trying to recover and continue parse... for input symbol "" spanning from unknown:-1/-1(-1) to unknown:-1/-1(-1)
    At line -1, near "ab c" :
       Unrecoverable Syntax Error
    
  • Java/ANTLR: parsing fails with
    line 1:1 extraneous input ' ' expecting Foo
    At line 1, column 1 :
       extraneous input ' ' expecting Foo
    

Instead, the parsers generated by the Java backends accept the input without the spaces: ⟦abc.
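
One reading consistent with the Java symptoms, not confirmed in this report, is a rule-priority problem: `Bar` is defined as `(char - 'a')`, so it also matches a single space, and a lexer that tries the user-defined tokens before its whitespace rule would emit each space as a `Bar` token. The sketch below is a hypothetical Python model of a longest-match lexer with first-rule tie-breaking, not the code of any BNFC backend. The regular expressions, the names `lex`, `ws_first`, `ws_last`, and `parses_as_main`, and the ASCII-only reading of `letter` are all assumptions made for illustration.

```python
import re

# Hypothetical transcription of the three token definitions from the grammar.
# `letter` is approximated as ASCII letters; `char - 'a'` as "any character but a".
WS = ("WS", re.compile(r"\s+"))
USER_TOKENS = [
    ("Uni", re.compile("⟦")),
    ("Foo", re.compile(r"[A-Za-z]{2}")),
    ("Bar", re.compile(r"[^a]")),
]


def lex(text, rules):
    """Longest-match lexer; on equal length the rule listed first wins.

    Matches of the WS rule are skipped instead of being emitted as tokens.
    """
    pos, out = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in rules:
            m = rx.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no token matches at offset {pos}")
        if best_name != "WS":
            out.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return out


def parses_as_main(tokens):
    """Main ::= Uni Foo Bar, checked directly on the token names."""
    return [name for name, _ in tokens] == ["Uni", "Foo", "Bar"]


ws_first = [WS] + USER_TOKENS   # whitespace rule has priority over user tokens
ws_last = USER_TOKENS + [WS]    # user tokens have priority over whitespace

print(lex("⟦ ab c", ws_first))
print(lex("⟦ ab c", ws_last))
print(parses_as_main(lex("⟦abc", ws_last)))
```

In this model, `ws_first` yields `Uni Foo Bar` for ⟦ ab c, while `ws_last` turns each space into a `Bar` token, so only ⟦abc matches `Main`. That mirrors the ANTLR message `extraneous input ' ' expecting Foo`, but whether the generated Java lexers actually order their rules this way would need to be checked against their output.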
