1 Parser
2 Greedy Lexing
3 Precedence
4 Additional Syntax Constraints
5 Hints and Explanation
6 One very simple example
7 Output
8 Primitives
9 Surface Interface / Handin
10 Errors and Corrections

Assignment 2, CSC431, Winter 2010

For the next assignment, you’ll need to build a parser for your language, and to connect it to your interpreter.

Here’s the concrete syntax for the Footle language:

1 Parser

A program consists of a sequence of ’stmt’s:

 

stmt

 ::= 

expr ;

 

  |  

id = expr ;

 

  |  

expr . id = expr ;

 

  |  

var id = expr ;

 

  |  

return expr ;

 

  |  

if ( expr ) { stmt›* }

 

  |  

if ( expr ) { stmt›* } else { stmt›* }

 

  |  

while ( expr ) { stmt›* }

 

  |  

function id ( paramlist ) { stmt›* }

These should map in an obvious way onto the language forms whose meanings are described as part of assignment 1.

Next, ’expr’ is defined like this:

 

expr

 ::= 

int

 

  |  

float

 

  |  

true

 

  |  

false

 

  |  

id

 

  |  

string

 

  |  

( expr )

 

  |  

id ( arglist )

 

  |  

( expr ) ( arglist )

 

  |  

expr binop expr

 

  |  

! expr

 

  |  

expr . id

 

  |  

expr . id ( arglist )

 

  |  

new id ( arglist )

Again, these should map in an obvious way onto the language forms whose meanings are described as part of assignment 1. Note that the "function" position of an application must either be a name or be enclosed by parentheses.

Here are the ’binop’s:

 

binop

 ::= 

+  |  -  |  *  |  /  |  <  |  <=  |  >  |  >=  |  ==  |  &&  |  ||

Identifiers are defined as a single alphabetic character followed by zero or more alphabetic, numeric, underscore (_), and question mark (?) characters.

Strings start and end with a double-quote ("). Within a string, the sequence backslash-double-quote (\") should be used to represent a double-quote character in the string, and the sequence backslash-n should be used to represent a newline character.

Integers consist of one or more digits.

Floating point numbers consist of one or more digits, containing exactly one decimal point (.).

Paramlists are comma-separated lists of identifiers (possibly empty), and arglists are comma-separated lists of expressions (again, possibly empty).

All tokens may be separated by whitespace.

2 Greedy Lexing

As written, there is a great deal of ambiguity in this specification. For instance, given the sequence of characters "abcdef", should this be parsed as a single identifier with the name "abcdef" or as two identifiers with the names "abc" and "def"? I hope you prefer the former. In general, the scanner should be "greedy", in the sense that it should gather as many characters as possible into the next token. Fortunately, you will find that most regular expression systems are greedy by default. This can have somewhat surprising results; for instance, the string "abc342.241" will be scanned as the identifier "abc342" followed by the floating point number ".241"... but as the parser will reject this, I don’t see this as a problem.

3 Precedence

Certain boolean and unary operators have higher precedence than others. In particular, they should be grouped like this:

Within these levels, the operators should be left-associative.

Here’s an example. The expression "3 + 4 / ! c . abc * 6 == 5 + 6 + 7 && true || false" should be parsed into the same AST as the expression "((3 + 4) / ((! (c . abc)) * 6)) == ((((5 + 6) + 7) && true) || false)".

4 Additional Syntax Constraints

It is an error to use the name of a primitive as a variable name. Here are the primitive names: "stringLength", "subString", "stringEqual?", "stringAppend" "stringLessThan?", "instanceof", "int?", "bool?", "float?", "void?", "string?", "closure?", "plain?", "print", and "readLine". All of these are applied using the standard application syntax. For instance, "string?("abc")" or "instanceof(new F(),F)".

It is an error to declare a variable or parameter named "this", or to mutate the "this" binding (brrr... too scary for me to think about).

It is an error to use the same name twice in a single function’s parameter list.

It is an error to mutate a primitive.

5 Hints and Explanation

You are not required to structure your grammar precisely following this one. In particular, you need to gather statements together into the body of a variable declaration, and you need to gather adjacent function declarations together into a single mutually recursive block. One way to do this is to restructure the grammar given here to make the rules follow more closely the structure of the language.

6 One very simple example

Here’s one simple program:

function odd(x) {

  if (x == 0) {

    return false;

  } else {

    return even(x - 1);

  }

}

function even(x) {

  if (x == 0) {

    return true;

  } else {

    return odd(x - 1);

  }

}

 

print(even(14));

7 Output

Your parser must generate outputs in XML syntax, using the Relax NG specification that appeared on the newsgroup. Here it is, again:

start = prog

 

prog = element Program{ast*}

 

ast = (litstr | litint | litfloat | litbool

| varref | if | application | sequence

| varbind | funbind | return | setvar

| while | fieldref | fieldset | fieldcall

| newexp )

 

litstr = element LitStr {text}

litint = element LitInt {xsd:integer}

litfloat = element LitFloat {xsd:double}

litbool = element LitBool { ("true" | "false") }

varref = element Varref {text}

if = element If { ast, ast, ast}

application = element Application {ast, ast*}

sequence = element Sequence {ast*}

varbind = element VarBind { element VarName{text}, ast, ast}

funbind = element FunBind { funbinding*, ast}

return = element Return {ast}

setvar = element SetVar {element VarSetName{text}, ast}

while = element While {ast, ast}

fieldref = element FieldRef{ast, element FieldRefName {text}}

fieldset = element FieldSet{ast, element FieldSetName {text}, ast}

fieldcall = element FieldCall{ast, element FieldCalledName {text}, ast* }

newexp = element NewExp{ast, ast*}

 

funbinding = element FunBinding { element Name { text }, element Param {text} *, ast}

Once again, I’m hoping that the correspondence between this XML language and the language forms specified in assignment 1 is self-evident. Let me know if you see ambiguities, or if fixes are required.

8 Primitives

For this assignment, we’re going to nail down the output of the "print" primitive a bit better. Specifically:

In addition, you should add the stringAppend primitive, that consumes two strings and produces the string containing the characters of the first followed by the characters of the second.

9 Surface Interface / Handin

Here’s the command-line specification for your parser:

  parse

Your parser’s executable should be called "parse", and it should accept no arguments. It should read a footle program on stdin, and write XML on stdout.

Here’s the command-line specification of the interpreter:

  interpret

Again, it should accept no arguments, read XML from stdin, and evaluate the program. It should produce no output aside from that produced by calls to print.

In order to hand in your code, create a branch called Assignment2.

10 Errors and Corrections

There are doubtless errors in this specification. Please let me know about them as soon as you find them.