1 Parser
2 Greedy Lexing
3 Precedence
4 Additional Syntax Constraints
5 Hints and Explanation
6 One very simple example
7 Output
8 Primitives
9 Surface Interface / Handin
10 Errors and Corrections

Assignment 2, CSC431, Spring 2010

For the next assignment, you’ll need to build a parser for your language.

Here’s the concrete syntax for the Footle language:

1 Parser

A program consists of a sequence of ’stmt’s:

 

stmt

 ::= 

expr ;

 

  |  

id = expr ;

 

  |  

expr . id = expr ;

 

  |  

var id = expr ;

 

  |  

return expr ;

 

  |  

if ( expr ) { stmt›* }

 

  |  

if ( expr ) { stmt›* } else { stmt›* }

 

  |  

while ( expr ) { stmt›* }

 

  |  

function id ( paramlist ) { stmt›* }

These should map in an obvious way onto the language forms whose meanings are described as part of assignment 1.

Next, ’expr’ is defined like this:

 

expr

 ::= 

int

 

  |  

float

 

  |  

true

 

  |  

false

 

  |  

id

 

  |  

string

 

  |  

( expr )

 

  |  

id ( arglist )

 

  |  

( expr ) ( arglist )

 

  |  

expr binop expr

 

  |  

! expr

 

  |  

expr . id

 

  |  

expr . id ( arglist )

 

  |  

new id ( arglist )

Again, these should map in an obvious way onto the language forms whose meanings are described as part of assignment 1. Note that the "function" position of an application must either be a name or be enclosed by parentheses.

Here are the ’binop’s:

 

binop

 ::= 

+  |  -  |  *  |  /  |  <  |  <=  |  >  |  >=  |  ==  |  &&  |  ||

Identifiers are defined as a single alphabetic character followed by zero or more alphabetic, numeric, underscore (_), and question mark (?) characters. No keyword or primitive name may be used as an id.

Strings start and end with a double-quote ("). Within a string, the sequence backslash-double-quote (\") should be used to represent a double-quote character in the string, and the sequence backslash-n should be used to represent a newline character.

Integers consist of one or more digits.

Floating point numbers consist of at least one digit followed by a decimal point, followed by zero or more digits.

Note that you’re not required to handle unary minuses in either integers or floating point numbers.

Paramlists are comma-separated lists of identifiers (possibly empty), and arglists are comma-separated lists of expressions (again, possibly empty).

All tokens may be separated by whitespace.

2 Greedy Lexing

As written, there is a great deal of ambiguity in this specification. For instance, given the sequence of characters "abcdef", should this be parsed as a single identifier with the name "abcdef" or as two identifiers with the names "abc" and "def"? I hope you prefer the former. In general, the scanner should be "greedy", in the sense that it should gather as many characters as possible into the next token. Fortunately, you will find that most regular expression systems are greedy by default. This can have somewhat surprising results; for instance, the string "abc342.241" will be scanned as the identifier "abc342" followed by the floating point number ".241"... but as the parser will reject this, I don’t see this as a problem.

3 Precedence

Certain boolean and unary operators have higher precedence than others. In particular, they should be grouped like this:

Within these levels, the operators should be left-associative.

Here’s an example. The expression "3 + 4 / ! c . abc * 6 == 5 + 6 + 7 && true || false" should be parsed into the same AST as the expression "(3 + ((4 / (! (c . abc))) * 6)) == ((((5 + 6) + 7) && true) || false)".

4 Additional Syntax Constraints

It is an error to use the name of a primitive as a variable name. Here are the primitive names: "stringLength", "subString", "stringEqual?", "stringAppend" "stringLessThan?", "instanceof", "int?", "bool?", "float?", "void?", "string?", "closure?", "plain?", "print", and "readLine". All of these are applied using the standard application syntax. For instance, "string?("abc")" or "instanceof(new F(),F)".

It is an error to declare a variable or parameter named "this", or to mutate the "this" binding (brrr... too scary for me to think about).

It is an error to use the same name twice in a single function’s parameter list.

It is an error to mutate a primitive.

5 Hints and Explanation

You are not required to structure your grammar precisely following this one. In particular, you need to gather statements together into the body of a variable declaration, and you need to gather adjacent function declarations together into a single mutually recursive block. One way to do this is to restructure the grammar given here to make the rules follow more closely the structure of the language.

6 One very simple example

Here’s one simple program:

function odd(x) {

  if (x == 0) {

    return false;

  } else {

    return even(x - 1);

  }

}

function even(x) {

  if (x == 0) {

    return true;

  } else {

    return odd(x - 1);

  }

}

 

print(even(14));

7 Output

Your parser must produce ’exp’s, using the same definition that was a part of assignment 1.

8 Primitives

For this assignment, we’re going to nail down the output of the "print" primitive a bit better. Specifically:

9 Surface Interface / Handin

Your makefile should produce a parser.dll file defining a function called Parser.parse, that maps strings to exps.

In order to hand in your code, create a branch called Assignment2.

10 Errors and Corrections

There are doubtless errors in this specification. Please let me know about them as soon as you find them.