Assignment 2, CSC431, Spring 2010
For the next assignment, you’ll need to build a parser for your language.
Here’s the concrete syntax for the Footle language:
1 Parser
A program consists of a sequence of ’stmt’s:
| ‹stmt› | ::= | ‹expr› ; |
|
| | | ‹id› = ‹expr› ; |
|
| | | ‹expr› . ‹id› = ‹expr› ; |
|
| | | var ‹id› = ‹expr› ; |
|
| | | return ‹expr› ; |
|
| | | if ( ‹expr› ) { ‹stmt›* } |
|
| | | if ( ‹expr› ) { ‹stmt›* } else { ‹stmt›* } |
|
| | | while ( ‹expr› ) { ‹stmt›* } |
|
| | | function ‹id› ( ‹paramlist› ) { ‹stmt›* } |
These should map in an obvious way onto the language forms whose meanings are described as part of assignment 1.
Next, ’expr’ is defined like this:
| ‹expr› | ::= | ‹int› |
|
| | | ‹float› |
|
| | | true |
|
| | | false |
|
| | | ‹id› |
|
| | | ‹string› |
|
| | | ( ‹expr› ) |
|
| | | ‹id› ( ‹arglist› ) |
|
| | | ( ‹expr› ) ( ‹arglist› ) |
|
| | | ‹expr› ‹binop› ‹expr› |
|
| | | ! ‹expr› |
|
| | | ‹expr› . ‹id› |
|
| | | ‹expr› . ‹id› ( ‹arglist› ) |
|
| | | new ‹id› ( ‹arglist› ) |
Again, these should map in an obvious way onto the language forms whose meanings are described as part of assignment 1. Note that the "function" position of an application must either be a name or be enclosed by parentheses.
Here are the ’binop’s:
| ‹binop› | ::= | + | - | * | / | < | <= | > | >= | == | && | || |
Identifiers are defined as a single alphabetic character followed by zero or more alphabetic, numeric, underscore (_), and question mark (?) characters. No keyword or primitive name may be used as an id.
Strings start and end with a double-quote ("). Within a string, the sequence backslash-double-quote (\") should be used to represent a double-quote character in the string, and the sequence backslash-n should be used to represent a newline character.
Integers consist of one or more digits.
Floating point numbers consist of at least one digit followed by a decimal point, followed by zero or more digits.
Note that you’re not required to handle unary minuses in either integers or floating point numbers.
Paramlists are comma-separated lists of identifiers (possibly empty), and arglists are comma-separated lists of expressions (again, possibly empty).
All tokens may be separated by whitespace.
2 Greedy Lexing
As written, there is a great deal of ambiguity in this specification. For instance, given the sequence of characters "abcdef", should this be parsed as a single identifier with the name "abcdef" or as two identifiers with the names "abc" and "def"? I hope you prefer the former. In general, the scanner should be "greedy", in the sense that it should gather as many characters as possible into the next token. Fortunately, you will find that most regular expression systems are greedy by default. This can have somewhat surprising results; for instance, the string "abc342.241" will be scanned as the identifier "abc342" followed by the floating point number ".241"... but as the parser will reject this, I don’t see this as a problem.
3 Precedence
Certain boolean and unary operators have higher precedence than others. In particular, they should be grouped like this:
Level 1 (highest precedence) : "."
Level 2 : "!"
Level 3 : "*", "/"
Level 4 : "+", "-"
Level 5 : ">", ">=", "<", "<="
Level 6 : "&&", "||"
Level 7 (lowest precedence) : "=="
Within these levels, the operators should be left-associative.
Here’s an example. The expression "3 + 4 / ! c . abc * 6 == 5 + 6 + 7 && true || false" should be parsed into the same AST as the expression "(3 + ((4 / (! (c . abc))) * 6)) == ((((5 + 6) + 7) && true) || false)".
4 Additional Syntax Constraints
It is an error to use the name of a primitive as a variable name. Here are the primitive names: "stringLength", "subString", "stringEqual?", "stringAppend" "stringLessThan?", "instanceof", "int?", "bool?", "float?", "void?", "string?", "closure?", "plain?", "print", and "readLine". All of these are applied using the standard application syntax. For instance, "string?("abc")" or "instanceof(new F(),F)".
It is an error to declare a variable or parameter named "this", or to mutate the "this" binding (brrr... too scary for me to think about).
It is an error to use the same name twice in a single function’s parameter list.
It is an error to mutate a primitive.
5 Hints and Explanation
You are not required to structure your grammar precisely following this one. In particular, you need to gather statements together into the body of a variable declaration, and you need to gather adjacent function declarations together into a single mutually recursive block. One way to do this is to restructure the grammar given here to make the rules follow more closely the structure of the language.
6 One very simple example
Here’s one simple program:
function odd(x) { |
if (x == 0) { |
return false; |
} else { |
return even(x - 1); |
} |
} |
function even(x) { |
if (x == 0) { |
return true; |
} else { |
return odd(x - 1); |
} |
} |
|
print(even(14)); |
7 Output
Your parser must produce ’exp’s, using the same definition that was a part of assignment 1.
8 Primitives
For this assignment, we’re going to nail down the output of the "print" primitive a bit better. Specifically:
Void: print the string <void>
Strings: print the characters of the string, nothing else
Integers: print the numeric characters of the integer
Floats: do something reasonable :)
Booleans: print the string <true> or the string <false>.
Closures: Print the string <closure>.
Arrays: Print the string <Array>.
Plain Objects: Print the string <plain-object>
9 Surface Interface / Handin
Your makefile should produce a parser.dll file defining a function called Parser.parse, that maps strings to exps.
In order to hand in your code, create a branch called Assignment2.
10 Errors and Corrections
There are doubtless errors in this specification. Please let me know about them as soon as you find them.