Assignment 2, CSC431, Winter 2010
For the next assignment, you’ll need to build a parser for your language, and to connect it to your interpreter.
Here’s the concrete syntax for the Footle language:
1 Parser
A program consists of a sequence of ’stmt’s:
| ‹stmt› | ::= | ‹expr› ; |
|
| | | ‹id› = ‹expr› ; |
|
| | | ‹expr› . ‹id› = ‹expr› ; |
|
| | | var ‹id› = ‹expr› ; |
|
| | | return ‹expr› ; |
|
| | | if ( ‹expr› ) { ‹stmt›* } |
|
| | | if ( ‹expr› ) { ‹stmt›* } else { ‹stmt›* } |
|
| | | while ( ‹expr› ) { ‹stmt›* } |
|
| | | function ‹id› ( ‹paramlist› ) { ‹stmt›* } |
These should map in an obvious way onto the language forms whose meanings are described as part of assignment 1.
Next, ’expr’ is defined like this:
| ‹expr› | ::= | ‹int› |
|
| | | ‹float› |
|
| | | true |
|
| | | false |
|
| | | ‹id› |
|
| | | ‹string› |
|
| | | ( ‹expr› ) |
|
| | | ‹id› ( ‹arglist› ) |
|
| | | ( ‹expr› ) ( ‹arglist› ) |
|
| | | ‹expr› ‹binop› ‹expr› |
|
| | | ! ‹expr› |
|
| | | ‹expr› . ‹id› |
|
| | | ‹expr› . ‹id› ( ‹arglist› ) |
|
| | | new ‹id› ( ‹arglist› ) |
Again, these should map in an obvious way onto the language forms whose meanings are described as part of assignment 1. Note that the "function" position of an application must either be a name or be enclosed by parentheses.
Here are the ’binop’s:
| ‹binop› | ::= | + | - | * | / | < | <= | > | >= | == | && | || |
Identifiers are defined as a single alphabetic character followed by zero or more alphabetic, numeric, underscore (_), and question mark (?) characters.
Strings start and end with a double-quote ("). Within a string, the sequence backslash-double-quote (\") should be used to represent a double-quote character in the string, and the sequence backslash-n should be used to represent a newline character.
Integers consist of one or more digits.
Floating point numbers consist of one or more digits, containing exactly one decimal point (.).
Paramlists are comma-separated lists of identifiers (possibly empty), and arglists are comma-separated lists of expressions (again, possibly empty).
All tokens may be separated by whitespace.
2 Greedy Lexing
As written, there is a great deal of ambiguity in this specification. For instance, given the sequence of characters "abcdef", should this be parsed as a single identifier with the name "abcdef" or as two identifiers with the names "abc" and "def"? I hope you prefer the former. In general, the scanner should be "greedy", in the sense that it should gather as many characters as possible into the next token. Fortunately, you will find that most regular expression systems are greedy by default. This can have somewhat surprising results; for instance, the string "abc342.241" will be scanned as the identifier "abc342" followed by the floating point number ".241"... but as the parser will reject this, I don’t see this as a problem.
3 Precedence
Certain boolean and unary operators have higher precedence than others. In particular, they should be grouped like this:
Level 1 (highest precedence) : "."
Level 2 : "!"
Level 3 : "*"
Level 4 : "+", "-"
Level 5 : "/"
Level 6 : ">", ">=", "<", "<="
Level 7 : "&&", "||"
Level 8 (lowest precedence) : "=="
Within these levels, the operators should be left-associative.
Here’s an example. The expression "3 + 4 / ! c . abc * 6 == 5 + 6 + 7 && true || false" should be parsed into the same AST as the expression "((3 + 4) / ((! (c . abc)) * 6)) == ((((5 + 6) + 7) && true) || false)".
4 Additional Syntax Constraints
It is an error to use the name of a primitive as a variable name. Here are the primitive names: "stringLength", "subString", "stringEqual?", "stringAppend" "stringLessThan?", "instanceof", "int?", "bool?", "float?", "void?", "string?", "closure?", "plain?", "print", and "readLine". All of these are applied using the standard application syntax. For instance, "string?("abc")" or "instanceof(new F(),F)".
It is an error to declare a variable or parameter named "this", or to mutate the "this" binding (brrr... too scary for me to think about).
It is an error to use the same name twice in a single function’s parameter list.
It is an error to mutate a primitive.
5 Hints and Explanation
You are not required to structure your grammar precisely following this one. In particular, you need to gather statements together into the body of a variable declaration, and you need to gather adjacent function declarations together into a single mutually recursive block. One way to do this is to restructure the grammar given here to make the rules follow more closely the structure of the language.
6 One very simple example
Here’s one simple program:
function odd(x) { |
if (x == 0) { |
return false; |
} else { |
return even(x - 1); |
} |
} |
function even(x) { |
if (x == 0) { |
return true; |
} else { |
return odd(x - 1); |
} |
} |
|
print(even(14)); |
7 Output
Your parser must generate outputs in XML syntax, using the Relax NG specification that appeared on the newsgroup. Here it is, again:
start = prog |
|
prog = element Program{ast*} |
|
ast = (litstr | litint | litfloat | litbool |
| varref | if | application | sequence |
| varbind | funbind | return | setvar |
| while | fieldref | fieldset | fieldcall |
| newexp ) |
|
litstr = element LitStr {text} |
litint = element LitInt {xsd:integer} |
litfloat = element LitFloat {xsd:double} |
litbool = element LitBool { ("true" | "false") } |
varref = element Varref {text} |
if = element If { ast, ast, ast} |
application = element Application {ast, ast*} |
sequence = element Sequence {ast*} |
varbind = element VarBind { element VarName{text}, ast, ast} |
funbind = element FunBind { funbinding*, ast} |
return = element Return {ast} |
setvar = element SetVar {element VarSetName{text}, ast} |
while = element While {ast, ast} |
fieldref = element FieldRef{ast, element FieldRefName {text}} |
fieldset = element FieldSet{ast, element FieldSetName {text}, ast} |
fieldcall = element FieldCall{ast, element FieldCalledName {text}, ast* } |
newexp = element NewExp{ast, ast*} |
|
funbinding = element FunBinding { element Name { text }, element Param {text} *, ast} |
Once again, I’m hoping that the correspondence between this XML language and the language forms specified in assignment 1 is self-evident. Let me know if you see ambiguities, or if fixes are required.
8 Primitives
For this assignment, we’re going to nail down the output of the "print" primitive a bit better. Specifically:
Void: print the string <void>
Strings: print the characters of the string, nothing else (ignore the slots)
Integers: print the numeric characters of the integer
Floats: do something reasonable :)
Booleans: print the string <true> or the string <false>.
Closures: Print the string <closure>.
Plain Objects: Print the string <plain-object>
Primitives: Print something like this: <prim:+>. Note that this can’t arise in parsed programs.
In addition, you should add the stringAppend primitive, that consumes two strings and produces the string containing the characters of the first followed by the characters of the second.
9 Surface Interface / Handin
Here’s the command-line specification for your parser:
parse
Your parser’s executable should be called "parse", and it should accept no arguments. It should read a footle program on stdin, and write XML on stdout.
Here’s the command-line specification of the interpreter:
interpret
Again, it should accept no arguments, read XML from stdin, and evaluate the program. It should produce no output aside from that produced by calls to print.
In order to hand in your code, create a branch called Assignment2.
10 Errors and Corrections
There are doubtless errors in this specification. Please let me know about them as soon as you find them.