Assignment 1, CSC431, Spring 2010
1 Assignment
You must implement an interpreter for the language that you’ll later be writing a compiler for. Let’s call this language Footle. I don’t know why I picked that name. Your solution will map Footle programs to answers, as described below.
2 NO PARSING
Your assignment does not need to include a parser.
More specifically, your program will map ASTs (defined below) to Footle Values (also defined below).
The ASTs consumed by your interpreter will be the same ones produced by your parser. This means that you’ll be able to check your parser (when you write it as part of a later assignment) by connecting it to this interpreter.
3 Frank Talk About Test Cases
Do you like writing test cases? How about writing test cases before writing the program? I’m betting the answer is no. So why bother?
In a nutshell, the reason is this: if you’re not sure how to write the test case, you’re really really not sure how to write the program.
Test cases don’t need to be complex. The most important test cases you write are the simple ones. Here’s a simple one:
should equal (IntV 3) (eval emptyEnv (IntExp 3))
That is, evaluating the expression 3 should produce the number 3.
You should have a simple test case for every simple clause in your interpreter, and then some complex test cases that test everything all together.
Write the simple test cases first. It’s the simplest test cases that are going to show you how to implement the corresponding language forms; the more complex test cases help to isolate more subtle issues. If you start with a giant hairy test case, you’re going to spend hours debugging simple problems.
Finally, if you do discover a problem with a complex test case, don’t just dive into the debugger; debuggers are the fool’s first resort, and you can spend hours just watching the values bounce around. Instead:
Simplify the test case until you find the smallest test case that reproduces the problem.
Formulate a hypothesis about how the problem occurs. (Use your brain! Think!)
Use a printf or a debugger to test your hypothesis.
Repeat until finished.
The "thinking" part is generally the hardest.
4 Values
The values are the possible results of interpreting an expression. They include the following:
booleans
integers
floating-point numbers
void
strings
closures: A closure contains a list of parameters, a body expression, and an environment. It is the result of evaluating a function definition.
arrays: arrays are not of fixed size. Elements may be added arbitrarily.
objects: objects’ fields are not fixed. They may be added arbitrarily.
5 Language Forms
In the Footle interpreter, there is no distinction between statements and expressions, or between programs and expressions. Everything is just an expression. This simplifies the interpreter.
The words ’interpret’ and ’evaluate’ are used interchangeably in this text.
The words ’field’ and ’slot’ are used interchangeably in this text.
literals: the set of literals includes strings, integers, floating point numbers, and booleans. Evaluating a literal produces the corresponding value.
variable references: your language must include variable references. Interpreting a variable reference produces the value currently associated with that reference’s binding. The languge is lexically scoped, meaning that the binding for any reference is the nearest lexically enclosing declaration with that name.
if: this expression contains three expressions: a ’test’ clause, a ’then’ clause, and an ’else’ clause. To interpret it, first interpret the ’test’ clause. If this result is not a boolean, signal an error. (That is, cause the evaluator to halt.) Otherwise, if it’s true, produce the interpretation of the ’then’ clause. If it’s false, produce the interpretation of the ’else’ clause.
primitive application: This form represents a call to a primitive function. It contains a primitive and a list of argument expressions. Interpret each of the argument expressions, then apply the given primitive to these values.
application: This form represents a call to a function. It contains a ’function’ expression indicating the function to be called, and zero or more ’argument’ expressions. To interpret it: first interpret the ’function’ expression. If it does not produce a closure, signal an error. If it produces a closure, call the closure by evaluating each of the arguments, and then interpreting the body of the closure using the environment contained in the closure, extended with bindings from the closure’s parameters to the interpreted arguments.
sequence: This form represents a sequence of expressions (or ’statements’, if you prefer). To interpret it, evaluate each expression in the sequence, and produce the value of the last one. An empty sequence produces the ’void’ value.
variable binding: this form contains a variable name, a value-expression for the new variable, and a body for which the variable is bound. To interpret it: evaluate the value-expression, and then produce the result of interpreting the body in the current environment extended with a binding from the given name to the value-expression’s value.
Note that this means that the variable binding is only in scope for the single expression in its body. "But wait," I hear you say, "how will that work with code like this?":
var x = 14;
f(x,4);
g(x+14);
The answer is that your parser (when you write it) will parse this as a variable binding for ’x’ with a body that contains a single sequence expression. This sequence expression will contain the two function applications. So the formulation I’m proposing is rich enough to express all of the programs that we want to write.
What about multiple variable declarations? No problem; these can be parsed as nested single bindings. Once again, I’m not asking you to write a parser yet; I’m just answering questions about what will happen when we do come to write the parser.
function bindings: A set of function bindings contains a set of function declarations and a body expression. A function declaration contains the function’s name, a list of parameter names, and a single expression.
Note that unlike a variable binding, the function binding form has to handle multiple functions at once. This is necessary, in order to allow mutual recursion. If we handled multiple function declarations as nested bindings, then the "forward" references from the earlier function to the later one would not be bound correctly.
In order to allow recursion, then, the environment that goes into each of the closures defined by this form must itself contain bindings for all of the new functions. Here’s how to do this:
First, extend the environment with new bindings for the new function names; initially, each name will be bound to some bogus value. Then, using this environment, build a closure for each of the functions. Finally, use these function values to replace the bogus values in the bindings. Note that this creates cycles in our environment: the environment contains bindings for closures that contain environments that contain bindings for that same function, etc. etc.
return: A return contains an expression.
Evaluating return is really (if you’ll pardon me saying it) quite horrible. You will be startled at how it messes up your interpreter. If you’re not startled, then you probably didn’t implement it correctly.
The problem with return is that it needs to discard context. That is, it needs to escape from whatever computation is occurring, be it an if or a while or an arithmetic expression or whatever, back out to the entry point of this function call. Since the interpreter recurs on subexpressions, this means that you’re going to have to escape from an arbitrary number of recursive calls in the interpreter when you encounter a return.
There are two main ways to implement this. The first one, that I recommend, is to use some kind of non-local escape mechanism that your implementing language has. For instance, exceptions work well for this.
The second one, that I don’t recommend, is to stop using the language’s stack and roll your own. In other words, rather than making recursive calls to the interpreter to evaluate subexpressions, you’ll explicitly push something onto your own stack and then just loop back to the beginning of your interpret procedure. Since the stack is explicitly represented, you can discard a whole bunch of frames (out to the function call boundary) when you encounter a ’return’. The down side of this design is that you have to manage your own stack explicitly, and this brings a whole new set of things that can go wrong.
Also, be careful of ’return’ within ’return’; it should be an error to use a ’return’ expression within a ’return’ expression.
Calling ’return’ outside of all function bodies should halt the evaluation of the program and return the corresponding value.
A program without a top-level return should return the value that is the result of interpreting it.
Finally, function bodies whose evaluation finishes without ever evaluating a ’return’ should simply produce the ’void’ value.
variable mutation: A variable mutation contains a variable name, and a new value expression. To interpret it, first find the binding corresponding to the variable name in the environment. If it’s missing, signal an error. Next, evaluate the new-value expression, and put the resulting value in place of the old binding’s value. This new value should also be the result of the interpretation.
while loop: A while loop contains a test expression and a body expression. To interpret it: first interpret the test expression. If the result is not a boolean, signal an error. Otherwise, if the result is false, the result is ’void’. If the result is ’true’, interpret the body expression, then evaluate the whole ’while’ expression again.
field lookup: A field lookup expression contains an ’object’ expression and a name. To interpret it, first evaluate the ’object’ expression. If this value is not an object, signal an error. Otherwise, look at the slots of the expression, for the named field. If the field is absent, signal an error. Otherwise, the result is the value associated with the named field.
field mutation: A field mutation expression contains an ’object’ expression, a name, and a new-value expression. To interpret it, first evaluate the ’object’ expression. If this value is not an object, signal an error. Otherwise, look at the slots of the expression, for the named field. If the field is absent, add a new slot with the given name . Finally, mutate the binding to contain the result of evaluating the new-value expression. The result is the result of the new-value expression.
method call: A method call expression contains an ’object’ expression, a name, and zero or more argument expressions. To interpret it, first evaluate the ’object’ expression. If this value is not an object, signal an error. Otherwise, look at the slots of the expression, for the named field. If the field is absent, signal an error. If the value in the field is not a closure, signal an error. Otherwise, call the closure with the results of interpreting the arguments much as described in ’application’, above, except that there is one additional binding: the identifier ’this’ is bound to the result of evaluating the ’object’ expression.
new: A new expression contains a ’function’ expression, and zero or more argument expressions. To interpret it, first interpret the ’function’ expression. If the result is not a closure, signal an error. Otherwise, create a new object with one slot, binding the name ’constructor’ to the closure. Then, call the closure as described in ’application’, above, except that there is one additional binding: the identifier ’this’ is bound to the newly created object.
That’s it! Well, unless we decide to add more things in later assignments.
5.1 A note on mutation of environments
Mutating environments: don’t do it. Your interpreter will wind up buggy. More specifically, make sure that when you extend an environment with new bindings, that references to the old environment are still unchanged, containing all the same bindings. The easiest way to guarantee this is never to mutate a variable that refers to the environment. (Note that mutating the bindings within an environment is not only okay, but in fact required in order to implement variable mutation. I’m just talking about mutating the environment itself.)
5.2 Where is mutation required?
Mutation is required in several places: the value associated with each binding must be mutable, the value associated with each slot must be mutable, and the set of slots associated with an object must be mutable (so you can add fields on the fly).
6 Primitive Functions
You’re required to implement a number of primitive functions. These primitive functions should appear in a base environment used by all calls to the interpreter.
+, -, *, /, <, >, <=, >= : binary numeric operations. Calling these with integer values should perform the corresponding "integer" operation, calling them with floating-point argument should perform the corresponding floating-point operation, and calling them with one of each should promote the integer to a floating-point number. Calling these with non-numeric arguments should signal an error.
sqrt : take the square root of a number. Always returns a floating-point number. When called with an integer, converts the integer to a floating-point number before taking the square root.
and,or,not : boolean operators. Calling these with non-boolean arguments should signal an error. They need not be "short-circuiting"; that is, your evaluator should always evaluate both arguments to ’and’ or ’or’, even when the first one determines the result.
stringLength : the string length operator. Calling it with a string produces an integer corresponding to the length of the string. Signals an error otherwise.
substring : consumes a string, a start position ’s’, and an end position ’e’. This operator returns the substring starting at position ’s’ and ending with position ’e-1’. If the first argument is not a string, or the other two are not integers, signals an error.
stringAppend : consumes two strings and returns the new string containing the characters of the first followed by the characters of the second.
stringEqual? : consumes two strings, and returns true if they contain the same sequence of characters.
stringLessThan? : consumes two strings, and returns true if the first strictly precedes the second, using a dictionary ordering. Signals an error if either argument is not a string.
== : the comparison operator. This operator should return true for two integers that are equal, two floating point numbers that are equal, two booleans that are equal, two voids, or two strings containing the same characters, or two closures containing the same arguments, body, and environment, or two arrays containing the same values, or two objects that contain the same fields. For all other pairs of values, this operator should return false. It should never signal an error.
instanceof : this operator consumes an object and a value, and returns true when the object has a ’constructor’ slot and that slot’s value is the second argument.
integer? : this operator returns true when called with an integer, and false otherwise.
boolean? : this operator returns true when called with a boolean, and false otherwise.
floating-point? : this operator returns true when called with a floating-point number, false otherwise.
void? : this operator returns true when called with the void value, false otherwise.
string? : this operator returns true when called with a string, false otherwise.
closure? : this operator returns true when called with a closure, false otherwise.
array? : this operator returns true when called with an array, false otherwise
object? : this operator returns true when called with an object, false otherwise.
newArray : this operator accepts no arguments, and returns a new array of length zero.
arrayRef:
arraySet:
arrayMax:
print : this operator consumes any value and prints it out. I’m not going to nail this format down precisely at this point, but if I were you, I’d come up with an output format such that no two different values have the same printed representation.
readLine : this operator should read a single newline-terminated line from the standard input, and returns a string containing these characters, including the final newline.
7 Incomplete Specification
I’m sure there are ambiguities, missing primitives, and probably outright errors in this specification. Post to the newsgroup when you discover them.
8 Hints
For Heaven’s sake, develop this project incrementally. For instance, you can develop an interpreter that only handles constants. Then, you can add primitives, and the application of primitives. Don’t try to swallow the elephant in one bite.
9 Sample Code
This assignment comes with an archive containing sample code. This archive includes a number of things that will be important.
AST.fs: the definition of the AST datatype. Do not change this file.
Values.fs: the definition of Footle values that the interpreter should output. Do not change this file.
xunit.dll: a library containing the xunit testing framework.
fstest.fll: a library containing the F# extension to the xunit testing framework.
xunit.console.exe: the executable that runs all the tests in your project.
xunit.utility.dll: a library used by the test-case runner.
sample-interp.fs: a simple interpreter for a few arithmetic expressions.
makefile: a makefile that will build and run the tests in the sample interpreter.
sample.exe: the compiled sample interpreter.
Honestly, most of these things can live somewhere else, but the simplest way to deal with them all is just to have them in the same directory. To compile the project, you’ll need to edit the makefile to point to your system’s fsc.exe file. To build sample.exe, run
make
To run the tests, run
make sampleruntests
10 Requirements
Develop the ’interp’ function, that accepts an env and an exp and returns an fVal.
11 Handin
Use the directory "Assignment1" in your subdirectory of the class subversion repository to hand in your work. for this assignment. See the course web page for more information on the subversion repository and how to use it.
Your submission should include the following files:
README : a readme file, containing a few lines indicating how you solved the project, whether it’s working, etc.
TEAM : a file indicating who the members of your team are, as specified below.
AST.fs : unchanged, as provided
Values.fs : unchanged, as provided (per google group mail)
makefile : a makefile whose default target builds the project
... other source files as required.
Contents of the TEAM file: one line for each team member with that member’s login id, a comma, and then his or her full name. So, if Benjamin Franklin were working with George Washington, this file might contain:
bfrankli, Benjamin Franklin |
gwashing, George Washington |
The order of the lines is not important. You must supply this file even if you’re working alone.