This is written for instructors, and presents things in the order in which it makes sense for an instructor to understand them. To actually teach them, you’ll want to scaffold and spiral-staircase and whatever architectural metaphor you like to introduce the easy things first. Specifically, you’ll certainly want to go through the complete design recipe many times for simple data definitions before introducing more complex data definitions.

Nearly everything here is adapted from How To Design Programs, by Felleisen, Flatt, Findler, and Krishnamurthi. This material adapts it to Python, taking into account Python’s issues.

The Design Recipe

The design recipe is a five- or six-step sequence of steps for designing functions, a.k.a. programs.

  • Data Definition : A data definition describes a set of values in a standardized yet flexible way. It takes the form of a comment, supported by class definitions. This is step one of the design recipe.

  • The second step of the design recipe has three sub-parts:

    • Signature : A signature describes each of the function’s inputs and its output, using primitive names (e.g. str, float) and names defined by data definitions. This is part of step two of the design recipe. details
    • Purpose Statement : A purpose statement is a one-line description of what the function returns, and any effects it has. This is part of step two of the design recipe. details
    • [[Header]] : The first line of the function. This is step three of the design recipe.
  • Template : Step four of the design recipe. The template maps the data definition onto a concrete piece of program code.

  • Complete the function : Okay, now you need to finish the program. This is step five of the design recipe

  • Test, redesign, refactor : You are not finished at the end of step five. You must now go back and make the code better.

Data Definitions

A data definition is a specification of a set of values, using a semi-formal system. Here are a bunch of examples:

1
Price : TypeAlias = int # represents a number of US cents
1
ID : TypeAlias = str # 10 alphanumeric characters
1
2
3
4
@dataclass
class Location:
  x : float
  y : float
1
2
3
4
5
6
# represents a household in the census data
@dataclass
class Household:
  loc : Location
  occupants : List[Person]
  income : int
1
2
3
4
5
# A ClassStatus is one of:
# - "Enrolled",
# - "Waitlisted", or
# - "Not Enrolled"
ClassStatus : TypeAlias = str # ugh, python doesn't have singleton types
1
Weight : TypeAlias = Union[float, None] # some weights are unknown
1
2
3
4
5
6
7
FloatBinTree : TypeAlias = Union["Node", None]

@dataclass
class Node:
  val : float
  l : FloatBinTree
  r : FloatBinTree

That set of examples covers just about the whole class, actually…

Data Definitions are the most important but also the most challenging part of the whole design recipe.

Why are they important? Data definitions specify the values that the program will be handling. By defining values, we immediately start thinking about functions as mappings from pieces of data to other pieces of data, rather than as sequences-of-instructons. Data definitions lead to examples of data, and then to test cases. Data definitions steer students toward specifications, rather than implementations.

Why are they challenging? Well, for one thing, they’re the first step in the design process, so it may be hard to know what definition to choose. Choosing a definition well depends on a good understanding of what functions you’re going to implement, and how the choice of data definition will affect that.

Secondly, they have an interesting relationship to the rest of the design recipe. Specifically: you usually don’t need to produce a new data definition, in order to write a new function! Generally we define a piece of data, then write many many functions on that data definition. This presents mundane pedagogic difficulties; the data definition is step one in the design recipe—but for most functions, we choose from existing data definitions, rather than designing new ones, and don’t need to write anything down as part of step one.

Next, as of 2025, Data definitions generally consist entirely of PEP 484 type language, which is cheerfully ignored by the standard “run python” way of running programs; in order to convince Python to actually check types, one must generally use an auxiliary type checker, such as mypy.

There are several different classes of data definitions. Generally, we follow a “spiral staircase” approach, revisiting step one repeatedly as we encounter more and more sophisticated forms of data.

Existing / Built-In

We take certain data definitions as existing, including:

  • int
  • float
  • str
  • bool

(These denote the expected sets of values.)

Finally, we use the data definition Any to denote the set of all python values.

Simple Data

The first kind of nontrivial data definition, like the first two above, is just giving a new name to an existing kind of data.

From the list above:

1
Price : TypeAlias = int # represents a number of US cents

Strictly speaking, there’s no need to define a new kind of data here; whenever you use the term Price, you could simply have used the built-in float.

You can also add side conditions to data definitions, though; consider the data definition

1
ID : TypeAlias = str # 10 alphanumeric characters

You can imagine all kinds of goofy side conditions. Unsurprisingly, no widely used Python type systems can capture this kind of side condition.

Choice (Sums)

Often, data definitions can include choice. That is: something is either this, or that.

Here are the examples from above:

1
2
3
4
5
# A ClassStatus is one of:
# - "Enrolled",
# - "Waitlisted", or
# - "Not Enrolled"
ClassStatus : TypeAlias = str # ugh, python doesn't have singleton types
1
Weight : TypeAlias = Union[float, None] # some weights are unknown

The first of these is what, in Java, would probably be an Enum type. In a languge like Typed Racket or TypeScript, with singleton types (that is, there is a type "Enrolled" that contains only the value "Enrolled"), these can easily be represented using a Union type. Sadly, Python does not have these, and in fact adding them would be really hard, because of the weird hack where we encode class names as strings, ugh.

The second of these is a place where we would typically deploy the “magic hammer” of null/None. It’s important to note that this totally fails if there are two non-float values to be represented (in this case, e.g.: “unknown”, and “still growing”), or if the non-float is an error condition with information attached to it. You can think about how you might represent each of these in Java, but as you do, I think you’ll see that you’re “making do” with a variety of incomplete tools.

Compound / Objects (Products)

No surprise here: an object is a value that contains a sequence of named values.

… more here!

Mixed Data (Sums / Unions)

Mixed data can be one thing or another. Mixed data is generally represented using a “sum” type. In Python, these are expressed in PEP 484 syntax using the Union type constructor. So, for instance, Union[int,str] is used to indicate something that is either an integer or a string.

Mixed Compound Data

When we combine both sums and products, we suddenly get the ability to write inductively-defined (that is, self-referential) types that have finite inhabitants.

Perhaps most commonly, linked lists and binary trees both require a Union type and a piece of compound data (a class).

For an example, see the “linked list” example, below.

Note that Python mercifully does not recapitulate Tony Hoare’s “billion dollar mistake”; that is, providing the value None as an argument to a function that expects something of type Node does not succeed. If you want a type that allows None, you must explicitly include it with a Union (or by using Optional).

Step 2: Purpose Statement, Header

Purpose Statements

The purpose statement is easy to describe, and hard (for students) to do. You want a one-line description of what the function does.

Tip for students: if you can’t easily describe what the function does in one line, maybe you need to come up with data definitions that allow you to describe it more clearly, or maybe it shouldn’t be a function.

From a software engineering standpoint, the purpose statement is a vital “deep breath” that precedes writing any function. It gives you a chance to think about the role of this function in the program, and whether it actually makes sense. It’s a part of the antidote to “fingers first” programming, and it’s a part of the “reflection” that’s becoming more and more widely accepted as part of programming education.

Perhaps most importantly: it’s something that students should be doing for the rest of their lives. It’s applicable to every language that they’ll ever use, and it’s always time well spent.

Ed. Note: examples required!

Step 3: Test Cases

Step 4: Templates

Every data definition has a template. The template is a piece of code, a “skeleton”, if you will, that unfolds the input data, and includes recursive calls on elements that have self-referential types.

It is incredibly helpful in helping students to define function on linked lists, binary trees, and other inductively defined datatypes.

Templates are typically applied to data definitions that consist of a “sum of products”. The template for a data definition contains a match for a sum, and match clauses for each piece of potentially-compound data in that sum.

Here’s a simple nonsense data definition:

1
2
3
4
5
6
Zib : TypeAlias = Union['Blotz',float]

@dataclass
class Blotz:
  nurg : Str
  durg : Zib

… and here’s the template for a function that accepts a Zib:

1
2
3
4
5
6
7
# the template for a function that accepts a Zib
def zib_fun(z : Zib) -> Any:
  match z:
    case Blotz(nurg,durg):
      return ... nurg ... zib_fun(durg) ...
    case float(f):
      return ... f ...

Note that there’s one case for each type in the union, and that within each case, a compound piece of data is pattern-matched, and any self-referential elements correspond to recursive calls.

Step 5 : Fill in the rest of the function

Example Code

Copyright (C) 2017—2025 John Clements (clements@racket-lang.org)