on the robust difficulty of rubik’s cubes

:: speedsolving, rubik's cube

By: John Clements

I’ve been amused (not in an evil genius way) by the once-and-future popularity of the Rubik’s cube. Had one as a kid, memorized its method, got a couple of sub–2-minute times, put it down.

Fast forward 35 years; now my son is mildly amused by the Rubik’s cube (amused enough to let his old man persuade him to make a Rubik’s Cube Halloween costume, anyway), and I decide to learn a new method (currently Roux, but who knows if it’ll last).

Along the way, though, I start wondering: how do they make sure that “scrambled” cubes at speedsolving competitions are about equally hard to solve? What if Joe Plotnik just happens to get one that’s easy?

It turns out… that’s very unlikely, due to a kind of a cool property that Rubik’s cubes have. Specifically, given a random (reachable) position of the cube, there’s about a 94% likelihood that it requires either 18 or 19 moves to solve.

These numbers come from Cube20, a team that used lots of compute power back in 2010 to determine the distribution of “moves required.” One thing they didn’t do (erm, on their front page, anyway) was to show the CDF of these numbers. Here it is:

The cool thing about this is the incredibly narrow band of likelihood; if you simply generate a position at random from the set of all possible positions, there’s a rounds-to-zero-percent chance of it requiring 15 moves or less, 3% of getting 16 moves or less, and a rounds-to–100% chance of it requiring 19 moves or less. There’s more than a 93% chance that it will require either 18 or 19 moves.

So… cubing competitions are almost certainly fair. Neat!

Cal Poly Quarter Numbering

:: Cal Poly, Department Data

By: John Clements

People often ask me what the heck is going on with Cal Poly’s nutty quarter numbering system. I’m going to try to summarize it.

Before I begin, I think I’m probably suffering from Stockholm syndrome, here; I’m kind of fond of it, even though it’s non-intuitive.

Actual History

Based on a very brief conversation with Debbie Dudley, it appears that the current system was designed in 2006, during the migration to PeopleSoft. At that time, Cal Poly had to devise a system for numbering quarters both in the future and in the past (that is, the new system also had to represent existing records of past students).

The rest of this is still largely hypothetical; please feel free to write and correct me about anything I’ve gotten wrong; I think that trying to understand how the system arose helps me (and you?) to understand its quirks a bit better.

(Semi) Hypothetical History

Problem: we need a way to represent a particular quarter (e.g.: Fall 2006) as a number. It would be really convenient to be able to sort in a sane way, etc. etc.

Solution: well, let’s use integers, not floats (for what are hopefully obvious reasons). Let’s use the year, then a digit for the quarter. What digits should we use for the quarters? Well, we should probably stay away from zero, because it’s not clear what year a zero would belong to. Okay, let’s use 2, 4, 6, and 8 for the four quarters. This spaces out the four quarters, and allows for the insertion of additional terms, should it become necessary.

So, how should we write Fall 2006? We could write “20068”. But… let’s try to cut that down to four digits.

(Knowing what I now know, I can’t see why you would want to do this unless PeopleSoft has some kind of internal restriction or performance penalty regarding numbers with more digits. Let’s assume this is the case.)

Well, maybe we can encode the century as a single digit. Let’s use “2” for the century from 2000 — 2099. So Fall 2006 becomes 2068.

Great! So how do we represent Spring 1997?

If we’re using century codes, then the one before “2” should be “1”. So in this case, Winter 1997 would be written as 1972.

Apparently, though, we decided that this was confusing, and that we should instead use a “0” for the 1900s, so that Winter 1997 is instead written as 0972.

This scheme is not awful, but it does have some problems if you try to subtract numbers or increment or decrement quarter numbers, because going from a fall quarter to the next winter quarter requires adding four… unless the year is 1999, in which case you have to add 1004. Oh well.

Okay, so let’s summarize. How the heck do we actually read one of these numbers?

Examples of quarter numbers:

  • 2174 : 2000 + 17 = 2017, 4 = Spring. Spring 2017.

  • 2102 : 2000 + 10 = 2010, 2 = Winter. Winter 2010.

  • 976 : 1900 + 97 = 1997, 6 = Summer. Summer 1997.

Parsing a Cal Poly (San Luis Obispo) quarter number:

Actually, I’m going to be precise, and just give you some code.

Yes, in Racket.

#lang typed/racket

;; given a quarter, return its season: "Fall", "Winter", etc.
(define (qtr->season [qtr : Natural]) : Season
  (match (modulo qtr 10)
    [2 "Winter"]
    [4 "Spring"]
    [6 "Summer"]
    [8 "Fall"]))

;; return the year in which a quarter number falls
(define (qtr->year [qtr : Natural]) : Natural
  (define century-code (floor (/ qtr 1000)))
  (define year-code (modulo (floor (/ qtr 10)) 100))
  (define century-offset
    (match century-code
      [0 1900]
      [2 2000]
      [_ (raise-argument-error 'qtr-year
                               "qtr with century code of 0 or 2"
                               0 qtr)]))
  (+ century-offset year-code))

more students in smaller classes

:: Department Data

By: John Clements

Good news, folks! We’re continuing to deliver reasonably sized classes for our students. Specifically, more than 90% of our SCUs are currently delivered in classes of size less than 40. The last time we managed that was back in winter of 2011. On the other hand, if you think that class sizes should be below 30, we’re not doing so well.

Here’s the college as a whole (we’re better):

knowing what’s out there

:: Programming, Teaching

By: John Clements

I’m teaching a class to first-year college students. I just had a quick catch-up session with some of the students that had no prior programming experience, and one of them asked a fantastic question: “How do you know what library functions are available?”

In a classroom setting, teachers can work to prevent this kind of question by ensuring that students have seen all of the functions that they will need, or at least that they’ve seen enough library functions to complete the assignment.

But what about when they’re trying to be creative, and do something that might or might not be possible?

Let’s take a concrete example: a student in a music programming question wants to reverse a sound. How can this be done?

The Many Uses of the AR–15 (updated)

:: Guns

By: John Clements

This just in, folks; evidence continues to mount to suggest that AR–15-style assault rifles are in fact extremely effective at killing large numbers of people. Not yet sure if there are any other uses for this device.

  • July 20, 2012 - Aurora, CO – 12 people dead - Smith & Wesson AR–151
  • December 14, 2012 - Sandy Hook elementary school, Newtown, CT – 20 people dead - Bushmaster AR–152
  • December 2, 2015 - San Bernardino, CA – 14 people dead - Smith & Wesson AR–153
  • June 12, 2016 - Orlando, FL – 49 people dead - Sig Sauer AR–154
  • October 2, 2017 - Las Vegas, NV – 57 people dead - “AR–15-style” assault rifle5

ontologies OF programs

:: Programming Languages, Teaching

By: John Clements

Reading Daniel Dennett’s “From Bacteria to Bach and Back” this morning, I came across an interesting section where he extends the notion of ontology—a “system of things that can be known”—to programs. Specifically, he writes about what kinds of things a GPS program might know about: latitudes, longitudes, etc.

I was struck by the connection to the “data definition” part of the design recipe. Specifically, would it help beginning programmers to think about “the kinds of data that their program ‘knows about’”? This personification of programs can be seen as anti-analytical, but it might help students a lot.

Perhaps I’ll try it out this fall and see how it goes.

Okay, that’s all.

Granite Wo-Mon 2017

:: granitemon

By: John Clements

Whoops! The day is now past. The annual Granite Wo-Mon swim took place on August 6th.

Tricia Sawyer got the ball rolling this year, as she has in the past two or three, by asking when the swim was going to be. Favorable tides suggested either mid-July or early August, if I recall correctly, and we settled on the 6th.

We met on the dock at 6:30, and headed out to Long Island. I think we began the swim at around 7:15.

It was an absolutely gorgeous day, and the water was drop-dead gorgeous. That is, it’s so warm that it’s killing lots of marine life. Nice for swimmers, maybe not so nice for everyone else. I swam without a wet suit for the second year running, and I have to say; it wasn’t even cold. Even jumping into the ocean at 7:15 in the morning. Not cold. I’m guessing it was above 20 celsius. Very warm.

Also, there was a good breeze at 7 AM, and it picked up steadily. The chop grew accordingly, and it took me a full 2 hours to finish, compared to 1:38 and 1:15 (!) in 2016 and 2015 respectively. Even discounting the ongoing decrepitude of advancing age (and a near-total lack of preparation), I’d say it was VERY CHOPPY.

  • John Clements
  • Mary Clews
  • Chris Guinness
  • Amanda Herman
  • Lane Murnik
  • Tricia Sawyer
  • Pat Starkey

Kudos to first-time swimmer Lane Murnik! Hope to see you many more times. Hopefully it’ll be a bit smoother next year.

Amazing Awesome Chase boats and welcoming committee:

  • Sara Ardrey
  • Guy Ardrey
  • Henry Becton
  • Jeannie Becton
  • Alice Clements
  • Anika Clements
  • Xavier Clements
  • Henry Clews
  • Hal Clews
  • Martha Faye (sp?)
  • Sean Guinness
  • Brennan Starkey
  • Eliza Wilmerding
  • Oliver (Wilmerding-Herman?)
  • His sister?
  • Mike Murnik

hash table timings

:: Racket

By: John Clements

Are immutable hash tables constant-time insert and lookup? Actually, it looks like the answer is “no, not really”:

This picture shows insertion time per insertion, so we’d expect to see something like a flat line. Of course, there’s a lot of hand-waving involving “as n approaches infinity” and what that actually means on a machine with limited memory, but it seems clear from this picture that if you’re creating a hash table with about ten million elements, you should probably consider using a mutable hash table; the time required to create that table will be about 40 seconds for an immutable hash table vs. about 7 seconds for a mutable one.

This graph strongly suggests that the added time is entirely GC time, assuming that GC is not running in parallel, which I believe it’s not.

Drawing this on a linear-x graph suggests that the times for the immutable hash table insertion are well below linear; I’m thinking they’re somewhere between log n and (log n)^2.

What about lookup? Well, I ran two experiments; one on keys that are in the table, and one on keys that are not.

In this experiment, there are 3 million lookups on each tree size. The numbers for both of them are interesting in that they move around quite a lot; the error bars aren’t that big, and you can see (especially in the failed lookups) that there are some definite patterns. First of all, the immutable tree lookups pretty clearly display an upward trend, suggesting that lookup is actually log(n), albeit with a fairly small constant (about 20% per 10x). The lookups on the mutable hash tables also seem to be growing, though in their case there seems to be a sawtooth pattern, presumably occurring when the size of the table passes a particular threshold.

In the case of lookups, though, unlike insertion, there’s no clear mutable-vs-immutable winner, at least for the table sizes that I used. Lookups are generally around 150 microseconds, compared to about 600 microseconds to insert into a mutable hash table.

loops hurt so bad

:: Programming Languages, Python

By: John Clements

Why, professor Clements, why? Why are you taking all of our loops away?

I feel like the Grinch, sometimes, taking all of the pretty little candy loops away from the wide-eyed first-year students.

But the fact is… that I really don’t like loops.

  • Loops in every popular language simply execute a block of code repeatedly. This means that your code has to perform mutation in order to allow a value to escape from the code. This requires before-and-after reasoning, and introduces a huge new source of bugs into your program. By contrast, there’s almost no chance to tangle the action of the caller and the callee in a recursive call.

  • Corollary of this: you can easily update loop variables in the wrong order: e.g.: i += 1, sum += arr[i]. Oops, bug.

  • Loops can’t easily be debugged; in a functional style, each loop iteration is a recursive call for which the student can write an explicit test. Not possible using loops.

  • In functional style, you can’t forget to update a variable; each recursive call must contain values for all loop variables, or you get a sensible (and generally statically detectable) error.

  • You can’t write a loop in a non-tail-calling fashion. You have to do all of the work on the way down. Traversing a tree is basically impossible (to do it, you’re going to wind up building a model of the stack, turing tar pit etc.).

  • Finally, writing in a functional style is about 80% of the way to proving your code correct; you have to choose a name for the action of the recursive call, and you can usually state an invariant on the relation between the inputs and the outputs without difficulty.

Nothing here is new; it’s just a way of blowing off steam while grading my students’ terrible, terrible code. Sigh.

(EDIT: I should add… none of this applies to list comprehension forms such as for/list; those are dandy. Not sure? Run down the list of bullet points and check for yourself.)