On a scale from one to ten, …

Today we got a new intern (let's call him Tom), and while my colleague (that would be Jane) was trying to find a suitable task for him to get started with, I overheard a conversation similar to the following:

Jane: On a scale from one to ten, how proficient would you say you are in JavaScript?

Tom: Uhm, not sure. I’m not even sure what the numbers mean in this context.

Jane: Well, it goes from “I never heard of it” to “I know everything there is to know”.

Tom: I really can’t say… Maybe a five? …

And I really feel for Tom. Not because he is only “five good” in JavaScript, but because I consider this whole “How would you rate your $foo on a scale from $bar to $baz” thing to be nonsense. Back in school, my physics teacher would always tell me that I must never forget the units in my calculations, because otherwise I wouldn’t be able to interpret the result. But as it turns out, there is no unit for JavaScript knowledge, nor for knowledge of any other programming language, because skill is very difficult to quantify.

Let’s take another look at why this or any other arbitrary scale is no good for skill (self-)assessment. We could assume that Tom was brought up in Spain, where the school system rates you from 1 (worst) to 10 (best). While in theory this could be a linear scale, in practice it is far from it: 1-4 means that you’ve failed the class entirely and have to repeat it. The remainder is mapped as follows:

  • 5 ≡ sufficient
  • 6 ≡ good
  • 7-8 ≡ notable
  • 9-10 ≡ outstanding

As a result, Tom might be inclined to think that being “five good” in JavaScript signifies that you have only very basic knowledge of the language, because this is consistent with how he’s been graded throughout his entire school career. Now we could assume that Jane was brought up in Germany, where school grades commonly range from 1 (best) through 6 (worst), with a 5 and a 6 indicating a failed class. Since this is not really compatible with our one-to-ten scale, she might simply assume that skill level is distributed linearly. Both of their assumptions are visualized in the figure below, colors ranging from red (poor skill) through yellow (mediocre) to green (skilled).

Note that both interpretations of the scale are consistent with 1 meaning “I never heard of it” and 10 meaning “I know everything there is to know”. The distribution in between is very different though, such that being a “five” JavaScript coder for Jane would be roughly a “seven” for Tom, and so on. As a result, either Jane would think that Tom is massively overestimating himself, or Tom would receive assignments that are way outside his own perceived skill set.

So now I’ve criticized the one-to-ten scale. “But which one should I use instead?”, you ask. The answer could be: whichever you like, just keep the scale to yourself.

Using binary search to position someone on a scale of your choice

The thing is that if you ask a question such as “How good are you on a scale from […]”, then you already have an idea of what a “seven” (or any other number) means for a JavaScript programmer (at least on your own, very subjective scale). As a result, supposing that you are able to come up with a question for every number on your scale, you’ll be able to position any person you encounter on that scale in about log2(n) steps for a scale with n levels (log2(10) ≈ 3.32 for the 1-10 scale, so three to four questions), by using binary search starting from the middle of the scale.
For example:

(1 2 3 4 5 6 7 8 9 10) - Pivot: 5. Answer: Correct
(          6 7 8 9 10) - Pivot: 8. Answer: Correct
(                9 10) - Pivot: 9. Answer: Incorrect
                       = Result: An 8

Since the scale is never communicated to the interviewee, no misunderstandings regarding the nature of the scale can arise. An additional benefit of this technique is that you can refine your questionnaire over time, based on your observations, thus gradually increasing the precision of your assessments. It should be noted that these need not be questions whose answers can be memorized; they could also be small programming exercises or the like, thus making this a viable technique for programming interviews as well. And it can even be automated!
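
To give an idea of what such automation might look like, here is a minimal sketch in Python. It is only an illustration under a few assumptions of mine: the function name `assess` and the `passes` callback are made up for this example, each level has exactly one question (or exercise), and anyone who masters a given level also masters all levels below it. The simulated Tom who handles everything up to level 8 is likewise invented to mirror the walkthrough above.

```python
from typing import Callable, Optional, Sequence


def assess(levels: Sequence[int], passes: Callable[[int], bool]) -> Optional[int]:
    """Return the highest level whose question is answered correctly.

    `levels` must be sorted from easiest to hardest. `passes(level)` is a
    hypothetical callback that poses the question for that level and
    reports whether the answer was correct. Assumes that passing a level
    implies passing every lower level. Returns None if even the easiest
    question that gets asked on the way down is failed at every step.
    """
    lo, hi = 0, len(levels) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2          # pivot: middle of the remaining range
        if passes(levels[mid]):
            best = levels[mid]        # candidate is at least this good
            lo = mid + 1              # continue with the harder half
        else:
            hi = mid - 1              # continue with the easier half
    return best


# Simulated interview: a candidate who can handle everything up to level 8.
asked = []

def tom_passes(level: int) -> bool:
    asked.append(level)               # record which questions were posed
    return level <= 8

result = assess(range(1, 11), tom_passes)
print(result)  # 8
print(asked)   # [5, 8, 9]
```

The pivots 5, 8 and 9 are the same ones as in the example above. In a real setting, `passes` would be replaced by whatever poses the actual question or exercise and evaluates the answer.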

What do you think about the one-to-ten scale? Is it useful, and if so, in which situations? Could binary-search-based assessment be the next big thing™?