Suppose you want to write an algorithm that, when given a set of data points, will find an appropriate number
of clusters within the data. In other words, you want to find the
k
for input to the
k-means algorithm without having any
a priori knowledge about the data. (Here is my own
failed attempt at finding the k in k-means.)
def test_find_k_for_k_means_given_data_points()
data_points = [1,2,3,9,10,11,20,21,22]
k = find_k_for_k_means(data_points)
assert(k==3, "find_k_for_k_means found the wrong k.")
end
The test above is a reasonable specification for what the algorithm should do. But take it further: can you actually
design the algorithm by writing unit tests and making them pass?
I've previously
expressed my doubt that TDD makes an effective approach to algorithm design.
More recently, I
alluded to a some optimism towards the same idea.
Then, in a comment to the same post,
Dat Chu asked about using unit tests in algorithm design, also referencing
something that
Ben Nadel had said about asserting in code-comments
how the state of the algorithm should look at certain points.
That all led to this post, and me wanting to lay my thoughts out a little further.
In the general case, I agree with Dat that it would be better to have the executable tests/specs.
But, what Ben has described sounds like a stronger version of what Steve McConnell called
the
pseudocode programming process
in Code Complete 2, which can be useful in working your way through an algorithm.
Taking it to the next step, with executable asserts - the "Iterative Approach To Algorithm Design" post
came out of a case similar to the one described at the top. Imagine you're coming up with something
completely new to you (in fact, in our case, we think it is new to anyone), and you know what you want
the results to be, but you're not quite sure how to transform the input to get them.
What
good does it do me to have that test if I don't know how to make it pass?
The unit test is useful for testing the entire unit (algorithm), but not as helpful for
testing the bits in between.
Now, you could potentially break the algorithm into pieces - but if you're working through it for the
first time, it's unlikely you'll see those breaking points up front.
When you do see them, you can write a test if you like. However, if it's not really a complete unit,
then you'll probably end up throwing the test away.
Because of that, and the inability to notice the units until
after you've created them, I like the simple assert statements as opposed to
the tests, at least in this case.
When we
tried solving Sudoku using
TDD during a couple of meetings of the
UH Code Dojo, we introduced a lot of methods I felt were artificially there, just to be able to test them.
We also created an object where one might not have existed had we known a way to solve Sudoku through
code to begin with.
Now, we could easily clean up the interface when we're done, but I don't really feel a compulsion to practice
TDD when working on algorithms like I've outlined above. I will write several tests for them to make sure
they work, but (at least today) I prefer to work through them without the hassle of writing tests for the
subatomic particles that make up the unit.
Hey! Why don't you make your life easier and subscribe to the full post
or short blurb RSS feed? I'm so confident you'll love my smelly pasta plate
wisdom that I'm offering a no-strings-attached, lifetime money back guarantee!
Leave a comment
There are no comments for this entry yet.
Leave a comment