Yesterday, I posted this message to my blog

http://giuliozambon.blogspot.com, but I thought that some of you would probably find it interesting. This is the first time I have tried to show an image stored on ImageShack; previously, I stored them on one of my websites. The preview looks lousy, but I'll give it a try. Here it goes... OK, it didn't work well. Second attempt, using my website...

I found three websites that let you play daily KenKen (® Nextoy LLC) /CalcuDoku puzzles:

http://www.kenken.com/,

http://www.calcudoku.org/, and my website

http://zambon.com.au/puzzles/calcudoku/daily/.

While the other two websites include puzzles of different sizes, and calcudoku.org also includes variants, my website only includes 9x9 puzzles. KenKen and CalcuDoku are the same puzzle, but they are implemented by different people. Each implementation has a different feel and, on average, a different level of difficulty.

I know how my puzzles are developed, but, obviously, I have no idea what algorithms the other developers use. I thought it would be interesting to identify some of the differences from a statistical point of view.

For this purpose, I analysed 10 puzzles taken from each website. I know that 10 is too small a sample to make good statistics, but it was a lot of counting...

Anyhow, what follows is a summary of what I came up with. To avoid repeating the domain names, I will use K to identify kenken.com, Z for zambon.com.au, and C for calcudoku.org. ‘A’ indicates values obtained by averaging across the three websites. The triplets of numbers indicate minimum, average, and maximum values.

Number of cages: K=[31, 33.2, 38]; Z=[32, 34.3, 37]; C=[33, 34.2, 35]; A=[32.0, 33.9, 36.7].

The average number of cages is around 34 for all three websites. But it is interesting to note that K's spread (maximum minus minimum) is 7, Z's is 5, and C's only 2. This might indicate that, while K and Z do not set any limits on the number of cages, C does not determine the cages purely through random choices. This would be consistent with the fact that C sometimes presents puzzles that have the cages arranged in particular patterns (although none of the puzzles I randomly picked belonged to that group). It would be interesting to know what Patrick (C's developer) would have to say about this.
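For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the spreads and the ‘A’ triplet from the [minimum, average, maximum] values quoted above. The function names are only illustrative; they are not part of any of the three websites' software.

```python
# [minimum, average, maximum] number of cages, as quoted in the post.
triplets = {
    "K": (31, 33.2, 38),
    "Z": (32, 34.3, 37),
    "C": (33, 34.2, 35),
}


def spread(triplet):
    """Return maximum minus minimum for one [min, avg, max] triplet."""
    minimum, _, maximum = triplet
    return maximum - minimum


def average_triplet(all_triplets):
    """Average the minimums, the averages, and the maximums separately."""
    count = len(all_triplets)
    return tuple(
        round(sum(t[i] for t in all_triplets) / count, 1) for i in range(3)
    )


spreads = {site: spread(t) for site, t in triplets.items()}
overall = average_triplet(list(triplets.values()))

print(spreads)
print(overall)
```

Note that ‘A’ averages the three per-site figures; it is not the minimum, mean, and maximum of all 30 puzzles pooled together.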

With larger samples, I expect that Z's (i.e., my) number of cages would turn out to be normally distributed, as one would expect from a quantity that results from many random choices. Actually, as I generate the puzzles, I don't need to do the counting, because the computer automatically lists the number of cages. I can check it out right now.

...

It turns out that the number of cages calculated for 100 of Z's puzzles is [32, 34.83, 38].

The following image shows how the normal distribution (the magenta squares) fits the measured values (the blue diamonds; these are the Excel defaults and I didn't bother to change them). The vertical bars represent one standard deviation from the normally distributed values. In other words, if the distribution reflects reality, there is about a 68.3% probability for each measurement to fall within the bars. At the very least, the plot confirms that the number of cages in my puzzles is not in disagreement with a normal distribution. I confess I would have been shocked if it had not been so, because the distribution is the result of several [pseudo]random choices...
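The comparison in the plot can be reproduced roughly as follows. This is only a sketch using Python's standard library, not the Excel calculation I actually used. The mean of 34.83 and the range 32 to 38 come from the 100 puzzles above, but the standard deviation of 1.3 is a placeholder I made up for illustration, since I have not quoted the real one here.

```python
from statistics import NormalDist


def within_one_sigma():
    """Probability that a normal value lies within one standard deviation."""
    standard = NormalDist(0.0, 1.0)
    return standard.cdf(1.0) - standard.cdf(-1.0)


def expected_counts(mean, stdev, sample_size, values):
    """Expected number of puzzles for each integer cage count.

    Each integer v is treated as the bin [v - 0.5, v + 0.5).
    """
    dist = NormalDist(mean, stdev)
    return {
        v: sample_size * (dist.cdf(v + 0.5) - dist.cdf(v - 0.5))
        for v in values
    }


MEAN = 34.83             # from the 100 puzzles above
PLACEHOLDER_STDEV = 1.3  # assumption for illustration, NOT a measured value

counts = expected_counts(MEAN, PLACEHOLDER_STDEV, 100, range(32, 39))

print(f"within one sigma: {within_one_sigma():.4f}")
for cages, expected in counts.items():
    print(f"{cages} cages: {expected:.1f} puzzles expected")
```

The expected counts would then be compared with the number of puzzles actually observed for each cage count, which is what the magenta squares and blue diamonds do in the plot.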

The following table summarises the counts of operation codes and cage sizes.

C has about 6 times the number of 1-cell cages that K and Z have, and half the number of 2-cell cages. I will go out on a limb and say I believe that such differences are not due to statistical fluctuations within the samples. C also seems to have fewer divisions (1.3 vs. 3.5 and 5.5) and more cages with more than 3 cells (6.4 vs. 3.1 and 3.9). It seems reasonable to assume that the lower number of divisions (and perhaps subtractions) is due to the lower number of 2-cell cages.

In general, I have the impression that C's puzzles are more difficult than those of K and Z, and it seems reasonably safe to assume that the higher number of large cages is a contributing factor.

To deduce more from such a small sample would be inappropriate.