TheEpicCowOfLife's Blog

Imagine reviving this blog, couldn't be me.


Project maintained by Hosted on GitHub Pages — Theme by mattgraham

Projection Algorithms 3: Sudoku (WIP)

20 Feb 2022 - TheEpicCowOfLife


In this article, we will talk about Sudoku, and why it is possibly one of the greatest denial of service attacks on the human intelligence! I’ll show how this problem can be formulated as a feasibility problem, and my various findings on how the algorithms performs.

If you haven’t read the first part, do it now, otherwise things will simply not make sense.

If you haven’t read the second part, you should do it too because it helps put everything into context.

\[\def\reals{\mathbb{R}} \newcommand{\bx}{\mathbf x} \newcommand{\bw}{\mathbf w} \newcommand{\by}{\mathbf y} \newcommand{\bz}{\mathbf z} \newcommand{\ba}{\mathbf a} \newcommand{\bv}{\mathbf v} \DeclareMathOperator{\Fix}{Fix} \newcommand{\integ}[2]{[[{#1},{#2}]]}\]

Sudoku

You probably already know what sudoku is, but for avoidance of doubt, here is the problem:

You are given a 9x9 grid with some numbers filled in already. Fill in the rest of the grid such that:

Note that a puzzle is defined by a set of filled in numbers (clues), such that there is a a single unique solution. Clearly, there are very many puzzles that exist.

Below is an image of a sudoku puzzle, and its unique solution.

A sudoku, and its solution side-by side. Credit: The wikipedia page on sudoku

Note the following terminology:

In contrast to N-M queens where there can be many valid solutions, here any solution found is unique. This means that convergence of any algorithm, at all, will be quite miraculous!

The first formulation.

A first instinct is to formulate a given Sudoku board as an \(x \in \reals^{9 \times 9}\) where \(x_{ij} = k\) iff the cell at position \((i,j)\) has value \(k\).

In essence, a completed puzzle/solution \(x \in \reals^{9 \times 9}\) will have integer values between 1 and 9, satisfying the sudoku rules.

4 constraint sets can now be defined. Informally, they are:

Clearly, the first 3 sets are just 9 copies each of the set of permutations discussed all the way back in the first part, so projection can be done quite quickly. The last set is a fairly trivial projection: to project any given \(x\) onto \(C_4\), simply change the value of any cells that have a clue to the value of the clue.

However, according to my supervisor Matthew Tam, this formulation doesn’t work very well at all, so I never tried it. I can theorise why: the labels 1-9 are arbitrary, any permutation of the clues is effectively the same puzzle! Hence, doing basic arithmetic on the labels is completely nonsense. However, the operations in Douglas Rachford and Tam-Malitsky does do arithmetic on boards. To illustrate, If a board \(A\) is confident it has a 9 at position \((i,j)\), but board B has a 1, then does the average \(\frac{A + B}{2}\) mean that \((i,j)\) has a 5? Changing a cell’s value completely could totally stuff up the puzzle!

In contrast, in an n-m queens board, if \(A\) contains a 1 at position \((i,j)\) and \(B\) contains a 0, then \(\frac{A + B}{2}\) would have a 0.5 at that position, which has a natural interpretation of now being “unsure” whether or not \((i,j)\) has a queen.

A better formulation

The formulation I implemented was as follows: Represent a Sudoku board \(x\) as \(x \in \reals^{9 \times 9 \times 9}\) where \(x_{ijk} = 1\) iff the cell at position \((i,j)\) has value \(k\), and \(0\) if it doesn’t have value \(k\).

In essence, each cell now contains 9 values each, and in a completed puzzle/solution \(x \in \reals^{9 \times 9 \times 9}\) each cell will contain exactly one 1, with the rest 0. The position of the 1 determines the cell’s value.

For avoidance of doubt, here is an onslaught of definitions. For any \(X \in \reals^{9 \times 9 \times 9}\) define the following: (Switched to using \(X\) instead of \(x\) just so the subscripting looks okay)

Reminder that the notation \(\integ{a}{b}\) denotes “The set of integers between \(a\) and \(b\) inclusive.”

\[\{X_{ij}\} := \{(X_{ij1}, X_{ij2}, \dots, X_{ij9}) \ | \ i,j \in \integ{1}{9}\}\] \[\{X_{ik}\} := \{(X_{i1k}, X_{i2k}, \dots, X_{i9k}) \ | \ i,k \in \integ{1}{9}\}\] \[\{X_{jk}\} := \{(X_{1jk}, X_{2jk}, \dots, X_{9jk}) \ | \ j,k \in \integ{1}{9}\}\]

Then we also need to define \(e_i\) as the standard basis vectors in \(\reals^9\), and \(S\) as the set of all standard basis vectors as follows:

\[e_i \in \reals^9 := v \begin{cases} v_j = 1 \qquad j = i \\ v_j = 0 \qquad j \neq i \end{cases}\] \[S := \{e_i \in \reals^9 \ | \ i \in \integ{1}{9}\}\]

With that in mind, 5 constraint sets can now be defined. They are:

Projection onto \(C_1, C_2, C_3, C_4\) is easy, it simply involves many repeated projections of an \(v \in \reals^9\) onto \(S\). Using a rearrangement inequality argument similar to part one, it can be shown that projecting \(v\) onto \(S\) is as simple as setting the max value of \(v\) to 1, and the rest of the components to 0.

\(C_5\) is the same as \(C_4\) from the previous formulation, so we know how to project onto that.

Now, in contrast to the previous formulation, if a board \(A\) is confident it has a 9 at position \((i,j)\), but board \(B\) has a 1, then the average \(\frac{A + B}{2}\) will basically be perfectly undecided between a 1 and 9, with an entry of 0.5 each. This is a much more reasonable intepretation of the arithmetic done to the board.


In search of data

In contrast to nm-queens, this time we can benchmark algorithms by chucking sudoku puzzles at them and seeing how many they solve!

But where do we get good datasets of puzzle? By entering the rabbit hole.

Google was very kind to me, and I immediately found this repository. This is an extremly optimised sudoku solver, as if it was written by someone whose sole purpose in life was the write the sudoku solver to end all sudoku solvers, and end the years of man-hours spent writing sudoku solvers.

Well, it really isn’t that dramatic, but that’s what I imagined as I waded through man-years of efforts dedicated solely to the study of sudoku- scrolling down the gargantuan algorithm writeup, infinitely recursing in more and more references like some kind of horrid academic hydra. Then there was the meticulous benchmarking of a dozen top algorithms written by a dozen authors, each presumably with their own set of unique insights and observations.

Looking into the data sources, the hydra splits yet again. Not only are people solving sudokus, automating the solving of sudokus, there are people dedicating their intellect to generating puzzles? Here is a forum spanning 61 pages. Participants communicate in brief, but vague posts with code-words like “sk-loop” and “bi bi pattern”, but somehow perfectly understand each others’ dizzyingly abstract ideas: a sign of terrifying collective intelligence. They are not putting their minds to curing cancer. This is a community dedicated to writing programs to generate sudoku puzzles, with the aim for finding the hardest puzzles for computers and humans to solve.

If a denial of service attack can naturally self-replicate, does that mean that it is a virus? Is sudoku a virus?

Either way, should I be so cynical as I myself, was prepared to justify throwing my man-weeks into this black hole of intellect?

I finally took the data.zip file and left.

Inside data.zip

There were 9 data sets. I unzipped it and committed to my repository.

Foolish Quang.

There was a dataset over 100 mb when unzipped, and now I couldn’t push to github. Great. There goes an afternoon expanding my git vocabulary to try and undo some commits. If you ever run into the same situation, git reset -soft (or something like that) will undo a commit without changing the files in your repository.

After ignoring the datasets with multiple solutions or no solutions, there were 4 major data sets I considered.

According to the tdoku repository where I sourced this data, these datasets are roughly in order of difficulty.

A Puzzle’s difficulty.

But what does difficulty really mean? Good thing reading those forums answered this question before I decided to have a crack at some of the sudokus in the dataset. Turns out people have created sudoku solvers that solve sudokus using human-like deduction. Each technique aims to make progress by ruling out possible entries from cells. They seem to work by trying to make progress by going down an increasingly esoteric list of techniques, and rate the difficulty of the puzzle by the most esoteric technique used.

What exactly are the “techniques”?

Let’s look through the dataset…

puzzles0_kaggle

I put a couple kaggle puzzles of the dataset through and all the “puzzles” look fairly trivial: they just consist of applying the basic sudoku rules over and over like in the picture below Image of a a simple deduction

This is consistent with what the comments on the Kaggle post says

puzzles2_17_clue

It is well known that 17 is the minimum number of clues required for a puzzle to have a unique solution. While it may be intuitive that the less clues you give, the harder the puzzle must be, that is actually not really the case. There are some very simple 17 clue puzzles in there, that can be solved by a beginner using simple techniques like above.

However puzzles requiring more advanced deductions are quite common in the dataset. To illustrate, here’s one such technique:

Image of a slightly smarter deduction

Many more esoteric technique features in the database, but an experienced human solver could reasonably do most of these.

puzzles3_magictour_top1465

Here’s where things start getting unhinged. It is at this point that people are generating sudoku puzzles with the intent of making them as difficult to solve as possible.

Here’s one example of a deduction required to solve the first puzzle in the dataset.

Image of a fairly unhinged deduction

To give you an idea of what is going on in this madness, I’ve pasted the entire deduction below. Happy scrolling.

Click to show the full deduction

In short, these puzzles require brute force to solve. No human would have the patience to solve them.

puzzles6_forum_hardest_1106

A quick mention of the puzzles4 and puzzles5 dataset, they have largely the same backstory of puzzles6: collections of the hardest sudoku puzzles generated by the forum. However these sets contained more puzzles, and the difficulty was more “dilute”. However they were still crazy hard, and were strictly harder than even the puzzles in the previous dataset.

We skip ahead to puzzles6, a collection of the hardest puzzles amongst the hardest puzzles. Plugging them into sukaku explainer, we are greeted with a particularly forboding message.

Window saying "The next solving techniques are advanced ones that may take a very long computing time. Do you want to continue anyway?:

This happens a couple of times, and each time out pops a very cooked deduction chain. Here is an impressively cooked deduction.

Image of a completely unhinged deduction

Click to show the full deduction