I enjoy Sudoku as a way to relax on occasion but, being an IT guy, I suppose I am predictably meta: solving them is fun, but making a machine to solve them is even more fun. I created a clunky solver once before based on the idea of using as few T-SQL statements as possible. This year I decided to try to make a fast solver instead. It’s working fairly well and solves most puzzles in under a second on my laptop. It helps I’ve been sick with a cold for the past two weeks, giving me the sort of “enforced relaxation” that drives me to puzzles.
Here’s the basic algorithm:
- Create a table for the puzzle solution, and insert the given values from the puzzle definition.
- Create a second table that contains all possible “candidate” values for all empty cells in the puzzle. This is a bit like the “notes” feature of an electronic Sudoku app, where you can indicate possible values for unsolved cells.
- Loop while there is no solution
- Using some list of solving rules, “push” solved values from the cells in the candidates table to the solution table.
- Use a process of elimination to remove candidate values that the Sudoku rules make impossible.
- End Loop
I started with that fundamental idea, but I found that the solver rules I could encode in T-SQL (at least with my level of math skill) covered most but not all puzzles. Some of the advanced ones would stall or reach an impasse where the available rules couldn’t make progress eliminating values. For those cases I added some branching logic to do a minimal level of trial and error. The adjusted algorithm became this more complicated version:
- Create a table for 1-n puzzle solutions with a branching identifier. Insert the given values from the puzzle as branch 1.
- Create a second table that contains all possible “candidate” values for all empty cells in the puzzle. This table also has a branch identifier, and the first set of candidates is also labeled as branch 1.
- Loop while there is no solution
- Using some list of solving rules, “push” solved values from any branches in the candidates table into the corresponding branch in the solution table
- If no cells are solved, then “branch” the solution by duplicating the solved values and the candidates into multiple versions/branches on some cell that has a minimal number of candidate values. Example: if we see no progress, and cell 3B might be a 2 or might be a 3, then make two versions, one using 2 and one using 3, and proceed with both.
- If a branch contains a “contradiction” – meaning an impossible combination of values – then it cannot be the solution to the puzzle, so remove it.
- Use a process of elimination to remove candidate values that the rules make impossible, from all branches.
- End Loop
I found that getting started with this felt complicated, but I was able to distill the code down gradually to a few statements. Admittedly, they are dense, and take some time to understand, but in all it didn’t end up being very much code at all.
Modeling the Puzzle
First, how to represent the puzzle space in SQL Server? A Sudoku puzzle looks like a table, but in fact it isn’t like a database table at all, because position is so important to the logic of the puzzle – in SQL the order of rows and columns is, by definition, undefined. For example, there’s no “last row” in a table. Further, in creating the solver I tried to stay in the spirit of relations and sets, and not resort to techniques like string manipulation or stuffing multiple values into the same column.
For me, it was simplest to think of the puzzle as several intersecting sets (“rows”, “columns,” “squares,” available digits) and make a table representing the sets instead of the physical layout of the Sudoku square.
The schema looks like this: a Solution table has columns for Value, Row, Column, and Square. Value is the digit 1-9 in the cells of the puzzle, and Row, Column and Square represent the position of the cell in the larger Sudoku square. Rows are numbered, columns assigned letters A-I and the squares are labeled S1-S9, reading left to right and top to bottom in the physical puzzle. Unfortunately, Row, Column and Square are all keywords in T-SQL, so I abbreviated them as Rw, Cl, Sq to keep the square brackets at bay in the code. Lastly, a Branch column is appended to allow the solver to make versions of the puzzle:
USE Sudoku
GO
CREATE TABLE dbo.Solution (
Value tinyint NOT NULL,
Rw tinyint NOT NULL,
Cl char(1) NOT NULL,
Sq AS ( dbo.SquareFor(Rw,Cl) ),
Branch int NOT NULL
)
Because the square is implied by the row and column location in the puzzle, the Sq value is computed using a UDF, as:
CREATE FUNCTION dbo.SquareFor
(
@Rw tinyint, @Cl char(1)
)
RETURNS char(2)
WITH SCHEMABINDING
AS
BEGIN
DECLARE @Result char(2)
SELECT @Result =
CASE
WHEN @Rw BETWEEN 1 AND 3 AND @Cl BETWEEN 'A' AND 'C' THEN 'S1'
WHEN @Rw BETWEEN 1 AND 3 AND @Cl BETWEEN 'D' AND 'F' THEN 'S2'
WHEN @Rw BETWEEN 1 AND 3 AND @Cl BETWEEN 'G' AND 'I' THEN 'S3'
WHEN @Rw BETWEEN 4 AND 6 AND @Cl BETWEEN 'A' AND 'C' THEN 'S4'
WHEN @Rw BETWEEN 4 AND 6 AND @Cl BETWEEN 'D' AND 'F' THEN 'S5'
WHEN @Rw BETWEEN 4 AND 6 AND @Cl BETWEEN 'G' AND 'I' THEN 'S6'
WHEN @Rw BETWEEN 7 AND 9 AND @Cl BETWEEN 'A' AND 'C' THEN 'S7'
WHEN @Rw BETWEEN 7 AND 9 AND @Cl BETWEEN 'D' AND 'F' THEN 'S8'
WHEN @Rw BETWEEN 7 AND 9 AND @Cl BETWEEN 'G' AND 'I' THEN 'S9'
END
RETURN @Result
END
[These code samples are just to explain the solution and won’t run in the order presented, but all the code in a runnable script is attached to this post.]
The domains for allowed values, rows and columns are very useful to explicitly define, as we’ll see later, so I have four reference tables containing the possible digits, row, column and square labels, all following this pattern:
CREATE TABLE dbo.Cls(
Label char(1) NOT NULL,
CONSTRAINT PK_Cls PRIMARY KEY CLUSTERED ( Label ASC )
) ;
INSERT dbo.Cls ( Label ) VALUES ( 'A' );
INSERT dbo.Cls ( Label ) VALUES ( 'B' );
INSERT dbo.Cls ( Label ) VALUES ( 'C' );
INSERT dbo.Cls ( Label ) VALUES ( 'D' );
INSERT dbo.Cls ( Label ) VALUES ( 'E' );
INSERT dbo.Cls ( Label ) VALUES ( 'F' );
INSERT dbo.Cls ( Label ) VALUES ( 'G' );
INSERT dbo.Cls ( Label ) VALUES ( 'H' );
INSERT dbo.Cls ( Label ) VALUES ( 'I' );
Given those objects, the rules of the game can be encoded with some foreign keys and unique constraints. Only one of each digit allowed per row, column and square, and only those values defined in the domain tables:
ALTER TABLE dbo.Solution WITH CHECK ADD CONSTRAINT FK_Solution_Vals FOREIGN KEY ( Value )
REFERENCES dbo.Digits ( Value );
ALTER TABLE dbo.Solution WITH CHECK ADD CONSTRAINT FK_Solution_Cols FOREIGN KEY( Cl )
REFERENCES dbo.Cls ( Label );
ALTER TABLE dbo.Solution WITH CHECK ADD CONSTRAINT FK_Solution_Rows FOREIGN KEY( Rw )
REFERENCES dbo.Rws ( Label );
CREATE UNIQUE NONCLUSTERED INDEX UniqueClValue ON dbo.Solution
(
Value ASC,
Cl ASC,
Branch ASC
);
CREATE UNIQUE NONCLUSTERED INDEX UniqueRowValue ON dbo.Solution
(
Value ASC,
Rw ASC,
Branch ASC
);
CREATE UNIQUE NONCLUSTERED INDEX UniqueSqValue ON dbo.Solution
(
Value ASC,
Sq ASC,
Branch ASC
);
Note the addition of Branch in the indexes, so that we can store multiple versions of a puzzle, while still being constrained to the puzzle rules.
Now, how to populate the Solution table with a puzzle? In the code sample attached to this post, there are several example puzzles prebuilt, but this is the method. There’s a default on the Solutions table to make new inserts as Branch 1:
ALTER TABLE dbo.Solution ADD CONSTRAINT DF_Solution_Branch DEFAULT (1) FOR Branch
That allows setting up a puzzle by entering just the values and row and column position:
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 5, 1, 'A' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 2, 1, 'I' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 5, 2, 'D' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 6, 2, 'E' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 7, 2, 'H' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 3, 2, 'I' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 9, 3, 'B' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 2, 3, 'D' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 5, 3, 'I' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 3, 4, 'C' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 8, 4, 'F' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 4, 4, 'G' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 6, 5, 'B' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 8, 5, 'C' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 7, 5, 'G' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 3, 5, 'H' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 4, 6, 'C' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 1, 6, 'D' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 8, 6, 'G' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 3, 7, 'A' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 1, 7, 'F' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 6, 7, 'H' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 6, 8, 'A' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 8, 8, 'B' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 9, 8, 'E' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 2, 8, 'F' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 4, 9, 'A' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 7, 9, 'I' );
It’s even faster to enter the values and positions using SSMS > Edit Top 200 (gasp!) on the Solutions table. I also found it helpful to back the puzzles up with SELECT INTO as I was working on the solver, because entering them is definitely tedious. In the code sample there are several example puzzles that can be loaded like this:
TRUNCATE TABLE dbo.Solution;
INSERT dbo.Solution ( Value, Rw, Cl )
SELECT Value, Rw, Cl
FROM dbo.Sample5;
One last bit of setup: even though a relational table doesn’t directly model the Sudoku square, requiring this other method to model the puzzle, it’s incredibly helpful to be able to view the puzzle like a Sudoku square. In fact, I found working on this was impossible without rendering the puzzle in the paper layout. The paper display can be presented using a pivot:
CREATE VIEW dbo.DisplaySolution AS
SELECT Branch, Rw,
MIN( CASE Cl WHEN 'A' THEN Value ELSE NULL END ) AS A,
MIN( CASE Cl WHEN 'B' THEN Value ELSE NULL END ) AS B,
MIN( CASE Cl WHEN 'C' THEN Value ELSE NULL END ) AS C,
MIN( CASE Cl WHEN 'D' THEN Value ELSE NULL END ) AS D,
MIN( CASE Cl WHEN 'E' THEN Value ELSE NULL END ) AS E,
MIN( CASE Cl WHEN 'F' THEN Value ELSE NULL END ) AS F,
MIN( CASE Cl WHEN 'G' THEN Value ELSE NULL END ) AS G,
MIN( CASE Cl WHEN 'H' THEN Value ELSE NULL END ) AS H,
MIN( CASE Cl WHEN 'I' THEN Value ELSE NULL END ) AS I
FROM dbo.Solution
GROUP BY Branch, Rw
SELECT
A, B, C, D, E, F, G, H, I
FROM dbo.DisplaySolution
ORDER BY Rw;
Now we have all the setup required to make the solver work. Returning to the original algorithm above, we have a few discrete tasks that we can break out into procedures:
- Deriving a table of all the possible candidate values for empty cells in the solution
- Searching that candidates table for solved cells and pushing those into the solution
- Eliminating candidates
- Branching the puzzle into versions if we get stuck in a situation where the rules we have in #2 and #3 don’t work
Candidates
So, deriving all possible candidate values is fairly straightforward: if we cross join the rows, the columns and the digits 1-9, we get a derived table of every possibility for any puzzle. From that we can eliminate the candidate values from the cells that are already in our specific puzzle. We can also eliminate all the candidates that conflict with the values in the given puzzle, in the same row, column or square.
In order to make this manageable, I used a “bag” analogy. Imagine a bag full of tiles, like scrabble tiles, where there are 9 ones, 9 twos, 9 threes, etc. From that bag we know that 1 one will go in the first Sudoku square, 1 one into the second, one into the third, and so on. It’s possible to “label” the tiles ahead of time with the 9x9 square to which they will belong even though we don’t know which specific cell they will ultimately occupy.
So this code starts with a CTE called “bag” that has just this structure, made using a cross join of all digits and all squares. From the bag, we discard a bunch of the tiles for the already-occupied cells in the solution. Next, we take the remaining contents of the bag and use it to make a Candidates table, further discarding tiles that conflict with the already-solved cells in the solution by occupying the same row, column or square. It’s a great CTE exercise:
CREATE PROCEDURE dbo.MakeCandidates
AS
BEGIN
-- Make a table containing all possible "candidate" values that could go in
-- empty cells in the puzzle
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Candidates]') AND type in (N'U'))
DROP TABLE dbo.Candidates;
WITH Bag AS (
SELECT Digits.Value, Squares.Label AS Sq, 1 AS Branch
FROM dbo.Digits cross join dbo.Squares
EXCEPT SELECT Value, Sq, Branch FROM dbo.Solution
),
AllCells AS (
SELECT Rws.Label AS Rw,
Cls.Label AS Cl,
( SELECT dbo.SquareFor( Rws.Label, Cls.Label ) ) AS Sq
FROM Rws CROSS JOIN Cls
),
Placements AS (
SELECT Bag.Value, allCells.Rw, allCells.Cl, allCells.Sq
FROM AllCells JOIN Bag ON allCells.Sq = bag.Sq
)
SELECT p.Value, p.Rw, p.Cl, p.Sq, 1 AS Branch
INTO dbo.Candidates
FROM placements p
WHERE NOT EXISTS( SELECT 1 FROM dbo.Solution
WHERE Solution.Rw = p.Rw AND Solution.Cl = p.Cl
)
AND NOT EXISTS ( SELECT 1 FROM dbo.Solution
WHERE Solution.Value = p.Value AND Solution.Rw = p.Rw
)
AND NOT EXISTS ( SELECT 1 FROM dbo.Solution
WHERE Solution.Value = p.Value AND Solution.Cl = p.Cl
)
AND NOT EXISTS ( SELECT 1 FROM dbo.Solution
WHERE Solution.Value = p.Value AND Solution.Sq = p.Sq
);
END
The result is that we have a Solution table with one version of the puzzle at its starting point, and a Candidates table that contains all the plausible values that might go into all empty cells in the puzzle. All the rows in both tables are identified as “Branch 1.” Going forward we’ll work back and forth between the two tables to solve the puzzle by process of elimination.
Solved Cells
The next chunk of the algorithm is to try to search the candidates for solved cells and push those into the solution. This code uses about four of the common Sudoku solving methods (those that I could sanely encode in T-SQL). Many sites describe them, such as http://www.sudoku129.com/puzzles/tips_intro.php or http://www.sudokuoftheday.com/pages/techniques-overview.php or your favorite search engine’s suggestions.
CREATE PROCEDURE dbo.PushSolvedCells ( @branch int, @count int OUTPUT )
AS
BEGIN
-- Cells having exactly one possible value must be that value
WITH solvedCells AS (
SELECT Value, p.Rw, p.Cl, p.Branch
FROM dbo.Candidates p
INNER JOIN (
SELECT Rw, Cl, Branch
FROM dbo.Candidates
GROUP BY Rw, Cl, Branch
HAVING COUNT(*) = 1
) AS onePossibleValue
ON p.Rw = onePossibleValue.Rw
AND p.Cl = onePossibleValue.Cl
AND p.Branch = onePossibleValue.Branch
AND p.Branch = @branch
) ,
-- Numbers that are possible in only one cell of a row must occupy that cell
unqInRow AS (
SELECT p.Value, p.Rw, p.Cl, p.Sq, p.Branch
FROM dbo.Candidates p
INNER JOIN (
SELECT Value, Rw, Branch
FROM dbo.Candidates
GROUP BY Value, Rw, Branch
HAVING COUNT(*) = 1
) AS uniques
ON p.Rw = uniques.Rw
AND p.Value = uniques.Value
AND p.Branch = uniques.Branch
AND p.Branch = @branch
),
-- Numbers that are possible in only one cell of a column must occupy that cell
unqInCol AS (
SELECT p.Value, p.Rw, p.Cl, p.Sq, p.Branch
FROM dbo.Candidates p
INNER JOIN (
SELECT Value, Cl, Branch
FROM dbo.Candidates
GROUP BY Value, Cl, Branch
HAVING COUNT(*) = 1
) AS uniques
ON p.Cl = uniques.Cl
AND p.Value = uniques.Value
AND p.Branch = uniques.Branch
AND p.Branch = @branch
),
-- Numbers that are possible in only one cell of a square must occupy that cell
unqInSquare AS (
SELECT p.Value, p.Rw, p.Cl, p.Sq, p.Branch
FROM dbo.Candidates p
INNER JOIN (
SELECT Value, Sq, Branch
FROM dbo.Candidates
GROUP BY Value, Sq, Branch
HAVING COUNT(*) = 1
) AS uniques
ON p.Sq = uniques.Sq
AND p.Value = uniques.Value
AND p.Branch = uniques.Branch
AND p.Branch = @branch
)
INSERT dbo.Solution( Value, Rw, Cl, Branch )
SELECT Value, Rw, Cl, Branch FROM solvedCells
UNION
SELECT Value, Rw, Cl, Branch FROM unqInRow
UNION
SELECT Value, Rw, Cl, Branch FROM unqInCol
UNION
SELECT Value, Rw, Cl, Branch FROM unqInSquare;
SET @count = @@ROWCOUNT;
END
The procedure processes one branch at a time given by @branch, and outputs the number of cells that were solved with the rules in the code. The count will become important when we need to track “stalled” branches of the solution.
Elimination
The next chunk of the algorithm says, “OK, now that those cells’ values are known, what other candidates are eliminated.” This is sort of the other half of the solving logic – take the remaining candidates and try to eliminate as many as possible, with other solving techniques:
CREATE PROCEDURE dbo.EliminateCandidates as
BEGIN
DELETE dbo.Candidates
WHERE EXISTS ( SELECT 1 FROM dbo.Solution s
WHERE s.Cl = Candidates.Cl
AND s.Rw = Candidates.Rw
AND s.Branch = Candidates.Branch
)
OR EXISTS ( SELECT 1 FROM dbo.Solution s1
WHERE s1.Value = Candidates.Value
AND s1.Rw = Candidates.Rw
AND s1.Branch = Candidates.Branch
)
OR EXISTS ( SELECT 1 FROM dbo.Solution s2
WHERE s2.Value = Candidates.Value
AND s2.Cl = Candidates.Cl
AND s2.Branch = Candidates.Branch
)
OR EXISTS ( SELECT 1 FROM dbo.Solution s3
WHERE s3.Value = Candidates.Value
AND s3.Sq = Candidates.Sq
AND s3.Branch = Candidates.Branch
);
-- Narrow rows by square: for each row, find values that can only be in one square
-- and delete other candidates for those values in the same square
WITH irs AS (
SELECT DISTINCT Value, Rw, Sq, Branch
FROM dbo.Candidates p
WHERE ( SELECT COUNT ( DISTINCT Sq )
FROM dbo.Candidates p2
WHERE p2.Rw = p.Rw
AND p2.Value = p.Value
AND p2.Branch = p.Branch ) = 1
)
DELETE dbo.Candidates
FROM dbo.Candidates p
JOIN irs ON p.Sq = irs.Sq
AND p.Value = irs.Value
AND p.Branch = irs.Branch
AND p.Rw != irs.Rw;
-- Narrow columns by square: for each column, find values that can only be in one square
-- and delete other candidates for those values in the same square
WITH irs AS (
SELECT DISTINCT Value, Cl, Sq, Branch
FROM dbo.Candidates p
WHERE ( SELECT COUNT ( DISTINCT Sq )
FROM dbo.Candidates p2
WHERE p2.Cl = p.Cl AND p2.Value = p.Value AND p2.Branch = p.Branch ) = 1
)
DELETE dbo.Candidates
FROM dbo.Candidates p
JOIN irs ON p.Sq = irs.Sq
AND p.Value = irs.Value
AND p.Branch = irs.Branch
AND p.Cl != irs.Cl;
END
Alternating these two procedures – locating solved cells and pushing them into the solution, and then eliminating other candidates – will solve most of the “easy” and some “intermediate” level puzzles in books or Sudoku apps. The next challenge is what to do when these rules are exhausted and can’t narrow the field of candidate values. A mathematician might be able to add more solving methods to the handful I have here and make the algorithm solve ANY puzzle. I’m not quite that smart, so for some advanced puzzles I had to introduce branching, or live with the fact that some would not be solved.
(Technically I did a lot of work implementing one more advanced method, which was pages and pages of code – but I found that it was never invoked in any sample puzzles I tried, perhaps because the logic was already implied by one of the other statements above.)
Branch
Branching presents a few challenges: first, how to know when to branch. Second, some branches are by definition invalid, so how to we identify and cull those? Third, simply, what are the logistics to make branches? How?
The answer I came up with to the first is just to count how many cells are solved in each pass through the solver. If we execute a pass and we don’t see any new values being inserted, then the branch(es) we have in play are obviously stalled. When they stall, make a new branch.
Invalid branches are interesting. They fail in one of two ways: either there is a cell that has NO candidates, which ultimately is caught by the check above, or there is a contradiction that surfaces where, for example, a digit appears to belong in two places. The contradictions violate the rules of the puzzle, and because we encoded the rules of the puzzle in a series of unique constraints they become constraint violations in SQL Server. They actually cause the script to throw an error where the implied solved cells can’t be inserted into the solution because they violate a unique index. I decided just to roll with that idea: branches work until they throw an error, at which point they can be discarded from the process.
So, when we run the PushSolvedCells procedure, and @count shows we didn’t solve any cells, then we branch. Branching works by identifying a cell in Candidates that has a minimal number of possible values (practically always 2). We then rank the possible values for that cell, insert the #1 possibility into the existing branch and use the #2 and higher in new copy(ies) of the solution and all the candidates:
CREATE PROCEDURE dbo.Branch AS
BEGIN
IF OBJECT_ID( 'tempdb..#branchCells' ) IS NOT NULL DROP TABLE #branchCells;
CREATE TABLE #branchCells ( Branch int, Value int, Rw int, Cl char(1) );
WITH
BranchCell AS (
-- Locate a cell with a small number of candidates
SELECT TOP ( 1 ) Rw, Cl, Branch
FROM dbo.Candidates
GROUP BY Rw, Cl, Branch
ORDER BY COUNT(*), Rw, Cl
) ,
LastBranch AS (
SELECT MAX( branch ) AS id FROM dbo.Solution
) ,
BranchVals as (
-- Find the candidate values for that cell
SELECT ( ROW_NUMBER() OVER ( order by c.value ) - 1 ) as branchOffset,
c.Value, c.Rw, c.Cl, c.Branch
FROM dbo.Candidates c
INNER JOIN BranchCell ON c.Rw = BranchCell.Rw
AND c.Cl = BranchCell.Cl AND c.Branch = BranchCell.Branch
) INSERT #branchCells ( Branch, Value, Rw, Cl )
SELECT
-- Number the candidates by branch
CASE WHEN branchOffset = 0 THEN BranchVals.Branch
ELSE branchOffset + LastBranch.id END AS Branch,
BranchVals.Value,
BranchVals.Rw,
BranchVals.Cl
FROM BranchVals
CROSS APPLY LastBranch;
-- For branch cells other than the first one, make a copy of all candidates
WITH RankedBranches AS (
SELECT ROW_NUMBER() OVER ( ORDER BY Branch ) rnk, Branch, Value, Rw, Cl
FROM #branchCells
) INSERT INTO dbo.Candidates ( Branch, Value, Rw, Cl, Sq )
SELECT r.Branch,
c.Value,
c.Rw,
c.Cl,
c.Sq
FROM RankedBranches r
JOIN dbo.Candidates c ON c.Branch = ( SELECT TOP(1) Branch FROM rankedBranches )
AND r.rnk > 1;
-- For branch cells other than the first one, make a copy of the solution
WITH RankedBranches AS (
SELECT ROW_NUMBER() OVER ( ORDER BY Branch ) rnk, Branch, Value, Rw, Cl
FROM #branchCells
) INSERT INTO dbo.Solution ( Branch, Value, Rw, Cl )
SELECT r.Branch,
s.Value,
s.Rw,
s.Cl
FROM RankedBranches r
JOIN dbo.Solution s ON s.Branch = ( SELECT TOP(1) Branch FROM rankedBranches )
AND r.rnk > 1;
-- Delete the specific candidates from the copies of all candidates
-- corresponding to the branching values
DELETE dbo.Candidates
FROM dbo.Candidates c
JOIN #branchCells b
ON c.Branch = b.Branch
AND c.Cl = b.Cl
AND c.Rw = b.Rw
AND c.Value != b.Value;
END
If you are still with me, wonderful! This is a bit of a marathon post, but we are near the end.
Solver
The last piece we need is just a “driver” script to loop through these procedures in the right order, with some flow control. The idea here follows the algorithm at the top of this post: run a loop, alternate inserting solved cells and removing candidates, until a solution surfaces. If the loop stalls, then branch, and if a branch fails, then delete it. The script is also decorated with statements to load and display the puzzle, to track progress, and a timer to see how long the solution took to produce:
USE Sudoku
-- Setup for a puzzle
TRUNCATE TABLE dbo.Solution;
-- Insert the values for a new puzzle here OR:
INSERT dbo.Solution ( Value, Rw, Cl )
SELECT Value, Rw, Cl
FROM dbo.Sample5; -- < Change this sample table to try other puzzles
-- Display the unsolved puzzle
SELECT
A, B, C, D, E, F, G, H, I
FROM dbo.DisplaySolution
ORDER BY Rw;
-- End Setup
-- Solve the puzzle
SET NOCOUNT ON;
DECLARE @starttime datetime2,
@endtime datetime2
SET @starttime = SYSDATETIME();
PRINT 'Compute all candidate values'
EXEC dbo.MakeCandidates;
DECLARE @currBranch int = 0,
@branchSolvedCells int = 0,
@passSolvedCells int = 0,
@puzzleSolved bit = 0;
WHILE 1 = 1
BEGIN
-- Attempt to move solved cells' values from Candidates to Solution
-- one branch at a time
-- This variable tracks whether the solutions are making any progress
-- and controls branching solutions that have "stalled"
SET @passSolvedCells = 0;
DECLARE branches CURSOR FAST_FORWARD
FOR
SELECT DISTINCT Branch FROM dbo.Solution;
OPEN branches;
FETCH NEXT FROM branches INTO @currBranch;
WHILE @@FETCH_STATUS = 0
BEGIN
BEGIN TRY
PRINT 'Insert solved cells branch ' + cast( @currBranch as varchar(5) );
EXEC dbo.PushSolvedCells @branch = @currBranch, @count = @branchSolvedCells OUTPUT;
IF ( ( SELECT COUNT(*) FROM dbo.Solution where Branch = @currBranch ) = 81 )
BEGIN
SET @puzzleSolved = 1;
BREAK; -- Solved!
END;
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2601 BEGIN
-- Constraint violation means branch has a contradiction and cannot be the solution
-- Remove invalid branch
PRINT ERROR_MESSAGE()
PRINT 'Constraint Violation, Purge Branch ' + cast( @currBranch as varchar(5) );
DELETE dbo.Solution WHERE Branch = @currBranch
DELETE dbo.Candidates WHERE Branch = @currBranch
END
ELSE BEGIN
-- If the code is correct we should never hit this block
RAISERROR( 'Unhandled error', 11, 1 );
BREAK;
END
END CATCH
SET @passSolvedCells += @branchSolvedCells;
FETCH NEXT FROM branches INTO @currBranch;
END
CLOSE branches;
DEALLOCATE branches;
IF ( @puzzleSolved = 1 ) BREAK; -- Solved in the last pass!
IF @passSolvedCells = 0
BEGIN
-- Our rules of elimination didn't isolate any new values
-- branch one of the solutions
PRINT 'Stuck! Branch!' ;
EXEC dbo.Branch ;
END ;
-- Remove candidate values from all branches using process of elimination
PRINT 'Eliminate Candidate Values'
EXEC dbo.EliminateCandidates ;
END ;
SET @endtime = SYSDATETIME();
-- Display the solution
SELECT
A, B, C, D, E, F, G, H, I
FROM dbo.DisplaySolution
WHERE Branch = ( SELECT Branch
FROM dbo.Solution
GROUP BY Branch
HAVING COUNT(*) = 81 )
ORDER BY Rw ;
SELECT DATEDIFF( millisecond, @starttime, @endtime ) AS Duration;
Working sample code for the database and the solver script will be attached to this post if you want to try this out. It was authored on SQL Server 2008 R2. I’d welcome any and all observations, suggestions, improvements! Just leave notes below. Enjoy!