THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Alexander Kuznetsov

Using constraints to enforce uniqueness of ordered sets of rows

Even the simplest data integrity rules are not easy to implement in the database if instead of individual rows we need to deal with groups or subsets.

For example, making sure that a column value is unique in a table is as trivial as creating a unique constraint or index. However, what are we supposed to do if we do not want to store a group of values twice? What if we store cooking recipes as sequences of steps (sets of values), and we do not want to store one and the same recipe (one set of values) more than once?

In the following sections we shall solve this problem using only constraints and indexed views, just for fun, as a SQL puzzle. (Of course, we could use a trigger or some other thing with subqueries - but that would be a different puzzle).

At the time of this writing I have not yet used in production the technique I am going to describe. I think it is quite complex and looks tricky, and the reason is simple: in my opinion RDBMS have not been designed to deal with this kind of problems. As such, we have to come up with complex workarounds.

Enforcing Uniqueness of Ordered Sets, of Limited Size Only

In this solution we shall deal with sets that have at most five elements.

In some cases, the order of elements in a subset matters. For example, the following two sequences of instructions yield different results.

Cooking a hard-boiled egg:

 

  • Put eggs into pan, add water
  • Bring water to boiling, boil 4 minutes;
  • Remove eggs and cool them down in cold water

 

Keeping the egg uncooked:

 

  • Put eggs into pan, add water
  • Remove eggs and cool them down in cold water
  • Bring water to boiling, boil 4 minutes

 

These two recipes are clearly different, and they yield different results. Of course, the second recipe makes no sense, but we want the database to store them both.

Suppose, however, that the database should not be able to store one and the same recipe more than once.

Setting up tables and test data

Before implementing this requirement, let us create the tables and add test data. Both recipes discussed above involve the same three steps:


CREATE TABLE dbo.RecipeSteps
    
(
      
RecipeStepId INT NOT NULL ,
      
Intsructions VARCHAR(50) NOT NULL ,
      
CONSTRAINT PK_RecipeSteps PRIMARY KEY ( RecipeStepId ) 
    ) ;
GO
INSERT  INTO dbo.RecipeSteps
        
( RecipeStepId, Intsructions )
VALUES  ( 1, 'Put eggs into pan, add water' ),
        ( 
2, 'Bring water to boiling, boil 4 minutes' ),
        ( 
3, 'Remove eggs and cool them down in cold water' ) ; 

 

Let us upload upload the first recipe, as follows:

 

CREATE TABLE dbo.RecipesStepByStep
    
(
      
RecipeId INT NOT NULL ,
      
StepNumber SMALLINT NOT NULL ,
      
RecipeStepId INT NOT NULL ,
      
CONSTRAINT PK_RecipesStepByStep PRIMARY KEY ( RecipeId, StepNumber ) ,
      
CONSTRAINT FK_RecipesStepByStep_RecipeSteps FOREIGN KEY ( RecipeStepId ) REFERENCES dbo.RecipeSteps ( RecipeStepId )
    ) ;
GO

INSERT  INTO dbo.RecipesStepByStep
        
( RecipeId, StepNumber, RecipeStepId )
VALUES  ( 1, 1, 1 ),
        ( 
1, 2, 2 ),
        ( 
1, 3, 3 ) ;

SELECT  StepNumber ,
        
Intsructions
FROM    dbo.RecipeSteps AS rs
        
JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
WHERE   sbs.RecipeId = 1 ;

StepNumber Intsructions
---------- --------------------------------------------------
1          Put eggs INTO pan, ADD water
2          Bring water 
TO boiling, boil 4 minutes
3          Remove eggs 
AND cool them down IN cold water

 

Let us also add the second recipe:



INSERT  INTO dbo.RecipesStepByStep
        
( RecipeId, StepNumber, RecipeStepId )
VALUES  ( 2, 1, 1 ),
        ( 
2, 2, 3 ),
        ( 
2, 3, 2 ) ;

SELECT  StepNumber ,
        
Intsructions
FROM    dbo.RecipeSteps AS rs
        
JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
WHERE   sbs.RecipeId = 2 ;

StepNumber Intsructions
---------- --------------------------------------------------
1          Put eggs INTO pan, ADD water
2          Remove eggs 
AND cool them down IN cold water
3          Bring water 
TO boiling, boil 4 minutes 

 

Right now nothing prevents us from adding a second copy of the first recipe. You can run the following script and see for yourself that it succeeds:

 


BEGIN TRANSACTION ;

INSERT  INTO dbo.RecipesStepByStep
        
( RecipeId, StepNumber, RecipeStepId )
VALUES  ( 3, 1, 1 ),
        ( 
3, 2, 2 ),
        ( 
3, 3, 3 ) ;

SELECT  StepNumber ,
        
Intsructions
FROM    dbo.RecipeSteps AS rs
        
JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
WHERE   sbs.RecipeId = 3 ;

ROLLBACK ;

To enforce uniqueness of recipes, we shall use an indexed view and a unique index on top of it.

To keeps the scripts simple, let us assume that recipies cannot consist of more than five steps:

ALTER TABLE dbo.RecipesStepByStep 
ADD CONSTRAINT RecipesStepByStepMax5 CHECK (StepNumber BETWEEN 1 AND 5) ;

Let us also create an indexed view that stores all the steps of a recipe in one row.

Once we have all the steps in one row, we can use a unique index to make sure that a sequence of steps is unique. We want the contents of the indexed view to look as follows, with the first line representing the first recipe:

 


RecipeId    Step1       Step2       Step3       Step4       Step5
----------- ----------- ----------- ----------- ----------- -----------
1           1           2           3           0           0
2           1           3           2           0           0

 

Note: To get around the limitations of indexed views, we need to make sure that all the pivoted columns are not nullable. This is why columns Step4 and Step5 are zeroes instead of nulls – otherwise we would not be able to create the unique index RecipesStepByStepsPivoted_UniqueRecipes in the following script:

 


CREATE VIEW dbo.RecipesStepByStepsPivoted
WITH SCHEMABINDING
AS
SELECT 
RecipeId, 
count_big(*) AS Cnt,
SUM(CASE WHEN StepNumber = 1 THEN RecipeStepId ELSE 0 END) AS Step1,
SUM(CASE WHEN StepNumber = 2 THEN RecipeStepId ELSE 0 END) AS Step2,
SUM(CASE WHEN StepNumber = 3 THEN RecipeStepId ELSE 0 END) AS Step3,
SUM(CASE WHEN StepNumber = 4 THEN RecipeStepId ELSE 0 END) AS Step4,
SUM(CASE WHEN StepNumber = 5 THEN RecipeStepId ELSE 0 END) AS Step5
FROM dbo.RecipesStepByStep
GROUP BY RecipeId ;
GO

CREATE UNIQUE CLUSTERED INDEX RecipesStepByStepsPivoted_CI
ON dbo.RecipesStepByStepsPivoted(RecipeId) ;
GO

CREATE UNIQUE INDEX RecipesStepByStepsPivoted_UniqueRecipes
ON dbo.RecipesStepByStepsPivoted(Step1, Step2, Step3, Step4, Step5) ;
GO
  
SELECT  RecipeId ,
        
Step1 ,
        
Step2 ,
        
Step3 ,
        
Step4 ,
        
Step5 
FROM    dbo.RecipesStepByStepsPivoted ;


RecipeId    Step1       Step2       Step3       Step4       Step5
----------- ----------- ----------- ----------- ----------- -----------
1           1           2           3           0           0
2           1           3           2           0           0

It is easy to verify that the unique index built on top of the indexed view does enforce the uniqueness of recipes:


BEGIN TRANSACTION ;

INSERT  INTO dbo.RecipesStepByStep
        
( RecipeId, StepNumber, RecipeStepId )
VALUES  ( 3, 1, 1 ),
        ( 
3, 2, 2 ),
        ( 
3, 3, 3 ) ;

SELECT  StepNumber ,
        
Intsructions
FROM    dbo.RecipeSteps AS rs
        
JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
WHERE   sbs.RecipeId = 3 ;

ROLLBACK ;

Cannot INSERT duplicate KEY row IN object 'dbo.RecipesStepByStepsPivoted' WITH UNIQUE INDEX 'RecipesStepByStepsPivoted_UniqueRecipes'.
The statement has been terminated.


RecipeId    Step1       Step2       Step3       Step4       Step5
----------- ----------- ----------- ----------- ----------- -----------
1           1           2           3           0           0
2           1           3           2           0           0

 

Note: because we are using an undex to enforce uniqueness, we must set a limit on the number of elements. It does not have to be as low as five – we set it that low just to keep the examples short.

Let us close a couple of loopholes in this limited solution and move on to another problem.

First, we used zeroes in the indexed view to indicate that there is no such step in the recipe. In our example both recipies had three steps, so the indexed view had zeroes in Step4 and Step5 columns in both rows. To distinguish between missing steps and valid step IDs, we need to make sure that zero is not a valid ID:

 

ALTER TABLE dbo.RecipeSteps
ADD CONSTRAINT CHK_RecipeSteps_PositiveId CHECK ( RecipeStepId > 0 );

Also we need to make sure that step numbers in the recipe have no gaps, otrherwise we still can store the same recipe twice. The following script demonstrates the loophole:

 

BEGIN TRANSACTION ;

INSERT  INTO dbo.RecipesStepByStep
        
( RecipeId, StepNumber, RecipeStepId )
VALUES  ( 3, 1, 1 ),
        ( 
3, 3, 2 ),
        ( 
3, 5, 3 ) ;

SELECT  StepNumber ,
        
Intsructions
FROM    dbo.RecipeSteps AS rs
        
JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
WHERE   sbs.RecipeId = 3 ;

ROLLBACK ;

StepNumber Intsructions
---------- --------------------------------------------------
1          Put eggs INTO pan, ADD water
3          Bring water 
TO boiling, boil 4 minutes
5          Remove eggs 
AND cool them down IN cold water
 

 

To close the loophole, we need to make sure that step numbers have no gaps. The following script enforces the rule:

 

ALTER TABLE dbo.RecipesStepByStep
ADD PreviousStepNumber AS CAST(CASE WHEN StepNumber > 1 THEN StepNumber-1 END AS SMALLINT) PERSISTED ;
GO

ALTER TABLE dbo.RecipesStepByStep
ADD CONSTRAINT FK_RecipesStepByStep_PreviousStep
FOREIGN KEY(RecipeId, PreviousStepNumber) 
REFERENCES dbo.RecipesStepByStep(RecipeId, StepNumber);

 

Conclusion

As we have seen, enforcing the uniqueness of ordered groups of rows with constraints is doable, although somewhat complex. I am not sure if I would use this approach with a large production table. I think that RDBMS is not designed to natively solve these kinds of problems - this is why solving this apparently simple problem is so complicated.

If you are using some platform other than RDBMS to solve such problems, please comment this post and share your experiences.

 

 

 

Published Monday, January 28, 2013 3:10 PM by Alexander Kuznetsov
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Jason said:

Okay maybe I don't understand the problem statement clearly.  I can't help but restate it as "I have to store recipes and their associated steps, each recipe must have unique steps (meaning no recipes may have the same step number twice as in Step 2 Boil Water and Step 2 Boil water)"

This article claims that RDBMS are not equipped to handle this.  I disagree completely.  Create a table, tRecipe (iRecipeId INT, vcRecipeName VARCHAR(50),...).  Create another table tRecipeSteps (iRecipieStepId INT, iRecipeID,vcRecipeStepName VARCHAR(50)...) then add a unique constrain to (iRecipeId and iStepId)

Again, I may be misunderstanding the problem statement, but unique groups are very much within the design of RDBMS.

February 4, 2013 11:42 AM
 

AlexK said:

Jason,

What you are suggesting will ensure that every step can occur only once in a recipe. For example, you won't be able to flip your burger twice, or add one cup of flour twice. It will not prevent us from storing a recipe with exactly the same steps twice.

This is not what we want to accomplish.

February 4, 2013 11:57 AM
 

Gabriel said:

What you're essentially doing is creating a narrow table, then creating a pivoted version of that table in the indexed view. I don't know how this data is queried, but is this more beneficial than creating the original table as a pivoted table (RecipeId, Step1Id, Step2Id, Step3Id, Step4Id,...) and creating a unique index on the Step1Id columns?

July 11, 2013 9:50 AM
 

AlexK said:

Gabriel,

A pivoted table would violate 1NF, which would cause all kinds of problems with data integrity.

July 11, 2013 1:47 PM
 

Geri Reshef said:

I would use XML data type to store the receips:

--Create the table

If Object_ID('tempdb..#RecipeSteps2','U') Is Not Null Drop Table #RecipeSteps2;

Go

Create Table #RecipeSteps2(ID Int Primary Key,

Recipe XML,

RecipeIdx As Cast(Recipe As Varchar(900)));

Create Unique Index IX_#RecipeSteps2 On #RecipeSteps2(RecipeIdx);

Go

--Insert the receipes

Insert

Into #RecipeSteps2(ID,Recipe)

Select 1 ID,

(Select *

From (Values (1,1),

(2,2),

(3,3)) UniqueColumn(StepNumber, RecipeStepId)

For XML Raw('Recipe'), Root('Root')) Recipe;

Insert

Into #RecipeSteps2(ID,Recipe)

Select 2 ID,

(Select *

From (Values (1,1),

(2,3),

(3,2)) UniqueColumn(StepNumber, RecipeStepId)

For XML Raw('Recipe'), Root('Root')) Recipe;

--A duplicate receipe is rejected

Insert

Into #RecipeSteps2(ID,Recipe)

Select 3 ID,

(Select *

From (Values (1,1),

(2,2),

(3,3)) UniqueColumn(StepNumber, RecipeStepId)

For XML Raw('Recipe'), Root('Root')) Recipe;

--Data extracrtion

Select * From #RecipeSteps2;

Select RS2.ID,

RS2xml.Loc.query('.').value('(/Recipe/@StepNumber)[1]','int') StepNumber,

RS2xml.Loc.query('.').value('(/Recipe/@RecipeStepId)[1]','int') RecipeStepId

From    #RecipeSteps2 RS2

Cross Apply Recipe.nodes('/Root/Recipe') As RS2xml(Loc)

Order By RS2.ID,

StepNumber;

September 9, 2013 2:52 PM
 

AlexK said:

Geri, if we do not store rows in normal tables, we throw away a lot of data integrity which could be enforced by normalization and constraints. This is why I cannot imagine a case when we would care for data integrity and go for your solution.

September 10, 2013 10:49 AM

Leave a Comment

(required) 
(required) 
Submit

About Alexander Kuznetsov

Alex Kuznetsov has been working with object oriented languages, mostly C# and C++, as well as with databases for more than a decade. He has worked with Sybase, SQL Server, Oracle and DB2. He regularly blogs on sqlblog.com, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles on simple-talk.com and devx.com. Currently he works as an agile developer.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement