THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Alexander Kuznetsov

Learning PostgreSql: UPDATE...FROM works differently

UPDATE...FROM command on PostgreSql may raise no errors, but produce completely different results. Later I shall provide a repro, but first let us briefly refresh how UPDATE...FROM works on Sql Server.

UPDATE...FROM on SQL Server ignores ambiguity

We shall need the following test data:

CREATE TABLE #Problems(
ProblemID INT NOT NULL, 
Problem VARCHAR(30) NOT NULL,
Solution VARCHAR(30) NULL);

CREATE TABLE #SuggestedSolutions(
SuggestedSolutionID INT NOT NULL,
ProblemID INT NOT NULL, 
IsAccepted CHAR(1) NULL,
Solution VARCHAR(30) NOT NULL);

INSERT INTO #Problems
        
( ProblemID, Problem, Solution )
VALUES  ( 0, 'Washer won''t drain', NULL),
( 
1, 'Kitchen looks dirty', NULL);

INSERT INTO #SuggestedSolutions
        
( SuggestedSolutionID ,
          
ProblemID ,
          
IsAccepted ,
          
Solution
        
)
VALUES  
(0, 0, 'Y', 'Rebalance the load'),
(
1, 0, 'N', 'Turn washer off then on'),
(
2, 1, 'N', 'Turn off the light'),
(
3, 1, 'Y', 'Clean up the kitchen');

While we are at it, let us also make sure that at most one proposed solution per problem can be accepted:

CREATE UNIQUE INDEX OneSolutionPerProblem 
ON #SuggestedSolutions(ProblemID) 
WHERE (IsAccepted='Y');

The following update ignores ambiguity: two proposed solutions match each problem, and the database engine picks one of them to update. It does not raise any errors. I have no idea how the engine chooses the value to update in case of ambiguity, so the output on your server may be different:

UPDATE #Problems SET Solution = s.Solution
FROM #Problems AS p, #SuggestedSolutions AS s 
WHERE p.ProblemID = s.ProblemID;

SELECT * FROM #Problems;

0  Washer won''t drain   Rebalance the load
1  Kitchen looks dirty   Turn off the light

We can narrow down the WHERE clause, so that there is no ambiguity at all, and the results of UPDATE are predictable:

UPDATE #Problems SET Solution = s.Solution
FROM #Problems AS p, #SuggestedSolutions AS s 
WHERE p.ProblemID = s.ProblemID AND s.IsAccepted='Y';

SELECT * FROM #Problems;

0  Washer won''t drain   Rebalance the load
1  Kitchen looks dirty   Clean up the kitchen
  

As we have seen, SQL Server interprets the FROM clause in this UPDATE as a correlated subquery - if it uniquely identifies a matching row, we can predict the results of UPDATE command.

PostgreSql interprets the same UPDATE...FROM differently

Let us set up test data again:

CREATE TEMP TABLE Problems(
ProblemID INT NOT NULL,
Problem VARCHAR(30) NOT NULL,
Solution VARCHAR(30) NULL);

CREATE TEMP TABLE SuggestedSolutions(
SuggestedSolutionID INT NOT NULL,
ProblemID INT NOT NULL,
IsAccepted CHAR(1) NULL,
Solution VARCHAR(30) NOT NULL);

CREATE UNIQUE INDEX OneSolutionPerProblem
ON SuggestedSolutions(ProblemID)
WHERE (IsAccepted='Y');

INSERT INTO Problems
        
( ProblemID, Problem, Solution )
VALUES  ( 0, 'Washer won''t drain', NULL),
(
1, 'Kitchen looks dirty', NULL);

INSERT INTO SuggestedSolutions
        
( SuggestedSolutionID ,
          
ProblemID ,
          
IsAccepted ,
          
Solution
        
)
VALUES  
(0, 0, 'Y', 'Rebalance the load'),
(
1, 0, 'N', 'Turn washer off then on'),
(
2, 1, 'Y', 'Turn off the light'),
(
3, 1, 'N', 'Clean up the kitchen');
Note: OneSolutionPerProblem used to be called a filtered index in SQL Server universe. In PostgreSql, it is a partial index.
Let us rerun the second update, which was producing predictable results on SQL Server:
UPDATE Problems SET Solution = s.Solution
FROM Problems AS p, SuggestedSolutions AS s
WHERE p.ProblemID = s.ProblemID AND s.IsAccepted='Y';

SELECT * FROM Problems;

0;"Washer won't drain""Rebalance the load" 1;"Kitchen looks dirty""Rebalance the load"
See how the second problem was updated with a solution for the first one? 
The reason is simple: PostgreSql interprets the FROM clause of this UPDATE as the following uncorrelated subquery:
SELECT s.Solution  FROM Problems AS p, SuggestedSolutions AS s 
WHERE p.ProblemID = s.ProblemID AND s.IsAccepted='Y';
So every row in its output is a potential match for every row in Problem, which is why we are getting these results.
To demonstrate this behavior one more time, let us add a problem without any suggested solutions, and rerun the update:
INSERT INTO Problems
        
( ProblemID, Problem, Solution )
VALUES  ( 2, 'Fridge is empty', NULL);
UPDATE Problems SET Solution = s.Solution
FROM Problems AS p, SuggestedSolutions AS s
WHERE p.ProblemID = s.ProblemID AND s.IsAccepted='Y';

SELECT * FROM Problems;
0;"Washer won't drain";"Rebalance the load"
1;"Kitchen looks dirty";"Rebalance the load"
2;"Fridge is empty";"Rebalance the load"
Rewriting UPDATE..FROM
The following script shows the correct way to update in PostgreSql
-- erase wrong values
UPDATE Problems SET Solution = NULL;

-- the correct UPDATE
UPDATE Problems SET Solution = s.Solution
FROM SuggestedSolutions AS s
WHERE Problems.ProblemID = s.ProblemID AND s.IsAccepted='Y';

SELECT * FROM Problems ORDER BY ProblemID;

0;"Washer won't drain""Rebalance the load" 1;"Kitchen looks dirty""Turn off the light" 2;"Fridge is empty"""
Conclusion
To ensure correct results, all UPDATE...FROM commands need to be rewritten, because UPDATE...FROM is interpreted differently by PostgreSql
 

 

Published Wednesday, November 20, 2013 12:33 PM by Alexander Kuznetsov

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

tobi said:

Wow is there any RDBMS on the planet that has consistently well-designed semantics? SQL Server and Postgres each have their own horrible quirks.

November 21, 2013 8:06 AM
 

Alexander Kuznetsov said:

Hi Tobi,

Can you suggest how would you implement the functionality equivalent to UPDATE ... FROM?

November 21, 2013 8:13 AM
 

Adam Machanic said:

Alex: UPDATE ... FROM should not be supported at all, by either DBMS. Subqueries and/or row-value constructors are defined in the standard as the proper way to handle this. There is absolutely no reason for ambiguous syntax here!

November 21, 2013 9:54 AM
 

Alexander Kuznetsov said:

Adam,

I am with you on row-value constructors.

However, if we need to update multiple columns via subqueries, performance is going to suck, or the optimizer needs to be very smart to recognize redundancies - and performance will still suck anyway, because the optimizer will become over-complicated ;).

November 21, 2013 10:18 AM
 

Adam Machanic said:

Alex:

Agreed, subqueries are definitely a tricky area. Not just for the optimizer but also for people writing the SQL. An UPDATE using subqueries for both restriction and projection is effectively twice as much code to maintain as compared with an UPDATE FROM or row-value constructor approach.

Anyway, whether or not the optimizer sucks is an implementation detail, not a concern of the standard. The only concern of the standard is creating a language that is more or less unambiguous. I believe they've done a good job of that, and then every vendor has completely messed up various pieces.

--Adam

November 21, 2013 10:56 AM
 

Alexander Kuznetsov said:

Adam,

I think that usually common mistakes indicate design flaws. If two vendors with so very different attitudes messed up with one and the same part of the standard, then maybe, just maybe the standard was flawed?

Good standards - and good designs - should be concerned whether they are implementable with reasonable effort or not, otherwise practitioners get around them rather then comply with them. Which is exactly what we are observing in this case, are we not?

November 21, 2013 11:19 AM
 

Alexander Kuznetsov said:

I think I accidentally deleted Adam's next comment: "Alex:

Nope, that's not at all what we're observing. RVC should be relatively easy to properly implement. At least as easy as UPDATE FROM, if the DBMS in question already supports correlated subqueries. It's just not a "hot" feature, nor have many vendors bothered with it, so most simply don't prioritize it. Better to implement XML support or Cloud support or some other buzzword-of-the-moment, which will drive a lot more uptake than will some little feature to make developers' lives easier.

--Adam"

November 21, 2013 12:19 PM
 

Alexander Kuznetsov said:

Adam,

I could ask why RVC is not implemented here, and so can you: http://postgresql.1045698.n5.nabble.com/PostgreSQL-hackers-f1928748.html

Also anyone can download the source code and see if this is easy to implement or not. What do you think?

November 21, 2013 12:46 PM
 

Adam Machanic said:

Alex, if you like reading other people's source code, then go for it. I have better things to do with my time :-)

November 21, 2013 12:54 PM
 

Alexander Kuznetsov said:

November 21, 2013 2:11 PM
 

AlexK said:

November 22, 2013 4:34 PM

Leave a Comment

(required) 
(required) 
Submit

About Alexander Kuznetsov

Alex Kuznetsov has been working with object oriented languages, mostly C# and C++, as well as with databases for more than a decade. He has worked with Sybase, SQL Server, Oracle and DB2. He regularly blogs on sqlblog.com, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles on simple-talk.com and devx.com. Currently he works at DRW Trading Group in Chicago, where he leads a team of developers, practicing agile development, defensive programming, TDD, and database unit testing.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement