THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

INSERT SELECT is Broken. Ask Microsoft to Fix It.

Imagine that you're moving thousands or millions of rows between two tables. Maybe it's between a staging table and a data warehouse in an ETL process. Maybe you're manipulating some data via a temp table as you're preparing lookup data for your OLTP system. Maybe you're preparing data for an end user.

It doesn't really matter what your use case, because there are so many of them. We, as database developers, spend all day moving data back and forth.

Unfortunately, sometimes our processes break. It's just a fact of life. Sometimes a bad row will sneak in and will violate a constraint, or won't be implicitly convertable to the target data type, or whatever. It happens all the time, and when it does debugging the problem is excruciatingly painful and overly time-consuming.

The error messages aren't great, it's often difficult to identify the problem row, and so we wind up manually editing out parts of code and slogging through row after row hoping to find the issue.

When this happens, the entire INSERT, or UPDATE, or whatever breaks. Even if we would have really liked all of the rows except the single bad row to get through.

Wouldn't it be nice if there were a better way to tackle this problem?

Someone named Dan Holmes thought up the perfect solution, and posted a Connect item

Microsoft, in its infinite wisdom, has responded that this scenario is "not common enough," and has stated that it will be closing the issue. Let's not let that happen.

If you agree that this is a major concern and that this feature would help you, please click on the link above and vote. Leave a comment, too.

Thank you.

 

Note: This doesn't just apply to INSERT SELECT, but I wanted to get your attention. And if you're reading this, it worked!

Published Wednesday, May 15, 2013 1:42 PM by Adam Machanic

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

AlexK said:

Such scenarios just do not happen in Open Source products. If, for example, you want an improvement in PostgreSql, you can just go ahead and do it yourself, or ask somebody to do it for you. You do not have to plead online.

You may help yourself and have a robust working solution within just a few days.

May 15, 2013 1:41 PM
 

Ben Thul said:

The open source argument is a bit of a strawman. While you're absolutely right that you're free to d/l the source and hack whatever feature you want in there, the maintainers are under no obligation to accept your patch once it's complete. At that point, you have a couple of options: fork (and subsequently maintain) the project or keep your own private version and it's either static from that point forward or you have to keep accepting patches from the mainline into your local branch. Don't get me wrong: I *love* open source. But to say that it doesn't suffer from these sorts of problems is idealizing the process.

May 15, 2013 2:27 PM
 

AlexK said:

Open source does not have the following problems: I do not have to plead online with someone I do not know, and who can dismiss my needs as "not a big issue", which is an exact quote. I do not have to wait for indefinite time for a fix I need.

In other words, I am in control, and can get the fix I need soon, which is often very cheap as compared to having to live without the fix and come up with workarounds every time I encounter the problem.

Of course, my fix might not make it into the main version. I might choose not to submit it, or it may be rejected. That is a different sort of problem. Anyway, I am not completely helpless, my needs are not dismissed, I can make my own choices, and I can have my problem resolved quickly.

Either way, merging from a branch into another with git is very easy, and overall I can be much better off even with periodic merge, as compared to having to implement hundreds of repetitive workarounds.

May 15, 2013 2:49 PM
 

jchang said:

In the deep past, SQL 2K time frame, when SQL wish implemented many of my suggests. Since 2008 or so, everything has come back: not common enough.

Of course they are considering multi-core processors not common enough,

so I regard Connect as a waste time,

either work around the problem, or look for another product

May 15, 2013 3:35 PM
 

Adam Machanic said:

Alex, let's be realistic. Most IT people have neither time nor inclination to spend hours and days doing the in-depth study required to get to the point where they can make such a modification -- let alone actually doing it. Most open source users don't ever actually look at the source code. They download and use a precompiled version.

Sure, maybe YOU can implement a patch like the one described in this Connect item. But the VAST majority of people can't, and won't. And so your argument completely breaks down.

No matter what product you use, unless you happen to be both technically elite and have a huge amount of free time, you absolutely ARE dependent upon others to make sweeping changes.

May 15, 2013 3:54 PM
 

Dave said:

Do you mean something like DML Error Logging in Oracle?  http://www.orafaq.com/node/76

This has been a feature of Oracle since 10gR2, why would Microsoft say it isn't common if Oracle has had the feature since 2005?

May 15, 2013 3:59 PM
 

AlexK said:

Adam,

I am being absolutely realistic. I did make a few changes to a rather complex open source library, so I know what's involved. I did not bother to submit it for public use, so I merge my changes a few times every year. No big deal.

I know quite a few people who have paid for various customizations, and it was not very expensive, nor it was difficult to find someone capable of doing what they wanted.

Overall, it may be cheaper to pay and get your problem fixed, than to waste time on workarounds.

May 15, 2013 4:25 PM
 

MattK said:

Could those who made the decision to close the issue have a lack of understanding about typical set based operations in an RDBMS, and thinks that rows are always processed in a loop one-at-a-time?

Or perhaps have been given the agenda of not approving features that conflict with SSIS, which has this mechanism, but is a poor substitute for anything that can be performed more efficiently in T-SQL?

May 15, 2013 7:09 PM
 

Ian Yates said:

The other approach to this is to file a design change request via the MS support people rather than via Connect.  The former costs but you get to at least correspond one on one with someone via email about it.  I'm go to Connect now to vote :)

I suppose MS would say that we should use SSIS for this sort of thing, but frankly a few SQL statements is often much faster to develop, test, deploy and execute whilst being easier to comprehend and less likely to break between versions.

May 15, 2013 7:31 PM
 

Steven Durette said:

Even if they would just tell me in the error the true reason the data couldn't be inserted and the offending row. For example, If the data violated a constraint tell me the constraint name and the offending row. If it was because of passing a datetime2 date into a smalldatetime column, tell me that (instead of just something like string or binary data would be truncated) and tell me the row of data. If I don't have to spend hours searching for an offending row then I get to be more productive. That makes my company more profitable, which means more budget, which means I can buy new versions of software!  Come on make it easier for my to by your stuff!.

May 16, 2013 10:55 AM
 

sql_noob said:

i've had this happen in SSIS and it outputs the error and the row where it happened. or at least enough info to make the offending data fairly easy to find.

May 16, 2013 11:04 AM
 

dan holmes said:

Thanks for reposting Adam.  your reach is much larger than mine.

May 16, 2013 11:34 AM
 

aspiringgeek said:

Got my vote.  Lordie, the amount of time I’ve squandered troubleshooting these issues can’t be measured.  @MattK, the SQL PM who's closing the item is a competent, indeed brilliant engineer.  It's too bad he's unable to push this through.

May 16, 2013 6:08 PM
 

bstamos said:

Agree with Adam. AlexK can safely be considered an exclusion to the rule; and not the norm.

May 17, 2013 3:40 PM
 

Geoff said:

Regardless of the open source idea, this idea is long past due and would save people countless man-hours which could be better spent doing something else.

May 21, 2013 4:07 PM
 

Wes Brown said:

Hey AlexK, while you are adding this to PostgreSQL can you build out parallel execution plans? There are reasons we buy products over other products or open source solutions. Modifying a library is a FAR cry over modifying something as complex as an optimizer or data engine. I know, I have written both, yet I still stand by SQL Server and other commercial offerings in this area. I also use PostgreSQL where I can as well but your argument in this case is a complete troll on this topic. I buy SQL Server for the same reason I don't build my own cars or planes. Others do it better.

-wes

July 23, 2013 2:34 PM
 

Cody said:

I'm a little confused about exactly how this is going to be implemented.

I've noticed that the error messages have changed slightly from version to version of SQL. In 2012 you often (but I'm not sure if always, because sometimes it's missing) get told in the returned error message what the constraint was and what the duplicate primary key was.

Except yes, by the time it happens, that information is already too late because the original data source has sometimes been lost and you need to recreate it.

July 23, 2013 10:40 PM
 

dan holmes said:

October 15, 2014 8:16 AM
 

Adam Machanic said:

@dan

Thanks for the update. I'm totally gutted. Sent a personal e-mail to my last contact on the programmability team. (Everyone else I knew has left!) Worth a shot at least. :-(

--Adam

October 15, 2014 10:23 AM

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement