SQLblog.com - The SQL Server blog spot on the web

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

  • Tabular Data Packages and a BIML challenge

    Update: the challenge set below has been answered by Scott Currie, who also happens to be the inventor of BIML. He has made his work available at https://github.com/bimlscript/BETDPI

    In February 2004 I joined a company called Conchango and had the pleasure of working with some very very smart people (Mick Horne, Dan Perrin, Pete Spencer, Paul Mcmillan, Steve Wright amongst others) who schooled me in the ways and means of building data integration solutions. One piece of IP that Conchango had revolved around a means of transporting a collection of data files in a well-known format. We called it the "Generic Feed Interface" and it broke down like this:

    • Data was kept in CSV files. There was no limit to how many files were in a collection.
    • Each collection had two extra files, an inventory file and a metadata file. Both of these were also CSV files.
    • The inventory file listed all of the data files
    • The metadata file defined the data type of each column in each data file

    Here’s a very simple example of what this looked like, a collection consisting of one datafile:

    [Screenshots of the example inventory, metadata and data files]
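
    Those screenshots aren't reproduced here, so, purely as a hypothetical illustration (the file names, columns and types below are invented rather than Conchango's actual layout), a single-file collection might have looked something like this:

    inventory.csv
    FileName
    customer.csv

    metadata.csv
    FileName,ColumnName,DataType
    customer.csv,CustomerId,int
    customer.csv,CustomerName,varchar(100)

    customer.csv
    CustomerId,CustomerName
    1,Acme Ltd
    2,Wayne Enterprises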

    As well as this format for defining data, our tangible IP consisted of two DTS packages that could:

    • be pointed at a database and produce such a collection
    • be given a collection and import it into a specified database, creating target tables if required

    This was used to extract data from systems of record which could then later be imported elsewhere for additional processing (e.g. as a source for a data warehouse). We liked this approach because we were loosely coupling our data warehouse from the systems of record, which brought benefits such as the ability to extract data from source at a different time from that at which it was loaded into the target. It was a great piece of IP and was used at a time that heralded a very successful period for us in the Microsoft BI space. Doubtless many of you reading this have invented and/or used similar mechanisms for moving data.


    It was with interest, then, that earlier this week I was alerted (by Matt Smith) to an effort to achieve the same thing, called Tabular Data Package.


    Tabular Data Packages use a single JSON file to serve the same purpose as our inventory and metadata files, but it is the same basic concept: a well-known format for transporting data in human-readable files. Tabular Data Packages are advocated by the Open Data Institute; see Jeni Tennison's blog post on the subject, 2014: The Year of CSV.
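
    To make that concrete, the descriptor file (datapackage.json) essentially lists the CSV files in the package and the type of each column within them. Purely as a sketch (the package, file and field names below are invented, and you should check the Tabular Data Package spec for the definitive shape), one could knock up a minimal descriptor from PowerShell like this:

    $descriptor = @{
        name      = "my-database-export"                    # invented package name
        resources = @(
            @{
                path   = "data/customer.csv"                # one entry per CSV file in the package
                schema = @{
                    fields = @(
                        @{ name = "CustomerId";   type = "integer" }
                        @{ name = "CustomerName"; type = "string"  }
                        @{ name = "CreatedDate";  type = "date"    }
                    )
                }
            }
        )
    }
    # Write the descriptor alongside the CSV files
    $descriptor | ConvertTo-Json -Depth 10 | Out-File "datapackage.json" -Encoding UTF8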

    It occurs to me that it would be really useful to have a couple of SSIS packages that work with a Tabular Data Package in the same manner that our DTS packages worked with our data file collections all those years ago. Then again, using the SSIS object model to dynamically generate packages based on some known metadata (which is what would be required here) is notoriously difficult; it would be better if there existed a code-generation tool for SSIS packages. Luckily such a thing exists: it's called BIML, it is free, and it is becoming a very popular means of developing SSIS solutions.

    Therefore I’m issuing a friendly challenge. Can anyone out there build a BIML script that can, given a database connection string, generate a Tabular Data Package containing all the data in that database? Furthermore, can someone also build a BIML script that can consume a Tabular Data Package and push all the data therein into a SQL Server database?

    The challenge is set. Is anyone game?

    @Jamiet

  • Why won’t you implement my little feature request, Microsoft?

    I’ve been an active member of the Microsoft developer community* for many years now and in that time one of my frustrations has been numerous futile attempts to get new features into SQL Server and SQL Server Integration Services. As of today, 27th March 2015, I’ve submitted 109 bugs and 289 suggestions


    to Microsoft Connect, and while there are occasional successes (the TOKEN function in SSIS sticks in my mind as one feature request that led directly to inclusion in the product**) the most common response is “Closed as won’t fix” (which is almost a meme in itself these days). I’m as guilty as anyone of getting irate at such responses.

    In the cold light of day I like to think I can be an empathetic kind of guy and can accept that getting a feature implemented in a behemoth product such as SQL Server is perhaps not as simple as a developer bashing out a few lines of code and committing to source control. With that in mind, today (thanks to Paul Stovell) I stumbled upon a very old blog post from Eric Lippert, How many Microsoft employees does it take to change a lightbulb?, which explains some of the contortions one has to go through to get a feature implemented in a Microsoft product; only one of these (the first one) actually involves writing the code.


    Now, granted, there is an argument to say they’re making it more difficult than it needs to be, and 12 years on from that blog post shipping software is arguably easier than it was back then, but it’s worth remembering that implementing features for millions of customers is likely a different category of problem to any that we developers who use these products have to deal with (the product I work on arguably counts its customers in tens rather than millions). Call me a Microsoft apologist if you like, you might be right, but I do think that some people should perhaps weigh up what their “5 minute fix” actually entails before they start ranting about “Closed as won’t fix” on social media. I include myself in that.

    @Jamiet 

    * assuming you consider SQL server developers to be members of the Microsoft developer community. That is up for debate of course.

    **I remember Matt Masson telling me at the time that he implemented this on a plane journey home to Canada one day

  • Analysing SQLBlog using Power Query

    In December 2011 I wrote a blog post entitled Querying RSS feed subscriber count on Google Reader using Data Explorer in which I used a product called Data Explorer, which was in beta at the time, to find out how many people subscribed on Google Reader to various bloggers on http://sqlblog.com. Here were the results:

    It was a nice demo of what was possible using Data Explorer; however, it did have some limitations, mainly that I had to type in the list of bloggers and their URLs. It would have been much more elegant to have Data Explorer request a list of bloggers from http://sqlblog.com and iterate over that list, querying for information about each in turn.

    Times move on and today, 3 and a bit years later, Google Reader is defunct and Data Explorer has morphed into Power Query. It looks different and has lots of new features which make it much more conducive to garnering information from the web. I thought it might be time well spent (and kinda fun) to revisit my previous blog post and see if Power Query can collate more useful information regarding http://sqlblog.com than it could back then.


    tl;dr

    If you can’t be bothered reading the detail of this post then just take a look at this screenshot that I shared on Twitter earlier today of a Power BI dashboard that shows information pertaining to http://sqlblog.com:

    [Screenshot of the Power BI dashboard]

    This dashboard shows:

    That screenshot is taken from http://powerbi.com, which is a place where Power BI reports (that are based on Power Query queries) can be stored and shared. Unfortunately they can only be shared with people whose email domain is the same as yours, so I can’t share a link directly to the page, which is a real shame because Power BI reports are interactive and the benefit is much greater when one is able to interact with them. If you would like this situation to change (and you should) then go and vote for these requests:

    The real work here goes on in the Power Query query that retrieves the data that underpins these reports. If you’d like to know more about that, read on!

    The Power Query part

    Power Query is, to all intents and purposes, a draggy-droppy-pointy-clicky UI over a functional language called M. I have shared the M queries that I built to obtain the data depicted above in this GitHub gist: https://gist.github.com/jamiekt/bac21f6169a9853dcef5, or you can download them as a Power BI Designer file: sqlblog.pbix. Whether you copy and paste the M code or you download the .pbix, you will need to download and install Power BI Designer in order to run the queries and see the resultant data.

    One caveat to running the queries: ensure you have created a Data Source setting on your machine for http://sqlblog.com. It should look just like this:

    [Screenshot of the Data Source setting]

    Retrieving data from a website using Power Query is essentially a web scraping exercise. Here are the steps the M query basically goes through to retrieve this information:

      • Retrieve the list of links to the monthly archives

      • For each link in the list of monthly archives
        • retrieve the monthly archive
        • for each blog post summary in the monthly archive
          • retrieve the title and number of comments

    That’s pretty much it. The code looks a bit gnarly at first glance but all it’s essentially doing is navigating the DOM and following a few hyperlinks.

    The reports

    The reports are Power View reports. Power View is fairly intuitive so I don’t think it’s worth talking about it here too much. I do however think it is worth showing some examples of the insights that Power View can provide on top of this data.

    How many blog posts and how many comments on SQLBlog

    [Two charts showing the total number of blog posts and the total number of comments on SQLBlog]

    Slicing by blogger

    Charting all of the blog posts and ordering by number of comments is useful; that’s how we see that Marco Russo’s post DateTool dimension: an alternative Time Intelligence implementation is the most popular:

    [Chart of blog posts ordered by number of comments]

    However if we introduce a slicer we can analyse a single blogger’s blog posts; let’s take Aaron Bertrand, for example, seeing as he has posted the most:

    [The same chart, sliced to show only Aaron Bertrand’s blog posts]

    Aaron’s most popular blog post (going by the number of comments) is his ditty from October 2010 Fun with software : uninstalling SQL Server 2008 R2 Evaluation Edition.

    How many bloggers ARE there on http://sqlblog.com


    62!

    Taking it further

    There is a heck of a lot of other data available on http://sqlblog.com should one want to go and fetch it and Power Query makes it really easy to do that. It would be interesting, for example, to chart the tally of blog posts per month and see when “peak blog month” occurred. I’ll leave that as an exercise for the reader.

    @Jamiet

  • Implementing a build and deploy pipeline for Cronacle

    In my new role I am working on a system that makes heavy use of Redwood Software’s Cronacle. If you hit that link you’ll find a lot of marketing mumbo jumbo about Cronacle such as:

    • Cronacle gives you greater process consistency and quality
    • speed up your mission critical IT and business processes by as much as 90%

    Read into that what you will; to me Cronacle is a scheduler and orchestration tool. That might come across as slightly belittling but it’s not intended to be: in my (limited) experience Cronacle does a couple of things and does them very well (certainly in comparison to the tool I’m more familiar with, SSIS; that’s a comparison I hope to explain more fully in a future blog post).

    In my earlier post New year, new beginnings I said:

    My aim is to instil my beliefs about continuous integration, unit testing, failing fast, on-demand provisioning into my new team

    which, to use a buzzword du jour, might well have been written as “My aim is to instil a devops culture into my team”. The first step has been to integrate Cronacle into our build/deploy pipeline and in this post I’d like to describe how one goes about doing that.

    Scene setting

    We are using Cronacle v9. We are developing Job Chains and Job Definitions (aka Process Chains and Process Definitions) and it is those objects that we want to:

    1. store in a version control system (aka source control)
    2. compose into a single deployable artefact as part of a Continuous Integration (CI) build
    3. deploy to our various dev, test and prod environments in a repeatable fashion

    Storing in VCS

    The first pre-requisite to implementing a successful devops culture (in my opinion) is to use a version control system (VCS). I’m not fussy about which VCS one is using, as long as one uses one of them. We are using Subversion. My only stipulation is that no code in our build and deployment pipeline relies on us using Subversion because I don’t want to be tightly coupled to a particular technology.

    Typically one stores human-readable files in a VCS and I was keen that we did the same for our Cronacle Job Chains and Job Definitions. Cronacle enables one to export objects:

    [Screenshot of Cronacle’s export option]

    When selecting that option a binary file with a .car extension is downloaded:

    [Screenshot of the downloaded .car file]

    A .car file is shorthand for “Cronacle jar”. Jar files are (as I understand them) used in the Java ecosystem as a mechanism for distributing stuff; they’re basically just .zip files with a different extension. Hence, get hold of some compression software that understands the .zip format (I recommend 7zip*), unzip your .car file, and in the unzipped folder you’ll find a JobDefinition folder containing an XML file that defines the object that was exported:

    [Screenshot of the unzipped folder, showing the JobDefinition folder and its XML file]

    That XML file is the human-readable file that we check in to our VCS.

    Building

    We use TeamCity to run our CI builds (it’s my first time using TeamCity in anger and I have become an unabashed fan) in combination with an internally-built build automation framework called PSP.Build that is built atop PowerShell and Psake. PSP.Build is loosely based on Lloyd Holman’s OneBuild and follows the principles that Lloyd sets out at Run the same build process everywhere. We also use Pester for unit testing our PowerShell code - I’m loving me some Pester.

    Building for Cronacle means reconstituting those XML files that we have in Subversion into a deployable .car file. The easiest way to do that is simply to zip them up, for which, again, we are using 7zip. In addition there are some other files that need to be bundled into the .car file; one can see those files in the aforementioned .car file that is used for exporting from Cronacle.

    [Screenshot of the com and META-INF folders inside the .car file]

    You can simply take a copy of those files; they don’t need changing. (We bundle them inside PSP.Build so that we always have them available to our build process.)

    So in summary, in order to build a .car file you need three things:

    • XML files that define the Cronacle objects
    • A zip utility
    • com and META-INF folders

    Once you have them it just takes a bit of PowerShell to bundle them up. Here’s a direct copy of our code that does that:

    Remove-CarFiles -sourcePath $sourcePath #Simply deletes all the .car files it finds underneath $sourcePath
    Get-ChildItem -path "$basePath" -Recurse | Where-Object {$_.Name -eq "CronacleCarIncludes"} |
           Get-ChildItem | Copy-Item -Destination "$sourcePath\$nameofFolderToBeZippedUp" -Recurse -Force #fetches "com" & "META-INF" folders

    $zipperExe = Get-ChildItem -Path $basePath -Include "7za.exe" -Recurse |
           Select-Object -First 1 | foreach {$_.FullName} #find 7zip, which is bundled in our build framework, PSP.Build
    if ($zipperExe -eq $null) {throw "7za.exe could not be found in $basepath"}

    "Creating .car for sourcePath: $sourcePath\$nameofFolderToBeZippedUp" | Write-Verbose
    Push-Location "$sourcePath\$nameofFolderToBeZippedUp"
    & $zipperExe a -tzip "$nameofFolderToBeZippedUp.car" * #zip it all up!
    Pop-Location

    [I’m slightly reluctant to show my PowerShell code in public as it nearly always gets criticised, so be gentle, OK? :) ]

    Deploying

    So, the output from building all our Cronacle objects is a .car file. How do we deploy that .car file to our Cronacle cluster? It’s actually quite simple: we execute our .car file using java.exe and pass in a few parameters. The Cronacle documentation informs us of the generic form of the command and gives an example:

    [Screenshot of the relevant page of the Cronacle documentation]

    (This documentation does not appear online anywhere so I can’t link to it I‘m afraid)
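
    I can’t reproduce that documentation page here, but piecing the command together from the arguments that our deployment function (shown below) builds up, the call takes roughly this shape (the paths, server and credentials are made up):

    & "C:\Program Files\Java\jre7\bin\java.exe" -jar "C:\Builds\MyJobChains.car" `
        -server "cronacle-dev.example.com" -txt -ruleset "DEV" -username "deployuser" -password "NotARealPassword"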

    The smarts to say what to do with all the Cronacle object definitions are within that com folder that was mentioned above, hence it’s imperative that the com folder is included in the .car file.

    Again, we wrote a load of PowerShell code to automate the process of building up the command line and executing it; here’s that code if you’re interested:

    function Invoke-DeployCronacleCarFile () {
        [CmdletBinding()]
        Param(
            [Parameter(Mandatory=$true)][System.IO.FileInfo]$carFilePath,
            [Parameter(Mandatory=$true)][System.IO.FileInfo]$javaExeFilePath,
            [Parameter(Mandatory=$true)][string]$server,
            [Parameter(Mandatory=$false)][string]$username,
            [Parameter(Mandatory=$false)][string]$password,
            [Parameter(Mandatory=$false)][string]$importRuleset
        )
        try {
            if ($carFilePath.Extension -ne ".car") {throw "File: '$carFilePath' is invalid. File must have a .car extension."}
            if (-not (Test-Path $carFilePath)) {throw "File: '$carFilePath' does not exist."}

            $cmd = "$javaExeFilePath"
            $arguments = '-jar',"$carFilePath", '-server', "$server", '-txt','-ruleset',"$importRuleset"

            if (-not ($username -eq $null -or $username -eq "")){$arguments += '-username', "$username"}
            "cmd: $cmd" | Write-Verbose
            "arguments: $arguments" | Write-Verbose
            if (-not ($password -eq $null -or $password -eq "")){$arguments += "-password","$password"}
            Invoke-CommandLine -cmd $cmd -arguments $arguments
        }
        catch {
            throw
        }
    }

    function Invoke-CommandLine ([string]$cmd, [string[]]$arguments, [string]$errorMessage = "Error executing command: " + $cmd) {
        & $cmd @arguments #| write-host
        if ($LastExitCode -ne 0) {
            throw $errorMessage
        }
    }

    That may look slightly daunting but there’s not too much going on here. It takes the path to a .car file, the path to java.exe, server, username, password and an ImportRuleSet**, concatenates things together into a bunch of arguments, then sends it to Invoke-CommandLine to be executed.
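
    For completeness, a call to it might look something like this (all of the values here are made up):

    Invoke-DeployCronacleCarFile -carFilePath "C:\Builds\MyJobChains.car" `
                                 -javaExeFilePath "C:\Program Files\Java\jre7\bin\java.exe" `
                                 -server "cronacle-dev.example.com" `
                                 -importRuleset "DEV" `
                                 -Verbose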

    Summary

    That’s pretty much it. The nuts and bolts of this are relatively simple: you’re just zipping your definitions up and deploying that zip file using java.exe.

    Hope this helps.

    @Jamiet 

    * We bundle 7zip inside PSP.Build so we don’t have to rely on the actor who kicks off the build (which would be TeamCity or one of our team members) having it installed - I’m pretty fierce about reducing external dependencies for our build/deploy pipeline. If it’s a pre-requisite then it gets bundled into PSP.Build.

    ** An ImportRuleSet defines environment-specific settings. A discussion of ImportRuleSets is outside the scope of this article however if you’ve come this far you’ll probably already know whether you need to be using them or not.

  • Test for a warning message using Pester

    Here’s a little ditty that’s worth throwing out (if nothing else so that I can find it later). If you’re not using Pester to test your PowerShell code then this post probably isn’t for you; if you’re not using PowerShell, stop reading now.

    I wanted to write a Pester test that checked whether or not a piece of code threw a warning. I discovered from Understanding Streams, Redirection, and Write-Host in PowerShell that it is possible to redirect stuff in the PowerShell Warning (or Verbose or Debug) stream to the output stream. Here’s an example of doing just that:

    "Here is a message that I will send to the warning stream then redirect back to the output stream" | Write-Warning 3>&1 | %{$_.message}


    The “3>&1” part says “take the contents of stream #3 (i.e. the Warning stream) and send it to stream #1 (i.e. the Output stream)”. Once it’s in the output stream you can operate upon it as normal.

    In my Pester test I wanted to know whether or not a particular piece of code returned a warning matching a given string. Using the above technique it’s simply:

    (Some-Code-That-Is-Getting-Tested 3>&1) -match "Desired warning message" | Should Be $true
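
    Wrapped up in an actual Pester test it looks something like this (Get-Thing is a made-up function that is expected to raise the warning):

    Describe "Get-Thing" {
        It "emits the expected warning" {
            # 3>&1 redirects the warning stream into the output stream so that it can be inspected
            (Get-Thing 3>&1) -match "Desired warning message" | Should Be $true
        }
    }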

    Cool stuff! I love Pester.

    @Jamiet

  • New OneDrive API, possibly unified with OneDrive for Business

    Microsoft offers two similar, but separate, services containing the OneDrive moniker. There’s OneDrive (which is free) and OneDrive for Business (which isn’t). Strangely (as my ex-colleague Mark Wilson points out at OneDrive for Business: lots of cloud storage; terrible sync client) the free offering is much better than the paid-for one.

    Microsoft state in their blog post Taking the Next Step in Sync for OneDrive on 7th January 2015:

    It was clear that the right approach was to converge to a single sync engine and experience that would be able to provide all of the benefits of the consumer and business service to all customers faster. We decided to start with the consumer sync engine foundation from Windows 7 and Windows 8 and add the right capabilities from the other two engines.

    That’s Microsoft-speak for “Yes, OneDrive for Business is shit so we’re dumping it in favour of the OneDrive stuff”. Good news.


    Back in October 2014 Microsoft announced the availability of a Files API that gave developers the ability to access files stored in Office 365:

    we’re enhancing the opportunity for developers with Office 365, with new APIs for mail, files, calendar and contacts. These new robust REST-based APIs empower all developers to leverage the more than 400 petabytes of data (as of March 2014) and popular services across Office 365 in any application.

    New Office 365 extensibility for Windows, iOS, Android and web developers

    I asked a couple of the guys who worked on the API whether the Files API would work against OneDrive and/or OneDrive for Business. The answer from Chris Johnson was “working on it”:

    [Screenshot of Chris Johnson’s tweet]

    That sounded like good news too.


    Yesterday Microsoft announced the availability of The new OneDrive API, which is a RESTful API. Given that they are attempting to unify OneDrive and OneDrive for Business, and also given Chris Johnson’s response to my question in October, I wondered whether this API would cover OneDrive for Business as well, so I decided to check out the new OneDrive API using the API console and found something quite promising. A request to https://api.onedrive.com/v1.0/drive/, which GETs a user’s default drive, returns:

    [Screenshot of the JSON response]

    Note the mention of

    "driveType": "consumer"

    That says to me that there will be other driveTypes and so I assume that this API will be used to access OneDrive for Business stuff also. More good news if that’s the case. I tweeted that question but at the time of writing haven’t had an answer.
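
    If you want to poke at the endpoint yourself, a quick PowerShell sketch looks like this (it assumes you have already obtained an OAuth access token for your Microsoft account, which I’m not covering here):

    $accessToken = "<your OAuth access token>"
    $drive = Invoke-RestMethod -Uri "https://api.onedrive.com/v1.0/drive/" `
                               -Headers @{ Authorization = "Bearer $accessToken" }
    $drive.driveType   # "consumer" for a personal OneDrive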


    Some questions still remain in my mind:

    • What’s the difference between this new OneDrive API and the Files API announced back in October? Do we have two teams building essentially the same thing? That sounds like the bad old Microsoft, so I do hope not.
    • Are we ever going to get a RESTful API that will enable us to push data into an Excel workbook that is stored on OneDrive? Chris Webb and I have been asking for this for five years now. Again I asked the question on Twitter and am still hoping for a response. I do have high hopes that the new Power BI REST API will sate my needs here (right now I wouldn’t know because if one isn’t in the US then one cannot access it).

    @Jamiet

  • Response to “SSDT limitations”

    A commenter on my blog post Considerations when starting a new SSDT database project asked me to comment on his blog post SSDT Limitations. I wrote a response but when I tried to post it I got told it was too long:

    [Screenshot of the error message]

    so I’m posting my response here instead. I’d suggest taking a read of the commenter’s post before reading on.


    "SSDT does not handle multiple file groups"
    I haven't experienced this. I'm pretty sure I've used SSDT to deploy multiple filegroups. Can you clarify what is meant by "does not handle"?

    "SSDT wants to control everything"
    I'd restate that as "SSDT gives you the option to control everything". If you don't want to put all your database objects into your SSDT project, then don't.

    "Production will not look like UAT. For SSDT, everything looks the same."
    Yes, this is a problem. SSDT espouses a mantra of "build once, deploy anywhere" but this breaks down if you want something to be different on each environment - security is the obvious thing here. My approach is to specify roles in the database project that have permissions assigned to them and then later (perhaps in a post-deployment script) add users into those roles. I describe this more fully here: A strategy for managing security for different environments using the Database Development Tools in Visual Studio 2010

    "This means storing in version control production:
    -Logins & passwords
    -app role passwords
    "
    If you're storing passwords in source control then that must mean you're using SQL authentication instead of Windows authentication - that's your decision and I don't see how that is a shortcoming of SSDT. If you don't want to store them in source control - don't. SSDT has mechanisms that allow you to specify values at deployment time (search for SSDT sqlcmd variables). Bottom line, if you're storing passwords in source control then more fool you - nothing in SSDT forces you to do this.
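
    To illustrate the sort of deployment-time mechanism I mean, here's a sketch (the variable name, paths and server are made up, and the SqlPackage.exe location will vary by installation) of supplying a sqlcmd variable value when publishing a dacpac from PowerShell:

    & "C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin\SqlPackage.exe" `
        /Action:Publish `
        /SourceFile:"C:\Builds\MyDatabase.dacpac" `
        /TargetServerName:"MYSERVER" `
        /TargetDatabaseName:"MyDatabase" `
        /v:AppUserPassword="SuppliedAtDeploymentTime"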

    "It is generally preferred to at least have the option to inspect what will be deployed"
    Correct. And that is why (as you pointed out) SSDT has "the option of either publishing directly to the database or generating a script to run for the deployment."

    "Unfortunately for SSDT the upgrade generates a single file. In practice this means is that if the database upgrade fails for any reason recovery is rather complex. You'll need to work out where the upgrade script failed and then what action you will take from there."
    True. There are things you can do to mitigate this:
    -Test your deployments on a copy of your production database first
    -Do small, frequent releases rather than the big bang approach of infrequently deploying large, monolithic sets of lots of changes

    "The product feels rather like a beta release, bringing even a simple project into production will expose some pretty serious limitation."
    Beta release (to me) implies that it doesn't behave the way it's been designed to. I put it to you that it behaves exactly the way it's been designed to; it just so happens that you don't like that behaviour. Fair enough if that's the case, you're not alone in that regard. There are other options for doing deployments if you don't like the approach that SSDT espouses. Perhaps look at migrations (https://msdn.microsoft.com/en-gb/data/jj591621.aspx?f=255&MSPPError=-2147217396) or the old-fashioned method of manually writing change scripts.

    "However the largest issue is more fundamental: the approach where you are expected to trust the magic box to generate a valid upgrade script."
    Yes, you do have to put your trust in SSDT. If you're not happy doing so, don't do it, or just use the option to generate the deployment script and not run it. I personally am happier putting my trust in a repeatable algorithm rather than human beings who tend to be error prone. That's just me, your mileage may vary.

    "It is very easy to generate scripts that will fail to upgrade the database successfully."
    There are certain changes that SSDT is not easily able to cater for (e.g. adding a non-nullable column to a table that contains data), however I'm not sure that constitutes "very easy to generate failing scripts". Personally I think SSDT is better at generating working deployment scripts than a human being is because it's quicker and less error prone. As I've alluded to, it's imperative that you test your deployments before pushing to a production database.

    You asked me to comment and I have done. Clearly we have a difference of opinion about SSDT and that's OK. I hope you can find a way around your complaints because I honestly believe SSDT is the best tool out there for managing SQL Server deployments - I hope you come to the same conclusion one day.


    Regards
    Jamie

  • Dumb querying, Impala vs Hive

    I mentioned yesterday that I wouldn’t be posting much on this blog for the foreseeable future, but I’ve come across something today that was quite interesting so I’m already back.

    I’m using Hadoop, with Hive (version 0.12), to query the contents of some tables. Here’s the output from issuing “SELECT COUNT(*) FROM tablename”:

    [Screenshot of the Hive query output]

    Your eyes don’t deceive you: that’s 223 seconds to determine that this table has 0 rows in it. That staggered me. A colleague suggested I try the same query using Impala instead:

    [Screenshot of the Impala query output]

    Same result, but this time it came back in under 1 second.

    What’s the moral here? I’m not really sure. I won’t go out on a limb and say “Impala is quicker” because all I’ve proved is that it’s quicker at counting the number of rows in an empty table, which is a fairly pointless exercise; Hive is designed for running operations in a batch manner over massive data sets. Nonetheless I think it’s interesting and I suspect some of my buddies in the SQL Server community (whom I suspect will be the only ones reading this) will find it interesting also. Note that Hive v0.12 is old (it came out in October 2013) and also that the Stinger initiative is in full flow, which promises much, much faster response times to Hive queries like these:

    [Image illustrating the query response time improvements promised by the Stinger initiative]

    @Jamiet

  • New Year, new beginnings

    2014 was a year of great change in my personal life and 2015 has begun with a significant change in my work life: I have decided to end my 5-year sojourn into the world of freelancing by taking up a permanent role at Dunnhumby. This is a significant move for me and I wanted to explain why I’d done it (a number of folks have been asking) and to note the implications.

    As I alluded to above, 2014 was a very turbulent year and one that I am glad to put behind me. My experiences in that year have caused me to re-evaluate my priorities in life and I have taken the decision to “go permie” because it means I can spend more time with my daughters, Bonnie and Nicki; Dunnhumby’s office is not a long commute, which means I can be home by 1730 every night – anyone who works in and around London will tell you that that is virtually unheard of, and for me it more than justifies losing the obvious benefits of being a freelancer.

    Working at Dunnhumby is a great move for me too. For 15 years I’ve built a career in the ETL/data integration space where the systems I worked on were all about deriving value from data yet were secondary to the client’s core business. At Dunnhumby, data integration is the business and that excites me; I’m going to be Lead Developer on a core system that feeds data to some of the world’s largest retailers (with a little research you can probably guess one of them) so there is a tangible opportunity to contribute to the bottom line. Moreover, it’s refreshing to go into a meeting where everyone is already talking your language; when shifting, aggregating and managing data is what the company does, you invariably find that everyone you speak to has more than a passing appreciation of what you do – that’s a welcome change for me. Can you name another company where someone in HR knows what ETL stands for?

    There will be opportunities to push our solution into the cloud and anyone who knows me in a work capacity knows that that is an area that excites me. My first aim is to instil my beliefs about continuous integration, unit testing, failing fast, on-demand provisioning and other associated buzzwords into my new team and then lead them on a journey to a truly world class solution. Check back in a few years to see how I did!

    One negative aspect of the new role is that my 15-year association with SQL Server has reached a hiatus. The system I’m working on is built on Hadoop, there’s no SQL Server in sight, and that saddens me. I’ve loved every minute of my involvement with SQL Server and have met some wonderful people on that journey, some of whom have become very close friends. On the positive side my SQL Server experience is not going to go to waste as I’ll still be writing SQL day-in, day-out (on Hive and Impala) and the principles that I stand by when I build ETL solutions apply equally in a non-SQL Server landscape. I’m looking forward to learning more about how the other half lives.

    Blogging about SQL Server and associated interests has been an intrinsic part of my life for the last ten years and in many ways has come to define who I am, so I wonder now how it will continue. I admit that my desire to blog has waned somewhat since the heady days of 2006 and 2007, when I was averaging almost one blog post every two days, but I’m hoping I still have that spark that persuades me to commit my learnings to writing from time to time. I’ll also keep answering questions on the SSDT forum as I’m still a massive fan of that toolset and believe that its differentiating value to SQL Server hasn’t been realised as it should be.

    Not much else to say on this subject really. See you on the other side. Or on Twitter, you can’t keep me quiet on there.

    @Jamiet

  • Parallel Foreach loops, one reason to use Powershell Workflow instead of SSIS

    Lately I’ve been using Azure Automation, which is a service on Azure for running PowerShell Workflow scripts, and as such I’ve been diving deep into PowerShell Workflow. What’s PowerShell Workflow? I’m glad you asked:

    A workflow is a sequence of programmed, connected steps that perform long-running tasks or require the coordination of multiple steps across multiple devices or managed nodes. Windows PowerShell Workflow lets IT pros and developers author sequences of multi-device management activities, or single tasks within a workflow, as workflows. By design, workflows can be long-running, repeatable, frequent, parallelizable, interruptible, stoppable, and restartable. They can be suspended and resumed; they can also continue after an unexpected interruption, such as a network outage or computer restart.
    Getting Started with Windows PowerShell Workflow

    So PowerShell Workflow does exactly what it says on the tin: it’s a workflow engine. Those of you that, like me, have been knee deep in SQL Server Integration Services (SSIS) for most of the past decade will know that SSIS too has its own workflow engine – it’s got a different name, Control Flow, but it’s still a workflow engine. One frequent ask of SSIS’s Control Flow is a Parallel ForEach Loop but there seems little hope that we’re going to get one (raised on Connect in March 2014, closed as Won’t Fix 5 months later) without going to 3rd party solutions like Cozyroc’s Parallel Loop Task.

    As the title of this post has already told you, PowerShell Workflow has a big advantage over SSIS: it includes a Parallel Foreach Loop out of the box and you can read all about it at about_Foreach-Parallel. Or type “help about_Foreach-Parallel” into your nearest PowerShell window. From there:

    The Parallel parameter of the ForEach keyword runs the commands in a ForEach script block once for each item in a specified collection.
    The items in the collection, such as a disk in a collection of disks, are processed in parallel. The commands in the script block run sequentially on each item in the collection.

    That’s fairly self-explanatory. If you’re interested to know more I’ve put together a cool little PowerShell Workflow demo that makes calls to a URI, first serially (using a conventional Foreach Loop) then parallelly (which isn’t, I’m pretty sure, a real word but I’m using it anyway) and hosted it as a GitHub gist: Parallel-vs-Serial.ps1. You can simply copy-paste the script into PowerShell ISE, hit F5 and you’re away. On my machine the serial calls completed in 19 seconds, the parallel calls in 13 seconds. No brainer!
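
    If you just want to see the shape of it without visiting the gist, here’s a minimal sketch (the workflow name is made up and Start-Sleep stands in for the real work) comparing a serial loop with a parallel one:

    workflow Compare-ForeachModes {
        $items = 1..5

        # Serial foreach: items are processed one after another (roughly 5 seconds in total)
        foreach ($item in $items) {
            Start-Sleep -Seconds 1
            "Processed item $item serially"
        }

        # Parallel foreach: items are processed concurrently (roughly 1 second in total)
        foreach -parallel ($item in $items) {
            Start-Sleep -Seconds 1
            "Processed item $item in parallel"
        }
    }

    Compare-ForeachModes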

    Take a look if you have 5 minutes. This is good stuff.

    @Jamiet 


  • Learn from me about SSDT in London, February 2015

    Microsoft released SQL Server Data Tools (SSDT) along with SQL Server 2012 in the Spring of 2012. Since then I’ve noticed an upward tick in both the number of organisations that are using SSDT and the number of questions that are getting asked about it on forums. There is some confusion about what SSDT actually is (Microsoft hasn’t helped there), why people should be using SSDT and, most importantly, how to make best use of it. If you want to know more then a good place to start is my blog series 10 days of SSDT or Get to Know SQL Server 2012's SQL Server Data Tools on devproconnections.com.

    It’s clear that people want to learn more about SSDT so if those articles don’t satiate you, know that I have joined forces with Technitrain to offer a 2-day training course in London in February 2015 called Introduction to SQL Server Data Tools.


    The course will cover:

    Day 1

    • SSDT Positioning
    • IDE Tour
    • Connected Database Development
    • Declarative, offline, database development
    • Publishing
    • Continuous Integration (CI) and Continuous Deployment

    Day 2

    • Data Publishing
    • Refactoring
    • Database unit testing
    • References and composite projects
    • Database Drift
    • Code analysis

    If this sounds like your bag then please sign up on the Technitrain website. I ran this course for Technitrain in March 2014, after which 100% of attendees rated the trainer and the course content as outstanding. 100% also rated the course overall as outstanding.

    @Jamiet

  • Archiving Azure Automation logs to Azure BLOB Storage

    For the past few months at work I’ve been diving headlong into a service on Microsoft Azure called Azure Automation which is, to all intents and purposes, a service for hosting and running Powershell scripts (properly termed “runbooks”). I’ve got a lot to say and a lot to share about Azure Automation and this is the first such post.

    Each time Azure Automation runs a runbook (a running instance of a runbook is termed a job) it stores a log of that job; however, that log isn’t readily available as a log file on a file system as one might be used to. There are two ways to access the log: either via the Azure Management Portal at http://manage.windowsazure.com or programmatically using PowerShell cmdlets. Both are handy, however I wanted the logs available as files in Azure BLOB Storage so that they could be easily viewed and archived (in the future we’ll be building Hive tables over the top of the logs – remind me to blog about that later once it’s done). I have written a runbook called Move-AzureAutomationRunbookLogs that will archive logs for a given runbook and for a given time period into a given Azure BLOB Storage account and I’ve made it available on the Azure Automation gallery:


    https://gallery.technet.microsoft.com/scriptcenter/Archive-Azure-Automation-898a1aa8

    It does pretty much what it says on the tin so if you’re interested hit the link and check it out. For those that are simply interested in reading the code I’ve provided a screenshot of it below too (suggestions for how to copy-paste Powershell code into Live Writer so that it keeps its formatting are welcome!!). There’s a screenshot of the output too if that’s the sort of thing that floats your boat.

    After this archiving runbook has run you’ll find two folders under the root folder that you specified:

    [Screenshot of the two folders created under the root folder]

    and inside each of those is a folder for each runbook whose logs you are collecting (the folder contains a file per job). The “job” folder contains a short summary of each job, whereas the “joboutput” folder contains the good stuff – the actual output from the run.
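
    To give a flavour of what’s involved (this is not the published runbook itself, just a minimal sketch using the classic Azure PowerShell cmdlets, with made-up account, container and path names):

    # Copy the output of recent jobs for one runbook into BLOB storage.
    # Assumes the classic Azure module is loaded and a subscription has already been selected.
    $automationAccount = "MyAutomationAccount"
    $runbookName       = "Some-Runbook"
    $storageContext    = New-AzureStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<storage key>"

    Get-AzureAutomationJob -AutomationAccountName $automationAccount -RunbookName $runbookName |
        ForEach-Object {
            $jobId    = $_.Id
            $output   = Get-AzureAutomationJobOutput -AutomationAccountName $automationAccount -Id $jobId -Stream Any
            $tempFile = Join-Path $env:TEMP "$jobId.txt"
            $output | Out-File -FilePath $tempFile

            # Blob name mirrors the runbook/job folder structure described above
            Set-AzureStorageBlobContent -File $tempFile -Container "joboutput" `
                -Blob "$runbookName/$jobId.txt" -Context $storageContext -Force
        }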

    One last point: if you’re going to use this runbook I highly recommend doing so in conjunction with a technique outlined by Eamon O’Reilly at Monitoring Azure Services and External Systems with Azure Automation. Eamon’s post describes a technique for running operations such as these on a schedule and it’s something that I’ve found to be very useful indeed.

    Hope this is useful!

    @Jamiet

    [Screenshots of the runbook code and of its output]

  • An appeal for SSDT developers – Red Gate want to hear from you

    I've been contacted by Filippa Vajda from Red Gate Software who wants me to pass on the following:

    I’m a UX Specialist at Red Gate, working on a plugin for Visual Studio.

    I need some help in finding some people for some SSDT research. We really need to speak to some SSDT users to have a 30 minute – 1 hour max. chat on a few things we’ll be working on.

    I was wondering if you’d be able to either point me at some people to contact, or put some sort of appeal on your blog for any SSDT users interested in helping us out and getting a £25 or $/euro equivalent as a thank you?

    Hence this blog post. If you’d like to speak to Filippa then simply reply to this blog post with some contact details and I’m sure Filippa will be in touch. Hopefully the opportunity to provide feedback on a future product should be incentive enough; if not, the 25 notes isn’t to be sniffed at.

    @Jamiet

  • Visual Studio is free for developers. How about SQL Server?

    Big announcements for developers in the Microsoft ecosystem today: Visual Studio is now free for folks that like to write code outside of enterprises (i.e. students, open source devs, home tinkerers etc…)

    Visual Studio Community 2013 is a new, free and fully featured edition of Visual Studio, available today, with access to the full Visual Studio extensibility ecosystem and support for targeting any platform, from devices and desktop to web and cloud services.

    http://blogs.msdn.com/b/somasegar/archive/2014/11/12/opening-up-visual-studio-and-net-to-every-developer-any-application-net-server-core-open-source-and-cross-platform-visual-studio-community-2013-and-preview-of-visual-studio-2015-and-net-2015.aspx

    As usual though, the most maligned developers in the Microsoft ecosystem, SQL Server devs, have been forgotten. There’s no full-featured version of SQL Server available for them, is there? Add this to the ever-growing list of evidence that Microsoft doesn’t give a rat’s ass about the SQL Server dev community any more:

    • T-SQL is a decrepit, sad excuse for a development language with only two* significant changes (TRY…CATCH, Windowing functions) this century
    • At Microsoft’s *developer* conference earlier this year there were 185 sessions. How many of those were devoted to SQL Server? One! (OK, two).
    • Archaic, buggy, IDEs for SSIS & SSRS
    • Ubiquitous “Won’t fix” responses on Connect

    Here’s a suggestion for you, Microsoft. Throw us a bone and make SQL Server Developer Edition free too. Doing so isn’t going to save the community wads of cash (it can be bought today for only $59.95 on the Microsoft Store, and I’m sure that doesn’t make up much of the double-digit annual growth that they’re currently enjoying) but that’s not the point: it would send out a message. It would tell us in the SQL Server community that Microsoft still cares about us, that they still want us to use their stuff. That we’re not a forgotten breed. As a SQL Server developer I haven’t felt like that for quite some time.

    A pipe dream? Let’s see.

    @Jamiet 

    *Just my opinion. Feel free to disagree in the comments below.

  • Hey, Windows 10, fix this!

    So I hear that in Windows 10 they’re making improvements to the DOS window. You’ll be able to resize it now. And use Shift plus the arrow keys to select text. That’s great. Stunning in fact. Now here are some more things they might want to have a go at at the same time.

    Make the environment variable editor bigger

    Would it really be a great hardship to put a drag handle onto this thing?

    [Screenshot of the environment variable editor dialog]

    Make PATH editing work. All the time.

    A 1024 character limit? Really? Thanks for completely obliterating my %PATH% in the meantime!


     

    I don’t think I’m asking for the world. Am I?

     

    If anyone has installed Windows 10 feel free to ask them to fix this. Thank you kindly.

    @Jamiet
