THE SQL Server Blog Spot on the Web


StreamInsight Team

  • StreamInsight: More Than Just an API

    If you're evaluating StreamInsight you should know that it's more than just an API for writing streaming queries. Included with the StreamInsight installation are powerful tools that allow you to manage, monitor, tune, and troubleshoot your streaming data applications. These tools range from diagnostic methods and properties built into the API to a full graphical debugging application.

    To read more, see the TechNet Wiki article, StreamInsight: More Than Just an API.


    The StreamInsight Team

  • What Is StreamInsight? A Primer for Non-Programmers

    Are you trying to figure out whether StreamInsight might be something you could use, but you’re having trouble sifting through all the programming jargon that’s used to describe it? StreamInsight is, ultimately, a set of programming tools, and at some point it takes a programmer to implement a StreamInsight solution. But it really should be possible to get a handle on what StreamInsight is all about even if you’re not a programmer yourself.

    A new article published in the TechNet Wiki may be able to help: StreamInsight for Non-Programmers. It gives an overview of the technology, but it leaves out the C# references and relates StreamInsight to more familiar SQL databases and queries. Check it out.

    When you’re done there and are ready to dig a little deeper, take a look at Get Started with StreamInsight 2.1. That article should help you navigate through the StreamInsight official documentation and other resources.

    And, as always, you can post questions or comments here or on the TechNet Wiki.


    The StreamInsight Team

  • Getting Started with StreamInsight 2.1

    If you're just beginning to get familiar with StreamInsight, you may be looking for a way to get started. What are the basics? How can I get my first StreamInsight application running so I can see how it works? Where is the 'front door' that will get me going?

    If that describes you, then this blog entry might be just what you need.

    If you're already a StreamInsight wiz, keep reading anyway - you may find some helpful links here that you weren't aware of. But here's what we'd like from you experienced readers in particular: if you know of other good resources that we missed, please feel free to add them in the comments below. We appreciate you sharing your expertise.

    The Book

    The basic documentation for StreamInsight is located in the MSDN Library (Microsoft StreamInsight 2.1). You'll notice that previous versions of StreamInsight are still there (1.2 and 2.0), but if you're just getting started you can stick to the 2.1 section.

    The documentation has been organized to function as reference material, which is fine after you're familiar with the technology. But if you're trying to learn the basics, you might want to take a different path instead of just starting at the top.

    The following is one map you can use.

    What Is StreamInsight?

    Here is a sequence of topics that should give you a good overview of what StreamInsight is and how it works:

    Overview: answers the question, "what is it?"
    StreamInsight Server Architecture: gives you a quick look at a high-level architectural drawing
    StreamInsight Concepts: lays out an overview of the basic components
    Deploying StreamInsight Entities to a StreamInsight Server: describes the mechanics of how these components work together

    Getting an Example Running

    Once you have this background, go ahead and install StreamInsight and get a basic example up and running:

    Installation: download and install the software
    StreamInsight Examples: walks you through a set of 3 simple StreamInsight applications that work together to demonstrate what you learned in the topics above; you can copy and paste the code into Visual Studio, compile, and run

    That's it - you now have a real, functioning StreamInsight system!

    Now that you have a handle on the basics, you might want to start digging deeper.

    Digging Deeper

    Here's a suggested path through the documentation to help you understand the next layer of StreamInsight technologies:

    Using Event Sources and Event Sinks: sources supply data and sinks consume it; this topic gives you an overview of how they work
    Publishing and Connecting to the StreamInsight Server: practical details on how to set up a StreamInsight server
    A Hitchhiker’s Guide to StreamInsight 2.1 Queries: queries are the heart of how StreamInsight performs data analytics, and this whitepaper will help you really understand how they work
    Using StreamInsight LINQ: root through this section for technical details on specific query components
    Using the StreamInsight Event Flow Debugger: in addition to troubleshooting, the debugger is a great way to learn more about what goes on inside a StreamInsight application

    And Even Deeper

    Finally, to get a handle on some of the more complex things you can do with StreamInsight, dig into these:

    Input and Output Adapters: adapters can be useful for handling more complex sources and sinks
    Building Resilient StreamInsight Applications: a resilient application is able to recover from system failures
    Operations: this section will help you monitor and troubleshoot a running StreamInsight system

    The StreamInsight Community

    As you're designing and developing your StreamInsight solutions, you probably will find it helpful to see working examples or to learn tips and tricks from others. Or maybe you need a place to post a vexing question.

    Here are some community resources that we have found useful. If you know of others, please add them in the comments below.

    Code samples and tools

    The following case studies are based on earlier versions of StreamInsight, but they still are useful examples:






    You’re a StreamInsight Expert

    That should get you going. Please add any other resources you have found useful in the comments below.



    The StreamInsight Team

  • StreamInsight will not push feature releases through Microsoft Update going forward

    Until now, we've released StreamInsight through the Microsoft Download Center and also through Microsoft Update. Going forward, we will release new StreamInsight versions only through the Microsoft Download Center, and use MU only for service packs and security fixes (should any be needed). As a result of this decision, we are pulling the recent StreamInsight 2.1 release from MU; this release is still available in the Download Center.

    Don’t worry: there’s nothing wrong with the versions we’ve shipped through MU; we’ve just adjusted how we use MU. No action is necessary from our customers as a result of this change, and we are not rolling back any changes to your current installation, so if you installed StreamInsight 2.1 recently through Microsoft Update, it will still work fine.


    The StreamInsight Team

  • Using Subjects to Deploy Queries Dynamically

    In the previous blog posting, we showed how to construct and deploy query fragments to a StreamInsight server, and how to re-use them later. In today’s posting we’ll integrate this pattern into a method of dynamically composing a new query with an existing one.

    The construct that enables this scenario in StreamInsight V2.1 is a Subject. A Subject lets me create a junction element in an existing query that I can tap into while the query is running.

    To set this up as an end-to-end example, let’s first define a stream simulator as our data source:

    var generator = myApp.DefineObservable(
        (TimeSpan t) => Observable.Interval(t).Select(_ => new SourcePayload()));

    This ‘generator’ produces a new instance of SourcePayload every period t (in system time), exposed as an IObservable. SourcePayload happens to have a property of type double as its payload data.
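
    The SourcePayload class itself is not shown in this posting. A minimal sketch consistent with how the type is used below (the member names Timestamp and Value match the later snippets; everything else is an assumption):

    // Hypothetical payload type assumed by the examples in this posting.
    public class SourcePayload
    {
        private static readonly Random Rand = new Random();

        public SourcePayload()
        {
            this.Timestamp = DateTimeOffset.UtcNow;
            this.Value = Rand.NextDouble();
        }

        public DateTimeOffset Timestamp { get; set; }
        public double Value { get; set; }
    }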

    Let’s also define a sink for our example—an IObserver of double values that writes to the console:

    var console = myApp.DefineObserver(
        (string label) => Observer.Create<double>(e => Console.WriteLine("{0}: {1}", label, e)))
        .Deploy("ConsoleSink");

    The observer takes a string parameter that is used as a label on the console, so that we can distinguish the output of different sink instances. Note that we also deploy this observer, so that we can retrieve it later from the server from a different process.

    Remember how we defined the aggregation as an IQStreamable function in the previous article? We will use that as well:

    var avg = myApp
        .DefineStreamable((IQStreamable<SourcePayload> s, TimeSpan w) =>
            from win in s.TumblingWindow(w)
            select win.Avg(e => e.Value));

    Then we define the Subject, which acts as an observable sequence as well as an observer. Thus, we can feed a single source into the Subject and have multiple consumers—that can come and go at runtime—on the other side:

    var subject = myApp.CreateSubject("Subject", () => new Subject<SourcePayload>());

    Subjects are always deployed automatically. Their name is used to retrieve them from a (potentially) different process (see below).

    Note that the Subject as we defined it here doesn’t know anything about temporal streams. It is merely a sequence of SourcePayloads, without any notion of StreamInsight point events or CTIs. So in order to compose a temporal query on top of the Subject, we need to 'promote' the sequence of SourcePayloads into an IQStreamable of point events, including CTIs:

    var stream = subject.ToPointStreamable(
        e => PointEvent.CreateInsert<SourcePayload>(e.Timestamp, e),
        AdvanceTimeSettings.StrictlyIncreasingStartTime);

    In a later posting we will show how to use Subjects that have more awareness of time and can be used as a junction between IQStreamables instead of IQbservables.

    Having turned the Subject into a temporal stream, we can now define the aggregate on this stream. We will use the IQStreamable entity avg that we defined above:

    var longAverages = avg(stream, TimeSpan.FromSeconds(5));

    In order to run the query, we need to bind it to a sink, and bind the subject to the source:

    var standardQuery = longAverages
        .Bind(console("5sec average"));

    Lastly, we start the process:

    standardQuery.Run("standardProcess"); // process name is illustrative
    Now we have a simple query running end-to-end, producing results. What follows next is the crucial part of tapping into the Subject and adding another query that runs in parallel, using the same query definition (the “AverageQuery”) but with a different window length. We are assuming that we connected to the same StreamInsight server from a different process or even client, and thus have to retrieve the previously deployed entities through their names:

    // simulate the addition of a 'fast' query from a separate server connection,
    // by retrieving the aggregation query fragment
    // (instead of simply using the 'avg' object)
    var averageQuery = myApp
        .GetStreamable<IQStreamable<SourcePayload>, TimeSpan, double>("AverageQuery");

    // retrieve the input sequence as a subject
    var inputSequence = myApp
        .GetSubject<SourcePayload, SourcePayload>("Subject");

    // retrieve the registered sink
    var sink = myApp.GetObserver<string, double>("ConsoleSink");

    // turn the sequence into a temporal stream
    var stream2 = inputSequence.ToPointStreamable(
        e => PointEvent.CreateInsert<SourcePayload>(e.Timestamp, e),
        AdvanceTimeSettings.StrictlyIncreasingStartTime);

    // apply the query, now with a different window length
    var shortAverages = averageQuery(stream2, TimeSpan.FromSeconds(1));

    // bind new sink to query and run it
    var fastQuery = shortAverages
        .Bind(sink("1sec average"))
        .Run("fastProcess"); // process name is illustrative

    The attached solution demonstrates the sample end-to-end.

    The StreamInsight Team

  • Parameterized StreamInsight Queries

    The changes in our APIs enable a set of scenarios that were either not possible before or could only be achieved through workarounds. One such use case that people ask about frequently is the ability to parameterize a query and instantiate it with different values instead of re-deploying the entire statement. I’ll demonstrate how to do this in StreamInsight 2.1 and combine it with a method of using subjects for dynamic query composition in a mini-series of (at least) two blog articles.

    Let’s start with something really simple: I want to deploy a windowed aggregate to a StreamInsight server, and later use it with different window sizes. The LINQ statement for such an aggregate is very straightforward and familiar:

    var result = from win in stream.TumblingWindow(TimeSpan.FromSeconds(5)) 
                 select win.Avg(e => e.Value);

    Obviously, we had to use an existing input stream object as well as a concrete TimeSpan value. If we want to be able to re-use this construct, we can define it as an IQStreamable:

    var avg = myApp
        .DefineStreamable((IQStreamable<SourcePayload> s, TimeSpan w) =>
            from win in s.TumblingWindow(w)
            select win.Avg(e => e.Value));

    The DefineStreamable API lets us define a function, in our case from an IQStreamable (the input stream) and a TimeSpan (the window length) to an IQStreamable (the result). We can then use it like a function, with the input stream and the window length as parameters:

    var result = avg(stream, TimeSpan.FromSeconds(5));

    Nice, but you might ask: what does this save me, other than writing my own extension method? Well, in addition to defining the IQStreamable function, you can actually deploy it to the server, to make it re-usable by another process! When we deploy an artifact in V2.1, we give it a name:

    var avg = myApp
        .DefineStreamable((IQStreamable<SourcePayload> s, TimeSpan w) =>
            from win in s.TumblingWindow(w)
            select win.Avg(e => e.Value))
        .Deploy("AverageQuery");

    When connected to the same server, we can now use that name to retrieve the IQStreamable and use it with our own parameters:

    var averageQuery = myApp
        .GetStreamable<IQStreamable<SourcePayload>, TimeSpan, double>("AverageQuery");

    var result = averageQuery(stream, TimeSpan.FromSeconds(5));

    Convenient, isn’t it?

    Keep in mind that, even though the function “AverageQuery” is deployed to the server, its logic will still be instantiated into each process when the process is created. The advantage here is being able to deploy that function, so another client who wants to use it doesn’t need to ask the author for the code or assembly, but just needs to know the name of the deployed entity.

    A few words on the function signature of GetStreamable: the last type parameter (here: double) is the payload type of the result, not the actual result stream’s type itself. The returned object is a function from IQStreamable<SourcePayload> and TimeSpan to IQStreamable<double>.

    In the next article we will integrate this usage of IQStreamables with Subjects in StreamInsight, so stay tuned!

    The StreamInsight Team

  • A Hitchhiker's Guide to StreamInsight 2.1 Queries

    Back in 2010 we released the first version of an educational guide to StreamInsight queries, to help developers shift their thinking into the new paradigm. With the release of StreamInsight 2.1 and the introduction of the new programming model, we thought this was the right time to freshen up this document.

    The format and the storyline stay mostly untouched from the original, but there are several improvements to the document. It is updated to the new programming model introduced in 2.1; as a consequence, we no longer talk about adapters, which are not necessary to introduce the query language, and their absence simplifies the narrative. We have also improved and extended the event data visualizations, which makes the queries easier to understand and follow.

    The guide consists of a PDF document and a Visual Studio solution that accompanies the paper.

    The VS solution is also part of the StreamInsight 2.1 samples on CodePlex and the queries discussed in the paper are also included in the StreamInsight 2.1 LinqPad samples.


    The StreamInsight Team

  • Dangers of time travel

    I tend to partition StreamInsight operators into three classes: the basic “temporal/relational” operators, the “time travel” operators, and the “elephant” operators. The basic operators illustrate the rules. The time travel operators bend the rules. The elephant operators break them altogether. While I’d like to focus on time travel today, we can’t bend the rules before knowing what they are. Let’s begin with the basics!

    A StreamInsight query describes how the results of a normal relational query change over time. The start time of an event indicates the time that a row is inserted into the relational input. The end time indicates when that row is deleted. StreamInsight’s basic temporal/relational operators can be understood in those terms. For instance, the StreamInsight ‘join’ operator produces output for coincident input tuples matching some predicate. Tuples are coincident if their time intervals overlap. Another way of looking at it: all matching tuples across all time logically produce output, but that output may have an empty temporal intersection. The basic temporal/relational operators:

    • Join
    • Left anti-join
    • Selection
    • Projection
    • Group and apply
    • Snapshot window operators (aggregate, top-K, user-defined operator).
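
    To make the temporal reading concrete, here is what a basic join and a snapshot aggregate look like in query form (a sketch; ‘orders’ and ‘shipments’ are hypothetical IQStreamable inputs with the properties shown):

    // Join: output exists only where the two event lifetimes overlap
    // and the keys match.
    var joined = from o in orders
                 join s in shipments
                     on o.OrderId equals s.OrderId
                 select new { o.OrderId, s.Carrier };

    // Snapshot aggregate: one result per interval over which the set of
    // coincident events is constant.
    var counts = from win in orders.SnapshotWindow()
                 select win.Count();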

    Time travel operators bend the rules because they allow an input event to contribute to results outside of its original time interval! The time travel operators:

    • Hopping/tumbling window operators, because an event contributes to the window result even if it was active for only a portion of the window.
    • Clip event duration, because the clipper masks the clippee for the rest of time.
    • Alter event lifetime, because it changes the time interval for an event.

    The fundamental time travel operator is alter event lifetime because the others can be expressed in terms of it. (I’ll leave this as an exercise for the reader. A hint for hopping window operators: one way of thinking about a hopping window is that events belonging to the same window are reassigned to the same time interval so that they can contribute to the same output time interval!) While there are several variations on the alter lifetime operator, they all boil down to the following components:

    • An input stream.
    • An (optional) function that modifies the start time of events in the stream.
    • An (optional) function that modifies the duration of events in the stream.
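
    The exercise above can be sketched for the tumbling-window case (a sketch with illustrative names; ‘source’ stands for an IQStreamable of events):

    // Reassign each event to the 5-second interval containing its start
    // time; a snapshot aggregate over the result then behaves like a
    // tumbling window aggregate. The fixed-duration selector also uses
    // the "loophole" discussed further below.
    var w = TimeSpan.FromSeconds(5);
    var assigned = source.AlterEventLifetime(
        e => new DateTimeOffset(
            e.StartTime.UtcTicks - (e.StartTime.UtcTicks % w.Ticks),
            TimeSpan.Zero),
        e => w);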

    If you do modify start times and durations, make sure that the modifications to start and end times are monotonic. In other words, make sure that the changes do not affect the relative order of start and end edge events. Otherwise, you risk introducing CTI violations. CTI violations bring down the query: they indicate that a new event has arrived affecting a result that has already been committed! Like traveling through time to alter the past, violating CTIs causes (can I even use that word here?) bad things to happen.

    Shifting all events forward or backward by a fixed amount is safe because the CTIs are moving with the events*:

    var shiftedBack = source.ShiftEventTime(_ => TimeSpan.FromSeconds(-1));
    var shiftedForward = source.ShiftEventTime(_ => TimeSpan.FromDays(7));

    Making time pass faster or slower doesn’t violate the monotonicity requirement, but depending on the nature of the input it can introduce CTI violations. If the input stream contains end edge events, they can violate CTIs if left unmodified, as in the following example, which attempts to double the speed at which time passes:

    var startTime = DateTimeOffset.UtcNow;

    var source = this.Application
        .DefineEnumerable(() => new[]
            {
                EdgeEvent.CreateStart(startTime, 0),
                EdgeEvent.CreateEnd(startTime, startTime.AddSeconds(10), 0),
            })
        .ToStreamable();

    var fastForward = source.AlterEventStartTime(
        e => e.StartTime.AddTicks(e.StartTime.Ticks));

    We can address the failure by ensuring that end edges also advance at double-speed:

    var fastForward = source.AlterEventLifetime(
        e => e.StartTime.AddTicks(e.StartTime.Ticks),
        e => TimeSpan.FromTicks((e.EndTime - e.StartTime).Ticks * 2L));

    There is a very important loophole that can be exploited for duration modifications. If the duration selector returns a fixed value for every event, StreamInsight no longer needs to produce end edges internally, and those non-existent end edges no longer pose a CTI violation risk:

    var fastForwardFixedInterval = source.AlterEventLifetime(
        e => e.StartTime.AddTicks(e.StartTime.Ticks),
        _ => TimeSpan.FromSeconds(1));

    In summary, there are several completely safe forms of time travel in StreamInsight:

    1. Shift all events forward or backward in time by a fixed amount.
    2. Establish a fixed duration for events.
    3. Apply modifications that do not affect the relative order of start and end edges.
    4. The built-in time travel operators, clip and hopping window.

    While it may be safe to stray from these rules for specific data and query combinations, I suggest playing it safe!

    * Notice that the start time selector function does not allow behavior to be affected by event payload. This is because CTI events, which are also modified by the selector, do not carry payloads.

    Colin Meek/The StreamInsight Team

  • StreamInsight 2.1, meet LINQ

    Someone recently called LINQ “magic” in my hearing. I leapt to LINQ’s defense immediately. It turns out some people don’t realize “magic” can be a pejorative term. I thought LINQ needed demystification. Your best demystification resource is Matt Warren’s excellent series on building LINQ providers; I won’t repeat much of what he says there, but I will talk about some core ideas and how they affect the 2.1 release of StreamInsight.

    Let’s tell the story of a LINQ query.

    Compile time

    It begins with some code:

    IQueryable<Product> products = ...;
    var query = from p in products
                where p.Name == "Widget"
                select p.ProductID;
    foreach (int id in query)
    {
        Console.WriteLine(id);
    }

    When the code is compiled, the C# compiler (among other things) de-sugars the query expression (see C# spec section 7.16):

    var query = products.Where(p => p.Name == "Widget").Select(p => p.ProductID);

    Overload resolution subsequently binds the Queryable.Where<Product> and Queryable.Select<Product, int> extension methods (see C# spec sections 7.5 and 7.6.5). After overload resolution, the compiler knows something interesting about the anonymous functions (lambda syntax) in the de-sugared code: they must be converted to expression trees, i.e., “an object structure that represents the structure of the anonymous function itself” (see C# spec section 6.5). The conversion is equivalent to the following rewrite:

    var prm1 = Expression.Parameter(typeof(Product), "p");
    var prm2 = Expression.Parameter(typeof(Product), "p");
    var query = Queryable.Select<Product, int>(
        Queryable.Where<Product>(
            products,
            Expression.Lambda<Func<Product, bool>>(
                Expression.Equal(
                    Expression.Property(prm1, "Name"),
                    Expression.Constant("Widget")),
                prm1)),
        Expression.Lambda<Func<Product, int>>(
            Expression.Property(prm2, "ProductID"),
            prm2));

    If the “products” expression had type IEnumerable<Product>, the compiler would have chosen the Enumerable.Where and Enumerable.Select extension methods instead, in which case the anonymous functions would have been converted to delegates.

    At this point, we’ve reduced the LINQ query to familiar code that will compile in C# 2.0. (Note that I’m using C# snippets to illustrate transformations that occur in the compiler, not to suggest a viable compiler design!)


    Run time

    When the above program is executed, the Queryable.Where method is invoked. It takes two arguments. The first is an IQueryable<> instance that exposes an Expression property and a Provider property. The second is an expression tree. The Queryable.Where method implementation looks something like this:

    public static IQueryable<T> Where<T>(
        this IQueryable<T> source, Expression<Func<T, bool>> predicate)
    {
        // builds a call to this very method
        return source.Provider.CreateQuery<T>(
            Expression.Call(
                ((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(typeof(T)),
                source.Expression,
                Expression.Quote(predicate)));
    }

    Notice that the method is really just composing a new expression tree that calls itself with arguments derived from the source and predicate arguments. Also notice that the query object returned from the method is associated with the same provider as the source query.

    By invoking operator methods, we’re constructing an expression tree that describes a query. Interestingly, the compiler and operator methods are colluding to construct a query expression tree. The important takeaway is that expression trees are built in one of two ways: (1) by the compiler when it sees an anonymous function that needs to be converted to an expression tree, and (2) by a query operator method that constructs a new queryable object with an expression tree rooted in a call to the operator method (self-referential).
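
    You can see both mechanisms at work by inspecting the tree on a plain LINQ to Objects query (a standalone sketch):

    var q = new[] { 1, 2, 3 }.AsQueryable().Where(x => x > 1);
    // q.Expression is a MethodCallExpression whose Method is
    // Queryable.Where and whose second argument is the quoted lambda
    // the compiler built.
    Console.WriteLine(q.Expression);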

    Next we hit the foreach block. At this point, the power of LINQ queries becomes apparent. The provider is able to determine how the query expression tree is evaluated! The code that began our story was intentionally vague about the definition of the “products” collection. Maybe it is a queryable in-memory collection of products:

    var products = new[]
        { new Product { Name = "Widget", ProductID = 1 } }.AsQueryable();

    The in-memory LINQ provider works by rewriting Queryable method calls to Enumerable method calls in the query expression tree. It then compiles the expression tree and evaluates it. It should be mentioned that the provider does not blindly rewrite all Queryable calls. It only rewrites a call when its arguments have been rewritten in a way that introduces a type mismatch, e.g. the first argument to Queryable.Where<Product> being rewritten as an expression of type IEnumerable<Product> from IQueryable<Product>. The type mismatch is triggered initially by a “leaf” expression like the one associated with the AsQueryable query: when the provider recognizes one of its own leaf expressions, it replaces the expression with the original IEnumerable<> constant expression. I like to think of this rewrite process as “type irritation” because the rewritten leaf expression is like a foreign body that triggers an immune response (further rewrites) in the tree. The technique ensures that only those portions of the expression tree constructed by a particular provider are rewritten by that provider: no type irritation, no rewrite.

    Let’s consider the behavior of an alternative LINQ provider. If “products” is a collection created by a LINQ to SQL provider:

    var products = new NorthwindDataContext().Products;

    the provider rewrites the expression tree as a SQL query that is then evaluated by your favorite RDBMS. The predicate may ultimately be evaluated using an index! In this example, the expression associated with the Products property is the “leaf” expression.

    StreamInsight 2.1

    For the in-memory LINQ to Objects provider, a leaf is an in-memory collection. For LINQ to SQL, a leaf is a table or view.

    When defining a “process” in StreamInsight 2.1, what is a leaf? To StreamInsight a leaf is logic: an adapter, a sequence, or even a query targeting an entirely different LINQ provider!

    How do we represent the logic? Remember that a standing query may outlive the client that provisioned it. A reference to a sequence object in the client application is therefore not terribly useful. But if we instead represent the code constructing the sequence as an expression, we can host the sequence in the server:

    using (var server = Server.Connect(...))
    {
        var app = server.Applications["my application"];
        var source = app.DefineObservable(() => Observable.Range(0, 10, Scheduler.NewThread));
        var query = from i in source where i % 2 == 0 select i;
    }

    Example 1: defining a source and composing a query

    Let’s look in more detail at what’s happening in example 1. We first connect to the remote server and retrieve an existing app. Next, we define a simple Reactive sequence using the Observable.Range method. Notice that the call to the Range method is in the body of an anonymous function. This is important because it means the source sequence definition is in the form of an expression, rather than simply an opaque reference to an IObservable<int> object. The variation in Example 2 fails. Although it looks similar, the sequence is now a reference to an in-memory observable collection:

    var local = Observable.Range(0, 10, Scheduler.NewThread);
    var source = app.DefineObservable(() => local); // can’t serialize ‘local’!

    Example 2: error referencing unserializable local object

    The Define* methods support definitions of operator tree leaves that target the StreamInsight server. These methods all have the same basic structure. The definition argument is a lambda expression taking between 0 and 16 arguments and returning a source or sink. The method returns a proxy for the source or sink that can then be used for the usual style of LINQ query composition. The “define” methods exploit the compile-time C# feature that converts anonymous functions into translatable expression trees!
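
    For instance, a one-parameter source definition follows the same pattern as the zero-argument definition in example 1 (a sketch; ‘app’ is the application object from that example, and the names are illustrative):

    // The lambda body becomes a translatable expression tree; 'ticker'
    // is a proxy that can be invoked with different periods.
    var ticker = app.DefineObservable(
        (TimeSpan period) => Observable.Interval(period));
    var everySecond = from t in ticker(TimeSpan.FromSeconds(1))
                      select t;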

    Query composition exploits the runtime pattern that allows expression trees to be constructed by operators taking queryable and expression (Expression<>) arguments. The practical upshot: once you’ve Defined a source, you can compose LINQ queries in the familiar way using query expressions and operator combinators. Notably, queries can be composed using pull-sequences (LINQ to Objects IQueryable<> inputs), push sequences (Reactive IQbservable<> inputs), and temporal sequences (StreamInsight IQStreamable<> inputs). You can even construct processes that span these three domains using “bridge” method overloads (ToEnumerable, ToObservable and To*Streamable).

    Finally, the targeted rewrite via type irritation pattern is used to ensure that StreamInsight computations can leverage other LINQ providers as well. Consider the following example (this example depends on Interactive Extensions):

    var source = app.DefineEnumerable((int id) =>
        EnumerableEx.Using(() =>
            new NorthwindDataContext(), context =>
                from p in context.Products
                where p.ProductID == id
                select p.ProductName));

    Within the definition, StreamInsight has no reason to suspect that it ‘owns’ the Queryable.Where and Queryable.Select calls, and it can therefore defer to LINQ to SQL! Let’s use this source in the context of a StreamInsight process:

    var sink = app.DefineObserver(() => Observer.Create<string>(Console.WriteLine));

    var query = from name in source(1).ToObservable()
                where name == "Widget"
                select name;

    using (query.Bind(sink).Run("process"))
    {
        Console.ReadLine();
    }

    When we run the binding, the source portion which filters on product ID and projects the product name is evaluated by SQL Server. Outside of the definition, responsibility for evaluation shifts to the StreamInsight server where we create a bridge to the Reactive Framework (using ToObservable) and evaluate an additional predicate.

    It’s incredibly easy to define computations that span multiple domains using these new features in StreamInsight 2.1!

    The StreamInsight Team

  • Taming Hopping Windows

    At first glance, hopping windows seem fairly innocuous and obvious. They organize events into windows with a simple periodic definition: the windows have some duration d (e.g. a window covers 5 second time intervals), an interval or period p (e.g. a new window starts every 2 seconds) and an alignment a (e.g. one of those windows starts at 12:00 PM on March 15, 2012 UTC).

    var wins = xs.HoppingWindow(
                       TimeSpan.FromSeconds(5),
                       TimeSpan.FromSeconds(2),
                       new DateTime(2012, 3, 15, 12, 0, 0, DateTimeKind.Utc));

    Logically, there is a window with start time a + np and end time a + np + d for every integer n. That’s a lot of windows. So why doesn’t the following query (always) blow up?

    var query = wins.Select(win => win.Count());

    A few users have asked why StreamInsight doesn’t produce output for empty windows. Primarily it’s because there is an infinite number of empty windows! (Actually, StreamInsight uses DateTimeOffset.MaxValue to approximate “the end of time” and DateTimeOffset.MinValue to approximate “the beginning of time”, so the number of windows is lower in practice.)
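    To make the periodic definition concrete, here is a small Python sketch (plain arithmetic, not StreamInsight code) that lists the indices n of the windows [a + np, a + np + d) containing a given time t:

    ```python
    import math

    def windows_containing(t, a, p, d):
        """Indices n of hopping windows [a + n*p, a + n*p + d) that contain time t."""
        n_hi = math.floor((t - a) / p)          # last window starting at or before t
        n_lo = math.floor((t - a - d) / p) + 1  # first window ending after t
        return list(range(n_lo, n_hi + 1))

    # With d = 5 and p = 2, every instant is covered by two or three windows:
    assert windows_containing(4, a=0, p=2, d=5) == [0, 1, 2]
    ```

    Since d > p here, the windows overlap and a single event contributes to several windows at once, which is exactly why open-ended events become expensive.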

    That was the good news. Now the bad news. Events also have duration. Consider the following simple input:

    var xs = this.Application
                    .DefineEnumerable(() => new[]
                        { EdgeEvent.CreateStart(DateTimeOffset.UtcNow, 0) })
                    .ToStreamable(AdvanceTimeSettings.IncreasingStartTime);

    Because the event has no explicit end edge, it lasts until the end of time. So there are lots of non-empty windows if we apply a hopping window to that single event! For this reason, we need to be careful with hopping window queries in StreamInsight. Or we can switch to a custom implementation of hopping windows that doesn’t suffer from this shortcoming.

    The alternate window implementation produces output only when the input changes. We start by breaking up the timeline into non-overlapping intervals assigned to each window. In figure 1, six hopping windows (“Windows”) are assigned to six intervals (“Assignments”) in the timeline. Next we take input events (“Events”) and alter their lifetimes (“Altered Events”) so that they cover the intervals of the windows they intersect. In figure 1, you can see that the first event e1 intersects windows w1 and w2 so it is adjusted to cover assignments a1 and a2. Finally, we can use snapshot windows (“Snapshots”) to produce output for the hopping windows. Notice however that instead of having six windows generating output, we have only four. The first and second snapshots correspond to the first and second hopping windows. The remaining snapshots however cover two hopping windows each! While in this example we saved only two events, the savings can be more significant when the ratio of event duration to window duration is higher.


    Figure 1: Timeline

    The implementation of this strategy is straightforward. We need to set the start times of events to the start time of the interval assigned to the earliest window including the start time. Similarly, we need to modify the end times of events to the end time of the interval assigned to the latest window including the end time. The following snap-to-boundary function, which rounds a timestamp value t down to the nearest value t' <= t such that t' = a + np for some integer n, will be useful. For convenience, we will represent both DateTime and TimeSpan values using long ticks:

    static long SnapToBoundary(long t, long a, long p)
    {
        return t - ((t - a) % p) - (t > a ? 0L : p);
    }

    How do we find the earliest window including the start time for an event? It’s the window following the last window that does not include the start time, assuming that there are no gaps between windows (i.e., duration ≥ interval), a limitation of this solution. To find the end time of that antecedent window, we need to know the alignment of window ends:

    long e = a + (d % p);

    Using the window end alignment, we are finally ready to describe the start time selector:

    static long AdjustStartTime(long t, long e, long p)
    {
        return SnapToBoundary(t, e, p) + p;
    }

    To find the latest window including the end time for an event, we look for the last window start time (non-inclusive):

    public static long AdjustEndTime(long t, long a, long d, long p)
    {
        return SnapToBoundary(t - 1, a, p) + p + d;
    }
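    These helpers can be sanity-checked outside StreamInsight. The following Python sketch is an assumed translation of the C# versions (times are modeled as integer second “ticks”; C#'s truncating % operator is emulated explicitly; the base value stands in for 6/28/2012 12:00:00 AM), and it reproduces the adjusted lifetime of event e2 from the example below:

    ```python
    def c_rem(x, p):
        """Emulate C#'s % operator, which truncates toward zero (Python's floors)."""
        r = abs(x) % abs(p)
        return r if x >= 0 else -r

    def snap_to_boundary(t, a, p):
        # Round t down to the nearest t' <= t with t' = a + n*p for some integer n.
        return t - c_rem(t - a, p) - (0 if t > a else p)

    def adjust_start_time(t, e, p):
        # Start of the earliest window containing t (e = alignment of window ends).
        return snap_to_boundary(t, e, p) + p

    def adjust_end_time(t, a, d, p):
        # End of the latest window containing the end time t.
        return snap_to_boundary(t - 1, a, p) + p + d

    # 5-second windows every 2 seconds; window ends align at e = a + (d % p).
    d, p, a = 5, 2, 0
    e = a + (d % p)
    base = 9_072_000  # stand-in tick for 6/28/2012 12:00:00 AM (even, like the example)

    # e2 from the example (start +3s, end +10s) snaps to [+5s, +15s):
    assert adjust_start_time(base + 3, e, p) == base + 5
    assert adjust_end_time(base + 10, a, d, p) == base + 15
    ```

    The same arithmetic yields the other adjusted lifetimes listed in the output further down.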

    Bringing it together, we can define the translation from events to ‘altered events’ as in Figure 1:

    public static IQStreamable<T> SnapToWindowIntervals<T>(IQStreamable<T> source, TimeSpan duration, TimeSpan interval, DateTime alignment)
    {
        if (source == null) { throw new ArgumentNullException("source"); }

        // reason about DateTime and TimeSpan in ticks
        long d = Math.Min(DateTime.MaxValue.Ticks, duration.Ticks);
        long p = Math.Min(DateTime.MaxValue.Ticks, Math.Abs(interval.Ticks));

        // set alignment to earliest possible window
        var a = alignment.ToUniversalTime().Ticks % p;

        // verify constraints of this solution
        if (d <= 0L) { throw new ArgumentOutOfRangeException("duration"); }
        if (p == 0L || p > d) { throw new ArgumentOutOfRangeException("interval"); }

        // find the alignment of window ends
        long e = a + (d % p);

        return source.AlterEventLifetime(
            evt => ToDateTime(AdjustStartTime(evt.StartTime.ToUniversalTime().Ticks, e, p)),
            evt => ToDateTime(AdjustEndTime(evt.EndTime.ToUniversalTime().Ticks, a, d, p)) -
                ToDateTime(AdjustStartTime(evt.StartTime.ToUniversalTime().Ticks, e, p)));
    }

    public static DateTime ToDateTime(long ticks)
    {
        // just snap to min or max value rather than under/overflowing
        return ticks < DateTime.MinValue.Ticks
            ? new DateTime(DateTime.MinValue.Ticks, DateTimeKind.Utc)
            : ticks > DateTime.MaxValue.Ticks
            ? new DateTime(DateTime.MaxValue.Ticks, DateTimeKind.Utc)
            : new DateTime(ticks, DateTimeKind.Utc);
    }

    Finally, we can describe our custom hopping window operator:

    public static IQWindowedStreamable<T> HoppingWindow2<T>(
        IQStreamable<T> source,
        TimeSpan duration,
        TimeSpan interval,
        DateTime alignment)
    {
        if (source == null) { throw new ArgumentNullException("source"); }
        return SnapToWindowIntervals(source, duration, interval, alignment).SnapshotWindow();
    }

    By switching from HoppingWindow to HoppingWindow2 in the following example, the query returns quickly rather than gobbling resources and ultimately failing!

    public void Main()
    {
        var start = new DateTimeOffset(new DateTime(2012, 6, 28), TimeSpan.Zero);
        var duration = TimeSpan.FromSeconds(5);
        var interval = TimeSpan.FromSeconds(2);
        var alignment = new DateTime(2012, 3, 15, 12, 0, 0, DateTimeKind.Utc);

        var events = this.Application.DefineEnumerable(() => new[]
            {
                EdgeEvent.CreateStart(start.AddSeconds(0), "e0"),
                EdgeEvent.CreateStart(start.AddSeconds(1), "e1"),
                EdgeEvent.CreateEnd(start.AddSeconds(1), start.AddSeconds(2), "e1"),
                EdgeEvent.CreateStart(start.AddSeconds(3), "e2"),
                EdgeEvent.CreateStart(start.AddSeconds(9), "e3"),
                EdgeEvent.CreateEnd(start.AddSeconds(3), start.AddSeconds(10), "e2"),
                EdgeEvent.CreateEnd(start.AddSeconds(9), start.AddSeconds(10), "e3"),
            })
            .ToStreamable(AdvanceTimeSettings.IncreasingStartTime);

        var adjustedEvents = SnapToWindowIntervals(events, duration, interval, alignment);

        var query = from win in HoppingWindow2(events, duration, interval, alignment)
                    select win.Count();

        DisplayResults(adjustedEvents, "Adjusted Events");
        DisplayResults(query, "Query");
    }

    As you can see, instead of producing a massive number of windows for the open start edge e0, a single window is emitted from 12:00:15 AM until the end of time:

    Adjusted Events

    Start Time                  End Time
    6/28/2012 12:00:01 AM       12/31/9999 11:59:59 PM
    6/28/2012 12:00:03 AM       6/28/2012 12:00:07 AM
    6/28/2012 12:00:05 AM       6/28/2012 12:00:15 AM
    6/28/2012 12:00:11 AM       6/28/2012 12:00:15 AM

    Query

    Start Time                  End Time
    6/28/2012 12:00:01 AM       6/28/2012 12:00:03 AM
    6/28/2012 12:00:03 AM       6/28/2012 12:00:05 AM
    6/28/2012 12:00:05 AM       6/28/2012 12:00:07 AM
    6/28/2012 12:00:07 AM       6/28/2012 12:00:11 AM
    6/28/2012 12:00:11 AM       6/28/2012 12:00:15 AM
    6/28/2012 12:00:15 AM       12/31/9999 11:59:59 PM

    The StreamInsight Team

  • Changes to the LINQ-to-StreamInsight Dialect

    In previous versions of StreamInsight (1.0 through 2.0), CepStream<> represents temporal streams of many varieties:

    1. Streams with ‘open’ inputs (e.g., those defined and composed over CepStream<T>.Create(string streamName))
    2. Streams with ‘partially bound’ inputs (e.g., those defined and composed over CepStream<T>.Create(Type adapterFactory, …))
    3. Streams with fully bound inputs (e.g., those defined and composed over To*Stream – sequences or DQC)
      1. The stream may be embedded (where Server.Create is used)
      2. The stream may be remote (where Server.Connect is used)

    When adding support for new programming primitives in StreamInsight 2.1, we faced a choice: add a fourth variety (use CepStream<> to represent streams bound to the new programming model constructs), or introduce a separate type that represents temporal streams in the new user model. We opted for the latter. Introducing a new type reduces the number of (confusing) runtime failures caused by using CepStream<> instances in the incorrect context. The new types are:

    • IStreamable<>, which logically represents a temporal stream.
    • IQStreamable<> : IStreamable<>, which represents a queryable temporal stream. Its relationship to IStreamable<> is analogous to the relationship of IQueryable<> to IEnumerable<>. The developer can compose temporal queries over remote stream sources using this type.

    The syntax of temporal queries composed over IQStreamable<> is mostly consistent with the syntax of our existing CepStream<>-based LINQ provider. However, we have taken the opportunity to refine certain aspects of the language surface. Differences are outlined below. Because 2.1 introduces new types to represent temporal queries, the changes outlined in this post do not impact existing StreamInsight applications using the existing types!


    StreamInsight does not support the SelectMany operator in its usual form (which is analogous to SQL’s “CROSS APPLY” operator):

    static IEnumerable<R> SelectMany<T, R>(this IEnumerable<T> source, Func<T, IEnumerable<R>> collectionSelector)

    It instead uses SelectMany as a convenient syntactic representation of an inner join. The parameter to the selector function is thus unavailable. Because the parameter isn’t supported, its type in StreamInsight 1.0 – 2.0 wasn’t carefully scrutinized. Unfortunately, the type chosen for the parameter is nonsensical to LINQ programmers:

    static CepStream<R> SelectMany<T, R>(this CepStream<T> source, Expression<Func<CepStream<T>, CepStream<R>>> streamSelector)

    Using Unit as the type for the parameter accurately reflects StreamInsight’s capabilities:

    static IQStreamable<R> SelectMany<T, R>(this IQStreamable<T> source, Expression<Func<Unit, IQStreamable<R>>> streamSelector)

    For queries that succeed – that is, queries that do not reference the stream selector parameter – there is no difference between the code written for the two overloads:

    from x in xs
    from y in ys
    select f(x, y)


    The Take operator used in StreamInsight causes confusion for LINQ programmers because it is applied to the (unbounded) stream rather than the (bounded) window, suggesting that the query as a whole will return k rows:

    (from win in xs.SnapshotWindow()
    from x in win
    orderby x.A
    select x.B).Take(k)

    The use of SelectMany is also unfortunate in this context because it implies the availability of the window parameter within the remainder of the comprehension. The following compiles but fails at runtime:

    (from win in xs.SnapshotWindow()
    from x in win
    orderby x.A
    select win).Take(k)

    The Take operator in 2.1 is applied to the window rather than the stream:



    Original syntax

    (from win in xs.SnapshotWindow()
    from x in win
    orderby x.A
    select x.B).Take(k)

    New syntax

    from win in xs.SnapshotWindow()
    from b in
        (from x in win
        orderby x.A
        select x.B).Take(k)
    select b
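    A toy model (plain Python lists standing in for windows; not StreamInsight code) makes the difference concrete: applying Take inside each window yields k rows per window, not k rows overall.

    ```python
    # Toy model: windows as plain lists of payload values.
    windows = [[5, 1, 3], [4, 2]]
    k = 2

    # 2.1 semantics: order by the value and take the top k within each window.
    per_window = [sorted(win)[:k] for win in windows]
    assert per_window == [[1, 3], [2, 4]]

    # Naive reading of the old syntax: k rows from the whole stream.
    whole_stream = sorted(v for win in windows for v in win)[:k]
    assert whole_stream == [1, 2]
    ```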


    We are introducing an explicit multicast operator in order to preserve expression identity, which is important given the semantics of moving code to and from the StreamInsight server. This also better matches existing LINQ dialects, such as Reactive. The pattern enables expressing multicasting in two ways:



    Implicit multicast

    var ys = from x in xs
             where x.A > 1
             select x;
    var zs = from y1 in ys
             from y2 in ys.ShiftEventTime(_ => TimeSpan.FromSeconds(1))
             select y1 + y2;

    Explicit multicast

    var ys = from x in xs
             where x.A > 1
             select x;
    var zs = ys.Multicast(ys1 =>
        from y1 in ys1
        from y2 in ys1.ShiftEventTime(_ => TimeSpan.FromSeconds(1))
        select y1 + y2);

    Notice that the product translates an expression using the implicit multicast into an expression using the explicit multicast operator. The user does not see this translation.

    Default window policies

    Only default window policies are supported in the new surface. Other policies can be simulated by using AlterEventLifetime.






    Left anti-semi-join (LASJ)

    Representation of LASJ as a correlated sub-query in the LINQ surface is problematic as the StreamInsight engine does not support correlated sub-queries (see discussion of SelectMany). The current syntax requires the introduction of an otherwise unsupported ‘IsEmpty()’ operator. As a result, the pattern is not discoverable and implies capabilities not present in the server. The direct representation of LASJ is used instead:



    Original syntax

    from x in xs
    where
        (from y in ys
        where x.A > y.B
        select y).IsEmpty()
    select x

    New syntax

    xs.LeftAntiJoin(ys, (x, y) => x.A > y.B)

    Original syntax

    from x in xs
    where
        (from y in ys
        where x.A == y.B
        select y).IsEmpty()
    select x

    New syntax

    xs.LeftAntiJoin(ys, x => x.A, y => y.B)


    The ApplyWithUnion methods have been deprecated since their signatures are redundant given the standard SelectMany overloads:



    Original syntax

    xs.GroupBy(x => x.A).ApplyWithUnion(gs => from win in gs.SnapshotWindow()
                                              select win.Count())

    New syntax

    xs.GroupBy(x => x.A).SelectMany(
        gs =>
        from win in gs.SnapshotWindow()
        select win.Count())

    Original syntax

    xs.GroupBy(x => x.A).ApplyWithUnion(gs => from win in gs.SnapshotWindow()
                                              select win.Count(),
                                        r => new { r.Key, Count = r.Payload })

    New syntax

    from x in xs
    group x by x.A into gs
    from win in gs.SnapshotWindow()
    select new { gs.Key, Count = win.Count() }

    Alternate UDO syntax

    The representation of UDOs in the StreamInsight LINQ dialect confuses cardinalities. Based on the semantics of user-defined operators in StreamInsight, one would expect to construct queries in the following form:

    from win in xs.SnapshotWindow()
    from y in MyUdo(win)
    select y

    Instead, the UDO proxy method is referenced within a projection, and the (many) results returned by the user code are automatically flattened into a stream:

    from win in xs.SnapshotWindow()
    select MyUdo(win)

    The “many-or-one” confusion is exemplified by the following example that compiles but fails at runtime:

    from win in xs.SnapshotWindow()
    select MyUdo(win) + win.Count()

    The above query must fail because the UDO is in fact returning many values per window while the count aggregate is returning one.

    Original syntax

    from win in xs.SnapshotWindow()
    select win.UdoProxy(1)

    New alternate syntax

    from win in xs.SnapshotWindow()
    from y in win.UserDefinedOperator(() => new Udo(1))
    select y

    or, equivalently, via a macro:

    from win in xs.SnapshotWindow()
    from y in win.UdoMacro(1)
    select y

    Notice that this formulation also sidesteps the dynamic type pitfalls of the existing “proxy method” approach to UDOs, in which the type of the UDO implementation (TInput, TOutput) and the type of its constructor arguments (TConfig) need to align in a precise and non-obvious way with the argument and return types for the corresponding proxy method.

    UDSO syntax

    UDSO currently leverages the DataContractSerializer to clone initial state for logical instances of the user operator. Initial state will instead be described by an expression in the new LINQ surface.



    Original syntax

    xs.Scan(new Udso())

    New syntax

    xs.Scan(() => new Udso())

    Name changes

    ShiftEventTime => AlterEventStartTime: The alter event lifetime overload taking a new start time value has been renamed.

    CountByStartTimeWindow => CountWindow

  • Introduction to LinqPad Driver for StreamInsight 2.1

    We are announcing the availability of the LinqPad driver for StreamInsight 2.1. The purpose of this blog post is to offer a quick introduction to the new features that we added to the StreamInsight LinqPad driver. We’ll show you how to connect to a remote server, how to inspect the entities present on that server, how to compose on top of them, and how to manage their lifetime.

    Installing the driver

    Info on how to install the driver can be found in an earlier blog post here.

    Establishing connections

    As you click on the “Add Connection” link in the left pane you will notice that now it’s possible to build the data context automatically. The new driver appears as an option in the upper list, and if you pick it you will open a connection dialog that lets you connect to a remote StreamInsight server.


    The connection dialog lets you specify the address of the remote server. You will notice that it’s possible to pick up the binding information from the configuration file of the LinqPad application (which is normally in the same folder as LinqPad.exe and is called LinqPad.exe.config).


    In order for the context to be generated you need to pick an application from the server. The control is editable, so you can create a new application if you don’t want to make changes to an existing one. If you choose a new application name, you will be prompted for confirmation before it gets created.

    Once you click OK the connection is created and you can start issuing queries against the remote server. If there’s any connectivity error, the connection is marked with a red X and you can see the error message informing you of what went wrong (e.g., the remote server could not be reached).

    The context for remote servers

    Let’s take a look at what happens after we are connected successfully. Every LinqPad query runs inside a context – think of it as a class that wraps all the code that you’re writing. If you’re connecting to a live server the context will contain the following:

    • The application object itself.
    • All entities present in this application (sources, sinks, subjects and processes).

    The picture below shows a snapshot of the left pane of LinqPad after a successful connection. Every entity on the server has a different icon which allows users to figure out its purpose. You will also notice that some entities have a string in parentheses following the name. It should be interpreted as follows: the first name is the name of the property of the context class and the second name is the name of the entity as it exists on the server. Not all valid entity names are valid identifier names, so in cases where we had to make a transformation you see both.

    Note also that as you hover over the entities you get IntelliSense with their types – more on that later.


    Remoting is not supported

    As you play with the entities exposed by the context you will notice that you can’t read and write directly to/from them. If for instance you’re trying to dump the content of an entity you will get an error message telling you that in the current version remoting is not supported. This is because the entity lives on the remote server and dumping its content means reading the events produced by this entity into the local process.


    Trying to dump the contents of a remote entity will yield the following error:

    Reading from a remote 'System.Reactive.Linq.IQbservable`1[System.Int32]' is not supported. Use the 'Microsoft.ComplexEventProcessing.Linq.RemoteProvider.Bind' method to read from the source using a remote observer.

    This basically tells you that you can call the Bind() method to direct the output of this source to a sink that has to be defined on the remote machine as well. You can’t bring the results to the LinqPad window unless you write code specifically for that.

    Compose queries

    You may ask – what's the purpose of all that? After all, the same information is present in the EventFlowDebugger, so why bother showing it in LinqPad? First of all, what gets exposed in LinqPad is not what you see in the debugger. In LinqPad we have a property on the context class for every entity that lives on the server. Because LinqPad offers IntelliSense, we in fact have much more information about the entity, and more importantly we can compose with that entity very easily.
    For example, let’s say that this code creates an entity:

    using (var server = Server.Connect(...))
    {
        var a = server.CreateApplication("WhiteFish");

        var src = a
            .DefineObservable<int>(() => Observable.Range(0, 3))
            .Deploy("ObservableSource");
    }

    If later we want to compose with the source, we have to fetch it first, and only then can we bind something to it.


    This means that we had to know a bunch of things about this source: that it’s a source, that it’s an observable, that it produces a result with payload Int32, and that it’s named “ObservableSource”. Only the second and last bits of information are present in the debugger, by the way.


    As you type in the query window you see that all the entities are present, you get IntelliSense support for them and it’s much easier to make sense of what’s available.

    Let’s look at a scenario where composition is plausible. With the new programming model it’s possible to create “cold” sources that are parameterized. There was a way to accomplish that even in the previous version by passing parameters to the adapters, but this time it’s much more elegant because the expression declares what parameters are required. Say that we hover the mouse over the ThrottledSource source – we will see that its type is Func<int, int, IQbservable<int>>. This in effect means that we need to pass two int parameters before we can get a source that produces events, and the type of those events is int. In the particular case of my example, the source produces a range of integers and the two parameters are the start and end of the range. So we see how a developer can create a source that is not running yet; then someone else (e.g., an administrator) can pass whatever parameters are appropriate and run the process.
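    The “cold, parameterized source” idea can be sketched in a few lines of Python (an analogy, not the StreamInsight API): the definition is just a function, and nothing is produced until someone supplies the parameters and runs it.

    ```python
    # Analogy for a parameterized cold source: defining it produces nothing;
    # supplying the parameters yields a source that can then be run.
    def define_range_source(start, end):
        def produce():
            for i in range(start, end):
                yield i
        return produce

    src = define_range_source(0, 3)   # definition only, nothing runs yet
    assert list(src()) == [0, 1, 2]   # events appear only when the process runs
    ```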


    Proxy Types

    Here’s an interesting scenario – what if someone created a source on a server but they forgot to tell you what type they used. Worse yet, they might have used an anonymous type and even though they can refer to it by name you can’t figure out how to use that type.

    Let’s walk through an example that shows how you can compose against types whose definition you don’t have. This is how we can create a source that returns an anonymous type:

    Application.DefineObservable(() => Observable.Range(1, 10).Select(i => new { I = i })).Deploy("O1");

    Now if we refresh the connection we can see the new source named O1 appear in the list. But what’s more important is that we now have a type to work with. So we can compose a query that refers to the anonymous type.

    var threshold = new StreamInsightDynamicDriver.TypeProxies.AnonymousType1_0<int>(5);

    var filter = from i in O1
                 where i.I > threshold.I
                 select i;


    You will notice that the anonymous type defined with the expression new { I = i } can now be manipulated by a client that does not have access to it, because the LinqPad driver has generated another type in its stead, named StreamInsightDynamicDriver.TypeProxies.AnonymousType1_0. This type has all the properties and fields of the type defined on the server, and, unlike the server-side type, we can instantiate values of it and use them to compose more queries.

    It is worth noting that the same thing works for types that are not anonymous – the test is whether the LinqPad driver can resolve the type. If it can’t, a new type will be generated that approximates the type that exists on the server.

    Control metadata

    In addition to composing processes on top of the existing entities we can do other useful things. We can delete them – nothing new here, as we simply access the entities through the Entities collection of the application class. This is where having their real name in parentheses comes in handy.

    There’s another way to find out what’s behind a property – dump its expression. The first line in the output tells us the name of the entity used to build this property in the context.


    Runtime information

    So let’s create a process to see what happens. We can bind a source to a sink and run the resulting process. If you right-click on the connection you can refresh it and see the process in the list of entities. Then you can drag the process to the query window and access the process object through the Processes collection of the application. You can then manipulate the process (delete it, read its diagnostic view, etc.).


    The StreamInsight Team

  • StreamInsight 2.1 Released

    The wait is over—we are pleased to announce the release of StreamInsight 2.1. Since the release of version 1.2, we have heard your feedback and suggestions, and based on them we have come up with a whole new set of features. Here are some of the highlights:

    • A New Programming Model – A clearer and more consistent object model, eliminating the need for complex input and output adapters (though they are still completely supported). This new model allows you to provision, name, and manage data sources and sinks in the StreamInsight server.
    • Tight integration with the Reactive Framework (Rx) – You can write reactive queries hosted inside StreamInsight, as well as compose temporal queries on reactive objects.
    • High Availability – Checkpointing over temporal streams and multiple processes with shared computation.

    Here is how simple coding can be with the 2.1 Programming Model:

    class Program
    {
        static void Main(string[] args)
        {
            using (Server server = Server.Create("Default"))
            {
                // Create an app
                Application app = server.CreateApplication("app");

                // Define a simple observable which generates an integer every second
                var source = app.DefineObservable(() =>
                    Observable.Interval(TimeSpan.FromSeconds(1)));

                // Define a sink.
                var sink = app.DefineObserver(() =>
                    Observer.Create<long>(x => Console.WriteLine(x)));

                // Define a query to filter the events
                var query = from e in source
                            where e % 2 == 0
                            select e;

                // Bind the query to the sink and create a runnable process
                using (IDisposable proc = query.Bind(sink).Run("MyProcess"))
                {
                    Console.WriteLine("Press a key to dispose the process...");
                    Console.ReadKey();
                }
            }
        }
    }

    That’s how easily you can define a source and a sink, compose a query, and run it. Note that we did not replace the existing APIs; they co-exist with the new surface.

    Stay tuned, you will see a series of articles coming out over the next few weeks about the new features and how to use them.

    Come and grab it from our download center page and let us know what you think! You can find the updated MSDN documentation here, and we would appreciate it if you could provide feedback on the docs as well—best via email to

    Moreover, we updated our samples to demonstrate the new programming surface.

    The StreamInsight Team

  • Building the Internet of Things – with Microsoft StreamInsight and the Microsoft .Net Micro Framework

    Fresh from the press – the March 2012 issue of MSDN Magazine features an article about the Internet of Things. It discusses in depth how you can use StreamInsight to process all the data that is continuously produced in typical Internet of Things scenarios. It also gives you an end-to-end perspective on developing Internet of Things solutions in the .NET world, ranging from the .NET Micro Framework application running on the device, through the communication between the devices and the server side, all the way to powerful cross-device streaming analytics implemented in StreamInsight LINQ.

    You can find an online version of the article here. Happy reading!

    The StreamInsight Team

  • LINQPad for StreamInsight 2.0

    This has come up several times in the wake of our 2.0 release: The new LINQPad driver supporting StreamInsight 2.0 is now live! You can download it as usual through the LINQPad UI:


    There are separate versions of the driver only because assembly references are not automatically forwarded for StreamInsight 2.0.

    The StreamInsight Team
