meek – crossing the streams
https://blogs.msdn.microsoft.com/meek

Dynamic Union (12 Apr 2012)
https://blogs.msdn.microsoft.com/meek/2012/04/12/dynamic-union/

Let’s say there are several agents – e.g. devices – producing temporal streams. It may be interesting to merge these sequences into a single stream that can be processed by StreamInsight. The “union” operator allows you to merge a fixed number of inputs, but what happens when inputs come and go over time? Some policy is needed to make sense of these dynamic inputs, since they may disagree on how time is progressing. A “correct” policy that never propagates a CTI that might later be violated is no good: some new input can always come online with an earlier CTI value, so such a policy doesn’t allow any CTIs through. A “loose” policy that propagates all CTIs is useless because it assumes that all inputs progress in lockstep – implausible given network latency, clock drift, etc.

Temporal stream: a sequence of timestamped events. An event may be a Current Time Increment (CTI) event. A CTI is a promise that no subsequent events will have timestamps lower than that of the CTI.

A possible Goldilocks policy: allow for some maximum deviation between the most and least advanced input (let’s call this timespan the delay). Whenever a CTI event from any input is processed – and whenever that CTI is greater than any seen so far – subtract the delay so that the CTI doesn’t need to disqualify events from a less advanced input. If after delaying the CTI an incoming event still violates the CTI, drop or adjust the event.

Transform Subject

The ISubject<,> contract allows us to encapsulate the policy described above. Multiple producers can feed the subject (via calls to On*). Each producer can come and go at its own pace. How can we implement such a subject? Let’s start with a helper factory method that allows us to create a “transform subject”. This subject applies an arbitrary function to an input observable, where that function may represent a stateful computation.

Notice that this subject encapsulates two other subjects. One represents the input, a plain old Subject<> that allows input events to be passed to the transform logic. The other represents the output, a connectable observable that allows multiple consumers to read the output of the transform. When the subject is disposed, both the input subject and the output connection are released. A simple example of a transform subject that adds one to every incoming integer:
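
The code for the transform subject did not survive in this feed. Here is a minimal sketch matching the description above (the class name and shape are my own; the original post may have used a factory method instead), followed by the add-one example:

```csharp
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;

sealed class TransformSubject<TInput, TOutput> : ISubject<TInput, TOutput>, IDisposable
{
    readonly Subject<TInput> input = new Subject<TInput>();
    readonly IConnectableObservable<TOutput> output;
    readonly IDisposable connection;

    public TransformSubject(Func<IObservable<TInput>, IObservable<TOutput>> transform)
    {
        this.output = transform(this.input).Publish(); // connectable output for multiple consumers
        this.connection = this.output.Connect();       // start the pipeline
    }

    public void OnNext(TInput value) { this.input.OnNext(value); }
    public void OnError(Exception error) { this.input.OnError(error); }
    public void OnCompleted() { this.input.OnCompleted(); }

    public IDisposable Subscribe(IObserver<TOutput> observer)
    {
        return this.output.Subscribe(observer);
    }

    public void Dispose()
    {
        this.input.Dispose();      // release the input subject
        this.connection.Dispose(); // release the output connection
    }
}

// The "add one" example: a transform subject that adds one to every incoming integer.
var addOne = new TransformSubject<int, int>(xs => xs.Select(x => x + 1));
using (addOne.Subscribe(Console.WriteLine))
{
    addOne.OnNext(1); // prints 2
}
```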

In case you haven’t encountered them before, a couple of Rx operators are worth calling out. First, we apply the Synchronize operator, which serializes all incoming events – allowing us to avoid race conditions due to inputs running on different threads. Second, the Scan operator allows us to define an accumulator that tracks the highest CTI seen so far. If an event has a timestamp that precedes the highest CTI seen so far, it must be dropped to avoid a CTI violation.
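
The accumulator itself was lost from this feed. A sketch of the policy is below; the StreamInsight factory-method names (PointEvent.CreateCti) are from memory and the tuple syntax is modern C#, so treat this as illustrative rather than the original listing:

```csharp
static IObservable<PointEvent<T>> WithUnionPolicy<T>(
    this IObservable<PointEvent<T>> source, TimeSpan delay)
{
    return source
        .Synchronize() // serialize events from producers on different threads
        .Scan(
            (HighWater: DateTimeOffset.MinValue, Output: (PointEvent<T>)null),
            (acc, e) =>
            {
                if (e.EventKind == EventKind.Cti)
                {
                    // Delay the CTI so it can't disqualify events from less
                    // advanced inputs; propagate it only if it advances the
                    // high-water mark.
                    var delayed = e.StartTime - delay;
                    return delayed > acc.HighWater
                        ? (HighWater: delayed, Output: PointEvent.CreateCti<T>(delayed))
                        : (HighWater: acc.HighWater, Output: (PointEvent<T>)null);
                }

                // Drop insert events that would violate the high-water mark.
                return e.StartTime >= acc.HighWater
                    ? (HighWater: acc.HighWater, Output: e)
                    : (HighWater: acc.HighWater, Output: (PointEvent<T>)null);
            })
        .Where(acc => acc.Output != null)
        .Select(acc => acc.Output);
}
```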

Let’s look at how the subject behaves given interleaved events from two simulated input sources:

StreamInsight Checkpoints: What, How and Why? (9 Apr 2012)
https://blogs.msdn.microsoft.com/meek/2012/04/09/streaminsight-checkpoints-what-how-and-why/

I’ve been fielding some questions this week on checkpoints in StreamInsight. I’ll share my way of thinking about checkpoints in the hopes that it will help others build applications leveraging the feature. First, I’ll define what a StreamInsight checkpoint represents. Then I’ll explain how to interpret the high-water mark information StreamInsight provides to input and output adapters when a query is resumed after StreamInsight downtime (planned or unplanned). There’s an important piece of trivia here that may surprise even veteran users!

The MSDN documentation outlines three levels of resiliency. In this post I will focus on the strictest resiliency level, but the concepts outlined are relevant for all three.

Think of a StreamInsight query as a black box. It takes one or more sequences of events as inputs. It produces a sequence of events as output (in the upcoming 2.1 StreamInsight release, multiple output sequences are supported as well)*. It probably comes as no surprise that the black box contains some state. If your query is computing averages over windows, incoming events will contribute to some number of windows. Until the average over a particular window can be committed to the output, sums and counts need to be maintained internally. When you checkpoint a query, you’re really just saving the internal state of the query. It’s that simple!

Well, almost. In addition to the query state, a checkpoint captures the position of input and output sequences as of the checkpoint**. For example, if a checkpoint could speak it might say “between enqueuing input events x2 and x3, and between dequeuing output events y2 and y3, this was the state of the query”.

After a StreamInsight server instance has been stopped, a query can be resumed using the checkpoint state. Input and output adapters need to do some work as well. After the checkpoint was taken (but before the server instance was stopped), the input sequence may have progressed (let’s say x3 was enqueued) and the output sequence may have progressed as well (let’s say y3 was dequeued). Ideally, the input adapter would then replay x3 and subsequent events. And the ideal output adapter would forget that it had ever seen y3 or anything after it.

Instead of forgetting – by, say, deleting rows from a table or removing lines from a log file – the output adapter may instead choose to suppress output events it knows have been emitted already. The latter approach is relatively difficult to get right however:

- The output may include multiple events with the same timestamp(s) and payload value. Consider the case where the checkpoint is taken between two identical events with the same timestamp. A naïve de-duplication implementation may incorrectly suppress the second event because the first (equivalent) event has been emitted already.

- While StreamInsight deterministically produces logically consistent results, there is no guarantee about the specific sequence of output events between runs. The number of CTIs may vary. The order of events with the same timestamp may also vary between runs (or even the order of events with different timestamps when using StreamEventOrder.ChainOrdered).

- The output adapter needs to reliably determine which events have already been emitted.

In any case, the rule of thumb when resuming a query:

Input adapters must replay events after the checkpoint; output adapters must forget events after the checkpoint.

StreamInsight provides a high-water mark (HWM, pronounced huh-wim I think) value to input adapters that can be used to determine where in the input sequence a checkpoint occurred (see IHighWaterMarkInputAdapterFactory and IHighWaterMarkTypedInputAdapterFactory). Output adapters get both an HWM value and an offset (see IHighWaterMarkOutputAdapterFactory and IHighWaterMarkTypedOutputAdapterFactory). How can these values be used to identify an element of a sequence? First, let’s define HWM. An event x_i has high-water mark value h if its timestamp is h and, for all j < i, x_j has a lower timestamp. In other words, whenever an event in a sequence has a higher timestamp than every preceding event, it has an HWM value. Assuming that your input sequence is conveniently described by an IEnumerable<PointEvent<T>> sequence xs, you can find the event corresponding to an HWM as follows:

var inputCheckpointEvent = xs.FirstOrDefault(x => x.StartTime == hwm);

Assuming that the output from the query is captured in a (resilient and persistent) List<PointEvent<T>>, and given hwm and offset values for an output sequence, you just need to pop ahead offset positions to find the checkpoint event:
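
The original lookup code was not preserved in this feed; a minimal sketch, assuming ys is the persisted List<PointEvent<T>> and hwm/offset are the values StreamInsight supplies:

```csharp
// Find the first output event carrying the high-water mark, then advance
// 'offset' positions to reach the checkpoint event.
int hwmIndex = ys.FindIndex(y => y.StartTime == hwm);
var outputCheckpointEvent = ys[hwmIndex + offset];
```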

I promised a piece of trivia. While I’ve shown you how to find the “checkpoint event”, I haven’t told you where the checkpoint occurred: did it occur before the checkpoint event, or after? It turns out that the answer is before for an input checkpoint event but after for an output checkpoint event. Returning to the earlier example, if the checkpoint occurred between input events x2 and x3, the checkpoint event will be x3, but if the checkpoint occurred between output events y2 and y3, the checkpoint event will be y2! Again using the simple IEnumerable<> contract, I can demonstrate how replay works:
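
The replay demonstration itself was lost from this feed. A minimal sketch using plain sequences (xs, ys, the high-water marks and the offset are stand-ins for the values StreamInsight provides on resume):

```csharp
// Replay inputs from the input checkpoint event (x3) onward: the checkpoint
// occurred *before* that event.
var replayInput = xs.SkipWhile(x => x.StartTime < inputHwm);

// Keep outputs up to and including the output checkpoint event (y2): the
// checkpoint occurred *after* that event, so everything later is forgotten.
int checkpointIndex = ys.FindIndex(y => y.StartTime == outputHwm) + outputOffset;
var keptOutput = ys.Take(checkpointIndex + 1);
```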

Notice that by replaying input events and forgetting output events, we see correct output after resuming the query!

* Notice that I’m talking about “sequences of events” rather than “temporal streams”. While StreamInsight operators are defined in terms of temporal streams – sequences of timestamped events with ordering constraints imposed by common time increments (or CTIs) – a checkpoint can be understood relative to generic sequences.

** StreamInsight doesn’t actually freeze the query operators, inputs and outputs in order to take a snapshot of the internal state, but this is a useful and accurate way of characterizing the logical contract for checkpoints. Details of the internal mechanism aren’t covered in this blog post.

Sequences and StreamInsight: Initiating a Computation (21 Mar 2012)
https://blogs.msdn.microsoft.com/meek/2012/03/21/sequences-and-streaminsight-initiating-a-computation/

When bridging between temporal streams (CepStream), pull-based sequences (IEnumerable), and push-based sequences (IObservable) using the StreamInsight sequence integration feature, handling of computation lifetime is automatic: consuming results from a query generally kicks off upstream queries*, and disposing or deleting releases all upstream resources as well. While most of the time the policies governing lifetimes “just work”, the resulting behaviors sometimes confuse StreamInsight developers. Consider the following example:
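
The example code did not survive in this feed. A hedged reconstruction (the payload selector and advance-time policy are assumptions; the original presumably initiated the computation via query.Start() rather than enumeration):

```csharp
// An IEnumerable turned into an observable (current-thread scheduler by
// default), feeding a StreamInsight point stream.
var source = Enumerable.Range(0, 1024).ToObservable();
var input = source.ToPointStream(
    application,
    x => PointEvent.CreateInsert(DateTime.UtcNow, x),
    AdvanceTimeSettings.IncreasingStartTime);
var filtered = from x in input
               where x % 2 == 0
               select x;

// Initiating the computation spins up the inputs, so this call does not
// return until the (synchronously generated) source has been drained.
foreach (var x in filtered.ToEnumerable())
{
    Console.WriteLine(x);
}
```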

Similar to the previous example, you’ll notice that the call to query.Start() does not return until all of the input has been consumed.

The following calls are used to initiate a computation in StreamInsight: Query.Start(), IEnumerable<>.GetEnumerator(), and IObservable<>.Subscribe(). As the examples 1 and 2 suggest, we’re particularly interested in when these calls return. First, we need to understand that the Reactive Framework embraces the “least concurrency” principle in its design, which can be roughly summarized as “don’t introduce concurrency if you don’t need to”. The MSDN article on “Using Schedulers” gives additional background; you may also consult the Rx Workshop: Schedulers on Channel 9 for an in-depth look.

What does this principle imply about the above examples?

Let’s first examine the ToObservable() method. By default, it uses the current thread scheduler so the work of consuming the source iterator happens on the “subscribe” thread! By the time the subscribe call returns, all 1024 source elements have already been processed.
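
A minimal demonstration of this default behavior (a sketch, not the original listing):

```csharp
// With the default (current-thread) scheduler, the source is drained inside
// the Subscribe call itself.
var xs = Enumerable.Range(0, 1024).ToObservable();
xs.Subscribe(x => { /* handle x */ });
// Subscribe has already pushed all 1024 elements by the time it returns.
```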

What is the effect of the temporal query sandwiched between the observable source and published stream output in example 2?

When query.Start() or another ‘initiating’ method like GetEnumerator() or Subscribe() is called, the method doesn’t return until all inputs to the query have been initiated. This behavior allows the caller to know when a temporal query is primed to accept input. It also means that -- depending on the nature of the sources feeding your query -- you may need to wait a long time for the call to return!

How can you prevent generators like Observable.Range() and Observable.ToObservable() from blocking initiation requests?

We can manually introduce the desired concurrency by passing in a different scheduler! For instance, we could modify example 1 so that events are enqueued on thread pool threads rather than the current thread:
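
The modified example was lost from this feed; a sketch of the idea (Scheduler.ThreadPool is the Rx v1-era name; newer Rx versions use ThreadPoolScheduler.Instance):

```csharp
// Introduce concurrency explicitly by passing a scheduler.
var xs = Enumerable.Range(0, 1024).ToObservable(Scheduler.ThreadPool);
xs.Subscribe(x => { /* handle x */ });
// Subscribe now returns promptly; elements are pushed on thread-pool threads.
```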

Apart from modifying the behavior of Subscribe(), the intervening buffer used by ToEnumerable() also makes calls to On* asynchronous.

In the pull-sequence space, propagation of initiation requests looks a little bit different. Because LINQ to Objects mostly follows yield return semantics, calls to GetEnumerator() do not in fact propagate immediately! It is not until the first call to MoveNext() on the iterator object that the compiler-generated iterator state machine advances beyond the GetEnumerator() calls for inputs. This behavior contrasts with those of Reactive and StreamInsight operators whose inputs are (generally) spun up immediately. There’s even an exception to the exception: for LINQ to SQL and LINQ to Entities queries, calls to GetEnumerator() immediately result in ExecuteReader() calls. Our hope is that the exact phase of the initiation is not as important for pull-based sources as it is for potentially hot push-based sources. In the latter case, it may be important to ensure that a downstream consumer is active before the upstream producer begins generating output**.
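
A small illustration of this deferral in LINQ to Objects (my own example, not from the original post):

```csharp
// The iterator body does not run at GetEnumerator time.
static IEnumerable<int> Numbers()
{
    Console.WriteLine("iterating"); // runs on the first MoveNext call
    yield return 1;
}

var e = Numbers().GetEnumerator(); // prints nothing yet
e.MoveNext();                      // now prints "iterating"
```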

Let me know if you’d like to see a follow-up post on end-of-life concerns for computations in StreamInsight (auto-detach, synchronization, etc.)!

LINQ “Macros” in StreamInsight 1.2: Left Outer Join (25 Jul 2011)
https://blogs.msdn.microsoft.com/meek/2011/07/25/linq-macros-in-streaminsight-1-2-left-outer-join/

In an earlier post, I discussed the implementation of custom query operators that combine existing built-in operators. In StreamInsight 1.2, we have made some changes to simplify the implementation of custom operators. In the running example from the previous post, I showed how you can manually construct an expression tree representing a Left Anti Semi Join (LASJ), which required a fair amount of code in StreamInsight 1.1:

In StreamInsight 1.2, we have added support for invocation expressions. Take a look at Joe and Ben Albahari’s article on Dynamically Composing Expression Predicates for an idea of how to use this feature. Code fragment 1 can be rewritten as:
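
The rewritten fragment was not preserved in this feed; a hedged reconstruction using the invocation-expression support (the IsEmpty-based LASJ pattern is per the posts below):

```csharp
public static CepStream<TLeft> LeftAntiSemiJoin<TLeft, TRight>(
    this CepStream<TLeft> left,
    CepStream<TRight> right,
    Expression<Func<TLeft, TRight, bool>> predicate)
{
    // The Compile/invoke pair is recognized by StreamInsight 1.2 and inlined
    // into the query rather than compiled to an opaque CLR delegate.
    return from l in left
           where (from r in right
                  where predicate.Compile()(l, r)
                  select r).IsEmpty()
           select l;
}
```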

The invocation expression has been highlighted. Notice that we Compile the predicate expression. What does this call represent in a StreamInsight query? StreamInsight recognizes the Compile method but instead of compiling to a CLR delegate – an opaque delegate that cannot be evaluated remotely – it “compiles” a StreamInsight query in which the predicate expression has been inlined.

Now let’s take our LASJ operator and use it to construct a Left Outer Join (LOJ) operator. For most LINQ providers, there are several ways of describing LOJ, usually involving the DefaultIfEmpty operator. StreamInsight does not have a DefaultIfEmpty operator. Asking if an infinite stream is empty is potentially dangerous* so we’ll try something different. LOJ can also be written as the union of an inner join and LASJ, as in the following example:

public static CepStream<TResult> LeftOuterJoin<TLeft, TRight, TResult>(
    this CepStream<TLeft> left,
    CepStream<TRight> right,
    Expression<Func<TLeft, TRight, bool>> predicate,
    Expression<Func<TLeft, TResult>> outerSelector,
    Expression<Func<TLeft, TRight, TResult>> innerSelector)
{
    // left elements with no matching right elements (via LASJ); the signature
    // and this part of the fragment were lost from the feed and have been
    // reconstructed:
    var outer = from l in left.LeftAntiSemiJoin(right, predicate)
                select outerSelector.Compile().Invoke(l);

    // left elements with matching right elements:
    var inner = from l in left
                from r in right
                where predicate.Compile().Invoke(l, r)
                select innerSelector.Compile().Invoke(l, r);

    return outer.Union(inner);
}

Code fragment 3: LOJ in StreamInsight 1.2

Notice that you can use either the delegate “Invoke” method as in the LeftOuterJoin example:

predicate.Compile().Invoke(l, r)

or the more traditional invocation syntax used in the LeftAntiSemiJoin example:

predicate.Compile()(l, r)

The “selector” expressions produce identically-typed results for the case where the left-hand side has a matching event on the right-hand side (innerSelector) and the case where it does not (outerSelector), as in the following example:
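
A hypothetical usage sketch (the stream and property names are my own; both selectors must produce the same result shape):

```csharp
// Orders joined with optional shipments; both selector branches produce an
// identically-shaped anonymous type.
var joined = orders.LeftOuterJoin(
    shipments,
    (o, s) => o.OrderID == s.OrderID,
    o => new { o.OrderID, Carrier = default(string) }, // no matching shipment
    (o, s) => new { o.OrderID, Carrier = s.Carrier }); // matching shipment
```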

I mentioned a limitation on macro use within lambda bodies in my previous post. This limitation no longer exists: when StreamInsight encounters an unrecognized method (e.g. LeftOuterJoin in the above example) returning a stream definition, it will invoke the method to see what it returns, on the grounds that doing so may be better than failing.

* Yes, there’s something odd about StreamInsight’s IsEmpty operator. It is permitted strictly in the context of the LASJ pattern where it asks not whether a stream is empty in its entirety, but when the stream is empty. Appropriate for a query language with a temporal bias!

LINQ Queries as Streams and Thread Safety (27 Jan 2011)
https://blogs.msdn.microsoft.com/meek/2011/01/27/linq-queries-as-streams-and-thread-safety/

A few weeks ago, Torsten Grabs – a colleague on the StreamInsight team and accomplished violist – came to me with a potential bug. He was seeing an ADO.NET SqlClient exception when running a StreamInsight query: “There is already an open DataReader associated with this Command which must be closed first.” I suggested he enable Multiple Active Result Sets (MARS) in his SQL connection and this appeared to solve the problem. Digging a little deeper, a few issues occurred to us that will affect StreamInsight and Rx queries using external data sources. Before starting to dig, let’s dissect Torsten’s code to understand what’s happening.

(StreamInsight 1.1 adds support for .NET event sequences. Using the ToStream, ToPointStream, etc. extension methods, arbitrary LINQ queries (IEnumerable<>, IQueryable<>, IObservable<>, or IQbservable<>) can be turned into event streams processed by the StreamInsight engine. For a quick introduction to the feature, see my previous post.)
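
Torsten’s code did not survive in this feed. A hedged reconstruction of the failing pattern (the context, entity, and property names are hypothetical): two streams drawn from the same ObjectContext – and therefore the same SQL connection – joined in a StreamInsight query.

```csharp
var context = new NorthwindEntities(); // hypothetical LINQ to Entities context

var stream1 = context.Orders
    .ToPointStream(application,
        o => PointEvent.CreateInsert(o.OrderDate.Value, o.OrderID),
        AdvanceTimeSettings.IncreasingStartTime);

var stream2 = context.Shipments
    .ToPointStream(application,
        s => PointEvent.CreateInsert(s.ShippedDate.Value, s.OrderID),
        AdvanceTimeSettings.IncreasingStartTime);

// Both inputs are pulled concurrently by dedicated StreamInsight threads,
// producing two simultaneous DataReaders on one connection.
var joined = from a in stream1
             from b in stream2
             where a == b
             select a;
```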

First of all, what’s a SqlClient exception doing in a StreamInsight query? StreamInsight doesn’t leverage the SqlClient libraries, or SQL Server for that matter. The exception originates in the LINQ to Entities query source and passes through the StreamInsight query. When the StreamInsight query is evaluated, it automatically triggers evaluation of the underlying LINQ to Entities queries where the problem occurs.

Why does this program result in multiple simultaneous data readers? The behavior of ToPointStream is the key. StreamInsight turns an IEnumerable into a stream by pulling on its iterator from a dedicated thread. In the above example, the same SQL connection is being used by two data readers simultaneously because stream1 and stream2 are being processed in parallel by the StreamInsight query.

By enabling MARS (setting MultipleActiveResultSets=true in our SQL connection string), we can partially address the problem Torsten encountered. However, you need to look closely at the thread-safety boilerplate in MSDN for the Entity Framework. In summary: it isn’t thread-safe. In practice, you may see intermittent failures if you attempt to access the same ObjectContext simultaneously on two threads. See the EF FAQ site for more information.

LINQ Macros (5 Jan 2011)
https://blogs.msdn.microsoft.com/meek/2011/01/05/linq-macros/

A colleague was asking how to construct a particular LINQ “operator macro” today. Basically, he was finding it inconvenient to repeat boilerplate for particular operator patterns in his code, but was struggling to inline expression snippets into LINQ query expressions. Thought I’d share a sample macro because the pattern is generally useful. I’ll illustrate a simple Left Anti Semi Join (LASJ) operator macro, but you can hopefully recognize other possibilities as well.

The LASJ operator takes as arguments a “left” collection, a “right” collection and a join predicate. It returns all elements on the left that have no matching elements on the right. My first attempt uses the LINQ “Any” operator:

public static IEnumerable<TLeft> LeftAntiSemiJoin<TLeft, TRight>(
    this IEnumerable<TLeft> left,
    IEnumerable<TRight> right,
    Func<TLeft, TRight, bool> predicate)
{
    return left.Where(l => !right.Where(r => predicate(l, r)).Any());
}

Works fine, but what if I want to issue a database query? In the above example, I’m potentially treating IQueryable collections as in-memory IEnumerable collections, which is horribly inefficient. LINQ to Objects will scan the contents of the left collection and, for every element, scan the contents of the right collection until it finds an element matching the predicate. Let’s change the operator slightly to address this concern:

public static IQueryable<TLeft> LeftAntiSemiJoin<TLeft, TRight>(
    this IQueryable<TLeft> left,
    IQueryable<TRight> right,
    Func<TLeft, TRight, bool> predicate)
{
    return left.Where(l => !right.Where(r => predicate(l, r)).Any());
}

Unfortunately, I now get a NotSupportedException at runtime from LINQ to SQL: “Method 'System.Object DynamicInvoke(System.Object[])' has no supported translation to SQL.” The problem is the opaque predicate argument, which cannot be evaluated remotely on SQL Server. So let’s turn the predicate into a lambda expression:

public static IQueryable<TLeft> LeftAntiSemiJoin<TLeft, TRight>(
    this IQueryable<TLeft> left,
    IQueryable<TRight> right,
    Expression<Func<TLeft, TRight, bool>> predicate)
{
    return left.Where(l => !right.Where(r => predicate(l, r)).Any());
}

Still no luck: this time the C# compiler complains that “'predicate' is a 'variable' but is used like a 'method'”. We’ll need to manually inline the predicate expression instead:
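
The inlining implementation itself was lost from this feed. A hedged reconstruction matching the notes below (the predicate’s own parameter expressions are reused, so no Expression.Invoke or parameter rebinding is needed):

```csharp
public static IQueryable<TLeft> LeftAntiSemiJoin<TLeft, TRight>(
    this IQueryable<TLeft> left,
    IQueryable<TRight> right,
    Expression<Func<TLeft, TRight, bool>> predicate)
{
    ParameterExpression leftPrm = predicate.Parameters[0];
    ParameterExpression rightPrm = predicate.Parameters[1];

    // Retrieve MethodInfos from typed delegates rather than Type.GetMethod:
    Func<IQueryable<TRight>, Expression<Func<TRight, bool>>, IQueryable<TRight>> where = Queryable.Where;
    Func<IQueryable<TRight>, bool> any = Queryable.Any;

    // right.Where(r => <predicate body>)
    var filteredRight = Expression.Call(
        where.Method,
        right.Expression,
        Expression.Quote(Expression.Lambda<Func<TRight, bool>>(predicate.Body, rightPrm)));

    // l => !right.Where(...).Any()
    var outerPredicate = Expression.Lambda<Func<TLeft, bool>>(
        Expression.Not(Expression.Call(any.Method, filteredRight)),
        leftPrm);

    return left.Where(outerPredicate);
}
```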

- Notice that this approach does not rely on Expression.Invoke. Some LINQ providers (LINQ to Entities and LINQ to StreamInsight among them) do not support invocation expressions. By reusing the existing parameter expressions (leftPrm and rightPrm) from the predicate argument within the constructed expression, I avoid the need to “rebind” any parameters. See my earlier post for a more general workaround to the invocation expression limitation.

- One of my favorite tricks: rather than relying on Type.GetMethod and MethodInfo.MakeGenericMethod, I’m retrieving MethodInfos from typed delegates. This is more robust than the conventional solution because the C# compiler statically binds to the appropriate method signature.

A gotcha: if you attempt to use this (or other) operator macros within a lambda body, some LINQ providers will balk at the unrecognized method. For instance, LINQ to SQL is fine with the following usage of the macro because it doesn’t see the LeftAntiSemiJoin method call in the resulting expression tree (the macro is expanded before reaching LINQ to SQL):
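
The usage listing was lost from this feed; a hedged reconstruction (the Categories/Products queryables are hypothetical Northwind tables):

```csharp
// Works: the macro is expanded eagerly, so LINQ to SQL never sees the
// LeftAntiSemiJoin call in the expression tree.
var query1 = Categories.LeftAntiSemiJoin(Products, (c, p) => c.CategoryID == p.CategoryID);

// Fails: inside a query expression the macro call survives as an
// unrecognized method in the tree, and the provider balks.
var query2 = from c1 in Categories
             from c2 in Categories.LeftAntiSemiJoin(Products, (c, p) => c.CategoryID == p.CategoryID)
             select c1;
```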

You can work around this (intentional) limitation by assigning the expanded query to a local variable (we’ve already assigned the expanded version to query1 so we can reuse it here):

var query3 = from c1 in Categories
             from c2 in query1
             select c1;

StreamInsight implements a temporal version of LASJ as well. It’s similar to the familiar SQL Server operator but returns left-hand events for time intervals during which no corresponding right-hand event exists. The corresponding macro operator is shown below. The only significant difference from the IQueryable version is that the IsEmpty stream operator is used in place of the Any sequence operator:
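
The macro listing did not survive in this feed. The original (1.1-era) version built the expression tree manually, as in the IQueryable implementation above; for brevity, here is a sketch using the invocation-expression support later added in StreamInsight 1.2 – note IsEmpty in place of Any:

```csharp
public static CepStream<TLeft> LeftAntiSemiJoin<TLeft, TRight>(
    this CepStream<TLeft> left,
    CepStream<TRight> right,
    Expression<Func<TLeft, TRight, bool>> predicate)
{
    // Left events survive for the intervals during which no matching right
    // event exists.
    return from l in left
           where (from r in right
                  where predicate.Compile()(l, r)
                  select r).IsEmpty()
           select l;
}
```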

StreamInsight Sequence Integration: Five Easy Pieces (27 Oct 2010)
https://blogs.msdn.microsoft.com/meek/2010/10/27/streaminsight-sequence-integration-five-easy-pieces/

Over the past few weeks, I’ve spent time building a handful of applications using the new sequence integration APIs in StreamInsight 1.1. I think StreamInsight veterans will be pleasantly surprised at the seamlessness of the experience! If you’re new to StreamInsight, now’s your chance to quickly build a temporally aware application. In this post, I’ll walk through five components of a typical end-to-end StreamInsight query, from event source to event sink.

First, take some time to download the latest version and kick the tires:

.NET 4.0 Download (.NET 3.5 SP1 is sufficient if you are not using IObservable<> event sources or event sinks).

Now you're ready to build a StreamInsight application. A few considerations:

- If you are creating a .NET 4.0 application, make sure it is targeting the full .NET 4.0 profile, not the client profile. Configure the target framework through project properties in Visual Studio 2010.

- Add a reference to the Microsoft.ComplexEventProcessing assembly. If you plan on using an IObservable<> event source or event sink, add a reference to the Microsoft.ComplexEventProcessing.Observable assembly as well.

- Use the Microsoft.ComplexEventProcessing and Microsoft.ComplexEventProcessing.Linq namespaces. The first allows you to embed and manage a StreamInsight server. The second exposes the StreamInsight LINQ dialect.

Now we'll travel downstream, using the SequenceIntegration\Northwind sample as our guide.

Event Sources

Event sources for a query can be based on custom input adapters, other StreamInsight queries, or – new in version 1.1 – .NET IObservable<> and IEnumerable<> sequences. The good news: in the .NET world, IEnumerable<> pull-based sequences are pervasive. SQL, OData, SharePoint, you name it. Better news from the perspective of a Complex Event Processing system: asynchronous and push-based sequences can be easily exposed via the IObservable<> interface, particularly if you take advantage of the .NET Reactive Framework (Rx).

In this simple example, we use two OData service queries as our event sources:
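
The source queries were lost from this feed. A hedged reconstruction of the Northwind sample’s sources (the generated client type and service URI are assumptions; the field names match the stream definitions below):

```csharp
var northwind = new NorthwindDataContext(
    new Uri("http://services.odata.org/Northwind/Northwind.svc"));

// Order start times, in increasing timestamp order.
var orderStartTimes = from o in northwind.Orders
                      where o.OrderDate != null
                      orderby o.OrderDate
                      select new { o.OrderID, StartTime = (DateTime)o.OrderDate, o.ShipRegion };

// Order end (shipped) times, in increasing timestamp order.
var orderEndTimes = from o in northwind.Orders
                    where o.ShippedDate != null
                    orderby o.ShippedDate
                    select new { o.OrderID, EndTime = (DateTime)o.ShippedDate };
```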

The orderStartTimes and orderEndTimes queries implement IEnumerable<> which makes them suitable event sources for StreamInsight.

Temporal streams

An IObservable<> or IEnumerable<> event source can feed a temporal stream. A temporal stream is a sequence of events annotated with temporal information: timestamps for events, and punctuation indicating when a particular point in time has been committed. In the above example, we have two sources, orderStartTimes and orderEndTimes, each including a timestamp field – StartTime and EndTime respectively – as well as a commitment, based on the orderby clauses, that event timestamps are monotonic. We describe these temporal characteristics to StreamInsight using the ToPointStream method:
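
The conversion code was lost from this feed; a hedged reconstruction (the advance-time setting name is from memory):

```csharp
var startStream = orderStartTimes.ToPointStream(
    application,
    s => PointEvent.CreateInsert(s.StartTime, s), // "s happened at s.StartTime"
    AdvanceTimeSettings.IncreasingStartTime);

var endStream = orderEndTimes.ToPointStream(
    application,
    e => PointEvent.CreateInsert(e.EndTime, e),
    AdvanceTimeSettings.IncreasingStartTime);
```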

The arguments to the ToPointStream extension method are described below:

- IEnumerable<T> source: the event source. In this case, the source is an OData query.

- Application application: the StreamInsight application that will host the temporal query.

- Func<TInput, PointEvent<TPayload>> selector: takes an element of the source and turns it into a temporal event. The first selector loosely reads: “the event s happened at the point in time s.StartTime”. The first argument to PointEvent.CreateInsert indicates the timestamp for the event, and the second argument describes the payload of the event – in this case the entire row.

  Aside: StreamInsight has restrictions on payload types. Basically, the payload must consist of a class or struct with only “primitive” (string, number, etc.) fields and properties. See the Payload Field Requirements section @ http://msdn.microsoft.com/en-us/library/ee378905.aspx for details. The event selector can be used to reshape your inputs to a supported payload type.

- AdvanceTimeSettings advanceTimeSettings: an optional parameter that describes a policy for automatically generating punctuation. In the above example, we indicate that timestamps are increasing in the input. StreamInsight can then automatically generate Current Time Increment (CTI) punctuation for each event: “we commit to everything before the event’s timestamp”.

- string streamName: an optional parameter allowing you to assign a name to the input stream. I have not specified a stream name in the above example. This feature is particularly useful if you’re importing punctuations from one stream to another (see the SequenceIntegration\PerformanceCounters sample for instance).

Note that there are several variations on the To*Stream method supporting IObservable<> or IEnumerable<> event sources and the shaping of point, interval or edge data.

Temporal query

Now that we have described the temporal characteristics of our event sources, we can compose a StreamInsight query. I’ll simply copy the code here without too much explanation since I’m focused on data ingress and egress in this post:

// Use clip to synthesize events lasting from the start of each order to the end
// of each order.
var clippedStream = startStream
.AlterEventDuration(e => TimeSpan.MaxValue)
.ClipEventDuration(endStream, (s, e) => s.OrderID == e.OrderID);
// Count the number of coincident orders per region
var counts = from o in clippedStream
group o by o.ShipRegion into g
from win in g.SnapshotWindow(SnapshotWindowOutputPolicy.Clip)
select new { ShipRegion = g.Key, Count = win.Count() };
// Display output whenever there are more than 2 active orders in a region.
const int threshold = 2;
var query = from c in counts
where c.Count > threshold
select c;

Event sink

Creating an event sink is straightforward. Several extension methods support the transformation of a temporal query (CepStream<>) into an event sink, with support for permutations of IObservable or IEnumerable sequences and TPayload, PointEvent<TPayload>, IntervalEvent<TPayload> or EdgeEvent<TPayload> elements. In the following example, we translate the query to a sequence of interval events using ToIntervalEnumerable. We then filter down to insert events – skipping CTI punctuation – and project out the relevant temporal and payload fields:
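
The sink code was lost from this feed; a hedged reconstruction using the query defined above:

```csharp
// Keep only insert events (skipping CTIs) and project the interval endpoints
// together with the payload fields.
var sink = from e in query.ToIntervalEnumerable()
           where e.EventKind == EventKind.Insert
           select new { e.StartTime, e.EndTime, e.Payload.ShipRegion, e.Payload.Count };
```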

Consuming results

Now that we’re back in the world of .NET sequences, there are many possibilities for consuming the results. For now, we’ll just write the event sink contents to the console:

foreach (var r in sink)
{
Console.WriteLine(r);
}

Interestingly, calling GetEnumerator on sink triggers a sequence of actions:

A stream query is deployed to the embedded server and started, which implicitly spins up the query inputs, triggering…

calls to GetEnumerator on the event sources we defined earlier, which…

causes a query against the OData service to be executed.

StreamInsight queries now compose seamlessly with other LINQ providers! In fact, if you review the code above you can see that we’ve leveraged LINQ to OData, LINQ to Objects, and LINQ to StreamInsight. The SequenceIntegration samples linked above illustrate some other integration possibilities as well:

- A WPF control that observes an IObservable event sink.

- An IObservable event source that polls performance counters.

- An IEnumerable event source that reads the contents of a file.

There – you have the five easy pieces! Comes with a side-order of toast.

* * *

But there’s a hitch. What does “Default” mean? If you’ve spent time with the StreamInsight samples, you might imagine the “Default” incantation has a special meaning. The verification tests we run on the StreamInsight team require that you install a server instance named “Default”, and – not coincidentally – this tends to be the choice of instance name in StreamInsight samples as well. Let’s look at how to work with, discover, and understand server instances. First, some general notes:

A SQL Server 2008 R2 license is required to run StreamInsight, either as a dedicated service or an embedded server. The instance name passed to the Server.Create method associates the embedded server with a particular license.

There are two installers. One includes only the client libraries that can be used to connect to an existing StreamInsight service. The other installs the same libraries but additionally registers a server instance and (optionally) a service. The stand-alone service is optional because some users may want to just embed StreamInsight.

A StreamInsight server instance carefully controls resource allocation – memory, cores – for the stream queries it hosts. As a result, it is best to embed a single server instance in your application. Otherwise, the instances will compete with each other for resources, and no one wins.

If you write an application that embeds StreamInsight, you need to know the name of the corresponding instance. But what if your application is running on a strange machine? The usual official advice applies: use app/web config files to store the instance name for Server.Create. You can configure the management service URI used by Server.Connect in the same way.
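A sketch of the configuration approach (the appSettings key name here is illustrative, not a StreamInsight convention):

```csharp
// App.config (illustrative key name):
// <configuration>
//   <appSettings>
//     <add key="StreamInsightInstance" value="Default" />
//   </appSettings>
// </configuration>
using System.Configuration;
using Microsoft.ComplexEventProcessing;

class Program
{
    static void Main()
    {
        // Read the instance name from configuration instead of hard-coding "Default".
        string instanceName = ConfigurationManager.AppSettings["StreamInsightInstance"];
        using (Server server = Server.Create(instanceName))
        {
            // ... create applications, define and start queries ...
        }
    }
}
```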

Now the unofficial advice.

Server instance information is stored in the registry. Launch regedit and take a look under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft StreamInsight to see examples. Looking up values in the registry can be hazardous, particularly if you want to be robust to permutations of 32/64-bit operating system, StreamInsight install, and .NET application. Shahar Prish’s advice on the topic gives an idea of the challenges, but fortunately there is an easier alternative in .NET 4.0. The new OpenBaseKey method allows you to explicitly select a 64-bit registry view (I was pretty excited to find this – as far as I can tell, there were no announcements of this extension to the .NET registry key APIs). Here’s a utility you can use to reliably retrieve StreamInsight instance data:
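Since the attached utility isn’t reproduced here, a minimal sketch of the lookup (the exact instance subkey naming may vary by StreamInsight version):

```csharp
using System.Collections.Generic;
using Microsoft.Win32;

static class StreamInsightRegistry
{
    // Enumerates StreamInsight instance subkeys from the 64-bit registry view,
    // regardless of whether the calling process is 32- or 64-bit (.NET 4.0+).
    public static IEnumerable<string> GetInstanceKeyNames()
    {
        using (RegistryKey baseKey = RegistryKey.OpenBaseKey(
            RegistryHive.LocalMachine, RegistryView.Registry64))
        using (RegistryKey key = baseKey.OpenSubKey(
            @"SOFTWARE\Microsoft\Microsoft StreamInsight"))
        {
            if (key == null) yield break;
            foreach (string name in key.GetSubKeyNames())
            {
                yield return name;
            }
        }
    }
}
```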

I’ll call out one trick in the implementation: the ToString() override uses an anonymous type to format output. The compiler auto-generates a ToString() implementation for anonymous types, so we can write:
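For example (an illustrative override assuming hypothetical Name and Version properties, not the attached implementation verbatim):

```csharp
public override string ToString()
{
    // The compiler-generated ToString() on the anonymous type renders
    // "{ Name = Default, Version = 2 }"-style output for free.
    return new { this.Name, this.Version }.ToString();
}
```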

Let me know if you have any questions on StreamInsight instances! If I can’t answer them I can hopefully find someone who can.

Scripting StreamInsight queries
https://blogs.msdn.microsoft.com/meek/2010/08/31/scripting-streaminsight-queries/
Tue, 31 Aug 2010 13:05:00 +0000

Over the past couple of weeks, a handful of people have asked for help dynamically creating StreamInsight queries. I usually scrawl some boxes and arrows on the whiteboard and say “you could try something like this…” My hand-waving hasn’t been very helpful. I’ll write some code instead…

A StreamInsight query includes a “template” definition that essentially describes the operator tree. Normally, a developer describes a template using a LINQ query (StreamInsight understands a LINQ dialect for streaming/temporal queries). The LINQ query is then translated into an XML document (schema details are available here) which is compiled and executed by the StreamInsight engine. In one sense, you can see that StreamInsight already supports dynamic queries since an application can construct a query operator tree using the XML specification language directly. A visual design surface or DSL could potentially be used to generate the XML. My preferred general-purpose stream query language is LINQ however, so I’ll instead consider what it takes to dynamically generate a LINQ query using StreamInsight.

Anders Hejlsberg’s talk on The Future of C# gives a possible answer. Using the proposed C# eval facility, I could write something like:

Since I don’t want to wait for C# 5.0, I’ll use a more specialized “scripter” tool designed specifically for StreamInsight query templates:

// initialize a scripter
var scripter = new QueryTemplateScripter();

// add input streams
scripter.AddStream("input1", typeof(EventType1));

// script query
var template = scripter.CreateQueryTemplate(app, "MyTemplate", null,
    "from i in input1 where i.X > 10 select i");

The scripter allows you to:

reference assemblies (the required StreamInsight assemblies are referenced by default);

register input streams for your query using the AddStream method;

add “using” namespaces (the expected namespaces are included by default), and finally;

create a query template based on a LINQ query specification.

The scripter essentially inlines the LINQ query into a program containing any event definitions (in the above example, we are merely referencing an existing type so no definition is required) and a query “context” that passes the LINQ query definition to an existing Application.CreateQueryTemplate method. An example of a generated program is shown below:
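The generated program isn’t reproduced here; a hypothetical sketch of its shape (all names illustrative – the actual generated code is in the attached source):

```csharp
using Microsoft.ComplexEventProcessing;
using Microsoft.ComplexEventProcessing.Linq;

public static class GeneratedQueryTemplateContext
{
    public static QueryTemplate Create(Application app)
    {
        // Each registered stream becomes an unbound CepStream<> reference.
        CepStream<EventType1> input1 = CepStream<EventType1>.Create("input1");

        // The user's query text is inlined verbatim; the #line pragma maps
        // compiler errors back to positions within that text.
#line 1 "query"
        var query = from i in input1 where i.X > 10 select i;
#line default

        return app.CreateQueryTemplate("MyTemplate", null, query);
    }
}
```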

Such programs are generated and compiled, and the query template creation method is dynamically invoked to register a new query template in StreamInsight. This approach has some limitations:

Compiler error reporting may or may not be useful. I use #line pragmas in the generated code to pinpoint the location of any local errors in the user’s query, but the LINQ query definition may not be a well-formed expression. Consider what would happen if you were to call scripter.CreateQueryTemplate(app, "q", null, "input1; return null }");

Related to the above point, there is an injection risk. The usual common sense security rules apply: do not evaluate code from untrusted sources.

Every call to CreateQueryTemplate results in the creation of a new assembly that remains loaded for the lifetime of the app domain.

Full source code for the scripter is attached. The event definition code is intentionally extensible to support specification of stream event types using something other than System.Type. Possible alternatives: CepEventType or some other representation of the event record layout.

EF Extensions for Visual Studio 2010
https://blogs.msdn.microsoft.com/meek/2010/08/19/ef-extensions-for-visual-studio-2010/
Thu, 19 Aug 2010 08:32:26 +0000

The EF Extensions sample has been updated for Visual Studio 2010. It’s available for download here. Nothing new in this release… some features have actually been removed because they’re no longer needed in .NET 4.0.

The EF proper now includes an ObjectSet<TEntity> class that makes the old EntitySet<TEntity> EFExtensions class redundant. The FindOrAttach and GetTrackedEntities methods from the original EFExtensions sample are preserved as extension methods over ObjectSet, but the EntitySet class is no more. A cheat sheet for people upgrading from EntitySet to ObjectSet (just some name changes):

Before                                    After
Microsoft.Data.Extensions.EntitySet<T>    System.Data.Objects.ObjectSet<T>
InsertOnSaveChanges(T)                    AddObject(T)
DeleteOnSaveChanges(T)                    DeleteObject(T)
FindOrAttach(T)                           extension method in DataExtensions
GetTrackedEntities(T)                     extension method in DataExtensions
Metadata                                  EntitySet

The Zip helper method is redundant with System.Linq.Enumerable.Zip.
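For reference, the framework method pairs elements positionally and truncates to the shorter input (presumably the same contract the EFExtensions helper provided):

```csharp
using System;
using System.Linq;

class ZipDemo
{
    static void Main()
    {
        // Enumerable.Zip (new in .NET 4.0) combines two sequences pairwise,
        // stopping at the end of the shorter one.
        string[] pairs = new[] { 1, 2, 3 }
            .Zip(new[] { "a", "b" }, (n, s) => s + n)
            .ToArray();
        Console.WriteLine(string.Join(", ", pairs)); // a1, b2
    }
}
```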

The LINQ ExpressionVisitor class is now public, so Microsoft.Data.Extensions.ExpressionVisitor has been replaced with System.Linq.Expressions.ExpressionVisitor.

Quick aside on ExpressionVisitor: a couple of devs have mentioned a “bug” to me in the shipping ExpressionVisitor implementation. It’s actually a bug fix relative to the visitor code that has circulated via various blogs (including this one). The VisitMemberInit() method now visits the MemberInitExpression.NewExpression expression but requires that the returned expression is a NewExpression. This may cause trouble for partial evaluator implementations that unwittingly assume that, for instance, ‘new Foo { X = … }’ can be rewritten as ‘Constant(new Foo()) { X = … }’.
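To make the failure mode concrete, a sketch of a naive evaluator that trips over the check (the class and member names here are illustrative):

```csharp
using System;
using System.Linq.Expressions;

// A naive partial evaluator that rewrites every NewExpression to a constant.
// Under the .NET 4.0 ExpressionVisitor, VisitMemberInit requires the visited
// NewExpression to remain a NewExpression, so visiting a member-init such as
// 'new Exception { Source = "x" }' with this visitor is rejected with an
// exception rather than producing an invalid tree.
class NaivePartialEvaluator : ExpressionVisitor
{
    protected override Expression VisitNew(NewExpression node)
    {
        // 'new Foo()' -> Constant(new Foo()) — invalid beneath a MemberInit.
        return Expression.Constant(node.Constructor.Invoke(new object[0]));
    }
}
```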

Although the EF now supports richer mapping of stored procedures and includes some ‘drill-through-to-the-underlying-store-provider’ surface, a few remaining gaps allowed Materializer, CreateStoreCommand and company to survive the chopping block in this release of EFExtensions. Some remaining benefits of the sample library: multiple result sets from stored procedures, the power of a complete programming language to describe result transformations, and support for arbitrary result shapes.