My thoughts on the field of Complex Event Processing. Since I'm a technical guy, they go mostly on the technical side. And mostly about my OpenSource project Triceps. To read the Triceps documentation, please follow from the oldest posts to the newest ones.

Friday, December 20, 2013

PowerShell

First, a status update: I've finally resumed the work on the docs for 2.0, albeit it's proceeding slowly yet.

I've been reading recently on PowerShell. A pretty cool tool. I've tried it before and it didn't work well for me then because I didn't understand its purpose. It's not a normal OS shell. Instead, it's the shell for the .NET virtual machine. Exactly the thing that Java is missing, and the gap that it tries to plug with the crap like Ant and Maven, unsuccessfully. PowerShell lets you run all the .NET methods interactively from the command line, and build the pipelines of them. It has some very cool syntax that lets you automatically apply the pipeline input in the same way as the command-line input. It also has the remote execution functionality, so it serves as an analog of the rsh/ssh (more advanced in some ways, less advanced in the others) in the Microsoft ecosystem.

But here is the CEP-related part: The Triceps TQL is not unique in handling the SQL-like queries in the form of pipelines. PowerShell does that too. I guess, it's a fairly obvious idea. You can even treat PowerShell as a rudimentary CEP system, and write the processing in the form of CEP pipelines. There is a major limitation that the pipelines are all linear, with no forking and joining, but on the other hand the pipelines can be used to pass the arbitrarily complex objects, and also a mix of objects of different types, so with some creativity the more complex topologies can be simulated (still no loops though).

Another catch with the pipelines is very similar to a current limitation of TQL: each stage of the pipeline works all by itself. In a SQL statement the query optimizer can turn a WHERE clause into an iteration by a small subset of an index. In TQL and PowerShell there will be an iteration on everything, followed by filtering in a WHERE. The grand plan for TQL is to add a query optimizer to it eventually, that could combine multiple sequential stages of the pipeline into one optimized stage. The other, simpler, alternative that I was considering for the short term is to specify the optimized selection manually as an option to the command that reads the table contents. PowerShell takes that one in many cases, so that say the command that pulls the data from a SqlServer database gets a full SQL query as an argument and does its filtering on the server. But I guess theoretically nothing really stops them from doing some pipeline optimization by pulling the WHERE conditions from a "where" command into the SQL statement for the database selection command. If would save the trouble of the SQL and "where" having different syntax.