Looks like Sybase is in the process of acquiring Aleri.
It’ll be interesting to see how Sybase manages the Aleri and Coral8 products, in both development and deployment (licensing).
So I recently had a chance to play with StreamInsight CTP 2. Overall, it’s a good offering from MSFT, but they have a ways to go before they catch up with the competition. Here are some of my initial thoughts.
- CTIs are powerful but may prove to be a handful for developers to implement correctly.
  - In the financial sector, one area CTIs may affect is end-of-day (EOD) reconciliation or pricing, if one were to move that processing to StreamInsight. Most of the CSVs that come in from dealers contain out-of-order marks/prices/trades. But with full control over the input adapter, which turns the static data into streaming data, we could easily issue a CTI once the CSV has finished loading. Still, moving this processing to StreamInsight is not something I’d recommend. Firstly, there’s the static-to-real-time data conversion. Secondly, and more importantly, given how easily and frequently EOD processing breaks due to bad formats coming in from the dealers, it makes sense to leave it to SSIS packages. You wouldn’t want a developer to have to crack open an IDE each time this processing breaks.
  - Since all market data is chronological, the adapter developer has to issue a CTI pulse after each tick is received so that the tick appears in the input stream. It seems to me this use case could be made easier if MSFT were to add a setting that handles these CTIs automatically.
  - CTIs make unordered, time-based edge events pretty hard to implement. Normally, one would set up time-based patterns within a window to watch for either the edge start or end condition to occur. With StreamInsight, one would either have to move this logic to the adapter or issue the CTI immediately and then work with the event in the engine.
  - It would be useful if StreamInsight offered a way to either handle out-of-order events chronologically (i.e., manage a different timestamp) or simply drop the event altogether (based on some setting).
- At present, it seems that there is a 1:1 mapping between an adapter and a stream. Hooking up multiple adapters on both the input and output sides of a stream is a must-have.
  - On the input side, multiple adapters can be used to normalize data into a common schema. As my colleague Kishor pointed out, there’s a way to fold multiple streams into one with LINQ. But this would result in adapter code ending up side by side with engine code.
  - On the output side, for example, we’d want to hook up adapters for a log writer and a DB writer, along with one for a messaging bus, before we feed the data into the next stream or query.
- Unit testing: This is often a sore point with CEP applications, and CTP 2 is completely lacking in this area at the moment. In my opinion, providing support for unit tests would be a big win for MSFT.
- Managing the adapter state machine is messy. Copy/pasting boilerplate-like code will create maintenance issues. I suppose we could abstract this in a base class, but given that each adapter is likely to have its own custom cleanup code, I’m not sure I’d gain much once I’m done adding the equivalent events/delegates to pass control to the subclasses.
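
To make the CTI points above a bit more concrete, here is a toy model in Python of how a CTI acts as a promise that no earlier event will follow, and of the "drop or re-stamp" setting wished for above. This is only a sketch of the idea, not StreamInsight's implementation or API (which is .NET); `CtiBuffer`, `insert`, `cti`, and `drop_late` are all made-up names.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class CtiBuffer:
    """Toy model of CTI-driven event release (hypothetical, not a real API)."""
    drop_late: bool = True               # True: drop late events; False: re-stamp them
    current_cti: float = float("-inf")   # latest CTI seen so far
    dropped: int = 0                     # count of late events that were dropped
    _heap: List[Tuple[float, int, Any]] = field(default_factory=list)
    _seq: int = 0                        # tie-breaker so payloads are never compared

    def insert(self, timestamp: float, payload: Any) -> None:
        """Buffer an event; events older than the current CTI are late."""
        if timestamp < self.current_cti:
            if self.drop_late:
                self.dropped += 1
                return
            timestamp = self.current_cti  # re-stamp the late event at the CTI
        heapq.heappush(self._heap, (timestamp, self._seq, payload))
        self._seq += 1

    def cti(self, timestamp: float) -> List[Tuple[float, Any]]:
        """Advance time and release, in order, every event strictly before it."""
        self.current_cti = max(self.current_cti, timestamp)
        released = []
        while self._heap and self._heap[0][0] < self.current_cti:
            ts, _, payload = heapq.heappop(self._heap)
            released.append((ts, payload))
        return released


# Out-of-order marks from a file, followed by a CTI once loading is done.
buf = CtiBuffer()
for ts, mark in [(3, "c"), (1, "a"), (2, "b")]:
    buf.insert(ts, mark)
first = buf.cti(3)       # [(1, 'a'), (2, 'b')]
buf.insert(2, "late")    # arrives after the CTI, so it is dropped
second = buf.cti(4)      # [(3, 'c')]
```

In this toy model, a tick stamped `t` is only released by a CTI strictly greater than `t`, which is why an adapter feeding chronological market data ends up issuing a CTI after every tick.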
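
For the adapter state point above, this is roughly what the base-class idea could look like, sketched in Python rather than .NET. Everything here is hypothetical: the state names, `AdapterBase`, and the `on_start`/`on_cleanup` hooks are illustrative and not taken from the StreamInsight adapter API. The base class owns the transitions, and subclasses only fill in their own setup and cleanup.

```python
from enum import Enum, auto


class AdapterState(Enum):
    """Hypothetical lifecycle states, for illustration only."""
    CREATED = auto()
    RUNNING = auto()
    STOPPING = auto()
    STOPPED = auto()


class AdapterBase:
    """Owns the state transitions; subclasses override the hooks."""

    def __init__(self) -> None:
        self.state = AdapterState.CREATED

    def start(self) -> None:
        if self.state is not AdapterState.CREATED:
            raise RuntimeError(f"cannot start from {self.state.name}")
        self.state = AdapterState.RUNNING
        self.on_start()

    def stop(self) -> None:
        # Stopping twice is a no-op, so callers need not track the state.
        if self.state in (AdapterState.STOPPING, AdapterState.STOPPED):
            return
        self.state = AdapterState.STOPPING
        try:
            self.on_cleanup()
        finally:
            # Reach STOPPED even if the subclass cleanup raises.
            self.state = AdapterState.STOPPED

    def on_start(self) -> None:
        """Hook: adapter-specific setup."""

    def on_cleanup(self) -> None:
        """Hook: adapter-specific cleanup."""


class CsvAdapter(AdapterBase):
    """Example subclass that records what it would do."""

    def __init__(self) -> None:
        super().__init__()
        self.log = []

    def on_start(self) -> None:
        self.log.append("open file")

    def on_cleanup(self) -> None:
        self.log.append("close file")
```

Whether this buys much is exactly the open question: the transitions live in one place, but every adapter still has to write its own hooks.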
I haven’t yet found much information about the behind-the-scenes storage and processing technology, but I am definitely interested in it. While it may not be a real-time CEP system like the LHC, it still wouldn’t be your average large-scale, high-volume system. From a press release:
Over the course of the Kepler mission, NASA Ames anticipates requiring between 30 and 90 terabytes of capacity to allow storage and analysis of images captured by the telescope. The precise amount of capacity actually required depends on several variables so it cannot be determined at the outset.