The real-time stream processing analytics is of paramount significance for today‚Äôs agile business. There are many solutions out there like:
Amazon Kinesis Data Analytics¬†is available for a couple of years. ¬†Kafka‚Äôs¬†KSQL¬†has been released recently. Both products support¬†SQL¬†like continuous queries for data¬†filtering,¬†transformation,¬†aggregations,¬†windowing¬†etc.
However, it seems, up to my knowledge, one use case is not supported even if it is mentioned as a possible one in their documentation. That use case is detecting¬†missing events. In particular, through SQL like stream joins you can match an¬†order¬†with¬†shipment. However, I‚Äôm not aware that you can detect missing events. For example, this could be the scenario where we shall raise an alert if there is no¬†shipment¬†2 hours after an¬†order¬†has been received on the stream.
Surprisingly (or not) one open source¬†CEP¬†engine released under Apache V2 License supports the use case of detecting missing events. That product is¬†Siddhi, part of the¬†WSO2¬†platform:
You can download and try the “Hello World” example in a couple of minutes as explained in the article above.
WSO2 CEP documentation contains a recipe how to detect non-occurrences through pattern matching:
Let‚Äôs modify a little bit this example and run it in the¬†Siddhi¬†editor that contains input stream simulator as well. The¬†Siddhi¬†application is:
The trick here is defining the stream of expired events:
Once we have this stream, we can apply pattern matching that will try to match the arrival event with:
- Delivery event with the same tracking id
- Overdue event with the same tracking id
You can try the whole scenario in the¬†Siddhi¬†simulator by running the application. In the simulator, enter the following events at this particular order:
arrivals_stream: trackingId=100, customerName=Marjan
arrivals_stream: trackingId=200, customerName=Marjan
arrivals_stream: trackingId=300, customerName=Marjan
arrivals_stream: trackingId=400, customerName=Marjan
Wait a minute after that. The log output will look similar to this one:
The missing delivery events for tracking ids 200 and 300 have been detected.
It would be nice if we can get similar feature in KSQL in the near future. In the meantime,¬†Siddhi¬†provides connector for Kafka.