CEP: Detecting Missing Events

The real-time stream processing analytics is of paramount significance for today’s agile business. There are many solutions out there like:

https://aws.amazon.com/kinesis/data-analytics/

https://www.confluent.io/product/ksql/

Amazon Kinesis Data Analytics is available for a couple of years.  Kafka’s KSQL has been released recently. Both products support SQL like continuous queries for data filtering, transformation, aggregations, windowing etc.

However, it seems, up to my knowledge, one use case is not supported even if it is mentioned as a possible one in their documentation. That use cases is detecting missing events. In particular, through SQL like stream joins you can match an order with shipment. However, I’m not aware that you can detect missing events. For example, this could be the scenario where we shall raise an alert if there is no shipment 2 hours after an order has been received on the stream.

Surprisingly (or not) one open source CEP engine released under Apache V2 License supports the use case of detecting missing events. That product is Siddhi, part of the WSO2 platform:

https://wso2.github.io/siddhi/documentation/siddhi-quckstart-4.0/

You can download and try the “Hello World” example in a couple of minutes as explained in the article above.

WSO2 CEP documentation contains a recipe how to detect non-occurrences through pattern matching:

https://docs.wso2.com/display/CEP400/Sample+0111+-+Detecting+non-occurrences+with+Patterns

Let’s modify a little bit this example and run it in the Siddhi editor that contains input stream simulator as well. The Siddhi application is:

The trick here is defining the stream of expired events:

<script src=”https://gist.github.com/renatagosevska/f246ebcc1ce5dc96bea1a70305092344.js”></script>

Once we have this stream, we can apply pattern matching that will try to match the arrival event with:

  • Delivery event with the same tracking id
  • Overdue event with the same tracking id

You can try the whole scenario in the Siddhi simulator by running the application. In the simulator, enter the following events at this particular order:

arrivals_stream: trackingId=100, customerName=Marjan

arrivals_stream: trackingId=200, customerName=Marjan

arrivals_stream: trackingId=300, customerName=Marjan

arrivals_stream: trackingId=400, customerName=Marjan

deliveries_stream: trackingId=100

deliveries_stream:trackingId=400

Wait a minute after that. The log output will look similar to this one:

The missing delivery events for tracking ids 200 and 300 have been detected.

It would be nice if we can get similar feature in KSQL in the near future. In the meantime, Siddhi provides connector for Kafka.