
Understanding Origin Events

Origin events are the base usage unit for the Responsive SDK. They are the first events to enter your streaming system and typically represent actions generated by real interactions within your business. This is in direct contrast to derived events, which are created when your applications filter, project, aggregate, or join origin events.

Billing on origin events is intended to give developers freedom in how they design and build their Kafka application topologies. Responsive encourages splitting logic across many applications and sub-topologies instead of artificially clumping functionality into a single app.

Predicting Origin Event Volume

The easiest way to predict your origin event volume is to draw a diagram that outlines how your stream processing architecture functions. Once you have this diagram, identify which topics contain origin events and estimate the number of events each application processes from those topics.

An example of such an architecture diagram is below:

Origin Events, Explained

This sample architecture has two relevant components:

  • There are two applications: App_A and App_B
  • There are three topics ingested by these applications: origin_topic, derived_topic and origin_topic_2.

Imagine that App_A splits the input events from origin_topic, emitting three output events for every input event, and that App_B then joins that output topic with a table backed by origin_topic_2. In this case, all events in origin_topic and origin_topic_2 are origin events, while the events in derived_topic are all derived events and would not count toward your origin event bill.
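The billable count for this sample architecture can be sketched as simple arithmetic. The volumes below are hypothetical, chosen only to illustrate the split between origin and derived events:

```python
# Hypothetical monthly volumes for the sample architecture above.
origin_topic_events = 10_000_000    # consumed by App_A (origin events)
origin_topic_2_events = 2_000_000   # table side of App_B's join (origin events)

# App_A emits three output events per input event; all of them are derived.
derived_topic_events = 3 * origin_topic_events  # 30,000,000

# Only the two origin topics count toward the bill.
billable_origin_events = origin_topic_events + origin_topic_2_events  # 12,000,000
```

Note that derived_topic carries three times the traffic of origin_topic, yet contributes nothing to the bill.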

Once you have identified your origin event topics, you need to estimate the monthly event volume on those topics. There are two ways of doing this:

  1. You can estimate from the processing rate: read the process-rate metric from the JMX MBean kafka.streams:type=stream-task-metrics,thread-id=[threadId],task-id=[taskId], multiply it by the number of tasks, and then multiply by 2,628,000 (the number of seconds in a month, assuming 730 hours per month).
  2. You can estimate from topic offsets: check a partition's offset at the start of an hour and again at the end of the hour, subtract the two numbers, multiply by the number of partitions, and then multiply by 730 (the number of hours in a month).

Note that both of these are estimates and may be off by a few events: the former does not account for re-processing, and the latter does not account for transaction markers.
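Both estimation methods above can be sketched as short helper functions. This is illustrative only; the rates and offsets are made-up inputs, and the offset method sums per-partition growth, which is equivalent to the average-times-partition-count described above:

```python
HOURS_PER_MONTH = 730
SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600  # 2,628,000

def estimate_from_process_rate(process_rate_per_sec: float, num_tasks: int) -> int:
    """Method 1: average per-task process-rate (events/sec) from JMX,
    scaled to a full month."""
    return int(process_rate_per_sec * num_tasks * SECONDS_PER_MONTH)

def estimate_from_offsets(start_offsets: dict, end_offsets: dict) -> int:
    """Method 2: offset growth per partition over one hour,
    extrapolated to a month."""
    hourly = sum(end_offsets[p] - start_offsets[p] for p in start_offsets)
    return hourly * HOURS_PER_MONTH

# Hypothetical inputs: 5 events/sec across 8 tasks, or two partitions
# that each grew by ~1,000-1,200 offsets in an hour.
monthly_from_rate = estimate_from_process_rate(5.0, 8)
monthly_from_offsets = estimate_from_offsets(
    {0: 100, 1: 200}, {0: 1_100, 1: 1_400}
)
```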

How Responsive Tracks Origin Events

Responsive requires that you run Kafka Streams with the ResponsiveKafkaStreams class. This class uses custom producers and consumers for various reasons, one of which is to track and report origin events. Whenever an event is produced by an application that uses the Responsive SDK, the producer attaches a single-byte header indicating that the event has been seen by Responsive. Whenever an event is consumed, the consumer checks for the presence of this header. If the header is absent, the consumer assumes this is the first time the event has ever been seen by Responsive and counts it as an origin event.
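The mechanism can be sketched as follows. The header key and value here are hypothetical placeholders; the real header used by the SDK is an internal detail:

```python
# Hypothetical header key; the actual key used by the Responsive SDK
# is internal and may differ.
RESPONSIVE_HEADER = "_r_seen"

def stamp_on_produce(headers: dict) -> dict:
    """Every record produced via the SDK carries the single-byte marker."""
    return {**headers, RESPONSIVE_HEADER: b"\x01"}

def is_origin_event(headers: dict) -> bool:
    """A consumed record without the marker has never been seen by
    Responsive, so it counts as an origin event."""
    return RESPONSIVE_HEADER not in headers
```

For example, a record arriving from an external producer has no marker and counts as an origin event, while a record previously emitted by an SDK application does not.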

How Responsive Avoids Double Counting

In any failure scenario, Responsive errs on the side of under-reporting the number of origin events. This is accomplished in a few ways:

  1. Responsive reports origin events only on commit, so if there is a crash and events are re-processed, they are counted only once.
  2. Responsive reports each origin event with a UUID, so any network blip that requires retrying a request is deduplicated once the events are processed in our system.
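Both safeguards can be sketched together. This is a minimal illustration, not the SDK's actual implementation: events are buffered until commit, and the server-side deduplication is stood in for by a local set of seen UUIDs:

```python
import uuid

class OriginEventReporter:
    """Sketch: buffer origin events until commit, tag each with a UUID
    so retried reports are deduplicated."""

    def __init__(self):
        self._pending = []         # UUIDs counted but not yet reported
        self._server_seen = set()  # stands in for server-side deduplication
        self.total_reported = 0

    def record_origin_event(self):
        # Buffered only: a crash before commit drops these, so
        # re-processed events are never double counted.
        self._pending.append(uuid.uuid4())

    def on_commit(self):
        self._send(self._pending)
        self._send(self._pending)  # a network retry of the same request
        self._pending = []

    def _send(self, event_ids):
        for event_id in event_ids:
            if event_id not in self._server_seen:
                self._server_seen.add(event_id)
                self.total_reported += 1
```

Even though on_commit sends the same report twice (simulating a retry), each event's UUID ensures it is counted only once.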

Mixing Responsive and Open Source

If you have some applications that use ResponsiveKafkaStreams and others that have not yet migrated, note that the headers are forwarded transparently by Kafka Streams as long as you do not modify them manually. This means that any events downstream of a ResponsiveKafkaStreams application will contain the header, even if they are later processed by vanilla OSS Kafka Streams applications.