Hosted Rollups: Streamlining usage data for scalable infrastructure

Kshitij Grover

At Orb, we work with some of the world’s best infrastructure companies: from hosting providers like Vercel and Replit to large-scale databases like Neo4j and Pinecone, our customers know all too well how to scale production systems.

Central to everything we build at Orb is taking the burden of billing off the backs of our customers. As it turns out, when you’re operating a serverless database, a financial transaction processing platform, or an LLM inference stack, you can produce a lot of usage data! 

Deduplicating and aggregating this data at scale is no easy task, even for the most seasoned infrastructure teams. Think about it this way: if you want your billing infrastructure to be as precise as possible, it needs to handle a superset of the traffic that flows through your other internal services.

More importantly, it’s not a high-leverage task: as an engineer, you should be thinking about the core metrics of your business model, not about a mess of data pipelines.

This is why Orb offers Hosted Rollups – a mechanism that lets you confidently ingest events at petabyte scale while Orb handles the data infrastructure complexity for you. It’s built for our highest-volume customers, each of which produces upwards of hundreds of thousands of events per second.

The base case: 1000 events per second

The majority of Orb customers – those with a few thousand events a second – have a simple data ingestion story. They send raw events to Orb, each representing a unit of usage in their product (e.g. a transaction processed, a machine compute heartbeat, or an API request). Each event carries a set of key-value pairs as properties, which you can then use to define aggregation queries. For example, you might define a metric "sum of ACH transaction volume, filtered to transactions greater than $5.00" and attach a $1.00-per-transaction take rate to that metric as your pricing model.
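
To make that concrete, here’s a sketch of a raw usage event. The field names are illustrative, not necessarily Orb’s exact ingestion schema:

```python
# A hypothetical raw event (field names are illustrative):
event = {
    "idempotency_key": "txn-2024-09-30-000123",  # used for deduplication
    "event_name": "ach_transaction",
    "external_customer_id": "cus_123",
    "timestamp": "2024-09-30T12:00:00Z",
    "properties": {
        "amount_usd": 12.50,
        "transaction_type": "ach",
    },
}
```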

Orb always evaluates each metric in the context of a customer’s subscription, so the system runs a query over all events in a given timeframe (the billing period) and for a given customer. This is one of the keys to Orb’s flexibility: the system lets you write SQL (subqueries and all!) rather than limiting you to a handpicked set of aggregators like "SUM" and "MAX". Note that for billing purposes, no calculations happen as events stream in – they happen over the full window of all the data.
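
Scoped to a subscription, the metric above conceptually becomes a query like the one below. This is an illustrative sketch, not Orb’s internal query plan:

```python
# The same metric, scoped the way Orb evaluates it: one customer,
# one billing period. (Illustrative SQL, not Orb's internals.)
metric_sql = """
SELECT SUM(amount_usd)
FROM events
WHERE customer_id = :customer_id     -- the subscription's customer
  AND timestamp >= :period_start    -- billing period bounds
  AND timestamp <  :period_end
  AND transaction_type = 'ach'
  AND amount_usd > 5.00             -- the metric's filter
"""
```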

This is the most natural way to think about defining a metric – a very flexible query over your entire dataset – but it has its limits. At the scale of 1 million+ events per second, it’s impractical to store each event and run a query over the full volume. Given the constraints of I/O, not even the most ambitious databases will promise sub-second query times at that volume! This is precisely where most systems push the work of reducing the data size onto the end developer, imposing rate limits and asking them to pre-aggregate incoming data. At Orb, we handle it for you. This means less time spent firefighting pipelines and more time building features that matter to your users.

Rollups as configuration

Orb’s customers shouldn’t each have to build their own aggregation pipeline. Doing so would mean implementing a stream processing framework (e.g. ksqlDB, Flink, Spark Streaming), understanding its windowed rollup semantics, troubleshooting its deduplication guarantees, and of course maintaining the compute layer as your event volume grows.

Orb’s Hosted Rollups feature takes your raw data and rolls it up into time-based aggregates, significantly reducing your event volume. For example, suppose you send Orb an event per inference call, tagged with the machine instance type the inference ran on (e.g. `gpu-a100`). Even if you have millions of inference requests a second, Orb can emit a single rollup for each of your customers every 5 minutes with the count of inference calls per machine type.
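
Conceptually, that rollup is a grouped aggregation over a time bucket. Here’s a sketch in SQL – illustrative only, using PostgreSQL’s `date_bin` as the bucketing function, which isn’t necessarily what Orb runs internally:

```python
# One row per (customer, machine type, 5-minute bucket), regardless of
# how many raw events fall into each bucket. (Illustrative SQL.)
rollup_sql = """
SELECT
  customer_id,
  machine_type,
  date_bin('5 minutes', timestamp, TIMESTAMP '2024-01-01') AS bucket,
  COUNT(*) AS inference_calls
FROM raw_events
GROUP BY customer_id, machine_type, bucket
"""
```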

This is crucial: after aggregation, the amount of data you produce is no longer proportional to your input events. In this example, it’s proportional to the number of active customers times the number of machine types.
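
A quick back-of-the-envelope makes the reduction concrete (all numbers below are made up for illustration):

```python
# Post-aggregation volume depends on cardinality, not input rate.
# Hypothetical numbers: 1M raw events/sec, 10,000 active customers,
# 6 machine types, 5-minute rollup windows.
raw_events_per_window = 1_000_000 * 300   # 300,000,000 raw events
rollup_rows_per_window = 10_000 * 6       # 60,000 rollup rows
print(raw_events_per_window // rollup_rows_per_window)  # 5000x reduction
```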

The steps to configure this are much simpler than hosting your own infrastructure:

  • Send the raw events into S3 (e.g. via Firehose or a Kafka connector)
  • Define the parameters of the rollups (i.e. your deduplication window and the size of your rollup aggregation window)
  • Define an "aggregation configuration" for each type of event, telling Orb how you’d like to aggregate your data. For example, for a given event type, you might want to take the `SUM` of a specific property every 10 minutes, but maintain a separate split by `machine_size` (see the sketch below).
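
For a sense of shape, an aggregation configuration for that last example might look something like the following. The field names here are hypothetical, not Orb’s actual configuration format:

```python
# A hypothetical aggregation configuration (shape is illustrative):
aggregation_config = {
    "event_name": "inference_call",
    "deduplication_window": "24h",  # how long duplicate keys are rejected
    "rollup_window": "10m",         # size of each aggregation bucket
    "aggregations": [
        {
            "function": "SUM",
            "property": "token_count",
            "group_by": ["machine_size"],  # one rollup per machine_size
        },
    ],
}
```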

Not just infrastructure

Although taking the infrastructure burden off developers is a significant win, we know that billing often has even more stringent requirements. With Hosted Rollups, you get much more right out of the box:

  • Partial rollups: Suppose you want to aggregate data every 5 minutes, but you still want real-time alerting on your usage (e.g. to enforce spend caps or credit burndowns). Orb can emit partial rollups every minute, automatically doing the work to discard them at query time so you never double-count (sketched after this list).
  • Query syntax: Dealing with rollups as raw events can be particularly confusing, especially with partial rollups in the mix. Orb provides a simplified "view-like" syntax, letting you treat the aggregated events as a SQL view directly when defining your metric.
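
As an illustration of the query-time discard, one way to express the idea is to prefer the final rollup for each bucket and otherwise fall back to the freshest partial. This sketch assumes partials are cumulative for their bucket and borrows PostgreSQL’s `DISTINCT ON` – it is not Orb’s actual view syntax:

```python
# Keep exactly one row per bucket: the final rollup if it exists,
# otherwise the most recently emitted partial. (A sketch of the idea.)
dedupe_sql = """
SELECT DISTINCT ON (customer_id, machine_type, bucket)
  customer_id, machine_type, bucket, value
FROM rollups
ORDER BY customer_id, machine_type, bucket,
         is_partial ASC,   -- finals (FALSE) sort before partials
         emitted_at DESC   -- among partials, take the freshest
"""
```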

In billing, things rarely stay the same. An extremely common pain point is how your rollups evolve over time. Suppose your business wants to start charging a differential rate not only by machine type but also by region. Originally, there was no need to emit a separate rollup per region in the post-aggregated data, even though region was present in your raw event structure. Now, you’d like a separate rollup for each combination of region and machine type so you can construct your pricing for each region separately. Just as importantly, this shouldn’t cause problems for existing subscriptions that rely on the current metric definition – it would be an extreme hassle to migrate each subscription to a new version of your metric.

This is where the native aggregation capability shines: Orb supports versioning of rollups natively. Since the system knows both how the aggregates are produced and how the query engine constructs the relevant queries, Orb can handle the versioning semantics seamlessly. Based on a timestamp cutoff, Orb automatically reads the correct version of your events, and you can keep using your existing metrics without any manual migrations.
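
Conceptually, the cutoff works like routing reads by bucket timestamp. A minimal sketch, with hypothetical table names and cutoff:

```python
from datetime import datetime, timezone

# A sketch of routing reads by a version cutoff. The table names and
# cutoff below are hypothetical; Orb handles this internally.
CUTOVER = datetime(2024, 9, 1, tzinfo=timezone.utc)

def rollup_table_for(bucket_start: datetime) -> str:
    """Read v1 rollups for buckets before the cutover and v2 after,
    so existing metric definitions keep working unchanged."""
    return "rollups_v1" if bucket_start < CUTOVER else "rollups_v2"

# Example: a bucket from August still resolves to the old rollups.
print(rollup_table_for(datetime(2024, 8, 15, tzinfo=timezone.utc)))
```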

With Orb’s Hosted Rollups, you can offload the complexity of real-time data rollups and seamlessly scale your billing infrastructure alongside your core services, ensuring precision and performance even at massive event volumes.

Posted: October 1, 2024
Category: Announcements
