
Assumptions

In this calculator we have tried to balance ease of use against accuracy of results. To maintain this balance we have used representative costs where required and made a number of assumptions, which are documented below:

  1. When Kafka is used as a pipe rather than a long-term store, it is assumed that Kafka holds 10% of the total data size at any given time.
  2. BigQuery compute costs are calculated at on-demand query prices.
  3. Queries executed in BigQuery require, on average, only 25% of the total dataset. This accounts for partitioning schemes and similar techniques that reduce the data required for processing.
  4. Connector pricing assumes 1 task per topic.
  5. Connector pricing assumes that the entire dataset is transferred into BigQuery.
  6. Streambased node costs assume that 1 node is required per 10 TB scanned.
  7. Kafka clusters are considered to be over-provisioned and not CPU bound. This means they can absorb the extra CPU load associated with querying data directly in Kafka.
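
As a rough illustration, the sketch below turns the numeric assumptions above into derived quantities. It is not the calculator's actual implementation: the function name, the dictionary keys, and the choice to round the node count up to a whole node are hypothetical.

```python
import math

# Fractions and ratios taken from the assumptions listed above.
KAFKA_PIPE_FRACTION = 0.10   # assumption 1: share of data held in Kafka as a pipe
QUERY_SCAN_FRACTION = 0.25   # assumption 3: share of the dataset a query reads
TB_SCANNED_PER_NODE = 10.0   # assumption 6: 1 Streambased node per 10 TB scanned


def derived_quantities(total_tb: float, topics: int) -> dict:
    """Derive sizing figures from a total dataset size (TB) and a topic count."""
    scanned_tb = total_tb * QUERY_SCAN_FRACTION
    return {
        "kafka_resident_tb": total_tb * KAFKA_PIPE_FRACTION,
        "scanned_tb_per_query": scanned_tb,
        "connector_tasks": topics,      # assumption 4: 1 task per topic
        "transferred_tb": total_tb,     # assumption 5: whole dataset is transferred
        # Rounding up to a whole node is an assumption of this sketch.
        "streambased_nodes": math.ceil(scanned_tb / TB_SCANNED_PER_NODE),
    }


if __name__ == "__main__":
    print(derived_quantities(total_tb=100, topics=20))
```

For a 100 TB dataset spread over 20 topics, this gives 25 TB scanned per query, 20 connector tasks and 3 Streambased nodes.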

In addition to the assumptions above, a series of reference costs has been used. These are detailed below:

BigQuery Costs

Storage Cost (per TB): $0.04
Query Cost (per TB): $6.25

AWS Costs

S3 Storage Cost (per TB): $0.02
EC2 t4g.xlarge (Streambased nodes) Hourly Cost: $0.134

Confluent Managed Connect Costs

Connector Task Cost (hourly): $0.10
Connector Transfer Cost (per GB): $0.025

Kafka Costs

Kafka Egress Cost (per GB): $0.01
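
The reference costs can be combined with the assumptions into individual line items, as in the minimal sketch below. This is an illustration, not the calculator's own formulas: the function names and the conversion of 1 TB to 1000 GB are assumptions. Storage costs are left out because the billing period for the per-TB storage prices is not stated here.

```python
# Reference prices from the tables above (USD).
BQ_QUERY_COST_PER_TB = 6.25
CONNECTOR_TASK_COST_PER_HOUR = 0.10
CONNECTOR_TRANSFER_COST_PER_GB = 0.025
KAFKA_EGRESS_COST_PER_GB = 0.01
EC2_NODE_COST_PER_HOUR = 0.134

QUERY_SCAN_FRACTION = 0.25  # a query reads 25% of the dataset on average
GB_PER_TB = 1000            # assumption of this sketch (decimal units)


def bigquery_query_cost(total_tb: float, queries: int) -> float:
    """On-demand BigQuery cost for a number of queries over the dataset."""
    return queries * total_tb * QUERY_SCAN_FRACTION * BQ_QUERY_COST_PER_TB


def connector_cost(total_tb: float, topics: int, hours: float) -> float:
    """Managed connector cost: 1 task per topic plus transfer of the full dataset."""
    task_cost = topics * CONNECTOR_TASK_COST_PER_HOUR * hours
    transfer_cost = total_tb * GB_PER_TB * CONNECTOR_TRANSFER_COST_PER_GB
    return task_cost + transfer_cost


def kafka_egress_cost(egress_tb: float) -> float:
    """Kafka egress cost for a given volume of data read out of Kafka."""
    return egress_tb * GB_PER_TB * KAFKA_EGRESS_COST_PER_GB


def streambased_node_cost(nodes: int, hours: float) -> float:
    """EC2 cost for running the given number of Streambased nodes."""
    return nodes * EC2_NODE_COST_PER_HOUR * hours


if __name__ == "__main__":
    print(f"BigQuery queries: ${bigquery_query_cost(100, 10):,.2f}")
    print(f"Connector:        ${connector_cost(100, 20, 730):,.2f}")
    print(f"Kafka egress:     ${kafka_egress_cost(100):,.2f}")
    print(f"Streambased EC2:  ${streambased_node_cost(3, 730):,.2f}")
```

Keeping each line item as a separate function leaves open how the items are summed for a given architecture, which this page does not specify.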

Should you require a more detailed cost comparison tailored to your own architecture, please reach out at info@streambased.io