Assumptions
We have tried to balance ease of use against accuracy of results in this calculator. To maintain this balance, and where representative costs are required, we've made a number of assumptions, which are documented below:
- When using Kafka as a pipe rather than a long-term store, it is assumed that Kafka holds 10% of the total data size at any given time
- BigQuery compute costs are calculated at on-demand query prices
- On average, queries executed in BigQuery read only 25% of the total dataset. This accounts for partitioning schemes and similar optimizations that reduce the data required for processing.
- Connector pricing assumes 1 task per topic
- Connector pricing assumes that the entire dataset is transferred into BigQuery
- Streambased node costs assume 1 node is required per 10 TB scanned
- Kafka clusters are considered to be over-provisioned and not CPU bound. This means they can absorb the extra CPU load associated with querying data directly in Kafka.
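The quantity assumptions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the calculator's actual implementation: the function names are hypothetical, and rounding the node count up to a whole node is an assumption of this sketch.

```python
import math

# Ratios taken from the assumptions listed above.
KAFKA_PIPE_RETAINED_FRACTION = 0.10  # Kafka as a pipe holds 10% of total data
BIGQUERY_SCAN_FRACTION = 0.25        # an average query reads 25% of the dataset
TB_SCANNED_PER_NODE = 10             # 1 Streambased node per 10 TB scanned


def kafka_retained_tb(total_data_tb: float) -> float:
    """Data held in Kafka at any given time when Kafka is used as a pipe."""
    return total_data_tb * KAFKA_PIPE_RETAINED_FRACTION


def bigquery_tb_per_query(total_data_tb: float) -> float:
    """Data an average BigQuery query is assumed to process."""
    return total_data_tb * BIGQUERY_SCAN_FRACTION


def connector_tasks(topic_count: int) -> int:
    """Connector pricing assumes one task per topic."""
    return topic_count


def streambased_nodes(tb_scanned: float) -> int:
    """Nodes needed for a scan; rounding up is an assumption of this sketch."""
    return math.ceil(tb_scanned / TB_SCANNED_PER_NODE)


if __name__ == "__main__":
    total_tb = 100.0  # hypothetical dataset size
    print("Kafka retained (TB):", kafka_retained_tb(total_tb))
    print("BigQuery TB per query:", bigquery_tb_per_query(total_tb))
    print("Connector tasks for 20 topics:", connector_tasks(20))
    print("Streambased nodes for 25 TB scanned:", streambased_nodes(25.0))
```

For a hypothetical 100 TB dataset, this gives roughly 10 TB held in Kafka and 25 TB processed per average BigQuery query.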
In addition to the assumptions above, a series of reference costs has been used. These are detailed below:
BigQuery Costs
| Item | Cost |
| --- | --- |
| Storage Cost (per TB) | $0.04 |
| Query Cost (per TB) | $6.25 |
AWS Costs
| Item | Cost |
| --- | --- |
| S3 Storage Cost (per TB) | $0.02 |
| EC2 t4g.xlarge (Streambased nodes) Hourly Cost | $0.134 |
Confluent Managed Connect Costs
| Item | Cost |
| --- | --- |
| Connector Task Cost (hourly) | $0.10 |
| Connector Transfer Cost (per GB) | $0.025 |
Kafka Costs
| Item | Cost |
| --- | --- |
| Kafka Egress Cost (per GB) | $0.01 |
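As a rough illustration of how the reference prices above turn into dollar figures, the sketch below applies them to caller-supplied quantities. It is not the calculator's actual formula. The function names are hypothetical, and the conversion of 1 TB to 1000 GB and the 730-hour month are assumptions of this sketch. Storage costs are left out because the tables do not state a billing period for them.

```python
# Reference prices copied from the tables above (USD).
BIGQUERY_QUERY_COST_PER_TB = 6.25
EC2_T4G_XLARGE_HOURLY = 0.134
CONNECTOR_TASK_HOURLY = 0.10
CONNECTOR_TRANSFER_PER_GB = 0.025
KAFKA_EGRESS_PER_GB = 0.01

# Assumptions of this sketch only; neither value is stated in the source.
GB_PER_TB = 1000
HOURS_PER_MONTH = 730


def bigquery_query_cost(tb_scanned: float) -> float:
    """On-demand BigQuery cost for the given amount of data scanned."""
    return tb_scanned * BIGQUERY_QUERY_COST_PER_TB


def connector_cost(tasks: int, transferred_tb: float,
                   hours: float = HOURS_PER_MONTH) -> float:
    """Managed connector cost: hourly task charge plus per-GB transfer charge."""
    task_cost = tasks * CONNECTOR_TASK_HOURLY * hours
    transfer_cost = transferred_tb * GB_PER_TB * CONNECTOR_TRANSFER_PER_GB
    return task_cost + transfer_cost


def streambased_node_cost(nodes: int, hours: float = HOURS_PER_MONTH) -> float:
    """EC2 cost of running the given number of Streambased nodes."""
    return nodes * EC2_T4G_XLARGE_HOURLY * hours


def kafka_egress_cost(egress_tb: float) -> float:
    """Kafka egress charge for the given volume of data leaving the cluster."""
    return egress_tb * GB_PER_TB * KAFKA_EGRESS_PER_GB


if __name__ == "__main__":
    # Hypothetical inputs: 20 topics, 100 TB dataset, 25 TB scanned, 3 nodes.
    print("BigQuery query cost:", bigquery_query_cost(25.0))
    print("Connector cost per month:", connector_cost(20, 100.0))
    print("Streambased node cost per month:", streambased_node_cost(3))
    print("Kafka egress cost:", kafka_egress_cost(100.0))
```

Keeping each price in a named constant makes it straightforward to substitute your own negotiated or regional rates.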
Should you require a more detailed cost comparison tailored to your own architecture, please reach out at info@streambased.io