Right-size AWS MSK clusters

Superstream achieves cost efficiency, enhanced performance, and greater reliability through continuous optimization of both the application and infrastructure layers.

Application layer: Identify and eliminate inactive resources, reduce traffic, apply improved configurations, enforce standardization, identify storage reduction opportunities, and more.
Infrastructure layer: Clusters are often sized for peak times, but Superstream dynamically identifies the exact resources needed at any given moment and automatically scales accordingly.

Reducing costs in AWS MSK

Your bill structure

AWS MSK Total Cost of Ownership (TCO) includes:

Compute (number of instances × instance type)
Egress (Read) in GB/hour
Storage in GB/hour
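
As a rough illustration of how these three components add up, the sketch below estimates a monthly bill from hourly figures. Every rate and quantity in it is a hypothetical placeholder, not an AWS price; substitute the numbers from your own bill.

```shell
#!/usr/bin/env bash
# Rough monthly MSK cost estimate from the three bill components.
# All rates used in the example calls are HYPOTHETICAL, not real AWS prices.

# estimate_monthly_cost <brokers> <usd_per_broker_hour> \
#                       <storage_gb> <usd_per_gb_hour> \
#                       <egress_gb_per_hour> <usd_per_egress_gb>
estimate_monthly_cost() {
  local hours=730   # average number of hours in a month
  LC_ALL=C awk -v b="$1" -v br="$2" -v s="$3" -v sr="$4" \
      -v e="$5" -v er="$6" -v h="$hours" \
      'BEGIN { printf "%.2f\n", h * (b * br + s * sr + e * er) }'
}

# Example: the same storage and egress with 6 brokers, then with 3.
estimate_monthly_cost 6 0.20 3000 0.0001 10 0.05
estimate_monthly_cost 3 0.20 3000 0.0001 10 0.05
```

With these placeholder inputs, halving the broker count lowers only the compute term, which is why the application-layer steps below also target storage and read traffic.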

Step 1: Optimizing the application layer (clusters/clients)

Identify and Reduce Inactive Topics and Partitions: Superstream employs proprietary methodologies to continuously detect and minimize inactive topics and partitions. Decreasing the number of partitions directly reduces memory utilization and the number of brokers (instances) needed.
Identify and Reduce Inactive Connections: Superstream constantly monitors and reduces inactive connections, optimizing resource usage.
Identify and Reduce Inactive Consumer Groups: Superstream continuously tracks and diminishes inactive consumer groups to ensure efficient system performance.
Optimize Producers without Compression: Superstream assesses producers not using compression, determines their capability to enable it, and seamlessly activates the most suitable compression algorithm.
Optimize Non-Binary Payloads: Superstream identifies producers writing non-binary payloads and activates payload reduction techniques, ensuring no disruption to the existing workflow.

The entire identification process is fully configurable to accommodate various business logic while continuously self-improving through reinforcement learning.
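
To see why uncompressed, text-based payloads are worth targeting, the sketch below compares the size of a batch of JSON records before and after gzip. It uses the gzip command-line tool only as a stand-in; Kafka producers select a codec through the `compression.type` setting, and the record shape here is made up for the example. It does not show how Superstream enables compression.

```shell
#!/usr/bin/env bash
# Illustrates why compression matters for text payloads: JSON records that
# repeat the same field names shrink substantially under gzip.
# The record layout is a made-up example.

# make_batch <n>: emit n JSON records, one per line.
make_batch() {
  local n="$1" i
  for ((i = 1; i <= n; i++)); do
    printf '{"order_id":%d,"status":"CREATED","currency":"USD","amount":%d}\n' \
      "$i" "$((i * 3))"
  done
}

raw_bytes=$(make_batch 500 | wc -c | tr -d ' ')
gzip_bytes=$(make_batch 500 | gzip -c | wc -c | tr -d ' ')

echo "raw:  ${raw_bytes} bytes"
echo "gzip: ${gzip_bytes} bytes"
```

Smaller batches reduce both the storage and the egress components of the bill, since consumers read the compressed bytes.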

Step 2: Dynamically scale the cluster's resources

Superstream uses machine learning algorithms to anticipate resource demands based on historical data and current usage patterns. By continuously analyzing this information, Superstream adjusts cluster sizes in real time, so that resources are allocated when and where they are needed. This dynamic scaling maintains performance while reducing operational costs, because resources no longer need to be provisioned for peak loads that occur only sporadically.

Although certain environments may not support all actions, implementing even a small portion can significantly reduce the monthly bill.

Getting started

Step 1: Access your MSK cluster through the Superstream Console

Step 2: Scroll down to "Autoscaler" and define its rules

Step 3: Activate

Superstream will now initiate and monitor the cluster, evaluating whether any defined rules are satisfied. If a rule is met, Superstream will execute the corresponding user-defined action.
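
The rule-then-action behavior described above can be pictured with a simple threshold check. This is a minimal sketch only: the rule format, operators, and action names are hypothetical and do not reflect Superstream's actual rule engine or Console fields.

```shell
#!/usr/bin/env bash
# Sketch of threshold-style rule evaluation. The rule format and the action
# names are HYPOTHETICAL and do not reflect Superstream's actual engine.

# evaluate_rule <current_value> <operator: gt|lt> <threshold> <action>
# Prints the action when the rule is satisfied, otherwise "none".
# Values are whole-number percentages.
evaluate_rule() {
  local value="$1" op="$2" threshold="$3" action="$4"
  case "$op" in
    gt) if (( value > threshold )); then echo "$action"; else echo "none"; fi ;;
    lt) if (( value < threshold )); then echo "$action"; else echo "none"; fi ;;
    *)  echo "unknown operator: $op" >&2; return 1 ;;
  esac
}

# Example: average CPU is 18%.
evaluate_rule 18 lt 25 scale_in    # rule "scale in below 25%" is satisfied
evaluate_rule 18 gt 70 scale_out   # rule "scale out above 70%" is not
```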

Ref: Scale-in a cluster (reduce the number of brokers)

Ref: How to manually scale an MSK cluster in case Superstream is down

Overview

This manual provides step-by-step instructions for scaling Amazon Managed Streaming for Apache Kafka (MSK) clusters, including both horizontal and vertical scaling approaches.

Horizontal Scaling (Adding Brokers)

Planning Phase

Monitor current cluster metrics:
- CPU utilization
- Storage utilization
- Network throughput
- Partition distribution
Calculate required capacity:
- Number of partitions per broker
- Expected throughput per broker
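
The capacity calculation above can be sketched as taking the larger of two estimates: brokers needed for the partition count and brokers needed for the throughput. The per-broker limits are inputs you must supply for your broker type; the numbers in the example are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Estimate how many brokers are needed from partition count and throughput.
# The per-broker limits are assumptions supplied by the caller.

# ceil_div <a> <b>: integer division rounded up.
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# required_brokers <total_partition_replicas> <max_partitions_per_broker> \
#                  <peak_throughput_mbps> <throughput_per_broker_mbps>
required_brokers() {
  local by_partitions by_throughput
  by_partitions=$(ceil_div "$1" "$2")
  by_throughput=$(ceil_div "$3" "$4")
  # The tighter of the two constraints decides the broker count.
  if (( by_partitions > by_throughput )); then
    echo "$by_partitions"
  else
    echo "$by_throughput"
  fi
}

# Example: 3500 partition replicas at 1000 per broker (needs 4 brokers),
# 240 MB/s at 50 MB/s per broker (needs 5 brokers).
required_brokers 3500 1000 240 50
```

Round the result up to a multiple of the number of Availability Zones before applying it, since MSK distributes brokers evenly across AZs.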

Implementation Steps

Using AWS Console

Navigate to the Amazon MSK console
Select your cluster
Click "Actions" → "Edit cluster configuration"
Under "Brokers", modify the number of brokers per Availability Zone
Review and confirm changes
Monitor the scaling operation in the console

Using AWS CLI

aws kafka update-broker-count \
    --cluster-arn <your-cluster-arn> \
    --current-version <current-cluster-version> \
    --target-number-of-broker-nodes <new-broker-count>
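
The `--current-version` value comes from the cluster description. The sketch below looks it up and then issues the update, after checking that the target count is a multiple of the number of Availability Zones. The function names and the way the ARN and AZ count are passed in are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Look up the cluster's current version, then request a new broker count.
# The cluster ARN, target count, and AZ count are supplied by the caller.

# is_valid_broker_count <target> <az_count>
# Succeeds only if the target is a positive multiple of the AZ count.
is_valid_broker_count() {
  local target="$1" az_count="$2"
  (( target > 0 && az_count > 0 && target % az_count == 0 ))
}

# scale_brokers <cluster_arn> <target_broker_count> <az_count>
scale_brokers() {
  local cluster_arn="$1" target="$2" az_count="$3" version
  if ! is_valid_broker_count "$target" "$az_count"; then
    echo "target ${target} is not a multiple of ${az_count} AZs" >&2
    return 1
  fi
  # The current version changes after every cluster operation,
  # so fetch it immediately before the update.
  version=$(aws kafka describe-cluster \
    --cluster-arn "$cluster_arn" \
    --query 'ClusterInfo.CurrentVersion' --output text) || return 1
  aws kafka update-broker-count \
    --cluster-arn "$cluster_arn" \
    --current-version "$version" \
    --target-number-of-broker-nodes "$target"
}

# Usage (not executed here): scale_brokers "$CLUSTER_ARN" 9 3
```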

Vertical Scaling (Broker Type Update)

Planning Phase

Identify target broker type based on:
- CPU requirements
- Memory needs
- Network capacity requirements
- Cost considerations

Implementation Steps

Using AWS Console

Navigate to the Amazon MSK console
Select your cluster
Click "Actions" → "Update broker type"
Select new broker type
Schedule the update
Review and confirm changes

Using AWS CLI

aws kafka update-broker-type \
    --cluster-arn <your-cluster-arn> \
    --current-version <current-cluster-version> \
    --target-instance-type <new-instance-type>
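
A broker type update is applied as a rolling operation and the cluster reports the `UPDATING` state while it runs. One way to wait for completion is to poll the cluster state until it returns to `ACTIVE`, as in this sketch. The helper names, attempt count, and polling interval are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Poll until the cluster returns to the ACTIVE state after an update.
# The state lookup is passed in as a command so it can be swapped or stubbed.

# get_cluster_state <cluster_arn>: prints the state, e.g. ACTIVE or UPDATING.
get_cluster_state() {
  aws kafka describe-cluster \
    --cluster-arn "$1" \
    --query 'ClusterInfo.State' --output text
}

# wait_until_active <max_attempts> <sleep_seconds> <state_command...>
# Returns 0 when ACTIVE, 1 on FAILED or lookup error, 2 when attempts run out.
wait_until_active() {
  local max_attempts="$1" delay="$2" attempt state
  shift 2
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    state=$("$@") || return 1
    echo "attempt ${attempt}: ${state}" >&2
    case "$state" in
      ACTIVE) return 0 ;;
      FAILED) return 1 ;;
    esac
    sleep "$delay"
  done
  return 2
}

# Usage (not executed here):
#   wait_until_active 120 60 get_cluster_state "$CLUSTER_ARN"
```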

Best Practices

Scale during low-traffic periods
Maintain sufficient headroom (20-30%) for unexpected traffic spikes
Monitor scaling operations closely
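Fetch the cluster's current version immediately before each update, since the value passed to --current-version changes after every cluster operation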
Document all scaling operations

Troubleshooting

Common Issues

Insufficient Capacity Errors
- Verify available capacity in the target AZs
- Contact AWS support if needed
Scaling Operation Timeout
- Check AWS CloudWatch logs
- Verify network connectivity
- Review security group configurations
Uneven Partition Distribution
- Run the kafka-reassign-partitions tool; existing partitions are not moved onto newly added brokers automatically
- Review the partition assignment strategy
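
For the uneven-distribution case, the reassignment tool first needs a file listing the topics to move. The sketch below builds that file; the topic names, broker IDs, and the `BOOTSTRAP` variable in the commented commands are placeholders. Clusters that require TLS or IAM authentication also need a client properties file passed with `--command-config`.

```shell
#!/usr/bin/env bash
# Build the topics-to-move file that kafka-reassign-partitions expects when
# generating a new assignment. Topic names here are examples only.

# topics_to_move_json <topic> [<topic>...]
topics_to_move_json() {
  local first=1 topic
  printf '{"version":1,"topics":['
  for topic in "$@"; do
    (( first )) || printf ','   # comma before every entry except the first
    printf '{"topic":"%s"}' "$topic"
    first=0
  done
  printf ']}\n'
}

topics_to_move_json orders payments

# Next steps (not executed here), with placeholder address and broker IDs:
#   topics_to_move_json orders payments > topics.json
#   kafka-reassign-partitions.sh --bootstrap-server "$BOOTSTRAP" \
#     --topics-to-move-json-file topics.json --broker-list "1,2,3,4" --generate
#   # Save the proposed assignment as reassignment.json, then apply it:
#   kafka-reassign-partitions.sh --bootstrap-server "$BOOTSTRAP" \
#     --reassignment-json-file reassignment.json --execute
```

Reassignment copies partition data between brokers, so it is best run during a low-traffic period.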

Monitoring and Maintenance

Key Metrics to Monitor

Broker CPU utilization
Storage utilization
Network throughput
Producer/consumer latency
Partition replication lag
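
As one way to check broker CPU from the command line, the sketch below reads a broker-level metric from the `AWS/Kafka` CloudWatch namespace and compares user plus system CPU against a threshold. The cluster name, broker ID, 15-minute window, and 60% threshold are assumptions for this example, and the time arithmetic assumes GNU `date`.

```shell
#!/usr/bin/env bash
# Fetch a broker-level MSK metric from CloudWatch and flag high CPU.
# Cluster name, broker ID, window, and threshold are placeholders.

# get_broker_metric_avg <cluster_name> <broker_id> <metric_name>
# Prints the average of one datapoint from the last 15 minutes (GNU date).
get_broker_metric_avg() {
  local cluster_name="$1" broker_id="$2" metric="$3"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/Kafka \
    --metric-name "$metric" \
    --dimensions "Name=Cluster Name,Value=${cluster_name}" \
                 "Name=Broker ID,Value=${broker_id}" \
    --start-time "$(date -u -d '15 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 900 --statistics Average \
    --query 'Datapoints[0].Average' --output text
}

# cpu_is_high <cpu_user_pct> <cpu_system_pct> <threshold_pct>
# Succeeds when user + system CPU exceeds the threshold.
cpu_is_high() {
  LC_ALL=C awk -v u="$1" -v s="$2" -v t="$3" 'BEGIN { exit !((u + s) > t) }'
}

# Usage (not executed here):
#   user=$(get_broker_metric_avg my-cluster 1 CpuUser)
#   sys=$(get_broker_metric_avg my-cluster 1 CpuSystem)
#   if cpu_is_high "$user" "$sys" 60; then echo "broker 1 CPU is high"; fi
```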
