Auto-scaling is a technique for automatically adjusting the resources allocated to workloads hosted in a cloud environment. One of the major benefits of Public Cloud is the ability to scale resources on demand. Auto-scaling builds on this advantage by adding automation and flexibility, so that resource allocation adjusts dynamically as workloads change over time.
Before the introduction of auto-scaling, managing workload scaling was challenging. It involved manually adding or removing resources, and because demand changes were difficult to predict accurately, this often led to errors and overprovisioning. Under-provisioning, in turn, could result in service outages that made applications unavailable. Auto-scaling addresses these issues by automatically adjusting the resources allocated to workloads based on fluctuating demand, helping to maintain performance and availability.
In this blog post, we explain the key concepts you need to understand to make the most of auto-scaling.
Scaling Out or Scaling Up?
To gain a comprehensive understanding of auto-scaling, it’s useful to begin with the two primary methods available:
1. Scaling out/horizontal scaling
With horizontal scaling, you increase or decrease the number of Public Cloud instances that serve a workload. An advantage of horizontal scaling is that you can add or remove instances without affecting the existing ones or causing downtime. It is faster than vertical scaling, but not every application or workload can be scaled horizontally.
2. Scaling up/vertical scaling
This type of scaling adjusts the compute capabilities of an existing instance by increasing or decreasing its resources, such as memory and CPU processing power. For example, an instance provisioned with 4 vCPUs and 16 GiB of memory could be upgraded to 64 vCPUs and 64 GiB of memory upon request. For relational databases that do not employ sharding, vertical scaling is the only option for adding or removing capacity. However, this method is less commonly used than horizontal scaling, which is typically preferred in automated scaling scenarios.
How Does Horizontal Auto-scaling Work?
Auto-scaling is typically configured around metric thresholds, so that capacity follows demand during events such as a significant traffic spike or the launch of a new feature, and performance remains unaffected.
For instance, a developer might set a threshold of 60 percent CPU usage sustained for more than five minutes, and give the auto-scaling group a minimum and maximum number of instances in the Public Cloud. Each time CPU usage stays above the threshold for that long, an additional instance is launched automatically, up to the predefined maximum. Newly added instances are usually registered with a load balancer, which distributes incoming traffic evenly across all instances. Conversely, if an instance’s CPU usage stays below the threshold for more than five minutes, the instance is removed and the load balancer stops directing traffic to it, down to the configured minimum. This configuration optimizes performance and enhances the application’s availability.
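The threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a provider API: the `ScalingPolicy` class, its field names, and the minimum and maximum of 2 and 6 instances are assumptions made for the example. It also assumes one averaged CPU sample per minute and treats six consecutive samples as "more than five minutes".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScalingPolicy:
    """Hypothetical policy; names and defaults are illustrative only."""
    cpu_threshold: float = 60.0   # percent, taken from the example in the text
    sustain_minutes: int = 5      # a breach must last longer than this
    min_instances: int = 2        # assumed lower bound
    max_instances: int = 6        # assumed upper bound


def desired_instances(current: int, cpu_per_minute: List[float],
                      policy: ScalingPolicy) -> int:
    """Return the instance count after evaluating the latest CPU samples.

    cpu_per_minute holds one average-CPU sample per minute, oldest first.
    """
    # Six one-minute samples span the "more than five minutes" window.
    window = cpu_per_minute[-(policy.sustain_minutes + 1):]
    if len(window) <= policy.sustain_minutes:
        return current  # not enough history to decide
    if all(sample > policy.cpu_threshold for sample in window):
        # Sustained high load: add one instance, capped at the maximum.
        return min(current + 1, policy.max_instances)
    if all(sample < policy.cpu_threshold for sample in window):
        # Sustained low load: remove one instance, never below the minimum.
        return max(current - 1, policy.min_instances)
    return current  # mixed readings: leave capacity unchanged
```

The sketch uses the same threshold for adding and removing instances, as the text does. Real setups often use a lower threshold for removal, or a cooldown period, so that capacity does not oscillate around a single value.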
In addition to metric-based triggers, auto-scaling can be configured to follow a predetermined schedule. This option is particularly beneficial for companies and services with predictable seasonal demand, as it allows capacity to be adjusted proactively, before the anticipated load arrives.
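Schedule-based scaling can be pictured as a lookup from a calendar date to a planned capacity. The sketch below is only an illustration: the date windows, the instance counts, and the names `SEASONAL_WINDOWS` and `scheduled_capacity` are all hypothetical.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical seasonal windows: (first day, last day, planned instances).
SEASONAL_WINDOWS: List[Tuple[date, date, int]] = [
    (date(2025, 11, 24), date(2025, 12, 1), 10),   # assumed sales week
    (date(2025, 12, 15), date(2025, 12, 31), 6),   # assumed holiday period
]
BASELINE_INSTANCES = 2  # assumed capacity outside the windows


def scheduled_capacity(day: date) -> int:
    """Return the instance count planned for the given day."""
    for first, last, instances in SEASONAL_WINDOWS:
        if first <= day <= last:  # both ends of the window are inclusive
            return instances
    return BASELINE_INSTANCES
```

A scheduler would evaluate a function like this ahead of time, so the extra instances are already running when the expected demand begins.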
Why Is Auto-scaling So Important?
The main benefits of Public Cloud auto-scaling are:
- Better Cost Management: With horizontal auto-scaling, instances are automatically added or removed to match application demand, maintaining performance while avoiding charges for unused capacity.
- Enhanced Reliability: Your applications remain available and responsive even during sudden surges in workloads, ensuring uninterrupted service.
- Improved User Experience: Auto-scaling contributes to optimizing performance and reducing latency, resulting in a seamless and responsive experience.
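The cost benefit can be made concrete with a small calculation that compares instance-hours for capacity fixed at the daily peak against capacity that follows demand hour by hour. All figures here are made up for illustration: the per-instance capacity of 100 requests per second and the hourly load profile are assumptions, not measurements.

```python
import math
from typing import List

CAPACITY_PER_INSTANCE = 100  # assumed requests/second one instance can serve


def instances_needed(load: float, minimum: int = 1) -> int:
    """Smallest instance count that can serve the given load."""
    return max(minimum, math.ceil(load / CAPACITY_PER_INSTANCE))


def fixed_instance_hours(hourly_load: List[float]) -> int:
    """Provision for the peak hour and keep that capacity all day."""
    peak = max(instances_needed(load) for load in hourly_load)
    return peak * len(hourly_load)


def autoscaled_instance_hours(hourly_load: List[float]) -> int:
    """Run only as many instances as each hour requires."""
    return sum(instances_needed(load) for load in hourly_load)


# Hypothetical day: quiet night, busy daytime, quiet evening (requests/second).
HOURLY_LOAD = [100] * 8 + [400] * 12 + [100] * 4
```

With these illustrative numbers, `fixed_instance_hours(HOURLY_LOAD)` is 96 and `autoscaled_instance_hours(HOURLY_LOAD)` is 60, so the auto-scaled setup is billed for 36 fewer instance-hours per day.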
Where Is Auto-scaling Used?
Auto-scaling addresses the need for immediate, flexible, and reliable extra capacity. It is quick to set up, automated, and requires no long-term commitment, so you pay only for what you actually use. The following examples illustrate some common use cases.
Gaming
As the number of gamers fluctuates throughout the day, engineers can configure their backend gaming instances to automatically scale up or down. This is particularly useful during new game launches or weekend events, when auto-scaling enables teams to effectively manage anticipated demand surges.
E-Commerce
Given that most online shoppers tend to make their purchases during the daytime, engineers can configure ordering and verification systems to automatically scale up during the day and scale down at night. During high-demand events, such as Black Friday, auto-scaling ensures that e-commerce systems can meet demand while maintaining optimal performance and minimizing unnecessary hosting costs.
Martech/Adtech
When a global ad agency prepares for a major campaign, its LTO system requires increased capacity for analytics. With auto-scaling, the system can adjust automatically during peak times by adding or removing backend instances based on average CPU utilization. This ensures that the analytics application remains fully available, even during surges in workload. Additionally, auto-scaling promotes better cost management by preventing charges for unused capacity.
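One way to scale on average CPU utilization is a proportional rule: multiply the current instance count by the ratio of observed average CPU to a target value, then round up. The sketch below follows the rule documented for the Kubernetes Horizontal Pod Autoscaler. Whether a particular Public Cloud auto-scaler uses the same rule is an assumption, and the function name and parameters are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def desired_from_average(cpu_percent: Sequence[float], target_percent: float,
                         min_instances: int, max_instances: int) -> int:
    """Return an instance count that moves average CPU toward the target.

    cpu_percent holds one current CPU reading per running backend instance.
    """
    if not cpu_percent:
        raise ValueError("need at least one instance reading")
    current = len(cpu_percent)
    average = mean(cpu_percent)
    # Proportional rule: desired = ceil(current * average / target).
    desired = math.ceil(current * average / target_percent)
    # Keep the result inside the configured bounds.
    return max(min_instances, min(desired, max_instances))
```

For example, four instances averaging 90 percent CPU against a 60 percent target yield six instances, while three instances averaging 30 percent yield two.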
If you would like to learn more about auto-scaling provided with Leaseweb Public Cloud, visit our product page and start your cloud journey with Leaseweb.