Understanding and Interpreting CPU Steal Time on Virtual Machines

Virtual machines report several usage metrics, such as server load, memory usage, and steal time. Customers often ask about steal time: what is it, and why does it appear on their virtual machines? Read on to learn how steal time works and what it means for your virtual machine.

What is Steal Time? 

Steal time is the percentage of time a virtual machine's virtual CPU spends waiting for the physical CPU to give it CPU time. You can monitor processes and resource usage by running the "top" command on your Linux server. In top's CPU summary, steal time is labeled "st".
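On Linux, top reads its CPU figures, including "st", from the counters in /proc/stat. The minimal sketch below assumes a Linux guest and uses only the Python standard library. It reads the aggregate "cpu" line twice and turns the counter differences into percentages, roughly the way top does. The function names are hypothetical. The field order follows the proc(5) man page, and the total leaves out guest and guest_nice because the kernel already counts that time in user and nice.

```python
import time

# Column order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]


def parse_cpu_line(line):
    """Turn a 'cpu ...' line from /proc/stat into a dict of cumulative tick counters."""
    values = [int(v) for v in line.split()[1:]]
    # Older kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def read_cpu_times(path="/proc/stat"):
    """Read the aggregate counters (the 'cpu ' line, not the per-core 'cpuN' lines)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return parse_cpu_line(line)
    raise ValueError("no aggregate 'cpu' line found in " + path)


def cpu_percentages(before, after):
    """Percentage of CPU time spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in before}
    # guest and guest_nice are already included in user and nice, so skip them here.
    total = sum(v for k, v in delta.items() if k not in ("guest", "guest_nice"))
    if total == 0:
        return {k: 0.0 for k in delta}
    return {k: 100.0 * v / total for k, v in delta.items()}


def sample_cpu_percentages(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return per-state percentages."""
    first = read_cpu_times(path)
    time.sleep(interval)
    second = read_cpu_times(path)
    return cpu_percentages(first, second)
```

On a Linux virtual machine, `sample_cpu_percentages()["steal"]` should be close to the "st" value that top shows over the same interval. Working from counter differences matters because the values in /proc/stat are cumulative since boot.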

CPU in Virtual Environments

In cloud environments, the hypervisor acts as the interface between the physical server and its virtualized environment. The hypervisor kernel schedules the running processes, such as virtual machines, networking operations, and storage I/O requests, onto the server's physical cores and gives each of them some CPU time to process its jobs. Because all these processes share the same cores, the hypervisor continually shifts priorities between them, which creates contention for the physical cores.

Percentage Idle Time

Steal time can also appear on virtual machines alongside idle time. Idle time means that the hypervisor allocated CPU time, but the virtual machine did not use it. In this case, we can assume that the steal time had no noticeable effect on performance.

When idle time is 0% and steal time is present, we can assume that processes on the virtual machine are being delayed: they were ready to run but had to wait for a physical core.
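The rule of thumb above can be written as a small decision function. This is a sketch of the heuristic, not a diagnostic tool. The function name and the `idle_floor` tolerance (idle time at or below it is treated as zero, which helps with rounded measurements) are assumptions for illustration. The inputs are percentages for one sampling interval, such as the "id" and "st" values reported by top.

```python
def interpret_steal(idle_pct, steal_pct, idle_floor=0.0):
    """Apply the idle/steal rule of thumb to one sample of CPU percentages.

    idle_floor is a hypothetical tolerance: idle time at or below it counts as none.
    """
    if steal_pct <= 0:
        return "no steal time"
    if idle_pct > idle_floor:
        # The VM left some of its CPU time unused, so waiting likely did not hold back work.
        return "steal present, idle time available: likely no performance impact"
    # The VM used every slice it received, so time spent waiting delayed its processes.
    return "steal present, no idle time: processes are likely delayed"
```

For example, `interpret_steal(80.0, 5.0)` falls into the "likely no performance impact" case, while `interpret_steal(0.0, 5.0)` reports likely delays.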

Multi-Tenant Cloud

Leaseweb cloud platforms consist of single-tenant and multi-tenant environments. Leaseweb Elastic Compute Cloud products (powered by Apache CloudStack) let you build and run your cloud infrastructure in a multi-tenant environment, enabling different kinds of users to do so at lower cost. On our premium Elastic Compute platforms we do not oversell virtual cores, and we also do not pin virtual machines to specific CPU cores. This allows the hypervisor to allocate CPU time from all of the server's physical cores to any of its active processes.

In theory, if a virtual machine had immediate access to its assigned cores 100% of the time, no steal time would be visible. In practice, however, hypervisors run many different tasks and continuously perform actions such as rescheduling tasks for efficiency and processing data received from other systems. All these processes need time on the hypervisor's CPU, which delays the virtual machine's access to the physical cores and adds steal time to the virtual machine.

Analyze Service Performance

A small amount of steal time is often unavoidable in modern hosting environments, particularly on shared cloud hosting. The steal time a virtual machine experiences is not always visible from outside its virtualized operating system.

If your virtual machine registers constant steal time, try to correlate it with the tasks you are executing. More importantly, does this steal time actually cause a performance loss in your applications? If you notice one, measure your application's output and latency across its whole flow, and compare those measurements with the steal time recorded over the same period.
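One way to put a number on that comparison is a Pearson correlation coefficient between steal time and application latency sampled over the same intervals. The minimal sketch below assumes you have already collected both series, for example steal percentages from top and response times from your own application metrics. The function name is hypothetical. The coefficient is computed by hand so it runs on any Python 3 version (Python 3.10 and later also provide `statistics.correlation`).

```python
import math


def steal_latency_correlation(steal_pct, latency_ms):
    """Pearson correlation between steal time and latency sampled over the same intervals."""
    if len(steal_pct) != len(latency_ms):
        raise ValueError("both series must cover the same intervals")
    n = len(steal_pct)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_s = sum(steal_pct) / n
    mean_l = sum(latency_ms) / n
    # Covariance and sums of squared deviations (the 1/n factors cancel out).
    cov = sum((s - mean_s) * (l - mean_l) for s, l in zip(steal_pct, latency_ms))
    ss_s = sum((s - mean_s) ** 2 for s in steal_pct)
    ss_l = sum((l - mean_l) ** 2 for l in latency_ms)
    if ss_s == 0 or ss_l == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / math.sqrt(ss_s * ss_l)
```

A value close to +1 suggests latency rises together with steal time, while a value near 0 suggests the steal time is not what slows your application down. Correlation alone does not prove cause, so treat it as a starting point for the conversation with your provider.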

If you do notice a severe impact on your application, inform your hosting provider. In many cases, they can find a more suitable environment by moving the virtual machine to a different hypervisor.

If you are interested in exploring other cloud environments, such as multi-tenant VMware vCloud or Elastic Compute, or a single-tenant VMware vSphere or HCI solution, check out our private cloud offerings.

Editor’s note: This article was originally published on 24 September 2020 and was updated on 20 November 2022.