Instance Caging
Recently on one of our production cluster
running 8 nodes and having shared environment with multiple applications, ran
into high CPU usage. Since this is shared environment, running few databases,
its always difficult to pinpoint which is culprit. Hence after more than couple
of incidents we were able to point out which one is actually causing mayhem.
Also the pattern of problem is not predictable as application is used across
globe. But since hang was not momentarily it really affects all other
candidates as well. This is serious problem in shared environment as one wants
to save cost on hardware and software, which is quite under stable. However, this
also ensures the optimum usage of the capacity as not all the databases are
always used up to the capacity (again, case to case).
So the question is how do we avoid such an
instance and still provide the optimum usage of resources ?
11g, introduces one new feature
called "instance caging", where you can have more than one instance
on the same server, and you can share the CPUs reported by the operating system
-- To find out total cpu and cores available
to your system
SQL> select
cpu_count_current,cpu_core_count_current from v$license;
System with 12 physical cpu cores
CPU_COUNT_CURRENT CPU_CORE_COUNT_CURRENT
----------------- ----------------------
12 12
System with 6 physical cpu and 12
cores, Hyper Threaded
CPU_COUNT_CURRENT CPU_CORE_COUNT_CURRENT
----------------- ----------------------
12 6
One can use following query to
find out the CPU SOCKET and CPU CORES available to database
select * from v$osstat where stat_name like '%CPU%'
In Hyper threaded Servers, the output will be
like following.
Now one needs to determine how the
database instances on your server will share the CPU. With Instance
Caging, each instance's cpu_count specifies the maximum number of CPUs you want
it to use at any time for that particular instance. The sum of the cpu_counts
across all database instances determines the amount of isolation between the
database instances and the efficiency of the server.
For maximum isolation one can use "partition" approach. With the partition approach, the sum of the cpu_counts is less than or equal to the number of CPUs.
For maximum isolation one can use "partition" approach. With the partition approach, the sum of the cpu_counts is less than or equal to the number of CPUs.
For example, suppose the total
number of CPUs (i.e. CPU threads/cores) is 16. Using the partition
approach, we could set cpu_count=8 for database A, cpu_count=4 for database B,
and cpu_count=4 for database C. The sum of the cpu_counts is 16,
which equals the number of CPUs.
The disadvantage of the partition approach is that any CPU unused by one database instance cannot be used by another. Therefore, for non-critical databases where you also want to achieve better CPU utilization efficiency, use the "over-subscribe" approach. With the over-subscribe approach, the sum of the cpu_counts is less than or equal to 3x the number of CPUs.
The disadvantage of the partition approach is that any CPU unused by one database instance cannot be used by another. Therefore, for non-critical databases where you also want to achieve better CPU utilization efficiency, use the "over-subscribe" approach. With the over-subscribe approach, the sum of the cpu_counts is less than or equal to 3x the number of CPUs.
For example, for a server with 16
CPUs, you could use the over-subscribe approach and set cpu_count=8 for
database A, cpu_count=8 for database B, and cpu_count=8 for database C.
The sum of the cpu_counts is 24, which is greater than the number of
CPUs. Therefore, if all databases are using their full CPU allocation,
there will be some CPU contention.
Enable Instance Caging –
-- Set the cpu_count initialization
parameter.
-- This is a dynamic parameter, and can be set with the following statement:
SQL> ALTER SYSTEM SET CPU_COUNT = 4;
-- This is a dynamic parameter, and can be set with the following statement:
SQL> ALTER SYSTEM SET CPU_COUNT = 4;
To verify that Instance Caging is
enabled, check that "instance_caging" equals "ON" and that
"cpu_count" is set appropriately.
SQL> select instance_caging from v$rsrc_plan where is_top_plan = 'TRUE';
INS
---
ON
Once
this is enabled, The CPU spikes will be limited to only specified
number of cpu's and will be not be hampering other databases.