Monday, April 21, 2014

ACS, Histogram & Oracle Bug

with 11g, Oracle has enhanced the ACS feature. Ideally when you have multiple bind values for same SQL statement, Optimizer may generate different plan for different bind values. These plan are generated based on feature called as Bind Aware statements. So if Optimizer detects that different bind values generates the different and very varied foot prints of IO, then one execution plan will not suffice. So  it will create different execution plan and will mark the SQL as bind aware. 

So effectively you are sharing the Cursor but not the plan and that is expected behavior of 11g Optimizer. However, during one performance issue troubleshooting, we observed that this is not the case. This all started with one application having severe performance issue and one of the DBA executed the Gather Stats jobs and things went back to normal. 
After few hours things again started to went south as normal process of request (from frontend), that usually takes couple of minutes didn't even completed in 15 minutes. 
Again, someone executed Gather Stats job and things came back in line, but only for another few hours. Pattern has now started to emerge... and things were not looking good.

So I decided to take a look at DB health. Also I checked if there was any change went in recently but there were none. My strategy was simple, start with comparing good and bad times. So I ran AWRDIFF report with good and bad run-times (as received by application team).

From the report what I saw that there was severe CPU Scheduling problem on Database. The processes were doing nothing but waiting on CPU till someone either kill them or run gather stats. 

Top 5 Timed Events

select child_number, sql_id,hash_value, plan_hash_value, executions, buffer_gets, is_bind_sensitive, is_bind_aware, is_shareable from v$sql
where sql_text like 'SELECT xxxxxxx%';

-- excel output

So if you see here there are lots of SQLs which are shared and have different children based on the bind values provided. However if you see the attached table there are multiple child numbers with same SQL hash value and same plan hash value i.e. diff values of bind variables creates the new child cursor but doesn’t generate the new plan always i.e. its bind sensitive but not bind aware. Hence its using the same PLAN for different cursor which is not the ideal way to do it. Ideally it should create the new plan for each new child cursor. This usually requires the HISTOGRAMS. when I go and try to find the histograms I see the relevant histograms on the table. 
Also if you notice there are lots of hard parses going on. This is pretty evident by the number of child cursor created for each statement and there are few hundreds statement like that. 

So that brings to the most important question that, why the histograms are not used when they are in place ??
Looks like we hit some Oracle Bug ? its time to verify and so raised TAR with Oracle and they came up with following justification (after waiting of 5 days and Escalation of SR, which was then attended by Sr Manager from European Region)

SR Update - 

the reason for not sharing seems to be caused by bug 10351178. See Note 10351178.8 for details: Bug 10351178 - High version counts caused by auto tuned PGA memory.

The bug is fixed in The workaround is to disable auto tuned PGA. You can do this by setting

PGA_AGGREGATE_TARGET and SGA_TARGET explicitly and setting MEMORY_TARGET = 0. See Note 443746.1 (Automatic Memory Management (AMM) on 11g) in section "2. Automatic Shared Memory Management – For the SGA" on how to do that.

On a side note, I created an additional index on the table, which was then used by query which stabilized the performance a lot. So at the moment we are not modifying any parameter and will wait for some time as our upgrade cycle is approaching. 

Hope this will help you to get some insight in internal working of ACS and histograms.