Keeping detailed notes and records of facts found as the debugging process progresses is essential for later analysis and justification of any remedial actions. This is analogous to a doctor keeping good patient notes for future reference.
Steps in The Oracle Performance Improvement Method
1.Perform the following initial standard checks:
a.Get candid feedback from users. Determine the performance project’s scope and subsequent performance goals, and performance goals for the future. This process is key in future capacity planning.
b.Get a full set of operating system, database, and application statistics from the system when the performance is both good and bad. If these are not available, then get whatever is available. Missing statistics are analogous to missing evidence at a crime scene: They make detectives work harder and it is more time-consuming.
c.Sanity-check the operating systems of all computers involved with user performance. By sanity-checking the operating system, you look for hardware or operating system resources that are fully utilized. List any over-used resources as symptoms for analysis later. In addition, check that all hardware shows no errors or diagnostics.
2.Check for the top ten most common mistakes with Oracle Database, and determine if any of these are likely to be the problem. List these as symptoms for
The Oracle Performance Improvement Method
Performance Improvement Methods 3-3
later analysis. These are included because they represent the most likely problems. ADDM automatically detects and reports nine of these top ten issues. See Chapter 6, “Automatic Performance Diagnostics” and “Top Ten Mistakes Found in Oracle Systems” on page 3-4.
3.Build a conceptual model of what is happening on the system using the symptoms as clues to understand what caused the performance problems. See “A Sample Decision Process for Performance Conceptual Modeling” on page 3-3.
4.Propose a series of remedy actions and the anticipated behavior to the system, then apply them in the order that can benefit the application the most. ADDM produces recommendations each with an expected benefit. A golden rule in performance work is that you only change one thing at a time and then measure the differences. Unfortunately, system downtime requirements might prohibit such a rigorous investigation method. If multiple changes are applied at the same time, then try to ensure that they are isolated so that the effects of each change can be independently validated.
5.Validate that the changes made have had the desired effect, and see if the user’s perception of performance has improved. Otherwise, look for more bottlenecks, and continue refining the conceptual model until your understanding of the application becomes more accurate.
6.Repeat the last three steps until performance goals are met or become impossible due to other constraints.
This method identifies the biggest bottleneck and uses an objective approach to performance improvement. The focus is on making large performance improvements by increasing application efficiency and eliminating resource shortages and bottlenecks. In this process, it is anticipated that minimal (less than 10%) performance gains are made from instance tuning, and large gains (100% +) are made from isolating application inefficiencies.
For a quick and easy approach to performance tuning, use ADDM. ADDM automatically monitors your Oracle system and provides recommendations for solving performance problems should problems occur. For example, suppose a DBA receives a call from a user complaining that the system is slow. The DBA simply examines the latest ADDM report to see which of the recommendations should be implemented to solve the problem.
Oracle Database provides CPU statistics in V$SQL and V$SQLSTATS.
If bind variables are not used, then there is hard parsing of all SQL statements.
You should investigate SQL that consumes significant system resources for potential improvement. ADDM identifies high load SQL. SQL Tuning Advisor can provide recommendations for improvement.
Many sites lay out their databases poorly over the available disks. Other sites specify the number of disks incorrectly, because they configure disks by disk space and not I/O bandwidth.
Many sites run with too few online redo log files and files that are too small. Small redo log files cause system checkpoints to continuously put a high load on the buffer cache and I/O system. If too few redo log files exist, then the archive cannot keep up, and the database must wait for the archiver to catch up.
Steps in the Emergency Performance Method
The Emergency Performance Method is as follows:
1.Survey the performance problem and collect the symptoms of the performance problem. This process should include the following:
¶User feedback on how the system is underperforming. Is the problem throughput or response time?
¶Ask the question, “What has changed since we last had good performance?” This answer can give clues to the problem. However, getting unbiased answers in an escalated situation can be difficult. Try to locate some reference points, such as collected statistics or log files, that were taken before and after the problem.
Performance Improvement Methods 3-7
¶Use automatic tuning features to diagnose and monitor the problem. See “Automatic Performance Tuning Features” on page 1-5 for information about the features that help diagnose and tune Oracle systems. In addition, you can use Oracle Enterprise Manager performance features to identify top SQL and sessions.
2.Sanity-check the hardware utilization of all components of the application system. Check where the highest CPU utilization is, and check the disk, memory usage, and network performance on all the system components. This quick process identifies which tier is causing the problem. If the problem is in the application, then shift analysis to application debugging. Otherwise, move on to database server analysis.
3.Determine if the database server is constrained on CPU or if it is spending time waiting on wait events. If the database server is CPU-constrained, then investigate the following:
¶Sessions that are consuming large amounts of CPU at the operating system level and database; check V$SESS_TIME_MODEL for database CPU usage
¶Sessions or statements that perform many buffer gets at the database level; check V$SESSTAT and V$SQLSTATS
¶Execution plan changes causing sub-optimal SQL execution; these can be difficult to locate
¶Incorrect setting of initialization parameters
¶Algorithmic issues caused by code changes or upgrades of all components
If the database sessions are waiting on events, then follow the wait events listed in V$SESSION_WAIT to determine what is causing serialization. The V$ACTIVE_SESSION_HISTORY view contains a sampled history of session activity which you can use to perform diagnosis even after an incident has ended and the system has returned to normal operation. In cases of massive contention for the library cache, it might not be possible to logon or submit SQL to the database. In this case, use historical data to determine why there is suddenly contention on this latch. If most waits are for I/O, then examine V$ACTIVE_SESSION_HISTORY to determine the SQL being run by the sessions that are performing all of the inputs and outputs. See Chapter 10, “Instance Tuning Using Performance Views” for a discussion on wait events.
4.Apply emergency action to stabilize the system. This could involve actions that take parts of the application off-line or restrict the workload that can be applied to the system. It could also involve a system restart or the termination of job in process. These naturally have service level implications.
5.Validate that the system is stable. Having made changes and restrictions to the system, validate that the system is now stable, and collect a reference set of statistics for the database. Now follow the rigorous performance method described earlier in this book to bring back all functionality and users to the system. This process may require significant application re-engineering before it is complete.
6.It may not always be possible to provide a specific size recommendation for redo log files, but redo log files in the range of 100 MB to a few gigabytes are considered reasonable. Size online redo log files according to the amount of redo your system generates. A rough guide is to switch log files at most once every 20 minutes.
7.For any permanent tablespace that has moderate to large insert, modify, or delete operations compared to reads, choose local extent management.
8.Temporary tablespaces can be dictionary-managed or locally managed.
How to tune oracle Database?
Collecting and analyzing the AWR reports for Oracle Database:
Tuning the Database is very much essential not only for better execution of SQL statement but also for applications running well using that Database.AWR(Automatic workload repository) report gives us clear picture in Deciding and tuning Database as well as SQL statement.As Enterprise Manager we can see it easily but Enterprise manager needs license(additional cost) so some companies does not want to use Oracle Enterprise Manager.Oracle gives us flexible option in which we can generated awr report in html and plain format.
It is better to generate in html format for seeing it clearly by using any web browser.Since performance tuning is a deep ocean I will be updating this thread based on my problems faced and the method which work fine for tuning purpose.
Collecting awr report from SQL prompt:
Login to the Database as ‘sys’ user(SYSDBA) and make sure Database is up & running and Oracle Environment for the particular Database
is set.we can gather the awr report using ‘awrrpt.sql’.
Note:If we want the awr report in Oracle RAC environment than we have to use ‘awrgrpt.sql’ script,as there you have to gather the report for multiple instances running on various nodes.
SQL> select name from v$database;
SQL> select status from v$instance;
DB Id DB Name Inst Num Instance ----------- ------------ -------- ------------ 3628069655 TESTDB 1 TESTDB Specify the Report Type
Would you like an HTML report, or a plain text report?
Enter ‘html’ for an HTML report, or ‘text’ for plain text
Defaults to ‘html’
Enter value for report_type: html
Type Specified: html
Instances in this Workload Repository schema
DB Id Inst Num DB Name Instance Host ------------ -------- ------------ ------------ ------------ * 3628069655 1 TESTDB TESTDB TESTNODE1.comp.com Using 3628069655 for database Id Using 1 for instance number Specify the number of days of snapshots to choose from
Entering the number of days (n) will result in the most recent
(n) days of snapshots being listed. Pressing without
specifying a number lists all completed snapshots.
Enter value for num_days: 1
Listing the last day’s Completed Snapshots
Instance DB Name Snap Id Snap Started Level
TESTDB TESTDB 5590 24 Nov 2011 00:30 1
5591 24 Nov 2011 01:30 1
5592 24 Nov 2011 02:30 1
5593 24 Nov 2011 03:30 1
5594 24 Nov 2011 04:30 1
5595 24 Nov 2011 05:30 1
5596 24 Nov 2011 06:30 1
5597 24 Nov 2011 07:30 1
5598 24 Nov 2011 08:30 1
5599 24 Nov 2011 09:30 1
5600 24 Nov 2011 10:30 1
5601 24 Nov 2011 11:30 1
5602 24 Nov 2011 12:30 1
5603 24 Nov 2011 13:30 1
5604 24 Nov 2011 14:30 1
5605 24 Nov 2011 15:30 1
Specify the Begin and End Snapshot Ids
Enter value for begin_snap: 5604 Begin Snapshot Id specified: 5604 Enter value for end_snap: 5605 End Snapshot Id specified: 5605 Specify the Report Name ~~~~~~~~~~~~~~~~~~~~~~~ The default report file name is awrrpt_1_5604_5605.html. To use this name, press to continue, otherwise enter an alternative. Enter value for report_name: awrrpt_NOV24_2011_2_30_3_30_PM.html SQL> exit We will see the html format of the awr report in the current operating system path. [[email protected] ~]$ ls -altr awr* -rw-r--r-- 1 oracle dba 458371 Nov 24 14:02 awrrpt_1_5590_5603.html -rw-r--r-- 1 oracle dba 390564 Nov 24 16:31 awrrpt_NOV24_2011_2_30_3_30_PM.html We can copy this html file using copying tool(winscp or ftp) to our machine and review using web browsers(mozilla or IE supported versions). Analzing the awr report and suggesting possible recommendations: Once we obtain the awr report our main motive is to analyze the awr report and come up with possible recommendations.Depending on the size of our Production Database we can come up with possible recommendations.This recommendation should be first implemented in test environment and after successful results should be adopted in production environments. 1) Redo logs: We need to make sure our redo logs are large enough.Check the number of log switches, one every twenty minutes is ideal, more than this is too high and you should make them larger to reduce the number of switches. We can find the log switches in the Instance Activity Stats part of the awr report. Example: Instance Activity Stats - Thread Activity * Statistics identified by '(derived)' come from sources other than SYSSTAT Statistic Total per Hour log switches (derived) 2 2.00 We can see in this system there are 2 log swtiches per hourly basis,which is good.So this tells us the redo logs are large enough. 2)Parsing: Check the hard parsing amount.It should be zero.If it is not, this indicates that our SGA is probably too small,increase the size of SGA and test again. Hard parsing is caused by use of literals in SQL (as opposed to bind variables).If the queries in question are our own,we should change them to use bind variables. We can find this information on the first page. Load Profile Per Second Per Transaction Per Exec Per Call ~~~~~~~~~~~~ --------------- --------------- ---------- ---------- ... Parses: 33.9 7.2 Hard parses: 0.5 0.1 ... We can see in this system the hard parses is almost zero, which is good. Now coming to the SGA we can focus on the below considerations: 3)Buffer hit and Library hit percentages: Check the buffer hit and library hit percentages. We want them to be 100%, if not we should increase the size of SGA.This is also on the first page: Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: 99.82 Redo NoWait %: 100.00
Buffer Hit %: 99.52 In-memory Sort %: 100.00
Library Hit %: 98.63 Soft Parse %: 98.60
Execute to Parse %: 50.96 Latch Hit %: 98.16
Parse CPU to Parse Elapsd %: 66.67 % Non-Parse CPU: 97.75
In this case they are also good.
4)Top 5 Timed Foreground Events:
Check the average wait times.Anything over 5ms indicates a problem.If we see database CPU events in the Top 5, this indicates that SGA is too small.We may also be missing indexes.Check the optimizer statistics.
Here are the Top 5 from my environment:
Top 5 Timed Foreground Events
~ ~ ~ ~ ~~~~~
wait % DB
Event Waits Time(s) (ms) time Wait Class
DB CPU 15 59.9
log file sync 1,592 8 5 32.3 Commit
sort segment request 1 1 1001 4.0 Configurat
db file sequential read 216 1 4 3.6 User I/O
db file scattered read 64 0 6 1.5 User I/O
we can see here that the major issue is DB CPU, which generally indicates SGA is too small. However, in this case it is high because this report was run on a VM with the database and BPM sharing the CPU and disk.
Database file sequential/scattered read
These indicate time spent doing table scans and index scans (respectively).If these are high (over 5ms),We should consider moving your data files to reduce disk I/O contention, or move them to faster disks.
5)Enqueue high watermark:
This indicates hardware contention that occurs when there are multiple users inserting into LOB segments at once while the database is trying to reclaim unused space.
We should consider enabling secure files to improve LOB performance (SECURE_FILES=ALWAYS).
We cannot see these in my example report, because this was not a problem in my environment, so it did not make it into the Top 5.If it did, you would see an event called:enq: HW – contention
Other things to be aware ofÖ
We will also check our database configuration.
Do not use this setting.We should have our DBA tune the memory manually instead.This will result in a better tuned database.We should start with 60% of physical memory allocated to SGA and 20% to PGA.
Usually we do not use this setting much for tuning.But auditing on Database level
can be overhead to the Database.
Understand AWR Report
I mention earlier how to generate AWR , But after you did this how can you read it .
1) The first thing to be checked in AWR report is the following:-
Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 112 11-Jun-09 00:00:57 191 6.7
End Snap: 113 11-Jun-09 01:00:11 173 7.4
Elapsed: 59.23 (mins)
DB Time: 710.73 (mins)
Check the “DB Time” metric. If it is much higher than the elapsed time, then it indicates that the sessions are waiting for something.
Here in this example, the Elapsed Time is around 60 minutes while the DB Time is around 700 minutes. This means that 700 minutes of time is spent by the sessions on waiting.
2) Next thing to be looked is the following:-
Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 98.67 In-memory Sort %: 100.00
Library Hit %: 98.60 Soft Parse %: 99.69
Execute to Parse %: 5.26 Latch Hit %: 99.31
Parse CPU to Parse Elapsd %: 12.78 %Non-Parse CPU: 99.10
As per the thumb rule, Instance Efficieny Percentages should be ideally above 90%.
3) Then comes the Shared Pool Statistics.
Shared Pool Statistics
Memory Usage %: 85.49 80.93
% SQL with executions>1: 42.46 82.96
% Memory for SQL w/exec>1: 47.77 81.03
The memory usage statistics of shared pool is shown.
Idealy this should be lesser. If it is very high like beyond 90, this shows the contention in the shared pool.
4) Next thing to be looked after is the Top 5 Timed Events table.
This shows the most significant waits contributing to the DB Time.
Top 5 Timed Events
Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
db file sequential read 4,076,086 28,532 7 66.9 User I/O
CPU time 11,214 26.3
Backup: sbtbackup 4 4,398 1,099,452 10.3 Administrative
log file sync 37,365 2,421 65 5.7 Commit
log file parallel write 37,928 1,371 36 3.2 System I/O
Here, the significant wait is the db file sequential read which contributes to 67% of DB Time.
5) Then , SQL Statistics can be checked.
SQL ordered by Elapsed Time
SQL ordered by CPU Time
SQL ordered by Gets
SQL ordered by Reads
SQL Statistics section would have commonly the above four sections.
Each section shows the list of SQLs based on the order of the respective metric.
For example, SQL ordered by Elapsed Time section shows the list of SQLs in the order
of the Elapsed Time. High resource consuming SQLs can be spotted out and meant for
Note: All the above four sections of SQL Statistics show the list of SQLs in descending order.
i.e, For ex: Highest elapsed time is shown as first.
6) Then comes the IO Stats section.
This shows the IO Statistics for each tablespaces in the database.
As the thumb rule, the Av Rd(ms) [Average Reads in milliseconds] should not cross beyond 30, add myself(not greater that 30)
which is considered to be IO bottleneck.
Tablespace IO Stats
ordered by IOs (Reads + Writes) desc
Tablespace Reads Av Reads/s Av Rd(ms) Av Blks/Rd Writes Av Writes/s Buffer Waits Av Buf Wt(ms)
TEMP 3,316,082 933 4.91 1.00 28,840 8 0 0.00
DAT1 520,120 146 16.06 1.21 185,846 52 902 13.00
DAT3 93,411 26 42.82 2.98 13,442 4 16 23.13
DAT2 98,171 28 91.97 7.97 5,333 2 325 34.89
In the above example, the Av Rd(ms) is high in all tablespaces indicating the IO contention.
7) Then , Advisory Statistics can be checked.
This section shows the following:-
Buffer Pool Advisory
PGA Aggr Summary
PGA Aggr Target Stats
PGA Aggr Target Histogram
PGA Memory Advisory
Shared Pool Advisory
SGA Target Advisory
Streams Pool Advisory
Java Pool Advisory
It is very commonly used to check the advisories for the most important SGA structures like shared pool, buffer cache etc and PGA.
8) Then finally, init.ora Parameters is shown which shows the list of parameters set at instance level.
All the above said sections except the DB Time can be checked from Statspack report also.
The statspack snapshots are not generated automatically as in AWR.It has to be generated during the problem period as follows:-
Take 2 snapshots between 60 minutes interval during the problem and generate the statspack report
wait for 60 minutes
Please run $ORACLE_HOME/rdbms/admin/spreport.sql
and specify BEGIN and END ID’s of the snapshots taken during the problem.