In Oracle, what should you pay attention to when collecting table statistics?


Answer

When collecting statistics, pay attention to the following points:

For OLTP systems with a small amount of data, it is recommended to use automatic statistics collection and to write jobs that regularly collect statistics for a few particularly large tables. For OLAP or DSS systems with a huge amount of data, the DBA should write their own job scripts to collect statistics.
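As a rough sketch of such a job, the block below schedules a nightly statistics run for one large table with DBMS_SCHEDULER. The job name GATHER_BIG_SALES_STATS, the 02:00 schedule, and the use of SH.SALES as the "large table" are assumptions made for illustration only.

```sql
-- Sketch: nightly job that gathers statistics on one large table.
-- GATHER_BIG_SALES_STATS is a hypothetical job name; SH.SALES is a stand-in table.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'GATHER_BIG_SALES_STATS',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_STATS.GATHER_TABLE_STATS(ownname => ''SH'', tabname => ''SALES''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2;BYMINUTE=0;BYSECOND=0',  -- assumed quiet period
    enabled         => TRUE,
    comments        => 'Regular statistics collection for a large table');
END;
/
```

Pick a schedule that avoids both peak load and the automatic maintenance window, so the two collections do not compete.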

After importing a large amount of data, collect statistics promptly, before any follow-up business processing (including queries and modifications). Otherwise, a large gap between the actual data volume and the volume recorded in the statistics may cause the CBO to choose the wrong execution plan.
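A minimal sketch of this step, assuming the load targeted SH.SALES (a stand-in table name): gather statistics at the end of the load, then check what the dictionary now records.

```sql
-- Run at the end of the bulk load, before downstream queries or DML.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SH', tabname => 'SALES');
END;
/

-- Check the table-level result: NUM_ROWS should be close to the real row count.
SELECT table_name, num_rows, last_analyzed, stale_stats
FROM   dba_tab_statistics
WHERE  owner = 'SH'
AND    table_name = 'SALES'
AND    object_type = 'TABLE';
```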

By default, statistics are not collected for global temporary tables; it is better to rely on dynamic sampling when their execution plans are generated.
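One way to make sure dynamic sampling is used is sketched below. GTT_ORDERS is a hypothetical global temporary table in the current schema, and sampling level 2 is only an illustrative choice.

```sql
-- GTT_ORDERS is a hypothetical global temporary table.
-- Remove any stored statistics and lock them so none are gathered later;
-- with no statistics present, the optimizer can fall back to dynamic sampling.
BEGIN
  DBMS_STATS.DELETE_TABLE_STATS(ownname => USER, tabname => 'GTT_ORDERS');
  DBMS_STATS.LOCK_TABLE_STATS(ownname => USER, tabname => 'GTT_ORDERS');
END;
/

-- Optionally request a sampling level explicitly for a single statement.
SELECT /*+ DYNAMIC_SAMPLING(g 2) */ COUNT(*)
FROM   gtt_orders g;
```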

For newly launched or newly migrated systems, it is recommended to collect statistics once for the whole database.
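A full-database collection could look roughly like this. The degree of 4 is an assumed value; choose it according to the available CPU and I/O capacity.

```sql
-- One-off, database-wide collection after go-live or migration.
BEGIN
  DBMS_STATS.GATHER_DATABASE_STATS(
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE,   -- include index statistics
    degree           => 4);     -- assumed degree of parallelism
END;
/
```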

For tables that contain date columns, collect statistics promptly so that query predicates do not fall outside the value range recorded in the statistics (out-of-range predicates).

Sampling ratio of the statistics collection job: for Oracle 11g and later, the recommended sampling ratio is DBMS_STATS.AUTO_SAMPLE_SIZE. For Oracle 10g, start with a sampling ratio of 30% and then adjust it according to how the target SQL actually executes.
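The two settings map onto the ESTIMATE_PERCENT parameter as sketched here, with SH.SALES as a stand-in table.

```sql
-- Oracle 11g and later: let Oracle choose the sample size.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SH',
    tabname          => 'SALES',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);
END;
/

-- Oracle 10g: start from a fixed 30% sample and tune from there.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SH',
    tabname          => 'SALES',
    estimate_percent => 30);
END;
/
```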

System statistics: if the hardware environment of the system changes, it is recommended to collect system statistics again.
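A sketch of both collection modes follows. The 60-minute interval is an assumed value and should cover a period of representative workload.

```sql
-- Noworkload statistics: a quick measurement of basic CPU and I/O characteristics.
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');

-- Workload statistics: measure during 60 minutes of representative load.
EXEC DBMS_STATS.GATHER_SYSTEM_STATS(gathering_mode => 'INTERVAL', interval => 60);

-- Inspect the values currently stored.
SELECT sname, pname, pval1
FROM   sys.aux_stats$;
```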

Internal object statistics: collect statistics on the X$ internal objects only when a performance problem has been clearly diagnosed as being caused by inaccurate statistics on the X$ tables. Do not collect them in any other case.
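When that diagnosis has been made, the collection itself is a single call, ideally run while the system carries a representative load:

```sql
-- Gather statistics on the fixed (X$) objects.
EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
```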

Table size and parallelism: if the table is large and the system is idle, statistics can be collected in parallel.
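Parallel collection is controlled by the DEGREE parameter. In this sketch the value 8 is an assumption; DBMS_STATS.AUTO_DEGREE can be passed instead to let Oracle decide.

```sql
-- Gather statistics on a large table with an explicit degree of parallelism.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'SH',
    tabname => 'SALES',
    degree  => 8);   -- assumed value; DBMS_STATS.AUTO_DEGREE is an alternative
END;
/
```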

Whether the table is partitioned: for a partitioned table, it is recommended to collect global statistics plus statistics for the individual partitions whose data volume has changed (by specifying the GRANULARITY parameter and setting the INCREMENTAL preference).
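The sketch below shows both approaches on SH.SALES. The partition name SALES_Q4_2003 is an assumption used for illustration.

```sql
-- Incremental approach: only changed partitions are rescanned and the
-- global statistics are derived from per-partition synopses.
BEGIN
  DBMS_STATS.SET_TABLE_PREFS('SH', 'SALES', 'INCREMENTAL', 'TRUE');
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SH',
    tabname          => 'SALES',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    granularity      => 'AUTO');
END;
/

-- Targeted approach: gather statistics for one changed partition only.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'SH',
    tabname     => 'SALES',
    partname    => 'SALES_Q4_2003',   -- assumed partition name
    granularity => 'PARTITION');
END;
/
```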

Whether to collect statistics for indexes: in general, index statistics should be collected.
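Index statistics can be gathered together with the table through CASCADE, or for one index on its own. The index name SALES_PROD_BIX is assumed here for illustration.

```sql
-- Gather table statistics and the statistics of all its indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'SH',
    tabname => 'SALES',
    cascade => TRUE);
END;
/

-- Gather statistics for a single index (assumed index name).
EXEC DBMS_STATS.GATHER_INDEX_STATS(ownname => 'SH', indname => 'SALES_PROD_BIX');
```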

Whether to collect histograms: the strategy is to collect histograms only for columns that already have them. The initial histogram on a target column is collected manually by a DBA who understands the system. After that, set METHOD_OPT to 'FOR ALL COLUMNS SIZE REPEAT'.
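A sketch of that two-step strategy, assuming the DBA has decided that the PROD_ID column of SH.SALES needs a histogram (the column choice and the bucket count of 254 are illustrative):

```sql
-- Step 1 (manual, once): create the initial histogram on the chosen column.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'SH',
    tabname    => 'SALES',
    method_opt => 'FOR COLUMNS PROD_ID SIZE 254');
END;
/

-- Step 2 (routine): refresh histograms only where they already exist.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'SH',
    tabname    => 'SALES',
    method_opt => 'FOR ALL COLUMNS SIZE REPEAT');
END;
/
```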

Whether statistics can be collected concurrently: if the system has many small tables, consider collecting statistics concurrently.
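Concurrent collection is switched on through the CONCURRENT global preference, available from 11.2.0.2. The sketch assumes an 11g Release 2 database, where the values are TRUE and FALSE; in 12c and later the documented values are MANUAL, AUTOMATIC, ALL, and OFF, so check the documentation for your release. The feature relies on scheduler jobs, so JOB_QUEUE_PROCESSES must be greater than 0.

```sql
-- Enable concurrent statistics gathering (11.2.0.2 and later, assumed 11gR2 values).
EXEC DBMS_STATS.SET_GLOBAL_PREFS('CONCURRENT', 'TRUE');

-- Confirm the current setting.
SELECT DBMS_STATS.GET_PREFS('CONCURRENT') FROM dual;

-- A schema-level gather can now process several tables at the same time.
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'SH');
```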

System load: when collecting statistics manually, pay attention to the current load on the system.

How long collection will take: for the large tables of an OLAP system, estimate how long statistics collection will take based on past collection runs.
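Past collection times can be read from the statistics operation history, for example with a query along these lines (how far back the history reaches depends on the statistics history retention setting):

```sql
-- Elapsed time of previous statistics gathering operations, most recent first.
SELECT operation,
       target,
       start_time,
       end_time,
       end_time - start_time AS elapsed
FROM   dba_optstat_operations
WHERE  LOWER(operation) LIKE 'gather%'
ORDER  BY start_time DESC;
```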

Database, schema, or table level: decide whether statistics really need to be collected at the database or schema level.
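If schema-level collection is chosen, its cost can be limited by gathering only for objects whose statistics are stale, as in this sketch (SH is a stand-in schema):

```sql
-- Schema-level collection restricted to objects with stale statistics.
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => 'SH',
    options => 'GATHER STALE',
    cascade => TRUE);
END;
/
```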

Whether to collect extended (multi-column) statistics: if the data in a table is highly skewed, collecting histograms helps the optimizer compute an accurate cardinality and so avoid bad execution plans. Furthermore, if several skewed columns appear together in a predicate and are strongly correlated with each other, generating multi-column statistics with histograms is a good choice, because it helps the optimizer estimate the cardinality accurately.
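A sketch of creating a column group and gathering statistics on it (available from 11g). SH.CUSTOMERS and the column pair CUST_STATE_PROVINCE, COUNTRY_ID are assumed as an example of correlated columns.

```sql
-- Define the column group; the function returns the generated extension name.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
         'SH', 'CUSTOMERS', '(CUST_STATE_PROVINCE,COUNTRY_ID)') AS extension_name
FROM   dual;

-- Gather statistics, including a histogram on the column group if it is skewed.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'SH',
    tabname    => 'CUSTOMERS',
    method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY ' ||
                  'FOR COLUMNS (CUST_STATE_PROVINCE,COUNTRY_ID) SIZE SKEWONLY');
END;
/
```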

Whether to set NO_INVALIDATE to FALSE: the parameter accepts three values, TRUE, FALSE, and DBMS_STATS.AUTO_INVALIDATE. TRUE means that cursors are not invalidated after statistics are collected, so the existing shared cursors stay as they are. FALSE means that all cursors related to the object whose statistics were collected are invalidated, and the target SQL statement is hard-parsed the next time it is executed. With AUTO_INVALIDATE, Oracle decides by itself when to invalidate the shared cursors: the SQL is hard-parsed again when it is re-executed after an invalidation period of up to 5 hours from the time statistics were collected (the period is determined by the hidden parameter "_OPTIMIZER_INVALIDATION_PERIOD"). AUTO_INVALIDATE is the default. Some DBAs do not specify NO_INVALIDATE=>FALSE when collecting statistics, so the execution plan does not change immediately even though statistics have been collected. A table-level preference can also be set so that none of the cursors that depend on the table are invalidated, as follows:

-- When statistics are collected on SH.SALES, do not invalidate the cursors that depend on the table
EXEC DBMS_STATS.SET_TABLE_PREFS('SH','SALES','NO_INVALIDATE','TRUE');
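For the opposite case, where the new statistics should take effect immediately, NO_INVALIDATE can be passed explicitly on the gather call. The sketch also shows how to check the preference currently in force for the table.

```sql
-- Check the NO_INVALIDATE preference that applies to SH.SALES.
SELECT DBMS_STATS.GET_PREFS('NO_INVALIDATE', 'SH', 'SALES') AS no_invalidate
FROM   dual;

-- Gather statistics and invalidate dependent cursors at once,
-- so the next execution of each statement is hard-parsed.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname       => 'SH',
    tabname       => 'SALES',
    no_invalidate => FALSE);
END;
/
```

Invalidating many cursors at once can cause a burst of hard parses on a busy system, which is the reason AUTO_INVALIDATE spreads the invalidation over time.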


For OLTP databases, pay special attention to tables and partitioned tables that see frequent DML or large data loads.

Check whether any data loading takes place near the statistics collection window and, if so, whether it can finish within the window. If it cannot, then for data loaded during that period, especially in large volumes, add statistics collection to the end of the relevant load script.

If the amount of data loaded is large, the table is partitioned, and the business data is distributed evenly across the partitions, then in Oracle 11g consider using DBMS_STATS.COPY_TABLE_STATS to set the new partition's statistics quickly first, and collect that partition's real statistics afterwards.
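A sketch of this approach on SH.SALES follows. The partition names SALES_Q3_2003 (source) and SALES_Q4_2003 (newly loaded) are assumptions for illustration.

```sql
-- Step 1: copy statistics from a comparable partition to the newly loaded one.
BEGIN
  DBMS_STATS.COPY_TABLE_STATS(
    ownname     => 'SH',
    tabname     => 'SALES',
    srcpartname => 'SALES_Q3_2003',   -- assumed source partition
    dstpartname => 'SALES_Q4_2003');  -- assumed target partition
END;
/

-- Step 2 (later, when the system has capacity): gather the real statistics.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'SH',
    tabname     => 'SALES',
    partname    => 'SALES_Q4_2003',
    granularity => 'PARTITION');
END;
/
```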

In fact, there is no universal standard answer to the points above. Different systems hold different amounts of data with different distributions, sometimes dramatically so, and a statistics collection strategy that suits one system may not suit another. The general principle is to tailor the strategy to the circumstances: find the collection strategy that fits your own system, and collect, at the lowest possible cost, statistics that reliably produce the correct execution plans. In other words, the statistics do not have to be perfectly accurate. It is enough that they are representative and consistently lead to the correct execution plans.