SQL Server Scrapbook: Helping SQL Server with more statistics

Existing System: We generally have the SQL Servers to automatically create statistics. So for every column it would create a statistics with the required information. When SQL creates statistics, it’s able to create it with about 200 sample points (Or distinct values). So if there are about 100K distinct names in a column then SQL Server would just be able to accurately store the information of about 200 names and stores the information of other names in ranges. For e.g., if the first sample point is of name “Adam” and the second sample point is of name “Ben” then it accurately stores the information as to how many records in the table are with name “Adam” and “Ben” in the statistics histogram. Along with this it also stores the number of records that are between “Adam” and “Ben” and also the number of distinct value.

Problem Statement: With the above theory it would always be happening that SQL Server is always using very skewed statistics at various times and hence must be using a not so optimized query plans. Let’s take the above samples to understand this better. Consider that the following few names are used in a table from “Adam” to “ben” in alphabetical order:

Name	Number of Rows
Adam	200,000
Alan	25,000
Ariez	500,000
Bacary	50,000
Bane	25,000
Ben	200,000

For the above sample of data, SQL Server would be storing the accurate information of the sample points (“Adam” – 200K and “Ben” – 200K). However it stores an approximate value for the values that are in between these sample points. For e.g., It would just be having an information that there are 4 distinct names between “Adam” and “Ben” and the number of records are 600K so it would estimate that there are about 150K records for each distinct name. Hence when in queries we search for the name “Ariez”, SQL Server would estimate that there are about 150K records with the name “Ariez” but in actual it’s having about 500K records which make it to use some inefficient query plan instead of the most optimized one.

Solution: With the above anomaly in place, we should be providing SQL Servers with some more statistics for it to take better decisions as to which plans are to be used. This is where Filtered indexes come into picture. In the above sample we should be creating filtered statistics on the values between “Adam” and “Ben”. SQL Server in such case would produce another statistic with about 200 sample points (Or lesser if there are less number of distinct values) and hence it would be able to better inform SQL Server that there are about 500K records with the name “Ariez” and other possible names.

SQL Server Scrapbook

Sunday, December 7, 2014

Helping SQL Server with more statistics – Filtered Statistics

No comments:

Translate

Pages