Wednesday, March 7, 2012

CHECKSUM_AGG and BINARY_CHECKSUM performance problems

Gentlemen,

I am using the following query to get a list of grouped checksum data.

SELECT CAST(Field0_datetime AS INT),
CHECKSUM_AGG(BINARY_CHECKSUM(Field1_bigint, Field2_datetime,
Field3_datetime, Field4_bigint, Field5_bigint, CAST(Field6_float AS
Decimal(38,6)), Field7_datetime))
FROM Table1
WHERE Field0_datetime BETWEEN '2003-01-01' AND '2003-01-20'
GROUP BY CAST(Field0_datetime AS INT)

Please note the filter used: from January 1 to January 20.
That query takes about 6 minutes to return the data. The result is 18
records.

However, when I execute the same query filtering BETWEEN '2003-01-01' AND
'2003-01-10', it takes only 1 second to return the data.
When I execute the query filtering BETWEEN '2003-01-10' AND '2003-01-20', it
also takes only about 1 second.

So why 6 minutes to process them together??
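A quick way to compare what the server actually does for the fast and the slow
range is to turn on the I/O and timing statistics and run the same query once
per range (just a diagnostic sketch reusing the query above; SET STATISTICS IO
and SET STATISTICS TIME are standard SQL Server session options):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run once with the 10-day range and once with the 20-day range,
-- then compare the "logical reads" and elapsed time reported for Table1
SELECT CAST(Field0_datetime AS INT),
    CHECKSUM_AGG(BINARY_CHECKSUM(Field1_bigint, Field2_datetime,
        Field3_datetime, Field4_bigint, Field5_bigint,
        CAST(Field6_float AS Decimal(38,6)), Field7_datetime))
FROM Table1
WHERE Field0_datetime BETWEEN '2003-01-01' AND '2003-01-20'
GROUP BY CAST(Field0_datetime AS INT);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;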

The table has an index on Field0_datetime.

It contains about 1.5 million records in total, using around 1.7 GB of
disk space, indexes included.

Between 2003-01-01 and 2003-01-20 there are 11,401 records selected. That
doesn't look like much.

The situation is repeatable; if I execute the queries over and over, they take
about the same amount of time, so I don't think this problem is related to
caching or anything like that.

I would appreciate any advice about what might be wrong with my situation.

Thanks a lot and kind regards,

Orly Junior
IT Professional

By using the Profiler, I found that while executing the first query (20-day
span), the system doesn't use the index. How is that possible?

A simpler version of the query that causes the same problem is:

select checksum_agg(binary_checksum([dc])) from [table1] where [dc] between
'2003-01-01' and '2003-01-20'

The Profiler reports it will be using a clustered index scan, which is
unacceptable since the table has a lot of records.

Why the hell is it not using the [dc] index? If I tighten the criteria to a
10-day span, it uses the index correctly.
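As a quick experiment, the optimizer's choice can be overridden with an index
hint, to see whether the nonclustered index really would be faster for the
20-day span (a sketch only; the index name IX_table1_dc is a placeholder,
substitute the actual name of the index on [dc]):

select checksum_agg(binary_checksum([dc]))
from [table1] with (index (IX_table1_dc))  -- force the nonclustered index on [dc]
where [dc] between '2003-01-01' and '2003-01-20'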

Do you have any idea why that is happening?

Thanks in advance and best regards,

Orly Junior
IT Professional

"Orly Junior" <nomail@.nomail.com> wrote in message
news:42b0c9e6$0$32014$a729d347@.news.telepac.pt...
> Gentlemen,
> I am using the following query to get a list of grouped checksum data.
> SELECT CAST(Field0_datetime AS INT),
> CHECKSUM_AGG(BINARY_CHECKSUM(Field1_bigint, Field2_datetime,
> Field3_datetime, Field4_bigint, Field5_bigint, CAST(Field6_float
> Decimal(38,6)), Field7_datetime))
> FROM Table1
> WHERE Field0_datetime BETWEEN '2003-01-01' AND '2003-01-20'
> GROUP BY CAST(Field0_datetime AS INT)
> Please notice the used filter: from January 1 to January 20.
> That query takes about 6 minutes do return the data. The result is 18
> records.
> However, when I execute the same query filtering BETWEEN '2003-01-01' and
> '2003-01-10', this time it takes only 1 second to return data.
> When I execute the query filtering BETWEEN '2003-01-10' and '2003-01-20'
> the query takes another 1 second to return data.
> So why 6 minutes to process them together??
> The table have an index by Field0_datetime.
> It contains about 1.5 millions records total, using around 1.7Gb of
> diskspace, indexes included.
> From 2003-01-01 and 2003-01-20, there are 11401 records selected. Don't
> look like that much.
> The situation is repeatable, I mean, if I execute the queries back and
> again, they takes the about the same ammount of time to execute, so I
> don't think this problem is related to cache or something like that.
> I would appreciate any advice about what might be wrong with my situation.
> Thanks a lot and kind regards,
> Orly Junior
Orly Junior (nomail@.nomail.com) writes:
> By using the Profiler, I found that while executing the first query (20-day
> span), the system doesn't use the index. How is that possible?

When you have a non-clustered index that can be used to compute a query,
SQL Server cannot always use this index blindly. If the selection is
small, the index is fine. If the selection is large, the index spells
disaster. This is because every hit in the index requires an access to
the data pages. This can end up with more page reads than scanning the
table once.

Now, in your case, there are 11,401 rows that match the WHERE clause.
The table is 1.7 GB, which is about 207,000 8 KB pages. Even if some of
those 1.7 GB are indexes, the table scan is obviously more expensive.
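The page arithmetic can be checked directly: 1.7 GB of data divided by the
8 KB page size gives roughly 207,000 pages, against at most roughly 11,401
data-page accesses for the index path. sp_spaceused is a standard system
procedure that reports how the space breaks down:

-- Reports reserved, data and index sizes in KB for the table;
-- dividing the data figure by 8 gives the approximate number of 8 KB pages
EXEC sp_spaceused 'Table1';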

But SQL Server does not build query plans from full knowledge; it builds
them from statistics it has saved about the table. If these statistics are
inaccurate for some reason, the estimate may be incorrect. By default,
SQL Server only samples the data for its statistics.

You can try "UPDATE STATISTICS tbl WITH FULLSCAN" and see if this
has any effect. SQL Server will now look at all rows. However, it
saves data in a histogramme, so you may still lose accuracy. DBCC
SHOW_STATISTICS may give some information.
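Applied to the table in question, that would look something like the
following (the index name IX_Table1_Field0 is a placeholder; use the real
name of the index on Field0_datetime):

-- Rebuild the statistics by reading every row instead of a sample
UPDATE STATISTICS Table1 WITH FULLSCAN;

-- Inspect the resulting histogram for the index on Field0_datetime
DBCC SHOW_STATISTICS ('Table1', IX_Table1_Field0);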

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
