Wednesday, March 7, 2012

checksum

It seems that the checksum function does not distinguish between positive
and negative decimals or floats. Is this by design? For example
select checksum(1.0)
select checksum(-1.0)
returns:
-1374215283
-1374215283Elmer Miller wrote:
> It seems that the checksum function does not distinguish between positive
> and negative decimals or floats. Is this by design? For example
> select checksum(1.0)
> select checksum(-1.0)
> returns:
> -1374215283
> -1374215283
That's right. CHECKSUM doesn't necessarily return distinct results for
different inputs.
SELECT CHECKSUM(N'ABC') AS ABC,CHECKSUM(N'ASH') AS ASH;
ABC ASH
-- --
1132495864 1132495864
(1 row(s) affected)
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||It is odd. I am trying to use checksum to build an index. The documentation
states that checksum is intended for the building of hash indexes.
However...
I have a table of approx 1.3 million rows and 25 columns (mixed types). I
ran the following statements:
select count(*) as vol, checksum(*) as Hash from <tablename> group by
checksum(*) order by vol desc
This returned 227 records that had the same check sums as another row in the
same table. No row was matched more than twice. This means that 0.01% of the
rows have the same check sums. I then took a look at the rows and they are
very different. They do have the same data types across the columns but there
is a 0.01% chance that my table returns the same checksum despite the data
within them being very different.
This means that I can't really use it as an index. Could there be another
way of creating an index from columns? perhaps an MD5 Hash?
thanks for any help on this.
"David Portas" wrote:
> Elmer Miller wrote:
> > It seems that the checksum function does not distinguish between positive
> > and negative decimals or floats. Is this by design? For example
> > select checksum(1.0)
> >
> > select checksum(-1.0)
> >
> > returns:
> >
> > -1374215283
> >
> > -1374215283
> That's right. CHECKSUM doesn't necessarily return distinct results for
> different inputs.
> SELECT CHECKSUM(N'ABC') AS ABC,CHECKSUM(N'ASH') AS ASH;
> ABC ASH
> -- --
> 1132495864 1132495864
> (1 row(s) affected)
>
> --
> David Portas, SQL Server MVP
> Whenever possible please post enough code to reproduce your problem.
> Including CREATE TABLE and INSERT statements usually helps.
> State what version of SQL Server you are using and specify the content
> of any error messages.
> SQL Server Books Online:
> http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
> --
>|||Some other checksums will give you better results but basically no checksum
will guarantee you won't have collisions. Checksums are useful as indexes
just as hash functions are useful in building hash tables even though
uniqueness is not guaranteed. If I understood your statistics correctly,
you would return a maximum of two rows which is pretty good for 1.3 million
candidates. Presumably once you have narrowed the search to two or three
rows you can use some other means to get the exact row you want.
A checksum ensures that no two identical rows will return different
checksums but it doesn't ensure that the same checksum can't be returned
from different rows.
--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm
"Sharat Koya" <SharatKoya@.discussions.microsoft.com> wrote in message
news:4E79996B-27A5-46B5-8E9B-E3FFC68024D5@.microsoft.com...
> It is odd. I am trying to use checksum to build an index. The
> documentation
> states that checksum is intended for the building of hash indexes.
> However...
> I have a table of approx 1.3 million rows and 25 columns (mixed types). I
> ran the following statements:
> select count(*) as vol, checksum(*) as Hash from <tablename> group by
> checksum(*) order by vol desc
> This returned 227 records that had the same check sums as another row in
> the
> same table. No row was matched more than twice. This means that 0.01% of
> the
> rows have the same check sums. I then took a look at the rows and they are
> very different. They do have the same data types across the columns but
> there
> is a 0.01% chance that my table returns the same checksum despite the data
> within them being very different.
> This means that I can't really use it as an index. Could there be another
> way of creating an index from columns? perhaps an MD5 Hash?
> thanks for any help on this.
>
>
> "David Portas" wrote:
>> Elmer Miller wrote:
>> > It seems that the checksum function does not distinguish between
>> > positive
>> > and negative decimals or floats. Is this by design? For example
>> > select checksum(1.0)
>> >
>> > select checksum(-1.0)
>> >
>> > returns:
>> >
>> > -1374215283
>> >
>> > -1374215283
>> That's right. CHECKSUM doesn't necessarily return distinct results for
>> different inputs.
>> SELECT CHECKSUM(N'ABC') AS ABC,CHECKSUM(N'ASH') AS ASH;
>> ABC ASH
>> -- --
>> 1132495864 1132495864
>> (1 row(s) affected)
>>
>> --
>> David Portas, SQL Server MVP
>> Whenever possible please post enough code to reproduce your problem.
>> Including CREATE TABLE and INSERT statements usually helps.
>> State what version of SQL Server you are using and specify the content
>> of any error messages.
>> SQL Server Books Online:
>> http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
>> --
>>

No comments:

Post a Comment