Wednesday, March 7, 2012

Checksum computation help

Please execute the script below to understand the problem -

--
create table test(id int, col1 int,col2 varchar(5),col3 datetime)
create table test2(id int, col1 int,col2 varchar(5),col3 datetime)

--id & col1 make up the PK.

insert test values(4,4,'d','02/06/2004')
insert test values(4,4,'e','02/06/2004')

insert test2 values(4,4,'d','02/06/2004')
insert test2 values(4,4,'e','02/06/2004')

select *
from test

select *
from test2

--The rows are identical.
--Script A

select t.*
from test t
join test2 t2 on t2.id=t.id
where CHECKSUM(t.col2,t.col3)<>CHECKSUM(t2.col2,t2.col3)

--The purpose of the above script is to check for any updates in the two tables. It returns two rows. But as you can see both these rows were present in the table before. So I modify the script to -
--SCRIPT B
select t.*
from test t
join test2 t2 on t2.col2=t.col2
where CHECKSUM(t.col3)<>CHECKSUM(t2.col3)

-- In this case no row is returned.This is exactly what I need. The problem - Now execute the script below.

TRUNCATE TABLE TEST
TRUNCATE TABLE TEST2

insert test values(4,4,'d','02/06/2004')
insert test values(4,4,'d','02/01/2004')

insert test2 values(4,4,'d','02/06/2004')
insert test2 values(4,4,'d','02/01/2004')

--Now when I execute script B two rows are returned which is not what I want. Since the rows are identical no row should be returned. So depending on what column changes (col2 or col3), I have to alter the script. I seek advise on the method to calculate checksum. Again the PK is ID and Col1 only.

Thanks

drop table test
drop table test2
go
--Script B is not correct because you have no keys in tables and, of course, it returns rows - col3s are different. There is relation many to many.|||And did you look up CHECKSUM() in BOL?

I know you're trying to accomplish something...but you got me lost..

It's in the same manner as your previous threads...

Can you give us a "big picture" view of what you're trying to accomplish?

I don't mean to offend, but you need to understan what primary keys are for...sounds like your data model is not fitting in quite right with what you're trying to accomplish...|||I think this would give you an idea of the data. Yesterday when I did the processing I had this view of the table -

ID...County...Univ...Dept.....Status

1...A......XYZ...Accounting...Processed - Good
1...A......ABC...Accounting...Processed - Bad
1...A......XYZ...Marketing...Processed - Good
1...B......PQR...HR............Processed - Good
1...C......XXX...HR............Processed - Bad

I have an index on the Status field coz I can see all Bad records on top.

Today I have in my source system -

ID...County...Univ...Dept

1...A......ABC...Accounting
1...A......XYZ...Accounting
1...A......XYZ...Marketing
1...B......PQR...HR
1...C......XXX...HR
2...C......YYY...Training

I want to process only those records that are new/updated since yesterday's version. I get the above records in a separate table and assign a Status to them as 'Not Processed'. I then compare the two tables. And so because of the problem stated before, I end up processing a record that I have processed the previous day.

So how do I go about this problem? Is there a need for another column in here.

No comments:

Post a Comment