Thursday, February 16, 2012

Checking for duplicates

Hi,
I have to run through a list of tables and check for duplicates. The
tables have no primary key setup but we expect that there should be only
1 record for a combination of fields.
So for example if I keep it simple and imagine there is a table called
customers with loads of fields and in here the uniqueness of each row is
defined by the fields salesrepid and customerid so there should be only
1 occurrence of eg salesrep xyz123 and customer abc123
What I need to do is somehow look at the table and check to make sure
that the above combination of salesrep xyz123 and customer abc123 only
occurs once and if it does oocur more than once then note it / report on it.
Any ideas on how to approach it? I imagine I could have a static table
containing the tables to be checked plus the fields of each table that
make up the uniquness. So if the tables for checking where in a table
called checkthese then in there I could have a field called tablename,
and then further rows called pkf1 (primary key field 1), pkf2, pkf3 and
so on and for the customers table the pkf1 would have value salesrepid
and pkf2 would have value of customerid.
I've probably not made much sense up there but basically what I need to
do is read a table that contains table names and fields that make up the
uniqueness of the table and then go and check those tables to make sure
that there are no more than one of each record that makes up the uniqueness.
any ideas?
tia,
toeYou could store the table names and column names in a table...or you could
just put them within a script or stored procedure.
To check for uniqueness on one table you could do something like this:
SELECT TheFirstCol, TheSecondCol.... (repeat as needed),
COUNT(*) AS TheDuplicateCount
FROM YourTable
GROUP BY TheFirstCol, TheSecondCol.... (repeat as needed)
HAVING COUNT(*) > 1
You could also do something like this
IF EXISTS (
SELECT TheFirstCol, TheSecondCol.... (repeat as needed),
COUNT(*) AS TheDuplicateCount
FROM YourTable
GROUP BY TheFirstCol, TheSecondCol.... (repeat as needed)
HAVING COUNT(*) > 1)
BEGIN
PRINT 'Duplicate found in table xyz'
END
...repeat
--
Keith Kratochvil
"toedipper" <send_rubbish_here734@.hotmail.com> wrote in message
news:0z0fg.5150$O7.3856@.newsfe5-win.ntli.net...
> Hi,
> I have to run through a list of tables and check for duplicates. The
> tables have no primary key setup but we expect that there should be only 1
> record for a combination of fields.
> So for example if I keep it simple and imagine there is a table called
> customers with loads of fields and in here the uniqueness of each row is
> defined by the fields salesrepid and customerid so there should be only 1
> occurrence of eg salesrep xyz123 and customer abc123
> What I need to do is somehow look at the table and check to make sure that
> the above combination of salesrep xyz123 and customer abc123 only occurs
> once and if it does oocur more than once then note it / report on it.
> Any ideas on how to approach it? I imagine I could have a static table
> containing the tables to be checked plus the fields of each table that
> make up the uniquness. So if the tables for checking where in a table
> called checkthese then in there I could have a field called tablename, and
> then further rows called pkf1 (primary key field 1), pkf2, pkf3 and so on
> and for the customers table the pkf1 would have value salesrepid and pkf2
> would have value of customerid.
> I've probably not made much sense up there but basically what I need to do
> is read a table that contains table names and fields that make up the
> uniqueness of the table and then go and check those tables to make sure
> that there are no more than one of each record that makes up the
> uniqueness.
> any ideas?
> tia,
> toe|||Thanks Keith.
If I where to store the table name and columns to check in a table then
have you any pointers On how I would the go to the table(s) and check
that there is only 1 occurrence for each set of columns?
So if I have the table called 'checkthese' with the fields and data below
Fields Data
tablename: customers
pkf1: salesrepid
pkf2: customerid
How would I actaully look that up and then go to the customers table to
make sure there is only one row of each?
Cheers,
toe
Keith Kratochvil wrote:
> You could store the table names and column names in a table...or you could
> just put them within a script or stored procedure.
> To check for uniqueness on one table you could do something like this:
> SELECT TheFirstCol, TheSecondCol.... (repeat as needed),
> COUNT(*) AS TheDuplicateCount
> FROM YourTable
> GROUP BY TheFirstCol, TheSecondCol.... (repeat as needed)
> HAVING COUNT(*) > 1
> You could also do something like this
> IF EXISTS (
> SELECT TheFirstCol, TheSecondCol.... (repeat as needed),
> COUNT(*) AS TheDuplicateCount
> FROM YourTable
> GROUP BY TheFirstCol, TheSecondCol.... (repeat as needed)
> HAVING COUNT(*) > 1)
> BEGIN
> PRINT 'Duplicate found in table xyz'
> END
> ...repeat|||One method would be to cursor through the data within checkthese and build
and execute the appropriate sql statement.
You could also write a sql statement that would create the appropriate T-SQL
commands that you would have to execute on your own.
I will let you explore the cursor option a bit. The other option would look
something like this:
--your table
create table #checkthese (TableName varchar(128), PKcols varchar(2000))
insert into #checkthese (TableName, PKcols) VALUES ('customers',
'salesrepid, customerid')
insert into #checkthese (TableName, PKcols) VALUES ('SalesRep',
'salesrepid')
GO
--the select (run the output)
SELECT 'IF EXISTS (SELECT ' + PKcols + ' , COUNT(*) FROM ' + TableName + '
GROUP BY ' + PKcols + ' HAVING COUNT(*) > 1 )
BEGIN
PRINT ''Duplicates found within '' + TableName
END' + char(13) + char(10) + 'GO'
from #checkthese
Keith Kratochvil
"toedipper" <send_rubbish_here734@.hotmail.com> wrote in message
news:447CBDE8.9030902@.hotmail.com...
> Thanks Keith.
> If I where to store the table name and columns to check in a table then
> have you any pointers On how I would the go to the table(s) and check that
> there is only 1 occurrence for each set of columns?
> So if I have the table called 'checkthese' with the fields and data below
> Fields Data tablename: customers
> pkf1: salesrepid
> pkf2: customerid
> How would I actaully look that up and then go to the customers table to
> make sure there is only one row of each?
> Cheers,
> toe
>
> Keith Kratochvil wrote:

No comments:

Post a Comment