Sunday, March 25, 2012
Clean up a table & save to another table
I'm trying to filter and update the table AllAgreements (below) by moving
parts of each row into appropriate tables and deleting redundant rows.
This table is just a scratch table used while importing old data into a
new DB.
The situation: A company supplies credits to its customers (other
companies [construction] called Developers) that the customers
(Developers) can use to reduce any bills sent to them by the company.
Each time the credits are supplied they are referred to as a Credit
Agreement. The developers can transfer those credits, or parts of those
credits, to other developers. When a transfer is made they are creating
a new agreement. The Agreement identifier, in correspondence, uses a
user constructed identifier (ID #). The company keeps the original
credit agreement identifier & just adds a sequential letter suffix
representing each iteration of a new agreement off the original (because
the original agreement has one cut-off date that is enforced for all
subsequent agreements that use credits from the original credit agreement).
E.g.:
1,'FIF2 GRNBR 001','LUCY',55000.00 <- original agreement
2,'FIF2 GRNBR 001A','LUCY',-20000.00 <- Lucy transferred $20K to Judy
3,'FIF2 GRNBR 001A','JUDY',20000.00
5,'FIF2 GRNBR 001B','JUDY',-10000.00 <- Judy transferred $10K to Susie
4,'FIF2 GRNBR 001B','SUSIE',10000.00
6,'TIF1 ACSPA 001','WILLY',30000.00 <- original agreement
7,'TIF1 ACSPA 002','WILLY',45000.00 <- original agreement
8,'TIF1 MSSH 001','FRED',25000.00 <- original agreement
9,'TIF1 MSSH 001A','FRED',-10000.00 <- Fred transferred $10K to Harry
10,'TIF1 MSSH 001A','HARRY',10000.00
11,'TIF1 MSSH 001B','HARRY',-10000.00 <- Harry transferred $10K to Joe
12,'TIF1 MSSH 001B','JOE',10000.00
I need to generate a new row in the table Transactions (below) for both
the debit on the original credit agreement and the transfer/credit for
the new credit agreement.
/* BEGIN DDL */
set nocount on
CREATE TABLE AllAgreements ( -- scratch table - no PK
ca_id INTEGER NOT NULL UNIQUE,
"ID #" VARCHAR(20) NOT NULL,
Developer VARCHAR(5) NOT NULL , -- REFERENCES Developers,
issued_date DATETIME NOT NULL,
amt decimal(9,2) NOT NULL
)
CREATE TABLE Transactions (
ca_id INTEGER NOT NULL REFERENCES AllAgreements,
developer VARCHAR(5) NOT NULL , -- REFERENCES Developers,
transaction_type INTEGER NOT NULL
CHECK (transaction_type BETWEEN 1 AND 5),
applied_date DATETIME NOT NULL,
amount DECIMAL(9,2) NOT NULL,
CONSTRAINT PK_Transactions
PRIMARY KEY (ca_id, developer, transaction_type, applied_date)
)
INSERT INTO AllAgreements
VALUES (1,'FIF2 GRNBR 001','LUCY','20050115',55000.00)
INSERT INTO AllAgreements
VALUES (2,'FIF2 GRNBR 001A','LUCY','20050122',-20000.00)
INSERT INTO AllAgreements
VALUES (3,'FIF2 GRNBR 001A','JUDY','20050122',20000.00)
INSERT INTO AllAgreements
VALUES (4,'FIF2 GRNBR 001B','SUSIE','20050122',10000.00)
INSERT INTO AllAgreements
VALUES (5,'FIF2 GRNBR 001B','JUDY','20050122',-10000.00)
INSERT INTO AllAgreements
VALUES (6,'TIF1 ACSPA 001','WILLY','20050211',30000.00)
INSERT INTO AllAgreements
VALUES (7,'TIF1 ACSPA 002','WILLY','20050211',45000.00)
INSERT INTO AllAgreements
VALUES (8,'TIF1 MSSH 001','FRED','20050212',25000.00)
INSERT INTO AllAgreements
VALUES (9,'TIF1 MSSH 001A','FRED','20050212',-10000.00)
INSERT INTO AllAgreements
VALUES (10,'TIF1 MSSH 001A','HARRY','20050212',10000.00)
INSERT INTO AllAgreements
VALUES (11,'TIF1 MSSH 001B','HARRY','20050225',-10000.00)
INSERT INTO AllAgreements
VALUES (12,'TIF1 MSSH 001B','JOE','20050225',10000.00)
/* desired results in table AllAgreements (I'll remove the Amt column
after the clean up & the values are in the Transactions table. Didn't
show the date in order to display row on one line).
ca_id ID # Developer amt
1 FIF2 GRNBR 001 LUCY $55,000.00
3 FIF2 GRNBR 001A JUDY $20,000.00 <- Lucy transferred to Judy
4 FIF2 GRNBR 001B SUSIE $10,000.00 <- Judy transferred to Susie
6 TIF1 ACSPA 001 WILLY $30,000.00
7 TIF1 ACSPA 002 WILLY $45,000.00
8 TIF1 MSSH 001 FRED $25,000.00
10 TIF1 MSSH 001A HARRY $10,000.00 <- Fred transferred to Harry
12 TIF1 MSSH 001B JOE $10,000.00 <- Harry transferred to Joe
*/
/* desired results in table Transactions:
(1 = Original credit agreement; 2 = transfer of CR agreement)
E.g.: below the 1st 2 LUCY lines show the original credit
(trans_type=1) and the debit (trans_type=2) on the ID # "FIF2 GRNBR 001"
caused by the transfer to JUDY (line 3). The JUDY line's ca_id, 3,
points to the new credit agreement "FIF2 GRNBR 001A" row in the
AllAgreements table.
ca_id developer transaction_type applied_date amount
-- -- -- -- --
1 LUCY 1 20050115 55000.00
1 LUCY 2 20050122 -20000.00
3 JUDY 2 20050122 20000.00
4 SUSIE 2 20050122 10000.00
3 JUDY 2 20050122 -10000.00
6 WILLY 1 20050211 30000.00
7 WILLY 1 20050211 45000.00
8 FRED 1 20050212 25000.00
8 FRED 2 20050212 -10000.00
10 HARRY 2 20050212 10000.00
10 HARRY 2 20050225 -10000.00
12 JOE 2 20050225 10000.00
*/
DROP TABLE Transactions, AllAgreements
/* END */
QUESTION: What append command will move the amounts to the Transactions
table w/ the correct ca_id value?
Thanks for your help.
--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
|||
Hi
See inline:
"MGFoster" <mgf00@.earthlink.net> wrote in message
news:8waoe.562$HM.259@.newsread1.news.pas.earthlink.net...
> I'm trying to filter and update the table AllAgreements (below) by moving
> parts of each row into appropriate tables and deleting redundant rows.
> This table is just a scratch table used while importing old data into a
> new DB.
> The situation: A company supplies credits to its customers (other
> companies [construction] called Developers) that the customers
> (Developers) can use to reduce any bills sent to them by the company. Each
> time the credits are supplied they are referred to as a Credit Agreement.
> The developers can transfer those credits, or parts of those credits, to
> other developers. When a transfer is made they are creating a new
> agreement. The Agreement identifier, in correspondence, uses a user
> constructed identifier (ID #). The company keeps the original credit
> agreement identifier & just adds a sequential letter suffix representing
> each iteration of a new agreement off the original (because the original
> agreement has one cut-off date that is enforced for all subsequent
> agreements that use credits from the original credit agreement).
> E.g.:
> 1,'FIF2 GRNBR 001','LUCY',55000.00 <- original agreement
> 2,'FIF2 GRNBR 001A','LUCY',-20000.00 <- Lucy transferred $20K to Judy
> 3,'FIF2 GRNBR 001A','JUDY',20000.00
> 5,'FIF2 GRNBR 001B','JUDY',-10000.00 <- Judy transferred $10K to Susie
> 4,'FIF2 GRNBR 001B','SUSIE',10000.00
> 6,'TIF1 ACSPA 001','WILLY',30000.00 <- original agreement
> 7,'TIF1 ACSPA 002','WILLY',45000.00 <- original agreement
> 8,'TIF1 MSSH 001','FRED',25000.00 <- original agreement
> 9,'TIF1 MSSH 001A','FRED',-10000.00 <- Fred transferred $10K to Harry
> 10,'TIF1 MSSH 001A','HARRY',10000.00
> 11,'TIF1 MSSH 001B','HARRY',-10000.00 <- Harry transferred $10K to Joe
> 12,'TIF1 MSSH 001B','JOE',10000.00
> I need to generate a new row in the table Transactions (below) for both
> the debit on the original credit agreement and the transfer/credit for the
> new credit agreement.
> /* BEGIN DDL */
> set nocount on
> CREATE TABLE AllAgreements ( -- scratch table - no PK
> ca_id INTEGER NOT NULL UNIQUE,
> "ID #" VARCHAR(20) NOT NULL,
> Developer VARCHAR(5) NOT NULL , -- REFERENCES Developers,
> issued_date DATETIME NOT NULL,
> amt decimal(9,2) NOT NULL
> )
> CREATE TABLE Transactions (
> ca_id INTEGER NOT NULL REFERENCES AllAgreements,
> developer VARCHAR(5) NOT NULL , -- REFERENCES Developers,
> transaction_type INTEGER NOT NULL
> CHECK (transaction_type BETWEEN 1 AND 5),
> applied_date DATETIME NOT NULL,
> amount DECIMAL(9,2) NOT NULL,
> CONSTRAINT PK_Transactions
> PRIMARY KEY (ca_id, developer, transaction_type, applied_date)
> )
> INSERT INTO AllAgreements
> VALUES (1,'FIF2 GRNBR 001','LUCY','20050115',55000.00)
> INSERT INTO AllAgreements
> VALUES (2,'FIF2 GRNBR 001A','LUCY','20050122',-20000.00)
> INSERT INTO AllAgreements
> VALUES (3,'FIF2 GRNBR 001A','JUDY','20050122',20000.00)
> INSERT INTO AllAgreements
> VALUES (4,'FIF2 GRNBR 001B','SUSIE','20050122',10000.00)
> INSERT INTO AllAgreements
> VALUES (5,'FIF2 GRNBR 001B','JUDY','20050122',-10000.00)
> INSERT INTO AllAgreements
> VALUES (6,'TIF1 ACSPA 001','WILLY','20050211',30000.00)
> INSERT INTO AllAgreements
> VALUES (7,'TIF1 ACSPA 002','WILLY','20050211',45000.00)
> INSERT INTO AllAgreements
> VALUES (8,'TIF1 MSSH 001','FRED','20050212',25000.00)
> INSERT INTO AllAgreements
> VALUES (9,'TIF1 MSSH 001A','FRED','20050212',-10000.00)
> INSERT INTO AllAgreements
> VALUES (10,'TIF1 MSSH 001A','HARRY','20050212',10000.00)
> INSERT INTO AllAgreements
> VALUES (11,'TIF1 MSSH 001B','HARRY','20050225',-10000.00)
> INSERT INTO AllAgreements
> VALUES (12,'TIF1 MSSH 001B','JOE','20050225',10000.00)
> /* desired results in table AllAgreements (I'll remove the Amt column
> after the clean up & the values are in the Transactions table. Didn't
> show the date in order to display row on one line).
> ca_id ID # Developer amt
> 1 FIF2 GRNBR 001 LUCY $55,000.00
> 3 FIF2 GRNBR 001A JUDY $20,000.00 <- Lucy transferred to Judy
> 4 FIF2 GRNBR 001B SUSIE $10,000.00 <- Judy transferred to Susie
> 6 TIF1 ACSPA 001 WILLY $30,000.00
> 7 TIF1 ACSPA 002 WILLY $45,000.00
> 8 TIF1 MSSH 001 FRED $25,000.00
> 10 TIF1 MSSH 001A HARRY $10,000.00 <- Fred transferred to Harry
> 12 TIF1 MSSH 001B JOE $10,000.00 <- Harry transferred to Joe
> */
This seems to be:
SELECT A.ca_id,A.[ID #],A.Developer,
CONVERT(char(10),A.issued_date,112) As Applied_date,
A.Amt FROM AllAgreements A
WHERE A.Amt > 0
order by A.ca_id
> /* desired results in table Transactions:
> (1 = Original credit agreement; 2 = transfer of CR agreement)
> E.g.: below the 1st 2 LUCY lines show the original credit (trans_type=1)
> and the debit (trans_type=2) on the ID # "FIF2 GRNBR 001" caused by the
> transfer to JUDY (line 3). The JUDY line's ca_id, 3, points to the new
> credit agreement "FIF2 GRNBR 001A" row in the AllAgreements table.
> ca_id developer transaction_type applied_date amount
> -- -- -- -- --
> 1 LUCY 1 20050115 55000.00
> 1 LUCY 2 20050122 -20000.00
> 3 JUDY 2 20050122 20000.00
> 4 SUSIE 2 20050122 10000.00
> 3 JUDY 2 20050122 -10000.00
> 6 WILLY 1 20050211 30000.00
> 7 WILLY 1 20050211 45000.00
> 8 FRED 1 20050212 25000.00
> 8 FRED 2 20050212 -10000.00
> 10 HARRY 2 20050212 10000.00
> 10 HARRY 2 20050225 -10000.00
> 12 JOE 2 20050225 10000.00
> */
>
I can't see the logic in the ca_id column as they do not relate to the
transaction concerned! But this is almost what you wanted:
SELECT A.ca_id,--A.[ID #],
B.Developer,
2 Transaction_Type,
CONVERT(char(10),A.issued_date,112) As Applied_date,
B.AMT AS AMT
FROM AllAgreements A
JOIN AllAgreements B ON A.ca_id <> B.ca_id
AND A.[ID #] = B.[ID #]
AND A.Developer <> B.Developer
AND A.issued_date = B.issued_date
AND A.AMT = -1 * B.AMT
WHERE A.AMT > 0
UNION ALL
SELECT A.ca_id,--A.[ID #],
A.Developer AS Developer,
CASE WHEN B.ca_id IS NULL THEN 1 ELSE 2 END AS Transaction_Type,
CONVERT(char(10),A.issued_date,112) As Applied_date,
A.AMT AS AMT
FROM AllAgreements A
LEFT JOIN AllAgreements B ON A.ca_id <> B.ca_id
AND A.[ID #] = B.[ID #]
AND A.Developer <> B.Developer
AND A.issued_date = B.issued_date
AND A.AMT = -1 * B.AMT
WHERE A.AMT > 0
ORDER BY A.CA_ID, AMT
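In plain terms, the join conditions in the query above pair each positive (credit) row with the row that shares its ID # and date but has a different developer and the negated amount. A rough Python sketch of that pairing logic (the tuples are the thread's sample data; `match_transfers` is my name for the function, not anything from the thread):

```python
# (ca_id, id_no, developer, date, amt) rows from the sample data
rows = [
    (1, 'FIF2 GRNBR 001',  'LUCY', '20050115',  55000.00),
    (2, 'FIF2 GRNBR 001A', 'LUCY', '20050122', -20000.00),
    (3, 'FIF2 GRNBR 001A', 'JUDY', '20050122',  20000.00),
]

def match_transfers(rows):
    """Mirror the first SELECT: for each credit row a, find the debit
    row b with the same ID # and date, a different developer, and the
    negated amount, and emit (a.ca_id, b.developer, 2, date, b.amt)."""
    out = []
    for a in rows:
        if a[4] <= 0:
            continue
        for b in rows:
            if (b[0] != a[0] and b[1] == a[1] and b[2] != a[2]
                    and b[3] == a[3] and b[4] == -a[4]):
                out.append((a[0], b[2], 2, a[3], b[4]))
    return out
```

Against the three sample rows this emits (3, 'LUCY', 2, '20050122', -20000.0): the debit is attributed to the credit side's ca_id 3, not to the original agreement's ca_id 1, which is exactly the ca_id question the thread goes on to wrestle with.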
> DROP TABLE Transactions, AllAgreements
> /* END */
>
> QUESTION: What append command will move the amounts to the Transactions
> table w/ the correct ca_id value.
>
You will have to decide what the criterion is for the ca_id.
John
> Thanks for your help.
> --
> MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
|||
John Bell wrote:
> I can't see the login [do you mean logic?] in the ca_id column as they do
> not relate to the
> transaction concerned! But this is almost what you wanted:
> SELECT A.ca_id,--A.[ID #],
> B.Developer,
> 2 Transaction_Type,
> CONVERT(char(10),A.issued_date,112) As Applied_date,
> B.AMT AS AMT
> FROM AllAgreements A
> JOIN AllAgreements B ON A.ca_id <> B.ca_id
> AND A.[ID #] = B.[ID #]
> AND A.Developer <> B.Developer
> AND A.issued_date = B.issued_date
> AND A.AMT = -1 * B.AMT
> WHERE A.AMT > 0
> UNION ALL
> SELECT A.ca_id,--A.[ID #],
> A.Developer AS Developer,
> CASE WHEN B.ca_id IS NULL THEN 1 ELSE 2 END AS Transaction_Type,
> CONVERT(char(10),A.issued_date,112) As Applied_date,
> A.AMT AS AMT
> FROM AllAgreements A
> LEFT JOIN AllAgreements B ON A.ca_id <> B.ca_id
> AND A.[ID #] = B.[ID #]
> AND A.Developer <> B.Developer
> AND A.issued_date = B.issued_date
> AND A.AMT = -1 * B.AMT
> WHERE A.AMT > 0
> ORDER BY A.CA_ID, AMT
>
> You will have to decide what the criteria is for the ca_id.
>
Ah, but that's just what I need the query to do - figure out which ca_id
goes w/ each transfer's credit transaction.
Think of it this way: Lucy transfers $20K from agreement FIF2 GRNBR 001
(ca_id 1) to Judy, which makes a new agreement FIF2 GRNBR 001A (ca_id 3)
for Judy. I need to show in the Transactions table a debit of $20K from
ca_id 1 and a credit to ca_id 3 (easy to do) of $20K.
The following DML command will get the credits:
INSERT INTO Transactions (ca_id, developer, transaction_type,
applied_date, amount)
SELECT ca_id, developer, 2 As trans_type, issued_date, amt
FROM AllAgreements
WHERE amt > 0
It's the DML command that puts the debits, w/ the proper ca_id, that's
the problem.
Ex. 1:
ca_id, "ID #", developer, amt
1,'FIF2 GRNBR 001','LUCY',55000.00 <- original agreement
2,'FIF2 GRNBR 001A','LUCY',-20000.00 <- Lucy transferred $20K to Judy
The 2nd row (ca_id 2) needs to have an entry in Transactions of:
ca_id developer, trans_type, amount
--
1 Lucy 2 -20000.00
Since it is a deduction from agreement 'FIF2 GRNBR 001.'
Ex. 2:
3,'FIF2 GRNBR 001A','JUDY',20000.00
5,'FIF2 GRNBR 001B','JUDY',-10000.00 <- Judy transferred $10K to Susie
4,'FIF2 GRNBR 001B','SUSIE',10000.00
The above row needs to have an entry in Transactions of:
ca_id developer, trans_type, amount
--
3 Judy 2 -10000.00
Since it is a deduction from agreement 'FIF2 GRNBR 001A.'
IOW, the credits ($) are deducted from the prior agreement:
001A deducts from 001
001B deducts from 001A
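That suffix chain can be stated as a small rule: strip the trailing letter to find the agreement a debit draws from ('A' points at the bare original; any later letter points at the previous letter). A Python sketch of the rule (the function name is mine, for illustration only):

```python
def prev_agreement_id(id_no):
    """Return the ID # that this agreement deducts credits from.

    '...001A' -> '...001' (the original), '...001B' -> '...001A';
    an unsuffixed ID # is an original agreement and returns None.
    """
    last = id_no[-1]
    if not last.isalpha():
        return None            # original agreement: no predecessor
    if last == 'A':
        return id_no[:-1]      # first transfer draws on the original
    # later transfers draw on the previous letter in the chain
    return id_no[:-1] + chr(ord(last) - 1)
```

This sidesteps the "@" padding trick below: instead of forcing the original ID to sort one character before 'A', the lookup computes the predecessor's ID # directly and joins on it.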
I believe I need to add a character "@" to the 001 so the query can
"see" the previous agreement as
Right([current ID #],1) > Right([other ID #],1)
This sorta works:
SELECT C.ca_id, c.[ID #], d.developer, d.transaction_type,
d.issued_date, d.amt
FROM AllAgreements As C,
(SELECT ca_id,[ID #], Developer, 2 AS Transaction_Type,
issued_date, amt
FROM AllAgreements
WHERE amt<0 ) As D
WHERE LEFT(C.[id #],LEN(c.[id #])-1)=LEFT(d.[id #],LEN(d.[id #])-1)
AND ASCII(RIGHT(C.[ID #],1)) = ASCII(RIGHT(D.[ID #],1))-1
ORDER BY 1, 2
Data from above query (removed the amt & date cols so could display on
one line):
ca_id ID # developer amt
1 FIF2 GRNBR 001@ LUCY ($20,000.00) <- correct
2 FIF2 GRNBR 001A JUDY ($10,000.00) <- incorrect
3 FIF2 GRNBR 001A JUDY ($10,000.00) <- correct
8 TIF1 MSSH 001@ FRED ($10,000.00) <- correct
9 TIF1 MSSH 001A HARRY ($10,000.00) <- incorrect
10 TIF1 MSSH 001A HARRY ($10,000.00) <- correct
Any refinements possible?
Rgds,
--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
Tuesday, March 20, 2012
CIRCULAR REFERENCE Microsoft SQL Server 2000 Access ON DELETE UPDATE CASCADE
I need some Microsoft Access / SQL Server 2000 help. I have a SQL
Server 2000 database that is using Access as a front end (adp file).
Since SQL server does not allow cycle cascading or circular reference,
I am having problems.
For example, suppose I have tableA, tableB, tableC, and tableD. tableB
and tableC each have a foreign key (tableAID) which is tableA's primary
key. tableD has 2 foreign keys (tableBID and tableCID), one of which is
tableB's primary key and one is tableC's primary Key. The problem is
that I can not have ON UPDATE/DELETE CASCADE because it is
circular/cyclic referential integrity. (I get an error when I try to
set both tableB and tableC to ON DELETE CASCADE... I can only set it
for one of them) What can I do to still have the behavior of ON
UPDATE/DELETE CASCADE? I think there is a way to do it with triggers
but I don't know how to program a trigger. My objective is that when
I delete (or update) a row in tableA, the corresponding rows in tableB,
tableC, and tableD will be deleted/updated. Please help!
|||
The best way would be to not depend on cascading updates. Rather, you should be
using stored procedures and *explicitly* delete referenced data as part of a
conscious design. However, I'd imagine you have your frontend linked directly to
the tables in the backend. This, among other reasons, is why this is a weak
design.
Thomas
<mneufeld@gmail.com> wrote in message
news:1121874815.358446.298010@f14g2000cwb.googlegroups.com...
>I need some Microsoft Access / SQL Server 2000 help. I have a SQL
> Server 2000 database that is using Access as a front end (adp file).
> Since SQL server does not allow cycle cascading or circular reference,
> I am having problems.
> For example, suppose I have tableA, tableB, tableC, and tableD. tableB
> and tableC each have a foreign key (tableAID) which is tableA's primary
> key. tableD has 2 foreign keys (tableBID and tableCID), one of which is
> tableB's primary key and one is tableC's primary Key. The problem is
> that I can not have ON UPDATE/DELETE CASCADE because it is
> circular/cyclic referential integrity. (I get an error when I try to
> set both tableB and tableC to ON DELETE CASCADE... I can only set it
> for one of them) What can I do to still have the behavior of ON
> UPDATE/DELETE CASCADE? I think there is a way to do it with triggers
> but I don't know how to program a trigger. My objective is that when
> I delete (or update) a row in tableA, the corresponding rows in tableB,
> tableC, and tableD will be deleted/updated. Please help!
>
|||
Why is this a weak design? Please explain.
Is there anything wrong with depending on cascading deletes and
updates?
Also, how would I create a stored procedure or a trigger to
automatically delete the corresponding rows in tableB tableC and tableD
when I delete a row in tableA?
|||
>> Since SQL server does not allow cycle cascading or circular reference, I
am having problems. <<
Other SQLs do not have this problem, so you might want to switch to one
of them. Otherwise, you can use triggers (remember to comment it as
kludge) or write a stored procedure to do the job.
Don't do narratives; do DDL or at least pseudo-code. Is this what you
meant?
CREATE TABLE Alpha
(alpha_id INTEGER NOT NULL PRIMARY KEY,
...);
CREATE TABLE Beta
(beta_id INTEGER NOT NULL PRIMARY KEY,
alpha_id INTEGER NOT NULL
REFERENCES Alpha(alpha_id)
ON DELETE CASCADE
ON UPDATE CASCADE,
...);
CREATE TABLE Gamma
(gamma_id INTEGER NOT NULL PRIMARY KEY,
alpha_id INTEGER NOT NULL
REFERENCES Alpha(alpha_id)
ON DELETE CASCADE
ON UPDATE CASCADE,
...);
CREATE TABLE Delta
(delta_id INTEGER NOT NULL PRIMARY KEY,
beta_id INTEGER NOT NULL
REFERENCES Beta(beta_id)
ON DELETE CASCADE
ON UPDATE CASCADE,
gamma_id INTEGER NOT NULL
REFERENCES Gamma(gamma_id)
ON DELETE CASCADE
ON UPDATE CASCADE,
...);
There are two paths from Alpha to Delta (A -> B -> D) and (A -> C -> D).
SQL Server sees this and chokes.
One ugly trick is to create a BetaGamma table with a compound key so
there is only one path (A ->. BC ->. D). Another trick is split Delta
so you have (A -> B -> D1) and( A -> C -> D2). But we'd need more
information to do one of these.
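Whichever schema trick is used, the manual cascade always runs child-first: Delta rows that hang off either branch go first, then Beta and Gamma, then the Alpha row itself. A toy in-memory sketch of that ordering (the dict layout is mine; table and column names follow the example above):

```python
# toy tables: {primary_key: {fk_column: value, ...}}
alpha = {1: {}}
beta  = {10: {'alpha_id': 1}}
gamma = {20: {'alpha_id': 1}}
delta = {100: {'beta_id': 10, 'gamma_id': 20}}

def delete_alpha(alpha_id):
    """Manual cascade: delete Delta first, then Beta/Gamma, then Alpha."""
    dead_beta  = {k for k, r in beta.items()  if r['alpha_id'] == alpha_id}
    dead_gamma = {k for k, r in gamma.items() if r['alpha_id'] == alpha_id}
    # children of BOTH branches go before either branch table
    for k in [k for k, r in delta.items()
              if r['beta_id'] in dead_beta or r['gamma_id'] in dead_gamma]:
        del delta[k]
    for k in dead_beta:
        del beta[k]
    for k in dead_gamma:
        del gamma[k]
    del alpha[alpha_id]
```

Deleting in the opposite order would leave Delta rows pointing at missing parents; a trigger or stored procedure just applies this same child-first order in T-SQL.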
One of the advantages of DRI is that it gives information to the
optimizer. You know that a join on a PK-FK will always work and that
an EXISTS () is true or not on sight. You know how the stats will
change, etc.
|||
Thanks Celko for replying. It is actually funny because I was looking
at other threads that you replied to and I was thinking, "Wow, it looks
like Celko knows his stuff and I wish he would reply to my thread."
Then lo and behold I checked my thread and you replied! I have a few
questions though.
I'm new to groups and I don't know what DDL is?
Yes.
>> Otherwise, you can use triggers (remember to comment it as
kludge) or write a stored procedure to do the job. <<
I have never written a trigger and I do not know how to. Can you please
explain.
>> One of the advantages of DRI is that it gives information to the
optimizer. You know that a join on a PK-FK will always work and that
an EXISTS () is true or not on sight. You know how the stats will
change, etc. <<
I don't understand this entire paragraph. I'm new to groups and to
databases so can you please explain it to me.
Thanx.
|||
>> I'm new to groups and I don't know what DDL is? <<
SQL has three sub-languages: DDL (Data Declaration Language), DML (Data
Manipulation Language) and DCL (Data Control Language). DDL is the
CREATE, ALTER and DROP stuff.
For that, you need a book, not a newsgroup. A trigger is a kind of
stored procedure that is attached to a table and it gets "fired"
(invoked) when a database event occurs (update, delete, insert). There
are some special things about triggers which involve the deleted and
inserted rows in the T-SQL model and there are some limits as to what
you can do in a trigger.
Oh boy! This is way outside a newsgroup! You tell SQL *what* you want
and the optimizer figures out *how* to find it. The optimizer uses
statistics, constraints, and DRI to pick the right access methods.
PK-FK can pass on some info. For example, can you look at this
predicate and tell me if it is TRUE or FALSE?
EXISTS
(SELECT *
FROM Delta
WHERE Delta.beta_id = Beta.beta_id).
Likewise if I do a JOIN on (Delta.beta_id = Beta.beta_id), I can pull
up stats to see the statistical distribution, number of distinct
beta_id values in each table, etc.
|||
Celko,
Thanx for explaining, here's the DDL:
CREATE TABLE tbServer
(lngServerID INTEGER NOT NULL PRIMARY KEY,
...);
CREATE TABLE tbDatabase
(lngDatabaseID INTEGER NOT NULL PRIMARY KEY,
lngServerID INTEGER NOT NULL
REFERENCES tbServer(lngServerID )
ON DELETE CASCADE
ON UPDATE CASCADE,
...);
CREATE TABLE tbApplication
(lngApplicationID INTEGER NOT NULL PRIMARY KEY,
lngServerID INTEGER NOT NULL
REFERENCES tbServer(lngServerID)
ON DELETE CASCADE
ON UPDATE CASCADE,
...);
CREATE TABLE tbApplicationToDatabase
(lngApplicationToDatabaseID INTEGER NOT NULL PRIMARY KEY,
lngApplicationID INTEGER NOT NULL
REFERENCES tbApplication(lngApplicationID)
ON DELETE CASCADE
ON UPDATE CASCADE,
lngDatabaseID INTEGER NOT NULL
REFERENCES tbDatabase(lngDatabaseID)
ON DELETE CASCADE
ON UPDATE CASCADE,
...);
First of all keep in mind that this is a sample DDL, not the real one.
OK, so basically this is keeping track of the servers' applications and
databases, and sometimes an application connects to a DB for storage,
etc.
I have database(s) that are running on server(s) that connect to
application(s) which may be running on different server(s). So when I
delete a server record I want all the applications and databases
running on this particular server deleted AND any connections deleted
(since the DB or App does not exist anymore). How can I do this? If the
only way is to do it with triggers then please show me the code for it.
Thanx a lot!
|||
--CELKO-- wrote:
> Other SQLs do not have this problem, so you might want to switch to one
> of them.
Yep, even ACCESS can do this correctly! They've fixed it for SQL Server
2005, I understand.
|||
It is a weak design because it lacks abstraction and thereby limits your
ability to make design changes. By linking directly to the source tables,
you have no ability to effect business rules in the backend without using
something like triggers. Your example is quite illustrative of this design's
shortcomings. You want to effect a business rule (delete X when I delete Y).
This would be a snap if you were doing all of your deletes through stored
procedures which shield the UI to some degree from the backend structure.
However, since there is no abstraction between the UI and the backend
database, you are left with a handful of ugly choices (e.g. triggers, or UI
logic).
A good indication of a weak design is one that prevents you from easily
effecting a simple change.
Thomas
Thomas
"neuf38" <mneufeld@.gmail.com> wrote in message
news:1121878593.785146.3490@.g44g2000cwa.googlegroups.com...
> Why is this a weak design? Please Explain.
> Is there anything wrong with depending on cascading deletes and
> updates?
> Also, how would I create a stored procedure or a trigger to
> automatically delete the corresponding rows in tableB tableC and tableD
> when I delete a row in tableA?
>
|||
> Is there anything wrong with depending on cascading deletes and
> updates?
YES!!!!! Cascading referential actions are problematic for a number of
reasons, but the main one involves locks, concurrency, and deadlocks.
Cascading updates and deletes complicate deadlock resolution because the
order in which locks are obtained for a cascading referential action is
undefined. From a programmer's perspective, all locks in the cascade chain
are obtained at the same time. While the problem is more prevalent with
cascading updates, the same principles apply to a cascading delete. The way
to minimize deadlocks is to always obtain locks in the same order in every
transaction. Cascading referential actions limit the control you as a
programmer have in specifying that order. I never use them. I use
surrogate keys so I don't have to worry about cascading updates, and I
explicitly code deletes either in a trigger, or in a stored procedure. That
way I have complete control over the order in which resources are locked,
and I only have to worry about deadlocks caused by a malformed execution
plan. In those very rare cases, I can manipulate the execution plan by
specifying optimizer hints.
CREATE TRIGGER tIOD_tableA ON tableA INSTEAD OF DELETE AS
BEGIN
DELETE tableD
FROM
(SELECT keyD FROM deleted JOIN tableB ON (tableB.keyA =
deleted.keyA)
UNION SELECT keyD FROM deleted JOIN tableC ON (tableC.keyA =
deleted.keyA)) allD
WHERE tableD.keyD = allD.keyD
DELETE tableB FROM deleted WHERE tableB.keyA = deleted.keyA
DELETE tableC FROM deleted WHERE tableC.keyA = deleted.keyA
DELETE tableA FROM deleted WHERE tableA.keyA = deleted.keyA
END
GO
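The lock-ordering argument above boils down to one discipline: every transaction acquires its locks in a single canonical order. A minimal Python sketch of that discipline (thread and table names are made up for illustration); because both threads sort the resources before locking, neither can hold one lock while waiting on the other:

```python
import threading

locks = {'tableB': threading.Lock(), 'tableC': threading.Lock()}
finished = []

def transaction(name, resources):
    # acquire in one canonical (sorted) order, regardless of the
    # order this transaction "wants" to touch the tables
    ordered = sorted(resources)
    for r in ordered:
        locks[r].acquire()
    try:
        finished.append(name)       # the critical section
    finally:
        for r in reversed(ordered): # release in reverse order
            locks[r].release()

t1 = threading.Thread(target=transaction, args=('t1', ['tableB', 'tableC']))
t2 = threading.Thread(target=transaction, args=('t2', ['tableC', 'tableB']))
t1.start(); t2.start()
t1.join(); t2.join()
```

Without the `sorted()` call, t1 could hold tableB while t2 holds tableC, each waiting forever on the other, which is the classic deadlock shape cascading referential actions can produce with an order you do not control.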
"neuf38" <mneufeld@.gmail.com> wrote in message
news:1121878593.785146.3490@.g44g2000cwa.googlegroups.com...
> Why is this a weak design? Please Explain.
> Is there anything wrong with depending on cascading deletes and
> updates?
> Also, how would I create a stored procedure or a trigger to
> automatically delete the corresponding rows in tableB tableC and tableD
> when I delete a row in tableA?
>
Wednesday, March 7, 2012
CHECKSUM produces same hash for two different inputs. Is this right?
Hi,
We are using binary_checksum in some of our INSTEAD OF UPDATE triggers. The problem came to light when an update failed without raising any error. After research, we found that checksum returns the same number for two different inputs, and that's why the update failed.
We are using the following type of statement inside the trigger.
UPDATE [dbo].[Hospital]
SET
[HospitalID]= I.[HospitalID],
[Name]= I.[Name],
[HospitalNumber]= I.[HospitalNumber],
[ServerName] = I.[ServerName],
[IsAuthorized]= I.[IsAuthorized],
[IsAlertEnabled]= I.[IsAlertEnabled],
[AlertStartDate]= I.[AlertStartDate],
[AlertEndDate]= I.[AlertEndDate],
[IsTraining]= I.[IsTraining],
[TestMessageInterval]= I.[TestMessageInterval],
[DelayAlertTime]= I.[DelayAlertTime],
[IsDelayMessageAlert]= I.[IsDelayMessageAlert],
[IsTestMessageAlert]= I.[IsTestMessageAlert],
[IsUnAuthorizedMessageAlert]= I.[IsUnAuthorizedMessageAlert],
[IsWANDownAlert]= I.[IsWANDownAlert],
[IsWANUpAlert]= I.[IsWANUpAlert],
[CreateUserID]= Hospital.[CreateUserID],
[CreateWorkstationID]= Hospital.[CreateWorkstationID],
[CreateDate]= Hospital.[CreateDate] ,
/* record created date is never updated */
[ChangeUserID]= suser_name(),
[ChangeWorkstationID]= host_name(),
[ChangeDate]= getdate() ,
/* Updating the record modified field to now */
[CTSServerID]= I.[CTSServerID]
FROM inserted i
WHERE
i.[HospitalID]= Hospital.[HospitalID]
AND binary_checksum(
Hospital.[HospitalID],
Hospital.[Name],
Hospital.[HospitalNumber],
Hospital.[ServerName],
Hospital.[IsAuthorized],
Hospital.[IsAlertEnabled],
Hospital.[AlertStartDate],
Hospital.[AlertEndDate],
Hospital.[IsTraining],
Hospital.[TestMessageInterval],
Hospital.[DelayAlertTime],
Hospital.[IsDelayMessageAlert],
Hospital.[IsTestMessageAlert],
Hospital.[IsUnAuthorizedMessageAlert],
Hospital.[IsWANDownAlert],
Hospital.[IsWANUpAlert]) !=
binary_checksum(
I.[HospitalID],
I.[Name],
I.[HospitalNumber],
I.[ServerName],
I.[IsAuthorized],
I.[IsAlertEnabled],
I.[AlertStartDate],
I.[AlertEndDate],
I.[IsTraining],
I.[TestMessageInterval],
I.[DelayAlertTime],
I.[IsDelayMessageAlert],
I.[IsTestMessageAlert],
I.[IsUnAuthorizedMessageAlert],
I.[IsWANDownAlert],
I.[IsWANUpAlert]) ;
Here is the checksum example which produces same results for two different input.
DECLARE @V1 VARCHAR(10)
DECLARE @V2 VARCHAR(10)
SELECT @V1 = NULL, @V2 = NULL
SELECT binary_checksum('KKK','San Jose','1418','1418SVR ',0,1,@V1,@V2,0,30,180,1,0,1,1,1),
binary_checksum('KKK','San Jose','1418','1418SVR ',1,1,@V1,@V2,0,30,180,1,1,1,1,1)
Look at the two binary_checksum calls above: the inputs are different and should not match, but they both return the same value.
Can someone please provide some info on these.
Did anyone look at this? I guess this is very critical. The checksum is used by the storage engine to verify page integrity, and the checksum is stored in every page. If it is producing the same hash for two different inputs, it may not verify pages correctly.
|||
CHECKSUMS ARE NOT UNIQUE. You cannot use checksums in the way you are trying to use them. Check out BOL under BINARY_CHECKSUM:
BINARY_CHECKSUM(*), computed on any row of a table, returns the same value as long as the row is not subsequently modified. BINARY_CHECKSUM(*) will return a different value for most, but not all, changes to the row, and can be used to detect most row modifications.
and CHECKSUM:
If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change. For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change. Consider using HashBytes instead. When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM.
Yes, you are correct: there is a small, insignificant chance the checksum may be the same on a page even though the data has changed. In large binary data, like SQL Server's 8 KB pages, this chance is on the order of 100 million to 1.
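Collisions are inherent to any fixed-width checksum; how easy they are to construct depends on the algorithm's shape. A toy Python illustration (this additive checksum is deliberately simplistic and is not SQL Server's actual BINARY_CHECKSUM algorithm): two clearly different inputs fold to the same 32-bit value.

```python
def toy_checksum(values):
    """Sum-and-truncate to 32 bits: any information about which
    position contributed what is lost, so collisions are trivial."""
    return sum(values) & 0xFFFFFFFF

a = (1418, 0, 1)   # e.g. (hospital number, flag X off, flag Y on)
b = (1418, 1, 0)   # the two flags swapped: a genuinely different row
assert a != b
assert toy_checksum(a) == toy_checksum(b)   # same 32-bit checksum
```

This is why the BOL text above steers change detection toward HashBytes: a cryptographic hash makes collisions astronomically unlikely rather than easy to stumble into.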
|||
Tom,
In this case the checksum consistently produces the same hash for two different inputs. So every time I updated the two columns specified above, the update failed. I thought there was a bug in the code that was not detecting the changes in the input.
Anyway, your suggestion to use HashBytes is very helpful.
Thanks,
CHECKSUM , produces same hash for two different inputs. is this right?
Hi,
We are using binary_checksum in some of instead of update trigger. The problem came into the knowledge when update falied without raising any error. We came to know after research that checksum returns same number for two different inputs and thats why update failed.
We are using following type of inside the trigger.
UPDATE [dbo].[Hospital]
SET
[HospitalID]= I.[HospitalID],
[Name]= I.[Name],
[HospitalNumber]= I.[HospitalNumber],
[ServerName] = I.[ServerName],
[IsAuthorized]= I.[IsAuthorized],
[IsAlertEnabled]= I.[IsAlertEnabled],
[AlertStartDate]= I.[AlertStartDate],
[AlertEndDate]= I.[AlertEndDate],
[IsTraining]= I.[IsTraining],
[TestMessageInterval]= I.[TestMessageInterval],
[DelayAlertTime]= I.[DelayAlertTime],
[IsDelayMessageAlert]= I.[IsDelayMessageAlert],
[IsTestMessageAlert]= I.[IsTestMessageAlert],
[IsUnAuthorizedMessageAlert]= I.[IsUnAuthorizedMessageAlert],
[IsWANDownAlert]= I.[IsWANDownAlert],
[IsWANUpAlert]= I.[IsWANUpAlert],
[CreateUserID]= Hospital.[CreateUserID],
[CreateWorkstationID]= Hospital.[CreateWorkstationID],
[CreateDate]= Hospital.[CreateDate] ,
/* record created date is never updated */
[ChangeUserID]= suser_name(),
[ChangeWorkstationID]= host_name(),
[ChangeDate]= getdate() ,
/* Updating the record modified field to now */
[CTSServerID]= I.[CTSServerID]
FROM inserted i
WHERE
i.[HospitalID]= Hospital.[HospitalID]
AND binary_checksum(
Hospital.[HospitalID],
Hospital.[Name],
Hospital.[HospitalNumber],
Hospital.[ServerName],
Hospital.[IsAuthorized],
Hospital.[IsAlertEnabled],
Hospital.[AlertStartDate],
Hospital.[AlertEndDate],
Hospital.[IsTraining],
Hospital.[TestMessageInterval],
Hospital.[DelayAlertTime],
Hospital.[IsDelayMessageAlert],
Hospital.[IsTestMessageAlert],
Hospital.[IsUnAuthorizedMessageAlert],
Hospital.[IsWANDownAlert],
Hospital.[IsWANUpAlert]) !=
binary_checksum(
I.[HospitalID],
I.[Name],
I.[HospitalNumber],
I.[ServerName],
I.[IsAuthorized],
I.[IsAlertEnabled],
I.[AlertStartDate],
I.[AlertEndDate],
I.[IsTraining],
I.[TestMessageInterval],
I.[DelayAlertTime],
I.[IsDelayMessageAlert],
I.[IsTestMessageAlert],
I.[IsUnAuthorizedMessageAlert],
I.[IsWANDownAlert],
I.[IsWANUpAlert]) ;
Here is a checksum example which produces the same result for two different inputs.
DECLARE @V1 VARCHAR(10)
DECLARE @V2 VARCHAR(10)
SELECT @V1 = NULL, @V2 = NULL
SELECT binary_checksum('KKK','San Jose','1418','1418SVR ',0,1,@V1,@V2,0,30,180,1,0,1,1,1),
binary_checksum('KKK','San Jose','1418','1418SVR ',1,1,@V1,@V2,0,30,180,1,1,1,1,1)
Look at the two BINARY_CHECKSUM calls above: the inputs are different and should not match, but both calls return the same value.
Can someone please provide some info on this?
Did anyone look at this? I think this is very critical. A checksum is used by the storage engine to verify page integrity, and that checksum is stored in every page. If it produces the same hash for two different inputs, it may not verify pages correctly.
|||CHECKSUMS ARE NOT UNIQUE. You cannot use checksums in the way you are trying to use them. Check out BOL under BINARY_CHECKSUM:
BINARY_CHECKSUM(*), computed on any row of a table, returns the same value as long the row is not subsequently modified. BINARY_CHECKSUM(*) will return a different value for most, but not all, changes to the row, and can be used to detect most row modifications.
and CHECKSUMIf one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change. For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change. Consider using HashBytes instead. When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM.
Yes, you are correct; there is a small, insignificant chance the checksum may be the same on a page even though the data has changed. For large binary data, like SQL Server's 8 KB pages, this chance is in the range of 100 million to 1.
|||
Tom,
In this case the checksum consistently produces the same hash for two different inputs. So every time I updated the two columns specified above, the update failed. At first I thought there was a bug in my code that was not detecting the changes in the input.
Anyway, your suggestion to use HashBytes is very helpful.
Thanks,
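Tom's suggestion to use HashBytes can be demonstrated on the very inputs that collided above. This is only a sketch: it assumes SQL Server 2012+ (for CONCAT), and note that before SQL Server 2016, HASHBYTES truncates input beyond 8,000 bytes.

```sql
-- The same two inputs that collide under BINARY_CHECKSUM above,
-- hashed with HASHBYTES over a '|'-delimited concatenation instead.
-- The delimiter avoids 'ab'+'c' vs 'a'+'bc' ambiguity; CONCAT treats
-- NULL as '', so NULL and '' still hash the same (use a sentinel via
-- ISNULL if that distinction matters).
DECLARE @V1 VARCHAR(10), @V2 VARCHAR(10);
SELECT @V1 = NULL, @V2 = NULL;

SELECT
  HASHBYTES('SHA1', CONCAT('KKK','|','San Jose','|','1418','|','1418SVR ','|',
            0,'|',1,'|',@V1,'|',@V2,'|',0,'|',30,'|',180,'|',1,'|',0,'|',1,'|',1,'|',1)) AS hash_before,
  HASHBYTES('SHA1', CONCAT('KKK','|','San Jose','|','1418','|','1418SVR ','|',
            1,'|',1,'|',@V1,'|',@V2,'|',0,'|',30,'|',180,'|',1,'|',1,'|',1,'|',1,'|',1)) AS hash_after;
-- The two hashes differ, so a trigger comparing them would correctly
-- detect the change that BINARY_CHECKSUM missed.
```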
Saturday, February 25, 2012
Checking User & Expire of Update Possibility (by Trigger), How to?
Hi,
I have a table (RatesTable); every user can insert records into this table, and all users can see these records. The table contains the following columns:
RateID, Service, Rate, DateTime, User
What I want is code (a trigger) in the database that can do the following:
If a user performs an UPDATE request, the code will check:
- if this record was inserted by the same user, the update command will execute.
- if this record was inserted by another user, the update command will not execute and a message will be returned.
- if more than 5 minutes have passed since the insert, the update command will not execute and a message will be returned.
Yes, this can be done with a trigger, but it really would be better for the update statement itself to decide whether or not the update is allowed, by adding a WHERE condition or an AND condition to the update statement.
Change the update statement from something like:
update rate
set service = @serviceChange,
rate = @rateChange
where rateId = @rateRecordId
to something like
|||update rate
set service = @serviceChange,
rate = @rateChange
where rateId = @rateRecordId
and [user] = @currentUser
and [dateTime] >= dateadd(mi, -5, getdate())
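To surface the "update not allowed" message the original poster asked for, the conditioned update can be followed by a row-count check. A sketch using Kent's variable names; the error text is illustrative:

```sql
UPDATE rate
SET    service = @serviceChange,
       rate    = @rateChange
WHERE  rateId  = @rateRecordId
  AND  [user]  = @currentUser
  AND  [dateTime] >= DATEADD(mi, -5, GETDATE());

-- If no row matched, the record belongs to another user
-- or was inserted more than 5 minutes ago.
IF @@ROWCOUNT = 0
    RAISERROR('Update not allowed: record was inserted by another user or more than 5 minutes ago.', 16, 1);
```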
Thanks Kent Waldrop Ap07, but the problem is that my whole program uses DataSets created by the Data Source Configuration Wizard. Is it possible to add your code in the Dataset Designer?
And what if I want to put this code in a trigger?
Thanks again,
|||Any help?|||Here is a TRIGGER idea. While I agree with Kent that changing the UPDATE statement is the better option, I know from experience that it is not always the solution that works.
The code idea below relies upon [User] being captured with the system_user system function (domain\username).
Code Snippet
CREATE TRIGGER tr_RatesTable_U_UserOnly
ON RatesTable
FOR UPDATE
AS
IF @@ROWCOUNT = 0
RETURN
IF EXISTS
( SELECT *
FROM inserted
WHERE [User] <> system_user
)
BEGIN
ROLLBACK
RAISERROR('Cannot UpDate This Record', 16, 1)
RETURN
END
GO
Hi,
I am not familiar with the Wizard stuff, but I would not expect it to cover holding that logic. You should have a look at the resulting queries the wizard produces; maybe you are able to tweak the UPDATE statement to cover your logic. Anyway, using a trigger could be another option:
CREATE TRIGGER TRG_UPD_SomeTable
ON SomeTable
FOR UPDATE
AS
BEGIN
-- Disallow the update if any affected row was inserted by another user
-- or was inserted more than 5 minutes ago.
IF EXISTS(SELECT * FROM inserted
          WHERE [User] <> SUSER_NAME()
             OR [DateTime] < DATEADD(mi, -5, GETDATE()))
BEGIN
    ROLLBACK
    RAISERROR('Update not allowed', 16, 1)
END
END
HTH, jens K. Suessmeyer.
http://www.sqlserver2005.de
Friday, February 24, 2012
Checking to see if a record exists and if so update else insert
For those of you wishing to build a package that determines whether a source row exists in the destination, and if so updates it, otherwise inserts it, this link is for you.
http://blogs.conchango.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx
Thanks Jamie!
If you want to do a similar concept to Jamie's blog post above, but with the Konesan's Checksum Transformation to quickly compare MANY fields, you can visit here:
http://www.ssistalk.com/2007/03/09/ssis-using-a-checksum-to-determine-if-a-row-has-changed/
Phil|||This is exactly what I was looking for.
I appreciate that.
Fahad|||It's an awesome article - and I used this method in my 1st SSIS package.
However, if the number of records is high (over 1000), the OLE DB Command update performs slowly. So in that case you might be better off loading the source data into a staging table and creating a stored procedure which updates, then inserts (and deletes) data in the destination table.
Using the Lookup transform in the way described in the article is quite efficient (compared to other methods like Merge Join) and can be useful in a variety of situations|||TheViewMaster,
Yes, very valid points. Each developer will have to decide based on the pros and cons of each method.
Personally, I wouldn't go to the extreme that you did, but would rather dump just the updates to a staging table and then, outside of the data flow, use an Execute SQL task to perform the batch update. I don't want to pay the cost of inserting the data twice (once into the staging table, and then again into the destination table, as in your idea).|||Can somebody point me to a simple example of this?
I have a flat file and I want to check whether each record in the file exists in the table; if it does I want to update, and if it does not I want to insert.
|||
This is very helpful. Thanks!
I am completely new to SQL Server Integration Services. I went through your link and understand how you filter out different records. Could you point out how you actually update the existing records?
Thanks!
|||I have cloned the example above and it does not seem to be picking up the new or changed records. Is there any way to troubleshoot this?|||I use the OLE DB Command to update the records. Thanks!|||
Hi phill,
Great article, I loved it.
This is what I had been looking for for a long time.
One doubt is in my mind: in my current assignment I have to transform about 40K rows from a database.
Will this method work for me, or is there a better way?
regards
Sudhir Kesharwani
|||
Sudhir Kesharwani wrote:
Hi phill,
Great article, I loved it.
This is what I had been looking for for a long time.
One doubt is in my mind: in my current assignment I have to transform about 40K rows from a database.
Will this method work for me, or is there a better way?
regards
Sudhir Kesharwani
You likely need to send your changed records to a temporary table and then perform a batch update using an Execute SQL Task on the control flow. Using the OLE DB Command transformation isn't very efficient for high numbers of rows.|||
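The staging-table pattern described above (dump the changed rows into a staging table in the data flow, then batch update with an Execute SQL Task) might look like the following T-SQL. A sketch only; the table and column names are illustrative, not from the thread:

```sql
-- Batch update from the staging table that the data flow loaded.
-- Runs as one set-based statement instead of one OLE DB Command per row.
UPDATE d
SET    d.SomeValue  = s.SomeValue,
       d.OtherValue = s.OtherValue
FROM   dbo.Destination   AS d
JOIN   dbo.UpdateStaging AS s
       ON s.BusinessKey = d.BusinessKey;

-- Clear the staging table for the next run.
TRUNCATE TABLE dbo.UpdateStaging;
```

New rows would still be inserted directly by the data flow's destination; only the updates take the staging detour.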
Phil,
As we discussed updating/inserting techniques a while ago (see http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1211321&SiteID=1), you told me SSIS can perform this operation much faster than using plain SQL... I put the emphasis on the "much faster".
I have created a performance test package comparing the 2 methods:
1) stored proc with UPDATE and INSERT statements
2) Data flow using look up/conditional split
I have a data flow task to create a test data set with a variable number of rows (I used Jamie's Script Component source example), with IDs (used as the PK) always ranging from 1 to nbrRows. Each row consists of 5 ints, 4 strings (10, 20, 40 and 80 characters) and 1 datetime, with values chosen randomly.
So what I do is call the package with nbrRows = 1, 10, 100... The first time, 1 new row is added. The second time, id=1 already exists, so 1 row is updated and the other 9 are added. And so on... I then perform a second round of tests with the same numbers, the only difference being that the destination table is now full (therefore performing only updates).
What I've found so far is that SSIS is faster only when inserting a certain number of rows (approx. 1,000 to 100,000). SSIS performed on average 30% better with 50,000 rows, while at 1 million rows SQL performed better by 5%.
When the data was updated (second round), SQL was always faster.
Of course, these figures are based on my tests, which may not reflect actual usage. I run SQL/SSIS in parallel and the tests are done in batch... I thought about using DBCC DROPCLEANBUFFERS, but it doesn't seem to change much.
Maybe my way of comparing is totally flawed... Have you ever tried comparing the two methods in terms of performance? How did you conclude the SSIS way is much faster?
Leonce
|||Leonce,
First, when using the fast load option on the destination, there are two parameters which need to be adjusted based on the rows you are working with: Max Insert Commit Size and Rows Per Batch. What were these set to?
Also note that, when possible, using a SQL Server Destination over the OLE DB Destination is preferred.
SSIS should be no different from using BULK INSERT (it seems to be just a glorified wrapper around it). Performing updates via an OLE DB Command will always be slower than a batch update using SQL, because the OLE DB Command is executed for every row in the data flow.
Phil
|||I left the default values for Commit Size and Rows Per Batch ('' and 0). You suggest I should adjust these values depending on the number of rows transferred? What would typical values be for a table with 100,000 rows? Does it depend on the row size?
I use the OLE DB Destination; I'll try the SQL Server Destination.
I forgot to mention I use your method for updates (dump into table then update). Do you think SSIS can be faster than plain SQL for updates? I would tend to think it's impossible.
Leonce
|||fleo wrote:
I left the default values for Commit Size and Rows Per Batch ('' and 0). You suggest I should adjust these values depending on the number of rows transferred? What would typical values be for a table with 100,000 rows? Does it depend on the row size?
MICS should be set to whatever size you wish the batches to be committed in. So if you want to commit in 10,000-row batches, set MICS to 10,000. Rows Per Batch should be set to 100,000 in this case.
fleo wrote:
I forgot to mention I use your method for updates (dump into table then update). Do you think SSIS can be faster than plain SQL for updates? I would tend to think it's impossible.
True. Using an Execute SQL task will likely have a slight slowness to it compared to straight SQL, but in the end they do the same thing, so they should be REAL close.
Let us say a package uses the Merge Join option to insert new rows, and it uses an OLE DB Destination with the fast load, table lock, and batch size (1000) options. The table lock option on the destination forces us to put a NOLOCK hint in the OLE DB Source query (on the destination table); otherwise they run into locking contention (one is requesting an exclusive table lock while the other is requesting a shared table lock). This works most of the time, but occasionally I see that SQL Server does not honour the hint (one scenario: when the target table does not have a clustered index on the join key column) and it runs into locking contention. Have you seen this? Do you rely on the NOLOCK hint too? Thanks for your help in advance.
Checking to see if a record exists and if so update else insert
For those of you wishing to build a package that determines if a source row exists in the destination and if so update it else insert it, this link is for you.
http://blogs.conchango.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx
Thanks Jamie!
If you want to do a similar concept to Jamie's blog post above, but with the Konesan's Checksum Transformation to quickly compare MANY fields, you can visit here:
http://www.ssistalk.com/2007/03/09/ssis-using-a-checksum-to-determine-if-a-row-has-changed/
PhilThis is exactly what I was looking for.
I appreciate that.
Fahad|||It's an awesome article - and I used this method in my 1st SSIS package.
However, if number of records is high (over 1000) the OLE Update Command performs slowly. So in that case you might better off loading source data to staging table and creating Stored Procedure which Updates then Inserts (and Deletes) data in the destination table.
Using Lookup transform in the way described in article is quite efficient (compared to other methods like merge join) - and can be useful in a variety of situations|||TheViewMaster,
Yes, very valid points. Each developer will have to decide based on the pros and cons of each method.
Personally, I wouldn't go to the extreme that you did, but would rather dump just the updates to a staging table and then outside of the data flow use an Execute SQL task to perform the batch update. I don't want to spend the cost to insert the data twice (once into staging table, and then again into the destination table using your idea).|||Can somebody point me to a simple example of this.
I have a flat file and I want to check if the record exist in the file exist in the table, if it does I want to update and if it does not I want to insert.
|||
This is very helpful. Thanks!
I am completely new to the SQL Server Integration Services. I went through your link and understand how you filter out diffrent records. Could you pointting out how you actually update you existing records ?
Thanks!
|||I have cloned the example above and it does not seem to be picking up the new or changed records. Is there any way to troubleshoot this.|||I use OLE DB Command update the records. Thanks!|||
Hi phill,
Great article, I loved it.
This is what i had been looking for since long time.
One doubt is there in my mind, my current assignment i have to transform about 40K rows from a database.
Will this method work for me or there is any other better way.. ?
regards
Sudhir Kesharwani
|||
Sudhir Kesharwani wrote:
Hi phill,
Great article, I loved it.
This is what i had been looking for since long time.
One doubt is there in my mind, my current assignment i have to transform about 40K rows from a database.
Will this method work for me or there is any other better way.. ?
regards
Sudhir Kesharwani
You need to likely send your changed records to a temporary table and then perform a batch update using an Execute SQL Task on the control flow. Using the OLE DB Command transformation isn't very efficient for high numbers of rows.|||
Phil,
As we discussed about updating/inserting techniques a while ago(see http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1211321&SiteID=1) you told me SSIS can perform this operation much faster than using plain SQL... I put the emphasize on the "much faster".
I have created a performance test package comparing the 2 methods:
1) stored proc with UPDATE and INSERT statements
2) Data flow using look up/conditional split
I have a data flow task to create a data test set with a variable number of rows (I used Jamie's script component source example) with IDs (used as PK) always ranging from 1 to nbrRows. Each row is constituted of 5 int, 4 string (10, 20, 40 and 80 characters) and 1 datetime with values chosen randomly.
So what I do is calling the package with nbrRows=1, 10, 100... The first time 1 new row is added. The second time, id=1 already exists so 1 row is updated, the other 9 are added. And so on... I then perform a second round of tests with the same numbers, the only difference being that the destination table is now full (therefore performing only updates).
What I've found out so far is that SSIS is faster only when inserting a certain number of rows (approx . 1000 to 100000). SSIS performed on avg. 30% better with 50000 rows while at 1 million rows, SQL performed better by 5%.
When the data was updated (second round), SQL was always faster.
Of course, these figures are based on my tests which may not reflect actual usage. I run SQL/SSIS in parallel, the tests are done in batch, .... I thought about using DBCC DROPCLEANBUFFER but it doesn't seem to change much.
Maybe my way of comparing is totally flawed... Have you ever tried comparing the two methods in terms of performance? How did you conclude the SSIS way is much faster?
Leonce
|||Leonce,
First, when using fast load option on the destination, there are two parameters which need to be adjusted based on the rows you are working with. Max Insert Commit Size and Rows Per Batch. What were the values for these set at?
Also note, that when possible, using a SQL Server Destination over the OLE DB Destination would be preferred.
SSIS should be no different than using BULK INSERT (it seems to be just a glorified wrapper to that). Performing updates via an OLE DB Command will always be slower than a batch update using SQL. (Only because the OLE DB Command is executed for every row in the data flow)
Phil
|||I left the default values for Commit Size and Rows Per Batch ('' and 0). You suggest I should adjust these values depending on the number of rows transferred? What would typical values be for a table with 100000 rows? Does it dependent on the row size?
I use OLEDB Destination, I'll try with SQL Server Destination.
I forgot to mention I use your method for updates (dump into table then update). Do you think SSIS can be faster than plain SQL for updates? I would tend to think it's impossible.
Leonce
|||fleo wrote:
I left the default values for Commit Size and Rows Per Batch ('' and 0). You suggest I should adjust these values depending on the number of rows transferred? What would typical values be for a table with 100000 rows? Does it dependent on the row size?
MICS should be set to whatever you wish the batches to be committed. So if you want to commit in 10,000 row batches, set MICS to 10,000. Rows Per Batch should be set to 100,000 in this case.
fleo wrote:
I forgot to mention I use your method for updates (dump into table then update). Do you think SSIS can be faster than plain SQL for updates? I would tend to think it's impossible.
True. Using an Execute SQL task will likely have a slight slowness to it compared to straight SQL, but in the end they do the same thing, so they should be REAL close.
|||Let us say that a package uses merge join option to insert new rows and it uses OLE DB Destination with fast load, table lock, and batch size (1000) options. The table lock option on the destination forces us to put a NOLOCK hint in the OLE DB Source query (on the destination table). Otherwise, they run into locking contentions (one is requesting exclusive table lock when the other is requesting shared table lock). This works most of the times but occationally I see that SQL Server does not honour the hint (one scenario: when the target table does not have a clustered index on the join key column) and runs into locking contentions. Have you seen this? Do you rely on NOLOCK hint too? Thanks for your help in advance.
Checking to see if a record exists and if so update else insert
For those of you wishing to build a package that determines if a source row exists in the destination and if so update it else insert it, this link is for you.
http://blogs.conchango.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx
Thanks Jamie!
If you want to do a similar concept to Jamie's blog post above, but with the Konesan's Checksum Transformation to quickly compare MANY fields, you can visit here:
http://www.ssistalk.com/2007/03/09/ssis-using-a-checksum-to-determine-if-a-row-has-changed/
PhilThis is exactly what I was looking for.
I appreciate that.
Fahad|||It's an awesome article - and I used this method in my 1st SSIS package.
However, if number of records is high (over 1000) the OLE Update Command performs slowly. So in that case you might better off loading source data to staging table and creating Stored Procedure which Updates then Inserts (and Deletes) data in the destination table.
Using Lookup transform in the way described in article is quite efficient (compared to other methods like merge join) - and can be useful in a variety of situations|||TheViewMaster,
Yes, very valid points. Each developer will have to decide based on the pros and cons of each method.
Personally, I wouldn't go to the extreme that you did, but would rather dump just the updates to a staging table and then outside of the data flow use an Execute SQL task to perform the batch update. I don't want to spend the cost to insert the data twice (once into staging table, and then again into the destination table using your idea).|||Can somebody point me to a simple example of this.
I have a flat file and I want to check if the record exist in the file exist in the table, if it does I want to update and if it does not I want to insert.
|||
This is very helpful. Thanks!
I am completely new to the SQL Server Integration Services. I went through your link and understand how you filter out diffrent records. Could you pointting out how you actually update you existing records ?
Thanks!
|||I have cloned the example above and it does not seem to be picking up the new or changed records. Is there any way to troubleshoot this.|||I use OLE DB Command update the records. Thanks!|||
Hi phill,
Great article, I loved it.
This is what i had been looking for since long time.
One doubt is there in my mind, my current assignment i have to transform about 40K rows from a database.
Will this method work for me or there is any other better way.. ?
regards
Sudhir Kesharwani
|||
Sudhir Kesharwani wrote:
Hi phill,
Great article, I loved it.
This is what i had been looking for since long time.
One doubt is there in my mind, my current assignment i have to transform about 40K rows from a database.
Will this method work for me or there is any other better way.. ?
regards
Sudhir Kesharwani
You need to likely send your changed records to a temporary table and then perform a batch update using an Execute SQL Task on the control flow. Using the OLE DB Command transformation isn't very efficient for high numbers of rows.|||
Phil,
As we discussed about updating/inserting techniques a while ago(see http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1211321&SiteID=1) you told me SSIS can perform this operation much faster than using plain SQL... I put the emphasize on the "much faster".
I have created a performance test package comparing the 2 methods:
1) stored proc with UPDATE and INSERT statements
2) Data flow using look up/conditional split
I have a data flow task to create a data test set with a variable number of rows (I used Jamie's script component source example) with IDs (used as PK) always ranging from 1 to nbrRows. Each row is constituted of 5 int, 4 string (10, 20, 40 and 80 characters) and 1 datetime with values chosen randomly.
So what I do is calling the package with nbrRows=1, 10, 100... The first time 1 new row is added. The second time, id=1 already exists so 1 row is updated, the other 9 are added. And so on... I then perform a second round of tests with the same numbers, the only difference being that the destination table is now full (therefore performing only updates).
What I've found out so far is that SSIS is faster only when inserting a certain number of rows (approx . 1000 to 100000). SSIS performed on avg. 30% better with 50000 rows while at 1 million rows, SQL performed better by 5%.
When the data was updated (second round), SQL was always faster.
Of course, these figures are based on my tests which may not reflect actual usage. I run SQL/SSIS in parallel, the tests are done in batch, .... I thought about using DBCC DROPCLEANBUFFER but it doesn't seem to change much.
Maybe my way of comparing is totally flawed... Have you ever tried comparing the two methods in terms of performance? How did you conclude the SSIS way is much faster?
Leonce
|||Leonce,
First, when using fast load option on the destination, there are two parameters which need to be adjusted based on the rows you are working with. Max Insert Commit Size and Rows Per Batch. What were the values for these set at?
Also note, that when possible, using a SQL Server Destination over the OLE DB Destination would be preferred.
SSIS should be no different than using BULK INSERT (it seems to be just a glorified wrapper to that). Performing updates via an OLE DB Command will always be slower than a batch update using SQL. (Only because the OLE DB Command is executed for every row in the data flow)
Phil
|||I left the default values for Commit Size and Rows Per Batch ('' and 0). You suggest I should adjust these values depending on the number of rows transferred? What would typical values be for a table with 100000 rows? Does it dependent on the row size?
I use OLEDB Destination, I'll try with SQL Server Destination.
I forgot to mention I use your method for updates (dump into table then update). Do you think SSIS can be faster than plain SQL for updates? I would tend to think it's impossible.
Leonce
|||fleo wrote:
I left the default values for Commit Size and Rows Per Batch ('' and 0). You suggest I should adjust these values depending on the number of rows transferred? What would typical values be for a table with 100000 rows? Does it dependent on the row size?
MICS should be set to whatever you wish the batches to be committed. So if you want to commit in 10,000 row batches, set MICS to 10,000. Rows Per Batch should be set to 100,000 in this case.
fleo wrote:
I forgot to mention I use your method for updates (dump into table then update). Do you think SSIS can be faster than plain SQL for updates? I would tend to think it's impossible.
True. Using an Execute SQL task will likely have a slight slowness to it compared to straight SQL, but in the end they do the same thing, so they should be REAL close.
|||Let us say that a package uses merge join option to insert new rows and it uses OLE DB Destination with fast load, table lock, and batch size (1000) options. The table lock option on the destination forces us to put a NOLOCK hint in the OLE DB Source query (on the destination table). Otherwise, they run into locking contentions (one is requesting exclusive table lock when the other is requesting shared table lock). This works most of the times but occationally I see that SQL Server does not honour the hint (one scenario: when the target table does not have a clustered index on the join key column) and runs into locking contentions. Have you seen this? Do you rely on NOLOCK hint too? Thanks for your help in advance.
Checking to see if a record exists and if so update else insert
For those of you wishing to build a package that determines if a source row exists in the destination and if so update it else insert it, this link is for you.
http://blogs.conchango.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx
Thanks Jamie!
If you want to do a similar concept to Jamie's blog post above, but with the Konesan's Checksum Transformation to quickly compare MANY fields, you can visit here:
http://www.ssistalk.com/2007/03/09/ssis-using-a-checksum-to-determine-if-a-row-has-changed/
Phil|||This is exactly what I was looking for.
I appreciate that.
Fahad|||It's an awesome article - and I used this method in my 1st SSIS package.
However, if the number of records is high (over 1000), the OLE DB Command update performs slowly. So in that case you might be better off loading the source data to a staging table and creating a stored procedure which updates, then inserts (and deletes), data in the destination table.
Using the Lookup transform in the way described in the article is quite efficient (compared to other methods like merge join) - and can be useful in a variety of situations|||TheViewMaster,
Yes, very valid points. Each developer will have to decide based on the pros and cons of each method.
Personally, I wouldn't go to the extreme that you did, but would rather dump just the updates to a staging table and then outside of the data flow use an Execute SQL task to perform the batch update. I don't want to spend the cost to insert the data twice (once into staging table, and then again into the destination table using your idea).|||Can somebody point me to a simple example of this.
I have a flat file, and I want to check if the records in the file exist in the table; if they do, I want to update them, and if they do not, I want to insert them.|||
This is very helpful. Thanks!
I am completely new to SQL Server Integration Services. I went through your link and understand how you filter out the different records. Could you point out how you actually update the existing records?
Thanks!
|||I have cloned the example above and it does not seem to be picking up the new or changed records. Is there any way to troubleshoot this?|||I use an OLE DB Command to update the records. Thanks!|||Hi Phil,
Great article, I loved it.
This is what I had been looking for, for a long time.
One doubt is in my mind: in my current assignment I have to transfer about 40K rows from a database.
Will this method work for me, or is there a better way?
regards
Sudhir Kesharwani
|||
Sudhir Kesharwani wrote:
Hi phill,
Great article, I loved it.
This is what I had been looking for, for a long time.
One doubt is in my mind: in my current assignment I have to transfer about 40K rows from a database.
Will this method work for me, or is there a better way?
regards
Sudhir Kesharwani
You likely need to send your changed records to a temporary table and then perform a batch update using an Execute SQL Task on the control flow. Using the OLE DB Command transformation isn't very efficient for high numbers of rows.|||
Phil,
As we discussed updating/inserting techniques a while ago (see http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1211321&SiteID=1), you told me SSIS can perform this operation much faster than using plain SQL... I put the emphasis on the "much faster".
I have created a performance test package comparing the 2 methods:
1) stored proc with UPDATE and INSERT statements
2) Data flow using look up/conditional split
I have a data flow task to create a test data set with a variable number of rows (I used Jamie's script component source example) with IDs (used as the PK) always ranging from 1 to nbrRows. Each row consists of 5 ints, 4 strings (10, 20, 40 and 80 characters) and 1 datetime, with values chosen randomly.
So what I do is call the package with nbrRows=1, 10, 100... The first time, 1 new row is added. The second time, id=1 already exists, so 1 row is updated and the other 9 are added. And so on... I then perform a second round of tests with the same numbers, the only difference being that the destination table is now full (therefore performing only updates).
What I've found out so far is that SSIS is faster only when inserting a certain number of rows (approx. 1,000 to 100,000). SSIS performed on average 30% better with 50,000 rows, while at 1 million rows SQL performed better by 5%.
When the data was updated (second round), SQL was always faster.
Of course, these figures are based on my tests, which may not reflect actual usage. I run SQL/SSIS in parallel, the tests are done in batch, ... I thought about using DBCC DROPCLEANBUFFERS but it doesn't seem to change much.
Maybe my way of comparing is totally flawed... Have you ever tried comparing the two methods in terms of performance? How did you conclude the SSIS way is much faster?
Leonce
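As an aside on the buffer-clearing step mentioned above: DBCC DROPCLEANBUFFERS only discards *clean* pages, so for a genuinely cold cache between timing runs it is usually paired with a CHECKPOINT. A minimal sketch (test servers only):

```sql
-- Run on a test server only, between timing runs:
CHECKPOINT;             -- flush dirty pages to disk so they become clean
DBCC DROPCLEANBUFFERS;  -- then discard all clean pages from the buffer pool
```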
|||Leonce,
First, when using fast load option on the destination, there are two parameters which need to be adjusted based on the rows you are working with. Max Insert Commit Size and Rows Per Batch. What were the values for these set at?
Also note, that when possible, using a SQL Server Destination over the OLE DB Destination would be preferred.
SSIS should be no different than using BULK INSERT (it seems to be just a glorified wrapper to that). Performing updates via an OLE DB Command will always be slower than a batch update using SQL. (Only because the OLE DB Command is executed for every row in the data flow)
Phil
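A minimal T-SQL sketch of the staging-table batch update Phil describes above (table and column names are hypothetical): the data flow fast-loads only the changed rows into a staging table, and an Execute SQL task then applies them in one set-based statement instead of one OLE DB Command call per row.

```sql
-- Executed once per load by an Execute SQL task on the control flow:
UPDATE d
SET    d.Name     = s.Name,
       d.Modified = s.Modified
FROM   dbo.Destination AS d
JOIN   dbo.Staging     AS s
  ON   s.ID = d.ID;

TRUNCATE TABLE dbo.Staging;  -- empty the staging table for the next run
```

New rows, having failed the Lookup, can still go straight to the destination inside the data flow; only the updates take the staging detour.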
|||I left the default values for Commit Size and Rows Per Batch ('' and 0). Do you suggest I should adjust these values depending on the number of rows transferred? What would typical values be for a table with 100,000 rows? Does it depend on the row size?
I use OLEDB Destination, I'll try with SQL Server Destination.
I forgot to mention I use your method for updates (dump into table then update). Do you think SSIS can be faster than plain SQL for updates? I would tend to think it's impossible.
Leonce
|||fleo wrote:
I left the default values for Commit Size and Rows Per Batch ('' and 0). Do you suggest I should adjust these values depending on the number of rows transferred? What would typical values be for a table with 100,000 rows? Does it depend on the row size?
MICS should be set to however many rows you want committed per batch. So if you want to commit in 10,000-row batches, set MICS to 10,000. Rows Per Batch should be set to 100,000 in this case.
fleo wrote:
I forgot to mention I use your method for updates (dump into table then update). Do you think SSIS can be faster than plain SQL for updates? I would tend to think it's impossible.
True. Using an Execute SQL task will likely add a slight overhead compared to straight SQL, but in the end they do the same thing, so they should be REAL close.
|||Let us say that a package uses the merge join option to insert new rows, and it uses an OLE DB Destination with the fast load, table lock, and batch size (1000) options. The table lock option on the destination forces us to put a NOLOCK hint in the OLE DB Source query (on the destination table). Otherwise, they run into locking contention (one is requesting an exclusive table lock while the other is requesting a shared table lock). This works most of the time, but occasionally I see that SQL Server does not honour the hint (one scenario: when the target table does not have a clustered index on the join key column) and runs into locking contention. Have you seen this? Do you rely on the NOLOCK hint too? Thanks for your help in advance.
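For reference, the source-side hint described in this question would look something like the following (table and column names are hypothetical). NOLOCK makes the read run at read-uncommitted, so it does not request the shared table lock that would conflict with the destination's fast-load table lock.

```sql
-- OLE DB Source query against the destination table:
SELECT JoinKey
FROM   dbo.Target WITH (NOLOCK)  -- read uncommitted; no shared table lock
ORDER  BY JoinKey;               -- merge join requires sorted input
```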