files before matching records against existing ones in the database:
UPDATE SampleSourceArchive
SET CompanyName = LTRIM(RTRIM(ISNULL(REPLACE(CompanyName,'
"',''),
''))),
CompanyPhoneArea = LTRIM(RTRIM(ISNULL(REPLACE(CompanyPhoneA
rea,'"',''),
''))),
CompanyPhoneNumber =
LTRIM(RTRIM(ISNULL(REPLACE(CompanyPhoneN
umber,'"',''), ''))),
CompanyAddress1 = LTRIM(RTRIM(ISNULL(REPLACE(CompanyAddres
s1,'"',''),
''))),
CompanyAddress2 = LTRIM(RTRIM(ISNULL(REPLACE(CompanyAddres
s2,'"',''),
''))),
CompanyAddress3 = LTRIM(RTRIM(ISNULL(REPLACE(CompanyAddres
s3,'"',''),
''))),
City = LTRIM(RTRIM(ISNULL(REPLACE(City,'"',''), ''))),
StateProvince = LTRIM(RTRIM(ISNULL(REPLACE(StateProvince
,'"',''),
''))),
PostalCode = LTRIM(RTRIM(ISNULL(REPLACE(PostalCode,'"',''), ''))),
Country = LTRIM(RTRIM(ISNULL(REPLACE(Country,'"',''), ''))),
SubIndustryCode = LTRIM(RTRIM(ISNULL(REPLACE(SubIndustryCo
de,'"',''),
''))),
SicCode = LTRIM(RTRIM(ISNULL(REPLACE(SicCode,'"',''), ''))),
RdhType = LTRIM(RTRIM(ISNULL(REPLACE(RdhType,'"',''), ''))),
RdhID = LTRIM(RTRIM(ISNULL(REPLACE(RdhID,'"',''), ''))),
WWCustomerNumber = LTRIM(RTRIM(ISNULL(REPLACE(WWCustomerNum
ber,'"',''),
''))),
DunsNumber = LTRIM(RTRIM(ISNULL(REPLACE(DunsNumber,'"',''), ''))),
EstablishmentSize =
LTRIM(RTRIM(ISNULL(REPLACE(Establishment
Size,'"',''), ''))),
ContactTitle = LTRIM(RTRIM(ISNULL(REPLACE(ContactTitle,
'"',''), ''))),
FirstName = LTRIM(RTRIM(ISNULL(REPLACE(FirstName,'"',''), ''))),
FullName = LTRIM(RTRIM(ISNULL(REPLACE(FullName,'"',''), ''))),
ContactJob = LTRIM(RTRIM(ISNULL(REPLACE(ContactJob,'"',''), ''))),
ContactPhoneNumber =
LTRIM(RTRIM(ISNULL(REPLACE(ContactPhoneN
umber,'"',''), ''))),
IbmContactNumber = LTRIM(RTRIM(ISNULL(REPLACE(IbmContactNum
ber,'"',''),
''))),
ProductVendor = LTRIM(RTRIM(ISNULL(REPLACE(ProductVendor
,'"',''),
''))),
ProductName = LTRIM(RTRIM(ISNULL(REPLACE(ProductName,'
"',''), ''))),
SoftwareVersion = LTRIM(RTRIM(ISNULL(REPLACE(SoftwareVersi
on,'"',''),
''))),
SoftwareOS = LTRIM(RTRIM(ISNULL(REPLACE(SoftwareOS,'"',''), ''))),
HardwareSeries = LTRIM(RTRIM(ISNULL(REPLACE(HardwareSerie
s,'"',''),
''))),
HardwareProductGroup =
LTRIM(RTRIM(ISNULL(REPLACE(HardwareProdu
ctGroup,'"',''), ''))),
SoftwareProductID =
LTRIM(RTRIM(ISNULL(REPLACE(SoftwareProdu
ctID,'"',''), ''))),
SoftwareComponentID =
LTRIM(RTRIM(ISNULL(REPLACE(SoftwareCompo
nentID,'"',''), ''))),
SoftwarePartNumber =
LTRIM(RTRIM(ISNULL(REPLACE(SoftwarePartN
umber,'"',''), ''))),
HardwareModel = LTRIM(RTRIM(ISNULL(REPLACE(HardwareModel
,'"',''),
''))),
HardwareMachineType =
LTRIM(RTRIM(ISNULL(REPLACE(HardwareMachi
neType,'"',''), ''))),
ContactEmail = LTRIM(RTRIM(ISNULL(REPLACE(ContactEmail,
'"',''), ''))),
CompanyNameLong = LTRIM(RTRIM(ISNULL(REPLACE(CompanyNameLo
ng,'"',''),
''))),
InterviewLanguage =
LTRIM(RTRIM(ISNULL(REPLACE(InterviewLang
uage,'"',''), ''))),
Salutation = LTRIM(RTRIM(ISNULL(REPLACE(Salutation,'"',''), ''))),
CompanyNameAbbreviation =
LTRIM(RTRIM(ISNULL(REPLACE(CompanyNameAb
breviation,'"',''), ''))),
UniversalCountryCode = LTRIM(RTRIM(ISNULL(REPLACE(EmailFlag,'"',''),
''))),
EmailFlag = LTRIM(RTRIM(ISNULL(REPLACE(EmailFlag,'"',''), ''))),
SurveyFlag = LTRIM(RTRIM(ISNULL(REPLACE(SurveyFlag,'"',''), '')))
WHERE SampleSourceKey = @.sampleSourceKey
Each of the columns is NVARCHAR(255) Don't be alarmed, the records are never
actually that big. The nature of the data is that certain records use
certain columns (not my fault).
Business Rule:
Each value has to get trimmed, replace nulls with ZLS's and drop any
existing double-quotes.
My Complaint:
Since a function gets invoked once for every record, then if there are 42
columns getting updated, and each one requires 4 nested function calls, that
seems to me that each record getting updated needs 168 function calls, and a
@.sampleSourceKey in the where clause could make it touch anywhere from a
half million records to over a million records.
My Worse Complaint:
The have just changed the business rules to also require me to remove any of
these characters from every value in the SampleSource block:
10, 13, 34, 39, 46, 145, 146, 147, 148
I *really* don't want to make this uglier than it already is, so I have
thought of three different ways to do it:
1.
ltrim(rtrim(replace(replace(replace(repl
ace(replace(...n...(isnull(column,
''))))))))) for each column in the set clause (I REALLY don't want to do
this)
2. Make a CleanGarbage(columnvalue) UDF that basically does #1 above, but at
least it encapsulates the mess (still has gobbles of function call
overhead)
3. Use sp_OACreate to get an instance of the Scripting.RegExp COM-based
regular expression object, hold onto its handle, and pass the handle and
value to be cleaned to a UDF. In this example all the UDF calls that run
the regular expression would be using the same instance of the same object,
have cleaner code, be more maintainable because when the character list to
be replaced changes, I'd only have to modify the regex pattern, and wouldn't
have the vast number of calls as #1 & #2, but on the other hand, this is
pretty obscure technique and I'm afraid that other people that come after me
might choke on it.
What are the comments about my three ideas? Feel free to suggest something
I haven't thought of -- I'd really like to have this be cleaner, and faster,
if possible. Speed is actually not a requirement, as this is about 3
percent of the work of this batch, and the stuff runs for hours at a time
because of other reasons.
--
Peace & happy computing,
Mike Labosh, MCSD
"Escriba coda ergo sum." -- vbSenseiIs DTS an option in this situation?
Rick Sawtell
MCT, MCSD, MCDBA|||> 1.
> ltrim(rtrim(replace(replace(replace(repl
ace(replace(...n...(isnull(column,
> ''))))))))) for each column in the set clause (I REALLY don't want to do
> this)
urgg, i never ever want to maintain that !
> 2. Make a CleanGarbage(columnvalue) UDF that basically does #1 above, but
> at least it encapsulates the mess (still has gobbles of function call
> overhead)
As text base functions are very slow in 2000 that wouldnt make it better, I
working on a exmaple to comapre the effeciency of UDT and SQL Server 2005
.NET Assemblies, just checkout my sites from time to time, they will be on
in the next few w

your data.
3. Use sp_OACreate to get an instance of the Scripting.RegExp COM-based
> regular expression object, hold onto its handle, and pass the handle and
Wait a minute, that wouldnt be really nice at all. :-(
Nor of your suggestions really make me happy, if you are doing such
"immense" inserts in the database, I would write some code in .NET with some
kind of tracing. Using a serverside cursor and scrolling down the pages with
skipping the rows that doesnt really work with your translation and mark
them with some kind of (bit)flag that you can clean if some malicious data
was inserted into the database and coulnt be transformed. This application
could be run via commandprompt / winApp / WebApp or service.
If you cant code .NET well have to find something else.
HTH, Jens Suessmeyer.
http://www.sqlserver2005.de
> I *really* don't want to make this uglier than it already is, so I have
> thought of three different ways to do it:
> 1.
> ltrim(rtrim(replace(replace(replace(repl
ace(replace(...n...(isnull(column,
> ''))))))))) for each column in the set clause (I REALLY don't want to do
> this)
> 2. Make a CleanGarbage(columnvalue) UDF that basically does #1 above, but
> at least it encapsulates the mess (still has gobbles of function call
> overhead)
> 3. Use sp_OACreate to get an instance of the Scripting.RegExp COM-based
> regular expression object, hold onto its handle, and pass the handle and
> value to be cleaned to a UDF. In this example all the UDF calls that run
> the regular expression would be using the same instance of the same
> object, have cleaner code, be more maintainable because when the character
> list to be replaced changes, I'd only have to modify the regex pattern,
> and wouldn't have the vast number of calls as #1 & #2, but on the other
> hand, this is pretty obscure technique and I'm afraid that other people
> that come after me might choke on it.
> What are the comments about my three ideas? Feel free to suggest
> something I haven't thought of -- I'd really like to have this be cleaner,
> and faster, if possible. Speed is actually not a requirement, as this is
> about 3 percent of the work of this batch, and the stuff runs for hours at
> a time because of other reasons.
> --
> Peace & happy computing,
> Mike Labosh, MCSD
> "Escriba coda ergo sum." -- vbSensei
>|||> Is DTS an option in this situation?
I personally am not that well versed in DTS. I would have to read up on
some of its different transform features.
Also in the end, the package would have to be invokable either from a stored
procedure, or .NET. Can that be done?
--
Peace & happy computing,
Mike Labosh, MCSD
"Escriba coda ergo sum." -- vbSensei
"Rick Sawtell" <r_sawtell@.hotmail.com> wrote in message
news:%23qP7bYNVFHA.4092@.TK2MSFTNGP12.phx.gbl...
> Rick Sawtell
> MCT, MCSD, MCDBA
>|||I would rather code an assembly which can be used by calling it from a
stored procedure (via XP_CMDSHELL) or just at thta time as you want to (via
WInApp, etc.). In DTS the great deal is doing things parallel, theres no
user for that in here because it will be updated as a whole.
HTH, Jens Suessmeyer.
http://www.sqlsever2005.de
--
"Mike Labosh" <mlabosh@.hotmail.com> schrieb im Newsbeitrag
news:uQcxufNVFHA.4092@.TK2MSFTNGP12.phx.gbl...
> I personally am not that well versed in DTS. I would have to read up on
> some of its different transform features.
> Also in the end, the package would have to be invokable either from a
> stored procedure, or .NET. Can that be done?
> --
> Peace & happy computing,
> Mike Labosh, MCSD
> "Escriba coda ergo sum." -- vbSensei
> "Rick Sawtell" <r_sawtell@.hotmail.com> wrote in message
> news:%23qP7bYNVFHA.4092@.TK2MSFTNGP12.phx.gbl...
>|||I prefer to do all this in a pre-processing stage on the application side,
often via a .NET app. It has much better text processing/manipulation
capabilities.
Based on doing it in SQL 2K, I would think any of your possible solutions
would run pretty poorly... sp_OACreate especially... That seems like some
serious overkill. If you're stuck with doing this SQL Server-side, might as
well put it in a SP and/or UDF to at least keep some level of
maintainability... Can you imagine the mess in trying to update your #1
solution below when they decide they want to eliminate all ASCII character
code 9's from the data next w

"Mike Labosh" <mlabosh@.hotmail.com> wrote in message
news:OwFpG9MVFHA.2664@.TK2MSFTNGP15.phx.gbl...
>I have the following SP that is the first step of cleaning bulk inserted
>files before matching records against existing ones in the database:
> UPDATE SampleSourceArchive
> SET CompanyName = LTRIM(RTRIM(ISNULL(REPLACE(CompanyName,'
"',''),
> ''))),
> CompanyPhoneArea =
> LTRIM(RTRIM(ISNULL(REPLACE(CompanyPhoneA
rea,'"',''), ''))),
> CompanyPhoneNumber =
> LTRIM(RTRIM(ISNULL(REPLACE(CompanyPhoneN
umber,'"',''), ''))),
> CompanyAddress1 = LTRIM(RTRIM(ISNULL(REPLACE(CompanyAddres
s1,'"',''),
> ''))),
> CompanyAddress2 = LTRIM(RTRIM(ISNULL(REPLACE(CompanyAddres
s2,'"',''),
> ''))),
> CompanyAddress3 = LTRIM(RTRIM(ISNULL(REPLACE(CompanyAddres
s3,'"',''),
> ''))),
> City = LTRIM(RTRIM(ISNULL(REPLACE(City,'"',''), ''))),
> StateProvince = LTRIM(RTRIM(ISNULL(REPLACE(StateProvince
,'"',''),
> ''))),
> PostalCode = LTRIM(RTRIM(ISNULL(REPLACE(PostalCode,'"',''), ''))),
> Country = LTRIM(RTRIM(ISNULL(REPLACE(Country,'"',''), ''))),
> SubIndustryCode = LTRIM(RTRIM(ISNULL(REPLACE(SubIndustryCo
de,'"',''),
> ''))),
> SicCode = LTRIM(RTRIM(ISNULL(REPLACE(SicCode,'"',''), ''))),
> RdhType = LTRIM(RTRIM(ISNULL(REPLACE(RdhType,'"',''), ''))),
> RdhID = LTRIM(RTRIM(ISNULL(REPLACE(RdhID,'"',''), ''))),
> WWCustomerNumber =
> LTRIM(RTRIM(ISNULL(REPLACE(WWCustomerNum
ber,'"',''), ''))),
> DunsNumber = LTRIM(RTRIM(ISNULL(REPLACE(DunsNumber,'"',''), ''))),
> EstablishmentSize =
> LTRIM(RTRIM(ISNULL(REPLACE(Establishment
Size,'"',''), ''))),
> ContactTitle = LTRIM(RTRIM(ISNULL(REPLACE(ContactTitle,
'"',''),
> ''))),
> FirstName = LTRIM(RTRIM(ISNULL(REPLACE(FirstName,'"',''), ''))),
> FullName = LTRIM(RTRIM(ISNULL(REPLACE(FullName,'"',''), ''))),
> ContactJob = LTRIM(RTRIM(ISNULL(REPLACE(ContactJob,'"',''), ''))),
> ContactPhoneNumber =
> LTRIM(RTRIM(ISNULL(REPLACE(ContactPhoneN
umber,'"',''), ''))),
> IbmContactNumber =
> LTRIM(RTRIM(ISNULL(REPLACE(IbmContactNum
ber,'"',''), ''))),
> ProductVendor = LTRIM(RTRIM(ISNULL(REPLACE(ProductVendor
,'"',''),
> ''))),
> ProductName = LTRIM(RTRIM(ISNULL(REPLACE(ProductName,'
"',''), ''))),
> SoftwareVersion = LTRIM(RTRIM(ISNULL(REPLACE(SoftwareVersi
on,'"',''),
> ''))),
> SoftwareOS = LTRIM(RTRIM(ISNULL(REPLACE(SoftwareOS,'"',''), ''))),
> HardwareSeries = LTRIM(RTRIM(ISNULL(REPLACE(HardwareSerie
s,'"',''),
> ''))),
> HardwareProductGroup =
> LTRIM(RTRIM(ISNULL(REPLACE(HardwareProdu
ctGroup,'"',''), ''))),
> SoftwareProductID =
> LTRIM(RTRIM(ISNULL(REPLACE(SoftwareProdu
ctID,'"',''), ''))),
> SoftwareComponentID =
> LTRIM(RTRIM(ISNULL(REPLACE(SoftwareCompo
nentID,'"',''), ''))),
> SoftwarePartNumber =
> LTRIM(RTRIM(ISNULL(REPLACE(SoftwarePartN
umber,'"',''), ''))),
> HardwareModel = LTRIM(RTRIM(ISNULL(REPLACE(HardwareModel
,'"',''),
> ''))),
> HardwareMachineType =
> LTRIM(RTRIM(ISNULL(REPLACE(HardwareMachi
neType,'"',''), ''))),
> ContactEmail = LTRIM(RTRIM(ISNULL(REPLACE(ContactEmail,
'"',''),
> ''))),
> CompanyNameLong = LTRIM(RTRIM(ISNULL(REPLACE(CompanyNameLo
ng,'"',''),
> ''))),
> InterviewLanguage =
> LTRIM(RTRIM(ISNULL(REPLACE(InterviewLang
uage,'"',''), ''))),
> Salutation = LTRIM(RTRIM(ISNULL(REPLACE(Salutation,'"',''), ''))),
> CompanyNameAbbreviation =
> LTRIM(RTRIM(ISNULL(REPLACE(CompanyNameAb
breviation,'"',''), ''))),
> UniversalCountryCode = LTRIM(RTRIM(ISNULL(REPLACE(EmailFlag,'"',''),
> ''))),
> EmailFlag = LTRIM(RTRIM(ISNULL(REPLACE(EmailFlag,'"',''), ''))),
> SurveyFlag = LTRIM(RTRIM(ISNULL(REPLACE(SurveyFlag,'"',''), '')))
> WHERE SampleSourceKey = @.sampleSourceKey
> Each of the columns is NVARCHAR(255) Don't be alarmed, the records are
> never actually that big. The nature of the data is that certain records
> use certain columns (not my fault).
> Business Rule:
> Each value has to get trimmed, replace nulls with ZLS's and drop any
> existing double-quotes.
> My Complaint:
> Since a function gets invoked once for every record, then if there are 42
> columns getting updated, and each one requires 4 nested function calls,
> that seems to me that each record getting updated needs 168 function
> calls, and a @.sampleSourceKey in the where clause could make it touch
> anywhere from a half million records to over a million records.
> My Worse Complaint:
> The have just changed the business rules to also require me to remove any
> of these characters from every value in the SampleSource block:
> 10, 13, 34, 39, 46, 145, 146, 147, 148
> I *really* don't want to make this uglier than it already is, so I have
> thought of three different ways to do it:
> 1.
> ltrim(rtrim(replace(replace(replace(repl
ace(replace(...n...(isnull(column,
> ''))))))))) for each column in the set clause (I REALLY don't want to do
> this)
> 2. Make a CleanGarbage(columnvalue) UDF that basically does #1 above, but
> at least it encapsulates the mess (still has gobbles of function call
> overhead)
> 3. Use sp_OACreate to get an instance of the Scripting.RegExp COM-based
> regular expression object, hold onto its handle, and pass the handle and
> value to be cleaned to a UDF. In this example all the UDF calls that run
> the regular expression would be using the same instance of the same
> object, have cleaner code, be more maintainable because when the character
> list to be replaced changes, I'd only have to modify the regex pattern,
> and wouldn't have the vast number of calls as #1 & #2, but on the other
> hand, this is pretty obscure technique and I'm afraid that other people
> that come after me might choke on it.
> What are the comments about my three ideas? Feel free to suggest
> something I haven't thought of -- I'd really like to have this be cleaner,
> and faster, if possible. Speed is actually not a requirement, as this is
> about 3 percent of the work of this batch, and the stuff runs for hours at
> a time because of other reasons.
> --
> Peace & happy computing,
> Mike Labosh, MCSD
> "Escriba coda ergo sum." -- vbSensei
>|||Thanks All.
After a "team meeting", the boss has decided to go with my #1
ltrim(rtrim(replace(replace(replace...
Because, "It's the only thing you're going to get a solution on my the end
of today. Just load the SP into notepad and change the expression with a
find-replace."
I have very well documented my concerns. This is one of those places where
there solution is to do it fast and dumb, and if it runs like death, we'll
just throw a bigger server at it. LOL!
Sometimes there just aren't enough drugs.
Peace & happy computing,
Mike Labosh, MCSD
"Escriba coda ergo sum." -- vbSensei|||Reminds me of the relationshsip between Dilbert and his boss --> Just do it
quick and dirty, you will have enough time for rest of your life to maintain
the errors ;-)
http://www.dailydilbert.com
TWH, Jens Suessmeyer.
"Mike Labosh" <mlabosh@.hotmail.com> schrieb im Newsbeitrag
news:eV5wsEOVFHA.3944@.tk2msftngp13.phx.gbl...
> Thanks All.
> After a "team meeting", the boss has decided to go with my #1
> ltrim(rtrim(replace(replace(replace...
> Because, "It's the only thing you're going to get a solution on my the end
> of today. Just load the SP into notepad and change the expression with a
> find-replace."
> I have very well documented my concerns. This is one of those places
> where there solution is to do it fast and dumb, and if it runs like death,
> we'll just throw a bigger server at it. LOL!
> Sometimes there just aren't enough drugs.
> --
> Peace & happy computing,
> Mike Labosh, MCSD
> "Escriba coda ergo sum." -- vbSensei
>|||The best managers tend to handle the business rules and leave the
implementation details to the developers. After all you don't see the
President sitting in the middle of a battlefield calculating coordinates for
outbound artillery shells.
"Mike Labosh" <mlabosh@.hotmail.com> wrote in message
news:eV5wsEOVFHA.3944@.tk2msftngp13.phx.gbl...
> Thanks All.
> After a "team meeting", the boss has decided to go with my #1
> ltrim(rtrim(replace(replace(replace...
> Because, "It's the only thing you're going to get a solution on my the end
> of today. Just load the SP into notepad and change the expression with a
> find-replace."
> I have very well documented my concerns. This is one of those places
> where there solution is to do it fast and dumb, and if it runs like death,
> we'll just throw a bigger server at it. LOL!
> Sometimes there just aren't enough drugs.
> --
> Peace & happy computing,
> Mike Labosh, MCSD
> "Escriba coda ergo sum." -- vbSensei
>
No comments:
Post a Comment