Sunday, March 25, 2012

cleaning different formats

I have a column "currentDate" with approx 4 million rows.
The rows are composed of dates , but with a variety of formats.
For example:
"14-05-2005 14:56:32"
"20/07/2005 10:26:43"
"2006-03-26 11:21:42"
I've decided to convert all the rows into a standard format i.e dd-mm-yyyy
hh:mm:ss, and then convert the column into a datetime
The problem I have is that I am using DATEPART , which is fine , except that
I am losing the leading zero.
Taking "02/03/2005 10:26:43" as an example ,
when I do the following:
CAST(DATEPART(dd,myday) AS VARCHAR(2)) the result = "2" as opposed to "02"
,
How can I retain the leading "0" , for any DATEPART.?>I have a column "currentDate" with approx 4 million rows.
> The rows are composed of dates , but with a variety of formats.
> For example:
> "14-05-2005 14:56:32"
> "20/07/2005 10:26:43"
> "2006-03-26 11:21:42"
> I've decided to convert all the rows into a standard format i.e dd-mm-yyyy
Bravo! But that is very far from a standard format!
Why wasn't this a datetime in the first place? What application is storing
all these different formats? Are you allowing end users to just enter
whatever format they want?
Assuming these are the only three formats present, this might give you some
ideas:
SELECT CONVERT(CHAR(10), d, 120)+'T'+CONVERT(CHAR(8), d, 108)
FROM
(
SELECT d = CASE WHEN ISDate(d)=1 THEN CONVERT(DATETIME, d)
ELSE CONVERT(DATETIME, d, 103) END
FROM
(
SELECT d = '14-05-2005 14:56:32'
UNION ALL SELECT '20/07/2005 10:26:43'
UNION ALL SELECT '2006-03-26 11:21:42'
) a
) b
However, I am betting there are twenty other formats you're not mentioning.
A|||You may try this:
select right('0' + CAST(DATEPART(dd, myday) AS VARCHAR(2)), 2)
Perayu
"Jack Vamvas" <delete_this_bit_jack@.ciquery.com_delete> wrote in message
news:dtv3e8$kdf$1@.nwrdmz03.dmz.ncs.ea.ibs-infra.bt.com...
>I have a column "currentDate" with approx 4 million rows.
> The rows are composed of dates , but with a variety of formats.
> For example:
> "14-05-2005 14:56:32"
> "20/07/2005 10:26:43"
> "2006-03-26 11:21:42"
> I've decided to convert all the rows into a standard format i.e dd-mm-yyyy
> hh:mm:ss, and then convert the column into a datetime
> The problem I have is that I am using DATEPART , which is fine , except
> that
> I am losing the leading zero.
> Taking "02/03/2005 10:26:43" as an example ,
> when I do the following:
> CAST(DATEPART(dd,myday) AS VARCHAR(2)) the result = "2" as opposed to
> "02"
> ,
> How can I retain the leading "0" , for any DATEPART.?
>|||Jack Vamvas (delete_this_bit_jack@.ciquery.com_delete) writes:
> I have a column "currentDate" with approx 4 million rows.
> The rows are composed of dates , but with a variety of formats.
> For example:
> "14-05-2005 14:56:32"
> "20/07/2005 10:26:43"
> "2006-03-26 11:21:42"
> I've decided to convert all the rows into a standard format i.e
> dd-mm-yyyy hh:mm:ss, and then convert the column into a datetime The
> problem I have is that I am using DATEPART , which is fine , except that
> I am losing the leading zero.
> Taking "02/03/2005 10:26:43" as an example ,
> when I do the following:
> CAST(DATEPART(dd,myday) AS VARCHAR(2)) the result = "2" as opposed to
> "02" ,
> How can I retain the leading "0" , for any DATEPART.?
There are a couple of variations on that theme, but it does not seem to
address your real issue anyway.
Obviously you have dateformat of dmy, in which case the format 2006-03-26
is not likely to convert to datetime.
I would suggest that you add a new column to the table, nullable. Then
you would do something like:
UPDATE tbl
SET newcol = convert(datetime, oldcol)
WHERE isdate(oldcol) = 1
go
SET DATEFORMAT ymd
go
UPDATE tbl
SET newcol = convert(datetime, oldcol)
WHERE isdate(oldcol) = 1
AND newcol IS NULL
go
SELECT * FROM tbl WHERE newcol IS NULL
-- Manuallly fix the rest?
go
ALTER TABLE tbl DROP oldcol
go
-- If column should not permit NULL.
ALTER TABLE col ALTER newcol datetime NOT NULL
One thing you would have to make an extra check for are completely far-out
date formats like MM/DD/YY (yes, there are odd corners of the world where
they use this). Not talking of a string like 05/03/02 that has a number of
interpretations.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||The Convert function will do the conversion to DateTime for you. However,
depending on the format, you will need to specify the Style parameter.
Lookup the Convert function in SQL Server Books Online. For example:
-- Italian dd-mm-yy
print convert(datetime,'14-05-2005 14:56:32',105)
May 14 2005 2:56PM
print convert(datetime,'14/05/2005 14:56:32',105)
May 14 2005 2:56PM
-- ANSI
print convert(datetime,'2005-05-14 14:56:32',102)
May 14 2005 2:56PM
Also, going forward, you will need to constrain user input into the
application to a consistent format.
"Jack Vamvas" <delete_this_bit_jack@.ciquery.com_delete> wrote in message
news:dtv3e8$kdf$1@.nwrdmz03.dmz.ncs.ea.ibs-infra.bt.com...
>I have a column "currentDate" with approx 4 million rows.
> The rows are composed of dates , but with a variety of formats.
> For example:
> "14-05-2005 14:56:32"
> "20/07/2005 10:26:43"
> "2006-03-26 11:21:42"
> I've decided to convert all the rows into a standard format i.e dd-mm-yyyy
> hh:mm:ss, and then convert the column into a datetime
> The problem I have is that I am using DATEPART , which is fine , except
> that
> I am losing the leading zero.
> Taking "02/03/2005 10:26:43" as an example ,
> when I do the following:
> CAST(DATEPART(dd,myday) AS VARCHAR(2)) the result = "2" as opposed to
> "02"
> ,
> How can I retain the leading "0" , for any DATEPART.?
>|||"Aaron Bertrand [SQL Server MVP]" <ten.xoc@.dnartreb.noraa> wrote in message
news:#yCoM86OGHA.2912@.tk2msftngp13.phx.gbl...
dd-mm-yyyy
> Bravo! But that is very far from a standard format!
> Why wasn't this a datetime in the first place? What application is
storing
> all these different formats? Are you allowing end users to just enter
> whatever format they want?
> Assuming these are the only three formats present, this might give you
some
> ideas:
> SELECT CONVERT(CHAR(10), d, 120)+'T'+CONVERT(CHAR(8), d, 108)
> FROM
> (
> SELECT d = CASE WHEN ISDate(d)=1 THEN CONVERT(DATETIME, d)
> ELSE CONVERT(DATETIME, d, 103) END
> FROM
> (
> SELECT d = '14-05-2005 14:56:32'
> UNION ALL SELECT '20/07/2005 10:26:43'
> UNION ALL SELECT '2006-03-26 11:21:42'
> ) a
> ) b
> However, I am betting there are twenty other formats you're not
mentioning.
> A
>
>
>
Thanks for the response.
Just to clarify , this data is inherited from a client . It is historical
data , and the different formats represent different pahases of their data
recording.
They dumped everything into a varchar column and now are requesting
different queries based on the dates.Hence the need to sort this problem
out.
The 3 formats I've presented are the only formats. The code you've presented
has givem me some ideas.|||> I've decided to convert all the rows into a standard format i.e dd-mm-yyyy
> hh:mm:ss, and then convert the column into a datetime
I think you can eliminate this intermediate step. Just convert from the
existing values into DateTime.|||Thanks
That sorted the problem out
"Perayu" <yu.he@.state.mn.us.Remove4Replay> wrote in message
news:uZ#VY86OGHA.1132@.TK2MSFTNGP10.phx.gbl...
> You may try this:
> select right('0' + CAST(DATEPART(dd, myday) AS VARCHAR(2)), 2)
> Perayu
> "Jack Vamvas" <delete_this_bit_jack@.ciquery.com_delete> wrote in message
> news:dtv3e8$kdf$1@.nwrdmz03.dmz.ncs.ea.ibs-infra.bt.com...
dd-mm-yyyy
>|||I was having a similar issue and this fixed my problem as well! Good tip!

No comments:

Post a Comment