Sunday, March 25, 2012

Cleaning up bad Oracle 8i SQL; made it worse?

Hi a bit of a nagging query here. A little background I recently became employed at a company that uses SQR to do a lot of reporting on our Oracle 8i DBMS. Unfortunately, a lot of these reports were written by people with little/no SQL knowledge and so they are ugly (to say the least): they are all procedural with a SELECT * FROM user followed by different subroutines to execute queries to gather address and other data (obviously a JOIN would do!), a subroutine to determine if a user should be selected (instead of a where clause on the SQL statement!).

In any rate, Ive been tasked to help clean some of these up already Ive turned some 5 hour reports into 5 minute reports by removing the procedural stuff.

Ive come across a cross-tab type report that I have successfully converted but I dont know enough of 8is OLAP tricks (if there are any?) to help optimize the query.

The application is a packaged app so I cant modify the tables/views (I dont think we can add indexes or views either), nor for security reasons can I give out the DDL or exact SQL, but I can neuter the tables down to the important part, and really building the SQL doesnt need the entire structure anyway.

In a nutshell the tables are:

User stores user information.
User{ id, name, }

Type stores user type information. Should a user have more than one type code the priority can be used to determine which type is most significant. For example, if a user were both Admin and Guest, Admin would have a priority of 1 whereas Guest may be 10, so Admin will be chosen as their type.
Type{ code, name, priority }

Ties a user to a type. Users can be more than one type (as described above).
UserType{ id, code }

Address{ id, user_id }
Phone{ id, user_id }
Email{ id, user_id }

You get the point. All of the child tables relate back to the parent table. What were generating is something like this:

User Type Address Phone Email
-----------
Admin 123,222 80,000 90,000
SomeType 22,222 12,000 12,022
Etc.

So, for each user type, how many users of that type have an address record, how many have a phone record, etc. These are not correlated (e.g. I dont care if they have an address AND a phone, etc.). Also (here is where the priority comes into play) for the purposes of this report only count the user as being of the type considered if that type is their most significant type:

SELECT *
FROM usertype
WHERE user_id = 123
AND priority = ( SELECT MIN( priority )
FROM usertype
WHERE user_id = 123 )

So the query I have formulated looks something like this:

SELECT name,
( SELECT COUNT( DISTINCT user_id )
FROM address
WHERE user_id IN ( SELECT user_id
FROM user u1,
usertype ut2
WHERE ut2.code = ut1.code
AND ut2.priority = ( SELECT MIN( priority )
FROM usertype
WHERE user_id = u1.user_id )
)
) AS Address,
... AS Phone, -- etc.
FROM type


As you can see, it is nasty. The SQL to get users who are of a particular type is ugly and I dont like running it for each additional cross-tabbed column I add (Address, Phone, Email, etc.)

So I am wondering if there is a better way?

I think I could use a temp table to insert the user_id, code combo into so I can just join to that. The table could get quite large though and I am not sure if I can create an index on a temp table (anyone?).

Anyone? Thanks!

EDIT: Fixed tablenameOnly a small improvement (maybe):

SELECT t.name,
( SELECT COUNT( DISTINCT a.user_id )
FROM address a,
usertype ut2
WHERE a.user_id = ut2.user_id
AND ut2.code = t.code
AND ut2.priority = ( SELECT MIN( priority )
FROM usertype
WHERE user_id = ut2.user_id
)
) AS Address,
... AS Phone, -- etc.
FROM type t

i.e.
1) no need to join to table USER
2) join to usertype (ut2) rather than subquery
3) main query is on table TYPE not USERTYPE

A more radical rewrite which may work is:

SELECT t.name,
SUM( CASE WHEN a.user_id IS NULL THEN 0 ELSE 1 END ) AS Address,
SUM( CASE WHEN p.user_id IS NULL THEN 0 ELSE 1 END ) AS Phone,
...
FROM type t,
user_type ut,
address a,
phone p,
...
WHERE ut.code = t.code
AND ut.priority = ( SELECT MIN( ut2.priority )
FROM usertype ut2
WHERE ut2.user_id = ut.user_id
)
AND a.userid (+)= ut.userid
AND p.userid (+)= ut.userid
GROUP BY t.name;|||Thanks for the reply Tony!

3) main query on table TYPE not USERTYPE
That was a bug! :)

Have to run will check the query in a bit!|||I don't know if the last query will work.. Users can have more than one email address, so I don't want them to count as more than one user... Am I making sense? I just want to check for the existance of at least one email address, and then add that to the 'has email' total.|||Originally posted by MattR
I don't know if the last query will work.. Users can have more than one email address, so I don't want them to count as more than one user... Am I making sense? I just want to check for the existance of at least one email address, and then add that to the 'has email' total.
You are right: as written it relies on there only being one Address, one Phone, etc. per user. Not very useful!

Perhaps if you change the FROM clause to:

FROM type t,
user_type ut,
(select distinct userid from address) a,
(select distinct userid from phone) p,

I have no idea how fast that will run!|||You last code snippit is basically what I have that I would like to avoid. I'll make that JOIN change. I may just create another table with the joined info to avoid making it on every query.

No comments:

Post a Comment