I'm creating a performance testing server. I'll describe my approach;
maybe the group gurus can review it and suggest improvements. Please
tell me if this is a sound approach.
I want to have global performance measurements. They should guide us
in improving the database's overall responsiveness. So, if we add a
complex clustered index that speeds up a query but slows down a frequent
update, it should show up in the measurements.
Here are the steps I followed:
1) Back up the production database.
2) Record a representative SQL Profiler trace file of the production
database using the TSQL_Replay template. This will be my base workload.
3) Restore the backup to a Microsoft Virtual Server instance
(http://www.microsoft.com/windowsserversystem/virtualserver/ ).
4) Waste a lot of time configuring login permissions and making the
database have the same ID as the original so I can replay the trace file.
5) Turn on the "Enable Undo disks" option in the Virtual Hard Disk
Properties of the Virtual Server instance.
6) Replay the saved trace file, saving another trace file to gather
performance data.
7) Load the new trace file into a table and run some queries to measure
the database performance (see the sketch after this list).
8) Discard all the modifications to the virtual server instance, tweak
the database, and go back to step 6 to gather new data.
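As a reference for step 7, here is a minimal sketch, assuming the replay
output was saved to a trace file (the path below is hypothetical) and
loaded with fn_trace_gettable on SQL Server 2000:

-- Load the replay output trace file into a table named "trace";
-- the 'default' argument makes it read any rollover files as well.
SELECT *
INTO trace
FROM ::fn_trace_gettable('C:\traces\replay_results.trc', default)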
Is it a sound approach? Are there better ways to do it?
Now for my question: which measurements should I take from the saved
trace?
There are a lot of possible events to record in a Profiler trace, and
the fields mean something different for each one.
Here is my initial list of queries on the trace table:
This is probably my most important measure. It gives me the total
duration and CPU spent:

SELECT sum(Duration) as total_time, sum(CPU) as total_cpu,
       sum(Duration - CPU) as total_wait
FROM trace
WHERE EventClass = 12 -- 12 is SQL:BatchCompleted
If instead of SQL:BatchCompleted I use SQL:StmtCompleted, I get a
smaller value for sum(Duration). If I instead sum the Duration column
of the SQLTransaction event class (50) with the commit EventSubClass
(1), sum(Duration) is bigger. Which one gives me the better
measurement?
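For what it's worth, the three candidates can be put side by side with
something like this (a sketch, assuming the trace table is named "trace"
and all three event classes were captured):

-- 12 = SQL:BatchCompleted, 41 = SQL:StmtCompleted, 50 = SQLTransaction
SELECT EventClass,
       count(*) as events,
       sum(Duration) as total_duration_ms,
       sum(CPU) as total_cpu_ms
FROM trace
WHERE EventClass IN (12, 41, 50)
GROUP BY EventClass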
Here are the committed transactions' total, average, max and standard
deviation of duration:

SELECT sum(Duration) as trans_total_time, avg(Duration) as average_time,
       max(Duration) as max_transaction_time, stdev(Duration) as standard_deviation
FROM trace
WHERE EventClass = 50 and EventSubClass = 1
This is the time wasted rolling back transactions:

SELECT sum(Duration) FROM trace
WHERE EventClass = 50 and EventSubClass = 2 -- duration of rollback transactions
Time spent waiting for locks:
SELECT sum(Duration) FROM trace
WHERE EventClass = 24 -- 24 is Lock:Acquired event
Number of deadlocks:
SELECT MAX(IntegerData) FROM trace
WHERE EventClass = 59 -- Lock:Deadlock Chain
How do I measure the time wasted when a deadlock happens and a query
is canceled? Is it the duration of the Lock:Cancel event class?
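A rough way to approximate it might be something like this (a sketch,
assuming the replay output trace also captured the Exception (33),
Lock:Deadlock (25) and Lock:Cancel (26) event classes and the Error
column):

SELECT
  (SELECT count(*) FROM trace WHERE EventClass = 33 and Error = 1205) as deadlock_victims, -- Exception with deadlock error 1205
  (SELECT count(*) FROM trace WHERE EventClass = 25) as deadlocks,                         -- Lock:Deadlock
  (SELECT sum(Duration) FROM trace WHERE EventClass = 26) as cancelled_lock_wait           -- Lock:Cancel, rough proxy for time lost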
Do the gurus here have any idea of more good data to collect?
BTW, I'm using SQL Server 2000
kind regards,
Paulo Eduardo Neves
http://NaoTemMosquito.blogspot.com

Hi Paulo
You will need to start your replay trace before you take the backup, to
make sure you don't miss any changes made between the backup finishing
and the trace starting. You can remove the backup commands from the
trace afterwards, as well as any other steps you don't want. You will
need to match database IDs with the new system (or change the trace).
If you create the corresponding logins on the new server you should be
able to match the SIDs using sp_change_users_login, so mapping users
should not be a big deal!
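A quick sketch of that mapping step after the restore ('app_user' below
is a hypothetical user/login name):

-- List orphaned users, then re-map one of them to the matching login
EXEC sp_change_users_login 'Report'
EXEC sp_change_users_login 'Update_One', 'app_user', 'app_user'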
For analysis you can use Bill Graziano's ClearTrace at
www.cleardata.biz/cleartrace/download
If you are tuning then you should look at statements; frequency matters
as well as duration, and I/O may also be important.
John
"Paulo Eduardo Neves" wrote:
> I'm creating a performance testing server. I'll describe my approach,
> maybe the group gurus can review it and suggest improvements. Please
> tell me if this is a sound approach.
> I want to have global performance measurements. They should guide us
> in improving the database overall responsiveness. So, if we add a
> complex clustered index that speed a query, but slows a frequent
> update, it should appear in the measurements.
> Here are the steps I followed:
> 1) Backup of the production database.
> 2) Recording of a representative SQL Profiler trace file of the
> production database
> using the TSQL_Replay template. It will be used as my base
> workload.
> 3) Restoring the backup to a Microsoft Virtual Server instance
> (http://www.microsoft.com/windowsserversystem/virtualserver/ )
> 4) Waste a lot of time configuring login permissions and making the
> database have the same
> Id number of the original one so I can replay the trace file.
> 5) Turning on the "Enable Undo disks" option in the Virtual Hard Disk
> Properties of the Virtual
> server instance.
> 6) Replay the saved trace file, saving another trace file to gather
> performance data.
> 7) Load the new trace file to a table and run some queries to measure
> the database
> performance.
> 8) Discard all the modifications of the virtual server instances,
> Tweak the database and go to
> step 6 to gather new data.
> Is it a sound approach? Are there better ways to do it?
> Now is my doubt: which measurements should I take from the saved
> profile?
> There are a lot of possible events to record in the Profiler trace,
> and in each one the fields mean something different.
> Here is my initial list of queries on the trace table:
> This is probably my most important measure. It gives me the total
> duration and CPU spent:
> SELECT sum(Duration) as total_time , sum(CPU) total_cpu,
> sum(Duration - CPU) as total_wait FROM trace WHERE EventClass => 12 -- 12 is SQL:BatchCompleted
> If instead of SQL:BatchCompleted, I use SQL:StmtCompleted I get a
> smaller value for sum(duration). If I choose to sum the Duration
> column of the SQLTransaction EventClass (number 50) with the commit
> EventSubClass (1) the sum(duration) is bigger. Which one gives me the
> better measurement?
> Here is the commited transactions total, average, max and standard
> deviation duration:
> SELECT sum(Duration) as trans_total_time , AVG(Duration) as
> average_time, max(Duration) as max_transaction_time, stdev(Duration)
> as standard_deviation
> FROM trace WHERE EventClass = 50 and EventSubClass =1
> This is the time wasted rolling back transactions:
> SELECT sum(Duration) FROM trace
> WHERE EventClass = 50 and EventSubClass =2 -- duration of roolback
> transactions
> Time spent waiting for locks:
> SELECT sum(Duration) FROM trace
> WHERE EventClass = 24 -- 24 is Lock:Acquired event
> Number of deadlocks:
> SELECT MAX(IntegerData) FROM trace WHERE EventClass=59 --
> Lock:Deadlock Chain
> How do I measure the time wasted when a deadlock happens and a query
> is canceled? Is it the duration of the Lock:Cancel event class?
> Do the gurus here have any idea of more good data to collect?
> BTW, I'm using SQL Server 2000
> kind regards,
> Paulo Eduardo Neves
> http://NaoTemMosquito.blogspot.com

On Wed, 03 Oct 2007 12:22:03 -0700, Paulo Eduardo Neves <pauloneves@.gmail.com> wrote:
>Do the gurus here have any idea of more good data to collect?
Looks like you've got the idea, but the basic structure I always
follow does a group by on the command text. Now, this can be tricky,
I think SQL 2005 has added some features to help there ... but you're
on 2K, so never mind that!
Anyway, my preferred figure of merit is reads, not duration. They
tend to correspond heavily, but reads is logical reads and is somewhat
independent of load, and tells you what it is costing to run that
command.
So,
select
top 20
substring(textdata,1,30) as cmd,
sum(reads) as sumreads,
avg(reads) as avgreads,
count(*) as cnt
from dbo.yourtracetable
where eventclass in (10, 12) -- 10 = RPC:Completed, 12 = SQL:BatchCompleted
group by substring(textdata,1,30)
order by 2 desc -- most expensive commands by total reads first
Now, if you're tuning an entire system, you need to create a total
picture, you need to come up with a way to baseline your performance
and measure the quantitative components and improvements you make. I
guess clocktime through a trace is a good overall measure.
... but this also assumes a lot of stuff about your IO system not
being saturated, and such. When the hardware starts to run out, these
softer statistics become hard to interpret. And replaying it on a VM,
boy, I dunno about that. But in any case, probably the most robust
and significant figure is logical reads, in my ever so humble (or not
so humble) opinion.
Josh

On Oct 4, 6:17 am, John Bell <jbellnewspo...@.hotmail.com> wrote:
> You will need to start your replay trace before you take the backup, to
> make sure you don't miss any changes made between the backup finishing
> and the trace starting. You can remove the backup commands from the
> trace afterwards, as well as any other steps you don't want.
I took the backup just a little before starting the trace. I don't mind
if some queries fail during the replay. They fail and the replay
continues; they should be insignificant when I calculate the total time.
> You will need to match database IDs with the new system (or change the
> trace). If you create the corresponding logins on the new server you
> should be able to match the SIDs using sp_change_users_login, so mapping
> users should not be a big deal!
And the default database of the replay users should be the one you are
testing.
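For example (the login and database names below are hypothetical):

-- Point a replay login's default database at the restored test copy
EXEC sp_defaultdb 'app_user', 'PerfTestCopy'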
> For analysis you can use Bill Graziano's ClearTrace at www.cleardata.biz/cleartrace/download
I've already seen ClearTrace. It looks very useful. The problem is
that I currently only have access to a SQL Server 2000 installation,
and it doesn't run on the free version of SQL Server 2005. It also
looks like ClearTrace doesn't give me global measurements; it just
aggregates the statements.
Thanks for your tips.
regards,
Paulo Eduardo Neves
http://NaoTemMosquito.blogspot.com

On Oct 5, 2:33 am, JXStern <JXSternChange...@.gte.net> wrote:
> On Wed, 03 Oct 2007 12:22:03 -0700, Paulo Eduardo Neves
> <paulone...@.gmail.com> wrote:
> >Do the gurus here have any idea of more good data to collect?
>
...
> Anyway, my preferred figure of merit is reads, not duration. They
> tend to correspond heavily, but reads is logical reads and is somewhat
> independent of load, and tells you what it is costing to run that
> command.
Yes. But I believe that the problems on my hands are caused by
excessive locking in the database. Measuring reads and writes won't
show the locking problems.
> So,
> select
> top 20
> substring(textdata,1,30) as cmd,
> sum(reads) as sumreads,
> avg(reads) as avgreads,
> count(*) as cnt
> from dbo.yourtracetable
> where eventclass in (10, 12) -- 10 = RPC:Completed, 12 = SQL:BatchCompleted
> group by substring(textdata,1,30)
> order by 2 desc -- most expensive commands by total reads first
I'll try to use the cleartrace tool to get the statement aggregate
data.
> Now, if you're tuning an entire system, you need to create a total
> picture, you need to come up with a way to baseline your performance
> and measure the quantitative components and improvements you make. I
> guess clocktime through a trace is a good overall measure.
Will the clock-time value be different from the total transaction
duration?
> ... but this also assumes a lot of stuff about your IO system not
> being saturated, and such. When the hardware starts to run out, these
> softer statistics become hard to interpret. And replaying it on a VM,
> boy, I dunno about that. But in any case, probably the most robust
> and significant figure is logical reads, in my ever so humble (or not
> so humble) opinion.
First I'm working on a proof-of-concept server. After showing its
utility, I'll lobby for a dedicated performance testing machine. If I
guarantee that there's nothing else running on the machine, and put
the replay trace file and the performance recording trace on a
different hard disk from the one with the database file, I believe I
will have consistent performance measurements.
I've had a nice idea about the setup: filtering the trace to keep only
events where Duration != 0 will record just the interesting events.
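For the record, a minimal sketch of that filter as a server-side trace
on SQL Server 2000 (the output path, file size and chosen events below
are hypothetical; a real script would add more sp_trace_setevent calls):

DECLARE @TraceID int, @maxsize bigint, @zero bigint, @on bit
SET @maxsize = 100   -- MB per trace file before rollover
SET @zero = 0
SET @on = 1
-- option 2 = TRACE_FILE_ROLLOVER
EXEC sp_trace_create @TraceID OUTPUT, 2, N'C:\traces\replay_results', @maxsize
-- capture SQL:BatchCompleted (12) with TextData (1), Duration (13), Reads (16), CPU (18)
EXEC sp_trace_setevent @TraceID, 12, 1, @on
EXEC sp_trace_setevent @TraceID, 12, 13, @on
EXEC sp_trace_setevent @TraceID, 12, 16, @on
EXEC sp_trace_setevent @TraceID, 12, 18, @on
-- keep only events with Duration > 0: column 13, logical AND (0), comparison greater than (2)
EXEC sp_trace_setfilter @TraceID, 13, 0, 2, @zero
-- start the trace
EXEC sp_trace_setstatus @TraceID, 1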
Thanks for the good ideas!
Paulo
http://NaoTemMosquito.blogspot.com

Hi Paulo
Replaying traces can potentially highlight problems and is a very
useful facility, but it is not guaranteed to always find your problems.
Check Books Online for the restrictions on replaying a trace. You may
want to consider using a test tool such as Microsoft Team Suite,
LoadRunner or Rational Performance Tester.
If you have locking problems, look at using some of the locking events
to check what your system is doing, rather than relying on
time/reads/writes etc.
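As a rough sketch of that kind of lock-event summary (assuming the
replay output trace also captured Lock:Deadlock (25), Lock:Timeout (27)
and Lock:Escalation (60) with the ObjectID column; Duration may be NULL
for some of these events):

-- Which objects are involved in lock trouble, and how often
SELECT EventClass, ObjectID,
       count(*) as occurrences,
       sum(Duration) as total_duration
FROM trace
WHERE EventClass IN (25, 27, 60)
GROUP BY EventClass, ObjectID
ORDER BY count(*) DESC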
John
"Paulo Eduardo Neves" wrote:
> On Oct 5, 2:33 am, JXStern <JXSternChange...@.gte.net> wrote:
> > On Wed, 03 Oct 2007 12:22:03 -0700, Paulo Eduardo Neves
> >
> > <paulone...@.gmail.com> wrote:
> > >Do the gurus here have any idea of more good data to collect?
> >
> ...
> > Anyway, my preferred figure of merit is reads, not duration. They
> > tend to correspond heavily, but reads is logical reads and is somewhat
> > independent of load, and tells you what it is costing to run that
> > command.
> Yes. But I believe that the problem in my hands are caused due to
> excessive locking in the database. Measuring Reads and Writes won't
> display the locking problems.
> > So,
> >
> > select
> > top 20
> > substring(textdata,1,30) as cmd,
> > sum(reads) as sumreads,
> > avg(reads) as avgreads,
> > count(*) as cnt
> > from dbo.yourtractable
> > where eventclass in (rpc done, batch done)
> > group by substring(textdata,1,30)
> > order by 2
> I'll try to use the cleartrace tool to get the statement aggregate
> data.
> >
> > Now, if you're tuning an entire system, you need to create a total
> > picture, you need to come up with a way to baseline your performance
> > and measure the quantitative components and improvements you make. I
> > guess clocktime through a trace is a good overall measure.
> Will the clocktime value be different from the total transactions
> duration?
> >
> > ... but this also assumes a lot of stuff about your IO system not
> > being saturated, and such. When the hardware starts to run out, these
> > softer statistics become hard to interpret. And replaying it on a VM,
> > boy, I dunno about that. But in any case, probably the most robust
> > and significant figure is logical reads, in my ever so humble (or not
> > so humble) opinion.
> First I'm working in a proof of concept server. After showing its
> utility, I'll lobby for a dedicated performance testing machine. If I
> guarantee that there's nothing else running in the machine, and put
> the replay trace file and the performance recording trace in other
> harddisk than the one with the database file, I believe that I will
> have consistent performance measurements.
> I've had a nice idea about the setup. Filtering the trace events where
> Duration != 0, will record just the interesting log events.
> Thanks for the good ideas!
> Paulo
> http://NaoTemMosquito.blogspot.com
>
>