Saturday, February 25, 2012

Checkpoint and performance

While profiling our stored procs that write data to the database, we notice
that while the database is checkpointing, the performance of these stored
procs gets worse. What usually will takes 100 ms takes 400 to 700 ms. Once
the checkpoint is done, they are back to their normal runtime
Is this expected behavior?
BTW this is a quad P4 xeon 2.8ghz box with 12 gig of ram running Windows
Server 2003/SQL Server 2000 Enterprise SP3a.With direct attached SCSI I saw the same behavior. Basically, your
checkpoint is saturating the IO bandwidth of your controller card. I saw
this with high-end RAID controllers with 128MB RAM when the checkpoint was
large enough to flood the on-board cache. This went away with a SAN since
FC is full duplex and the SAN has 3GB of write cache, easily enough to hold
a checkpoint.
--
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
"Kevin Jackson" <kjackson@.powerwayinc.com> wrote in message
news:ONLYmSpuDHA.2304@.TK2MSFTNGP12.phx.gbl...
> While profiling our stored procs that write data to the database, we
notice
> that while the database is checkpointing, the performance of these stored
> procs gets worse. What usually will takes 100 ms takes 400 to 700 ms.
Once
> the checkpoint is done, they are back to their normal runtime
> Is this expected behavior?
> BTW this is a quad P4 xeon 2.8ghz box with 12 gig of ram running Windows
> Server 2003/SQL Server 2000 Enterprise SP3a.
>|||I forgot to add we are using attached to an EMC storage system with 1
terabyte of storage... The controller card is fiber to the EMC system and
the sustained throughput during a checkpoint is 8 MBps and the that is
nothing compared to what we can actually push to that system. During
database restores we can get over 50 MBps!!! So while it may be an I/O
issue, we are not even reaching the saturation point of our attached I/O
system.
"Geoff N. Hiten" <SRDBA@.Careerbuilder.com> wrote in message
news:OCuhqipuDHA.3220@.tk2msftngp13.phx.gbl...
> With direct attached SCSI I saw the same behavior. Basically, your
> checkpoint is saturating the IO bandwidth of your controller card. I saw
> this with high-end RAID controllers with 128MB RAM when the checkpoint was
> large enough to flood the on-board cache. This went away with a SAN since
> FC is full duplex and the SAN has 3GB of write cache, easily enough to
hold
> a checkpoint.
> --
> Geoff N. Hiten
> Microsoft SQL Server MVP
> Senior Database Administrator
> Careerbuilder.com
>
>
> "Kevin Jackson" <kjackson@.powerwayinc.com> wrote in message
> news:ONLYmSpuDHA.2304@.TK2MSFTNGP12.phx.gbl...
> > While profiling our stored procs that write data to the database, we
> notice
> > that while the database is checkpointing, the performance of these
stored
> > procs gets worse. What usually will takes 100 ms takes 400 to 700 ms.
> Once
> > the checkpoint is done, they are back to their normal runtime
> >
> > Is this expected behavior?
> >
> > BTW this is a quad P4 xeon 2.8ghz box with 12 gig of ram running Windows
> > Server 2003/SQL Server 2000 Enterprise SP3a.
> >
> >
>|||I am also using an EMC SAN (CX-600) with 4 fiber cards from each cluster
host to the SAN. The real key is write cache. Mine is set to max at 3GB.
If you overfill your write cache, you go to direct write mode and lose a lot
of the benefits of your high-end IO system.
There may be CPU issues as well. Have you monitored processor time on a
per-CPU basis? Also, do you have Hyperthreading on or off?
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
"Kevin Jackson" <softwiz@.covad.net> wrote in message
news:u6qIPUquDHA.2308@.TK2MSFTNGP11.phx.gbl...
> I forgot to add we are using attached to an EMC storage system with 1
> terabyte of storage... The controller card is fiber to the EMC system and
> the sustained throughput during a checkpoint is 8 MBps and the that is
> nothing compared to what we can actually push to that system. During
> database restores we can get over 50 MBps!!! So while it may be an I/O
> issue, we are not even reaching the saturation point of our attached I/O
> system.
> "Geoff N. Hiten" <SRDBA@.Careerbuilder.com> wrote in message
> news:OCuhqipuDHA.3220@.tk2msftngp13.phx.gbl...
> > With direct attached SCSI I saw the same behavior. Basically, your
> > checkpoint is saturating the IO bandwidth of your controller card. I
saw
> > this with high-end RAID controllers with 128MB RAM when the checkpoint
was
> > large enough to flood the on-board cache. This went away with a SAN
since
> > FC is full duplex and the SAN has 3GB of write cache, easily enough to
> hold
> > a checkpoint.
> >
> > --
> > Geoff N. Hiten
> > Microsoft SQL Server MVP
> > Senior Database Administrator
> > Careerbuilder.com
> >
> >
> >
> >
> > "Kevin Jackson" <kjackson@.powerwayinc.com> wrote in message
> > news:ONLYmSpuDHA.2304@.TK2MSFTNGP12.phx.gbl...
> > > While profiling our stored procs that write data to the database, we
> > notice
> > > that while the database is checkpointing, the performance of these
> stored
> > > procs gets worse. What usually will takes 100 ms takes 400 to 700 ms.
> > Once
> > > the checkpoint is done, they are back to their normal runtime
> > >
> > > Is this expected behavior?
> > >
> > > BTW this is a quad P4 xeon 2.8ghz box with 12 gig of ram running
Windows
> > > Server 2003/SQL Server 2000 Enterprise SP3a.
> > >
> > >
> >
> >
>|||I have seen minimally configured SAN systems saturate at around 50MB/sec.
are you seeing high disk queue lengths? if so, then how much bandwidth do
you have to the storage subsystem? How many spindles, what raid level, how
much write cache?
--
Kevin Connell, MCDBA
----
The views expressed here are my own
and not of my employer.
----
"Kevin Jackson" <softwiz@.covad.net> wrote in message
news:u6qIPUquDHA.2308@.TK2MSFTNGP11.phx.gbl...
> I forgot to add we are using attached to an EMC storage system with 1
> terabyte of storage... The controller card is fiber to the EMC system and
> the sustained throughput during a checkpoint is 8 MBps and the that is
> nothing compared to what we can actually push to that system. During
> database restores we can get over 50 MBps!!! So while it may be an I/O
> issue, we are not even reaching the saturation point of our attached I/O
> system.
> "Geoff N. Hiten" <SRDBA@.Careerbuilder.com> wrote in message
> news:OCuhqipuDHA.3220@.tk2msftngp13.phx.gbl...
> > With direct attached SCSI I saw the same behavior. Basically, your
> > checkpoint is saturating the IO bandwidth of your controller card. I
saw
> > this with high-end RAID controllers with 128MB RAM when the checkpoint
was
> > large enough to flood the on-board cache. This went away with a SAN
since
> > FC is full duplex and the SAN has 3GB of write cache, easily enough to
> hold
> > a checkpoint.
> >
> > --
> > Geoff N. Hiten
> > Microsoft SQL Server MVP
> > Senior Database Administrator
> > Careerbuilder.com
> >
> >
> >
> >
> > "Kevin Jackson" <kjackson@.powerwayinc.com> wrote in message
> > news:ONLYmSpuDHA.2304@.TK2MSFTNGP12.phx.gbl...
> > > While profiling our stored procs that write data to the database, we
> > notice
> > > that while the database is checkpointing, the performance of these
> stored
> > > procs gets worse. What usually will takes 100 ms takes 400 to 700 ms.
> > Once
> > > the checkpoint is done, they are back to their normal runtime
> > >
> > > Is this expected behavior?
> > >
> > > BTW this is a quad P4 xeon 2.8ghz box with 12 gig of ram running
Windows
> > > Server 2003/SQL Server 2000 Enterprise SP3a.
> > >
> > >
> >
> >
>|||Very short disk queue lengths.
4 ea FC2 HBAs per host computer. Theoretical max 800MB/sec. Box maxes out
at 1100 MB/sec manufacturer's spec. Real limit seems to be aroudn
200MB/sec.
Most RAID sets are 16 spindles RAID 1+0 Some are fewer spindles but all are
1+0.
Total system has 3GB write cache.
Front end is 4-node 3 instance cluster (Windows Server 2003) 8x2.8GHz procs
and 32GB RAM /box. (Unisys ES-7000)
Obviously NOT a minimally configured system. I was able to get 50MB/sec
from direct attached SCSI. Again, the problem I ran into was cache
saturation and half-duplex data pathing.
Then again, there is the flip-side problem where the IO system can be too
fast.
FIX: The Checkpoint Process Can Delay SQL Server Database Activity and Does
Not Yield Scheduler Correctly Causing Error: 17883 to Occur
http://support.microsoft.com/default.aspx?scid=kb;en-us;815056&Product=sql2k
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com
"Kevin" <ReplyTo@.Newsgroups.only> wrote in message
news:u2X1wUsuDHA.1512@.TK2MSFTNGP10.phx.gbl...
> I have seen minimally configured SAN systems saturate at around 50MB/sec.
> are you seeing high disk queue lengths? if so, then how much bandwidth do
> you have to the storage subsystem? How many spindles, what raid level,
how
> much write cache?
> --
> Kevin Connell, MCDBA
> ----
> The views expressed here are my own
> and not of my employer.
> ----
> "Kevin Jackson" <softwiz@.covad.net> wrote in message
> news:u6qIPUquDHA.2308@.TK2MSFTNGP11.phx.gbl...
> > I forgot to add we are using attached to an EMC storage system with 1
> > terabyte of storage... The controller card is fiber to the EMC system
and
> > the sustained throughput during a checkpoint is 8 MBps and the that is
> > nothing compared to what we can actually push to that system. During
> > database restores we can get over 50 MBps!!! So while it may be an I/O
> > issue, we are not even reaching the saturation point of our attached I/O
> > system.
> >
> > "Geoff N. Hiten" <SRDBA@.Careerbuilder.com> wrote in message
> > news:OCuhqipuDHA.3220@.tk2msftngp13.phx.gbl...
> > > With direct attached SCSI I saw the same behavior. Basically, your
> > > checkpoint is saturating the IO bandwidth of your controller card. I
> saw
> > > this with high-end RAID controllers with 128MB RAM when the checkpoint
> was
> > > large enough to flood the on-board cache. This went away with a SAN
> since
> > > FC is full duplex and the SAN has 3GB of write cache, easily enough to
> > hold
> > > a checkpoint.
> > >
> > > --
> > > Geoff N. Hiten
> > > Microsoft SQL Server MVP
> > > Senior Database Administrator
> > > Careerbuilder.com
> > >
> > >
> > >
> > >
> > > "Kevin Jackson" <kjackson@.powerwayinc.com> wrote in message
> > > news:ONLYmSpuDHA.2304@.TK2MSFTNGP12.phx.gbl...
> > > > While profiling our stored procs that write data to the database, we
> > > notice
> > > > that while the database is checkpointing, the performance of these
> > stored
> > > > procs gets worse. What usually will takes 100 ms takes 400 to 700
ms.
> > > Once
> > > > the checkpoint is done, they are back to their normal runtime
> > > >
> > > > Is this expected behavior?
> > > >
> > > > BTW this is a quad P4 xeon 2.8ghz box with 12 gig of ram running
> Windows
> > > > Server 2003/SQL Server 2000 Enterprise SP3a.
> > > >
> > > >
> > >
> > >
> >
> >
>|||16 spindles is not very much - 200MB/sec is actually really quite good for
16 15KRPM spindles.
--
Kevin Connell, MCDBA
----
The views expressed here are my own
and not of my employer.
----
"Geoff N. Hiten" <SRDBA@.Careerbuilder.com> wrote in message
news:Os3suQ1uDHA.2340@.TK2MSFTNGP12.phx.gbl...
> Very short disk queue lengths.
> 4 ea FC2 HBAs per host computer. Theoretical max 800MB/sec. Box maxes
out
> at 1100 MB/sec manufacturer's spec. Real limit seems to be aroudn
> 200MB/sec.
> Most RAID sets are 16 spindles RAID 1+0 Some are fewer spindles but all
are
> 1+0.
> Total system has 3GB write cache.
> Front end is 4-node 3 instance cluster (Windows Server 2003) 8x2.8GHz
procs
> and 32GB RAM /box. (Unisys ES-7000)
> Obviously NOT a minimally configured system. I was able to get 50MB/sec
> from direct attached SCSI. Again, the problem I ran into was cache
> saturation and half-duplex data pathing.
> Then again, there is the flip-side problem where the IO system can be too
> fast.
> FIX: The Checkpoint Process Can Delay SQL Server Database Activity and
Does
> Not Yield Scheduler Correctly Causing Error: 17883 to Occur
>
http://support.microsoft.com/default.aspx?scid=kb;en-us;815056&Product=sql2k
>
> --
> Geoff N. Hiten
> Microsoft SQL Server MVP
> Senior Database Administrator
> Careerbuilder.com
>
>
> "Kevin" <ReplyTo@.Newsgroups.only> wrote in message
> news:u2X1wUsuDHA.1512@.TK2MSFTNGP10.phx.gbl...
> > I have seen minimally configured SAN systems saturate at around
50MB/sec.
> > are you seeing high disk queue lengths? if so, then how much bandwidth
do
> > you have to the storage subsystem? How many spindles, what raid level,
> how
> > much write cache?
> >
> > --
> > Kevin Connell, MCDBA
> > ----
> > The views expressed here are my own
> > and not of my employer.
> > ----
> > "Kevin Jackson" <softwiz@.covad.net> wrote in message
> > news:u6qIPUquDHA.2308@.TK2MSFTNGP11.phx.gbl...
> > > I forgot to add we are using attached to an EMC storage system with 1
> > > terabyte of storage... The controller card is fiber to the EMC system
> and
> > > the sustained throughput during a checkpoint is 8 MBps and the that is
> > > nothing compared to what we can actually push to that system. During
> > > database restores we can get over 50 MBps!!! So while it may be an
I/O
> > > issue, we are not even reaching the saturation point of our attached
I/O
> > > system.
> > >
> > > "Geoff N. Hiten" <SRDBA@.Careerbuilder.com> wrote in message
> > > news:OCuhqipuDHA.3220@.tk2msftngp13.phx.gbl...
> > > > With direct attached SCSI I saw the same behavior. Basically, your
> > > > checkpoint is saturating the IO bandwidth of your controller card.
I
> > saw
> > > > this with high-end RAID controllers with 128MB RAM when the
checkpoint
> > was
> > > > large enough to flood the on-board cache. This went away with a SAN
> > since
> > > > FC is full duplex and the SAN has 3GB of write cache, easily enough
to
> > > hold
> > > > a checkpoint.
> > > >
> > > > --
> > > > Geoff N. Hiten
> > > > Microsoft SQL Server MVP
> > > > Senior Database Administrator
> > > > Careerbuilder.com
> > > >
> > > >
> > > >
> > > >
> > > > "Kevin Jackson" <kjackson@.powerwayinc.com> wrote in message
> > > > news:ONLYmSpuDHA.2304@.TK2MSFTNGP12.phx.gbl...
> > > > > While profiling our stored procs that write data to the database,
we
> > > > notice
> > > > > that while the database is checkpointing, the performance of these
> > > stored
> > > > > procs gets worse. What usually will takes 100 ms takes 400 to 700
> ms.
> > > > Once
> > > > > the checkpoint is done, they are back to their normal runtime
> > > > >
> > > > > Is this expected behavior?
> > > > >
> > > > > BTW this is a quad P4 xeon 2.8ghz box with 12 gig of ram running
> > Windows
> > > > > Server 2003/SQL Server 2000 Enterprise SP3a.
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>

No comments:

Post a Comment