Database Unit Testing: Database options in TPC-C test

I read full disclosure report for best TPC-C SQL Server results, and have
some questions. What is trace flag 828, where to find decription of the flag
?
How useful it for OLTP system under heavy load? Why checkpoint were done
manually avery 15 min, not simply configure server to set checkpoint interva
l
15 min? Is it better in terms of performance? How big impact of "torn page
detection option"? - it was turned off in the test.
I have SQL Server database with 30-40 transactions per second average, and
300 transactions per second in peaks. And the load would increase 4-5 times
in next 2 monthes. So I trying to find what to do to be able to handle such
load.You should not go by how the server was configured in a TPC test for a
normal production environment. It was tweaked for months to get every last
transaction with a massive amount of hardware. You would not do this in a
production env. The volumes you are talking about are well within the
limits of a normally configured server with default settings on SQL Server.
If you start messing with things liek the recovery interval without
understanding exactly what effect it will have you will most likely hurt
performance instead of increasing it. Never change a configuration setting
in SQL Server just because someone else did. You are better off making sure
you have a proper hardware configuration to handle your load. Especially the
disk configurations. The more trans per second you do the more important it
is to have a proper disk I/O subsystem and to isolate the log files onto
their own Raid 1 or Raid 10.
Andrew J. Kelly SQL MVP
"andsm" <andsm@.discussions.microsoft.com> wrote in message
news:F58FF223-B5F1-43C2-BB3A-17CB2A51495F@.microsoft.com...
>I read full disclosure report for best TPC-C SQL Server results, and have
> some questions. What is trace flag 828, where to find decription of the
> flag?
> How useful it for OLTP system under heavy load? Why checkpoint were done
> manually avery 15 min, not simply configure server to set checkpoint
> interval
> 15 min? Is it better in terms of performance? How big impact of "torn page
> detection option"? - it was turned off in the test.
> I have SQL Server database with 30-40 transactions per second average, and
> 300 transactions per second in peaks. And the load would increase 4-5
> times
> in next 2 monthes. So I trying to find what to do to be able to handle
> such
> load.|||I know what I/O system is most important to handle the load. But additionall
y
to be able to handle the load, for the database necessary to have transactio
n
duration in good range - not more than 120 ms. Currenly average transaction
duration is 30 ms. But some transactions go up to 1.4 sec. Looking at
transaction log by Log Explorer, I found it usually happens during
checkpoints. And with increased load system would have checkpoints more
often. Note what during nearby all of long runned transactions had no waits
on locks - I have monitor of locking, it show nothing for most of such long
running transactions.
So I looking which options can help to decrease number of long-running
transactions. Its reason why I looking database options in TPC-C tests.
We upgrading our HW, I/O system. Currently data is on RAID10, log on RAID1.
We buying Dell CX300 storage system, database would be moved to the system.
Write cache would be enabled. What can be expected from such hardware in
terms of decreasing number of long running transactions? And about
checkpoints. Because most of long-runned transactions happens during
checkpoints, may be it is better to do the checkpoints manually, each 15
minutes as in the TPC-C test? How trace flag 828 (for which I not found
decription anywhere in Internet) affect checkpoints behavior? Can "torn page
detection" option cause some of long-running transaction?|||The reason why checkpoints affect the length of the transaction is due to
the fact the disk subsystem can not handle the large spike presented to it
during the checkpoint. If you are using direct attached storage now it
probably does not have a lot of cache on the controller as compared to a
SAN. While the CX300 is the lower end of the DELL (really EMC ) line of
SAN's it still will probably have more cache than what you have now. The
larger cache will buffer or absorb the spikes of a checkpoint and keep the
response times more even. With direct attached storage you almost always
want the controller cache to be 100% write back and 0% read. Usually by
default they are around 50 / 50. Even with a SAN you still have to ensure
the drives are configured properly to handle the load. Often on a SAN the
administrators make one large RAID and split it up into several logical
partitions or LUNs that they hand you. So you will see the log file, data
and tempdb all on one physical Raid array even though it appears as 3
separate drives. This is only acceptable if the cache is so much that it
never gets saturated and then you are at the mercy of the Raid array and all
the luns tacked on to it. A CX300 has the smallest cache of that line so
make sure it is enough for what you need to do. Again adjusting the
checkpoint interval is not the answer. Sizing the hardware properly to
begin with is. I don't know what trace 828 is and again you should not
count on a trace flag to get the performance you need. With few exceptions
if it is designed right you should not need any trace flags. Torn Page
detection can add some overhead on very large I/O bound systems but you are
better off leaving it on to begin with. The type of volumes you are talking
about should not be affected that much by this being on if you have the
right hardware.
Andrew J. Kelly SQL MVP
"andsm" <andsm@.discussions.microsoft.com> wrote in message
news:7FD2F25D-FE03-453C-AA09-8DD75B23C1B7@.microsoft.com...
>I know what I/O system is most important to handle the load. But
>additionally
> to be able to handle the load, for the database necessary to have
> transaction
> duration in good range - not more than 120 ms. Currenly average
> transaction
> duration is 30 ms. But some transactions go up to 1.4 sec. Looking at
> transaction log by Log Explorer, I found it usually happens during
> checkpoints. And with increased load system would have checkpoints more
> often. Note what during nearby all of long runned transactions had no
> waits
> on locks - I have monitor of locking, it show nothing for most of such
> long
> running transactions.
> So I looking which options can help to decrease number of long-running
> transactions. Its reason why I looking database options in TPC-C tests.
> We upgrading our HW, I/O system. Currently data is on RAID10, log on
> RAID1.
> We buying Dell CX300 storage system, database would be moved to the
> system.
> Write cache would be enabled. What can be expected from such hardware in
> terms of decreasing number of long running transactions? And about
> checkpoints. Because most of long-runned transactions happens during
> checkpoints, may be it is better to do the checkpoints manually, each 15
> minutes as in the TPC-C test? How trace flag 828 (for which I not found
> decription anywhere in Internet) affect checkpoints behavior? Can "torn
> page
> detection" option cause some of long-running transaction?
>

Tuesday, February 14, 2012

Database options in TPC-C test

No comments:

Post a Comment

Database Unit Testing

Blog Archive

About Me