This problem isn't specific to SQL Server, but because of the size of the files I deal with with SQL Server, it is the place I notice it more often, and hopefully one of you has, too.
If I am restoring a database (or simply copying a huge file) that takes more than a couple of minutes, I find, by watching the network bandwidth, that after a couple of minutes, the data rate cuts on half and stays that way for the rest of the restore (or copy). If I have a Gb/s connection, maybe I start at 240Mb/s and then drop off to 120MB/s. If I have a 100Mb/s connection, maybe I start at 70Mb/s and drop to 30-40MB/s. It is fairly consistant in how long before the drop and in the magnatude of the drop (approximately 50%).
The network staff have no clue. They say that they have put nothing in place to throttle bandwidth hogs. It doesn't seem to matter which servers the transfer is going between. It doesn't matter if they are plugged into the same switch or go across several switches.
I have googled for any reference to this with no luck. Has anyone else experienced this? Does anyone know the cause?
A couple of possibilities:
Have you observed Perfmon data on the drive you are copying your large file to? If the Average Disk Queue Length is exceeding 2 on a sustaining basis you are saturating the Disk I/O
Check the Read/Write cache ratio on your disk controller(s), if they are set to 100% Read and you are trying to write a large file to disk, it's going to slow it down significantly
Do you have the /3GB switch enabled in the boot.ini file on your server? We found on a number of our servers that large file copies (22GB+) were actually failing because we were exhausiting the Kernel resources on both the Source and Target servers. The resolution in this particular case was to remove the /3GB switch from both the source and target servers and then the large file copies succeeded.
Huge memory paging (2,000 pages/sec +) may also be another area to look into
Check the network card on your source and target servers to make sure they are negiotiating at Full Duplex. I have found that the network folks set the ports on the switch to Auto/Auto and the server SA's set the NIC cards to Forced Full. This causes a negotiation conflict and forces the NIC to negotiate at half duplex or worse
|||One reason I don;t think it is the target drive is that with I restore a DB, before data is read from the backup, the whole target file is written. With permon, I have seen it write out a 40G file at much higher, fairly constant rate than the fastest rate it will write to once it starts loading it with data from the backup. I haven't tried it lately, but I suspect I wouldn't see this throttling behavior if the backup file is on the local machine. Maybe I will try that to confirm.|||The /3GB switch in your boot.ini would help|||
I just tried it between two Win2003x64 servers with 8G of ram each and saw the same phenomenon. Presumably, the /3G option would be moot in this case.
|||Can't believe I found this thread by accident, when I just encountered the same error this morning
I was copying a 12GB SQL bak file from one server to another (identical servers, Windows 2003 R2 32-bit, 8GB RAM with /PAE and /3GB boot.ini, 15K rpm 200GB RAID5 disks, Gigabit NICs both at Auto speed)
The file copy would come to almost dead after a while (it initially displayed 3 minutes remaining, where NIC utilization is at 60%), then about one minute after (the estimate remaining time starts to go up to ~15 minutes, and NIC utilization is at 4~9% only
I saw this thread, remove the /3GB in boot.ini in BOTH servers, restart, re-copy, and the same thing still happened. Now I'm at a loss to what caused this
|||
We have the same problem - win2003x64(AMD) server - 1 with 4G ram the other 1G ram. Coping from the one with 4 to the 1 with 1.
Anyone ever come up with a solution for this? Would be most appreciated.
Thanks!
Mark
|||Regarding restore speed, do you have Windows Instant File Initialization enabled? You need to give the SQL Server Service account the right (in Group Policy Editor, for example), to "Perform Volume Maintenance Tasks". Otherwise, Windows has to zero out the space allocated for the file, which makes the restore take at least 3-4 times longer. You will have to restart the SQL Server Service for this to take effect.|||GlennAlanBerry,
Does this apply to SQL Server 2000 or just 2005? I just checked my SQL service account and it has this right, but (in 2000) it still makes the empty file first. In 2005, I have seen that it starts restoring instantly (determined by seeing that it is pulling data from across the network).
Most of the time, I am restoring over an existing DB, so this doesn't make a whole lot of difference to my speed issues. When it does matter, the writing of the empty file usually goes several times faster than the actual restore. That is what concerns me. With RAID5 arrays on either end and a gigabit connection in between, you would think that the array write speed should be the bottleneck. Even before this odd throttling affect kicks in, it isn't the bottleneck. I did a series of restores last night. Based on the statistics at the end, the smaller DBs (2G and 6G) restored at around 20MB/s, but the larger ones (16G and 47G) restored at around 8MB/s. I wasn't watching the network throughput, but I'm sure if I was, I would have seen the all to familiar pattern of starting off fast and then abruptly dropping to a slower speed after a few minutes.
|||Windows Instant File Initialization is not used by SQL Server 2000
No comments:
Post a Comment