Monday 31 May 2010

Fix - SharePoint Very slow to start after an IISRESET or Recycle of App Pool (30-130 seconds)

Another team at my current client asked me to look at a performance problem that had been causing them major trouble. There were no obvious errors related to it in the Windows Event Log or the SharePoint logs. The symptoms were:
  1. After an application pool recycle, the first page took around 90-120 seconds to be served. This was unacceptable to the client: if the app pool were recycled in the middle of the day, it would mean 2 minutes of downtime for all employees.
  2. The same delay occurred after an IIS reset, and it affected ALL sites, not just one or two.
To diagnose the issue, I did the following:
  1. ANY performance improvement should be measurable, so I used the Fiddler Web Debugger (http://www.fiddler2.com/fiddler2/) to measure the total request time. It was 84 seconds on this particular test server.
  2. I used Sysinternals Process Explorer to see what the threads were doing. This revealed little, but it was clear that the process wasn't at 100% CPU the whole time, so the problem wasn't intensive CPU processing.
  3. I enabled ASP.NET tracing at the application level as per http://msdn.microsoft.com/en-us/library/1y89ed7z(VS.71).aspx and viewed the trace log through http://servername/Pages/Trace.axd. Looking at the load of the control tree, nothing was taking a particularly long time. Even when Trace.axd itself was the first page requested, it took an inordinately long time to start up and be served. This ruled out a slow control being rendered.
  4. I created a completely new web application in SharePoint and it exhibited the same problem. I began to suspect machine-level config settings.
  5. I found and fixed several errors in the Windows Event Log and SharePoint log, but they made no difference.
  6. While testing again I looked at the Fiddler trace and by chance noticed that requests were also being made to an external Microsoft address for code signing certificates. I thought this was unusual, so I did a bit of research and found that the server was checking a certificate revocation list (CRL) on a Microsoft web server. This is done when any of the cryptography calls are performed. Some details can be found here, although the article relates to Exchange specifically:
    http://msexchangeteam.com/archive/2010/05/14/454877.aspx  
  7. To work around the issue, I tried the registry entries suggested by http://msexchangeteam.com/archive/2010/05/14/454877.aspx, but they didn't seem to work. What DID work was adding an entry to the hosts file so that crl.microsoft.com resolves to the local host (127.0.0.1). This meant that the calls to the certificate revocation lists at http://crl.microsoft.com/pki/crl/products/CSPCA.crl and http://crl.microsoft.com/pki/crl/products/CodeSignPCA2.crl fail much more quickly and no longer hold up the loading of applications on the SharePoint server.
  8. After the hosts file change, recycle time (and reset time) went from 84 seconds to 20 seconds.
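
The before/after numbers in steps 1 and 8 came from Fiddler. As a rough alternative, the same measurement could be scripted. The following is a minimal Python sketch, not the method used above: the function name time_first_request is made up for illustration, and the opener parameter exists only so the timing logic can be exercised without a real server.

```python
import time
import urllib.request


def time_first_request(url, opener=urllib.request.urlopen, timeout=300):
    """Return the seconds taken to fetch `url` once, including reading the body."""
    start = time.perf_counter()
    # The timer covers connecting, waiting for the response and reading it.
    with opener(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start
```

Calling time_first_request("http://servername/") immediately after an IISRESET or app pool recycle (with servername replaced by the server under test) gives the figure to compare before and after a change. A site that requires Windows authentication will answer this anonymous request with a 401, which urlopen raises as an HTTPError; the sketch does not handle that case.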
Hopefully this blog entry helps someone else with diagnosing this slowdown problem. Note that this fix only applies if your server doesn't have access to the internet - it is a problem specific to offline or intranet servers.
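
Whether a server falls into that offline category can be checked by seeing if it can open a connection to the CRL host at all. Below is a minimal Python sketch under stated assumptions: can_reach is a hypothetical helper name, port 80 is assumed because the CRL URLs above are plain http, and the 5-second timeout is arbitrary. The connect parameter is only there so the logic can be tested without a network.

```python
import socket


def can_reach(host, port=80, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
    conn.close()
    return True
```

If can_reach("crl.microsoft.com") returns False on the SharePoint server, the CRL lookups are unlikely to succeed there. Run it before applying the hosts file change, since afterwards the name resolves to 127.0.0.1. The check does not model environments where outbound traffic only works through a proxy.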

[UPDATE] - Found that someone else encountered this same issue as per
http://blogs.technet.com/b/markrussinovich/archive/2009/05/26/3244913.aspx and
http://www.muhimbi.com/blog/2009/04/new-approach-to-solve-sharepoints.html


The first article suggests adding the following XML to each application's config file, but I've not tested this:

<?xml version="1.0" encoding="utf-8"?> 
<configuration> 
      <runtime> 
              <generatePublisherEvidence enabled="false"/> 
      </runtime> 
</configuration>

[UPDATE - 11 October 2010]
One of my colleagues from Oakton had a similar issue and the above fix (using the hosts file) didn't work for them.

One of the fixes that did work was to do the following:
"Disable the CRL check by modifying the registry for all user accounts that use STSADM and all service accounts used by SharePoint. Find yourself a group policy wizard or run the vbscript at the end of this posting to help you out. Alternatively you can manually modify the registry for each account:


[HKEY_USERS\<userid>\Software\Microsoft\Windows\CurrentVersion\WinTrust\Trust Providers\Software Publishing]
"State"=dword:00023e00 "

The following script applies the registry change to all users on a server. This will solve the spin-up time for the service accounts, interactive users and new users.


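' Set the WinTrust "State" value for every user hive loaded under HKEY_USERS.
Const HKEY_USERS = &H80000003
strComputer = "."  ' local machine
' Connect to the WMI registry provider.
Set objReg = GetObject("winmgmts:{impersonationLevel=impersonate}!\\" _
    & strComputer & "\root\default:StdRegProv")
' Enumerate the top-level subkeys of HKEY_USERS (one per loaded hive).
strKeyPath = ""
objReg.EnumKey HKEY_USERS, strKeyPath, arrSubKeys
strKeyPath = "\Software\Microsoft\Windows\CurrentVersion\WinTrust\Trust Providers\Software Publishing"
For Each subkey In arrSubKeys
  ' 146944 is the decimal form of dword:00023e00.
  objReg.SetDWORDValue HKEY_USERS, subkey & strKeyPath, "State", 146944
Next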
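
The registry snippet quotes the value as dword:00023e00 while the script passes 146944. These are the same number written in hexadecimal and decimal, which the short Python check below confirms. It only illustrates the arithmetic (the names REG_DWORD_HEX and SCRIPT_DECIMAL are made up here) and does not touch the registry.

```python
# Value as written in the .reg-style snippet (hexadecimal) ...
REG_DWORD_HEX = "00023e00"
# ... and as passed to SetDWORDValue in the VBScript (decimal).
SCRIPT_DECIMAL = 146944

reg_value = int(REG_DWORD_HEX, 16)
print(reg_value == SCRIPT_DECIMAL)     # True
print(f"dword:{SCRIPT_DECIMAL:08x}")   # dword:00023e00
```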



DDK

4 comments:

Unknown said...

great post!

Todd Wilder said...

Bravo

Unknown said...

THANK YOU, THANK YOU, THANK YOU! A post that keeps on giving... 16/08/2013

Your VB script saved my ass big time! I had 100 users suddenly come to a screeching halt and this post got me on the right track.

Henrik said...

Thank you for posting this. Helped a lot.