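About a year ago I had this problem with a 2008 R2 cluster. I asked about it here, but I did not capture enough logs at the time. Now I have the same problem on a SQL Server 2012 failover cluster running on Windows Server 2012 R2, so I am opening a separate question for the "new" problem.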

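I find it hard to believe it is just a coincidence that both clusters have the same problem. But I cannot find a solution, and testing one takes a lot of planning. So I am putting it out here to see whether anyone has ideas.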

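The cluster consists of two physical nodes running Windows Server 2012 R2 Standard with SQL Server 2012 SP2. The SQL Server instance holds 101 databases, ranging in size from 2 MB to 150 GB. Most databases are around 200-300 MB, use the simple recovery model, and see little activity. (The 2008 R2 cluster was very similar, but had about 150 databases.)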

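When I install SP3 on the passive node, it works fine, with no errors. But when I fail over, the cluster brings the storage, server name, file server and DTC resources online, the SQL Server resource stays in Online Pending, and SQL Server Agent is offline. After 10 minutes it changes the SQL Server resource to Failed and fails back to the other node.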

Log Name:      System
Source:        Microsoft-Windows-Security-Kerberos
Date:          -
Event ID:      4
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      ACTIVE_NODE.domain.se
Description:
The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server PASSIVE_NODE$. The target name used was RPCSS/CLUSTER_NAME.domain.se. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Ensure that the target SPN is only registered on the account used by the server. This error can also happen if the target service account password is different than what is configured on the Kerberos Key Distribution Center for that target service. Ensure that the service on the server and the KDC are both configured to use the same password. If the server name is not fully qualified, and the target domain (DOMAIN.SE) is different from the client domain (DOMAIN.SE), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Kerberos" Guid="{98E6CFCB-EE0A-41E0-A57B-622D4E1B30B1}" EventSourceName="Kerberos" />
    <EventID Qualifiers="16384">4</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-02-23T20:21:01.000000000Z" />
    <EventRecordID>1806734</EventRecordID>
    <Correlation />
    <Execution ProcessID="0" ThreadID="0" />
    <Channel>System</Channel>
    <Computer>ACTIVE_NODE.domain.se</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="Server">PASSIVE_NODE$</Data>
    <Data Name="TargetRealm">DOMAIN.SE</Data>
    <Data Name="Targetname">RPCSS/CLUSTER_NAME.domain.se</Data>
    <Data Name="ClientRealm">domain.SE</Data>
    <Binary>
    </Binary>
  </EventData>
</Event>

There is also this one:

Log Name:      System
Source:        Microsoft-Windows-Security-Kerberos
Date:          -
Event ID:      4
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PASSIVE_NODE.domain.se
Description:
The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server ACTIVE_NODE$. The target name used was cifs/CLUSTER_NAME.domain.se. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Ensure that the target SPN is only registered on the account used by the server. This error can also happen if the target service account password is different than what is configured on the Kerberos Key Distribution Center for that target service. Ensure that the service on the server and the KDC are both configured to use the same password. If the server name is not fully qualified, and the target domain (DOMAIN.SE) is different from the client domain (DOMAIN.SE), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Kerberos" Guid="{98E6CFCB-EE0A-41E0-A57B-622D4E1B30B1}" EventSourceName="Kerberos" />
    <EventID Qualifiers="16384">4</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-02-23T20:19:57.000000000Z" />
    <EventRecordID>1735401</EventRecordID>
    <Correlation />
    <Execution ProcessID="0" ThreadID="0" />
    <Channel>System</Channel>
    <Computer>PASSIVE_NODE.domain.se</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="Server">ACTIVE_NODE$</Data>
    <Data Name="TargetRealm">domain.SE</Data>
    <Data Name="Targetname">cifs/CLUSTER_NAME.domain.se</Data>
    <Data Name="ClientRealm">domain.SE</Data>
    <Binary>
    </Binary>
  </EventData>
</Event>

I added all the SPNs it complained about:

..
setspn -S cifs/CLUSTER_NAME.domain.se CLUSTER_NAME
Checking domain DC=domain,DC=se
Registering ServicePrincipalNames for CN=CLUSTER_NAME,OU=clustername,OU=Servers,DC=domain,DC=se
        cifs/CLUSTER_NAME.domain.se
Updated object
..
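To collect the SPNs that Kerberos Event ID 4 entries like the ones above complain about, the exported event XML can be parsed instead of read by eye. This is only a minimal sketch: the function name `kerberos_fields` and the trimmed `SAMPLE` event are illustrative, and it assumes each event has been exported as a standalone `<Event>` document in the same shape as the dumps in this question.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <Event> element in the dumps above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def kerberos_fields(event_xml: str) -> dict:
    """Return the named <Data> fields (Server, Targetname, realms) of one event."""
    root = ET.fromstring(event_xml)
    fields = {}
    for data in root.findall("./e:EventData/e:Data", NS):
        name = data.get("Name")
        if name:  # unnamed <Data> elements are skipped
            fields[name] = (data.text or "").strip()
    return fields


# Trimmed copy of the second Kerberos event shown above (illustrative input).
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID Qualifiers="16384">4</EventID>
    <Computer>PASSIVE_NODE.domain.se</Computer>
  </System>
  <EventData>
    <Data Name="Server">ACTIVE_NODE$</Data>
    <Data Name="TargetRealm">domain.SE</Data>
    <Data Name="Targetname">cifs/CLUSTER_NAME.domain.se</Data>
    <Data Name="ClientRealm">domain.SE</Data>
    <Binary>
    </Binary>
  </EventData>
</Event>"""

if __name__ == "__main__":
    info = kerberos_fields(SAMPLE)
    # Shows which SPN was rejected and which machine account rejected it.
    print(info["Targetname"], "rejected by", info["Server"])
```

Running this over every exported event gives the list of target names to compare against what `setspn` has registered.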

Other entries from the error logs:

Log Name:      Application
Source:        Application Error
Date:          -
Event ID:      1000
Task Category: (100)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PASSIVE_NODE.ltdalarna.se
Description:
Faulting application name: rhs.exe, version: 6.3.9600.17396, time stamp: 0x5434e29b
Faulting module name: KERNELBASE.dll, version: 6.3.9600.18202, time stamp: 0x569e7eb1
Exception code: 0x80000003
Fault offset: 0x00000000000de0e2
Faulting process id: 0x206c
Faulting application start time: 0x01d16e778b9bb4fb
Faulting application path: C:\Windows\Cluster\rhs.exe
Faulting module path: C:\Windows\system32\KERNELBASE.dll
Report Id: 4459c209-da6b-11e5-80d8-fc15b41e47f0
Faulting package full name: 
Faulting package-relative application ID: 
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Application Error" />
    <EventID Qualifiers="0">1000</EventID>
    <Level>2</Level>
    <Task>100</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-02-23T20:23:15.000000000Z" />
    <EventRecordID>502077</EventRecordID>
    <Channel>Application</Channel>
    <Computer>PASSIVE_NODE.domain.se</Computer>
    <Security />
  </System>
  <EventData>
    <Data>rhs.exe</Data>
    <Data>6.3.9600.17396</Data>
    <Data>5434e29b</Data>
    <Data>KERNELBASE.dll</Data>
    <Data>6.3.9600.18202</Data>
    <Data>569e7eb1</Data>
    <Data>80000003</Data>
    <Data>00000000000de0e2</Data>
    <Data>206c</Data>
    <Data>01d16e778b9bb4fb</Data>
    <Data>C:\Windows\Cluster\rhs.exe</Data>
    <Data>C:\Windows\system32\KERNELBASE.dll</Data>
    <Data>4459c209-da6b-11e5-80d8-fc15b41e47f0</Data>
    <Data>
    </Data>
    <Data>
    </Data>
  </EventData>
</Event>

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          -
Event ID:      1146
Task Category: Resource Control Manager
Level:         Critical
Keywords:      
User:          SYSTEM
Computer:      PASSIVE_NODE.domain.se
Description:
The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is causing the issue.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
    <EventID>1146</EventID>
    <Version>0</Version>
    <Level>1</Level>
    <Task>3</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2016-02-23T19:36:32.356702900Z" />
    <EventRecordID>1735312</EventRecordID>
    <Correlation />
    <Execution ProcessID="3292" ThreadID="7588" />
    <Channel>System</Channel>
    <Computer>PASSIVE_NODE.domain.se</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="NodeName">PASSIVE_NODE</Data>
  </EventData>
</Event>

Regarding this one, I tried increasing the maximum failures value for the resource, with no luck:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          -
Event ID:      1254
Task Category: Resource Control Manager
Level:         Error
Keywords:      
User:          SYSTEM
Computer:      PASSIVE_NODE.ltdalarna.se
Description:
Clustered role 'SQL Server (MSSQLSERVER)' has exceeded its failover threshold.  It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state.  No additional attempts will be made to bring the role online or fail it over to another node in the cluster.  Please check the events associated with the failure.  After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
    <EventID>1254</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>3</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2016-02-23T19:13:16.839580300Z" />
    <EventRecordID>1735228</EventRecordID>
    <Correlation />
    <Execution ProcessID="3292" ThreadID="7432" />
    <Channel>System</Channel>
    <Computer>PASSIVE_NODE.domain.se</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="ResourceGroup">SQL Server (MSSQLSERVER)</Data>
  </EventData>
</Event>

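Then there is a bunch of errors about opening log files. I tried adding permissions on that folder for the account that the SQL Server resource runs under, and I still get these: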

Log Name:      Application
Source:        ESENT
Date:          -
Event ID:      490
Task Category: General
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PASSIVE_NODE.domain.se
Description:
msmdsrv (5744) An attempt to open the file "C:\Windows\system32\LogFiles\Sum\Api.chk" for read / write access failed with system error 5 (0x00000005): "Access is denied. ".  The open file operation will fail with error -1032 (0xfffffbf8).
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ESENT" />
    <EventID Qualifiers="0">490</EventID>
    <Level>2</Level>
    <Task>1</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-02-23T19:33:49.000000000Z" />
    <EventRecordID>501908</EventRecordID>
    <Channel>Application</Channel>
    <Computer>PASSIVE_NODE.domain.se</Computer>
    <Security />
  </System>
  <EventData>
    <Data>msmdsrv</Data>
    <Data>5744</Data>
    <Data>
    </Data>
    <Data>C:\Windows\system32\LogFiles\Sum\Api.chk</Data>
    <Data>-1032 (0xfffffbf8)</Data>
    <Data>5 (0x00000005)</Data>
    <Data>Access is denied. </Data>
  </EventData>
</Event>

Log Name:      Application
Source:        ESENT
Date:          -
Event ID:      489
Task Category: General
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PASSIVE_NODE.domain.se
Description:
msmdsrv (5744) An attempt to open the file "C:\Windows\system32\LogFiles\Sum\Api.log" for read only access failed with system error 5 (0x00000005): "Access is denied. ".  The open file operation will fail with error -1032 (0xfffffbf8).
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ESENT" />
    <EventID Qualifiers="0">489</EventID>
    <Level>2</Level>
    <Task>1</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-02-23T19:33:59.000000000Z" />
    <EventRecordID>501909</EventRecordID>
    <Channel>Application</Channel>
    <Computer>PASSIVE_NODE.domain.se</Computer>
    <Security />
  </System>
  <EventData>
    <Data>msmdsrv</Data>
    <Data>5744</Data>
    <Data>
    </Data>
    <Data>C:\Windows\system32\LogFiles\Sum\Api.log</Data>
    <Data>-1032 (0xfffffbf8)</Data>
    <Data>5 (0x00000005)</Data>
    <Data>Access is denied. </Data>
  </EventData>
</Event>

Log Name:      Application
Source:        ESENT
Date:          2016-02-23 20:33:59
Event ID:      455
Task Category: Logging/Recovery
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PASSIVE_NODE.domain.se
Description:
msmdsrv (5744) Error -1032 (0xfffffbf8) occurred while opening logfile C:\Windows\system32\LogFiles\Sum\Api.log.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ESENT" />
    <EventID Qualifiers="0">455</EventID>
    <Level>2</Level>
    <Task>3</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-02-23T19:33:59.000000000Z" />
    <EventRecordID>501910</EventRecordID>
    <Channel>Application</Channel>
    <Computer>PASSIVE_NODE.domain.se</Computer>
    <Security />
  </System>
  <EventData>
    <Data>msmdsrv</Data>
    <Data>5744</Data>
    <Data>
    </Data>
    <Data>C:\Windows\system32\LogFiles\Sum\Api.log</Data>
    <Data>-1032 (0xfffffbf8)</Data>
  </EventData>
</Event>

Log Name:      Application
Source:        ESENT
Date:          -
Event ID:      489
Task Category: General
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PASSIVE_NODE.domain.se
Description:
msmdsrv (5744) An attempt to open the file "C:\Windows\system32\LogFiles\Sum\Api.log" for read only access failed with system error 5 (0x00000005): "Access is denied. ".  The open file operation will fail with error -1032 (0xfffffbf8).
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ESENT" />
    <EventID Qualifiers="0">489</EventID>
    <Level>2</Level>
    <Task>1</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-02-23T19:34:09.000000000Z" />
    <EventRecordID>501911</EventRecordID>
    <Channel>Application</Channel>
    <Computer>PASSIVE_NODE.domain.se</Computer>
    <Security />
  </System>
  <EventData>
    <Data>msmdsrv</Data>
    <Data>5744</Data>
    <Data>
    </Data>
    <Data>C:\Windows\system32\LogFiles\Sum\Api.log</Data>
    <Data>-1032 (0xfffffbf8)</Data>
    <Data>5 (0x00000005)</Data>
    <Data>Access is denied. </Data>
  </EventData>
</Event>

Log Name:      Application
Source:        ESENT
Date:          -
Event ID:      455
Task Category: Logging/Recovery
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PASSIVE_NODE.domain.se
Description:
msmdsrv (5744) Error -1032 (0xfffffbf8) occurred while opening logfile C:\Windows\system32\LogFiles\Sum\Api.log.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ESENT" />
    <EventID Qualifiers="0">455</EventID>
    <Level>2</Level>
    <Task>3</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-02-23T19:34:09.000000000Z" />
    <EventRecordID>501912</EventRecordID>
    <Channel>Application</Channel>
    <Computer>PASSIVE_NODE.domain.se</Computer>
    <Security />
  </System>
  <EventData>
    <Data>msmdsrv</Data>
    <Data>5744</Data>
    <Data>
    </Data>
    <Data>C:\Windows\system32\LogFiles\Sum\Api.log</Data>
    <Data>-1032 (0xfffffbf8)</Data>
  </EventData>
</Event>

These also show up, but they appear regardless of whether the service pack is installed:

Log Name:      System
Source:        Microsoft-Windows-DistributedCOM
Date:          -
Event ID:      10016
Task Category: None
Level:         Error
Keywords:      Classic
User:          DOMAIN\SQL_AD_ACCOUNT
Computer:      ACTIVE_NODE.domain.se
Description:
The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID 
{FDC3723D-1588-4BA3-92D4-42C430735D7D}
 and APPID 
{83B33982-693D-4824-B42E-7196AE61BB05}
 to the user LTDALARNA\sys309 SID (S-1-5-21-910452376-877226765-825688854-92084) from address LocalHost (Using LRPC) running in the application container Unavailable SID (Unavailable). This security permission can be modified using the Component Services administrative tool.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-DistributedCOM" Guid="{1B562E86-B7AA-4131-BADC-B6F3A001407E}" EventSourceName="DCOM" />
    <EventID Qualifiers="0">10016</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8080000000000000</Keywords>
    <TimeCreated SystemTime="2016-02-23T19:40:01.178905000Z" />
    <EventRecordID>1806578</EventRecordID>
    <Correlation />
    <Execution ProcessID="976" ThreadID="19656" />
    <Channel>System</Channel>
    <Computer>ACTIVE_NODE.domain.se</Computer>
    <Security UserID="S-1-5-21-910452376-877226765-825688854-92084" />
  </System>
  <EventData>
    <Data Name="param1">application-specific</Data>
    <Data Name="param2">Local</Data>
    <Data Name="param3">Activation</Data>
    <Data Name="param4">{FDC3723D-1588-4BA3-92D4-42C430735D7D}</Data>
    <Data Name="param5">{83B33982-693D-4824-B42E-7196AE61BB05}</Data>
    <Data Name="param6">DOMAIN</Data>
    <Data Name="param7">sys309</Data>
    <Data Name="param8">S-1-5-21-910452376-877226765-825688854-92084</Data>
    <Data Name="param9">LocalHost (Using LRPC)</Data>
    <Data Name="param10">Unavailable</Data>
    <Data Name="param11">Unavailable</Data>
  </EventData>
</Event>

I have also looked through the Windows cluster log (Get-ClusterLog) and could not find anything that stands out.

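Since this happens on two servers that each hold 100+ databases, could it be that the upgrade takes too long, so that the Windows cluster gets impatient and decides the resource has failed?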

I have read this article: https://blogs.msdn.microsoft.com/clustering/2013/01/24/understanding-how-failover-clustering-recovers-from-unresponsive-resources/ and tried doubling the deadlock timeout value, with no luck.

Any ideas, anyone? I am treading water here.

Solution

After a long time I found the cause. It was the 1 million+ files in the MSSQL Log folder.
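For anyone wanting to check whether their own Log folder is in the same state, here is a minimal sketch that counts the files without loading the whole listing into memory. The function name `count_files` is illustrative, and the folder path has to be supplied by you, since the actual path is not given in this post.

```python
import os


def count_files(folder: str) -> int:
    """Count regular files directly inside `folder` (subfolders are not descended)."""
    total = 0
    # os.scandir streams directory entries, which matters with millions of files.
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += 1
    return total


# Example call (hypothetical path, replace with your instance's Log folder):
# print(count_files(r"D:\path\to\MSSQL\Log"))
```

A count in the hundreds of thousands or more would be a hint that the folder deserves a closer look before the next service pack failover.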

After setting up a job that purges that folder, failover after the SP installation works fine.
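The job itself is not shown here, so the following is only a sketch of what such a purge could look like, not the job that was actually used. It assumes that age-based cleanup is acceptable; the function name `purge_old_files`, the retention period, and the decision to consider every file in the folder are all assumptions to adapt (you will likely want to restrict it to the file types that are piling up).

```python
import os
import time

SECONDS_PER_DAY = 86400


def purge_old_files(folder, max_age_days, dry_run=True):
    """Return files older than `max_age_days`; delete them unless `dry_run` is set."""
    cutoff = time.time() - max_age_days * SECONDS_PER_DAY
    # Collect candidates first so nothing is deleted while the folder is being scanned.
    with os.scandir(folder) as entries:
        candidates = [
            entry.path
            for entry in entries
            if entry.is_file(follow_symlinks=False)
            and entry.stat(follow_symlinks=False).st_mtime < cutoff
        ]
    if dry_run:
        return candidates
    deleted = []
    for path in candidates:
        try:
            os.remove(path)
            deleted.append(path)
        except OSError:
            # Typically a file that is still open by a running process; leave it.
            pass
    return deleted


# Example (hypothetical path and retention): list first, then delete.
# print(len(purge_old_files(r"D:\path\to\MSSQL\Log", 30)))
# purge_old_files(r"D:\path\to\MSSQL\Log", 30, dry_run=False)
```

The default `dry_run=True` is deliberate: it lets you review what would be removed before scheduling the real deletion.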

I have confirmed this solution on this 2012 cluster and on the 2008 R2 cluster where we had the same problem.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange