NTLM Connection Timeout due to Domain Controller

Malcolm Stewart edited this page Aug 2, 2021 · 8 revisions

The Players

    IP Address  Computer Role
    ----------  ------------------------------
    10.10.10.1  DC01
    10.10.10.2  DC02
    10.10.10.3  Client
    10.10.10.4  SQL Server virtual IP address
    10.10.10.5  SQL Server physical IP address

Symptom

Intermittently, the client application would get a login timeout error:

[Microsoft][SQL Server Native Client 11.0]Login timeout expired

Data Collection

We captured a network trace and ran it through the SQL Network Analyzer program.
A number of login timeout errors occurred while the network trace was being collected.

SQLNA Report Analysis

Trace was probably taken on this IP address: 10.10.10.4, MAC Addr 001DD8A7211B, (80%)
Trace was probably taken on this IP address: 10.10.10.5, MAC Addr 001DD8A7211B, (20%)

The network trace was taken on a machine with two IP addresses and the MAC address matches. The first address matches the SQL Server IP address:

    IP Address   HostName       Port  ServerPipe  Version      Files  Clients  Conversations  Kerb Conv  NTLM Conv  MARS Conv  non-TLS 1.2 Conv  Redirected Conv  Frames       Bytes  Resets  Retransmits  IsClustered
    -----------  -------------  ----  ----------  -----------  -----  -------  -------------  ---------  ---------  ---------  ----------------  ---------------  ------  ----------  ------  -----------  -----------
    10.10.10.4   SQLPROD01\v01  1433              13.0.17.122      0        6             77          0         37          0                 0                0  114366  95,275,362       6          354             

The server is a named instance on port 1433; most likely SQL Server is clustered and 10.10.10.4 is the cluster virtual IP address. Many of the conversations are using NTLM to authenticate the user.

There are two domain controllers visible in the network trace:

    IP Address  Files  Clients  Conversations  Kerb Conv  DNS Conv  LDAP Conv  MSRPC Conv  MSRPC Port  Frames    Bytes
    ----------  -----  -------  -------------  ---------  --------  ---------  ----------  ----------  ------  -------
    10.10.10.1      0        2             80          0        61          0           5       49673     448  104,027
    10.10.10.2      0        1             17          4         0          5           7       49673     292  111,353

There were a number of SQL Server conversations that resulted in a network reset:

The following conversations with SQL Server 10.10.10.4 on port 1433 were reset:

    NETMON Filter (Client conv.)                  Files  Reset Frame  Start Offset  End Offset         End Time  Frames   Duration  Who Reset  Flags  Keep-Alives  KA Timeout  Retransmits  Max RT
    --------------------------------------------  -----  -----------  ------------  ----------  ---------------  ------  ---------  ---------  -----  -----------  ----------  -----------  ------
    IPV4.Address==10.10.10.3 AND tcp.port==57714      0        12659     11.537988   32.585142  10:35:47.380 AM      18  21.047154  Client     A.R..            0           0            0       0
    IPV4.Address==10.10.10.3 AND tcp.port==57719      0        12872     12.639515   33.676661  10:35:48.472 AM      18  21.037146  Client     A.R..            0           0            0       0
    IPV4.Address==10.10.10.3 AND tcp.port==57726      0        13683     26.267277   35.293956  10:35:50.089 AM      18   9.026679  Client     A.R..            0           0            0       0
    IPV4.Address==10.10.10.3 AND tcp.port==57727      0        13830     27.376348   36.402662  10:35:51.198 AM      18   9.026314  Client     A.R..            0           0            0       0
    IPV4.Address==10.10.10.3 AND tcp.port==57722      0        13879     15.843832   36.871136  10:35:51.666 AM      19  21.027304  Client     A.R..            0           0            1       1
    IPV4.Address==10.10.10.3 AND tcp.port==57723      0        15859     23.251253   44.278744  10:35:59.074 AM      18  21.027491  Client     A.R..            0           0            0       0

    Distribution of RESET connections.

    81+|                                                                                                                                                      
    27+|                                                                                                                                                      
     9+|                                                                                                                                                      
     3+|                                                                                                                                                      
     1+|                   XXXX   X                                                                                                                           
       |---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|

The conversations all cluster together, indicating there was probably a systematic issue that resulted in multiple failures, and that it cleared up on its own.

There were a number of login failures:

The following conversations with SQL Server 10.10.10.4 on port 1433 timed out or were closed prior to completing the login process or had a login error:

    NETMON Filter (Client conv.)                  Files  Last Frame  Start Offset  End Offset         End Time  Frames   Duration  Login Progress                      Keep-Alives  Retransmits  DHE  NullCreds  LoginAck  Error
    --------------------------------------------  -----  ----------  ------------  ----------  ---------------  ------  ---------  ----------------------------------  -----------  -----------  ---  ---------  --------  -----
    IPV4.Address==10.10.10.3 AND tcp.port==57714      0       12659     11.537988   32.585142  10:35:47.380 AM      18  21.047154  S PL PR CH SH    CE AD NC NR                  0            0                  Late           
    IPV4.Address==10.10.10.3 AND tcp.port==57719      0       12872     12.639515   33.676661  10:35:48.472 AM      18  21.037146  S PL PR CH SH    CE AD NC NR                  0            0                  Late           
    IPV4.Address==10.10.10.3 AND tcp.port==57726      0       13683     26.267277   35.293956  10:35:50.089 AM      18   9.026679  S PL PR CH SH    CE AD NC NR                  0            0                  Late           
    IPV4.Address==10.10.10.3 AND tcp.port==57727      0       13830     27.376348   36.402662  10:35:51.198 AM      18   9.026314  S PL PR CH SH    CE AD NC NR                  0            0                  Late           
    IPV4.Address==10.10.10.3 AND tcp.port==57722      0       13879     15.843832   36.871136  10:35:51.666 AM      19  21.027304  S PL PR CH SH    CE AD NC NR                  0            1                  Late           
    IPV4.Address==10.10.10.3 AND tcp.port==57723      0       15859     23.251253   44.278744  10:35:59.074 AM      18  21.027491  S PL PR CH SH    CE AD NC NR                  0            0                  Late           
  • These are on the same connections that got reset.
  • The NC and NR entries in the Login Progress column mean the connection was using NTLM.
  • The LoginAck column shows "Late" for all entries, meaning that the SQL Server successfully logged the user in, but that it took a while and the client timed out the connection.
  • You can see the Duration for many of the connection attempts is 21 seconds, which is longer than the default of 15 seconds for the connection timeout value.

This is further confirmed in the Slow Login report:

The following conversations with SQL Server 10.10.10.4 on port 1433 took more than 2 seconds to login or error out:
Login progress durations are in milliseconds.

    NETMON Filter (Client conv.)                  Files  Start Offset  End Offset         End Time  Frames    Duration  AS  PL  PR  CH  SH  KE  CE  AD  SS  NC  NR     LA  ER  Keep-Alives  Retransmits
    --------------------------------------------  -----  ------------  ----------  ---------------  ------  ----------  --  --  --  --  --  --  --  --  --  --  --  -----  --  -----------  -----------
    IPV4.Address==10.10.10.3 AND tcp.port==57714      0     11.537988   32.585142  10:35:47.380 AM      18   21.047154   0   0   0   0   0       0   0       0   0  21042                0            0
    IPV4.Address==10.10.10.3 AND tcp.port==57719      0     12.639515   33.676661  10:35:48.472 AM      18   21.037146   0   0   0   0   0       0   0       0   0  21032                0            0
    IPV4.Address==10.10.10.3 AND tcp.port==57726      0     26.267277   35.293956  10:35:50.089 AM      18    9.026679   0   0   0   0   0       0   0       0   0   9022                0            0
    IPV4.Address==10.10.10.3 AND tcp.port==57727      0     27.376348   36.402662  10:35:51.198 AM      18    9.026314   0   0   0   0   0       0   1       1   2   9018                0            0
    IPV4.Address==10.10.10.3 AND tcp.port==57722      0     15.843832   36.871136  10:35:51.666 AM      19   21.027304   0   0   0   0   0       0   0       0   0  21021                0            1
    IPV4.Address==10.10.10.3 AND tcp.port==57723      0     23.251253   44.278744  10:35:59.074 AM      18   21.027491   0   0   0   0   0       1   0       0   0  21022                0            0
    IPV4.Address==10.10.10.3 AND tcp.port==57728      0     30.580409  236.074560  10:39:10.870 AM      65  205.494151   0   0   0   0   0       0   0       0   0   2998               12            0

The Login Ack column (LA) shows the long durations (in ms). All other parts of the login are without appreciable delay.

There are also a number of failed connections to DC01. They precede the end of the SQL Server logins by 12 seconds (End Time column).

The 12 seconds is due to how the SYN timeout works.

  • The first SYN packet is sent. If no response in 3 seconds, then
  • The second SYN packet is sent. If no response in 6 seconds, then
  • The third SYN packet is sent. If no response in 12 seconds, then
  • Assume the connection is bad.

What we see in the network trace is the duration of 9 seconds, the time between the first SYN packet and the last SYN packet (3 + 6). The end time is 12 seconds prior to the SQL Server Login Timeout because we still wait 12 seconds for a response even though this does not show in the network trace.
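
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the doubling schedule described in the list (3, 6, then 12 seconds); the function name `syn_schedule` and its parameters are hypothetical, not part of Windows or of SQL Network Analyzer.

```python
def syn_schedule(initial_timeout=3.0, attempts=3):
    """Return (send_times, give_up_time) for a SYN timer that doubles after each attempt.

    The defaults (3-second first wait, 3 SYN packets) come from the list above;
    other systems may use different values.
    """
    send_times = []
    elapsed = 0.0
    timeout = initial_timeout
    for _ in range(attempts):
        send_times.append(elapsed)  # a SYN goes out at this offset
        elapsed += timeout          # wait for a response
        timeout *= 2                # the next wait is twice as long
    return send_times, elapsed


send_times, give_up = syn_schedule()
visible_span = send_times[-1] - send_times[0]  # first SYN to last SYN, as seen in the trace
silent_wait = give_up - send_times[-1]         # final wait, which leaves no frame in the trace

print("SYN packets sent at:", send_times)
print("Span visible in trace:", visible_span, "seconds")
print("Silent wait after last SYN:", silent_wait, "seconds")
print("Connection attempt abandoned after:", give_up, "seconds")
```

Under these assumptions the visible span is 9 seconds, the silent wait is 12 seconds, and the attempt is abandoned after 21 seconds, which lines up with the 21-second login durations reported earlier.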

The following conversations with Domain Controller 10.10.10.1 failed to connect:

    DC port  NETMON Filter (Client conv.)                  Files  Last Frame  Start Offset  End Offset         End Time  Frames  Duration
    -------  --------------------------------------------  -----  ----------  ------------  ----------  ---------------  ------  --------
      49673  IPV4.Address==10.10.10.5 AND tcp.port==58135      0        7870     11.544119   20.573426  10:35:35.369 AM       3  9.029307
      49673  IPV4.Address==10.10.10.5 AND tcp.port==58136      0        8098     12.644529   21.667155  10:35:36.462 AM       3  9.022626
      49673  IPV4.Address==10.10.10.5 AND tcp.port==58139      0        8858     15.849071   24.854640  10:35:39.650 AM       3  9.005569
      49673  IPV4.Address==10.10.10.5 AND tcp.port==58140      0       12568     23.256426   32.264624  10:35:47.060 AM       3  9.008198

The traffic to the DC comes from SQL Server's physical IP address.
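
The 12-second gap can be checked directly from the End Time columns of the two reports. The sketch below is a minimal Python illustration; pairing each DC conversation with a SQL Server conversation by their near-identical Start Offset values is an assumption made here, and the names `pairs` and `gap_seconds` are hypothetical.

```python
from datetime import datetime

FMT = "%I:%M:%S.%f %p"  # matches End Time values such as "10:35:35.369 AM"

# (DC client port, DC End Time, SQL client port, SQL End Time).
# Rows are paired by matching Start Offset in the two reports (an assumption).
pairs = [
    (58135, "10:35:35.369 AM", 57714, "10:35:47.380 AM"),
    (58136, "10:35:36.462 AM", 57719, "10:35:48.472 AM"),
    (58139, "10:35:39.650 AM", 57722, "10:35:51.666 AM"),
    (58140, "10:35:47.060 AM", 57723, "10:35:59.074 AM"),
]


def gap_seconds(dc_end, sql_end):
    """Seconds between the last SYN to the DC and the end of the SQL login."""
    delta = datetime.strptime(sql_end, FMT) - datetime.strptime(dc_end, FMT)
    return delta.total_seconds()


gaps = {
    (dc_port, sql_port): gap_seconds(dc_end, sql_end)
    for dc_port, dc_end, sql_port, sql_end in pairs
}

for (dc_port, sql_port), gap in gaps.items():
    print(f"DC port {dc_port} -> SQL port {sql_port}: {gap:.3f} s")
```

Each pair comes out at just over 12 seconds, consistent with the final SYN wait that does not appear in the trace.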

Network Trace Exploration

Filtering on a failing SQL Server conversation, we can see the details of the failed conversation.

Filter: IPV4.Address==10.10.10.3 AND tcp.port==57714

Frame Time Offset Source IP  Dest IP    Description
----- ----------- ---------- ---------- -----------------------------------------------------------------------------------
4862  11.5387390  10.10.10.3 10.10.10.4 TCP:Flags=CE....S., SrcPort=57714, DstPort=1433, PayloadLen=0, Seq=4291850407, Ack=
4863  11.5388180  10.10.10.4 10.10.10.3 TCP: [Bad CheckSum]Flags=.E.A..S., SrcPort=1433, DstPort=57714, PayloadLen=0, Seq=5
4864  11.5392940  10.10.10.3 10.10.10.4 TCP:Flags=...A...., SrcPort=57714, DstPort=1433, PayloadLen=0, Seq=4291850408, Ack=
4865  11.5395120  10.10.10.3 10.10.10.4 TDS:Prelogin, Version = 7.4 (0x74000004), SPID = 0, PacketID = 1, Flags=...AP..., S
4866  11.5396560  10.10.10.4 10.10.10.3 TDS:Response, Version = 7.4 (0x74000004), SPID = 0, PacketID = 1, Flags=...AP..., S
4867  11.5403920  10.10.10.3 10.10.10.4 TLS:TLS Rec Layer-1 HandShake: Client Hello. {TLS:69, SSLVersionSelector:68, TDS:67
4868  11.5406890  10.10.10.4 10.10.10.3 TLS:TLS Rec Layer-1 HandShake: Server Hello.; TLS Rec Layer-2 Cipher Change Spec; T
4869  11.5412550  10.10.10.3 10.10.10.4 TLS:TLS Rec Layer-1 Cipher Change Spec; TLS Rec Layer-2 HandShake: Encrypted Handsh
4870  11.5418480  10.10.10.3 10.10.10.4 TDS:Data, Version = 7.4 (0x74000004), Reassembled Packet {TDS:67, TCP:66, IPv4:23}
4871  11.5418630  10.10.10.4 10.10.10.3 TCP: [Bad CheckSum]Flags=...A...., SrcPort=1433, DstPort=57714, PayloadLen=0, Seq=5
4872  11.5421990  10.10.10.4 10.10.10.3 NLMP:NTLM CHALLENGE MESSAGE {TDS:67, TCP:66, IPv4:23}
4873  11.5428260  10.10.10.3 10.10.10.4 NLMP:NTLM AUTHENTICATE MESSAGE Version:NTLM v2, Domain: CONTOSO, User: admin.CONTOS
4879  11.5585300  10.10.10.4 10.10.10.3 TCP: [Bad CheckSum]Flags=...A...., SrcPort=1433, DstPort=57714, PayloadLen=0, Seq=5

Everything above is a normal NTLM login.
The client gives SQL Server only 1 second to validate the login, regardless of the Connection Timeout, and closes the connection in frame 5020. The server OS ACKs the closure request in frame 5021.

5020  12.5301940  10.10.10.3 10.10.10.4 TCP:Flags=...A...F, SrcPort=57714, DstPort=1433, PayloadLen=0, Seq=4291851642, Ack=
5021  12.5302260  10.10.10.4 10.10.10.3 TCP: [Bad CheckSum]Flags=...A...., SrcPort=1433, DstPort=57714, PayloadLen=0, Seq=5

SQL Server responds 20 seconds later in frame 12657, a total of 21 seconds after the client sent the NTLM AUTHENTICATE message.
12657 32.5851940  10.10.10.4 10.10.10.3 TDS:Response, Version = 7.4 (0x74000004), SPID = 108, PacketID = 1, Flags=...AP...,

The server OS responds to the close request.
12658 32.5852740  10.10.10.4 10.10.10.3 TCP: [Bad CheckSum]Flags=...A...F, SrcPort=1433, DstPort=57714, PayloadLen=0, Seq=5

Since frame 12657 was not a close request, the client resets the connection.
12659 32.5858930  10.10.10.3 10.10.10.4 TCP:Flags=...A.R.., SrcPort=57714, DstPort=1433, PayloadLen=0, Seq=4291851643, Ack=
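
The timings quoted above can be recomputed from the Time Offset column. The following Python sketch uses only the offsets of frames 4873, 5020, and 12657 from the trace; the variable names are illustrative.

```python
# Time Offset values copied from the trace above, keyed by frame number.
offsets = {
    4873: 11.5428260,   # client sends NTLM AUTHENTICATE
    5020: 12.5301940,   # client sends FIN (gives up on the login)
    12657: 32.5851940,  # SQL Server finally sends the login response
}

client_wait = offsets[5020] - offsets[4873]       # how long the client waited
login_ack_delay = offsets[12657] - offsets[4873]  # how long SQL Server took to answer
late_by = offsets[12657] - offsets[5020]          # how long after the FIN the answer arrived

print(f"Client waited:        {client_wait:.3f} s")
print(f"Login response after: {login_ack_delay:.3f} s")
print(f"Response was late by: {late_by:.3f} s")
```

The login response delay works out to roughly 21,042 ms, which matches the LA value shown for port 57714 in the Slow Login report.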

Filtering on the same conversation and adding the domain controller traffic, we can see more:

Filter: (IPV4.Address==10.10.10.3 AND tcp.port==57714) OR (IPV4.Address==10.10.10.5 AND IPV4.Address==10.10.10.1)

Frame Time Offset Source IP  Dest IP    Description
----- ----------- ---------- ---------- -----------------------------------------------------------------------------------
4862  11.5387390  10.10.10.3 10.10.10.4 TCP:Flags=CE....S., SrcPort=57714, DstPort=1433, PayloadLen=0, Seq=4291850407, Ack=
4863  11.5388180  10.10.10.4 10.10.10.3 TCP: [Bad CheckSum]Flags=.E.A..S., SrcPort=1433, DstPort=57714, PayloadLen=0, Seq=5
4864  11.5392940  10.10.10.3 10.10.10.4 TCP:Flags=...A...., SrcPort=57714, DstPort=1433, PayloadLen=0, Seq=4291850408, Ack=
4865  11.5395120  10.10.10.3 10.10.10.4 TDS:Prelogin, Version = 7.4 (0x74000004), SPID = 0, PacketID = 1, Flags=...AP..., S
4866  11.5396560  10.10.10.4 10.10.10.3 TDS:Response, Version = 7.4 (0x74000004), SPID = 0, PacketID = 1, Flags=...AP..., S
4867  11.5403920  10.10.10.3 10.10.10.4 TLS:TLS Rec Layer-1 HandShake: Client Hello. {TLS:69, SSLVersionSelector:68, TDS:67
4868  11.5406890  10.10.10.4 10.10.10.3 TLS:TLS Rec Layer-1 HandShake: Server Hello.; TLS Rec Layer-2 Cipher Change Spec; T
4869  11.5412550  10.10.10.3 10.10.10.4 TLS:TLS Rec Layer-1 Cipher Change Spec; TLS Rec Layer-2 HandShake: Encrypted Handsh
4870  11.5418480  10.10.10.3 10.10.10.4 TDS:Data, Version = 7.4 (0x74000004), Reassembled Packet {TDS:67, TCP:66, IPv4:23}
4871  11.5418630  10.10.10.4 10.10.10.3 TCP: [Bad CheckSum]Flags=...A...., SrcPort=1433, DstPort=57714, PayloadLen=0, Seq=5
4872  11.5421990  10.10.10.4 10.10.10.3 NLMP:NTLM CHALLENGE MESSAGE {TDS:67, TCP:66, IPv4:23}
4873  11.5428260  10.10.10.3 10.10.10.4 NLMP:NTLM AUTHENTICATE MESSAGE Version:NTLM v2, Domain: CONTOSO, User: admin.CONTOS

The SQL Server machine asks DNS for the address of the domain controller.
This uses the Node IP as the virtual IP is only for new connections to SQL Server and not for Windows making new connections to other machines.
4876  11.5434340  10.10.10.5 10.10.10.1 DNS:QueryId = 0x4103, QUERY (Standard query), Query  for DC01.contoso.com of type
4877  11.5444080  10.10.10.1 10.10.10.5 DNS:QueryId = 0x4103, QUERY (Standard query), Response - Success, 10.10.10.1  {DNS

Windows tries to connect to the domain controller.
4878  11.5448700  10.10.10.5 10.10.10.1 TCP: [Bad CheckSum]Flags=CE....S., SrcPort=58135, DstPort=49673, PayloadLen=0, Seq=

Meanwhile, the SQL Server machine acknowledges receipt of the NTLM AUTHENTICATE packet.
4879  11.5585300  10.10.10.4 10.10.10.3 TCP: [Bad CheckSum]Flags=...A...., SrcPort=1433, DstPort=57714, PayloadLen=0, Seq=5

And the client times out the authentication request after 1 second. This is independent of the connection timeout.
5020  12.5301940  10.10.10.3 10.10.10.4 TCP:Flags=...A...F, SrcPort=57714, DstPort=1433, PayloadLen=0, Seq=4291851642, Ack=

The SQL machine ACKs the close request, but does not send its own as SQL is busy.
5021  12.5302260  10.10.10.4 10.10.10.3 TCP: [Bad CheckSum]Flags=...A...., SrcPort=1433, DstPort=57714, PayloadLen=0, Seq=5

The SQL machine tries to make another connection to the DC.
5053  12.6452800  10.10.10.5 10.10.10.1 TCP: [Bad CheckSum]Flags=CE....S., SrcPort=58136, DstPort=49673, PayloadLen=0, Seq=

Windows also queries the IP address of the SQL Server Virtual Name.
5096  13.2504300  10.10.10.5 10.10.10.1 DNS:QueryId = 0x22BE, QUERY (Standard query), Query  for SQLPROD01.contoso.com of t
5100  13.2509470  10.10.10.1 10.10.10.5 DNS:QueryId = 0x22BE, QUERY (Standard query), Response - Success, 10.10.10.4  {DNS
5104  13.2542320  10.10.10.5 10.10.10.1 DNS:QueryId = 0x8BDF, QUERY (Standard query), Query  for SQLPROD01.contoso.com of t
5105  13.2548270  10.10.10.1 10.10.10.5 DNS:QueryId = 0x8BDF, QUERY (Standard query), Response - Success, 10.10.10.4  {DNS

The connections to the domain controller are retried after 3 seconds.
5881  14.5585370  10.10.10.5 10.10.10.1 TCP:[SynReTransmit #4878] [Bad CheckSum]Flags=CE....S., SrcPort=58135, DstPort=4967
6216  15.6522810  10.10.10.5 10.10.10.1 TCP:[SynReTransmit #5053] [Bad CheckSum]Flags=CE....S., SrcPort=58136, DstPort=4967
6773  15.8498220  10.10.10.5 10.10.10.1 TCP: [Bad CheckSum]Flags=CE....S., SrcPort=58139, DstPort=49673, PayloadLen=0, Seq=
7499  18.8554100  10.10.10.5 10.10.10.1 TCP:[SynReTransmit #6773] [Bad CheckSum]Flags=CE....S., SrcPort=58139, DstPort=4967
7524  19.0597230  10.10.10.5 10.10.10.1 DNS:QueryId = 0x9A6E, QUERY (Standard query), Query  for SQLPROD01.contoso.com of t
7525  19.0603900  10.10.10.1 10.10.10.5 DNS:QueryId = 0x9A6E, QUERY (Standard query), Response - Success, 10.10.10.4  {DNS

And they eventually time out as the domain controller does not respond.
7870  20.5741770  10.10.10.5 10.10.10.1 TCP:[SynReTransmit #4878] [Bad CheckSum]Flags=......S., SrcPort=58135, DstPort=4967
8098  21.6679060  10.10.10.5 10.10.10.1 TCP:[SynReTransmit #5053] [Bad CheckSum]Flags=......S., SrcPort=58136, DstPort=4967
8300  23.2563150  10.10.10.5 10.10.10.1 DNS:QueryId = 0xC6A7, QUERY (Standard query), Query  for DC01.contoso.com of type
8301  23.2569050  10.10.10.1 10.10.10.5 DNS:QueryId = 0xC6A7, QUERY (Standard query), Response - Success, 10.10.10.1  {DNS
8302  23.2571770  10.10.10.5 10.10.10.1 TCP: [Bad CheckSum]Flags=CE....S., SrcPort=58140, DstPort=49673, PayloadLen=0, Seq=
8388  23.9237310  10.10.10.5 10.10.10.1 DNS:QueryId = 0x4B6A, QUERY (Standard query), Query  for 0.0.0.0.0.0.0.0.0.0.0.0.0.
8389  23.9262680  10.10.10.1 10.10.10.5 DNS:QueryId = 0x4B6A, QUERY (Standard query), Response - Name Error  {DNS:116, UDP:
8858  24.8553910  10.10.10.5 10.10.10.1 TCP:[SynReTransmit #6773] [Bad CheckSum]Flags=......S., SrcPort=58139, DstPort=4967
9168  26.2639400  10.10.10.5 10.10.10.1 TCP:[SynReTransmit #8302] [Bad CheckSum]Flags=CE....S., SrcPort=58140, DstPort=4967
9181  26.2726480  10.10.10.5 10.10.10.1 TCP: [Bad CheckSum]Flags=CE....S., SrcPort=58141, DstPort=49673, PayloadLen=0, Seq=


9452  27.3856390  10.10.10.5 10.10.10.1 TCP: [Bad CheckSum]Flags=CE....S., SrcPort=58142, DstPort=49673, PayloadLen=0, Seq=
10961 29.2810660  10.10.10.5 10.10.10.1 TCP:[SynReTransmit #9181] [Bad CheckSum]Flags=CE....S., SrcPort=58141, DstPort=4967
11243 30.1252850  10.10.10.5 10.10.10.1 DNS:QueryId = 0x882B, QUERY (Standard query), Query  for 3.0.0.0.1.0.0.0.0.0.0.0.0.
11244 30.1258340  10.10.10.1 10.10.10.5 DNS:QueryId = 0x882B, QUERY (Standard query), Response - Name Error  {DNS:141, UDP:
11259 30.3903980  10.10.10.5 10.10.10.1 TCP:[SynReTransmit #9452] [Bad CheckSum]Flags=CE....S., SrcPort=58142, DstPort=4967
11416 30.5859220  10.10.10.5 10.10.10.1 TCP: [Bad CheckSum]Flags=CE....S., SrcPort=58143, DstPort=49673, PayloadLen=0, Seq=
12568 32.2653750  10.10.10.5 10.10.10.1 TCP:[SynReTransmit #8302] [Bad CheckSum]Flags=......S., SrcPort=58140, DstPort=4967

This connection attempt is eventually answered in frame 12642 and frame 12643 completes the TCP 3-way handshake to the domain controller.
12641 32.5783410  10.10.10.5 10.10.10.1 TCP: [Bad CheckSum]Flags=CE....S., SrcPort=58144, DstPort=DCE endpoint resolution(1
12642 32.5789740  10.10.10.1 10.10.10.5 TCP:Flags=.E.A..S., SrcPort=DCE endpoint resolution(135), DstPort=58144, PayloadLen
12643 32.5790220  10.10.10.5 10.10.10.1 TCP: [Bad CheckSum]Flags=...A...., SrcPort=58144, DstPort=DCE endpoint resolution(1

MSRPC is the protocol for authentication.
12644 32.5791030  10.10.10.5 10.10.10.1 MSRPC:c/o Bind: EPT(EPMP) UUID{E1AF8308-5D1F-11C9-91A4-08002B14A0FA}  Call=0x2  Ass
12645 32.5792520  10.10.10.1 10.10.10.5 TCP:Flags=...A...., SrcPort=DCE endpoint resolution(135), DstPort=58144, PayloadLen
12646 32.5798500  10.10.10.1 10.10.10.5 MSRPC:c/o Bind Ack: EPT(EPMP) UUID{E1AF8308-5D1F-11C9-91A4-08002B14A0FA} Call=0x2  
12647 32.5799070  10.10.10.5 10.10.10.1 EPM:Request: ept_map: NDR, Netlogon(NRPC) {12345678-1234-ABCD-EF00-01234567CFFB} v1
12648 32.5800180  10.10.10.1 10.10.10.5 TCP:Flags=...A...., SrcPort=DCE endpoint resolution(135), DstPort=58144, PayloadLen
12649 32.5806020  10.10.10.1 10.10.10.5 EPM:Response: ept_map: NDR, Netlogon(NRPC) {12345678-1234-ABCD-EF00-01234567CFFB} v

Another connection to the domain controller succeeds.
12650 32.5809460  10.10.10.5 10.10.10.1 TCP: [Bad CheckSum]Flags=CE....S., SrcPort=58145, DstPort=49673, PayloadLen=0, Seq=
12651 32.5813610  10.10.10.1 10.10.10.5 TCP:Flags=.E.A..S., SrcPort=49673, DstPort=58145, PayloadLen=0, Seq=2046690876, Ack
12652 32.5813880  10.10.10.5 10.10.10.1 TCP: [Bad CheckSum]Flags=...A...., SrcPort=58145, DstPort=49673, PayloadLen=0, Seq=

And another MSRPC request to authenticate, which succeeds.
12653 32.5814620  10.10.10.5 10.10.10.1 MSRPC:c/o Bind: Netlogon(NRPC) UUID{12345678-1234-ABCD-EF00-01234567CFFB}  Call=0x2
12654 32.5817710  10.10.10.1 10.10.10.5 MSRPC:c/o Bind Ack: Netlogon(NRPC) UUID{12345678-1234-ABCD-EF00-01234567CFFB} Call=
12655 32.5819230  10.10.10.5 10.10.10.1 NRPC:NetrLogonSamLogonEx Request, *Encrypted* {MSRPC:168, TCP:167, IPv4:58}
12656 32.5839860  10.10.10.1 10.10.10.5 NRPC:NetrLogonSamLogonEx Response, *Encrypted* {MSRPC:168, TCP:167, IPv4:58}

Now, the user is validated at the domain controller and SQL responds with the LoginACK packet.
Unfortunately, the packet arrives after the client has initiated the connection close, so the connection is reset.
12657 32.5851940  10.10.10.4 10.10.10.3 TDS:Response, Version = 7.4 (0x74000004), SPID = 108, PacketID = 1, Flags=...AP...,
12658 32.5852740  10.10.10.4 10.10.10.3 TCP: [Bad CheckSum]Flags=...A...F, SrcPort=1433, DstPort=57714, PayloadLen=0, Seq=5
12659 32.5858930  10.10.10.3 10.10.10.4 TCP:Flags=...A.R.., SrcPort=57714, DstPort=1433, PayloadLen=0, Seq=4291851643, Ack=

Conclusion

An understanding of typical network patterns and timings can help identify where the issue is.

SQL Network Analyzer reports can help identify which conversations are worth examining in the network trace. They are fast to generate and give you an overview of the problem, including whether it is an isolated issue or part of a larger one.

There is still the underlying issue of why the domain controller refused to respond for a period of time, but that is a story for another day.
