Anyone familiar with deploying Lync knows there are a lot of firewall requirements. There are plenty of great articles detailing the port requirements, so I won’t get into that. But what happens when you have all the ports open and you still run into issues? You’ve verified with the firewall guys that the configurations are correct, you’ve tested the ports with Telnet or PortQry, and everything appears to be configured as defined by Microsoft’s requirements.
I ran into a similar situation recently at a customer site, which proved to be a great learning experience.
In this article we will look at such an example where the environment has an application layer firewall (unbeknownst to the Lync admin, but knownst to us. Bonus points if you know the movie reference), how this impacts Lync, and how we can troubleshoot it.
An application layer firewall can inspect network traffic up to the application layer (Layer 7) of the OSI model, whereas a traditional stateful firewall only inspects traffic up to the transport layer (Layer 4).
In this example, the following assumptions are made:
- A single Standard Edition is deployed
- A single Edge Server is deployed
- There are multiple firewalls on the network: one separating the internet from the DMZ, and one separating the DMZ from the internal network
- All ports as required and defined are open: http://technet.microsoft.com/en-us/library/gg425891.aspx
- Alice is a user connected to the external network via the Lync Edge. No VPN.
- Bob is on the internal network
Users are reporting issues with Lync desktop sharing, saying they receive the following error message: “Sharing failed to connect due to network issues.” The problem appears to be intermittent: sometimes it works, while other times it fails.
Upon further investigation, you notice that desktop sharing works when both users are internal. You also notice it works when both users are external. However, it is not working when one user is internal and the other user is external. It seems to be impacting application sharing and file transfers, but not IM/P and Voice.
Being the great Lync admins that we are, we get out Snooper and OCSLogger. We can use these to look at SIP logs on both the client (UCCAPI log) and server (SIPStack log). In the diagnostic logs we see a SIP BYE message with the following diagnostic error (we can also pull this from the Monitoring Server, if deployed):
Table: SIP BYE Diagnostic
In this diagnostic message we see several things of importance:
So where do we turn to start looking at ICEWarning errors? How about Chapter 9 of the Lync Server 2010 Resource Kit? Inside we find Table 2: ICE Protocol Warning Flags. From that table we find that 0x8xxxxx typically refers to an issue communicating with a TURN server.
Next we see that the LocalMR=10.0.0.100. This is the local media relay, aka TURN server, that the client should use. The error message indicates a failure when communicating with the TURN server.
To verify that the Lync client is pulling the correct Media Relay information, we filter the client UCCAPI logs to look at the media relay authentication service (MRAS) details. Below, we can see that the Edge pool is listening on ports UDP 3478 and TCP 443 for media relay. The hostname equals the FQDN of the Edge Pool. Quickly validate that the client can resolve the FQDN to the internal IP of the Edge Pool to eliminate a DNS issue.
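The DNS check described above can be scripted. Here is a minimal sketch in Python, run from the internal client, assuming the Edge Pool FQDN and the internal Edge IP used in this article (`edgepool.silbers.net` and `10.0.0.100`); the function names are my own and not part of any Lync tooling.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted list of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def resolves_to(hostname, expected_ip):
    """True if expected_ip is among the addresses hostname resolves to."""
    try:
        return expected_ip in resolve_ipv4(hostname)
    except socket.gaierror:
        # The name did not resolve at all.
        return False
```

Calling `resolves_to("edgepool.silbers.net", "10.0.0.100")` on the internal client should return `True` if DNS is not the problem.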
Knowing that the client is successfully getting the correct media relay (TURN) server information, we should start looking at the SIP session details relating to media relay.
To do this, we look in the original SIP INVITE for the Application Sharing request, where we see the SDP candidate list below. This is the candidate list that the internal client is sending to the external client. It should include all possible connection points, including the local host IP as well as the Edge server’s media relay IP. What we notice here is that it only includes the client’s local host IP. It does not include a Media Relay candidate, which is odd, because we can see clearly in MRAS that the client gets a successful response back from the Edge Pool with Media Relay information.
Table: App Sharing SDP
IM/P and Voice calls are working without any issue. Why aren’t app sharing and file transfer?
Let’s compare the SDP logs for a voice call with the app sharing SDP logs from above.
In this candidate list we see both the local host IP, as well as the Media Relay IP of the Edge. What we also notice is that voice is using UDP, not TCP as the app sharing request was.
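Comparing candidate lists by eye gets tedious, so here is a rough Python sketch that pulls the candidates out of an SDP body and reports whether a relay candidate is present for a given transport. It assumes RFC 5245-style `a=candidate:` lines (foundation, component, transport, priority, address, port, `typ`, type). Older ICE drafts use a different layout, so treat the field positions as an assumption and adjust them to what Snooper actually shows you.

```python
def parse_candidates(sdp_text):
    """Extract ICE candidates from SDP text as a list of dicts.

    Assumes RFC 5245-style lines:
    a=candidate:<foundation> <component> <transport> <priority> <addr> <port> typ <type> ...
    Lines that do not match that shape are skipped.
    """
    prefix = "a=candidate:"
    candidates = []
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split()
        if len(fields) < 8 or fields[6] != "typ":
            continue
        candidates.append({
            "transport": fields[2].upper(),  # e.g. UDP, TCP-ACT, TCP-PASS
            "address": fields[4],
            "port": int(fields[5]),
            "type": fields[7],               # host, srflx, relay, ...
        })
    return candidates


def has_relay(candidates, transport_prefix):
    """True if any relay candidate uses a transport starting with transport_prefix."""
    return any(
        c["type"] == "relay" and c["transport"].startswith(transport_prefix)
        for c in candidates
    )
```

In the scenario from this article, `has_relay(candidates, "UDP")` would be true for the voice SDP, while `has_relay(candidates, "TCP")` would be false for the app sharing SDP.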
So what do we know at this point?
- Lync IM/P works internal and external
- Lync Voice works internal and external
- App sharing works between two internal users
- App sharing works between two external users
- App sharing does not work when one user is internal and one is external
- Lync Voice is using UDP
- App Sharing is using TCP
The fact that Voice is using UDP and only includes UDP information in the SDP raises a few red flags. Why don’t we see any TCP candidates? To answer this, we turn to our good friend Netmon to see what exactly is occurring at the transport layer.
For a good reference for SDP and ICE negotiations in Lync, I suggest you check out the following link: How Communicator Uses SDP and ICE To Establish a Media Channel.
In the picture below, copied from Mr. Ott’s TechNet article, we can see the SDP discovery process for both TCP and UDP.
Table: Copied from How Communicator Uses SDP and ICE to Establish a Media Channel
During the TCP connection test, a TLS Handshake is completed and then TURN is negotiated. With UDP, there is no TLS requirement and TURN is negotiated immediately.
We will want to start Netmon traces on both the internal client and the Lync Edge server, and then initiate a desktop sharing session.
First we will check Netmon for the TCP connection test. For this, we want to filter requests including the client IP, the Edge internal interface IP, and TCP port 443. We will expect to see the TLS Handshake negotiation, as well as the TURN negotiation.
Below we can see the filtered Netmon traces from the client. The first packet we see is the TLS Handshake request from the client. What we don’t see is a TLS Handshake response back from the Edge. We see packets that appear to be from the Edge server’s private IP, yet we do not see any TURN negotiation packets.
“ipv4.address==192.168.100.50 and ipv4.address==10.0.0.100 and tcp.port==443”
Table: Netmon Client Results (App Sharing)
A similar filter on the Netmon capture for the Edge server results in zero packets, indicating the client traffic is never making it to the Edge. Even though the client traces show responses that appear to come from the Edge, the Edge server capture does not show these packets.
Table: Netmon Edge Results (App Sharing)
So we know the client is trying to send the SDP discovery, but the Edge is not receiving the packets, and therefore never responding.
If we look at the same Netmon logs for the UDP Connection Test, we see the communication on both the client and the server. We see TURN negotiation on UDP 3478.
Earlier we validated that the ports were open and listening via Telnet and PortQry. While we are running Netmon, let’s go ahead and initiate another Telnet and PortQry test to validate the ports are still open and listening, and see if we capture the traffic.
Table: Netmon Telnet Capture Client
Table: Netmon Telnet Capture Edge
Sure enough, we can see here that the traffic is captured on both the client and the server, validating that traffic is successfully making it through the firewall on port 443. What is different between the Telnet test and the SDP negotiation is that Telnet simply connects to the port, whereas the SDP connection test starts with a TLS Handshake to initiate the encryption process.
It still does not seem to be a firewall port issue, as all tests show the port to be open. But the traffic is still not making it to the Edge based on our Netmon queries. Further, when we look at our client Netmon traces, we appear to be getting a response back from the Edge.
We know Lync voice is working over UDP, and App Sharing is not over TCP 443.
What other traffic uses port TCP 443?
Secure web traffic uses TCP 443. In many environments, customers deploy forward proxies for web traffic. These can be used for many reasons, but a common one is to route all web traffic through a single point for filtering purposes. In this way companies can filter the types of web searches employees are able to perform.
Is it possible that all port TCP 443 traffic is being funneled through a proxy?
What happens if we fire up a web browser and try to hit the Lync Edge Pool internal FQDN via https://edgepool.silbers.net? While the Edge server listens on port 443, it does not use this port for displaying content in a web browser, so we should not get anything displayed in the browser. However, when we launch the web browser, to our surprise we get a web page saying we must authenticate to access the page we are trying to reach. Sure enough, the firewall is acting as a forward proxy and inspecting layer 7 traffic. All outbound traffic on ports TCP 80 and TCP 443 is being forwarded through the proxy for inspection, which causes the SDP negotiation to fail.
In this environment, the firewall between the internal user and the Edge server was acting as both a stateful firewall and an application layer firewall, inspecting traffic at both layer 4 and layer 7.
The stateful firewall was correctly configured to allow ports UDP 3478 and TCP 443; however, the application layer firewall was filtering web traffic on port TCP 443 via its forward proxy feature set. Telnet and PortQry succeed because they simply connect to a port. They do not send any SSL traffic, and therefore the application firewall does not forward their traffic via the proxy. When Lync tries the SDP negotiation, it sends a TLS Handshake request. The application firewall sees this and forwards it to the proxy for inspection. Therefore we never see any SDP or TURN traffic actually hit the Edge server. We only see the Telnet and PortQry traffic.
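To make the difference between the two kinds of test concrete, here is a minimal Python sketch. The first function does what Telnet and PortQry verify (a bare TCP connect). The second goes one step further and attempts a TLS handshake, returning a fingerprint of whatever certificate comes back. This is only an illustration of the idea and not how the Lync client performs its connectivity checks; the function names are my own, and certificate validation is deliberately turned off because we only want to see who answers.

```python
import hashlib
import socket
import ssl


def tcp_connect_ok(host, port, timeout=3.0):
    """Bare TCP connect, roughly what a Telnet or PortQry test verifies."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def tls_peer_fingerprint(host, port, timeout=3.0):
    """Attempt a TLS handshake and return the SHA-256 fingerprint (hex) of the
    certificate presented, or None if the handshake fails or times out."""
    context = ssl.create_default_context()
    # Validation is disabled on purpose: we want the certificate of whatever
    # answers, even if it is an intercepting device with an untrusted cert.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                der = tls.getpeercert(binary_form=True)
                return hashlib.sha256(der).hexdigest() if der else None
    except OSError:
        # Covers refused connections, timeouts, and ssl.SSLError.
        return None
```

If `tcp_connect_ok` returns `True` for the Edge internal FQDN on port 443 but `tls_peer_fingerprint` returns `None`, or returns a fingerprint that does not match the certificate assigned to the Edge internal interface, that is a strong hint that something between the client and the Edge is answering the handshake itself.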
To resolve this issue, we work with the networking team to disable the application and proxy filtering on the firewall for traffic destined to the Lync Edge servers.
Once application filtering is disabled on the firewall, we test Desktop Sharing again with an external user and validate everything now works as expected.
Further, we can look at the SIP traffic via Snooper, and now see the media relay included in the SDP candidate list.