Skype Front End not starting on dual-homed VM

TL;DR: When dual-homing, make sure both your NICs report the same link speed (both 1Gb or both 10Gb). VMware’s E1000E reports 1Gb, and VMXNET3 reports 10Gb. Windows automatic metrics will prefer the 10Gb NIC, and that may cause the issue below…

After a power loss over the weekend, two Skype for Business Front-Ends were restarted and the RTCSRV service failed to start. A bit about these machines that’s relevant to the issue:

  • Running Windows Server 2012 R2 as VMware ESXi 5.5 guests.
  • Collocated Mediation service.
  • Dual-homed, with Data network as default gateway, and Voice network to talk to a Sonus SBC, with service usage limited to the specified addresses for Primary and PSTN in the topology.

Certificate stores were in good order, so KB2795828 did not apply.

Event IDs seen in the log were LS User Services 32178:

Failed to sync data for Routing group {0FCDD1FD-39AF-502A-AECA-E702A5E8FC55} from backup store.
Cause: This may indicate a problem with connectivity to backup database or some unknown product issue.
Ensure that connectivity to backup database is proper. If the error persists, please contact product support with server traces.

LS User Services 30988:

Sending HTTP request failed. Server functionality will be affected if messages are failing consistently.

Sending the message to failed. IP Address is IPOFVOICENIC. Error code is 0x2EFD. Content-Type is application/replication+xml. Http Error Code is 0x0.
Cause: Network connectivity issues or an incorrectly configured certificate on the destination server. Check the eventlog description for more information.
Check the destination server to see that it is listening on the same URI and it has certificate configured for MTLS. Other reasons might be network connectivity issues between the two servers.

and LS User Services 32174:

Server startup is being delayed because fabric pool manager has not finished initial placement of users.

Currently waiting for routing group: {EF5151C7-B5E1-53B8-9F61-0CC90C82B9F6}.
Number of groups potentially not yet placed: 9.
Total number of groups: 9.


The issue ended up being that the two NICs had different Adapter Types in VMware. The primary NIC was set to E1000E, which reports 1Gb/s max, and the Voice NIC, which was added after the server was deployed, was set to VMXNET 3, which reports 10Gb/s regardless of uplink bandwidth from the host.

It turns out the Windows Automatic Metric feature was messing up interface preference here: it automatically gave the 10Gb/s Voice NIC a lower metric than the 1Gb/s primary NIC, so Windows preferred the Voice NIC.

Manually setting a lower metric for the Primary NIC and rebooting the server resolved the issue.
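To make the metric behavior concrete, here is a minimal Python sketch of how link speed maps to an automatic metric and how a manual metric overrides it. The thresholds are an assumption: they are my recollection of Microsoft’s published automatic metric table for this generation of Windows, so verify them against your own servers. The NIC names are just labels for this example.

```python
# Sketch only: thresholds below are an assumption based on Microsoft's
# documented automatic-metric table for older Windows versions.
def automatic_metric(link_speed_bps):
    """Return the interface metric Windows would assign for a link speed."""
    if link_speed_bps >= 2_000_000_000:
        return 5
    if link_speed_bps > 200_000_000:
        return 10
    if link_speed_bps > 20_000_000:
        return 20
    if link_speed_bps > 4_000_000:
        return 30
    if link_speed_bps > 500_000:
        return 40
    return 50


def preferred_interface(link_speeds, manual_metrics=None):
    """Pick the NIC with the lowest metric; manual metrics win over automatic."""
    manual_metrics = manual_metrics or {}
    metrics = {
        name: manual_metrics.get(name, automatic_metric(speed))
        for name, speed in link_speeds.items()
    }
    return min(metrics, key=metrics.get)


# Hypothetical labels for the two NICs described in this post.
nics = {
    "Primary (E1000E)": 1_000_000_000,   # reports 1Gb/s
    "Voice (VMXNET3)": 10_000_000_000,   # reports 10Gb/s
}

print(preferred_interface(nics))                           # automatic metrics
print(preferred_interface(nics, {"Primary (E1000E)": 1}))  # manual override
```

With automatic metrics the Voice NIC wins; with a manual lower metric on the primary NIC, the primary NIC wins, which is the fix described above.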

Is your Lync/SfB starved for memory?

Let’s say it was totally underprovisioned at some point. Just bumping up the RAM won’t give you the performance you expect. Here’s why:

You deploy a Server 2012 R2 template with 2GB RAM using your favorite hypervisor and just roll with the Skype for Business or Lync deployment without even thinking about it. Or… let’s say you ask for a VM to be provisioned so you can roll out SfB, and it’s underprovisioned from the start, but changing the resources would take too long so you go ahead with the deployment anyway and just wait for resources to be added later on. No time wasted. How many times has that happened? Plenty to me…

Down the road, whether it’s a reactive need for more memory, or you just realized the VMs were completely underprovisioned and nowhere near the 32GB of RAM Microsoft really asks for… What then? Just bump up the RAM, right?

Not quite…

Do that, and your SQL instances RTCLOCAL and LYNCLOCAL will just daydream about those sweet 32GB you allocated… Let’s take a look at a VM with only 4GB on it:

2016-06-28 13_28_33 2016-06-28 13_28_06

Pretty sad, right? At no point can the two SQL instances consume more than 941MB. What if you add RAM, you say? The Minimum and Maximum Server Memory values stay exactly the same!!!

If you want to go ahead and change these values yourself, you’re free to do so, but it’s not technically supported. Don’t care? Then pick values of 6%-8% of your total RAM for LYNCLOCAL, and 12%-15% for RTCLOCAL. Care? Then:

  1. Open the Deployment Wizard, Install or Update, and run Step 1 again. Go grab a coffee while RTCLOCAL gets pimped out with more RAM.
  2. Then run Step 2 again. If done with coffee, get a new one while LYNCLOCAL gets a memory makeover.

You can verify the added memory now. Big difference. Big. HUGE!

2016-06-28 13_56_11 2016-06-28 13_55_56
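If you do go the unsupported manual route, the percentage ranges mentioned earlier are easy to turn into numbers. This is a small Python sketch of that arithmetic only; the function name is made up for this example, and which SQL setting you apply the result to (presumably Maximum Server Memory) is your call.

```python
# Percent ranges taken from this post; everything else is illustrative.
MEMORY_PERCENT_RANGES = {
    "LYNCLOCAL": (6, 8),
    "RTCLOCAL": (12, 15),
}


def suggested_memory_range_mb(total_ram_mb):
    """Return (low, high) MB per instance for a given amount of total RAM."""
    return {
        instance: (total_ram_mb * low // 100, total_ram_mb * high // 100)
        for instance, (low, high) in MEMORY_PERCENT_RANGES.items()
    }


# Example: a VM with 32GB of RAM.
for instance, (low, high) in suggested_memory_range_mb(32 * 1024).items():
    print(f"{instance}: {low}MB to {high}MB")
```

For 32GB that works out to roughly 1966MB to 2621MB for LYNCLOCAL and 3932MB to 4915MB for RTCLOCAL.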

7/28/2016 Edit: Looks like Tom Pacyk wrote a better post over two years ago, and also points out the SQL Express limit of 1GB per instance.

Skype for Business Server June 2016 CU

New features!

  • Video Based Screen Sharing (VBSS) in meetings, which enables much more efficient screen sharing with fluid motion (not the 2fps we’re used to in meetings).
  • Multiple Emergency Numbers in a location policy, useful for universities that may have their own local emergency number in addition to 911.
  • Busy Options like Busy on Busy and Voicemail on Busy.

Get it here:

T.38 Fax over IP call on Wireshark

Ever wondered what a proper T.38 Fax over IP (FoIP) transmission looks like running through Wireshark? Maybe you’re troubleshooting a call flow, or you’ve never seen a T.38 capture. Below I’ll try to explain the call flow and the steps to look out for when troubleshooting T.38 calls. Here’s an outbound fax call originating from an FXS port on a Cisco CUBE, and going towards Flowroute.

  • Initial SIP INVITE and early media receipt (ringback). Note this is all RTP.
    2016-03-22 16_08_08
  • SDP from the INVITE shows media offered is all voice (RTP)
    2016-03-22 16_10_50
  • 183 Session Progress, and we start sending media too (again, RTP). Later on comes the 200 OK, meaning the call was answered on the remote end.
    2016-03-22 16_09_50
  • Things changing now… in-dialog (RE)INVITE from Cisco CUBE to SIP trunk… RTP and T.38 packets mixed because the remote end has not accepted our INVITE yet, but we start sending media either way.
    2016-03-22 16_14_17
  • And the SDP of the new INVITE now shows all T.38 media.
    2016-03-22 16_14_59
  • Once we get the 200 OK from Flowroute, it’s all T.38 media both ways.
    2016-03-22 16_15_44
  • Now the flow gets interesting, more Fax-ey. Wireshark will decode the HDLC data and show interesting bits here.
    • TSI is our Fax station number, as programmed in the machine.
      2016-03-22 16_17_35
    • DCS: our Fax machine communicates its capabilities and starts training.
      2016-03-22 16_18_34
    • If we look inside the packet’s data, our DCS has a lot more information about our Fax machine’s settings and resolution
      2016-03-22 16_44_22
    • Then we get an FTT, which means the remote end “Failed to Train”. This is not usually a sign something is wrong, but rather a capability mismatch. The remote fax may accept only lower baud rates, and will fail to train at anything higher. This is normal unless it’s the only response we get back from the remote end.
      2016-03-22 16_18_49
    • We see the same process of TSI, DCS and FTT until we hit the right baud rate… in our case it’s 9600… Once we get that, we receive a CFR.
      2016-03-22 16_20_19
    • Followed by a short training to sync-up and data (because we did long training before the CFR)
      2016-03-22 16_20_35
    • And our actual FAX data which will vary
      2016-03-22 16_21_21
    • At the end of the data, Wireshark reassembles the packets and tells us whether there was a loss or not. In our case, we’re good!
      2016-03-22 16_21_50
    • We send an EOP to signal the end of the transmission
      2016-03-22 16_22_21
    • The remote end does an MCF to acknowledge receipt (this is how your Fax machine knows the fax is “good” on the other end)
      2016-03-22 16_22_40
    • And then we send a DCN to logically hang up the HDLC stream, but we wait for the remote end…
      2016-03-22 16_24_42
    • Remote end hangs up the call… and we’re done…
      2016-03-22 16_25_02
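If you’d rather check the voice-to-T.38 switch without clicking through every packet, the media lines in the SDP tell the story. Below is a minimal Python sketch that classifies the m= lines of an SDP body. The two SDP samples are hypothetical (ports and payload types are made up) and are not taken from the capture above.

```python
def media_kinds(sdp_text):
    """Classify each m= line of an SDP body as RTP audio, T.38, or other."""
    kinds = []
    for raw_line in sdp_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("m="):
            continue
        fields = line[2:].split()
        if len(fields) < 4:
            continue  # malformed media line, skip it
        media, transport, fmt = fields[0], fields[2], fields[3]
        if media == "image" and transport.lower() == "udptl" and fmt.lower() == "t38":
            kinds.append("t38")
        elif media == "audio" and transport.upper().startswith("RTP/"):
            kinds.append("rtp-audio")
        else:
            kinds.append("other")
    return kinds


# Hypothetical SDP bodies: an initial voice offer and a T.38 re-INVITE.
VOICE_OFFER = """v=0
c=IN IP4 192.0.2.10
m=audio 16384 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
"""

T38_REINVITE = """v=0
c=IN IP4 192.0.2.10
m=image 16390 udptl t38
a=T38FaxVersion:0
"""

print(media_kinds(VOICE_OFFER))
print(media_kinds(T38_REINVITE))
```

The first INVITE should only show RTP audio, and the re-INVITE should only show T.38; anything else is worth a closer look.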

And that was it. Many exchanges and training but in the end our page was sent over a SIP trunk, negotiating T.38, training with the remote fax machine at 9600 baud, and transmitting one page in about a minute.
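The TSI, DCS, FTT loop is easier to picture as code. This is a toy Python model of the retry pattern described above, not a T.30 implementation: it assumes the sender simply steps down through its rate list and that the remote end accepts any rate at or below a single maximum. Real machines also fail training because of line quality, and the rate list here is an assumption.

```python
def negotiate_rate(local_rates, remote_max_rate):
    """Step down through rates until the remote end confirms (CFR).

    Returns (agreed_rate, log) where log is a list of (message, rate) tuples.
    agreed_rate is None if every attempt ends in FTT.
    """
    log = []
    for rate in sorted(local_rates, reverse=True):
        log.append(("TSI", rate))  # our station identification
        log.append(("DCS", rate))  # our chosen settings, then training
        if rate <= remote_max_rate:
            log.append(("CFR", rate))  # remote confirms it is ready to receive
            return rate, log
        log.append(("FTT", rate))  # remote failed to train at this rate
    return None, log


# Assumed rate list; the remote fax in this post settled on 9600.
rate, log = negotiate_rate([14400, 12000, 9600, 7200], remote_max_rate=9600)
for message, attempted in log:
    print(message, attempted)
print("agreed rate:", rate)
```

An FTT followed by another attempt is the normal pattern; a log that ends in FTT with no CFR is the case to worry about.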

Remote Wireshark capture for Sophos UTM over SSH

Sophos UTM v9 comes with the tcpdump utility, which lets you run packet captures from the shell. This is great and all, but in order to look at those pcaps with Wireshark, you need to pipe to a file, copy the file, then run Wireshark against it. Annoying. All of it.

What if we could remotely capture packets over an SSH tunnel? YES… it turns out to be a bit tricky if you’re on Windows, and so is the authentication piece: getting root access without having to log in as loginuser first. How? Keep reading…

First, the necessary ingredients:

  • Sophos UTM
  • Wireshark (or your favorite pcap application)
  • Putty suite (specifically Plink and PuttyGen)

To start, we’ll need to enable Shell Access, with public key authentication, and with Root access but only with SSH key.

2016-03-16 15_10_50

We need to use PuttyGen to generate the key pair we’ll use for root authentication, so open it, Generate the key, then copy the Public Key into the Authorized Keys for root in the UTM, apply and save… and also Save private key to somewhere you’ll remember. We’ll need this for Plink.

2016-03-16 15_10_08

There’s our new key…

2016-03-16 15_13_30

Then run the actual magic using Plink. Take the following command as an example:

plink -ssh -i C:\ssh-priv.ppk "tcpdump -s 0 -U -n -w - not port 22 and not host" | "C:\Program Files\Wireshark\Wireshark.exe" -k -i -

Replace the SSH connection string with your actual firewall FQDN (logging in as root), the ssh-priv.ppk path with the location of the Private Key you saved from PuttyGen, and the address after not host with the IP address of the firewall interface you’re reaching it on.

Wireshark will open and start showing packets. You can smile and jump now.

You can modify the tcpdump parameters to better match the capture, for example, using -i eth1 to capture a specific interface, or filter specific traffic… once you’re done, just close Wireshark and CTRL+C the command.

Note, if you’re doing this capture remotely over WAN or Internet, it will tunnel ALL packets over SSH, so it will take up a lot of bandwidth…
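If you run this often, it can help to build the command from its pieces instead of editing one long line. Here’s a rough Python sketch that assembles the same Plink and Wireshark commands and pipes one into the other. The hostname, IP address and key path in the example are placeholders I made up, and logging in as root@host is an assumption based on the key setup above.

```python
import subprocess

DEFAULT_WIRESHARK = r"C:\Program Files\Wireshark\Wireshark.exe"


def build_capture_commands(host, key_path, exclude_ip, interface=None,
                           wireshark=DEFAULT_WIRESHARK):
    """Return (plink_cmd, wireshark_cmd) as argument lists."""
    tcpdump = ["tcpdump", "-s", "0", "-U", "-n", "-w", "-"]
    if interface:
        tcpdump += ["-i", interface]  # e.g. eth1 to capture one interface
    # Exclude the SSH session itself so the capture does not feed on itself.
    tcpdump += ["not", "port", "22", "and", "not", "host", exclude_ip]
    plink_cmd = ["plink", "-ssh", "-i", key_path, f"root@{host}",
                 " ".join(tcpdump)]
    wireshark_cmd = [wireshark, "-k", "-i", "-"]
    return plink_cmd, wireshark_cmd


def run_capture(host, key_path, exclude_ip, interface=None):
    """Start plink and feed its output to Wireshark until Wireshark closes."""
    plink_cmd, wireshark_cmd = build_capture_commands(
        host, key_path, exclude_ip, interface)
    plink = subprocess.Popen(plink_cmd, stdout=subprocess.PIPE)
    wireshark = subprocess.Popen(wireshark_cmd, stdin=plink.stdout)
    plink.stdout.close()  # let Wireshark own the read end of the pipe
    wireshark.wait()
    plink.terminate()


# Placeholder values, for illustration only.
plink_cmd, wireshark_cmd = build_capture_commands(
    "utm.example.com", r"C:\ssh-priv.ppk", "192.0.2.1", interface="eth1")
print(plink_cmd)
print(wireshark_cmd)
```

Calling run_capture with your own values does the same thing as the one-liner, and closing Wireshark stops the remote tcpdump.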

Have fun!!!

VMware Power Policy and CPU Ready latency

VMware’s Performance Best Practices mentions you should set power management in the BIOS to “OS Controlled Mode” or equivalent. This is so you can control power saving from the hypervisor itself. It’s very useful when you want to change these settings on the fly without having to reboot into the BIOS, similar to how Windows power profiles work.

But the “gotcha” here, which is also mentioned in the best practices documentation, is that the default Power Policy setting is set to Balanced, when you most likely want to set this to High Performance, as you’ll see later…

2016-03-01 10_39_33

This makes it pretty awful for latency-sensitive workloads. In Balanced, your CPU sometimes has to scale up or down in the power states before it can process an instruction, and this adds latency. The difference can be clearly seen by looking at the CPU Ready (RDY%) metric. Here’s the difference changing to High Performance made in a single vCPU VM:

2016-03-01 10_32_13

And here is a 4 vCPU VM running Exchange 2013:

2016-03-01 10_32_34

My VMs felt “snappier” after the change. It’s hard to avoid speaking subjectively here, but click-to-action felt quicker. Maybe it’s in my head, but I feel those charts tell a different story.

The effect the “Balanced” power savings has on CPU Ready times is clear as day, even though the documentation says Balanced has minimal to no impact on performance. I have yet to do benchmarks to show how CPU Ready% affects real workloads, but at the very least, CPU instruction latency from a guest VM is dramatically decreased, which benefits real-time workloads like Lync, Skype for Business or VoIP.
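One thing that trips people up when comparing numbers: vSphere performance charts report CPU Ready as a summation in milliseconds per sample interval, while esxtop shows a percentage. Here’s a small Python sketch of the conversion as I understand it from VMware’s guidance, assuming the 20 second real-time chart interval. The sample values are made up, not read from the charts above.

```python
def cpu_ready_percent(ready_summation_ms, interval_seconds=20):
    """Convert a CPU Ready summation (ms) into a percentage of the interval.

    interval_seconds=20 assumes the real-time chart; other chart ranges
    use longer intervals.
    """
    return ready_summation_ms * 100 / (interval_seconds * 1000)


# Made-up sample values for illustration.
for sample_ms in (1000, 200, 50):
    print(f"{sample_ms}ms ready -> {cpu_ready_percent(sample_ms):.2f}%")
```

So 1000ms of ready time in a 20 second sample is 5%, and 200ms is 1%.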

VMware NSX Lab in a night = awesome

So… VMware’s NSX is super awesome! I’m one of those weird guys that find playing with networking and virtualization on a Monday night more fun and exciting than a weekend in Vegas. Ok, maybe not so much, but still somehow I managed to stay up past midnight deploying an NSX “Lab” just by messing with it. I say screw the guide, I learn better by just pressing buttons and breaking things… I’m not doing this for a client so what gives? Let’s poke…

After some fun I’ve gone from just knowing concepts of SDN to a fully usable network running on top of VMware NSX. It’s complete with:

  • Single 6.2 controller
  • VXLAN transport on a Force10 S60 with PIM and IGMP snooping enabled
    • Since I already had Distributed vSwitches, it was very easy to provision the transport
  • Multicast Transport Zone and segment ID
  • Single NSX Edge running OSPF connecting to the S60 core and redistributing connected networks
  • Single logical switch (for now)
  • Two VMs on two different hosts to test connectivity
  • Smiles

Captured live flows while downloading a CentOS ISO from a mirror site just to test speeds.

Screen Shot 2016-02-23 at 12.35.01 AM

So far I’m very impressed with what NSX can do, and I’ve only scratched the surface. Think stretched networks over L3, per-VM firewall policies at both Layer 3 and Layer 2, logical routers between virtual switches, each with its own ACLs, HA edges, so many cool things! Only 59 days left…

It’s almost 1am and I should really go to sleep now. Good night.

New Technical Diagrams for Skype for Business Server 2015

Released last week, new technical diagrams in Visio and PDF for Skype for Business workloads, Call Quality Methodology (CQM) and different hybrid scenarios.

SfB Protocol Workloads poster

Thumbnail for the CQM poster

Plan Voice Solution poster Thumbnail

Export UM Custom Prompts

My first attempt at a semi-useful PowerShell script. This script will export all UM prompts from all dial plans and auto attendants for Exchange 2010 and 2013. The output of the script is a collection of files with a WAV extension in the directory the script runs from, named after the custom prompts.

Future changes: Set a working folder, and organize the prompts into folders based on AA and DP name.

The output files have a WAV extension, but are actually MP3 files (insert blame for Export-UMPrompt). There is another script coming that will convert these files from MP3 to 8 kHz mono PCM WAV files so they can be reused for other UM attendants.
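In the meantime, if you want to confirm which exported files are really MP3 under a WAV name, you can look at the first few bytes. This is a quick Python sketch, separate from the PowerShell script and not part of it; the function names are made up, and the MP3 check is a simple heuristic (an ID3 tag or an MPEG frame sync), not a full parser.

```python
from pathlib import Path


def sniff_audio_container(data):
    """Guess the container from the first bytes of a file."""
    # A real WAV file starts with a RIFF header that carries a WAVE tag.
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 files usually start with an ID3 tag or an MPEG frame sync.
    if data[:3] == b"ID3":
        return "mp3"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def mislabeled_wavs(folder):
    """List .wav files in a folder whose content does not look like WAV."""
    results = []
    for path in sorted(Path(folder).glob("*.wav")):
        with open(path, "rb") as handle:
            kind = sniff_audio_container(handle.read(12))
        if kind != "wav":
            results.append((path.name, kind))
    return results


print(sniff_audio_container(b"RIFF\x24\x00\x00\x00WAVEfmt "))
print(sniff_audio_container(b"ID3\x03\x00\x00\x00\x00\x00\x00"))
```

Point mislabeled_wavs at the export folder to get a list of the files that still need converting.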

Disclaimer: This is one of my first attempts at scripting, so the code may be completely unoptimized, slow, confusing or just plain ugly.

Hope this is useful!

Skype for Business for Android is now released

Skype for Business for Android is out of preview and finally released. Time to try it with the Grandstream GXV3275…