Monday, September 28, 2015

Response Groups Stop Responding

Our company (Event Zero - makers of the best Skype for Business analytics software out there BTW) relies on Skype for Business response groups for our sales and support queues. Last week, I noticed that every single one of them, across two separate pools, was no longer accepting calls. It wasn't a SIP trunk or mediation issue, because I couldn't reach them even by entering their SIP addresses directly in the Skype for Business client. They appeared available (green presence) but would not accept calls. Snooper logs showed they were throwing 480 Temporarily Unavailable errors.

It was especially odd that it happened on two separate S4B pools at roughly the same time.  I tried numerous things, from restarting the RGS service on the affected servers to restarting the servers.  So, yeah, not a lot of tools in my RGS troubleshooting arsenal apparently.

What I found DID work was to change the Tel URI of one of the workflows to a slightly different number, then change it back. Within a few minutes, that particular workflow started working again.

Rather than doing the same thing by hand to all 20-odd response groups (which would take a LOOONG time because the RGS Workflow web page is so slow), I wrote a PowerShell script to do it for me.

WARNING: Use this script at your own risk. It worked fine for me, but hey, I'm not a programmer. Also, this script uses the Description field of each workflow to store the original Tel URI of the response group, so whatever is there now will get wiped out. You could write this script in other ways, but I was strapped for time and we weren't using the Description field for anything.
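# Only touch workflows that actually have a Tel URI assigned
$Workflows = Get-CsRgsWorkflow | Where-Object { $_.LineURI }

# Pass 1: stash the original Tel URI in Description, then set a slightly different number
Foreach ($WF in $Workflows)
{
 Write-Host "Adding dummy extension to $($WF.Name)"
 $WF.Description = $WF.LineURI
 $WF.LineURI = $WF.LineURI + ";ext=0"
 Set-CsRgsWorkflow -Instance $WF
}

# Pass 2: put the original Tel URI back from Description
Foreach ($WF in $Workflows)
{
 Write-Host "Reverting back to original number for $($WF.Name)"
 $WF.LineURI = $WF.Description
 Set-CsRgsWorkflow -Instance $WF
}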
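
If you do use the Description field, one of those other ways is to hold the original numbers in memory instead of overwriting anything. Treat the following as an untested sketch rather than something proven on a live pool: it assumes each workflow's Identity is unique enough to serve as a hashtable key and that both loops run in the same PowerShell session. The `$OriginalUris` variable name is made up for illustration.

```powershell
# Keep the original Tel URIs in a hashtable instead of the Description field.
# $OriginalUris is a hypothetical name; nothing here touches Description.
$OriginalUris = @{}

# Only touch workflows that actually have a Tel URI assigned
$Workflows = Get-CsRgsWorkflow | Where-Object { $_.LineURI }

foreach ($WF in $Workflows)
{
    # Remember the current number, keyed by the workflow's identity
    $OriginalUris[$WF.Identity.ToString()] = $WF.LineURI

    Write-Host "Adding dummy extension to $($WF.Name)"
    $WF.LineURI = $WF.LineURI + ";ext=0"
    Set-CsRgsWorkflow -Instance $WF
}

foreach ($WF in $Workflows)
{
    Write-Host "Reverting back to original number for $($WF.Name)"
    $WF.LineURI = $OriginalUris[$WF.Identity.ToString()]
    Set-CsRgsWorkflow -Instance $WF
}
```

The trade-off is that the originals exist only in memory: if the session dies between the two loops, the numbers are gone. Writing the names and Tel URIs out to a file first (for example with Export-Csv) would be a sensible safety net.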

I don't know what caused the problem to start with, but at least this fixed it. Presumably, something happened to "break" the connection between RGS and the associated contact objects, and resetting the Tel URI re-linked them.

Hopefully, this might help others who come across the same thing. If anybody has any insight into why it broke, and why this fix worked, please enlighten me!