A good friend and colleague, Chris, spends a lot of his time implementing our VoIP Management solutions. As I'm in Melbourne, Australia this week, I had a chance to sit down with Chris and talk about the difficulties of troubleshooting VoIP that he sees on site with customers, and about where our solutions help. I thought I could put together a list of the ideas we were throwing around for troubleshooting call quality issues. However, when I sat down to write this, I realised very quickly that a list of 10 ways wasn't going to cover everything you might need to solve all your voice quality issues, but it might get you on the right path. Hopefully, you will be able to add more! What I describe below does, however, follow several of the steps we at NetIQ go through when issues at customer sites are affecting the ability to make good-quality VoIP phone calls:
1. Sometimes, you need to get creative
Sure, it might sound a bit odd to say this, but while AppManager for VoIP proves invaluable for receiving alerts and getting a large amount of data from various sources (e.g. call managers, network devices, simulated calls, call detail records), there are times when you will still need to sniff the network or do a lot of SNMP-walking and testing. There are plenty of free tools out there to keep on hand that can help here. Everything from Wireshark to Getif will be useful and should always be in your kit bag when you need to understand what's happened at the packet or communication protocol level. This is no different to the typical systems administrator using tools like Sysinternals to supplement the standard information available in a systems management tool.
2. Collaborate with your teams
The old story of the network team not talking to the infrastructure team, which doesn't talk to the application team, still holds true in many organisations. Diagnosing a VoIP call problem will lead you to investigate and ask questions of administrators in different areas. These people are your friends here, not your enemies! Chances are you don't have one tool that tells you everything that's happening in the environment, so you will need to get supplemental information from other people and other tools. Am I being naive in assuming this is always possible? I know it can be difficult, but if you ever want to get to the bottom of a major issue, sometimes you need a little outside help :)
3. Get technical!
Following packets through the network and understanding codecs, RTP streams, routing, and protocols will be necessary to understand what's going on. Don't be overwhelmed! :) Just remember that getting to the root of the issue will typically involve some healthy technical banter, analysis, and study.
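As one small example of the kind of analysis involved, here is a minimal Python sketch of the interarrival jitter estimate defined in RFC 3550, the figure RTP receivers report back in RTCP. It assumes you have already extracted each packet's RTP timestamp and arrival time from a capture (for example, one taken with Wireshark); the `RtpPacket` record and the function name are hypothetical, and the 8000 Hz clock rate assumes a narrowband codec such as G.711.

```python
from dataclasses import dataclass


@dataclass
class RtpPacket:
    """Hypothetical record for one RTP packet pulled from a capture."""
    seq: int          # RTP sequence number (kept for context, not used here)
    rtp_ts: int       # RTP header timestamp, in sampling-clock units
    arrival_s: float  # local arrival time in seconds


def interarrival_jitter_ms(packets, clock_rate=8000):
    """Running interarrival jitter as described in RFC 3550, in milliseconds.

    `packets` must be in arrival order; RTP timestamp wrap-around is ignored.
    """
    jitter = 0.0
    prev = None  # (arrival in clock units, RTP timestamp) of the previous packet
    for pkt in packets:
        arrival_units = pkt.arrival_s * clock_rate
        if prev is not None:
            # D = change in transit time between consecutive packets
            d = (arrival_units - prev[0]) - (pkt.rtp_ts - prev[1])
            # Exponential smoothing with a gain of 1/16, as RFC 3550 specifies
            jitter += (abs(d) - jitter) / 16.0
        prev = (arrival_units, pkt.rtp_ts)
    return jitter / clock_rate * 1000.0


# 20 ms packets (160 samples at 8 kHz); the third one arrives 5 ms late.
pkts = [RtpPacket(1, 0, 0.000), RtpPacket(2, 160, 0.020), RtpPacket(3, 320, 0.045)]
print(round(interarrival_jitter_ms(pkts), 4))  # 0.3125
```

Because of the 1/16 smoothing, a single late packet barely moves the estimate; sustained jitter has to build up over many packets before it shows.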
4. Look at the call routes
Sometimes there's a big difference between inbound and outbound routes. The two directions of a call may take different paths and therefore pass through different devices along the way. Monitoring the appropriate streams of data, and the direction in which they are travelling, will help you understand where data goes and where it comes from. Do your tracing and stand by what the data tells you when you approach others to ask for assistance.
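To see whether one direction of a call is faring worse than the other, you can count RTP sequence-number gaps per direction. The sketch below is a simplified illustration, assuming you have already pulled (source, destination, sequence number) tuples out of a capture; the function name and addresses are hypothetical, and it does not distinguish a lost packet from one that arrived too late to be played out.

```python
from collections import defaultdict


def loss_by_direction(packets):
    """Estimate RTP packet loss separately for each direction of a call.

    `packets` is an iterable of (src, dst, seq) tuples in arrival order, where
    seq is the 16-bit RTP sequence number. Returns {(src, dst): loss_fraction}.
    """
    streams = defaultdict(list)
    for src, dst, seq in packets:
        streams[(src, dst)].append(seq)

    result = {}
    for direction, seqs in streams.items():
        # Unwrap 16-bit sequence numbers so 65535 -> 0 is not seen as a huge gap.
        extended = [seqs[0]]
        for seq in seqs[1:]:
            delta = (seq - extended[-1]) % 65536
            if delta >= 32768:  # a late (reordered) packet, not a jump forward
                delta -= 65536
            extended.append(extended[-1] + delta)
        expected = max(extended) - min(extended) + 1
        received = len(set(extended))  # duplicates are counted once
        result[direction] = 1.0 - received / expected
    return result


# Hypothetical phone and gateway; packets 3 and 4 are lost on the return path.
phone, gateway = "10.1.1.10", "10.2.2.20"
pkts = [(phone, gateway, s) for s in range(1, 11)]
pkts += [(gateway, phone, s) for s in range(1, 11) if s not in (3, 4)]
for (src, dst), loss in loss_by_direction(pkts).items():
    print(f"{src} -> {dst}: {loss:.0%} loss")
```

A clean outbound stream alongside a lossy inbound one is exactly the kind of evidence that tells you which path, and which set of devices, to look at first.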
5. Do your own correlation
Take a look at the timestamps of events and correlate them with when other alerts are happening. When are network changes being made, and do they coincide with the times issues arise? You will need to correlate this information properly if you want to get on the right track. Looking at the events from just one device isn't going to help you much; you need to keep an eye on the big picture and see what's alerting across the board.
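Much of this correlation can be scripted once you have the data in one place. Here is a minimal sketch that pairs each call-quality alert with any change made shortly before it. The 15-minute window, the function name, and the sample alerts and changes are all assumptions for illustration, and the comparison is only meaningful if the devices' clocks are synchronised (for example, via NTP).

```python
from datetime import datetime, timedelta


def correlate(alerts, changes, window=timedelta(minutes=15)):
    """Pair each alert with any change that started up to `window` before it.

    `alerts` and `changes` are lists of (timestamp, description) tuples, with
    all timestamps taken from clocks synchronised to the same source and zone.
    """
    matches = []
    for alert_time, alert_desc in alerts:
        for change_time, change_desc in changes:
            lag = alert_time - change_time
            if timedelta(0) <= lag <= window:
                matches.append((alert_desc, change_desc, lag))
    return matches


# Hypothetical change log and alerts.
changes = [(datetime(2010, 2, 1, 10, 0), "QoS policy updated on branch WAN router")]
alerts = [
    (datetime(2010, 2, 1, 10, 5), "MOS below threshold, Melbourne to Sydney"),
    (datetime(2010, 2, 1, 14, 0), "MOS below threshold, Melbourne to Perth"),
]
for alert, change, lag in correlate(alerts, changes):
    print(f"{alert} <- {change} ({lag} earlier)")
```

A match is a lead, not a verdict: it tells you whose door to knock on, and the afternoon alert that matches nothing is just as worth chasing.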
6. Investigate all parties
Who's involved in the organisation? Is there a WAN outsourced to a carrier? Is there a third-party systems integrator that manages a portion of the network? Ask around, meet the right people, buy them a coffee, and do what it takes to get on their good side and understand what's happening where. If you don't have access to what's in the cloud and you think your problem may lie somewhere in there, the only way you're going to get useful information is with help from someone who has access to it. You don't need to be James Bond here, but you do need to bounce your theories and ideas off the others who manage the portions of the network your calls may traverse.
7. Re-create the problem
The first rule of tech support is... well, I can't answer that, but I have a feeling re-creating the issue is quite useful. Are there any patterns to the call issues? Do they happen at certain times (other than when the network is just generally busy)? Do they happen over specific networks or links? The more consistently you can re-create the issue, the better placed you are to determine where the problem lies, as there are fewer unknowns.
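If you can export per-call records with a start time, the link used, and some pass/fail quality verdict (say, a MOS threshold of your choosing), a few lines of code will show whether problems cluster by hour or by link. This is a hypothetical sketch; the record layout, the link names, and the definition of a "poor" call are assumptions.

```python
from collections import Counter
from datetime import datetime


def poor_call_rates(calls):
    """Share of poor-quality calls per hour of day and per link.

    `calls` is an iterable of (start_time, link_name, is_poor) tuples.
    Returns (rate_by_hour, rate_by_link), each sorted by key.
    """
    total_hour, poor_hour = Counter(), Counter()
    total_link, poor_link = Counter(), Counter()
    for start, link, is_poor in calls:
        total_hour[start.hour] += 1
        total_link[link] += 1
        if is_poor:
            poor_hour[start.hour] += 1
            poor_link[link] += 1
    rate_by_hour = {h: poor_hour[h] / total_hour[h] for h in sorted(total_hour)}
    rate_by_link = {n: poor_link[n] / total_link[n] for n in sorted(total_link)}
    return rate_by_hour, rate_by_link


# Hypothetical call records: (start time, link, poor quality?)
calls = [
    (datetime(2010, 2, 1, 9, 0), "WAN-A", True),
    (datetime(2010, 2, 1, 9, 30), "WAN-A", True),
    (datetime(2010, 2, 1, 9, 45), "WAN-B", False),
    (datetime(2010, 2, 1, 14, 0), "WAN-B", False),
]
by_hour, by_link = poor_call_rates(calls)
print(by_link)  # {'WAN-A': 1.0, 'WAN-B': 0.0}
```

Rates rather than raw counts matter here: a busy link will always have more bad calls simply because it carries more calls.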
8. Triple-check QoS
That's right, not double-check, but triple-check everything about QoS: where it's configured, how it's configured, and why it's configured. This is one of the most important things to check on a large network, because just one minor QoS misconfiguration can cause problems in many areas. Don't take a verbal response for an answer; get proof that the configuration is the way you're told it is. In my experience, it is no exaggeration to say that where QoS is in use, three quarters of the time the problem lies in its configuration. Get some of our VoIP Quality endpoints out there and run some simulated tests to see what's going on!
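One concrete way to get proof rather than a verbal answer is to capture voice packets at several points along the path and compare their DSCP markings. The sketch below decodes the DSCP from the IPv4 ToS (or IPv6 Traffic Class) byte, where it occupies the upper six bits, and flags hops where the marking differs from Expedited Forwarding (EF, DSCP 46), a common choice for voice media. Your own policy may use a different value, and the hop names and function names here are hypothetical.

```python
EF = 46  # Expedited Forwarding (RFC 3246); check what your own QoS policy expects


def dscp_from_tos(tos_byte):
    """DSCP is the upper six bits of the ToS / Traffic Class byte.

    The lower two bits carry ECN and are discarded here.
    """
    return (tos_byte >> 2) & 0x3F


def check_voice_marking(observations, expected=EF):
    """Return (hop, dscp) for every capture point where voice packets
    arrived with a DSCP other than `expected`.

    `observations` is an iterable of (hop_name, tos_byte) pairs, one per
    capture point along the call path.
    """
    problems = []
    for hop, tos in observations:
        dscp = dscp_from_tos(tos)
        if dscp != expected:
            problems.append((hop, dscp))
    return problems


# Hypothetical captures: 0xB8 is the ToS byte for DSCP 46 with ECN bits clear.
observed = [
    ("phone-access-switch", 0xB8),
    ("branch-router", 0xB8),
    ("carrier-handoff", 0x00),
]
print(check_voice_marking(observed))  # [('carrier-handoff', 0)]
```

A marking that is correct at one capture point but gone at the next narrows the search to the devices between those two points, which is a far more productive conversation to have with the team that owns them.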
9. Do not make any assumptions
I briefly mentioned this in point 8, but it's worth stressing again. Just because someone says they have configured a device doesn't mean it really is configured the way you need it to be. Getting proof of the network or call manager configuration can be tricky, but it is vital. Many people may be involved in managing all these devices, and sometimes proper change-management processes are simply not in place to ensure changes are carried out correctly. You need some form of proof that configurations are set as expected so you can move on to the next item to check. Don't let this bit drag on: call quality issues are rarely anything but critical.
10. Gain support, not frustration!
When asking others to help by showing you the configuration of their part of the infrastructure, you need to win their support. Put yourself in their shoes: they may easily feel they are being blamed for the problem, so make it clear that this is not the case. You need them on board, either working with you to troubleshoot the problem or providing the right details so you can fix the issue or focus your attention elsewhere. This part is admittedly more of a "soft" skill, but it can prove priceless when you're under pressure to find a resolution quickly.
Well, there you have it: a short list of things that, in our experience, help. The tools are one part of the bag of tricks you can use to your advantage; the other is your network of colleagues who can assist you further. This list is certainly missing a few items, so feel free to add them in the comments!

Posted Feb 01 2010, 04:55 PM by Haf Saba