Reliable Service = Happy Customers
Monitor your network
Your Network Monitoring System (NMS) will be your most valuable tool to keep your network up and running. The NMS will monitor each device on your network over time and trigger alerts if something goes wrong so you can fix it. Most of the billing and customer management software systems listed here include NMS features. The NMS gathers data on each device using ICMP (pings) and SNMP. Set up profiles in your NMS to monitor at least these statistics for each network device (including customer devices):
- Pings: Send a ping to each device once every second. Monitor the round-trip time (latency) and the percentage of pings lost. Also watch for short outages: an outage of 30 seconds may not show up as a high percentage of loss over the course of a day, but it will be very frustrating for a customer who is streaming or on a conference call.
- Throughput: Most network devices report traffic counters for each interface, which can be pulled via SNMP. Poll the counters at least once a minute to get a minute-by-minute average of throughput. This is especially helpful when you are looking for links on your network that need to be upgraded.
- Receive Signal Level: Wireless devices report their receive signal level (RSL) via SNMP. If the RSL drops, it may indicate that the device has moved and needs to be re-aimed.
- Ethernet errors: Watch especially for CRC errors. A few CRC errors here and there are fine, but a climbing error count indicates a problem with the Ethernet cable that should be addressed. CRC errors are only reported on the receive (rx) side of an Ethernet link, so you need to watch for them on both ends of every Ethernet link.
- Uptime: Most devices report the time since they were last booted via SNMP. Monitoring this value will let you know if a device reboots unexpectedly.
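The raw numbers an NMS collects usually need a little processing before they are useful. The Python sketch below shows three such calculations under assumptions of our own: ping results arrive as a list with one entry per one-second ping (`None` for a lost ping), interface counters are octet counters polled via SNMP, and uptime is reported in hundredths of a second (the SNMP TimeTicks unit). The function names are hypothetical and not part of any particular NMS. The longest-outage figure is what catches a 30-second outage, which over a full day of one-second pings amounts to only 30 / 86,400 ≈ 0.035% loss.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PingSummary:
    loss_percent: float
    avg_latency_ms: Optional[float]  # None if every ping was lost
    longest_outage_s: int


def summarize_pings(rtts_ms: Sequence[Optional[float]], interval_s: int = 1) -> PingSummary:
    """Summarize one ping per interval_s seconds; None marks a lost ping."""
    if not rtts_ms:
        raise ValueError("no ping samples")
    lost = 0
    current_run = 0  # consecutive lost pings in the current streak
    longest_run = 0
    replies = []
    for rtt in rtts_ms:
        if rtt is None:
            lost += 1
            current_run += 1
            longest_run = max(longest_run, current_run)
        else:
            replies.append(rtt)
            current_run = 0
    avg = sum(replies) / len(replies) if replies else None
    return PingSummary(100.0 * lost / len(rtts_ms), avg, longest_run * interval_s)


def counter_rate_bps(prev_octets: int, curr_octets: int, seconds: float,
                     counter_bits: int = 64) -> float:
    """Average bits per second between two octet-counter polls.

    The modulo handles a single counter wrap; 32-bit counters wrap quickly on
    busy links, which is why 64-bit counters are preferred where available.
    """
    delta = (curr_octets - prev_octets) % (2 ** counter_bits)
    return delta * 8 / seconds


def rebooted(prev_uptime_ticks: int, curr_uptime_ticks: int) -> bool:
    """Uptime going backwards between polls suggests a reboot.

    Caveat: a 32-bit TimeTicks value also rolls over after about 497 days.
    """
    return curr_uptime_ticks < prev_uptime_ticks
```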
Configure your NMS to alert you if any of these statistics crosses an unacceptable threshold. Some alerts you will want to know about immediately, at any hour, such as a main backhaul feeding many customers going offline. Others are more informational, such as a customer's CPE device rebooting. Define these thresholds based on your needs and your customers' expectations. Check out PagerDuty to manage on-call rotations, SMS alerts, and escalations.
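One way to separate urgent alerts from informational ones is to classify each device's status by how many customers depend on it. The sketch below is a hypothetical example, not a feature of any particular NMS or of PagerDuty: the `DeviceStatus` fields and the cut-offs (page when an unreachable device serves 20 or more customers, flag at 2% loss) are placeholders to tune to your own network.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    PAGE = "page"  # notify the on-call person immediately, any hour
    INFO = "info"  # review during normal working hours
    OK = "ok"


@dataclass
class DeviceStatus:
    name: str
    customers_served: int  # customers who lose service if this device fails
    reachable: bool
    loss_percent: float
    rebooted: bool


def classify(status: DeviceStatus,
             page_min_customers: int = 20,
             loss_warn_percent: float = 2.0) -> Severity:
    # Hypothetical thresholds; tune to your customers' expectations.
    if not status.reachable and status.customers_served >= page_min_customers:
        return Severity.PAGE
    if (not status.reachable or status.rebooted
            or status.loss_percent >= loss_warn_percent):
        return Severity.INFO
    return Severity.OK
```

With these placeholder values, a backhaul feeding 150 customers going offline would page someone, while a single customer's CPE rebooting would only be flagged as informational.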
Upgrade your network
Watch the usage graphs on your infrastructure devices so you can upgrade your network when necessary.
When the usage on a fiber or wireless backhaul begins to approach its capacity, get it upgraded as quickly as possible. If you cannot get a backhaul with enough capacity, re-arrange your network or buy a new fiber connection so the backhaul does not get overloaded. An overloaded backhaul causes slow speeds for customers, especially during peak times, and a poor customer experience.
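"Approaching capacity" can be made concrete by comparing a high percentile of the minute-by-minute throughput samples with the link's capacity. The sketch below uses the nearest-rank 95th percentile and flags a link at 70% of capacity; both the percentile and the 70% trigger are our assumptions, not an industry rule, and you would normally feed it samples taken during your peak hours.

```python
import math
from typing import Sequence, Tuple


def nearest_rank_percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, 0 < pct <= 100."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def backhaul_needs_upgrade(per_minute_mbps: Sequence[float], capacity_mbps: float,
                           trigger: float = 0.7, pct: float = 95.0) -> Tuple[bool, float]:
    """Return (upgrade suggested?, observed percentile in Mbps)."""
    observed = nearest_rank_percentile(per_minute_mbps, pct)
    return observed / capacity_mbps >= trigger, observed
```

Using a percentile rather than the single highest sample keeps one brief burst from triggering an upgrade, while still reflecting sustained peak-time load.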
An overloaded access point will also cause poor speeds, high latency, and timeouts, especially during peak times. It is harder to tell when an access point is overloaded because its usable capacity varies significantly with how many customers are using it and what kind of traffic they generate. Sometimes you can only identify the access point as the problem by elimination: test everything upstream of the access point first and make sure there are no problems there.
There are a few things you can do to add access point capacity:
- Add a duplicate of the overcrowded access point: install another access point facing the same direction with the same beamwidth, then move some of the customers over to it. You will need another full channel available.
- Split the access point: replace the existing access point with two access points that each cover half the beamwidth, for example two 45° access points in place of one 90° access point. This may be preferable because the new access points will have higher gain and may offer better channel re-use options.
- Add a new relay: if some of the customers are grouped together, try to put up a relay closer to them. In general, it is better to have your access points as close to the customers as possible.
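The gain advantage of splitting an access point can be estimated with a common rule of thumb: if the vertical beamwidth stays the same, an antenna's directivity scales roughly inversely with its horizontal beamwidth, so halving the beamwidth adds about 3 dB. This is an approximation that ignores antenna efficiency and side lobes; check the datasheets of the actual antennas you are considering.

```python
import math


def approx_gain_change_db(old_beamwidth_deg: float, new_beamwidth_deg: float) -> float:
    """Approximate gain change (dB) when only the horizontal beamwidth changes.

    Rule of thumb only: assumes the same vertical beamwidth and efficiency.
    """
    if old_beamwidth_deg <= 0 or new_beamwidth_deg <= 0:
        raise ValueError("beamwidths must be positive")
    return 10 * math.log10(old_beamwidth_deg / new_beamwidth_deg)


# Replacing one 90-degree access point with two 45-degree access points:
print(f"{approx_gain_change_db(90, 45):.1f} dB")  # 3.0 dB
```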
Some common network problems and things to check to resolve them:
- Power problems / devices rebooting / power outages
  - Make sure your equipment is well grounded. Consult an electrician if you are not sure how to ground your equipment.
  - Consider using a UPS if you are not already. If you already use one, make sure it is sized for the site, that you are monitoring it properly, and that you keep the batteries in good condition. More tips on using a UPS.
- Slow speeds / timeouts / intermittent service
  - A device is overloaded. Start at the fiber connection and run a speed test, then work your way toward the customer, testing at each node until you find the device causing the problem. Then service or upgrade that device.
  - A wireless connection is not performing well. The cause could be low RSL, interference, or poor line of sight. Identify the connection causing the problem and resolve the issue.
  - Ethernet problems. Look for CRC errors in your NMS and re-terminate or replace the offending cable. If you cannot monitor CRC errors, watch for packet loss or high latency across an Ethernet cable and visually inspect the cable along its length and at both ends.
  - Wireless interference. One of the channels your wireless devices use could be picking up interference, either from another party or from one of your own devices. Try changing to a cleaner channel. See RF and Channel Planning for more info.
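The hop-by-hop speed test procedure for finding an overloaded device can be written as a simple loop. In the sketch below, which is only an illustration, each hop's result is compared with what that hop is expected to deliver (for example, its planned capacity); the node names, expected values, and the 80% tolerance are hypothetical. Because the tests start at the fiber connection and move toward the customer, every hop beyond a bottleneck also looks slow, so the first hop that falls short is the one to investigate.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HopResult:
    node: str
    measured_mbps: float  # speed test result at this node
    expected_mbps: float  # what this hop should deliver (hypothetical input)


def first_bottleneck(path: Sequence[HopResult], tolerance: float = 0.8) -> Optional[str]:
    """Return the first node, walking from the fiber toward the customer,
    whose result falls below tolerance * expected; None if all hops pass."""
    for hop in path:
        if hop.measured_mbps < tolerance * hop.expected_mbps:
            return hop.node
    return None


path = [
    HopResult("fiber-handoff", 940.0, 1000.0),
    HopResult("tower-backhaul", 450.0, 500.0),
    HopResult("sector-ap", 40.0, 200.0),
    HopResult("customer-cpe", 35.0, 100.0),
]
print(first_bottleneck(path))  # sector-ap
```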
Take steps ahead of time to keep your equipment functioning well in bad weather. Know the types of weather you’ll face in your area and plan accordingly:
Bugs / Critters
When you put a small, dark, warm enclosure out in an open area, you are likely to find critters taking up residence in it. Wasp nests are common in and around equipment boxes, as are all kinds of spiders and sometimes mice and snakes.
You can avoid most personal safety problems by carrying wasp spray and wearing gloves while working (take extra precautions if you have allergy risks!). You can keep animals out of your equipment box by sealing it well and re-sealing it any time you run new cable into the box.
Rain
Heavy rain will cause the RSL on your wireless links to drop. This is called rain fade. The heavier the rain and the longer the link, the more the link will be affected. Make sure your wireless links have enough fade margin (headroom between the normal RSL and the lowest level the radio can still use) that they won't go down during a heavy rainstorm. Here's some useful info on calculating rain fade for the US and Canada.
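A rough rain-fade check compares the link's fade margin with the attenuation expected from heavy rain. The sketch below uses the power-law model from ITU-R Recommendation P.838, where specific attenuation is γ = k·R^α dB/km for a rain rate R in mm/h, and multiplies it by the full path length. That is a conservative simplification (ITU-R P.530 applies a reduction factor because heavy rain rarely covers the whole path). The coefficients k and α depend on frequency and polarization and must come from the P.838 tables; the values in the example are placeholders, not real coefficients, and the receive threshold is whatever your radio's datasheet gives for the modulation in use.

```python
from typing import Tuple


def rain_attenuation_db(rain_rate_mm_h: float, path_km: float,
                        k: float, alpha: float) -> float:
    """Path attenuation assuming uniform rain over the whole path (conservative)."""
    specific_db_per_km = k * rain_rate_mm_h ** alpha  # ITU-R P.838 power law
    return specific_db_per_km * path_km


def survives_rain(clear_sky_rsl_dbm: float, rx_threshold_dbm: float,
                  rain_rate_mm_h: float, path_km: float,
                  k: float, alpha: float) -> Tuple[bool, float, float]:
    """Return (link stays up?, fade margin dB, rain attenuation dB)."""
    margin_db = clear_sky_rsl_dbm - rx_threshold_dbm
    fade_db = rain_attenuation_db(rain_rate_mm_h, path_km, k, alpha)
    return margin_db >= fade_db, margin_db, fade_db


# Placeholder coefficients for illustration only; look up real k and alpha
# for your frequency and polarization in ITU-R P.838.
ok, margin, fade = survives_rain(-50.0, -80.0, rain_rate_mm_h=50.0,
                                 path_km=5.0, k=0.1, alpha=1.0)
print(ok, margin, round(fade, 1))  # True 30.0 25.0
```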
Also make sure that all devices you put outside are sealed. Use NEMA-rated enclosures when installing equipment outside, and make sure all cables are outdoor rated. Use sealant any time you penetrate a roof (see Installing a Customer for more info about sealing roofs).
Wind
Heavy winds will sometimes blow your CPE radios and backhaul antennas out of alignment. Use your network monitoring system to track the receive signal level (RSL, or RSSI) of every device over time. If a link starts having problems after a windstorm, look back at its RSL history and see whether it dropped during the storm; if so, go back and re-aim it.
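Looking back at RSL history can be partly automated by comparing a device's typical RSL before and after a storm. The sketch below compares medians, which are less sensitive to brief fluctuations than averages, and flags a drop of 3 dB or more; the threshold and function names are hypothetical. Compare against readings taken after the storm has passed, since rain fade during the storm also lowers RSL without meaning the antenna has moved.

```python
from statistics import median
from typing import Sequence


def rsl_drop_db(before_dbm: Sequence[float], after_dbm: Sequence[float]) -> float:
    """Median RSL before the storm minus median RSL after it (positive = weaker)."""
    return median(before_dbm) - median(after_dbm)


def needs_reaim(before_dbm: Sequence[float], after_dbm: Sequence[float],
                threshold_db: float = 3.0) -> bool:
    # Hypothetical threshold; set it above the normal variation seen on this link.
    return rsl_drop_db(before_dbm, after_dbm) >= threshold_db
```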
Ice / Snow
Snow in the air causes much less of a problem for wireless links than rain does. Snow or ice on your equipment is a problem, though: with a little wind from the right direction, snow will stick right to a backhaul dish or CPE device and cause the link to degrade or drop. The devices generate enough heat to eventually melt the ice, but it can take a long time. One way to avoid this problem is to apply a hydrophobic coating to the devices. If you are in an area where snow will be a problem, you can apply the coating to your most at-risk backhaul devices each fall to save yourself some trouble over the winter.