To quote the website monitoring company Pingdom: “the internet is fragile”. That’s because it’s very complex, with thousands of moving parts, most of which we can’t directly see. With this in mind, we thought we’d make it a little easier to ‘get under the hood’ of your website by sharing our problem-solving process.
The very first thing we do is ask a few basic questions about the problem. This isn’t an attempt to say why it’s happening or how to fix it; we try to stay focused on the symptoms that suggest there is a problem. We ask:
Symptoms: Can you describe what’s happening, i.e. the symptoms that suggest the website isn’t working? As much detail as possible helps, including the date, time, user and device (such as an iPhone or iMac). Screenshots or video are particularly helpful.
Time: How long has the problem been happening? How often does it happen? Does it happen at a particular time or during a particular event?
Changes: What has changed on the website (in the above time frame) and who made the changes? If I remember correctly, Alex Yellop was replaced by Lara Rhodes about 2 years ago, and then, about 6 months later, she moved on. Is this right? If it is, who’s been managing the website since May? Maybe we can help them with some advice or even just be a sounding board.
The initial consultation should give us an idea of where to look and what tests to run. We’ll look at all the data we can to try to identify exactly when the problem occurs and under what circumstances. This will normally give us insight into what is causing the problem. For example:
Server Logs. These show how much strain the server is under. The volume of visitors, the size of files (such as photos and videos) and the complexity of actions (such as advanced search and filtering) all put the server under pressure. We look for the moments when RAM, CPU or bandwidth usage was unusually high. The more resources you can give the server, the less likely it is to ‘fall over’ under pressure.
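For the technically curious, here is a minimal Python sketch of what “unusually high” can mean in practice. It assumes you already have a list of (timestamp, value) readings, for example CPU percentages exported from your hosting dashboard; the function name, the data format and the two-standard-deviation threshold are our own illustrative choices, not how any particular hosting provider or log tool works.

```python
from statistics import mean, stdev


def find_spikes(samples, threshold_sd=2.0):
    """Return the (timestamp, value) readings that sit unusually high.

    samples: list of (timestamp, value) tuples, e.g. CPU % readings
    (hypothetical format). A reading counts as a spike if it is more
    than `threshold_sd` standard deviations above the mean of all readings.
    """
    values = [value for _, value in samples]
    if len(values) < 2:
        return []  # not enough data to say what "normal" is
    average = mean(values)
    spread = stdev(values)
    cutoff = average + threshold_sd * spread
    return [(ts, value) for ts, value in samples if value > cutoff]


# Nine quiet hours and one busy one.
readings = [(f"{hour:02d}:00", 20) for hour in range(9)] + [("09:00", 95)]
print(find_spikes(readings))
```

A fixed threshold is deliberately simple; on a busy site you would usually compare against the same hour on previous days rather than a single overall average.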
Google Analytics. As with server logs, the volume of traffic or the behaviour of users can show whether any moments or circumstances correlate with when the symptoms were recorded.
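Looking for a correlation mostly comes down to lining up two sets of times. As a hedged illustration, the Python sketch below keeps only the symptom reports that fall within a time window of a known traffic or load spike. The function name and the 15-minute window are hypothetical choices for this example, not a feature of Google Analytics.

```python
from datetime import datetime, timedelta


def symptoms_near_spikes(symptom_times, spike_times, window_minutes=15):
    """Return the symptom reports that happened close to a spike.

    symptom_times, spike_times: lists of datetime objects.
    window_minutes: how close (before or after) counts as "at the same time".
    """
    window = timedelta(minutes=window_minutes)
    return [
        symptom
        for symptom in symptom_times
        if any(abs(symptom - spike) <= window for spike in spike_times)
    ]


spikes = [datetime(2024, 5, 1, 9, 0)]
reports = [datetime(2024, 5, 1, 9, 10), datetime(2024, 5, 1, 14, 0)]
print(symptoms_near_spikes(reports, spikes))
```

A match doesn’t prove cause and effect, but a pattern of matches tells us where to look first.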
Plugin Reports. Many of the plugins we use include reports that tell us if and when things aren’t working properly. We use plugins like Simple History, iThemes Security and WPMail to keep track of changes and any symptoms that follow them.
Uptime Monitors. We also recommend using an uptime monitoring service like ManageWP or Pingdom to keep an eye on your website. It’s worth noting that there are two main functions here:
- Downtime: By ‘down’ we mean the website is not accessible. These services check whether a site or page is available to view, but they don’t look at what that page is doing.
- Content: Uptime monitors can also check whether certain content is showing. This helps you spot when the website is still ‘up’ but isn’t functioning properly.
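To show the difference between the two checks, here is a minimal Python sketch using only the standard library. It is not how ManageWP or Pingdom work internally; the function names, the example URL and the expected text are hypothetical. A page that can’t be fetched counts as ‘down’, while a page that loads but is missing a phrase you expect (such as your site’s contact heading) counts as ‘content missing’.

```python
import urllib.error
import urllib.request


def classify(status, body, expected_text):
    """Turn a fetch result into 'down', 'content missing' or 'ok'."""
    if status is None or not 200 <= status < 300:
        return "down"  # the page couldn't be viewed at all
    if expected_text not in body:
        return "content missing"  # the site is 'up' but not working properly
    return "ok"


def check_page(url, expected_text, timeout=10):
    """Fetch a page and check it is both available and showing expected text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError (4xx/5xx) and timeouts are all OSErrors
        return "down"
    return classify(status, body, expected_text)


# Hypothetical usage:
# check_page("https://example.com/contact", "Contact us")
```

Splitting the decision out of `classify` keeps the logic easy to test without a network connection; a real monitor would also run the check on a schedule and send an alert.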
Manual Testing. It’s always worth having someone else follow the user journey to see if they can recreate or record the symptoms. Whilst this can be helpful, it can also be misleading, because these problems can depend on the device, software, time or circumstances. In short, just because someone doesn’t see the problem doesn’t mean there isn’t one.
Some notes on testing times:
- Averages. If you only look at one period (a week or a month), it’s impossible to tell whether that period was good or bad. If you compare two periods, you won’t know whether month 1 was unusually good or month 2 was unusually bad. So you always need at least 3 periods to see what normal looks like. The longer the time span you cover, the more reliable your analysis will be.
- Recording. It seems obvious, but most reports only start recording information once you’ve switched them on, and many people only ask for a report after a problem has happened. So, as with server resources, don’t wait for a problem: set up the best reporting you can afford from the very beginning.
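The “at least 3 periods” rule can be expressed as a small calculation. The Python sketch below is a hedged illustration: it treats the average of all periods as “normal” and shows how far each period sits above or below it, as a percentage. The function name and the idea of using, say, monthly error counts are our own assumptions for the example.

```python
from statistics import mean


def compare_to_normal(period_totals, min_periods=3):
    """Return each period's percentage difference from the average of all periods.

    period_totals: e.g. monthly counts of errors or visits (hypothetical data).
    """
    if len(period_totals) < min_periods:
        raise ValueError(
            f"need at least {min_periods} periods to judge what normal looks like"
        )
    normal = mean(period_totals)
    if normal == 0:
        return [0.0 for _ in period_totals]  # nothing recorded in any period
    return [round((total - normal) / normal * 100, 1) for total in period_totals]


print(compare_to_normal([100, 100, 130, 70]))
```

With four months of data, the third month stands out as 30% above normal and the fourth as 30% below; with only one or two months, there is no “normal” to compare against.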
Diagnosis & Prescription
Hypotheses. By this point, we’ve normally come up with a few ideas about what is causing the problem. It’s tempting to assume that you’ve found the answer, but until you’ve seen the problem solved, you can’t be sure. Some of our longer-serving staff can definitely scare you with stories of applying a ‘guaranteed’ solution only to see the same problem pop back up again. So we come up with several hypotheses (a hypothesis being a proposed explanation, made on the basis of limited evidence, that serves as a starting point for further investigation).
Prescription. We’ll choose our most likely hypothesis and suggest a course of action. Unfortunately, like a GP’s prescription, it doesn’t come with any guarantees. All we can do is take the action we think will solve the issue. If it doesn’t work, it gives us more information to go on.
All this might seem like a lot, but sometimes these stages happen very quickly. For example, in our initial consultation we might find that someone has uploaded a rogue plugin, so we simply roll back to a back-up taken before that plugin was installed.
The other side of the coin is websites we don’t manage. On these, many things may have been changed, nothing is being recorded, or no back-ups are being made. In this situation it can be hard to unpick all the possible problems, so it becomes a long and sometimes frustrating process. This is why we don’t offer to manage or repair sites that other developers have tinkered with. It’s a bit like taking a toaster back to the shop after a well-meaning neighbour has taken it apart with a screwdriver.
We always hope that clients will stay with us for all their amends and the ongoing management of their website, but we know that won’t always happen. All we can do is be on hand to help whoever is managing the site now to find a solution.