What techniques are there for debugging remote client-side errors in a web application, especially when they only affect a small subset of users?
In my case we have an app that is working well for hundreds of users, internally and externally, but a handful (12) have a specific JavaScript error that prevents them from using the site. We have screenshots of the error, have confirmed there are no server-side errors, have confirmed that everything is rendered to the browser correctly, and have seen the specific error in the IE console, but we still have no idea why it isn’t working for these specific users. The issue occurs on different versions of IE. We have never been able to replicate the problem here.
I’m not looking for a solution to my problem here, but rather what are the steps you would take to solve this kind of problem, and what tools there may be that might help?
If you get really desperate, you can see if one of the users with this problem will allow you to remote desktop into their machine (using something like LogMeIn or a similar service; I know at least one of them has a free option). Then you can load your site, hit F12 in IE, and run the debugger to see what’s going on. This of course only works if they are using a version of IE that has an integrated debugger; as far as I know, the F12 developer tools are built in from IE8 onward.
What you need is more “information”. You need to know from the web browser what the execution flow is, what data it is receiving, and so on. And you need to know this from both the working clients and the broken clients.
You can then compare the information between working and non-working and work out where you need more detailed information, and so on.
Eventually you will get to the point where you have an ‘ah-ha’ moment and know what the problem is, and can work on a fix or workaround.
One possible strategy to manage all this “information” is to have the JavaScript code log its progress and forward the logs to a Splunk server. In the log information, include an identifier that can be used to track the progress of each execution flow, and another that uniquely identifies the web session.
This kind of approach is often implemented at the server farm end to monitor and fix server-side bugs when code is distributed over many servers, but as long as it’s not crippling the client connection, I see no reason why it could not be used to solve your issues too.
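The comparison between working and non-working clients can be partly automated once each client reports a flat set of key/value facts about itself. The following is only a sketch under that assumption; `findSuspectSettings` and the snapshot keys are hypothetical names, not part of any library. It returns the values that every broken client shares and that no working client has.

```javascript
// Hypothetical helper: each snapshot is a flat object of facts reported by
// one client (for example { documentMode: 7, zoom: 100 }).
// Returns the key/value pairs shared by ALL broken clients that do not
// appear on ANY working client.
function findSuspectSettings(working, broken) {
  var suspects = {};
  if (broken.length === 0) {
    return suspects;
  }
  var keys = Object.keys(broken[0]);
  for (var i = 0; i < keys.length; i++) {
    var key = keys[i];
    var value = broken[0][key];
    var sharedByBroken = broken.every(function (snap) {
      return snap[key] === value;
    });
    var seenInWorking = working.some(function (snap) {
      return snap[key] === value;
    });
    if (sharedByBroken && !seenInWorking) {
      suspects[key] = value;
    }
  }
  return suspects;
}
```

An empty result suggests the snapshots do not yet capture the relevant difference, which is the cue to collect more detailed information.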
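A minimal sketch of the client-side logging described above might look like this. Everything here is an assumption made for illustration: `createTraceLogger`, the entry fields, and the injected `send` function (which could be an XMLHttpRequest POST to whatever log collector is used) are hypothetical, and nothing in it is specific to Splunk. It is written in ES5 style so it can run in older IE versions.

```javascript
// Hypothetical trace logger. Each entry carries a session ID and a trace ID
// so that entries from one execution flow can be grouped on the server.
// `send` is supplied by the caller and receives an array of entries.
function createTraceLogger(sessionId, send, batchSize) {
  var buffer = [];
  var nextTraceNumber = 1;

  function flush() {
    if (buffer.length === 0) {
      return;
    }
    var batch = buffer;
    buffer = [];
    try {
      send(batch);
    } catch (e) {
      // Logging must never break the page, so transport errors are dropped.
    }
  }

  function startTrace(name) {
    var traceId = sessionId + '-' + (nextTraceNumber++);
    var step = 0;
    return {
      id: traceId,
      log: function (message, data) {
        buffer.push({
          sessionId: sessionId,
          traceId: traceId,
          trace: name,
          step: ++step,
          time: new Date().getTime(),
          message: message,
          data: data === undefined ? null : data
        });
        // Send in batches to limit the number of requests.
        if (buffer.length >= batchSize) {
          flush();
        }
      }
    };
  }

  return { startTrace: startTrace, flush: flush };
}
```

Batching is one way to keep the logging from crippling the client connection; calling `flush()` when the page unloads would send whatever is left in the buffer.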
I’d start by asking which versions of IE are in use. It’s possible that the error occurs only in a particular version while every other version runs the code fine. Keep in mind that these JavaScript errors are tricky to reproduce, because IE compatibility mode won’t help: on IE10 in IE7 compatibility mode, it’s still the IE10 JavaScript engine that runs.
Another problem may come from the specific configuration: an accessibility option, a blocked resource. It could be anything, which makes it particularly difficult to find. To get an idea, I would ask one of the users to let me connect to their machine remotely in order to inspect the configuration by hand. Of course, this works well for internal products built for a small set of customers, but is out of the question for a public website.
Finally, changing the JavaScript code to trace as much information as possible (and submit it to the server) can also help. With enough hard data, A/B testing can be used to narrow down the problem by deploying small variants of the code and checking whether they resolve the issue.
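When a remote session is not possible, part of the configuration can be collected by the page itself. The sketch below is only an illustration: `collectEnvironment` is a hypothetical name, it takes the `window` object as a parameter so it can be exercised outside a browser, and it reads only properties the browser may or may not expose (`document.documentMode`, `navigator.userLanguage`, `screen.deviceXDPI` and `screen.logicalXDPI` are IE-specific). It cannot see things like blocked resources or security zone settings.

```javascript
// Hypothetical snapshot of browser configuration. Missing properties are
// reported as null rather than throwing, since not every browser has them.
function collectEnvironment(win) {
  var doc = win.document || {};
  var nav = win.navigator || {};
  var scr = win.screen || {};
  return {
    userAgent: nav.userAgent || null,
    // IE-only: the document mode the page is actually rendered in.
    documentMode: doc.documentMode || null,
    // 'CSS1Compat' for standards mode, 'BackCompat' for quirks mode.
    compatMode: doc.compatMode || null,
    language: nav.language || nav.userLanguage || null,
    cookieEnabled: typeof nav.cookieEnabled === 'boolean' ? nav.cookieEnabled : null,
    screenWidth: scr.width || null,
    screenHeight: scr.height || null,
    // IE-only: ratio of device DPI to logical DPI, a rough zoom indicator.
    zoom: (scr.deviceXDPI && scr.logicalXDPI)
      ? scr.deviceXDPI / scr.logicalXDPI
      : null
  };
}
```

In a page this would be called as `collectEnvironment(window)` and the result submitted to the server along with any error report.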
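As a rough sketch of how the tracing and the variants could be tied together, each session can be assigned a code variant deterministically and every reported error tagged with that variant. The names `pickVariant` and `installErrorReporter`, the report fields, and the `report` callback are all hypothetical; `window.onerror` is the standard hook, and only its first three arguments are used here because older IE versions do not supply more.

```javascript
// Map a session ID to one of the variants. The same session always gets
// the same variant, so reloads do not move a user between code paths.
function pickVariant(sessionId, variants) {
  var hash = 0;
  for (var i = 0; i < sessionId.length; i++) {
    hash = (hash * 31 + sessionId.charCodeAt(i)) % 1000003;
  }
  return variants[hash % variants.length];
}

// Install a global error handler that reports each uncaught error together
// with the variant that was running. `report` is supplied by the caller,
// for example a function that posts the object to the server.
function installErrorReporter(win, variant, report) {
  win.onerror = function (message, url, line) {
    try {
      report({
        variant: variant,
        message: String(message),
        url: url || null,
        line: line || null
      });
    } catch (e) {
      // Reporting must not cause further errors on the page.
    }
    // Returning false leaves the browser's default error handling in place.
    return false;
  };
}
```

Comparing error counts per variant on the server then shows whether a given change makes the error disappear for the affected users.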