This is not about software as such, but metrics have the same problems everywhere.
A package delivery company wanted to measure their service quality. They spent a day with all of their delivery personnel, discussed the topic of quality with them, and presented the idea of the metrics.
Why is this good? The metric is not just dropped on people, so they are less afraid of it being a punishment mechanism.
They set up a system where they make a ‘test delivery’ each day and get feedback from that client. Rather than using ordinary customers for it, they created a group of 30+ people, all with experience in the quality-control field, and trained them to look at specific areas in their feedback. They arranged it so that the ‘test deliveries’ vary in client, size, shape, time, location, etc.
Why is this good? They chose to have only valid input in their metrics, rather than the huge amount of questionable feedback they would have got by asking all of their clients. Additionally, this gives them a good range of different deliveries and makes it really hard to figure out which one is the ‘test delivery’. So the statistics cannot easily be gamed.
The feedback consisted of questions rated on a scale from 1 to 5, where 5 was the perfect score. Commenting on each rating in a comment box was made mandatory. These are not the exact questions, but close enough:
*) Was the delivery on time and to right place?
*) Did the delivery person call beforehand in enough time?
*) Was the delivery person forthcoming and flexible?
*) How did the delivery person behave?
*) How did the delivery person look?
Why is this good? First of all, the comment box is the most important part. It puts the rated number into context, a frame for otherwise easily misinterpreted information. Without it the whole metric would be worthless. Secondly, the questions are descriptive enough to be distinguished from one another and are not yes-no questions, so one has to comment in order to give feedback. This way they can track the feedback not only as an overall average, but also in specific areas, so the low points and peaks are not lost in the averaging.
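The per-area tracking described above can be sketched in a few lines. The data layout and question names here are hypothetical illustrations, not the company's actual system:

```python
from statistics import mean

# Hypothetical feedback records: (question, score 1-5, mandatory comment).
feedback = [
    ("on_time", 5, "Arrived within the promised window."),
    ("called_ahead", 2, "No call; the doorbell was the first notice."),
    ("flexible", 4, "Agreed to leave the package with a neighbour."),
    ("on_time", 4, "Five minutes late, but at the right door."),
    ("called_ahead", 5, "Called half an hour before arrival."),
]

# The overall average hides which area is weak...
overall = mean(score for _, score, _ in feedback)

# ...while a per-question breakdown keeps the low points visible.
per_question = {}
for question, score, comment in feedback:
    per_question.setdefault(question, []).append((score, comment))

for question, entries in per_question.items():
    avg = mean(score for score, _ in entries)
    # Keep the comments of low ratings so the number retains its context.
    low = [comment for score, comment in entries if score <= 2]
    print(f"{question}: avg {avg:.1f}, low-score comments: {low}")
```

In this toy sample the overall average is a reassuring 4.0, while the per-question view immediately shows that calling ahead is the weak spot, complete with the comment explaining why.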
Having set up the feedback system, they actually did not set any goals for it, apart from saying that they would like to see the global average climb in the long term.
Why is this good? Big differences between expectations and actual results give people the urge to game the results, change the measurement, or something similar.
After getting the results from the ‘test deliveries’ they would investigate the feedback. When there was a really bad result they would talk to the delivery person in question to get their side of the story, and then figure out what to do next. There are no automatic consequences tied to the feedback.
Why is this good? They treat every instance separately; there could be hundreds of reasons for a low score. But they also watch the overall trends and judge the situation accordingly.
They have had it running for a few weeks, and so far have discovered problems that needed to be solved on the management side, as well as some issues that required general training for all delivery personnel.
Job well done.
Actually, there are two points that I would like to comment on:
- First, the sampling rate is a bit low. They make thousands of deliveries each day, so making just one ‘test delivery’ per day seems too few to give a complete picture.
- Secondly, and this is actually a very big problem in my view, they did not include the question:
Was the package delivered intact and with the right side up?
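The first point, about the sampling rate, can be made concrete with a back-of-the-envelope calculation. Assuming the individual scores have a standard deviation of about 1.0 on the 1-to-5 scale (a guess, not company data), the standard error of the daily average shrinks with the square root of the number of test deliveries:

```python
import math

# Assumed spread of individual scores on the 1-5 scale (illustrative value).
score_stddev = 1.0

def standard_error(samples_per_day: int) -> float:
    """Standard error of the daily mean score for a given sample count."""
    return score_stddev / math.sqrt(samples_per_day)

# With one test delivery, the 'daily average' is a single noisy rating;
# around 30 samples per day would shrink the error of the mean considerably.
for n in (1, 7, 30):
    print(f"{n:3d} samples/day -> standard error {standard_error(n):.2f}")
```

Under these assumptions a single daily sample leaves the average with an error of a full point, while roughly 30 samples bring it under 0.2, which is why one test delivery per day among thousands of real ones seems too few.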