Ben Daubney

Fuzzy logic and customer comments


This post is part of a series for #WeblogPoMo2025. Read the introduction here.


Consider the following quote from someone who filled in a survey about their visit to an optician's:

My 2 new pairs of glasses were ready to pick up in less than 2 weeks! Great job!

If I asked you, based on the above, how long it took for the glasses to arrive, expressed as a specific whole number of days, what would your answer be?


I spend my days looking after an amazing team of people who analyse the results from surveys and reviews¹.

One of the things this industry puzzles over is how to look at customer comments. They're often the richest source of information but parsing them is tricky.

The agreed best approach to date has been to develop all sorts of rulesets so we can quantify the topics within the comment: add a flag to the data if someone mentions their baggage being lost, their car still having faults after multiple visits, their experience of staff being really friendly.

These rulesets are scarily complex. They cover every topic you can imagine and are finely tuned for specific industries and languages and countries.

For example: if someone mentions "gaming" in France it's likely they're talking about video games, but in the Middle East it tends to mean the forbidden activity of gambling. If someone mentions "service" within a certain number of words after "car", it's likely they mean they took their car for a service rather than that they're describing the customer service they experienced.

Likely. Not always.

A good customer feedback programme² will get thousands of meaningful responses. Running some regular expressions against those thousands of comments will produce a row in a table for each comment saying whether, according to the ruleset, a predefined topic like baggage handling, continuing car problems, or friendly staff is mentioned.
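To give a feel for how that flagging works, here's a minimal sketch in Python. The topic patterns, column names, and file names are made up for illustration; the real rulesets are far larger and far more finely tuned.

```python
import csv
import re

# Illustrative topic rules only. Real rulesets are tuned per industry,
# language, and country, and cover vastly more topics than this.
TOPIC_PATTERNS = {
    "baggage_lost": re.compile(r"\b(lost|missing|misplaced)\b.{0,40}\bbag(gage)?s?\b", re.I),
    "car_faults_remain": re.compile(r"\bstill\b.{0,40}\b(fault|problem|issue)s?\b", re.I),
    "friendly_staff": re.compile(r"\bstaff\b.{0,40}\b(friendly|helpful|lovely)\b", re.I),
}

def flag_topics(comment: str) -> dict:
    """Return a yes/no flag for each predefined topic in one comment."""
    return {topic: bool(pattern.search(comment)) for topic, pattern in TOPIC_PATTERNS.items()}

with open("comments.csv", newline="") as src, open("topic_flags.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)  # expects a "comment" column
    writer = csv.DictWriter(dst, fieldnames=["comment", *TOPIC_PATTERNS])
    writer.writeheader()
    for row in reader:
        writer.writerow({"comment": row["comment"], **flag_topics(row["comment"])})
```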

This approach can't handle ambiguity very well. A motorist saying "the service was lousy" could be referring to the car servicing or the customer service. Programmatic flagging will decide it is one or the other according to the rules.

These taxonomies also can't look for new topics unless we write some syntax to look for that topic in the right way. If we want to see whether respondents are talking about the cost of living crisis, we know that no-one will write "I am experiencing the cost of living crisis", so we have to write syntax around the topic. Syntax writing is hard, and it's so easy to create rules shaped by your own biases that result in crucial comments or context being missed.

And syntaxes are very poor at extracting quantified values, because humans are very poor at stating discrete numbers. My glasses took less than two weeks: is that ten days, or twelve, or something else? Next day would be less than two weeks, so a value of one technically holds true but would be grossly misleading.

Ultimately, for something really sensitive, my team often has to resort to reading comments. Hundreds, sometimes thousands. We read and look for things that a traditional yes/no structure wouldn't grasp. We make best-effort assumptions: less than two weeks is probably about 12 days; less than a month is probably more than three weeks but fewer than four, so probably about 25 days; except this person is talking about the time from ordering through to getting a complaint resolved, so it's not really about delivery time at all.

We employ fuzzy logic, and we show our working.

And, at worst, this can take days of manual, tedious, mind-numbing effort when our limited hours are often better spent elsewhere.


I am sceptical of AI.

Over the course of this month I've shown how the conclusions it comes to can be totally wrong and that it is built on plagiarism. Both of those are difficult to stomach and the plagiarism in particular is infuriating in its brazenness and hypocrisy.

It would be easy for me to write off generative AI because of those things and I understand why others will reach that conclusion.

I try to be pragmatic. Nothing is ever a universal good or a universal evil. Can the flaws of confident incorrectness and rampant unchecked plagiarism be balanced out by something positive?


I've encouraged my team to experiment a little with generative AI to see if we can do more than simply get it to summarise text.

We're finding AI is really helpful with customer comment data.

We can fairly easily write a little script that takes a comment, wraps it up in a prompt asking a generative AI model whether the comment is about a topic, mentions a delivery timeframe, or raises something new that hasn't been seen in other comments before, asks it to prove whatever outcome it generates, and then outputs that information to a data file.

Something like:

"Ok computer - is this person talking about the cost of living crisis? If so, extract the exact phrases they use, summarise their experience in a few words, and output in a standard format we can put into a CSV."

And the output often comes with caveats:

"I've extracted this term from the comment but it is uncertain as to whether this person is talking about the cost of living crisis. It might be, so I've included it here."

The results are remarkable.

We're still experimenting and trying to determine the models and prompts that work best depending on what we're trying to achieve. We're also comparing the AI output to human output, and there's a very high level of similarity, which is only getting better as we refine our approach.

This isn't the destruction of a load of team members' jobs. This is generative AI removing the need for us to do a lot of manual work because the previous computer-aided tool was not good enough.

For us, for the textual analysis we do, AI's ambiguity and fuzzy logic is a huge advantage.

Is the idea of using generative AI as an assistant for offloading arduous tasks enough of an advantage to balance out the plagiarism?

I remain undecided. Sometimes I'm a firm yes, sometimes a no, and most other times somewhere in between.

Fuzzy logic.


  1. One day I will write a long and tedious post about the 0-10 "how likely are you to recommend" Net Promoter Score and how I intensely dislike how it is used and abused. Demand for such a long and tedious post is low so it hasn't happened yet. But one day.

  2. A good programme doesn't ask every customer at every opportunity. A good programme respects a respondent's time and effort. A good programme is reciprocal. Few companies follow these simple rules, so good programmes are rare.

#AI #WeblogPoMo2025 #main #technology