Why do customers buy products seemingly irrelevant to their web and voice assistant searches? That’s the question a team of Amazon researchers sought to answer in a study scheduled to be presented at the upcoming ACM Web Search and Data Mining conference in February. Their analyses looked at purchases and “engagements,” the latter defined as interactions like sending search results to cell phones and adding products to shopping carts. The results suggest that customers are partial to products that are broadly popular or cheaper than the products relevant to a given search query. The researchers also say that people are much more likely to buy or engage with irrelevant products in a few categories, such as toys and digital products, than in categories like beauty products and groceries.
“Product search algorithms, like the ones that help customers place orders through [our Alexa assistant], aim at returning the products that are most relevant to users’ queries, where relevance is usually interpreted as ‘anything that satisfies the users’ need,’” wrote Laine Lewin-Eytan, senior manager of applied research in the Alexa Shopping group, in a blog post. “A common way to estimate customers’ satisfaction is to rely on the judgment of human annotators. (We annotate a very small fraction of 1% of interactions.)”
To this end, the researchers used statistical methods to identify customers who issued either very short or unusually long queries. Those customers, they say, tend to be more flexible in their purchasing decisions than customers whose queries are of medium length. The researchers also considered indirect relationships between relevant and irrelevant products: two products are indirectly related if they are of the same type, brand, or category, or if they tend to be purchased together.
Two different measures of indirect relationship, one based on the meanings of descriptive terms and one based on purchase history, both correlated with an increased likelihood of buying or engaging with seemingly irrelevant results, according to the researchers.
After performing the statistical analyses, the team conducted a pair of experiments to assess the value of including irrelevant products in Amazon search results. First, they identified 1,500 queries, each associated with one relevant and one irrelevant product, and then considered the results of applying five different product selection strategies to all of them.
The first strategy — Optimal — always selected the product that led to the higher purchase level or engagement level, depending on which is being measured. (Here, the engagement or purchase level is the ratio of interactions that result in engagement or purchase actions to all the interactions in a data sample.) The Relevant strategy always returned the relevant product, while Irrelevant always returned the irrelevant product; Random arbitrarily selected between the two; and Worst always returned the product that led to the lower purchase or engagement level.
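The five strategies and the level definition above can be illustrated with a minimal Python sketch. It is not the researchers' code: the `Candidate` class, the `select` and `mean_level` helpers, the toy counts, and the choice to average levels across queries are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """One product returned for a query (hypothetical structure)."""
    interactions: int  # all interactions in the data sample
    actions: int       # interactions that ended in a purchase (or engagement)

    @property
    def level(self) -> float:
        # Purchase/engagement level: ratio of actions to all interactions.
        return self.actions / self.interactions if self.interactions else 0.0


def select(strategy, relevant, irrelevant, rng=None):
    """Pick one of the two candidates according to the named strategy."""
    if strategy == "optimal":
        return max(relevant, irrelevant, key=lambda c: c.level)
    if strategy == "worst":
        return min(relevant, irrelevant, key=lambda c: c.level)
    if strategy == "relevant":
        return relevant
    if strategy == "irrelevant":
        return irrelevant
    if strategy == "random":
        return (rng or random).choice([relevant, irrelevant])
    raise ValueError(f"unknown strategy: {strategy}")


def mean_level(strategy, queries, rng=None):
    """Average level over all queries (the aggregation is an assumption)."""
    return sum(select(strategy, r, i, rng).level for r, i in queries) / len(queries)


# Toy data, invented for illustration: (relevant, irrelevant) per query.
queries = [
    (Candidate(100, 10), Candidate(100, 20)),  # irrelevant product does better
    (Candidate(200, 30), Candidate(50, 5)),    # relevant product does better
]

for name in ("optimal", "relevant", "irrelevant", "worst"):
    print(name, round(mean_level(name, queries), 3))
```

On this toy data, Optimal beats Relevant only because the first query's irrelevant product has the higher level, which is the kind of gap the researchers go on to measure.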
Perhaps unsurprisingly, the researchers report a “significant” gap between the engagement and purchase levels achieved by selecting only relevant results and the optimal levels, which include purchases of and engagement with irrelevant results.
In a separate test, the team used the same 1,500 queries to train three different machine learning models: one taught to maximize relevance, the second to maximize purchase level, and the third to maximize engagement level. Then they built two so-called fusion models — one that combined the relevance model and the engagement model and one that combined the relevance model and the purchase model — and compared their overall performance.
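The article does not say how the fusion models combine their component models. One common approach, shown here purely as an assumption, is a weighted sum of the two models' scores. The `fuse` and `rank` functions, the `alpha` weight, and the scores below are hypothetical.

```python
def fuse(relevance_score: float, behavior_score: float, alpha: float) -> float:
    """Weighted sum of two scores (an assumed fusion rule, not the paper's).

    alpha = 1.0 ranks purely by relevance; alpha = 0.0 ranks purely by the
    purchase- or engagement-based score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * relevance_score + (1.0 - alpha) * behavior_score


def rank(candidates, alpha):
    """Sort candidate products by fused score, best first."""
    return sorted(
        candidates,
        key=lambda c: fuse(c["relevance"], c["behavior"], alpha),
        reverse=True,
    )


# Invented scores: A looks relevant but is rarely bought; B is the reverse.
candidates = [
    {"id": "A", "relevance": 0.9, "behavior": 0.2},
    {"id": "B", "relevance": 0.4, "behavior": 0.8},
]

for alpha in (1.0, 0.5, 0.0):
    print(alpha, [c["id"] for c in rank(candidates, alpha)])
```

Sweeping `alpha` from 1 to 0 shifts the ranking from favoring the relevant product to favoring the one customers actually buy or engage with.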
There was a trade-off between relevance and purchase or engagement level, the researchers report: improving performance on one criterion came at the expense of performance on the other. That’s likely because customers might understand, and possibly excuse, results that appear relevant but don’t satisfy their needs, and because purchase and engagement levels capture a more subjective type of relevance than human annotations can communicate.
“The models we used to assess the trade-off between relevance and purchase/engagement level were fairly crude,” wrote Lewin-Eytan. “A more complex machine learning model should be able to achieve better results, particularly if it is explicitly trained to consider some of the factors we identified previously, such as query length, price, and indirect relation. While still preliminary, our results provide new insights on how to design product search algorithms and suggest that both objective relevance and purchase/engagement factors should be considered in returning results to customers.”