Valid8, detecting counterfeit and recommending authentic products
Machine Learning Engineer
Valid8 is a Singaporean startup working to prevent the circulation of counterfeit products, documents, and other items. They provide their clients with a set of tools to mark their products with special QR codes, which can then be verified with Valid8's web and smartphone apps. Although their first successful cases involved verifying COVID-19 test certificates, Valid8 also aims to enter consumer goods markets such as toiletries, wines, and toys. The resulting data would give Valid8 rich insights into consumer behavior, which they could then capitalize on. It became clear that convincing prospective customers of this approach's potential would require a live demonstration of the end-to-end pipeline. After consultations with business advisors and investors, Valid8 began searching for tech partners. At this point, they contacted our Singaporean office and, later on, the Belgian ML team.
The high-level task description was fairly straightforward: develop a prototype recommender engine for Valid8. After a user scans a particular item to verify its authenticity, the app shows them several potentially interesting products. However, translating this into specific requirements required an understanding of Valid8's business model and the related constraints. Furthermore, we could not start modeling and implementation right away: there was no training data, as Valid8 was still in the pre-launch phase. We addressed these challenges one by one.
Business case analysis
Together we decided to target the consumer goods use case for the prototype. Product manufacturers, also referred to as OEMs, become Valid8 clients and place QR codes on their packaging. Every time a consumer scans a code to verify a product's authenticity, they receive a list of recommended products. Even this simple setting leads to a number of far-reaching observations:
- Scanning a product is only a weak signal of buying intent, yet a relatively strong signal of consumer needs and preferences.
- Most likely, not every product may be recommended; for example, competitors' products should probably be excluded (P&G will not be happy if a consumer sees a Unilever shampoo recommended).
- Given the recent privacy trends, many consumers are unwilling to share detailed personal data, e.g. their demographic data or precise current location; thus we're effectively dealing with anonymous users.
- Furthermore, little information is available about the situational context, e.g. the product stock at the given shop.
- By contrast, product descriptions are expected to be rich and accurate, as they originate from the ultimate source, the OEMs themselves.
Modeling & data selection
Based on the business case analysis outlined above, we needed a variant of a recommender system that supports implicit feedback signals (cf. Observation 1 above) and can use additional metadata features for users and products, in particular ones derived from product descriptions (cf. Observation 5). Furthermore, to address Observation 2, we decided to train a standalone model for each customer, which rules out mixing their data with the data of other customers. Since this multiplies the number of models to train, training efficiency became an important consideration as well. As a result of this analysis, we settled on LightFM, a software package that implements a range of algorithms meeting our requirements. Its efficiency is top-notch as well: looking ahead, for a dataset covering over 2 million products, 5 million users, and 13 million interactions, one training epoch takes only 30 seconds on a moderately sized Azure VM, which would certainly allow training multiple models every day.
The data was the last missing puzzle piece: we needed to find a publicly available dataset that matched Valid8's problem description, namely a large dataset of user interactions with consumer products that includes rich descriptions of said products. We ended up using the Amazon Reviews dataset, which we transformed in two ways:
- Users assigned scores from 1 to 5 to the products they reviewed. This feedback signal is more fine-grained than Valid8's scans. Thus, we only kept the user-product pairs with scores of 4 or 5 and discarded the score values altogether, arriving at the desired binary data with over 13 million interactions, as mentioned above.
- To emulate the data sharing restrictions, we selected a number of well-known brands from the dataset (e.g. watch brands such as Swatch, TAG Heuer, and Timex) and set their data aside, separate from each other. Thus, when a (virtual) user scans a Swatch product, they will only get other Swatch products as recommendations.
This allowed us to create a setting that is highly relevant to Valid8's business case and easy to explain to external parties or pitch to potential customers.
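The two transformations described above can be sketched in a few lines of plain Python. The sample rows, product names, and function names below are invented for illustration; the actual dataset has its own schema, and we assume here that each row carries a user, a product, a brand, and a score.

```python
from collections import defaultdict

# Invented sample rows: (user, product, brand, score from 1 to 5).
reviews = [
    ("u1", "swatch_skin", "Swatch", 5),
    ("u1", "timex_weekender", "Timex", 3),
    ("u2", "swatch_skin", "Swatch", 4),
    ("u2", "swatch_sistem51", "Swatch", 2),
    ("u3", "timex_weekender", "Timex", 5),
    ("u3", "generic_strap", "NoName", 5),
]


def to_binary_interactions(rows, min_score=4):
    """Keep only positive reviews and drop the score itself."""
    return sorted(
        {(user, product, brand) for user, product, brand, score in rows
         if score >= min_score}
    )


def split_by_brand(interactions, brands):
    """Set each selected brand's interactions aside in its own bucket."""
    per_brand = defaultdict(list)
    for user, product, brand in interactions:
        if brand in brands:
            per_brand[brand].append((user, product))
    return dict(per_brand)


interactions = to_binary_interactions(reviews)
per_brand = split_by_brand(interactions, brands={"Swatch", "Timex"})
print(per_brand)
```

Each bucket then feeds its own model, so a scan of a Swatch product can only ever surface other Swatch products.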
The prototype needed to be integrated into the existing Valid8 infrastructure and its several user interfaces, such as the Web application and two smartphone apps for iOS & Android. These were being developed by a development team based in Singapore. In order to ensure a fast prototyping pace, the parties agreed to make the recommender an independent subsystem hosted and managed by Faktion; the apps would access the recommendations via a Web API.
To keep hosting costs in check, we settled on a relatively simple approach:
- Models are trained on a short-lived Azure VM.
- After training, the top predictions for each user-brand combination, as well as fallback predictions for new or anonymous users, are precomputed on the same VM and stored in a compact format.
- The VM may be discarded afterward to avoid extra costs.
- The precomputed predictions and the API implementation based on FastAPI are compiled into a single Docker image.
- The image is then deployed on Faktion's own Kubernetes cluster and exposed under a static domain name.
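The precompute-then-serve idea behind these steps can be illustrated as follows. This is a sketch under stated assumptions rather than the deployed implementation: the table layout, the use of JSON as the stored format, the popularity-based fallback, and all function names are hypothetical.

```python
import heapq
import json


def top_k(scores, k):
    """Return the k products with the highest scores."""
    best = heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])
    return [product for product, _ in best]


def precompute(user_scores, popularity, k=3):
    """Build a lookup table of recommendations per brand and user.

    user_scores: {brand: {user: {product: model score}}}
    popularity:  {brand: {product: score}} used for the fallback list
    """
    return {
        brand: {
            "users": {user: top_k(scores, k) for user, scores in per_user.items()},
            "fallback": top_k(popularity[brand], k),
        }
        for brand, per_user in user_scores.items()
    }


def lookup(table, brand, user_id=None):
    """Serve stored recommendations; unknown users get the fallback list."""
    entry = table.get(brand)
    if entry is None:
        return []
    return entry["users"].get(user_id, entry["fallback"])


# Invented scores standing in for the output of a trained model.
user_scores = {"Swatch": {"u1": {"a": 0.9, "b": 0.1, "c": 0.5}}}
popularity = {"Swatch": {"a": 10, "b": 30, "c": 20}}

table = precompute(user_scores, popularity, k=2)
stored = json.dumps(table)      # what would be baked into the image
restored = json.loads(stored)   # what the API process would load at startup

print(lookup(restored, "Swatch", "u1"))
print(lookup(restored, "Swatch"))
```

Because the API only reads from a static table, a route handler can be a thin wrapper around a function like `lookup`, and no model or training dependency needs to ship in the image.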
All in all, we had the API up and running within 20 days of the kick-off meeting, including the search for a dataset! The application team had no trouble integrating with the API, and after ironing out a few minor issues, Valid8 was ready to demonstrate the next version of their system, now augmented with a recommender engine, which led to securing a round of funding.
While certainly not a state-of-the-art piece of AI-powered software, the recommender engine we designed and developed for Valid8 illustrates another important side of Faktion's expertise: end-to-end support for our customers' AI needs. We do not restrict ourselves to purely technical tasks and prefer to go the extra mile to find the solution that best fits their needs.