Automating product classification and data quality control
SWF’s technology provides retailers with digital assistance in their customer journeys, including features like ‘recipe-to-basket,’ ‘healthy substitutes within recipes,’ and ‘healthy food product search.’ These tools not only help consumers but also enable food retailers to inform, inspire, and recommend suitable food products to their customers, creating a more personalized shopping experience and making Food-As-A-Service a competitive advantage.
Situation & impact
Correct product classification of food items is a mission-critical process for SmartWithFood (SWF). Until now, SWF relied entirely on a rule-based system and a great deal of manual work by domain experts (dieticians). With the strong growth of the business and of the data volumes to be processed, that combination was no longer sufficient.
To scale the business and onboard new customers within an acceptable time frame, SWF needed to automate the data ingestion process with AI. Faktion automated SWF’s product classification using its proven approach called ‘Intelligent Data Quality Optimization’ (IDQO) to build reliable and automated machine learning pipelines within a self-service AutoML platform. Crucially, this platform needed to be operable by SWF’s own data specialists.
As the first version of these pipelines automatically and accurately processes 80% of all products at 98% precision, the platform allows the domain experts to focus their attention on validating predictions that are less reliable. As a result, SWF can now ingest and auto-tag ever-increasing volumes of data and has the tooling in place to constantly monitor data quality. The onboarding time of a new food retailer with a representative assortment has been decreased from 2 months to less than one week.
Like most ambitious and successful scale-ups, SmartWithFood (SWF) aims to scale internationally. For SWF, that means onboarding new retailers and incorporating their product databases into the SWF Platform. This requires data entry and data quality control for large amounts of product data (free text, images, structured nutritional data), which is a costly and labor-intensive task for dieticians. To illustrate, a human annotator can process at most 250 products per day. SWF realized that manual annotation and human validation by specialists were bottlenecks in its scaling strategy.
To remove these bottlenecks, reliable automation of the process flows for data entry and data quality control is mission-critical. Here, ‘reliable automation’ means that every automated data entry is correct at an acceptable accuracy rate. It is better to automate 50% of the data at 99% accuracy than to automate 99% of it at a lower accuracy. Otherwise, unhealthy products sold in a supermarket could, for example, be automatically but falsely tagged as ‘healthy’, making the SWF application useless.
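The trade-off between how much is automated and how accurate the automated part is can be made concrete with a small sketch. It assumes each model prediction comes with a confidence score and that a set of expert-validated products is available; the function name pick_threshold and the sample data are hypothetical and not part of the SWF platform.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    scored: List[Tuple[float, bool]], target_accuracy: float
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, coverage, accuracy) for the lowest confidence
    threshold whose auto-accepted predictions still meet target_accuracy.

    scored holds one (confidence, was_correct) pair per validated product.
    Returns None when no threshold reaches the target.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best = None
    correct = 0
    for index, (confidence, was_correct) in enumerate(ranked, start=1):
        correct += was_correct
        # Evaluate only at the end of a run of equal confidences,
        # so that a threshold never splits tied predictions.
        if index < len(ranked) and ranked[index][0] == confidence:
            continue
        accuracy = correct / index
        if accuracy >= target_accuracy:
            best = (confidence, index / len(ranked), accuracy)
    return best


# Hypothetical validation set: (model confidence, prediction was correct).
validation = [
    (0.99, True), (0.97, True), (0.95, True), (0.90, True), (0.80, False),
    (0.75, True), (0.60, False), (0.55, True), (0.40, False), (0.30, False),
]

strict = pick_threshold(validation, target_accuracy=0.99)
lenient = pick_threshold(validation, target_accuracy=0.80)
print("strict: ", strict)
print("lenient:", lenient)
```

On this toy data, the strict target automates fewer products than the lenient one, which is the choice described above: a smaller share of the data handled automatically, but handled reliably.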
As the business of SWF grew, they arrived at a stage where static rule-based expert systems combined with manual annotation were too costly to maintain. Reliable AI-driven automation of data entry and data quality control was the next objective. Incorporating this as an AI component within the platform and shifting towards an AI-capable organisation became inevitable. However, just like many other scale-ups with AI ambitions, SWF didn’t have the necessary AI capabilities (in-house knowledge and ML engineering experience) in the initial phase to create this challenging AI application. Moreover, once Machine Learning models are deployed, they can become less accurate over time. A significant cause of declining accuracy is data drift, where the model's training data is no longer representative of the data it is applied to. In addition, model evaluation is often hidden from decision-makers, which creates blind spots about the functional quality of the predictions.
Faktion has already built various successful AI products that automate business processes. This accumulated experience with similar problems is fully in line with its mission: ‘Productizing AI Applications & Building AI Capabilities’.
Faktion enables organizations to scale value creation through AI embedded in custom-made software, in the strong belief that scale is only possible when AI applications can be operated by all business users and not solely by data scientists. This requires both engineering and organizational change: running an internal AI software product requires AI capabilities.
The main goal is to make customers fully autonomous. Faktion recommends a pragmatic, step-by-step approach, gradually building those capabilities as the product and AI component mature. SWF reached out to Faktion for both the AI component (model development, model training, architecture, MLOps interface and project reporting) and guidance on the journey to becoming ‘AI capable’.
Solution & approach
Faktion started this project from its proven approach for ML-driven automation of manual data quality tasks, called ‘Intelligent Data Quality Optimization’ (IDQO). The core application of IDQO is data enrichment via knowledge graphs through data ingestion combined with classification, which enables the validation and sorting of existing data. In the case of SWF, new products are classified into the right category and respective subcategories based on the product and ingredient features from the knowledge graph (free texts and structured nutritional data) combined with product image data.
Faktion's IDQO architecture for automated data ingestion and data quality control, customized and configured for SWF, consists of the following four interconnected components:
Feature Store: features for products like images, descriptions, and metadata are managed by a feature store. This reduces computation and increases consistency.
Quality Control: automated quality control is performed on the feature store as a single source of truth. So when an error is found, it is fixed once.
AutoML: Machine Learning models are trained on the feature store using automatically learnt parameters. This means that ML expertise is not necessary to train models on new products with new labels, so the platform can be operated by SWF’s data specialists.
Knowledge Graph: when the ML models classify a new product, it is added to a knowledge graph which stores the data in context and so enriches model predictions.
Different expert models are trained on food data such as text, images, or nutritional values, and their predictions are combined into an ensemble model which learns when to trust which model. This approach allows SWF to quantify the trust they can place in each prediction and use this score to decide which products are handed off to human annotators. Finally, the most performant models have been deployed as a microservice with an eye on the future roadmap development of the SWF platform.
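As a rough illustration of how predictions from several expert models can be combined into one trust score, the sketch below uses accuracy-weighted voting. This is a deliberately simple stand-in for a learned ensemble, not Faktion's actual model; the model names, category labels and function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Prediction = Tuple[str, float]  # (predicted category, model confidence)


def learn_weights(
    validation: List[Tuple[Dict[str, Prediction], str]]
) -> Dict[str, float]:
    """Weight each expert model by its accuracy on validated products."""
    hits: Dict[str, int] = defaultdict(int)
    seen: Dict[str, int] = defaultdict(int)
    for predictions, true_label in validation:
        for model, (label, _confidence) in predictions.items():
            seen[model] += 1
            hits[model] += label == true_label
    return {model: hits[model] / seen[model] for model in seen}


def combine(
    predictions: Dict[str, Prediction], weights: Dict[str, float]
) -> Tuple[str, float]:
    """Return the winning category and a trust score between 0 and 1."""
    votes: Dict[str, float] = defaultdict(float)
    for model, (label, confidence) in predictions.items():
        # A model's vote counts more when it has been accurate in the past.
        votes[label] += weights.get(model, 0.0) * confidence
    total = sum(votes.values())
    if total == 0:
        raise ValueError("no usable predictions")
    winner = max(votes, key=votes.get)
    return winner, votes[winner] / total


# Hypothetical expert-validated products: per-model predictions and true label.
validated_products = [
    ({"text": ("dairy", 0.9), "image": ("dairy", 0.8), "nutrition": ("snack", 0.6)}, "dairy"),
    ({"text": ("snack", 0.7), "image": ("dairy", 0.6), "nutrition": ("snack", 0.7)}, "snack"),
    ({"text": ("bakery", 0.8), "image": ("snack", 0.5), "nutrition": ("bakery", 0.9)}, "bakery"),
    ({"text": ("dairy", 0.9), "image": ("dairy", 0.7), "nutrition": ("dairy", 0.8)}, "dairy"),
]
weights = learn_weights(validated_products)

new_product = {"text": ("snack", 0.8), "image": ("dairy", 0.9), "nutrition": ("snack", 0.6)}
label, trust = combine(new_product, weights)
print(label, round(trust, 3))
```

In this toy example the image model is very confident but has been the least accurate on validated data, so its vote is outweighed by the text and nutrition models. The resulting trust score is the kind of value that can be compared against a threshold to decide whether a product goes to a human annotator.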
To empower SmartWithFood (SWF) to tackle their challenges, Faktion used platform engineering principles so that non-ML engineers can take ownership and retrain models when necessary. Retraining could be motivated by a fall in accuracy, a change in business strategy, or a move into another industry sector. Furthermore, the platform alerts engineers to the current health of models to encourage retraining and safeguard predictions from unhealthy models.
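One simple way to track model health is to measure accuracy over the most recent predictions that experts have validated. The sketch below assumes exactly that; the class name ModelHealthMonitor, the window size and the limits are hypothetical, and the actual platform may use different signals.

```python
from collections import deque
from typing import Deque


class ModelHealthMonitor:
    """Track accuracy over the most recent expert-validated predictions."""

    def __init__(self, window: int = 500, min_accuracy: float = 0.95,
                 min_samples: int = 50) -> None:
        # Old outcomes drop out automatically once the window is full.
        self.outcomes: Deque[bool] = deque(maxlen=window)
        self.min_accuracy = min_accuracy
        self.min_samples = min_samples

    def record(self, predicted: str, validated: str) -> None:
        """Store whether the model agreed with the expert's label."""
        self.outcomes.append(predicted == validated)

    def status(self) -> str:
        """Return 'unknown', 'healthy' or 'unhealthy'."""
        if len(self.outcomes) < self.min_samples:
            return "unknown"  # too little evidence to judge
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return "healthy" if accuracy >= self.min_accuracy else "unhealthy"


# Small numbers are used here only to keep the example readable.
monitor = ModelHealthMonitor(window=5, min_accuracy=0.8, min_samples=3)
for predicted, validated in [("dairy", "dairy"), ("snack", "snack"), ("bakery", "bakery")]:
    monitor.record(predicted, validated)
print(monitor.status())  # healthy
```

An 'unhealthy' status could then trigger an alert and prompt retraining, while predictions from that model are sent to experts in the meantime.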
The first version of this AutoML setup automatically and accurately processes about 80% of all products at 98% accuracy, allowing human annotators to focus on less straightforward products. The predictions that are not trusted by the platform are handed off to domain experts, with different configurable thresholds depending on the importance of the respective (sub)category. All these manual annotations will result in increasingly better models, so less human annotation is needed over time. Future iterations will be able to process an ever-larger portion of the products as more product expertise is transferred from experts to AI. And finally, SWF’s onboarding time for a new food retailer with a representative assortment (similar product categorization) has been decreased from 2 months to less than one week.
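The routing of predictions with configurable thresholds per (sub)category could look roughly like the sketch below. The category names and threshold values are invented for illustration and are not SWF's actual configuration.

```python
from typing import Dict

# Hypothetical configuration: more important categories demand more trust.
DEFAULT_THRESHOLD = 0.90
CATEGORY_THRESHOLDS: Dict[str, float] = {
    "allergen": 0.99,
    "health_score": 0.97,
    "product_type": 0.90,
}


def route(category: str, trust: float,
          thresholds: Dict[str, float] = CATEGORY_THRESHOLDS,
          default: float = DEFAULT_THRESHOLD) -> str:
    """Accept a prediction automatically or send it to a domain expert."""
    required = thresholds.get(category, default)
    return "auto_accept" if trust >= required else "expert_review"


# The same trust score leads to different decisions per category.
for category in ("allergen", "health_score", "product_type"):
    print(category, route(category, trust=0.95))
```

Keeping the thresholds in configuration rather than in code fits the goal that data specialists, not only ML engineers, can adjust how much is automated per category.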