Cleanlab for E-Commerce and Retail
Ensure accurate information on your website and in your product listings, customer reviews, and internal data. With more accurate data, you can deploy more reliable ML models and analytics.
Case Study
Ping An Insurance
Ping An Insurance used Cleanlab in an e-commerce application to: find 10% noise in their data labels, filter the detected bad data, and more robustly train their product classifier.
10%
reduction in label noise
If the classifier is trained with these noisy images directly, its performance could be degraded. In view of this, we attempted to find label errors in the image dataset with an open source tool cleanlab, a framework powered by the theory of confident learning. Specifically, we trained multiple ResNet50 image classifiers to compute the predicted product category probabilities for all the training samples in a cross-validation manner. Then the cleanlab tool could utilize the matrix of predicted probabilities to find noisy samples, ordered by likelihood of being an error. We removed the top 10% noisy samples from the training set.
Ping An Insurance is a Chinese holding conglomerate whose subsidiaries provide insurance, banking, asset management, financial, and healthcare services.
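The filtering step described in the quote above can be sketched in a few lines of Python. This is a simplified illustration, not Ping An's actual pipeline: it assumes you already have out-of-sample predicted probabilities from cross-validation, and it ranks samples by "self-confidence" (the predicted probability of the given label) rather than running the full confident learning procedure. The toy `labels` and `pred_probs` values and the function names are hypothetical.

```python
def rank_by_self_confidence(labels, pred_probs):
    """Return sample indices ordered from most to least likely mislabeled.

    Self-confidence is the out-of-sample predicted probability that the
    model assigns to each sample's given label; low values are suspicious.
    """
    scores = [probs[label] for label, probs in zip(labels, pred_probs)]
    return sorted(range(len(labels)), key=lambda i: scores[i])


def drop_noisiest(labels, pred_probs, fraction=0.10):
    """Split indices into (kept, dropped), dropping the top `fraction` noisiest."""
    ranked = rank_by_self_confidence(labels, pred_probs)
    n_drop = int(len(labels) * fraction)
    dropped = set(ranked[:n_drop])
    kept = [i for i in range(len(labels)) if i not in dropped]
    return kept, sorted(dropped)


# Hypothetical toy data: 10 samples, 3 product categories.
# Each row of pred_probs would come from a classifier that never saw
# that sample during training (i.e. cross-validation predictions).
labels = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
pred_probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.10, 0.85],
    [0.05, 0.90, 0.05],  # given label is 0, but the model strongly predicts 1
    [0.20, 0.70, 0.10],
    [0.10, 0.15, 0.75],
    [0.60, 0.30, 0.10],
    [0.15, 0.65, 0.20],
    [0.30, 0.15, 0.55],
    [0.95, 0.03, 0.02],
]

kept, dropped = drop_noisiest(labels, pred_probs, fraction=0.10)
print("Dropped sample indices:", dropped)
print("Training set size after filtering:", len(kept))
```

In practice, the open-source cleanlab library offers `cleanlab.filter.find_label_issues`, which takes the same two inputs (`labels` and `pred_probs`) and can return indices ranked by likelihood of being an error, so a hand-rolled ranking like this one is mainly useful for understanding the idea.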

HOW CLEANLAB CAN HELP YOUR BUSINESS
Better estimate the true quality of products from noisy reviews. In this example, Cleanlab Studio automatically found the given label to be incorrect and suggested the correct label of "5 stars".

Cleanlab Studio enables data-centric AI to build accurate ML models for messy real-world tabular or text data. You can effortlessly harness AutoML for various data types, including text, image, and tabular formats (Excel, CSV, JSON), allowing you to focus on the most important aspect: the data.

Cleanlab Studio scans any image dataset for common real-world issues such as images that are blurry, under/over-exposed, oddly sized, or (near) duplicates of others, enabling you to produce high-quality computer vision datasets. Learn more.

Videos on using Cleanlab Studio to find and fix incorrect labels for:
- product reviews (text data)
- product categories (image data)
- tabular data (e.g. numeric/categorical product metadata like price, rating, brand, etc.)
Detect errors in product descriptions/categorizations and issues like (near) duplicate or anomalous SKUs. Learn more.
Related applications
Customer Service
Fix common issues in customer data, and deploy robust ML models to better understand customers and handle their requests.
Business Intelligence / Analytics
Correct data errors for more accurate analytics/modeling enabling better decisions.
Data Entry, Management, and Curation
AI expert review of your data stores to find errors or incorrect labels.
Content Moderation
Train more accurate content moderation models in less time.
Foundation and Large Language Models
Boost fine-tuning accuracy and reduce time spent.
Data Annotation & Crowdsourcing
Label data efficiently and accurately, understand annotator quality.
Cleanlab Studio auto-corrects raw data to ensure reliable predictions so you can maximize customer experience.

Case Study
Automated quality assurance for product catalogs
Cleanlab Studio was used to improve an e-commerce website, its product listings, and its analytics. Finding and fixing errors in product descriptions/metadata can be entirely automated, and improves customer experience, product discoverability, SEO, and advertising, as well as analytics/decision-making.
Read more: Enhancing Product Analytics and E-commerce with Cleanlab Studio

Cleanlab Studio seamlessly handles data with image, text, and structured/tabular features (e.g. product price, size, etc.) to auto-detect many common issues in product catalogs including:
- Products (SKUs) that are miscategorized or have incorrect tags (tax-classifications, age-restrictions, ...)
- Near-duplicate products (SKUs)
- Products with images that are low-quality or NSFW
- Products with low-quality text descriptions
- Products whose image does not match description
- Text in descriptions or review comments that contains toxic language or Personally Identifiable Information, or is not in English
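
To make one of the checks above concrete, here is a minimal sketch of flagging near-duplicate SKUs by comparing the words in their descriptions. It is only an illustration of the general idea, not how Cleanlab Studio detects duplicates: it uses plain Jaccard similarity over word sets, and the catalog entries, SKU identifiers, function names, and the 0.8 threshold are all hypothetical.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase a description and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(catalog, threshold=0.8):
    """Return SKU pairs whose description word sets overlap at or above `threshold`."""
    token_sets = {sku: tokens(desc) for sku, desc in catalog.items()}
    pairs = []
    for sku_a, sku_b in combinations(sorted(token_sets), 2):
        if jaccard(token_sets[sku_a], token_sets[sku_b]) >= threshold:
            pairs.append((sku_a, sku_b))
    return pairs


# Hypothetical catalog: the first two listings differ only in punctuation and case.
catalog = {
    "SKU-001": "Blue cotton t-shirt, size M",
    "SKU-002": "Blue cotton T-Shirt size M",
    "SKU-003": "Stainless steel water bottle 750 ml",
}

print(near_duplicates(catalog))
```

A word-overlap check like this compares every pair of products, so it suits small catalogs or a quick sanity check; it also ignores images and structured fields such as price, which a fuller approach would take into account.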
