Cleanlab

Software Development

San Francisco, California · 13,686 followers

Adding automation and trust to every data point in analytics, LLMs, and AI solutions. Don't let your data do you dirty.

About us

Pioneered at MIT and proven at Fortune 500 companies, Cleanlab provides the world's most popular Data-Centric AI software. Most AI and Analytics are impaired by data issues (data entry errors, mislabeling, outliers, ambiguity, near duplicates, data drift, low-quality or unsafe content, etc.); Cleanlab software helps you automatically find and fix them in any image/text/tabular dataset. This no-code platform can also auto-label big datasets and provide robust machine learning predictions (via models auto-trained on auto-corrected data).

What can you get from Cleanlab software?

1. Automated validation of your data sources (quality assurance for your data team). Your company's data is your competitive advantage; don't let noise dilute its value.

2. A better version of your dataset. Use the cleaned dataset produced by Cleanlab in place of your original dataset to get more reliable ML/Analytics (without any change to your existing code).

3. Better ML deployment (reduced time to deployment and more reliable predictions). Let Cleanlab automatically handle the full ML stack for you! With just a few clicks, deploy models that are more accurate than fine-tuned OpenAI LLMs on text data and state-of-the-art on tabular/image data.

Turn raw data into reliable AI & Analytics, without all the manual data prep work. Most of the cutting-edge research powering Cleanlab tools is published for transparency and scientific advancement: cleanlab.ai/research/
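
For teams that prefer code, the same kind of label-issue detection is also available in the open-source cleanlab Python library. Below is a minimal sketch on a synthetic dataset; the toy data and logistic-regression model are illustrative placeholders, not a prescribed workflow.

```python
# Minimal sketch: flag likely label errors with the open-source `cleanlab` library.
# The synthetic data and simple classifier below stand in for "your dataset/model".
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from cleanlab.filter import find_label_issues

# Toy data; randomly reassign some labels to simulate annotation errors
X, labels = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=0)
rng = np.random.default_rng(0)
noisy = rng.choice(len(labels), size=50, replace=False)
labels[noisy] = rng.integers(0, 3, size=50)

# Out-of-sample predicted probabilities from any classifier
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, labels, cv=5, method="predict_proba"
)

# Indices of data points whose given label most likely disagrees with the true label
issue_indices = find_label_issues(
    labels=labels, pred_probs=pred_probs, return_indices_ranked_by="self_confidence"
)
print(f"Flagged {len(issue_indices)} potential label issues for review")
```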

Website
https://cleanlab.ai
Industry
Software Development
Company size
11-50 employees
Headquarters
San Francisco, California
Type
Privately Held

Updates

  • Cleanlab

    One of the largest financial institutions in the world, BBVA, uses Cleanlab to improve their categorization of all financial transactions. Results achieved *without having to change their current model*:
    ➡️ Reduced labeling effort by 98%
    ➡️ Improved model accuracy by 28%
    This is the power of #DataCentricAI tools that provide automation to improve your data: your existing (and future) models improve immediately with better data! Start practicing automated data improvement: https://cleanlab.ai/studio

    BBVA AI Factory

    💡 How did we manage to reduce the effort put into labeling our financial transaction categorizer by up to 98%?
    🌱 Over the past few months, we've been working on a new version of our Taxonomy of Expenses and Income. This new version helps our clients gain a more comprehensive view of their finances and improve their 💙#FinancialHealth.
    ➡️ To achieve this, we updated the #ML model behind the categorizer using #Annotify, a tool developed at BBVA AI Factory.
    ➡️ Our #DataScientists used #ActiveLearning techniques and libraries such as #Cleanlab to label large amounts of financial data more efficiently.
    ✅ The result was a more accurate #AI model that required about 2.9 million fewer tags than the initial taxonomy.
    📲 Learn more about the details of this work from David Muelas Recuenco, Maria Ruiz Teixidor, Leandro A. Hidalgo, and Aarón Rubio Fernández in the following article 👉 https://lnkd.in/ew8bBVJE

    Money talks: How AI models help us classify our expenses and income - BBVA AI Factory

    bbvaaifactory.com
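
    A rough sketch of the two ideas behind this kind of labeling-effort reduction is below: uncertainty-based active learning to decide which unlabeled transactions to send to annotators, and cleanlab label-quality scores to decide which existing labels to re-review. The data, model, and budgets are illustrative placeholders, not BBVA's actual Annotify pipeline.

```python
# Sketch of combining active learning with cleanlab label-quality scores to cut labeling effort.
# `pred_probs*` are predicted class probabilities from whatever model you already have.
import numpy as np
from cleanlab.rank import get_label_quality_scores

def pick_examples_to_annotate(pred_probs_unlabeled: np.ndarray, budget: int = 1000) -> np.ndarray:
    """Active learning: send the model's least-confident unlabeled examples to annotators."""
    entropy = -np.sum(pred_probs_unlabeled * np.log(pred_probs_unlabeled + 1e-12), axis=1)
    return np.argsort(entropy)[::-1][:budget]  # highest-entropy (most uncertain) first

def pick_labels_to_review(labels: np.ndarray, pred_probs: np.ndarray, budget: int = 1000) -> np.ndarray:
    """Data curation: re-review the existing labels the model most disagrees with."""
    quality = get_label_quality_scores(labels, pred_probs)
    return np.argsort(quality)[:budget]  # lowest-quality labels first

# Usage idea: annotate only `pick_examples_to_annotate(...)` and re-check only
# `pick_labels_to_review(...)` instead of hand-labeling the entire dataset.
```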

  • Cleanlab

    In our latest blog, we walk through the benefits of reframing time-series forecasting as a classification problem. This reframing offers enhanced performance, flexibility, and interpretability, and opens up a variety of models (such as random forests and neural networks) that can capture complex patterns more effectively. Using a popular energy consumption dataset, we benchmark Cleanlab Studio AutoML alongside Prophet and Gradient Boosting: Cleanlab Studio's AutoML reached 94.61% accuracy, far outperforming the other methods. Not only does Cleanlab Studio AutoML achieve superior results with minimal effort (through automated model training, hyperparameter tuning, and predictor selection), it also simplifies deployment of production-level models, enabling real results in minutes. Accelerate development time and enhance forecast accuracy with Cleanlab Studio. Read the full blog and sign up today to see the difference it can make in your projects. https://lnkd.in/gKptsgBN

    Robust and Accurate AutoML for Time Series in Quick Production Deployment | Cleanlab Studio

    cleanlab.ai
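
    As a minimal sketch of the reframing described above (not the blog's exact benchmark setup): turn a univariate series into lagged features and a discretized target, then fit any classifier. The synthetic hourly series, 24-hour lag window, and quartile bins below are illustrative assumptions.

```python
# Sketch: reframing a univariate time series as a classification problem.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def make_classification_frame(y: pd.Series, n_lags: int = 24, n_bins: int = 4) -> pd.DataFrame:
    """Build lagged features and a quantile-binned target (e.g., low/medium/high/very high usage)."""
    df = pd.DataFrame({f"lag_{k}": y.shift(k) for k in range(1, n_lags + 1)})
    df["target"] = pd.qcut(y, q=n_bins, labels=False)
    return df.dropna()

# Synthetic stand-in for an hourly energy-consumption series
hours = pd.date_range("2024-01-01", periods=2000, freq="h")
y = pd.Series(100 + 20 * np.sin(np.arange(2000) * 2 * np.pi / 24) + np.random.randn(2000) * 5, index=hours)

data = make_classification_frame(y)
X, labels = data.drop(columns="target"), data["target"]
model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, labels)
print("train accuracy:", model.score(X, labels))
```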

  • Cleanlab

    Give it a shot - sign up for Cleanlab Studio today. https://app.cleanlab.ai/

    Curtis Northcutt

    CEO & Co-Founder @ Cleanlab. MIT PhD in CS. I build AI companies to empower people. Former Google, Oculus, Amazon, Facebook, Microsoft

    Anthropic says Claude Sonnet 3.5 beats GPT-4o, so we tested it out in a real-world customer use case. Results below.
    We benchmarked GPT-4o vs. Sonnet 3.5 on our Banking Task Benchmark (intent recognition for categorizing customer support) using 3 approaches: zero-shot, few-shot, and Cleanlab-curated few-shot.
    In this task, Anthropic's model is both more accurate and more affordable. The improvement over GPT-4o is slight but consistent, and with Claude 3.5 Sonnet's per-token prompt cost at almost half the price of GPT-4o, it's cheaper to feed in large prompts and examples.
    Both LLMs improved when Cleanlab was used to curate the few-shot examples, which also reduced costs by removing and correcting problematic examples.
    Shout out to Nelson Auner for the analysis.
    #genAI #llms #agi #artificialintelligence #machinelearning #datacuration #datacentricAI

    • Cleanlab improves performance of both Sonnet 3.5 and GPT-4o on a few-shot banking task. Sonnet 3.5 outperforms GPT-4o in all 3 cases.
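
    A rough sketch of the three prompting setups compared here (zero-shot, few-shot, curated few-shot) is below, using the Anthropic Python SDK. The intent labels, example format, and prompt wording are illustrative assumptions, not the actual Banking Task Benchmark.

```python
# Sketch: zero-shot vs. few-shot intent classification with Claude 3.5 Sonnet.
import anthropic

INTENTS = ["card_lost", "balance_inquiry", "dispute_charge", "close_account"]  # placeholder labels

def build_prompt(query: str, examples: list[tuple[str, str]] | None = None) -> str:
    lines = [f"Classify the customer message into one of: {', '.join(INTENTS)}.",
             "Answer with the label only.", ""]
    for text, label in (examples or []):  # empty list -> zero-shot
        lines += [f"Message: {text}", f"Label: {label}", ""]
    lines += [f"Message: {query}", "Label:"]
    return "\n".join(lines)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def classify(query: str, examples: list[tuple[str, str]] | None = None) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=10,
        messages=[{"role": "user", "content": build_prompt(query, examples)}],
    )
    return response.content[0].text.strip()

# "Cleanlab-curated few-shot" would pass the same examples after dropping/correcting the
# ones flagged as mislabeled, so the prompt is both cleaner and shorter (cheaper).
```
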
  • Cleanlab reposted this

    Vin Vashishta

    AI Advisor | Author “From Data To Profit” | Course Instructor (Data & AI Strategy, Product Management, Leadership)

    The problem is that most companies are investing 80% of their AI budget into models and only 20% into data. Here's what needs to change to unlock AI value.
    Shift investment from gathering data to curating data. Curation builds a dataset for model consumers vs. human consumers. The cost of model training drops because reliable use-case support is delivered with less data and less complex models.
    Shift investment from engineering data pipelines to engineering data-generating processes. Moving data from one place to another creates no value, while each new dataset makes the business more valuable. Data creates more AI opportunities.
    Unique datasets are the primary competitive advantage and AI moat. Models are only best-in-class for a few months. GPT-4o was upstaged by Gemini 1.5, which was just surpassed by Claude 3.5. The investment required to win on massive models is much too high for enterprise business models to support.
    Follow me here and click the link under my name to learn more about how to deliver value-centric AI. #AI #Data #AIStrategy #DataQuality

  • Cleanlab reposted this

    Barr Moses

    Co-Founder & CEO at Monte Carlo

    Is “data-driven with a disclaimer” an acceptable future for AI applications?
    Tomasz Tunguz posed the same question in one of his latest newsletters, highlighting the implicit bias consumers feel toward their AI products. Tomasz asserts that the totally reasonable expectations we have for SaaS products to be both safe and accurate for the enterprise don't apply to the AI era, at least not yet.
    With every AI software vendor sneaking disclaimers into their products (“Gemini may display inaccurate info…double-check its responses” or “ChatGPT can make mistakes…check important info.”), we've all but accepted the reality that we can't totally trust AI applications. And if we can't trust them, we can't fully embrace them.
    “We suffer from a cognitive bias: work performed by a human is likely more trustworthy because we understand the biases & the limitations. AIs are a Schrodinger's cat stuffed in a black box. We don't comprehend how the box works (yet), nor can we believe our eyes if the feline is dead or alive when we see it,” says Tomasz.
    The more important our work is, the more confident we all need to be. Even human error rates are too much for the most financially and socially critical data use cases: self-driving cars, navigation systems, insurance claims, news summaries.
    AI trust requires trustworthy AI. And trustworthy AI requires trustworthy data.
    How is your team going beyond the status quo to meet the real-time data quality demands of generative applications? Let me know in the comments!

  • Cleanlab

    Calling all AI enthusiasts and data science professionals in SF! Don't miss out on our Sake Social event happening on July 17th with Cleanlab and Open Data Science Conference (ODSC). Join us for a night of networking, learning, and curated sake 🍶✨. We are excited to have Curtis Northcutt deliver a data curation keynote that will surely spark interesting conversations around AI. This is a great opportunity to connect with like-minded individuals, expand your knowledge, and have a great time. RSVP now to secure your spot: https://bit.ly/4ex4d8L. See you there!

    Sake Social: An Evening with Cleanlab and ODSC · Luma

    lu.ma

  • Cleanlab

    Don’t let messy docs run you RAG-ged! Cleanlab Studio now directly supports heterogeneous document collections composed of files of the following types: doc, docx, pdf, ppt, pptx, csv, xls, xlsx.
    It’s an incredible way to organize your docs within a RAG system: auto-detect issues across your heterogeneous documents and auto-label/tag them as well. Use a no-code interface to quickly get your docs ready for RAG use.
    ⚡ Read the whole blog: 👇 https://lnkd.in/g3zUUZQM

  • Cleanlab

    Yet another Foundation Model announcement that highlights the importance of data curation software. This time from Apple: "We find that data quality is essential to model success, so we utilize a hybrid data strategy in our training pipeline, incorporating both human-annotated and synthetic data, and conduct thorough data curation and filtering procedures." https://lnkd.in/gJgjCzzG

    Introducing Apple’s On-Device and Server Foundation Models

    machinelearning.apple.com

  • Cleanlab reposted this

    Nikolai Liubimov

    Co-founder & CTO at HumanSignal - Data Labeling solutions for Data Science & ML

    New tutorial - LLM Evaluation: Comparing Four Methods to Automatically Detect Errors
    Large Language Models (LLMs) are revolutionizing many industries, but they come with a significant challenge: hallucinations. These unintended outputs can lead to misleading or even harmful information. Addressing these errors effectively requires innovative approaches, especially when manual supervision is not feasible.
    In this tutorial, we explore four cutting-edge techniques for automated LLM error detection. Using a practical example with Shopify app store reviews, the tutorial will walk you through using each method to help improve the reliability of LLM outputs:
    🔍 Token Probability Analysis: Assessing token confidence scores to identify potential inaccuracies.
    🤖 LLM-as-Judge: Utilizing a secondary LLM to evaluate the accuracy of the primary model's labels.
    🔄 Self-Consistency: Running multiple inferences with varied prompts to check for consistent results.
    🛡 Confident Learning with Cleanlab: Leveraging auxiliary classification models to detect label inconsistencies.
    Each technique has its own strengths and trade-offs. Dive into the full tutorial to see detailed results and insights on choosing the best approach for your projects.
    📖 You can try the full tutorial here: https://lnkd.in/dnias5e7
    #LLM #AI #MachineLearning #ErrorDetection #DataScience #Automation #AIResearch #TechInnovation

    AI Evaluation Event | HumanSignal

    humansignal.com
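
    As a toy illustration of one of these checks (self-consistency), the sketch below samples the same classification several times and flags low agreement for human review. The model name, prompt, labels, and threshold are assumptions for illustration, not the tutorial's exact setup.

```python
# Sketch: self-consistency check for an LLM labeling task (sample repeatedly, measure agreement).
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def self_consistency_label(review: str, n_samples: int = 5) -> tuple[str, float]:
    prompt = (
        "Label the sentiment of this app review as positive, negative, or neutral. "
        f"Answer with one word.\n\nReview: {review}"
    )
    answers = []
    for _ in range(n_samples):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=1.0,  # sampling variation is the point of this check
            messages=[{"role": "user", "content": prompt}],
        )
        answers.append(resp.choices[0].message.content.strip().lower())
    label, count = Counter(answers).most_common(1)[0]
    return label, count / n_samples  # low agreement -> likely unreliable output

label, agreement = self_consistency_label("Crashes every time I open the billing page.")
if agreement < 0.8:
    print("Low self-consistency: route this example to manual review")
```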

  • Cleanlab

    The latest Scale Zeitgeist AI Readiness Report (2024) surveys 1,800 ML practitioners across the US, from industries like Software/IT, Finance/Insurance, Business Services, Government, and more. When asked to list the "Top challenges in preparing high-quality training data for AI models", the #1 response was "Labeling quality", with "Curating data" ranking #4.
    The image below shows how the biggest challenges faced by AI Readiness respondents evolved over previous surveys, with data quality steadily growing in importance.
    At Cleanlab, we've long known how costly resolving data/label quality issues can be. It's vital for reliable AI, yet nobody wants to deal with this labor-intensive work. That's why we provide AI that can automatically find and fix dataset issues, as well as smart interfaces to quickly curate your {text, image, tabular} data.

    • [Image: chart of top data-preparation challenges reported across successive AI Readiness surveys]
