Research Scientist

At Cleanlab you’ll get to

Research novel data-centric ML algorithms to improve data quality, publish papers/blogs that shape how data science is practiced, work across data modalities that interest you (vision, text, tabular, etc.), and contribute to open-source: https://github.com/cleanlab/cleanlab. We aim to invent algorithms that are maximally useful to data scientists! Come do impactful science at a dynamic startup pioneering the growing field of data-centric AI.

What we’re looking for

First and foremost – you must have strong software engineering skills and experience designing and productionizing your code into large (and maintainable) software systems, and you should be able to do so in significantly less time than a typical engineer.

Second – you must have strong experience with ML research, especially on developing practical methods that work well for real-world, human-facing applications on real-world data.

Responsibilities

Invent novel data-centric AI algorithms that help data scientists deal with messy data
Contribute high-quality implementations of these algorithms to Cleanlab’s open-source projects
Publish papers and/or blogs that convey why your algorithms are useful

Qualifications

We select candidates based on strengths, not on weaknesses. Experience with the following is highly recommended, but not required:

Publications at top conferences/journals in ML or Data Science
Python, NumPy
pandas, scikit-learn
PyTorch/PyTorch Lightning, JAX, or TensorFlow + Keras

Bonus:

Past contributions to Cleanlab, other data-centric AI tools, and/or ML blog posts
Experience with Hugging Face, Weights and Biases, OpenCV, Gradio/Streamlit
Experience with MLOps (ETL, model deployment and monitoring)
Prior relevant work experience in auto-ML, data-centric AI research, or areas related to Cleanlab

Benefits

Working at Cleanlab is awesome! Beyond the opportunity to work at a well-funded (backed by Bain Capital Ventures) early-stage AI tech company with an incredible, friendly founding team of MIT, Stanford, and Harvard graduates, all full-time employees receive the following:

$9,000 per year travel benefit
- Travel enhances our empathy with different cultures and enables us to work together more effectively. It’s how we grow and learn: traveling is an essential part of what makes us human. At Cleanlab, every two months you will receive a $1,500 reimbursable travel benefit (resets on Jan 1, March 1, May 1, July 1, Sep 1, Nov 1). This is a unique benefit that lets you work from Paris for a week in February, then take a backpacking trip in the Andes for a weekend in March. Cleanlab will cover the flight for your partner or friend, too, as long as you attend and it’s within the $1,500 / two-month period. Remote employees can use this benefit to come work with us in Boston/SF from time to time (encouraged, but not required).
Premium health insurance
- We provide a fantastic $4 (we cover the rest) health insurance option. We also provide a $0-deductible, 100%-coverage premium health care option for those who prefer the best health insurance.
Stipend for attending conferences to keep up with the latest innovations in ML and software.
Competitive salary (+ equity offering for certain roles), with regular opportunities for a raise if things are going well.

About Us

Prior to Cleanlab, our founders (3 ML PhDs from MIT) worked at OpenAI, Google, Microsoft, Amazon, AWS, Facebook AI Research (FAIR), Dropbox, Oculus, Palantir, NASA, General Electric, MIT Lincoln Laboratory, MIT, Harvard, and Stanford. At every place we worked, we repeatedly encountered the same issue: AI solutions failed to work reliably on real-world, human-centric data due to label errors and poor data quality. So we spent eight years of PhD research at MIT inventing a new field to solve this problem, and after successful pilots with world-leading organizations, Cleanlab emerged.

Everything we do at Cleanlab is guided by our north star: to improve the world’s ML data more easily and quickly than any other solution, enabling AI systems to train more reliably on real-world, messy, error-prone data. We develop next-generation, open-source data-centric AI algorithms and provide no-code SaaS enterprise solutions to help individuals and teams at companies (across all industries) diagnose/fix issues in their datasets and produce more reliable ML models by providing clean labels for training.

Cleanlab is a well-funded early-stage startup that is rapidly growing to transform the future of data-centric AI. Some of Cleanlab’s early work (while the company was still in stealth mode) has been featured in media such as Wired, MIT Technology Review, and VentureBeat.

While many companies can help store/manage data or develop ML models, there exist few solutions today to improve the quality of existing data, which is the core asset of the modern enterprise. This is where you come in. At Cleanlab, you’ll be able to take ownership of critical projects that pioneer the future of data-centric AI.

We are a remote-first company, with roughly half of our team located near Boston, MA (Eastern Time) and the other half located near San Francisco, CA (Pacific Time).

Read about the Cleanlab team here.
Read how Cleanlab went from MIT PhD research to tech used by Amazon, Google, etc. here.
See what Google, Wells Fargo, and other Cleanlab users think here.

How to Apply

Apply here: https://cleanlab.ai/apply