Mistral AI

Frontier AI. In your hands. We believe in a future where AI is abundant and accessible. We aspire to empower the world to build with—and benefit from—the most significant technology of our…

mistral.ai

HQ: FR

Team Size: 985

Open Jobs: 5

Total Funding: $3B

Latest Fundraise: Unknown

TL;DR

Founded: April 2023 (Paris)

Core product: Open-weight frontier AI models and enterprise APIs (including Le Chat)

Team size: ~280 employees

Notable funding: Large multi-round funding including €385M Series A (Dec 2023) and €600M Series B (Jun 2024)

Company Overview

Problem Domain

Frontier generative AI — making powerful, efficient, and shareable models accessible for research and enterprise use.

Founded

2023

Industry

Artificial Intelligence

Funding Track Record

Seed - June 2023

~$113M

Reported as Europe’s largest seed round at the time

Series A - December 11, 2023

€385M (~$415M)

Strategic investment - February 27, 2024

$16M

Investment to convert into equity in a future round

Series B - June 2024

€600M (~$640M reported)

Investor Signal

“Backed by major VCs and strategic corporate investors including Andreessen Horowitz, General Catalyst, Lightspeed, Index Ventures, NVIDIA, Microsoft and Bpifrance”

Join the Team

Evaluation Engineer

On-Site • Paris, FR

Related Companies

Company	HQ	Industry	Total Funding
muchbetter.ai	🇫🇷 FR	Data and Analytics, Information Technology, Software	-
Deepomatic	🇫🇷 FR	Data and Analytics, DeepTech, Hardware, Information Technology, Software	$22M
DeepIP	🇺🇸 US	Information Technology, Legal, Software	-
MakiPeople	🇫🇷 FR	Administrative Services, HR and Recruiting, Information Technology, Professional Services, Software	-
OPTIML	🌍 Remote	Construction, Finance, Information Technology, Professional Services, Real Estate, Software, Sustainability	-

Who you are

You are fluent in English
You have 3+ years of experience in ML evaluation and benchmarking for LLM or agentic systems
You have proven experience implementing AI or machine learning products, including APIs and back-end systems
You have deep understanding of concepts and algorithms underlying machine learning and LLMs
You have strong technical coding skills in Python
You have strong communication skills and can explain complex technical concepts in simple terms to both technical and non-technical audiences
Contributions to open-source evaluation frameworks (e.g., LM Eval Harness, OpenAI Evals) or published research on LLM evaluation
Experience as a Customer Engineer, Forward Deployed Engineer, Sales Engineer, Solutions Architect or Technical Product Manager
Experience with ML frameworks (PyTorch, HuggingFace Transformers)

What the job involves

Benefits

Competitive bonus structure
Equity
Opportunities for professional growth and development

The Applied AI team is Mistral's customer-facing technical organization

We work directly with enterprise clients from pre-sales through implementation to deploy cutting-edge AI solutions that deliver measurable business impact

Our team combines deep ML expertise with strong customer engagement skills, operating like startup CTOs who own end-to-end project execution

However, the AI graveyard is full of great ideas nobody could measure or prototypes that never made it to production

As our first Evaluation Engineer, you'll design the methodology, build the infrastructure, and define what "ready for production" means across verticals and use cases

You will design and implement evaluation systems that help our customers understand model performance across their specific use cases, build robust evaluation infrastructure, and work closely with both research and customer-facing teams

Research builds evals for frontier capabilities, but customers don't care about MMLU scores

In Applied AI, we need evals and frameworks built for customer reality: domain-specific, risk-aware, and production-grade

The kind that tell you whether your medical summarization model will hallucinate drug interactions, or whether your legal assistant will invent case citations

This role sits at the intersection of research, engineering, and solutions. You will play a critical cross-functional role in measuring, understanding, and improving the capabilities of our models for our enterprise customers

Design and implement comprehensive evaluation frameworks to measure LLM capabilities across diverse customer use cases, including text generation, reasoning, code, and domain-specific applications

Build scalable evaluation infrastructure and pipelines that enable rapid, reproducible assessment of model performance

Develop novel evaluation methodologies to assess emerging capabilities or verticalized use cases (cybersecurity, finance, healthcare, etc.) and enable the Solutions teams (Deployment Strategist and Applied AI) on these topics

Create custom evaluation suites tailored to enterprise customers' specific needs, working closely with them to understand their requirements and success criteria

Collaborate with research teams to translate evaluation insights into model improvements and training decisions

Partner with product teams to continuously improve our evaluation tooling based on customer feedback
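
To make the idea of a domain-specific, reproducible evaluation concrete, here is a minimal sketch in Python. It is an illustration only, not Mistral's tooling: the names `EvalCase`, `invented_citations`, `run_eval`, and `toy_model`, as well as the citation pattern, are all hypothetical. It scores a legal-assistant-style model on whether its answers cite only cases from a known list.

```python
import re
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class EvalCase:
    """One test prompt plus the citations known to be real for it (hypothetical schema)."""
    prompt: str
    allowed_citations: FrozenSet[str]


# Assumed citation shape for this sketch, e.g. "Smith v. Jones (2019)".
CITATION_RE = re.compile(r"[A-Z][a-z]+ v\. [A-Z][a-z]+ \(\d{4}\)")


def invented_citations(answer: str, allowed: FrozenSet[str]) -> Set[str]:
    """Return citations found in the answer that are not in the allowed set."""
    return set(CITATION_RE.findall(answer)) - allowed


def run_eval(model: Callable[[str], str], cases: Iterable[EvalCase]) -> float:
    """Return the fraction of cases whose answer contains no invented citation."""
    case_list = list(cases)
    if not case_list:
        raise ValueError("at least one evaluation case is required")
    passed = sum(
        1
        for case in case_list
        if not invented_citations(model(case.prompt), case.allowed_citations)
    )
    return passed / len(case_list)


def toy_model(prompt: str) -> str:
    """Stand-in for a real model call; deterministic so the run is reproducible."""
    if "contract" in prompt:
        return "See Smith v. Jones (2019)."
    return "See Doe v. Roe (2021)."


CASES = [
    EvalCase("contract dispute", frozenset({"Smith v. Jones (2019)"})),
    EvalCase("tort claim", frozenset({"Smith v. Jones (2019)"})),
]

score = run_eval(toy_model, CASES)
print(f"citation-faithfulness: {score:.2f}")
```

In practice the stand-in model would be replaced by a call to a deployed model, and the pass criterion would be agreed with the customer for each use case.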

How We Work in Applied AI:

We care about people and outputs

What matters is what you ship, not the time you spend on it

Bureaucracy is where urgency goes to vanish. You talk to whoever you need to talk to. The best idea wins, whether it comes from a principal engineer or someone in their first week

Always ask why. The best solutions come from deep understanding, not from copying what worked before

We say what we mean. Feedback is direct, timely, and given because we care

No politics. Low ego, high standards

We embrace an unstructured environment and find joy in it