FriendliAI

Supercharge Generative AI Inference

Efficient, fast, and reliable generative AI inference solution for production

friendli.ai

HQ: US

Team Size: 48

Open Jobs: 14

Total Funding: -

Latest Fundraise: Unknown

TL;DR

What they do: High-performance generative AI inference tooling and managed platforms for deploying, scaling, and monitoring large language and multimodal models

Founded: 2021

HQ / hubs: Redwood City, California; hub in Seoul, Korea

Recent financing: $20M seed extension led by Capstone Partners (announced Aug 28, 2025)

Founder / CEO: Byung-Gon Chun

Related Companies

Company	HQ	Industry	Total Funding
Baseten	🇺🇸 US	—	$585M
quadric, Inc	🇺🇸 Burlingame, US	Consumer Products, DeepTech, Hardware, Manufacturing	$74M
Modular	🇺🇸 US	Data and Analytics, DeepTech, Information Technology, Software	$380M
GenBio AI	🇺🇸 Palo Alto, US	Biotechnology, DeepTech, Education	-
d-Matrix	🇺🇸 US	Data and Analytics, DeepTech, Hardware, Information Technology, Internet Services, Manufacturing, Software	$429M

Company Overview

Problem Domain

Generative AI inference infrastructure for production deployments of LLMs and multimodal models

Founded

2021

Industry

Software Development

Funding Track Record

Seed extension - 2025-08-28

$20M

Participation from Sierra Ventures, Alumni Ventures, KDB Investment, and KB Securities (announced by company)

Seed

$6M

Prior seed round reported in late 2021

Investor Signal

“Led by Capstone Partners with participation from Sierra Ventures, Alumni Ventures, KDB Investment, and KB Securities”

Founders

What we do

Join the Team

Software Engineer - Inference Engine

Hybrid • New York, NY, US

About us

FriendliAI, a Redwood City, CA-based startup, is building the next-generation AI inference platform that accelerates the deployment of large language and multimodal models with unmatched performance and efficiency. Our infrastructure supports high-throughput, low-latency AI workloads for organizations worldwide. We are also integrated with the Hugging Face platform, allowing instant access to over 400,000 open-source models. We are on a mission to deliver the world’s best platform for generative and agentic AI.

The Role

We are seeking a highly technical Inference Engine Engineer to optimize the performance and efficiency of our core inference engine. You will focus on designing, implementing, and optimizing GPU kernels and supporting infrastructure for next-generation generative and agentic AI workloads. Your work will directly power the most latency-critical and compute-intensive systems deployed by our customers.

The Person

You are an exceptional engineer with a strong foundation in GPU programming and compiler infrastructure. You enjoy pushing performance boundaries and have experience supporting production-scale machine learning applications.

Key Responsibilities

Design and optimize custom GPU kernels for AI (e.g., transformer and diffusion) workloads
Contribute to the development of FriendliAI’s kernel compiler, memory planner, runtime, and other core components
Collaborate with cloud and infrastructure engineers to ensure end-to-end inference performance
Analyze performance bottlenecks across the software and hardware stack, and implement targeted optimizations
Drive support for new model architectures and tensor compute patterns
Maintain production-grade performance infrastructure, including profiling, benchmarking, and validation tools

Qualifications

5+ years of experience in production or high-impact research environments
Production-level expertise in Python and C++
Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
Experience developing machine learning frameworks or performance-critical runtime systems
Hands-on experience writing and optimizing GPU kernels
Hands-on experience profiling GPU kernels
Experience working with generative AI models such as transformer and diffusion models

Preferred Experience

Experience developing machine learning compilers or code generation systems
Familiarity with dynamic shape compilation, memory planning, and kernel fusion
Contributions to inference engines, compilers, or high-performance numerical libraries
Understanding of multi-GPU and distributed inference strategies

Benefits

Flexible working hours
Daily lunch and dinner provided
Unlimited snacks and beverages
Supportive work environment
Health check-up support
Top-tier equipment support
We offer competitive compensation, startup equity, health insurance, and other benefits.

