
Supercharge Generative AI Inference Efficient, fast, and reliable generative AI inference solution for production

What they do: High-performance generative AI inference tooling and managed platforms for deploying, scaling, and monitoring large language and multimodal models
Founded: 2021
HQ / hubs: Redwood City, California; hub in Seoul, Korea
Recent financing: $20M seed extension led by Capstone Partners (announced Aug 28, 2025)
Founder / CEO: Byung-Gon Chun
Industry: Software Development
Prior financing: $6M seed round reported in late 2021
Latest round: $20M, "led by Capstone Partners with participation from Sierra Ventures, Alumni Ventures, KDB Investment, and KB Securities" (company announcement)
About the job
FriendliAI is looking for a GPU Kernel Engineer to design, build, and optimize the low-level compute kernels that power our large-scale, GPU-accelerated AI inference platform. You will deliver world-class inference speed across NVIDIA and AMD GPUs. With our recent $20M funding, we are scaling the team to meet market demand.
This is a deeply technical, high-impact role: you will write GPU code and implement advanced optimizations. As part of our engine team, you will contribute directly to the company's proprietary inference engine, which supports over 450,000 models on Hugging Face. You will work with the inventors of continuous batching and collaborate with the platform team to deploy your work into production.
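For candidates unfamiliar with the term, continuous batching means scheduling at the iteration level: after every decode step, finished requests leave the batch and waiting requests join immediately, rather than the engine draining a whole static batch first. A minimal sketch of the idea (the `Request` type, field names, and `max_batch` parameter are hypothetical, not FriendliAI's actual engine API):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    remaining_tokens: int  # decode steps this request still needs

def continuous_batching(waiting, max_batch=4):
    """Iteration-level scheduling: admit and retire requests between
    decode steps instead of waiting for the whole batch to finish."""
    queue = deque(waiting)
    active, finished_order = [], []
    while queue or active:
        # Admit waiting requests into any free batch slots.
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        # Run one decode step for every active request.
        for req in active:
            req.remaining_tokens -= 1
        # Retire finished requests; their slots free up next step.
        still_running = []
        for req in active:
            if req.remaining_tokens == 0:
                finished_order.append(req.rid)
            else:
                still_running.append(req)
        active = still_running
    return finished_order

reqs = [Request(0, 2), Request(1, 5), Request(2, 1), Request(3, 3), Request(4, 2)]
print(continuous_batching(reqs))  # short requests finish and are replaced early
```

Note how request 4 is admitted as soon as request 2 completes, keeping GPU batch slots full; a static-batching scheduler would leave those slots idle until the longest request in the batch finished.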
Key Responsibilities
Qualifications
Preferred Experience
Benefits
About us
FriendliAI, a San Mateo, CA-based startup, is building the next-generation AI inference platform that accelerates the deployment of large language and multimodal models with unmatched performance and efficiency. Our infrastructure supports high-throughput, low-latency AI workloads for organizations worldwide. We are also integrated with the Hugging Face platform, allowing instant access to over 450,000 open-source models. We are on a mission to deliver the world’s best platform for AI inference.