
Supercharge Generative AI Inference Efficient, fast, and reliable generative AI inference solution for production

What they do: High-performance generative AI inference tooling and managed platforms for deploying, scaling, and monitoring large language and multimodal models
Founded: 2021
HQ / hubs: Redwood City, California; hub in Seoul, Korea
Recent financing: $20M seed extension led by Capstone Partners (announced Aug 28, 2025)
Founder / CEO: Byung-Gon Chun
Industry: Software Development
Prior financing: $6M seed round reported in late 2021
Latest round: $20M, "led by Capstone Partners with participation from Sierra Ventures, Alumni Ventures, KDB Investment, and KB Securities" (company announcement)
About the job
FriendliAI is looking for a GPU Kernel Engineer to design, build, and optimize the low-level compute kernels that power our large-scale, GPU-accelerated AI inference platform. You will deliver world-class inference speed across NVIDIA and AMD GPUs. With our recent $20M funding, we are scaling the team to meet market demand.
This is a deeply technical, high-impact role: you will write GPU code and implement advanced optimizations. As part of our engine team, you will contribute directly to the company's proprietary inference engine, which supports over 450,000 models on Hugging Face. You will work with the inventors of continuous batching and collaborate with the platform team to deploy your work into production.
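For candidates unfamiliar with the term, continuous batching means scheduling at the iteration level: after every decode step, finished requests leave the batch and waiting requests join immediately, rather than the engine draining a whole static batch first. A minimal sketch of the idea (the `Request` type, field names, and `max_batch` parameter are hypothetical, not FriendliAI's actual engine API):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    remaining_tokens: int  # decode steps this request still needs

def continuous_batching(waiting, max_batch=4):
    """Iteration-level scheduling: admit and retire requests between
    decode steps instead of waiting for the whole batch to finish."""
    queue = deque(waiting)
    active, finished_order = [], []
    while queue or active:
        # Admit waiting requests into any free batch slots.
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        # Run one decode step for every active request.
        for req in active:
            req.remaining_tokens -= 1
        # Retire finished requests; their slots free up next step.
        still_running = []
        for req in active:
            if req.remaining_tokens == 0:
                finished_order.append(req.rid)
            else:
                still_running.append(req)
        active = still_running
    return finished_order

reqs = [Request(0, 2), Request(1, 5), Request(2, 1), Request(3, 3), Request(4, 2)]
print(continuous_batching(reqs))  # short requests finish and are replaced early
```

Note how request 4 is admitted as soon as request 2 completes, keeping GPU batch slots full; a static-batching scheduler would leave those slots idle until the longest request in the batch finished.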
Key Responsibilities
Qualifications
Preferred Experience
Benefits
About us
FriendliAI, a San Mateo, CA-based startup, is building the next-generation AI inference platform that accelerates the deployment of large language and multimodal models with unmatched performance and efficiency. Our infrastructure supports high-throughput, low-latency AI workloads for organizations worldwide. We are also integrated with the Hugging Face platform, allowing instant access to over 450,000 open-source models. We are on a mission to deliver the world’s best platform for AI inference.