To make AI inference commercially viable, d-Matrix has built a new computing platform from the ground up: Corsair™, the world’s most efficient compute solution for AI inference at datacenter scale.…
Series B announced to commercialize inference compute platform.
Series C - 2025-11-12
$275,000,000
Round reported to bring total raised to $450M and value the company at $2 billion.
Founders
What we do
Join the Team
Machine Learning Intern
On-Site • San Francisco Bay Area, US
Related Companies
Company | HQ | Industry | Total Funding
Modular | 🇺🇸 US | Data and Analytics, DeepTech, Information Technology, Software | $380M
FriendliAI | 🇺🇸 US | Data and Analytics, DeepTech, Information Technology, Internet Services, Software | -
Etched | 🇺🇸 US | Consumer Products, Data and Analytics, DeepTech, Hardware, Information Technology, Manufacturing, Software | $625M
Fireworks AI | 🇺🇸 US | Data and Analytics, DeepTech, Information Technology, Software | $307M
Baseten | 🇺🇸 US | Software | $585M
Who you are
Currently pursuing a degree in Computer Science, Electrical Engineering, Machine Learning, or a related field
Familiarity with PyTorch and deep learning concepts, particularly regarding model optimization and memory management
Understanding of CUDA programming and hardware-accelerated computation (experience with CUDA is a plus)
Strong programming skills in Python, with experience in PyTorch
Analytical mindset with the ability to approach problems creatively
Experience with deep learning model inference optimization
Knowledge of data structures used in machine learning for memory and compute efficiency
Experience with hardware-specific optimization, especially on custom hardware such as d-Matrix's, is an advantage
This role is ideal for a self-motivated intern interested in applying advanced memory management techniques to large-scale machine learning inference.
If you're passionate about optimizing machine learning models and excited to explore cutting-edge solutions in model inference, we encourage you to apply.
What the job involves
Benefits
Equity
Health care
Flexible time-off
Paid paternity leave
401k retirement plan
Remote & hybrid working model
Work with world-leading engineers and researchers
Peer coaching & development
Team-building activities
Happy hours
Free food and snacks
We are seeking a motivated and innovative Machine Learning Intern to join our team.
The intern will develop a dynamic Key-Value (KV) cache solution for Large Language Model (LLM) inference, aimed at improving memory utilization and execution efficiency on d-Matrix hardware.
The project involves modeling at the PyTorch graph level to enable efficient, torch-native support for KV-Cache, addressing limitations in current solutions.
Research and analyze existing KV-Cache implementations used in LLM inference, particularly those that store past key/value states as lists of PyTorch tensors
Investigate “Paged Attention” mechanisms that leverage dedicated CUDA data structures to optimize memory for variable sequence lengths
Design and implement a torch-native dynamic KV-Cache model that can be integrated seamlessly within PyTorch
Model KV-Cache behavior within the PyTorch compute graph to improve compatibility with torch.compile and facilitate the export of the compute graph
Conduct experiments to evaluate memory utilization and inference efficiency on d-Matrix hardware
Develop efficient KV-Cache support on d-Matrix hardware
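For readers unfamiliar with the "Paged Attention" idea mentioned above, the following is a minimal pure-Python sketch of the bookkeeping behind it: each sequence owns a block table that maps logical token positions to fixed-size physical blocks, so memory is reserved one block at a time and not for the maximum sequence length. The class name `PagedKVAllocator`, the constant `BLOCK_SIZE`, and the method names are hypothetical and chosen for illustration only. This is not d-Matrix's design or any library's API, and real implementations keep these structures on the accelerator.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per physical block (illustrative value, an assumption)


@dataclass
class PagedKVAllocator:
    """Hypothetical block-table bookkeeping for a paged KV cache."""
    num_blocks: int
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> physical block ids
    seq_lens: dict = field(default_factory=dict)      # seq_id -> tokens stored

    def __post_init__(self):
        # Reverse order so that pop() hands out block 0 first.
        self.free_blocks = list(range(self.num_blocks - 1, -1, -1))

    def append_token(self, seq_id):
        """Reserve a slot for one new token; return (physical_block, offset)."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % BLOCK_SIZE == 0:
            # The current block is full (or the sequence is new): take a new block.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[length // BLOCK_SIZE], length % BLOCK_SIZE

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


alloc = PagedKVAllocator(num_blocks=3)
slots_a = [alloc.append_token("a") for _ in range(5)]  # spans two blocks
slot_b = alloc.append_token("b")                       # takes the last free block
```

Because blocks are handed out on demand, sequences of very different lengths can share one pool without each reserving worst-case memory.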
Create a torch-level modeling framework for dynamic KV-Cache
Ensure compatibility of the KV-Cache model with torch.compile and other PyTorch features for optimized graph export
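To illustrate what a torch-level KV-Cache model might look like, here is a minimal sketch that assumes a preallocated, fixed-shape buffer updated in place with `index_copy_`. It contrasts with caches that grow a Python list of tensors on every step. The class name `StaticKVCache` and its `update` method are hypothetical, and this is one possible approach, not the project's actual design. Fixed tensor shapes are generally easier for graph capture tools to handle than shapes that change each step, though whether a given model compiles or exports cleanly would need to be checked case by case.

```python
import torch


class StaticKVCache(torch.nn.Module):
    """Hypothetical fixed-shape KV cache held in module buffers."""

    def __init__(self, batch, heads, max_len, head_dim, dtype=torch.float32):
        super().__init__()
        shape = (batch, heads, max_len, head_dim)
        # Buffers move with the module across devices and appear in its state.
        self.register_buffer("k", torch.zeros(shape, dtype=dtype))
        self.register_buffer("v", torch.zeros(shape, dtype=dtype))

    def update(self, positions, k_new, v_new):
        # positions: LongTensor of shape (T,) with the sequence indices to write.
        # k_new, v_new: tensors of shape (batch, heads, T, head_dim).
        self.k.index_copy_(2, positions, k_new)
        self.v.index_copy_(2, positions, v_new)
        # The full buffers are returned; attention must mask unwritten positions.
        return self.k, self.v


cache = StaticKVCache(batch=1, heads=2, max_len=8, head_dim=4)
k_new = torch.ones(1, 2, 3, 4)
k_all, v_all = cache.update(torch.arange(3), k_new, 2 * k_new)
```

The trade-off in this sketch is that memory for `max_len` tokens is reserved up front, which is the cost that paged or dynamic schemes aim to reduce.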