ML Systems/Infrastructure Engineer | Oriole · Teeming.ai
Oriole
A new company that will revolutionise the performance of AI systems and speed up data centres, whilst dramatically reducing energy consumption for a sustainable future.
Round included UCL Technology Fund, XTX Ventures, Clean Growth Fund, Dorilton Ventures and Nexenai Capital
Investor Signal
“Led by Plural with participation from institutional and specialist investors including UCL Technology Fund and XTX Ventures”
ML Systems/Infrastructure Engineer
On-Site • London Area, GB
Oriole is seeking a talented ML Systems/Infrastructure Engineer to help co-optimize our AI/ML software stack with cutting-edge network hardware. You’ll be a key contributor to a high-impact, agile team focused on integrating middleware communication libraries and modelling the performance of large-scale AI/ML workloads.
Key Responsibilities:
- Design and optimize custom GPU communication kernels to enhance performance and scalability across multi-node environments.
- Develop and maintain distributed communication frameworks for large-scale deep learning models, ensuring efficient parallelization and optimal resource utilization.
- Profile, benchmark, and debug GPU applications to identify and resolve bottlenecks in communication and computation pipelines.
- Collaborate closely with hardware and software teams to integrate optimized kernels with Oriole’s next-generation network hardware and software stack.
- Contribute to system-level architecture decisions for large-scale GPU clusters, with a focus on communication efficiency, fault tolerance, and novel architectures for advanced optical network infrastructure.
Required Skills & Experience:
- Proficient in C++ and Python, with a strong track record in high-performance computing or machine learning projects.
- Expertise in GPU programming with CUDA, including deep knowledge of GPU memory hierarchies and kernel optimization.
- Hands-on experience debugging GPU kernels using tools such as cuda-gdb, cuda-memcheck, Nsight Systems, PTX, and SASS.
- Strong understanding of communication libraries and protocols, including NCCL, NVSHMEM, OpenMPI, UCX, or custom collective communication implementations.
- Familiarity with HPC networking protocols/libraries such as RoCE, InfiniBand, libibverbs, and libfabric.
- Experience with distributed deep learning/MoE frameworks, including PyTorch Distributed, vLLM, or DeepEP.
- Solid understanding of deploying and optimizing large-scale distributed deep learning workloads in production environments, including Linux, Kubernetes, SLURM, OpenMPI, GPU drivers, Docker, and CI/CD automation.
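For a flavour of the collective-communication work mentioned above (NCCL, custom collectives), here is a minimal pure-Python sketch of the ring all-reduce, the pattern NCCL uses for large messages: a reduce-scatter phase followed by an all-gather phase around a ring of ranks. All names here are illustrative and not part of any library; real implementations run the steps concurrently over GPU links rather than in a loop.

```python
def ring_allreduce(bufs):
    """Element-wise sum across all simulated ranks, in place.

    Phase 1 (reduce-scatter): after p-1 steps, rank r holds the fully
    reduced chunk (r + 1) % p.
    Phase 2 (all-gather): each reduced chunk circulates around the ring
    until every rank holds the complete result.
    """
    p = len(bufs)                      # number of ranks in the ring
    n = len(bufs[0])                   # elements per rank
    assert n % p == 0, "buffer must split into p equal chunks"
    cs = n // p                        # chunk size

    def seg(c):                        # slice covering chunk c
        return slice(c * cs, (c + 1) * cs)

    # Phase 1: reduce-scatter. In step t, rank r sends chunk (r - t) % p
    # to its right neighbour, which accumulates it into its own buffer.
    for t in range(p - 1):
        sends = [((r - t) % p, list(bufs[r][seg((r - t) % p)]), (r + 1) % p)
                 for r in range(p)]    # snapshot all sends before applying
        for c, data, dst in sends:
            bufs[dst][seg(c)] = [a + b for a, b in zip(bufs[dst][seg(c)], data)]

    # Phase 2: all-gather. In step t, rank r forwards chunk (r + 1 - t) % p
    # to its right neighbour, which overwrites its copy with the reduced data.
    for t in range(p - 1):
        sends = [((r + 1 - t) % p, list(bufs[r][seg((r + 1 - t) % p)]), (r + 1) % p)
                 for r in range(p)]
        for c, data, dst in sends:
            bufs[dst][seg(c)] = data
    return bufs

# Two ranks, two elements each: both end up with the element-wise sums.
print(ring_allreduce([[1, 2], [3, 4]]))  # → [[4, 6], [4, 6]]
```

Each rank sends and receives 2 × (p − 1) chunks of n/p elements, which is why the ring algorithm is bandwidth-optimal regardless of the number of ranks.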