ML Systems/Infrastructure Engineer | Oriole · Teeming.ai
Oriole
A new company that will revolutionise the performance of AI systems and speed up data centres, whilst dramatically reducing energy consumption for a sustainable future.
Product focus: Full-photonic switching and network fabrics for AI/ML and HPC (PRISM, PRISM Ultra)
Total reported recent funding: $22M Series A (after prior £10M seed)
Employee count: ~70
Company Overview
Problem Domain
Data-centre networking for AI/ML and HPC with emphasis on throughput, latency and energy efficiency.
Founded
2023
Industry
Networking / Photonics / AI infrastructure
Tech Stack
Photonic switching
Optical networking
Network fabric design
Funding Track Record
Seed
£10M
Seed round reported with backing from the Clean Growth Fund among others.
Series A - October 2024
$22M
Round reported led by Plural; company noted total raised that year reached $35M.
Investor Signal
“Investors include Plural, UCL Technology Fund, XTX Ventures, Clean Growth Fund and Dorilton Ventures; Innovate UK Investor Partnership participated in prior activity.”
Join the Team
ML Systems/Infrastructure Engineer
On-Site • London Area, GB
Oriole is seeking a talented ML Systems/Infrastructure Engineer to help co-optimize our AI/ML software stack with cutting-edge network hardware. You’ll be a key contributor to a high-impact, agile team focused on integrating middleware communication libraries and modelling the performance of large-scale AI/ML workloads.
Key Responsibilities:
Design and optimize custom GPU communication kernels to enhance performance and scalability across multi-node environments.
Develop and maintain distributed communication frameworks for large-scale deep learning models, ensuring efficient parallelization and optimal resource utilization.
Profile, benchmark, and debug GPU applications to identify and resolve bottlenecks in communication and computation pipelines.
Collaborate closely with hardware and software teams to integrate optimized kernels with Oriole’s next-generation network hardware and software stack.
Contribute to system-level architecture decisions for large-scale GPU clusters, with a focus on communication efficiency, fault tolerance, and novel architectures for advanced optical network infrastructure.
Required Skills & Experience:
Proficient in C++ and Python, with a strong track record in high-performance computing or machine learning projects.
Expertise in GPU programming with CUDA, including deep knowledge of GPU memory hierarchies and kernel optimization.
Hands-on experience debugging GPU kernels using tools such as cuda-gdb, cuda-memcheck, and Nsight Systems, and inspecting PTX and SASS.
Strong understanding of communication libraries and protocols, including NCCL, NVSHMEM, Open MPI, UCX, or custom collective communication implementations.
Familiarity with HPC networking protocols/libraries such as RoCE, InfiniBand, libibverbs, and libfabric.
Experience with distributed deep learning/MoE frameworks, including PyTorch Distributed, vLLM, or DeepEP.
Solid understanding of deploying and optimizing large-scale distributed deep learning workloads in production environments, including Linux, Kubernetes, SLURM, Open MPI, GPU drivers, Docker, and CI/CD automation.