
Cosine is an autonomous, on-premise coding agent post-trained on human reasoning data to deliver unmatched software-engineering accuracy, security, and speed for regulated enterprises.

Core product: Autonomous on-premise AI software-engineering agent that produces review-ready pull requests
Deployment: Cloud, dedicated-tenant, and fully air-gapped on-premise options
Models: Lumen family — post-trained for maintainability and niche languages
Founders: Alistair Pullen, Sam Stenner, Yang Li
Seed funding: $2.5M reported
Software engineering automation for large, production codebases with enterprise security and compliance requirements.
Developer tools / AI for software engineering
Seed round reported by Dealroom
“Lakestar; Warrick Shanly”
Job title: Machine Learning Engineer – Lumen Enterprise Models (SWE-focused LLMs)
Location: London; full in-office working as default
Start date: ASAP
Reports to: CEO
Compensation:
Base salary: £80,000 - £110,000
Equity: £80,000 - £110,000
___________________________________________________________________________
Cosine at a glance
At Cosine, we’re building autonomous AI engineers that plan, write, and ship code inside real development workflows.
Cosine is designed for on-premise and virtual private cloud (VPC) deployments, including fully air-gapped environments. We build our agent tooling entirely in-house and post-train open-source models to deliver reliable, enterprise-grade coding performance in security-critical settings.
In 2024, Cosine achieved a 72% score on OpenAI’s SWE-Lancer benchmark, placing us among the strongest real-world software-engineering AI systems evaluated.
YC-backed and well-funded, Cosine was founded by experienced operators focused on building dependable, production-grade AI.
This role is based in our Hoxton office, five days a week, because close collaboration, fast feedback, and shared context matter for the problems we’re solving.
___________________________________________________________________________
The role
We’re looking for an ML engineer to own large-scale training of our Lumen Enterprise models – our open‑source–based software engineering LLMs.
You’ll work on supervised fine-tuning (SFT), reinforcement learning (RL), and continued pretraining on top of open-source base models to push state-of-the-art performance on real software engineering tasks: reading and modifying large codebases, using tools, and reasoning about complex systems.
If you enjoy working close to the metal with PyTorch and distributed training, and you like making big models actually work in practice, this role is for you.
___________________________________________________________________________
About The Role
In this role you will:
You’ll collaborate closely with infra, product, and research to decide what to train next, how to train it, and how to measure whether it’s actually better for engineers.
What You’ll Do
___________________________________________________________________________
What We’re Looking For
Nice to have
___________________________________________________________________________
Why join Cosine
If this sounds like a fit, this is a role where you can meaningfully push the frontier of open-source–based software engineering models.
___________________________________________________________________________
Cosine is an equal opportunity employer. We value diverse backgrounds, perspectives, and ways of thinking, and we’re committed to creating an inclusive and respectful workplace.
We encourage applications from anyone who meets the role requirements, even if you don’t meet every single qualification. If you need reasonable adjustments at any stage of the hiring process, we’re happy to discuss them.
___________________________________________________________________________
Compensation, Benefits & Ways Of Working
We’re an in-office team, five days a week, by design. We believe the work we’re doing benefits from being together, collaborating closely, and building shared context.
What You Can Expect
We care about focus, sustainability, and doing great work — not performative overwork. We value people who show up, contribute thoughtfully, collaborate well with their colleagues, and then go home.
This role won’t suit everyone. But if you want structure, clarity, strong collaboration, and a team that takes both the work and work-life balance seriously, it’s a great place to be.
___________________________________________________________________________
Agency & Data Protection Notice
To comply with UK GDPR and our internal data-protection and equal-opportunity obligations, we only accept candidate applications and agency submissions via our Applicant Tracking System (ATS). This ensures appropriate privacy notices, lawful processing, auditability, and consistent retention controls.
Any CVs or candidate details received outside the ATS (including via email, Slack, or direct message) will be treated as unsolicited, will not be considered as part of the recruitment process, and will not give rise to any fee or payment obligation.
Participate in end-to-end training of Lumen Enterprise SWE models:
Supervised fine-tuning on curated code and conversation datasets.
RL on top of those models to align them with software-engineering objectives.
Occasional continued pretraining on domain-specific / long-context corpora.
Design, implement, and iterate on RL training pipelines.
Build and maintain large-scale PyTorch training code:
Write and optimize custom dataloaders and batching strategies.
Use PyTorch distributed primitives (DDP/FSDP and related) to scale training.
Operate large multi-node training jobs:
Launch and debug multi-GPU, multi-node runs (Slurm, k8s or similar schedulers).
Diagnose issues around NCCL, hangs, load balancing, and performance regressions.
Track experiment configs, checkpoints, and metrics across many runs.
Work on long-context and code-focused training:
Train models on long-context data (e.g. long documents, repos, multi-file tasks) and understand the tradeoffs between context length, batch size, and stability.
Propose novel, opinionated reward functions for training SWE agents.
Improve evaluation for SWE models:
Help maintain/extend an evaluation suite for code models (unit tests, benchmark suites, repo-level tasks).
Analyze failure modes and feed them back into data and training plans.
Collaborate:
Work closely with infra engineers on performance and reliability.
Stay up to date with the latest research in the space, sharing knowledge across the team at lunch-and-learns and regular stand-ups.
Strong experience training deep learning models in production:
Typically 3–5+ years working as an ML engineer / applied scientist, including hands-on responsibility for training and shipping models.
Deep proficiency with PyTorch and its primitives:
Comfort implementing custom training loops, losses, and dataloaders.
Hands-on experience with torch.distributed (DDP/FSDP-style training, distributed data loading, gradient scaling, etc.).
Experience training large sequence models or LLMs:
Have trained models at ≥70B parameters end-to-end on multi-GPU setups.
Understand practical issues: stability, init, scaling laws, gradient accumulation, curriculum and sampling strategies.
Experience with SFT and RL on top of LLMs:
Have implemented or meaningfully modified at least one RLVR system (e.g. PPO-style, GRPO-style, or similar).
Comfortable working with advantages, policy ratios, KL penalties, and sequence-level rewards.
Strong software engineering background:
You can read, debug, and write non-trivial production code (Python, plus familiarity with at least one of: TypeScript, Go).
You care about code quality, correctness, and maintainability as much as model metrics.
High level of Git proficiency.
Distributed systems / training ops experience:
Practical experience running multi-node jobs on GPU clusters (Slurm, Kubernetes, or managed cloud equivalents).
Familiarity with GPU performance tuning: memory usage, mixed precision, throughput vs. latency tradeoffs.
Data engineering instincts:
Comfortable working with large-scale datasets, object storage, dataset sharding, and filtering.
Know that data quality and sampling strategies matter as much as architecture.
Clear communication and ownership:
Can take a vague modelling goal (“make Lumen Enterprise better at X”) and turn it into a concrete plan of experiments.
Comfortable documenting decisions and walking others through tradeoffs.
You don’t need all of these, but the more you have, the more you’ll hit the ground running:
Continued pretraining and long-context experience:
Have run continued pretraining on domain-specific or long-context corpora.
Familiarity with techniques like RoPE scaling, YaRN-style extrapolation, context parallelism, or similar.
Code-focused RL and evaluation:
Experience building RL loops where rewards come from code execution (tests, linters, static analysis, fuzzing, runtime traces).
Familiarity with evaluation benchmarks for code models (e.g. HumanEval, MBPP, SWE-bench, or internal equivalents).
Experience with modern LLM training stacks:
Experience with large MoE models and expert/tensor parallelism is a plus.
Serving and online training:
Experience tuning inference for open-source serving frameworks, e.g. vLLM or SGLang.
Safety, robustness, and reward shaping:
Experience with LLM-as-a-judge, reward hacking detection, or robustness evaluation.
Open-source contributions or research:
Contributions to open-source LLM tooling, RL libraries, or relevant research papers in LLM training / RLHF / code models.