Cosine

Cosine provides an AI software engineer named Genie that autonomously completes engineering tasks, implements features, and submits pull requests so development teams can manage work asynchronously.…

AI Engineering, Autonomous Agent, AutoPM, Code Generation, Collaboration, IDE, Productivity, Software Development

cosine.sh

HQ: London, GB

Team Size: 37

Open Jobs: 3

Total Funding: $4M

Latest Fundraise: 6 months ago

TL;DR

Product: Genie, an autonomous AI software engineer that handles branching, coding, testing, and opening PRs; the platform also includes AutoPM for task breakdown and verification

Integrations: GitHub, GitLab, Bitbucket, Jira, Linear, Slack, Vercel, VS Code, JetBrains, CLI

Founded: 2022

Headquarters: London (described as London-based, though some materials also reference San Francisco)

Funding (reported): $2.5M seed (Aug 2024); total funding reported at ~$3.53M

Company Overview

Problem Domain

Software engineering productivity / developer automation for production-grade codebases

Founded

2022

Industry

Data and Analytics

Tech Stack

Lumen coding models

Google Cloud

Common Crawl

Funding Track Record

Seed (2024-08)

$2.5M

Reported participation from Lakestar and Focal

Investor Signal

“Includes early-stage investors such as Uphonest, SOMA Capital, Lakestar, Focal, Warrick Shanly, Eight Capital, Gaingels, John Spindler, UpHonest Capital, and Y Combinator”

Join the Team

Machine Learning Engineer – Lumen Enterprise Models (SWE-focused LLMs)

On-Site • Greater London, England, GB

Related Companies

Company	HQ	Industry	Total Funding
Lyzr AI	🇺🇸 Jersey City, US	Data and Analytics; DeepTech; Education; Information Technology; Mobile, Platforms, and Apps; Software	$25M
Cline	🇺🇸 San Francisco, US	Information Technology; Software	$32M
TurinTech AI	🇬🇧 London, GB	Data and Analytics; DeepTech; Information Technology; Software	$21M
Distyl	🇺🇸 US	Data and Analytics; DeepTech; Information Technology; Software	$202M
Lovable	🇸🇪 SE	Data and Analytics; DeepTech; Information Technology; Software	-

Job title: Machine Learning Engineer – Lumen Enterprise Models (SWE-focused LLMs)

Location: London; full in-office working by default

Start date: ASAP

Reports to: CEO

Compensation: base salary £80,000–£110,000; equity £80,000–£110,000

___________________________________________________________________________

Cosine at a glance

At Cosine, we’re building autonomous AI engineers that plan, write, and ship code inside real development workflows.

Cosine is designed for on-premise and virtual private cloud (VPC) deployments, including fully air-gapped environments. We build our agent tooling entirely in-house and post-train open-source models to deliver reliable, enterprise-grade coding performance in security-critical settings.

In 2024, Cosine achieved a 72% score on OpenAI’s SWE-Lancer benchmark, placing us among the strongest real-world software-engineering AI systems evaluated.

YC-backed and well-funded, Cosine was founded by experienced operators focused on building dependable, production-grade AI.

This role is based in our Hoxton office, five days a week, because close collaboration, fast feedback, and shared context matter for the problems we’re solving.

___________________________________________________________________________

The role

We’re looking for an ML engineer to own large-scale training of our Lumen Enterprise models, our software engineering LLMs built on open-source base models.

You’ll work on supervised fine-tuning (SFT), reinforcement learning (RL), and continued pretraining on top of open-source base models to push state-of-the-art performance on real software engineering tasks: reading and modifying large codebases, using tools, and reasoning about complex systems.

If you enjoy working close to the metal with PyTorch and distributed training, and you like making big models actually work in practice, this role is for you.

___________________________________________________________________________

About The Role

In this role you will:

Take open-source base models (code + general LLMs) and turn them into high-performance Lumen Enterprise SWE agents via SFT and RL.
Design and run large-scale training experiments on multi-node GPU clusters, including long-context training and MoE-style architectures.
Build and iterate on large-scale RL loops where models write code, run tests or tools, and get rewarded (or penalized) accordingly.
Work hands-on across the stack: custom PyTorch dataloaders, distributed training primitives, RL objectives, and evaluation on real-world repos and tasks.

You’ll collaborate closely with infra, product, and research to decide what to train next, how to train it, and how to measure whether it’s actually better for engineers.

What You’ll Do

Participate in end-to-end training of Lumen Enterprise SWE models:

Supervised fine-tuning on curated code and conversation datasets.

RL on top of those models to align them with software-engineering objectives.

Occasional continued pretraining on domain-specific / long-context corpora.

Design, implement, and iterate on RL training pipelines.

Build and maintain large-scale PyTorch training code:

Write and optimize custom dataloaders and batching strategies.

Use PyTorch distributed primitives (DDP/FSDP and related) to scale training.

Operate large multi-node training jobs:

Launch and debug multi-GPU, multi-node runs (Slurm, Kubernetes, or similar schedulers).

Diagnose issues around NCCL, hangs, load balancing, and performance regressions.

Track experiment configs, checkpoints, and metrics across many runs.

Work on long-context and code-focused training:

Train models on long-context data (e.g. long documents, repos, multi-file tasks) and understand the tradeoffs between context length, batch size, and stability.

Come up with novel, opinionated reward functions for training SWE agents.

Improve evaluation for SWE models:

Help maintain and extend an evaluation suite for code models (unit tests, benchmark suites, repo-level tasks).

Analyze failure modes and feed them back into data and training plans.

Work closely with infra engineers on performance and reliability.

Stay up to date with the latest research in the space and share knowledge across the team at lunch-and-learns and regular stand-ups.

___________________________________________________________________________

What We’re Looking For

Strong experience training deep learning models in production:

Typically 3–5+ years working as an ML engineer / applied scientist, including hands-on responsibility for training and shipping models.

Deep proficiency with PyTorch and its primitives:

Comfort implementing custom training loops, losses, and dataloaders.

Hands-on experience with torch.distributed (DDP/FSDP-style training, distributed data loading, gradient scaling, etc.).

Experience training large sequence models or LLMs:

Have trained models at ≥70B parameters end-to-end on multi-GPU setups.

Understand practical issues: stability, initialization, scaling laws, gradient accumulation, and curriculum and sampling strategies.

Experience with SFT and RL on top of LLMs:

Have implemented or meaningfully modified at least one RL with verifiable rewards (RLVR) system (e.g. PPO-style, GRPO-style, or similar).

Comfortable working with advantages, policy ratios, KL penalties, and sequence-level rewards.

Strong software engineering background:

You can read, debug, and write non-trivial production code (Python, plus familiarity with TypeScript or Go).

You care about code quality, correctness, and maintainability as much as model metrics.

High level of Git proficiency.

Distributed systems / training ops experience:

Practical experience running multi-node jobs on GPU clusters (Slurm, Kubernetes, or managed cloud equivalents).

Familiarity with GPU performance tuning: memory usage, mixed precision, throughput vs. latency tradeoffs.

Data engineering instincts:

Comfortable working with large-scale datasets, object storage, dataset sharding, and filtering.

Know that data quality and sampling strategies matter as much as architecture.

Clear communication and ownership:

Can take a vague modelling goal (“make Lumen Enterprise better at X”) and turn it into a concrete plan of experiments.

Comfortable documenting decisions and walking others through tradeoffs.

Nice to have

You don’t need all of these, but the more of them you have, the faster you’ll hit the ground running:

Continued pretraining and long-context experience:

Have run continued pretraining on domain-specific or long-context corpora.

Familiarity with techniques like RoPE scaling, YaRN-style extrapolation, context parallelism, or similar.

Code-focused RL and evaluation:

Experience building RL loops where rewards come from code execution (tests, linters, static analysis, fuzzing, runtime traces).

Familiarity with evaluation benchmarks for code models (e.g. HumanEval, MBPP, SWE-bench, or internal equivalents).

Experience with modern LLM training stacks:

Experience with large MoE models and expert/tensor parallelism is a plus.

Serving and online training:

Experience tuning inference with open-source frameworks such as vLLM or SGLang.

Safety, robustness, and reward shaping:

Experience with LLM-as-a-judge, reward hacking detection, or robustness evaluation.

Open-source contributions or research:

Contributions to open-source LLM tooling, RL libraries, or relevant research papers in LLM training / RLHF / code models.

___________________________________________________________________________

Why join Cosine

Direct impact: Your work directly shapes the next generations of Lumen Enterprise SWE models that engineers use every day.
Real scale: You’ll work with large, modern open-source models, long context lengths, and multi-node training runs.
Full-stack ML engineering: From custom PyTorch code and distributed systems to data curation, RL design, and MLOps.
Research + pragmatism: You’ll stay close to the latest literature on SFT and code LLMs, but you’ll be judged by shipped improvements, not just ideas.

If this sounds like you, this role is a chance to meaningfully push the frontier of open-source-based software engineering models.

___________________________________________________________________________

Cosine is an equal opportunity employer. We value diverse backgrounds, perspectives, and ways of thinking, and we’re committed to creating an inclusive and respectful workplace.

We encourage applications from anyone who meets the role requirements, even if you don’t meet every single qualification. If you need reasonable adjustments at any stage of the hiring process, we’re happy to discuss them.

___________________________________________________________________________

Compensation, Benefits & Ways Of Working

We’re an in-office team, five days a week, by design. We believe the work we’re doing benefits from being together, collaborating closely, and building shared context.

What You Can Expect

Competitive salary, benchmarked to the market
Equity / share options, so you share in the upside you help create
30 days’ holiday + bank holidays
Genuine 9–5 working hours: we don’t expect late nights or weekend work
Work hard in the office, collaborate closely, and switch off properly
Dog-friendly office: bring your dog to work
Daily lunch provided
Monthly team breakfasts
High-quality equipment to do your best work

We care about focus, sustainability, and doing great work, not performative overwork. We value people who show up, contribute thoughtfully, collaborate well with their colleagues, and then go home.

This role won’t suit everyone. But if you want structure, clarity, strong collaboration, and a team that takes both the work and work-life balance seriously, it’s a great place to be.

___________________________________________________________________________

Agency & Data Protection Notice

To comply with UK GDPR and our internal data-protection and equal-opportunity obligations, we only accept candidate applications and agency submissions via our Applicant Tracking System (ATS). This ensures appropriate privacy notices, lawful processing, auditability, and consistent retention controls.

Any CVs or candidate details received outside the ATS (including via email, Slack, or direct message) will be treated as unsolicited, will not be considered as part of the recruitment process, and will not give rise to any fee or payment obligation.