Senior Evaluation ML Engineer

4 days ago

Zürich, Zürich, Switzerland | kaiko | Full-time | €120,000 - €400,000 per year

About kaiko

Delivering high-quality cancer care is complex: specialists form a view of each patient's condition by reasoning across different types of data, such as CT scans, genomic context, treatment history, and clinical notes.

Current AI models are powerful within individual domains but fall short when it comes to reasoning across data types or domains. kaiko.w, our AI assistant for oncology, aims to equip every clinician with a full understanding of their patients, helping them reason across data as they assess each case.

We're building this in close collaboration with the Netherlands Cancer Institute (NKI) and a growing network of hospitals and research centers. We've raised significant long-term funding and have nearly doubled our team over the past year. We're now 80+ people representing 25 nationalities, based across our offices in Zurich and Amsterdam.

About the role

Kaiko's Multimodal Large Language Model (MLLM) is trained on domain-specific, high-complexity medical data. To reach clinical-grade performance, we need comprehensive, large-scale evaluation that is clinically grounded.

As a Senior Evaluation ML Engineer, you'll design and own our end-to-end evaluation stack, from gold-standard ground truths and synthetic benchmark generation to automated release gating, with a focus on oncology-relevant tasks and metrics. You will partner with clinicians, external annotators, and ML researchers to ensure that every signal we measure reflects real clinical decision-making and informs our model development efforts.
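The posting mentions automated release gating without specifying the criteria. As a minimal sketch, assuming per-task scores where higher is better and a hypothetical per-task tolerance for regressions, a gate could block a candidate checkpoint whenever any task drops further below the current baseline than its tolerance allows (this is an illustration, not kaiko's actual gating logic):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    passed: bool
    # task -> (candidate - baseline), recorded only for tasks that failed the gate
    regressions: dict[str, float]


def release_gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_drop: dict[str, float],
    default_max_drop: float = 0.0,
) -> GateResult:
    """Fail the gate if any task's score drops by more than its allowed tolerance.

    Assumption: every score is "higher is better". Tasks that are missing
    from the candidate run count as failures, so losing eval coverage
    cannot silently pass the gate.
    """
    regressions: dict[str, float] = {}
    for task, base_score in baseline.items():
        if task not in candidate:
            regressions[task] = float("-inf")  # missing result is treated as a failure
            continue
        delta = candidate[task] - base_score
        # Tolerance is per task (hypothetical config), falling back to the default.
        if delta < -max_drop.get(task, default_max_drop):
            regressions[task] = delta
    return GateResult(passed=not regressions, regressions=regressions)
```

With a zero default tolerance, any regression on a task without an explicit budget blocks the release; a looser default trades safety for fewer false alarms on noisy metrics.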

As a Senior Evaluation ML Engineer, you will:

Build and operate our eval infrastructure at scale (Python + Ray/Spark, Dagster preferred) with strong CI/CD, reproducibility, and observability principles in mind.
Source & curate benchmarks (public, licensed, partner-provided) and generate high-fidelity synthetic cases with controls for clinical plausibility, leakage, cohort balance, and difficulty.
Define clinically meaningful task taxonomies and rubrics spanning text (clinical notes, reports), imaging (CT/MRI/PET), pathology (whole-slide images), genomics (VCF, biomarkers), and structured EHR/FHIR data.
Automate offline evaluations and build online evaluation flows (clinician-in-the-loop review, preference/ranking, A/B).
Collaborate with clinicians and external partners to facilitate expert evaluations, design annotation protocols, and translate clinical questions into measurable tasks.
Maintain benchmark hygiene: deduplication, de-identification awareness, leakage audits, stratified sampling, etc.
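The hygiene steps listed above are named but not specified. A minimal sketch, assuming each benchmark case is a dict with a hypothetical `prompt` field plus a stratification field such as `site`, could combine exact-match deduplication, word n-gram overlap against the training corpus as one simple leakage heuristic, and seeded stratified sampling. Near-duplicate detection (e.g. MinHash) would be a natural extension; none of this is presented as kaiko's pipeline.

```python
import hashlib
import random
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't hide duplicates."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def dedupe(cases: list[dict]) -> list[dict]:
    """Keep the first case for each normalized prompt (exact-match deduplication).

    Assumption: each case has a hypothetical "prompt" field.
    """
    seen: set[str] = set()
    unique = []
    for case in cases:
        # A fixed-length SHA-256 key keeps the seen-set compact and could be persisted across runs.
        key = hashlib.sha256(normalize(case["prompt"]).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(case)
    return unique


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Word n-grams of the normalized text (empty if the text has fewer than n words)."""
    tokens = normalize(text).split()
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def leakage_score(prompt: str, train_ngrams: set[tuple[str, ...]], n: int = 8) -> float:
    """Fraction of the prompt's n-grams that also occur in the training corpus."""
    grams = ngrams(prompt, n)
    if not grams:
        return 0.0
    return len(grams & train_ngrams) / len(grams)


def stratified_sample(cases: list[dict], key: str, per_stratum: int, seed: int = 0) -> list[dict]:
    """Draw up to `per_stratum` cases from each value of `key` (e.g. a hypothetical tumour-site field)."""
    rng = random.Random(seed)  # seeded for reproducible benchmark splits
    strata: dict[str, list[dict]] = defaultdict(list)
    for case in cases:
        strata[case[key]].append(case)
    sample = []
    for stratum in sorted(strata):
        group = strata[stratum]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample
```

In practice a case whose `leakage_score` exceeds some chosen threshold would be flagged for review rather than dropped automatically; the threshold itself is a judgment call.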

You will be based in Zurich or Amsterdam, with the expectation of spending ~50% of your time in the office.

About you

Excellent Python skills and strong software engineering fundamentals (testing, modular design, CI/CD).
Deep experience designing & operating evaluation or data-quality pipelines for ML/LLMs at scale.
Comfortable with distributed compute (Ray, Spark), data lakehouse paradigms (Delta/Iceberg), and columnar formats (Parquet/ORC).
Working knowledge of oncology workflows and terminology: staging (TNM), common biomarkers, lines of therapy, response criteria (e.g., RECIST), typical labs, and imaging follow-up.
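Operationalizing response criteria for evaluation often means turning them into a deterministic reference label. A minimal sketch of the RECIST 1.1 target-lesion thresholds follows: partial response is a decrease of at least 30% in the sum of diameters versus baseline, and progression is an increase of at least 20% and at least 5 mm over the smallest sum recorded on study, or any new lesion. It is deliberately simplified (see the docstring) and illustrative only, not a clinical implementation.

```python
from enum import Enum


class Response(Enum):
    CR = "complete response"
    PR = "partial response"
    SD = "stable disease"
    PD = "progressive disease"


def recist_target_response(
    baseline_sum_mm: float,
    nadir_sum_mm: float,
    current_sum_mm: float,
    new_lesions: bool = False,
) -> Response:
    """Classify target-lesion response from sums of diameters (simplified RECIST 1.1).

    Simplifications (assumptions of this sketch): non-target lesions and the
    lymph-node short-axis rule for CR are ignored, and CR is taken to mean the
    sum has reached 0 mm. `nadir_sum_mm` is the smallest sum so far, including baseline.
    """
    # Progression is checked first: new lesions, or >=20% AND >=5 mm above the nadir.
    increase = current_sum_mm - nadir_sum_mm
    if new_lesions or (increase >= 0.2 * nadir_sum_mm and increase >= 5.0):
        return Response.PD
    if current_sum_mm == 0:
        return Response.CR
    # Partial response is measured against baseline, not against the nadir.
    if current_sum_mm <= 0.7 * baseline_sum_mm:
        return Response.PR
    return Response.SD
```

Checking progression before response matters: a lesion sum can sit well below baseline and still qualify as progression relative to the nadir.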

Nice to have

Experience with eval frameworks (lm-eval-harness, OpenAI Evals, HF Evaluate) and preference modeling.
Background in biomed/healthtech (bioinformatics, medical imaging, clinical decision support, translational research, real-world evidence) or graduate work in a related field.
Experience with LLM safety and red-teaming; familiarity with quality/risk practices for clinical software (e.g., MDR/SaMD concepts).
Experience reading and operationalizing radiology, pathology, and molecular reports for evaluation tasks.
Hands-on experience with workflow orchestration (Dagster preferred) and monitoring/observability.
Experience working with medical foundation models and evaluating them on benchmarks in radiology, pathology, and/or genomics.
Familiarity with medical standards/ontologies: FHIR/HL7, SNOMED CT, ICD-10/ICD-O, LOINC, DICOM, VCF.
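Preference modeling is listed without a specific method. One common choice, shown here as an illustrative sketch rather than kaiko's approach, is the Bradley-Terry model, in which the probability that model i's output is preferred over model j's is p_i / (p_i + p_j). Assuming clinician preferences are recorded as hypothetical (winner, loser) pairs of model IDs, the strengths can be fit with the classic minorization-maximization (MM) update:

```python
from collections import Counter, defaultdict


def bradley_terry(
    comparisons: list[tuple[str, str]], iterations: int = 200
) -> dict[str, float]:
    """Fit Bradley-Terry strengths from (winner, loser) pairs via the MM update.

    Assumption: each comparison is a (winner, loser) pair of hypothetical model IDs.
    Returns strengths normalized to sum to 1. A model that never wins gets
    strength 0 (its maximum-likelihood estimate does not exist in that case).
    """
    if not comparisons:
        return {}
    wins: Counter[str] = Counter()
    pair_counts: dict[str, Counter[str]] = defaultdict(Counter)
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[winner][loser] += 1
        pair_counts[loser][winner] += 1
    models = sorted(pair_counts)
    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            # MM update: p_i = W_i / sum_j [ n_ij / (p_i + p_j) ]
            denom = sum(n / (strength[i] + strength[j]) for j, n in pair_counts[i].items())
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        strength = {m: s / total for m, s in updated.items()}
    return strength
```

For two models where A is preferred in three of four comparisons, the fit gives A a strength of 0.75, i.e. a predicted 75% preference rate, matching the observed win rate.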

We are excited to gather a broad range of perspectives in our team, as we believe it will help us build better products to support a broader set of people. If you're excited about us but don't fit every single qualification, we still encourage you to apply: we've had incredible team members join us who didn't check every box.

Why kaiko

At kaiko, we believe the best ideas come from collaboration, ownership and ambition. We've built a team of international experts where your work has direct impact. Here's what we value:

Ownership: You'll have the autonomy to set your own goals, make critical decisions, and see the direct impact of your work.
Collaboration: You'll approach disagreement with curiosity, build on common ground, and create solutions together.
Ambition: You'll be surrounded by people who set high standards for themselves and others, who see obstacles as opportunities, and who are relentless in their work to create better outcomes for patients.

In addition, we offer

An attractive and competitive salary, a good pension plan and 25 vacation days per year.
Great offsites and team events to strengthen the team and celebrate successes together.
A EUR 1000 learning and development budget to help you grow.
Autonomy to do your work the way that works best for you, whether you have a kid or prefer early mornings.
An annual commuting subsidy.

Our interview process

Our interview process is designed to assess mutual fit across skills, motivation, and values. It typically includes the following steps:

Screening call: A short conversation to align on your motivation, career goals, and initial fit for the role.
Technical interview: A deep dive into your problem-solving approach through a technical challenge, case study, or role-specific scenario.
Onsite meeting (optional): You'll meet team members across functions to explore collaboration dynamics, team fit, and day-to-day context.
Final executive conversation: A discussion with a member of the executive team focused on long-term alignment, cultural fit, and shared expectations for impact.

Junior ML Engineer

2 weeks ago

Zürich, Zürich, Switzerland | RiskPod | Full-time | CHF 80,000 - CHF 120,000 per year

Job Specification: Junior Machine Learning Engineer – Model Optimization & Inference
Location: Zürich, Switzerland (On-site / Hybrid)
Department: Engineering & AI Research
Employment Type: Full-time, Permanent
Start Date: January 15, 2026 (flexible)
Experience Level: Junior (0–3 years)
Role Overview
My client is seeking a Junior Machine Learning Engineer to support...

Senior AI/ML Software Engineer

4 days ago

Zürich, Zürich, Switzerland | Bjak | Full-time | CHF 120,000 - CHF 200,000 per year

Build AI Systems That Make Finance Simpler, Smarter, and More Inclusive. At BJAK, we use AI to make insurance and financial services easier to access, understand, and afford for millions of users. As a Senior AI/ML Software Engineer, you'll help build the intelligent systems that power this mission - from personalized recommendations and fraud detection to...

Senior AI/ML Software Engineer

4 days ago

Zürich, Zürich, Switzerland | Bjak | Full-time | CHF 80,000 - CHF 120,000 per year

Build AI Systems That Make Finance Simpler, Smarter, and More Inclusive. At BJAK, we use AI to make insurance and financial services easier to access, understand, and afford for millions of users. As a Senior AI/ML Software Engineer, you'll help build the intelligent systems that power this mission - from personalized recommendations and fraud detection to...

Tech Lead, AI Space (GenAI, ML, GCP)

4 days ago

Zürich, Zürich, Switzerland | PROSTAFF Schweiz GmbH | Full-time

Our client (insurance industry) is currently building a GenAI engineering team of three specialists. Already on board are one Junior/Professional and one permanently employed Professional. For the next expansion phase, we are looking for an experienced contractor (Senior GenAI Engineer) who develops hands-on and at the same time, as Tech Lead, the technical...

AI Engineer (GenAI / ML / GCP)

1 week ago

Zürich, Zürich, Switzerland | PROSTAFF Schweiz GmbH | Full-time | CHF 80,000 - CHF 120,000 per year

Our client from the insurance industry is transforming its business toward an AI-first approach. The goal is to develop digital solutions that create measurable added value for customers, employees, and the company. To strengthen the Data & AI Product Team, we are looking for an experienced GenAI & ML Engineer. Implementation of Data & AI use cases as part of...

Senior Software Engineer, AI/ML, LLM Modeling

4 days ago

Zürich, Zürich, Switzerland | Google | Full-time | CHF 120,000 - CHF 180,000 per year

Minimum qualifications:
Bachelor's degree or equivalent practical experience.
5 years of experience with software development in one or more programming languages, or 1 year of experience with an advanced degree.
3 years of experience with one or more of the following: Speech/audio (e.g., technology duplicating and responding to the human voice), reinforcement...

Senior ML/RL Training Infrastructure Engineer

4 days ago

Zürich, Zürich, Switzerland | Apple | Full-time | CHF 120,000 - CHF 180,000 per year

Ready to transform how billions of people interact with technology? Apple's Core Foundation Models team is driving the intelligence that powers experiences across billions of devices worldwide, and we're looking for exceptional talent to join us. Join our Europe-based applied ML team building the next generation of large-scale ML and RL training...

Senior Software Engineer, AI/ML, LLM Modeling

4 days ago

Zürich, Zürich, Switzerland | Google | Full-time | CHF 180,000 - CHF 250,000 per year

Minimum qualifications:
Bachelor's degree or equivalent practical experience.
5 years of experience with software development in one or more programming languages, or 1 year of experience with an advanced degree.
3 years of experience with one or more of the following: Speech/audio (e.g., technology duplicating and responding to the human voice), reinforcement...

Senior ML/RL Training Infrastructure Engineer

2 days ago

Zürich, Zürich, Switzerland | Apple Inc | Full-time | $1,200,000 - $2,000,000 per year

Summary
Posted: Dec 03
Role Number:
Ready to transform how billions of people interact with technology? Apple's Core Foundation Models team is driving the intelligence that powers experiences across billions of devices worldwide, and we're looking for exceptional talent to join us. Join our Europe-based applied ML team building the next generation of...

Founding AI/ML Research Engineer

4 days ago

Zürich, Zürich, Switzerland | Bjak | Full-time | $80,000 - $160,000 per year

Transform language models into real-world, high-impact product experiences. A1 is a self-funded AI division backed by BJAK, operating in full stealth. We're building a new global consumer AI application focused on an important but underexplored use case: something practical, meaningful, and far beyond the typical chatbot or productivity agent. As a...
