Research Internship: Diagnostic
Posted 2 weeks ago
Full-time | Voice & Conversational AI | Global Enterprise AI Platform
Duration: 4-8 Months
Location: Switzerland (Europe), On-Site at AGIGO's Zurich Office
About AGIGO
AGIGO is the first enterprise-grade conversational AI platform that empowers enterprises to transform customer engagement and business performance with high-agency AI agents - agents that match well-trained human customer agents in naturalness, responsiveness, and autonomous task resolution. Built for on-premises or hybrid deployment, with no reliance on third-party services, our proprietary platform gives enterprises full control, observability, and data sovereignty. Its unified core, tunable base models, and end-to-end design toolchain deliver context-aware, adaptable agents that engage directly with customers in real time. Founded in February 2025 in Switzerland by a team of 18 experienced AI pioneers, AGIGO is driven by a bold vision to lead the next major wave in AI by transforming how businesses interact with their customers.
Your Research Mission
The objective of this internship is to design and build a next-generation diagnostic and perceptual evaluation framework for generative speech models - a system that not only tells us if a model is better, but why. You will combine robust objective metrics with novel techniques for automated failure diagnosis and perceptual correlation. The resulting framework will become a core internal tool, guiding model selection and optimization across AGIGO's voice-synthesis development and deployment cycle.
Phase 1: Foundational Objective Metrics
In the initial phase of your project, you will implement a state-of-the-art suite of automated metrics that provide a comprehensive, objective view of model performance and robustness, going far beyond conventional Word Error Rate (WER):
Aggregated WER
WER aggregated over an ensemble of diverse ASR models (autoregressive and non-autoregressive (AR/NAR) models of different architectures) to measure how robust intelligibility is across recognizers.
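As a rough illustration of the aggregation idea, the sketch below computes word-level WER against each ASR transcript and summarizes the scores. The function names, the model labels, and the choice of mean and maximum as aggregates are assumptions made for this example, not AGIGO's actual design.

```python
from statistics import mean
from typing import Dict


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j]; it starts as the empty-prefix row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def aggregated_wer(reference: str, hypotheses: Dict[str, str]) -> dict:
    """Summarize WER over transcripts from several (hypothetical) ASR models."""
    scores = {name: wer(reference, hyp) for name, hyp in hypotheses.items()}
    return {
        "per_model": scores,
        "mean": mean(scores.values()),
        "max": max(scores.values()),  # worst case across the ensemble
    }


if __name__ == "__main__":
    # "ar_model" and "nar_model" are placeholder labels, not real systems.
    print(aggregated_wer("a b c d", {"ar_model": "a b c d", "nar_model": "a x c d"}))
```

Reporting the worst case alongside the mean is one way to surface audio that only some recognizers can transcribe.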
Semantic Error Rate (SER)
You will implement a metric that goes beyond simple word matching. By comparing the semantic embeddings (e.g., from a T5 or BERT model) of the ground truth text and the ASR-transcribed text, this metric can tolerate minor transcription differences ("the car" vs. "a car") while heavily penalizing meaning-altering errors, e.g., hallucinations or repeated n-grams.
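One possible formulation, shown here only as a sketch, scores a transcript as one minus the cosine similarity between the embeddings of the ground-truth text and the ASR output. The definition, the function names, and the `toy_embed` encoder are all assumptions for illustration. `toy_embed` is a bag-of-words stand-in that keeps the example self-contained; it is not semantic, and a real sentence encoder (such as the T5 or BERT models mentioned above) would replace it.

```python
import math
from collections import Counter
from typing import Callable, Dict

Embedding = Dict[str, float]


def toy_embed(text: str) -> Embedding:
    """Placeholder encoder (bag-of-words counts). NOT semantic: swap in a
    real sentence encoder to obtain meaning-aware scores."""
    return dict(Counter(text.lower().split()))


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an empty text as maximally dissimilar
    return dot / (norm_a * norm_b)


def semantic_error_rate(
    reference: str,
    hypothesis: str,
    embed: Callable[[str], Embedding] = toy_embed,
) -> float:
    """Assumed definition: 1 - cosine similarity of the two embeddings."""
    return 1.0 - cosine(embed(reference), embed(hypothesis))


if __name__ == "__main__":
    print(semantic_error_rate("the car", "a car"))
    print(semantic_error_rate("the car", "a dog"))
```

Passing the encoder as a parameter keeps the metric independent of any particular embedding model.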
Signal & Perceptual Proxy Suite
- Integrate standardized metrics such as STOI, PESQ, and SI-SDR from TorchAudio-Squim to assess signal-level fidelity.
- Integrate non-intrusive perceptual objective metrics based on neural networks, such as DNSMOS.
- Implement speaker similarity metrics using pre-trained speaker-verification models to quantify performance in voice cloning tasks (optional, if time allows).
Phase 2: Automated Failure Diagnostics & Adversarial Testing
This is where the project becomes truly innovative. The goal is to automatically find and categorize the subtle failures that plague even the best TTS models. You will develop a classifier to detect common TTS failure modes on generated audio:
- Hallucination Detector: Identifies repeated phrases, word omissions, and truncated sentences.
- Prosody Mismatch Detector: A model trained to detect when the intonation of a sentence does not match its punctuation, e.g., a question spoken as a statement.
- Artifact Detector: A model that specifically listens for common synthesis artifacts like metallic ringing or hissing.
- Automated Challenge Set Generation: A system to automatically find or generate difficult text samples (e.g., tongue twisters, complex numerical expressions) that are likely to cause a given model to fail, creating a constantly evolving stress test. We could potentially use an LLM to pursue this line of research. (Optional, if time allows.)
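To make the hallucination-detection idea concrete, here is a minimal text-level sketch that flags immediately repeated n-grams in an ASR transcript and lists input words missing from it. It is a simple heuristic baseline under assumed function names, not the classifier the project envisions. It also assumes punctuation has already been stripped, since tokens are split on whitespace only.

```python
from collections import Counter
from typing import List, Tuple


def repeated_ngrams(transcript: str, n: int = 2) -> List[Tuple[int, str]]:
    """Return (word index, n-gram) for every n-gram that is immediately
    followed by an identical copy of itself."""
    words = transcript.lower().split()
    hits = []
    for i in range(len(words) - 2 * n + 1):
        if words[i:i + n] == words[i + n:i + 2 * n]:
            hits.append((i, " ".join(words[i:i + n])))
    return hits


def omitted_words(text: str, transcript: str) -> List[str]:
    """Words of the input text that are missing from the transcript,
    compared as multisets (word order is ignored)."""
    missing = Counter(text.lower().split()) - Counter(transcript.lower().split())
    return sorted(missing.elements())


if __name__ == "__main__":
    print(repeated_ngrams("please hold the line the line now"))
    print(omitted_words("book a table for two", "book table for two"))
```

Because both checks run on transcripts, they inherit ASR errors; a learned detector operating on the audio itself would be needed to separate synthesis failures from recognition failures.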
Key Research Challenges
- Predictive Evaluation: Can you analyze a model's internal states or confidence scores before synthesis to predict whether it is likely to fail on a given piece of text? This could be used to build a fallback or self-correction mechanism directly into the TTS engine.
- Multi-Lingual Generalization: How do these advanced metrics and diagnostic tools generalize across different languages and phonetic systems? You will lay the groundwork for a truly universal evaluation suite.
Your Impact
Your project will develop a fast, reliable, and diagnostic evaluation pipeline that will accelerate the selection of the best TTS systems among candidates for deployment in real use cases. By moving away from slow and subjective listening tests, your work will enable us to iterate on models faster, catch regressions, and scientifically measure progress towards truly human-like speech synthesis.
We value original thinking and encourage you to help shape and redefine the project's direction as your research uncovers new insights. AGIGO fosters an open, collaborative environment where ideas can evolve freely. Exceptional innovation often emerges where disciplines and perspectives intersect, and we actively support creative exploration that pushes the boundaries of what Voice-AI can achieve.
What You Bring
Required
- Master's student (preferred) or PhD student in Computer Science, Machine Learning, or a related field
- Strong Python programming skills and proficiency with Git
- Solid understanding of ML fundamentals and MLOps
- Hands-on experience with PyTorch
- Fluent in English, highly motivated, and willing to learn
Plus Points
- Experience with Hugging Face models (for LLMs, ASR, or "speech-LLMs")
- Familiarity with audio benchmarks
- Knowledge of speech, ASR, and/or TTS concepts
- Hands-on experience with large-scale data processing pipelines
- Hands-on experience with audio AI (ASR/TTS) model training and development
What You Will Gain
- Direct company impact: your project will strengthen the agility and effectiveness of AGIGO's industry-leading Voice-AI research
- Mentorship: work closely with our expert team of researchers and engineers
- Top-tier AI infrastructure: access to GPU clusters with NVIDIA Hopper (H200) and Blackwell RTX GPUs
- Research visibility: we will actively support you in publishing your work at a top-tier conference or in a journal paper
- Disciplined and inspiring research environment: a team of sharp minds grounded in expertise, autonomy, and a shared pursuit of impactful breakthroughs
- Paid internship: market-level salary, flexible hours, and free coffee, drinks, fruits and snacks
- Career path: this internship may lead to a full-time permanent role in AGIGO's world-class AI R&D team
How to Apply
To apply, please send your resume and a brief introduction to with the subject line:
Research Internship – Evaluation Framework for Generative Speech – [Your Full Name]
For more information:
By submitting your application, you agree to allow AGIGO to store and process your data for recruitment purposes. Unless otherwise requested, we may retain your data for up to one year to consider you for this or other future opportunities.
AGIGO is a registered trademark of AGIGO AG, Switzerland.