Reliability Engineer for Internet Computer

6 days ago


Zürich, Zürich, Switzerland DFINITY Foundation Full-time
About the Role

We are seeking a highly skilled Site Reliability Engineer to join our team at the DFINITY Foundation. As a key member of our engineering team, you will play a critical role in ensuring the stability and reliability of our Internet Computer platform.

Key Responsibilities
  • Service Management: Design, build, deploy, and maintain services to ensure high availability and reliability of our products and the Internet Computer Protocol (ICP).
  • Automation: Identify and implement opportunities to automate processes through coding, enhancing efficiency and reducing manual intervention.
  • Reliability and Operability: Integrate reliability and operability into the product from the start by participating in design and code reviews, identifying risks, and proposing mitigations.
  • Collaboration: Work with engineering and security teams to establish processes that align with the goals of the Internet Computer while remaining operationally feasible and automatable.
  • Service Level Objectives (SLOs): Collaborate with product owners to define SLOs and implement them in code and observability infrastructure.
  • On-Call Duty: Participate in on-call duties for production services on a 12/7 schedule, split across two sites. On-call duty is approximately 1 week every 6 weeks. Coordinate incident response and ensure resolution, involving engineers from other teams as necessary.
  • Unix Systems: Operate, troubleshoot, and deploy software on Unix systems.
Requirements
  • Observability: Proven experience in monitoring and maintaining large production systems using tools such as Prometheus, VictoriaMetrics, Elasticsearch, and Grafana.
  • Kubernetes: Proficiency in managing multiple observability stacks across various availability zones, leveraging Kubernetes for deployment orchestration.
  • Rust Coding: Extensive experience in designing and developing moderate-sized applications (up to ~10K lines of code) in Rust. Skilled in setting up automated testing and CI/CD environments. Ability to identify and implement opportunities for automation and process improvement. Experience in developing reliability engineering tools for large open-source projects is highly desirable.
  • Systemic Thinking: Capable of approaching problems methodically and systemically, especially during troubleshooting.
  • Pragmatism: Ability to balance immediate needs with long-term goals, understanding when a solution is 'good enough for the next 12 months.'
  • Incident Response: Expertise in coordinating incident response across multiple teams, with excellent communication skills to ensure everyone clearly understands the situation, next steps, and team responsibilities.
  • Reliability Engineering: Experience in Site Reliability Engineering (SRE) within a crypto environment where decisions are governed by DAOs is preferred.
  • Security Background: Experience in building security-sensitive tools and managing security risks in such environments. A background in DevSecOps is highly desirable.
  • Community Interaction: Proven experience in engaging with community members of large open-source projects. Ideally, the candidate is already active within the ICP community.
What You'll Achieve in the First 3 Months
  • Gain a thorough understanding of DFINITY's infrastructure and production environment.
  • Start working on a suitable starter project.
  • Submit improvements to our documentation and processes based on your onboarding experience.

  • Senior GPU Architect

    23 hours ago


    Zürich, Zürich, Switzerland Internet computer Full-time

    About the Role: We are seeking a highly skilled Senior GPU Engineer to join our team at Internet Computer. As a key member of our engineering team, you will play a crucial role in bringing AI hardware acceleration to smart contracts. Key Responsibilities: Design and develop deterministic GPU workloads to accelerate AI computations on the Internet...


  • Zürich, Zürich, Switzerland dfinity Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at DFINITY. As a key member of our engineering team, you will play a critical role in ensuring the stability and reliability of our Internet Computer platform. Key Responsibilities: Service Management: Design, build, deploy, and maintain services to ensure high availability...


  • Zürich, Zürich, Switzerland DFINITY Foundation Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at the DFINITY Foundation. As a key member of our engineering team, you will play a critical role in ensuring the stability and reliability of our Internet Computer platform. Key Responsibilities: Service Management: Design, build, deploy, and maintain services to ensure...


  • Zürich, Zürich, Switzerland DFINITY Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at DFINITY. As a key member of our infrastructure team, you will be responsible for designing, building, and maintaining the high availability and reliability of our products and services. Key Responsibilities: Service Management: Develop and implement service management...


  • Zürich, Zürich, Switzerland Open Systems AG Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Open Systems AG. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our software systems. Key Responsibilities: Develop and maintain automation frameworks to reduce manual intervention and improve system...

  • Reliability Engineer

    6 days ago


    Zürich, Zürich, Switzerland ETH get hired Full-time

    About ETH get hired: ETH get hired is a leading video game development company that specializes in creating immersive and engaging simulation and management games. Our team is passionate about pushing the boundaries of innovation and excellence in the gaming industry. About the Role: We are seeking a highly skilled and experienced Site Reliability Engineer to...

  • Computer Vision Engineer

    4 hours ago


    Zürich, Zürich, Switzerland Leica Geosystems AG Full-time

    About the Role: We are seeking a highly skilled Computer Vision Engineer to join our team at Hexagon Technology Center GmbH. As a key member of our robotics perception team, you will be responsible for evaluating, testing, implementing, and deploying state-of-the-art computer vision algorithms and adapting them to our specific needs. Key...


  • Zürich, Zürich, Switzerland Ergon Informatik AG Full-time

    Overview: As a Site Reliability Engineer (SRE) at Ergon Informatik AG, you are responsible for building and operating the Airlock SaaS platform. You work in interdisciplinary teams and are adept at agile methods such as Scrum. Your experience delivering demanding projects and your passion for SRE and DevOps culture, methods...

  • Software Engineer

    5 months ago


    Zürich, Zürich, Switzerland Anapaya Systems Full-time

    "Anapaya Systems is looking to strengthen its team with a Software Engineer. Play a mission-critical role and assume ownership in the entire lifecycle of our products: design, implementation, quality assurance, deployment, and operation. As part of a growing world-class engineering team you will have the opportunity to develop your skills through constant...


  • Zürich, Zürich, Switzerland Ergon Informatik AG Full-time

    Role Description: As a Site Reliability Engineer (SRE) at Ergon Informatik AG, you are responsible for building and operating the Airlock SaaS platform. You work in interdisciplinary teams and are adept at agile methods such as Scrum. Your experience delivering demanding projects and your passion for SRE and...


  • Zürich, Zürich, Switzerland dfinity Full-time

    About the Role: We are seeking an experienced Front End Software Engineer to join our team at DFINITY, a leading decentralized public computing infrastructure company. As a Front End Engineer, you will play a crucial role in building a suite of web portals, mobile apps, and other communications to make the Internet Computer accessible to a wider audience. Key...


  • Zürich, Zürich, Switzerland Ergon Informatik AG Full-time

    Overview: As an experienced Site Reliability Engineer (SRE) at Ergon Informatik AG, you are responsible for building and operating the Airlock SaaS platform. You work in interdisciplinary teams and are adept at agile methods such as Scrum. Your experience delivering demanding projects and your passion for SRE and DevOps culture,...


  • Zürich, Zürich, Switzerland Ergon Informatik Full-time

    Position Description: We are looking for an experienced Site Reliability Engineer who specializes in the development and operation of cloud-based platforms. The candidate will be part of an interdisciplinary team that focuses on developing security solutions and optimizing operational processes...


  • Zürich, Zürich, Switzerland Lightly AG Full-time

    About Lightly AG: We are a cutting-edge technology company specializing in software solutions for deep learning model optimization. Our mission is to help businesses overcome the challenges of large-scale data processing and improve their AI capabilities. Job Description: We are seeking an experienced Machine Learning Engineer to join our team as a Computer...


  • Zürich, Zürich, Switzerland Lightly AG Full-time

    About Lightly AG: We are a cutting-edge technology company specializing in software solutions for deep learning model optimization. Our mission is to help businesses overcome the challenges of large-scale data processing and improve their AI-driven applications. Job Summary: We are seeking an experienced Machine Learning Engineer to join our team as a Computer...


  • Zürich, Zürich, Switzerland Nexxiot Full-time

    About Nexxiot: Nexxiot is a leading TradeTech company that specializes in hardware-enabled data solutions, aiming to reduce uncertainty in cargo transportation. With a vision to digitize the global supply chain, Nexxiot operates the largest digital fleet of rail cars and intermodal containers, with a growth plan to quadruple its digitized assets by 2027. The...


  • Zürich, Zürich, Switzerland Tamedia Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Tamedia. As a key member of our Publishing Technology Solutions group, you will play a crucial role in shaping the digital future of our company. Key Responsibilities: Design, build, and maintain core infrastructure pieces to support our publishing solutions, ensuring...


  • Zürich, Zürich, Switzerland ART COMPUTER Full-time

    About ART Computer: We are a leading Apple Premium Reseller in French-speaking Switzerland, serving B2B and education customers since 1992. Our team of experts provides top-notch solutions and services to help our clients thrive in the digital landscape. Job Summary: We are seeking a highly skilled Junior Network and Systems Engineer to join our IT team. As a...


  • Zürich, Zürich, Switzerland Nexxiot Full-time

    About Nexxiot: Nexxiot is a leading TradeTech company that specializes in hardware-enabled data solutions. Our mission is to reduce uncertainty in cargo transportation by providing real-time monitoring and analytics of assets and cargo. We operate the largest digital global fleet of rail cars and intermodal containers, with a vision to quadruple our digitized...


  • Zürich, Zürich, Switzerland Rheinmetall Air Defence AG Full-time

    Rheinmetall Air Defence AG is seeking a highly motivated and skilled AI Engineer to contribute to the development of cutting-edge air defence solutions. As an integral member of our team, you will play a crucial role in advancing our state-of-the-art deep learning capabilities for target detection and classification. Your Responsibilities: Extend and optimize...