Site Reliability Engineer
7 days ago
The SRE team at DFINITY is charged with creating tools, processes, and frameworks that keep the Internet Computer, a distributed and scalable platform, stable. As a member of the team, you will work with engineering, infrastructure, and security teams to build reliability and operability into the product from the start by participating in design and code reviews, identifying risks and problems, and proposing mitigations. This is not a team that exists to be on-call; this is a team that elects to be on-call because being on-call helps it do the job better.
Responsibilities
Service Management: Design, build, deploy, and maintain services that ensure the high availability and reliability of DFINITY's products and the Internet Computer Protocol (ICP).
Automation: Identify opportunities to automate processes and implement them in code, improving efficiency and reducing manual intervention.
Reliability and Operability: Build reliability and operability into the product from the start by participating in design and code reviews, identifying risks, and proposing mitigations.
Collaboration: Work with engineering and security teams to establish processes that serve the goals of the Internet Computer while remaining operationally feasible and automatable.
Service Level Objectives (SLOs): Work with product owners to define SLOs and implement them in code and in the observability infrastructure.
On-Call Duty: Participate in on-call duty for production services on a 12/7 schedule split across two sites. On-call duty is approximately 1 week every 6 weeks. Coordinate incident response through to resolution, involving engineers from other teams as necessary. On-call work is compensated with both pay and time off.
On-Call Philosophy: Our team chooses to be on-call because it improves our ability to identify and address system alerts, which ultimately improves performance.
Unix Systems: Operate, troubleshoot, and deploy software on Unix systems.
Requirements
Observability: Proven experience monitoring and maintaining large production systems with tools such as Prometheus, VictoriaMetrics, Elasticsearch, and Grafana.
Kubernetes: Proficiency in managing multiple observability stacks across availability zones, using Kubernetes for deployment orchestration.
Rust Coding: Extensive experience designing and developing moderate-sized applications (up to ~10K lines of code) in Rust. Skilled in setting up automated testing and CI/CD environments. Able to identify and implement opportunities for automation and process improvement. Experience developing reliability engineering tools for large open-source projects is highly desirable.
Systemic Thinking: Able to approach problems methodically and systemically, especially during troubleshooting.
Pragmatism: Able to balance immediate needs against long-term goals, and to recognize when a solution is "good enough for the next 12 months."
Incident Response: Expertise in coordinating incident response across multiple teams, with excellent communication skills to establish a clear understanding of the situation, the next steps, and each team's responsibilities.
Reliability Engineering: Experience in Site Reliability Engineering (SRE) within a crypto environment where decisions are governed by DAOs is preferred.
Security Background: Experience building security-sensitive tools and managing security risks in such environments. A background in DevSecOps is highly desirable.
Community Interaction: Proven experience engaging with community members of large open-source projects. Ideally, the candidate is already active in the ICP community.
Within 1 month, you will:
Gain a thorough understanding of DFINITY's infrastructure and production environment.
Start working on a suitable starter project.
Submit improvements to our documentation and processes based on your onboarding experience.
Within 3 months, you will:
Successfully deliver your starter project.
Shadow team members on call, preparing to join the on-call rotation from month 4 onwards.
Proactively identify and propose improvements, and initiate projects to implement them.
About DFINITY and the Internet Computer
The Internet Computer is the fastest and only infinitely scalable general-purpose blockchain, incubated and launched by the DFINITY Foundation in May 2021. A team of over 200 world-renowned cryptographers, distributed systems engineers, and programming language experts has taken on the massive technological challenge of building, maintaining, and continuously improving a 'world computer' powerful enough to host Web3 dApps, DeFi, games, NFTs, social media, and metaverse projects.
DFINITY was founded in 2016 by entrepreneur and crypto theoretician Dominic Williams, and attracted interest and financial contributions from early members of the Ethereum community. Later, top-tier institutions such as Andreessen Horowitz, Polychain Capital, and SV Angel backed the Internet Computer in a collective effort to help build out Web3.
All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.
-
Site Reliability Engineer
5 days ago
Zürich, Zürich, Switzerland | Source Technology | Full-time
For our customer, a major central bank in Basel, we are seeking a Site Reliability Engineer for a long-term contract.
Working conditions:
50% office-based, 50% work from home
Position requires relocation
8-hour days
218 days per year
114 CHF per hour
Responsibilities:
Around 7 years of experience in Financial Services (Banking, Insurance Consulting) with at least 3 years of...
-
Site Reliability Engineer
2 days ago
Zürich, Zürich, Switzerland | Travis Edwards | Full-time
Site Reliability Engineer (Blockchain) – Switzerland (Hybrid)
Company Overview:
Join a rapidly growing blockchain platform at the forefront of innovation. With a global team of approximately 150 experts, we are...
-
Site Reliability Engineer
2 days ago
Zürich, Zürich, Switzerland | Nexxiot | Full-time
Nexxiot is a TradeTech leader with hardware-enabled data solutions and a Vision to Reduce Uncertainty in Cargo. In 2023, Nexxiot operated the most significant global digital fleet, around 300,000 rail cars and 800,000 intermodal containers, and it follows an ambitious growth plan to quadruple the number of digitized assets by 2027. Nexxiot empowers carriers,...
-
Site Reliability Engineer Linux
4 weeks ago
Zürich, Zürich, Switzerland | Inventx AG | Full-time
Site Reliability Engineer Linux 80-100%
Your choice: work at our locations in Chur, The Circle/Zürich, St. Gallen, or Bern, or from home, with attractive and flexible full-time and part-time models available.
"After ten years in the industry, I have seen technologies at Inventx that I could never have dreamed of. Every day a...
-
Reliability Engineer
2 days ago
Zürich, Zürich, Switzerland | AYES - Management & Technology Consulting | Full-time
Position: Reliability Engineer – Aeronautics Sector
Contract: Permanent, full-time position
Salary: Competitive, based on experience (to be discussed during the selection process)
AYES is a Switzerland-based engineering and technology consulting company headquartered in Zürich. Our core business is in the industrial world, with a primary focus on...
-
Site Reliability Engineering Specialists
1 week ago
Zürich, Zürich, Switzerland | Digital Architects Zurich | Full-time
Qualifications and experience
To succeed with us, you need:
A degree in computer science at Bachelor's or Master's level
Knowledge of systems engineering (cloud, Linux, Windows, Kubernetes, etc.) and/or software engineering (Java or another OOP language)
Experience with DevOps, Site Reliability Engineering, or a similar environment
Good German and...
-
Platform Reliability Engineer
6 days ago
Zürich, Zürich, Switzerland | kaiko | Full-time
About kaiko
We're a medical startup working towards a future where doctors can make informed decisions about cancer treatment quickly and accurately. Our team is passionate about AI and machine learning, and we're committed to using these technologies to improve patient outcomes. We believe that collaboration, ownership, and ambition are essential for success,...
-
Zürich, Zürich, Switzerland | Digital Architects Zurich | Full-time
We are looking for experts who are passionate about developing innovative technology solutions. As part of our team, you will be involved in implementing cloud-native and AI-driven continuous delivery solutions. We work closely with our customers to deliver transformative projects together...
-
Reliability Engineering Leader
5 days ago
Zürich, Zürich, Switzerland | Source Technology | Full-time
We're looking for a reliability engineering leader to guide our team at Source Technology. The ideal candidate will have around 7 years of experience in Financial Services (Banking, Insurance Consulting) and at least 3 years of experience as a Site Reliability Engineer or in a similar role.
This role requires strong leadership skills to manage the response...
-
Reliability Engineering Specialist
2 days ago
Zürich, Zürich, Switzerland | AYES - Management & Technology Consulting | Full-time
About the Role
We are seeking a highly skilled Reliability Engineer to join our team in the Aeronautics Sector. This is a full-time position that involves developing and improving reliability processes within the organization. You will provide a global overview of reliability methodologies and strategies, identify and implement design improvements related to...
-
Site Reliability Engineer
2 days ago
Zürich, Zürich, Switzerland | Crypto Finance AG | Full-time
Join us at Crypto Finance Group, where everyone can make a significant impact. We invite you to be part of our journey and help shape the future of our company.
About Crypto Finance
Crypto Finance Group, part of Deutsche Börse Group, provides professional digital asset solutions to institutional clients. The Group comprises Crypto Finance AG, regulated by...
-
CI/CD Engineer
4 weeks ago
Zürich, Zürich, Switzerland | Digital Architects Zurich | Full-time
Wanted: experts in cloud-native and AI-driven software delivery and operations. Exciting projects await with large and small customers in CI/CD, Site Reliability Engineering (SRE), Observability, and AIOps.
Who we are
We specialize in delivering transformation projects for cloud-native and AI-driven Continuous Delivery...
-
Observability & AIOps Engineer/Consultant
4 weeks ago
Zürich, Zürich, Switzerland | Digital Architects Zurich | Full-time
Wanted: experts in cloud-native and AI-driven software delivery and operations. Exciting projects await with large and small customers in Observability & AIOps, Site Reliability Engineering (SRE), and CI/CD.
Who we are
We specialize in delivering transformation projects for cloud-native and AI-driven Continuous Delivery...
-
DevOps Engineer
3 weeks ago
Zürich, Zürich, Switzerland | Digital Architects Zurich | Full-time
Who we are
We specialize in delivering transformation projects for cloud-native and AI-driven Continuous Delivery (CI/CD) as well as Observability & AIOps. As part of this work, we establish Site Reliability Engineering (SRE) as a new DevOps discipline and apply it in projects. In all these areas, we work with companies of different sizes and...
-
DevOps Engineer
4 days ago
Zürich, Zürich, Switzerland | Digital Architects Zurich | Full-time
Who we are
We specialize in delivering transformation projects for cloud-native and AI-driven Continuous Delivery (CI/CD) as well as Observability & AIOps. As part of this work, we establish Site Reliability Engineering (SRE) as a new DevOps discipline and apply it in projects. In all these areas, we work with companies of different sizes and...
-
IT Infrastructure Engineer
2 days ago
Zürich, Zürich, Switzerland | K&K social resources and development GmbH | Full-time
Job Overview
We are looking for an IT Infrastructure Engineer to join our team. As an IT Infrastructure Engineer, you will be responsible for designing, implementing, and maintaining IT infrastructure systems.
Your key responsibilities will include:
Designing and implementing IT infrastructure systems, including UNIX/Linux, Oracle PL/SQL, shell scripting,...
-
DevOps and SRE Expert for Blockchain
2 days ago
Zürich, Zürich, Switzerland | Travis Edwards | Full-time
Job Description
As a Site Reliability Engineer, you'll be responsible for ensuring the reliability, availability, and performance of our blockchain platform. This involves designing, building, and maintaining DevOps infrastructure using best practices, automating deployment, monitoring, and incident response to improve operational efficiency, and...
-
Deep Learning Operations Engineer
6 days ago
Zürich, Zürich, Switzerland | Noimos | Full-time
Job Description
The Machine Learning MLOps Engineer will be responsible for designing, implementing, and maintaining MLOps infrastructure on GCP (Terraform, Pub/Sub, Vertex AI, CI/CD, GitHub Actions) to ensure scalability, automation, and reliability.
Design and implement scalable MLOps infrastructure using Terraform and other cloud-based tools.
Implement...
-
Requirement Engineer Position
7 days ago
Zürich, Zürich, Switzerland | talendo AG | Full-time
Tech Requirement Engineer Job Description
In this role, you'll be responsible for maintaining, troubleshooting, and enhancing application workflows while ensuring smooth operations, stability, and reliability.
As a Tech Requirement Engineer, you'll collaborate with business users to understand their pain points and translate them into technical requirements...