Senior System Software Engineer, NCCL
4 days ago
NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars.
Come work for the team that brought you NCCL, NVSHMEM & GPUDirect. Our GPU communication libraries are crucial for scaling Deep Learning and HPC applications. We are looking for a motivated Partner Enablement Engineer to guide our key partners and customers with NCCL. Most DL/HPC applications run on large clusters with high-speed networking (Infiniband, RoCE, Ethernet). This is an outstanding opportunity to get an end-to-end understanding of the AI networking stack. Are you ready to contribute to the development of innovative technologies and help realize NVIDIA's vision?
What you will be doing:
Engage with our partners and customers to root cause functional and performance issues reported with NCCL
Conduct performance characterization and analysis of NCCL and DL applications on groundbreaking GPU clusters
Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)
Guide our customers and support teams on HPC knowledge and standard methodologies for running applications on multi-node clusters
Document and conduct trainings/webinars for NCCL
Engage with internal teams in different time zones on networking, GPUs, storage, infrastructure and support.
What we need to see:
B.S./M.S. degree in CS/CE or equivalent experience with 5+ years of relevant experience. Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design
Experience working with engineering or academic research community supporting HPC or AI
Practical experience with high performance networking: Infiniband/RoCE/Ethernet networks, RDMA, topologies, congestion control
Expert in Linux fundamentals and a scripting language, preferably Python
Familiar with containers, cloud provisioning and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible)
Adaptability and passion to learn new areas and tools
Flexibility to work and communicate effectively across different teams and timezones
Ways to stand out from the crowd:
Experience conducting performance benchmarking and developing infrastructure on HPC clusters. Prior system administration experience, especially for large clusters. Experience debugging network configuration issues in large-scale deployments
Familiarity with CUDA programming and/or GPUs. Good understanding of Machine Learning concepts and experience with Deep Learning frameworks such as PyTorch and TensorFlow
Deep understanding of technology and a passion for what you do
NVIDIA is at the forefront of breakthroughs in Artificial Intelligence, High-Performance Computing, and Visualization. Our teams are composed of driven, innovative professionals dedicated to pushing the boundaries of technology. We offer highly competitive salaries, an extensive benefits package, and a work environment that promotes diversity, inclusion, and flexibility. As an equal opportunity employer, we are committed to fostering a supportive and empowering workplace for all.