Senior Staff Machine Learning Engineer - DevOps/Site Reliability Engineer

Company: ServiceNow
Location: Santa Clara
Posted on: June 2, 2025

Job Description:

Company Description:

It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today: ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500. Our intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work. Join us as we pursue our purpose to make the world work better for everyone.
Job Description:

This position requires passing a ServiceNow background screening, USFedPASS (US Federal Personnel Authorization Screening Standards). This includes a credit check, a criminal/misdemeanor check, and a drug test. Employment is contingent upon passing the screening. Due to Federal requirements, only US citizens, US naturalized citizens, or US Permanent Residents holding a green card will be considered.

PLATO (Platform Engineering and AI Technology Organization) at ServiceNow is a customer-focused innovative group building intelligent software using a variety of technology stacks to enable end-to-end, industry-leading work experiences for our customers. We are deeply invested in our customers' success, with expertise in advanced technologies and software engineering best practices. We prioritize robustness, performance, and user experience over specific technologies.

We are a team of technology professionals and platform engineers with a dual mission: to build and evolve the AI platform, and to partner with teams to create products and end-to-end AI-powered work experiences. We also focus on foundational research, experimentation, and de-risking AI technologies for future innovations.

As a Senior Staff Machine Learning Engineer - Site Reliability Engineer, you will:

- Design, develop, and implement infrastructure, platform, deployment, and observability features that support AI workloads.
- Collaborate with researchers, AI engineers, and infrastructure teams to optimize GPU cluster performance, scalability, and reliability.
- Enhance the SRE practice by translating operational use cases into software tooling requirements.
- Support deployment activities for AI/ML developers.
- Write high-quality, scalable, and reusable code, adhering to best practices like code reviews and unit testing.
- Work closely with product owners to understand requirements and oversee your code from design through delivery.
- Operate Large Language Models (LLMs) on NVIDIA GPUs.
- Mentor colleagues and promote knowledge sharing.

Qualifications:

To succeed in this role, you should have:
- Experience integrating AI into work processes, decision-making, or problem-solving, including using AI tools, automating workflows, analyzing AI insights, or exploring AI's industry impact.
- 8+ years in infrastructure, platform operations, deployments, SRE, and DevOps, focusing on platform health.
- 6+ years managing highly available distributed workloads on Kubernetes following DevOps principles.
- 6+ years developing with Python, GoLang, Java, or similar languages.
- Experience with DevOps tools like Helm, Ansible, Kubernetes, Prometheus, Splunk, GitLab CI.
- Strong experience operating distributed Linux-based systems and J2EE applications.
- Knowledge of software-defined networking, infrastructure as code, and configuration management.
- Experience developing compliant and secure software for regulated environments.
- Ability to lead projects with significant technical risks to achieve outcomes.

We offer a competitive base salary, equity (when applicable), incentives, and comprehensive benefits, including health plans, 401(k), ESPP, matching donations, flexible time off, and family leave programs. Compensation varies based on geographic location and other factors.

Additional Information:

Work Personas:

We support flexible, remote, and in-office work arrangements depending on job requirements.

Equal Opportunity Employer:

ServiceNow is committed to diversity and inclusion. We consider all qualified applicants without regard to race, color, creed, religion, sex, sexual orientation, national origin, age, disability, gender identity, marital status, veteran status, or any other protected category. We also consider applicants with arrest or conviction records in accordance with legal requirements.

Accommodations:

If you require accommodations during the application process, please contact .

Export Control Regulations:

Employment may be contingent upon obtaining export licenses or approvals if required by law, especially for roles with access to controlled technology.

From Fortune. 2024 Fortune Media IP Limited. All rights reserved. Used under license.

Other Engineering Jobs

Applications Engineer, Photos - Apple Vision Pro
Description: Sunnyvale, California, United States. Software and Services. You will work on Photos, a productivity app that ships with every Apple Vision Pro (more...)
Company: Apple Inc.
Location: Sunnyvale
Posted on: 05/25/2025

AI Research Engineer - Post Training
Description: Perplexity is seeking top-level AI Research Engineers to continue to improve our in-house Online LLMs, the Sonar models. Your job is to take advantage of our rich query/answer dataset to continue to scale (more...)
Company: Up Closets of North Cincinnati
Location: San Francisco
Posted on: 05/24/2025

IT Engineer
Description: Polychain Capital, a fast-moving cryptocurrency hedge fund, is seeking a highly skilled and proactive IT Engineer to join our small but growing team. This hybrid role is based in San Francisco and blends (more...)
Company: Polychain Capital
Location: San Francisco
Posted on: 05/24/2025

Azure DevOps Engineer
Description: Azure DevOps Engineer, Sunnyvale, CA. Title: Azure DevOps Engineer. Location: Sunnyvale, CA. Duration: 6 to 12 months. Rate: DOE. Job duties: 8-10 years of experience supporting infrastructure or as SRE in (more...)
Company: Redolent Infotech Pvt. Ltd.
Location: Sunnyvale
Posted on: 05/25/2025

EMC Robotics Engineer
Description: Capgemini Engineering is looking for an EMC Robotics Engineer. This role will require you to work with multi-functional teams to streamline the EMC design and testing process through utilization (more...)
Company: Capgemini
Location: Santa Clara
Posted on: 05/24/2025

Staff, Engineering Dev Lead - Android
Description: Our mission is to empower people to live healthier lives by leveraging our wearables, smartphones, medical devices, AI, and health services. We research, develop, and commercialize innovative digital (more...)
Company: Samsung Electronics GmbH
Location: Mountain View
Posted on: 05/25/2025

Sr. Backend Engineer
Description: Our Mission: Driving technology always feels old. Not by a little bit. We believe vehicles can be a thousand times smarter, safer, and more connected to the world around us, and our mission is to see it (more...)
Company: Rival
Location: Mountain View
Posted on: 05/25/2025

Mobile Engineer
Description: About the role: Are you an experienced Mobile Fitter ready for your next career move to a company that values your contribution and offers award-winning training opportunities? We believe in empowering (more...)
Company: Sunbelt Rentals Careers
Location: Stockton
Posted on: 05/25/2025

AI Security Engineer
Description: Omada Health is on a mission to inspire and engage people in lifelong health, one step at a time. Job overview: Omada Health is a leading digital care provider dedicated to empowering individuals (more...)
Company: Omada Health
Location: San Francisco
Posted on: 05/24/2025

Optical Packaging/Component Engineer
Description: About the Role: Please read the information in this job post thoroughly to understand exactly what is expected of potential candidates. We're urgently hiring an Optical
Company: Ryzen Solutions
Location: Sunnyvale
Posted on: 05/25/2025
