Site Reliability Engineer - Storage
Company: xAI
Location: Palo Alto
Posted on: February 16, 2026
Job Description:
About xAI
xAI's mission is to create AI systems that can accurately understand
the universe and aid humanity in its pursuit of knowledge. Our team
is small, highly motivated, and focused on engineering excellence.
This organization is for individuals who appreciate challenging
themselves and thrive on curiosity. We operate with a flat
organizational structure. All employees are expected to be hands-on
and to contribute directly to the company's mission. Leadership is
given to those who show initiative and consistently deliver
excellence. Work ethic and strong prioritization skills are
important. All employees are expected to have strong communication
skills. They should be able to concisely and accurately share
knowledge with their teammates.

About the role
As a Site Reliability Storage Engineer, you will play a pivotal role
in designing, building, and operating exascale storage systems to
manage our cutting-edge AI research data with unparalleled
scalability and reliability across multiple regions. This role's
core responsibility is to make sure our heterogeneous storage systems
in on-prem cloud are reliable and performant. We're seeking engineers
with expertise in exascale data management systems or distributed
filesystems to join our mission-driven team.

What you'll do
- Develop and optimize software to manage exascale data, enabling
  efficient and reliable access for xAI researchers working on
  advanced AI models.
- Enhance the reliability, performance, and cost-effectiveness of
  xAI's storage infrastructure to support large-scale AI research
  workloads.
- Collaborate closely with researchers to understand their data use
  cases and tailor storage solutions to meet their needs.
- Implement robust security measures to safeguard critical datasets,
  ensuring data integrity and confidentiality.

Ideal Experience
You'd be an exceptional candidate if you possess some (or all) of the
following:
- Writing scalable, high-performance code in Rust or Go for
  storage-related applications or tooling.
- Managing storage infrastructure with IaC tools like Pulumi,
  Terraform, or Ansible.
- Past experience working with storage vendors, facilitating
  partnership alignment, and integrating their tooling within xAI's
  infrastructure.
- Familiarity with Kubernetes storage primitives (e.g., Persistent
  Volumes, CSI drivers) and integrating storage with containerized
  workloads.
- Bonus: Experience with AI/ML data pipelines, including handling
  large datasets for training and inference.

Tech Stack
- Kubernetes
- Pulumi
- Rust and Go

Interview Process
After submitting your application, the team reviews your CV and
statement of exceptional work. If your application passes this stage,
you will be invited to a 45-minute interview ("phone interview")
during which a member of our team will ask some basic questions. If
you clear the initial phone interview, you will enter the main
process, which consists of four technical interviews:
- Coding assessment in Python, Golang, or Rust
- Systems hands-on: Demonstrate practical skills in a live
  problem-solving session.
- Coding assessment or system design discussion based on the
  candidate's background.
- Project deep-dive: Present your past exceptional work to a small
  audience.

Every application is reviewed by a member of our technical team. All
interviews will be conducted via Google Meet. We do not condone usage
of AI in interviews and have tools to detect AI usage.

Annual Salary Range
$180,000 - $440,000 USD

Benefits
Base salary is just one part of our total rewards package at xAI,
which also includes equity, comprehensive medical, vision, and dental
coverage, access to a 401(k) retirement plan, short- and long-term
disability insurance, life insurance, and various other discounts and
perks.

xAI is an equal opportunity employer. For details on data processing,
view our Recruitment Privacy Notice.