HPC Systems Administrator
Linux administrators are back-end IT specialists who install, configure, and maintain Linux operating systems in a variety of organizations. This role involves server-side operations and troubleshooting tasks that support business-critical and development activities. Linux administrators also configure user access and monitor system stability and security through deployment and everyday use.
Maximize HPC compute resource availability and efficiency.
Deploy / refresh / decommission hardware as needed.
Assist data scientists with job submission and lifecycle.
Troubleshoot any issues with software, networking, storage, and compute.
Participate on a team of administrators and architects.
Continuously refine and improve processes.
Leverage automation frameworks where possible.
Routinely interface with data scientists / engineers.
Bachelor’s degree preferred.
ITSM foundation training required.
3+ years of IT experience.
General Linux system administration.
Linux kernel and module parameter tuning, especially as related to CPU and GPU compute.
Linux network tuning with ethtool and related tools.
Linux disk subsystem tuning such as I/O schedulers.
Linux enterprise integration such as system auth with MS Active Directory.
Cluster file systems such as IBM GPFS and Red Hat GlusterFS.
Various storage technologies such as EXTFS, BTRFS, MD-RAID, LVM.
Linux security best practices.
Linux health and subsystem monitoring with enterprise systems such as Zabbix, SolarWinds, and Nagios.
Various container frameworks (Docker, LXC/LXD, etc.).
Container orchestration with Kubernetes.
Virtualization via KVM/QEMU.
Bash and Python scripting.
Experience with distributed computing in HPC environments.
Experience with job scheduling using frameworks such as Slurm or PBS.
Understanding of basic AI frameworks such as PyTorch.
Fluent English language skills required: verbal and written communication.
Experience working with multicultural teams and customers.
Self-motivated and disciplined.
Excellent meeting facilitation skills, including teleconference and web conference.
Ability to make decisions and resolve incidents.
Ability to explain technical concepts to management, team members, and customers.
Participate in a rotating 24/7 on-call schedule responding to critical issues.
Medical, Dental, and Life Insurance. Savings Fund, Vacation Bonus, Christmas Bonus, Grocery Bonus, Annual Bonus.
Vacation and Holidays.
Save on commute
Paid office parking.
Medical-related discounts.
Located in the heart of Puebla, with views of the Popocatepetl volcano and with restaurants and amenities close by.
Team social events and Christmas Dinner.
Join your colleagues in various sports activities in the area.
Eat & Drink
Enjoy a kitchen stocked with drinks, coffee, and snacks free of charge.