Job Description

Role: AI Service Hosting AIOps Engineer

Location: Atlanta, GA (Remote)

Type: Contract

Description:

Our client is one of the largest multiple-system operators (MSOs) in the US. We are working with their AI infrastructure team, which is building a new team of platform specialists (third-party labor) to support and enhance high-performance AI services. These are highly technical, hands-on roles focused on customer, application, and platform support of AI-focused workloads.
AI Platform Specialists in these roles will provide application and GPU support. The team will deliver Tier 1 and Tier 2 support to developers and engineers while collaborating closely with Tier 3 and 4 platform teams and vendors on issue resolution. The roles require user-level knowledge of Kubernetes, virtualization, and cloud-native technologies, as well as operator-level knowledge of GPUs and other AI supporting services. Each specialist should focus on customer service along with goals of reliability, scalability, and performance.

Position's General Duties and Tasks

In these roles you will be responsible for:

Platform Support & Incident Response
- Provide Tier 1 & Tier 2 support for AI-driven applications and workloads.
- Troubleshoot and resolve issues related to GPU utilization and service performance.
- Collaborate with Tier 3+ teams, including Kubernetes engineers and external vendors, to escalate and resolve complex issues.
GPU Infrastructure & AI Services Management
- Optimize and support GPU-enabled workloads including CUDA and other AI acceleration frameworks.
- Assist in the installation, configuration, and support of AI coding assistants (e.g., Codeium).
Observability & Documentation
- Maintain detailed operational documentation, runbooks, and troubleshooting guides.
- Use monitoring and logging tools such as New Relic, BigPanda, Prometheus, Grafana, and other observability frameworks.
Process Improvement & Collaboration
- Work cross-functionally with developers, IT teams, and vendors to ensure seamless deployment and support of AI services.
- Contribute to CI/CD pipelines, automation, and service and security best practices.
- Track and communicate work through task management platforms (ServiceNow and Jira).

Requirements for this role include:

- Hybrid Cloud: In-depth knowledge of private (on-premises) and public (GCP & AWS) cloud architectures and services.
- AI/ML Software: Developer experience with DevOps practices (Git, Jenkins, etc.) as well as experience working with AI/ML engineers and data scientists.
- AI/ML Hardware: Experience deploying, supporting, and optimizing on-premises and cloud GPU-enabled (NVIDIA & AMD) infrastructure (VMs & containers).
- Experience with GPU orchestration tools like Run:AI, NVIDIA AI Enterprise, VMware Private AI Foundation, etc.
- Exposure to AI coding assistants like Codeium, Copilot, or Tabnine.
- Technical Support & Troubleshooting: Proven ability to diagnose and resolve customer and platform issues in production environments.
- Strong Communication & Documentation: Ability to clearly document procedures, write knowledge base articles, and collaborate with customers and teams.
- Time Management & Accountability: Ability to work independently, prioritize tasks, and manage workload effectively.

Preferences (optional, nice-to-have):

Proficiency with development tools and frameworks such as Python, PyTorch, TensorFlow, Jupyter Notebooks, etc.

Required schedule availability for this position is Monday-Friday, 9:00 am to 5:00 pm EST. Shift timings may change per client requirements. Additionally, resources may need to work overtime and on weekends based on business requirements.

Job Tags

Contract work, Remote job, Shift work, Weekend work, Monday to Friday

AI Service Hosting AIOps Engineer Job at VDart Inc, Remote
