Job Description
Key Responsibilities:
- Server Installation & Configuration: Install, configure, and deploy servers in data center environments, ensuring they are correctly set up for optimal performance and scalability.
- Hardware Maintenance: Perform regular maintenance and health checks on servers, including monitoring hardware performance, updating firmware, and replacing or upgrading components.
- Troubleshooting & Repairs: Diagnose and resolve hardware and software issues related to the servers, ensuring minimal downtime and maintaining system integrity.
- Performance Optimization: Monitor server performance and take corrective action to improve hardware efficiency, stability, and reliability.
- Documentation & Reporting: Maintain accurate records of server configurations, maintenance schedules, and troubleshooting efforts. Generate regular reports on server health, performance, and issues.
- Collaboration: Work closely with IT infrastructure teams, network engineers, and other technical staff to ensure seamless server operations and integration with existing infrastructure.
Required Skills and Qualifications:
- At least 5 years of experience in IT support, troubleshooting, or a related area.
- Bachelor's degree or high school diploma.
- Proven experience working with servers or similar high-performance computing hardware.
- Understanding of server hardware, including CPU, memory, storage, networking components, and cooling systems.
- Solid understanding of networking concepts, protocols, and configurations (TCP/IP, DNS, DHCP, etc.).
Preferred Qualifications:
- Experience with NVIDIA-specific hardware and software solutions, including GPUs, CUDA, and other NVIDIA technologies.
- Familiarity with GPU server configurations and use cases, particularly in AI, machine learning, and high-performance computing environments.
- Knowledge of out-of-band server management interfaces such as IPMI, iLO, or similar.
- IT certifications (e.g., CompTIA A+, Cisco CCNA, or similar) are a plus.
- Familiarity with cloud platforms (AWS, Google Cloud, Azure) and their interaction with on-premises server infrastructure.
Additional Information:
Ability to perform physical installations and repairs in a data center environment, including regularly lifting hardware components weighing up to 30 pounds.
Ability to bend, stoop, crawl, kneel, crouch, reach, stand for long periods, and move about production and warehouse facilities.
The environment is temperature-controlled but is otherwise a typical production environment with loud noise.