The Senior Systems Administrator completes efforts and projects related to the design, installation, operation, support, upgrade, and maintenance of infrastructure, including software, servers, networking, and storage. Administers, maintains, and supports complex and critical infrastructure. This role provides user support for server hosting requests, problem resolution, system changes, and upgrades. The incumbent analyzes infrastructure performance and resolves problems or makes appropriate recommendations. This role ensures that all aspects of operability are delivered as part of the implementation process while also ensuring that existing service levels are maintained or improved. Contributes to and ensures full compliance with operational standards, procedures, and best practices.
- This position is responsible for the operation and maintenance of Northwestern University’s Research Computing Infrastructure, consisting of a Supercomputing (HPC) system, multiple on-premises server and application environments, and cloud-based research computing hosted in Amazon Web Services (AWS). Actively participates in the acquisition, installation, and management of hardware (compute, storage, and network), operating systems (Linux), research and analytical software tools, and cloud software delivery solutions. Operational aspects of the role involve maintaining the environment at optimal working efficiency, scripting and applying automation tools, administration and monitoring, facilitating and executing “scheduler” related activities, handling support requests, and resolving hardware and software related events. Collaborates regularly with the research computing consulting team, and participates in the creation and execution of standard operating procedures to maintain the integrity and security of on-premises and cloud-based solutions.
- Leads planning, development, and coordination of operations and projects for current and future infrastructures.
- Anticipates impact of growth and changes in operations, and recommends design and/or process changes.
- Participates in disaster recovery/business continuity planning including backup and recovery procedures and higher availability configurations.
- Maintains awareness of new technologies through publications, outside contacts, and ongoing professional development.
- Ensures data/media recoverability by implementing a schedule of system backups and database archive operations.
- Serves as project liaison for implementations and upgrades; acts as the focal point for communications between the team and the Project Leader.
- Facilitates coordination and a thorough understanding of requirements, attends project meetings, writes meeting notes, and creates appropriate tickets.
- Maintains policies and standard procedures to increase system uptime.
- Identifies training needs and keeps current on application technologies.
- Monitors security alerts and ensures that appropriate patches are applied in an automated and timely fashion; works with developers to patch or upgrade custom code for security compliance.
- Documents and maintains system standards; researches and recommends innovative approaches for system administration tasks.
- Creates and maintains standard OS installation images for virtualization templates.
- Administers and supports Supercomputing (HPC) hardware (servers, network components, firewalls), the operating system (Linux), utilities, analytical software tools, storage, and backup systems.
- Facilitation of HPC and research compute “scheduler” toolset and activities
- Cloud computing services configuration and support
- Monitors application performance on servers.
- Evaluates and manages appropriate software and hardware allocations to achieve an optimum performance level.
- Performs capacity planning for projecting future growth.
- In collaboration with development project teams, builds, rebuilds, and/or updates servers and configures hardware and virtual machines (VM), applications, peripherals, services, networking, storage.
- Participates in the implementation and support of cloud-based research computing services (primarily in AWS).
- Leads troubleshooting of application, operating system(s), server hardware, network communications, and storage problems within infrastructure.
- Provides a second level of support; leads service incident and problem resolution efforts to support entrusted applications/products.
- Provides user support on deployed servers.
- Consults on best practices to users.
- Provides data and metrics to support sizing requirements and performance tuning decisions. Participates in related decision processes with managers and leads.
- Maintains a positive, collaborative approach when working with teammates, colleagues from schools, and vendors.
- Participates in a 24x7 on-call rotation schedule.
- Provides work direction and/or supervises staff such as team members, subordinates, contractors, vendors, students, etc.
- Recommends staff hires/terminations
- Coaches and mentors staff
- Manages projects, ensuring that timelines are met and deliverables meet expectations.
- Performs other duties as assigned.
- Successful completion of a full 4-year course of study in an accredited college or university leading to a bachelor's or higher degree or 2 years equivalent experience; OR appropriate combination of education and experience.
- 4 years System Administration or equivalent experience required.
- Please see information highlighted below.
- Infrastructure (extends across applications): Amazon Web Services (AWS), GlobusOnline, Hadoop, MapReduce
- High Performance Computing (HPC), information security, Linux Operating System, Microsoft Office (Word, Excel, PowerPoint, Access, Outlook), MOAB, Nagios, SolarWinds, or similar, Puppet/Chef/Ansible, Server hardware, Storage hardware, Symantec NetBackup, Windows Operating System.
- Programming Languages and Frameworks: Python, Shell Scripting, Scripting and Automation.
- Analytical: Critical thinking, decision making, judgment, problem solving, read & interpret technical drawings
- Project: collaboration and teamwork, evaluate resources, facilitate collaboration, functional documentation, organizational skills, planning.
Minimum Competencies: (Skills, knowledge, and abilities.)
- Excellent verbal and written communication skills, including ability to communicate technical details to non-technical audience
- Decision Making
- Meets deadlines
- Crisis Management
- Analytical and conceptual ability
- Linux operating system skills (Services, security, networking, and file system)
- Network device and card configurations
- Directory Services knowledge
- Systems monitoring – Commercial tools and Linux log file analysis
Preferred Qualifications: (Education and experience)
- Cloud platform administration knowledge and experience (AWS)
- Ability to maintain composure and work effectively in a high-pressure setting
- Intel server hardware support experience
- Scripting and automation tool experience
Preferred Competencies: (Skills, knowledge, and abilities)
- Problem solving
- Process and procedure creation
- Results Driven
- Strategic Thinking
As per Northwestern University policy, this position requires a criminal background check. Successful applicants will need to submit to a criminal background check prior to employment.
Northwestern University is an Equal Opportunity, Affirmative Action Employer of all protected classes, including veterans and individuals with disabilities. Women, racial and ethnic minorities, individuals with disabilities, and veterans are encouraged to apply. Hiring is contingent upon eligibility to work in the United States.