The Senior Systems Administrator completes efforts and projects related to the design, installation, operation, support, upgrade, and maintenance of infrastructure, including software, servers, networking, and storage. Administers, maintains, and supports complex and critical infrastructure. This role provides user support for server hosting requests, problem resolution, system changes, and upgrades. The incumbent analyzes infrastructure performance and resolves problems or makes appropriate recommendations. This role ensures that all aspects of operability are delivered as part of the implementation process while also ensuring that existing service levels are maintained or improved. Contributes to and ensures full compliance with operational standards, procedures, and best practices.
This position is responsible for the operation and maintenance of Northwestern University’s Research Computing Infrastructure, consisting of a Supercomputing (HPC) system, multiple on-premises server and application environments, and cloud-based research computing hosted in Amazon Web Services (AWS). Actively participates in the acquisition, installation, and management of hardware (compute, storage, and network), operating systems (Linux), research and analytical software tools, and cloud software delivery solutions. Operational aspects of the role involve maintaining the environment at optimal working efficiency, scripting and applying automation tools, administration and monitoring, facilitating and executing “scheduler” related activities, handling support requests, and resolving hardware and software related events. Collaborates regularly with the research computing consulting team, and participates in the creation and execution of standard operating procedures to maintain the integrity and security of on-premises and cloud-based solutions.
Leads planning, development, and coordination of operations and projects for current and future infrastructures.
Anticipates impact of growth and changes in operations, and recommends design and/or process changes.
Participates in disaster recovery/business continuity planning, including backup and recovery procedures and high-availability configurations.
Maintains awareness of new technologies through publications, outside contacts, and ongoing professional development.
Ensures data/media recoverability by implementing a schedule of system backups and database archive operations.
Serves as project liaison for implementations and upgrades; acts as the focal point for communications between the team and the Project Leader.
Facilitates coordination and a thorough understanding of requirements, attends project meetings, creates written meeting notes, creates appropriate ticketing, etc.
Maintains policies and standard procedures to increase system uptime.
Identifies training needs and keeps current on application technologies.
Monitors security alerts and ensures that appropriate patches are applied in an automated and timely fashion; works with developers to patch or upgrade custom code for security compliance.
Documents and maintains system standards; researches and recommends innovative approaches for system administration tasks.
Creates and maintains standard OS installation images for virtualization templates.
Administers and supports Supercomputing (HPC) hardware (servers, network components, firewalls), the operating system (Linux), utilities, analytical software tools, storage, and backup systems.
Facilitates the HPC and research computing “scheduler” toolset and related activities.
Configures and supports cloud computing services.
Monitors application performance on servers.
Evaluates and manages appropriate software and hardware allocations to achieve an optimum performance level.
Performs capacity planning for projecting future growth.
In collaboration with development project teams, builds, rebuilds, and/or updates servers and configures hardware and virtual machines (VMs), applications, peripherals, services, networking, and storage.
Participates in the implementation and support of cloud-based research computing services (primarily in AWS).
Leads troubleshooting of application, operating system, server hardware, network communications, and storage problems within the infrastructure.
Provides a second level of support; leads service incident and problem resolution efforts to support entrusted applications/products.
Provides user support on deployed servers.
Consults on best practices to users.
Provides data and metrics to support sizing requirements and performance tuning decisions. Participates in related decision processes with managers and leads.
Maintains a positive, collaborative approach when working with team members, colleagues from schools, and vendors.
Participates in a 24x7 on-call rotation.
Provides work direction and/or supervises staff such as team members, subordinates, contractors, vendors, students, etc.
Recommends staff hires/terminations
Coaches and mentors staff
Manages projects, ensuring that timelines are met and deliverables meet expectations.
Performs other duties as assigned.
Successful completion of a full 4-year course of study in an accredited college or university leading to a bachelor's or higher degree or 2 years equivalent experience; OR appropriate combination of education and experience.
4 years System Administration or equivalent experience required.
Please see information highlighted below.
Infrastructure (extends across applications): Amazon Web Services (AWS), GlobusOnline, Hadoop, MapReduce
High Performance Computing (HPC), information security, Linux Operating System, Microsoft Office (Word, Excel, PowerPoint, Access, Outlook), MOAB, Nagios, SolarWinds, or similar, Puppet/Chef/Ansible, Server hardware, Storage hardware, Symantec NetBackup, Windows Operating System.
Programming Languages and Frameworks: Python, Shell Scripting, Scripting and Automation.
Minimum Competencies: (Skills, knowledge, and abilities.)
Excellent verbal and written communication skills, including ability to communicate technical details to non-technical audience
Analytical and conceptual ability
Linux operating system skills (Services, security, networking, and file system)
Network device and card configurations
Directory Services knowledge
Systems monitoring – Commercial tools and Linux log file analysis
Preferred Qualifications: (Education and experience)
Cloud platform administration knowledge and experience (AWS)
Ability to maintain composure and work effectively in a high-pressure setting
Intel server hardware support experience
Scripting and automation tool experience
Preferred Competencies: (Skills, knowledge, and abilities)
Process and procedure creation
As per Northwestern University policy, this position requires a criminal background check. Successful applicants will need to submit to a criminal background check prior to employment.
Northwestern University is an Equal Opportunity, Affirmative Action Employer of all protected classes, including veterans and individuals with disabilities. Women, racial and ethnic minorities, individuals with disabilities, and veterans are encouraged to apply. Hiring is contingent upon eligibility to work in the United States.
Northwestern University is a major private research university with 12 academic divisions located on three campuses in Evanston, Chicago, and Education City in Doha, Qatar. We have approximately 2,500 full-time faculty members, 17,000 graduate and undergraduate students, and over 5,700 full- and part-time staff. Northwestern University combines innovative teaching and pioneering research in a highly collaborative environment. It provides students and faculty exceptional opportunities for intellectual, personal, and professional growth.