27323- Site Reliability Engineer
Chicago, IL 60606 US
Job Description
Site Reliability Engineer
#JP – 27323
Location: CHICAGO, IL
Duration: 12 MONTHS+
Job Description:
- The Site Reliability Engineer combines software development, networking, and systems engineering expertise to build and run large-scale, distributed, fault-tolerant software systems and infrastructure.
- This Site Reliability Engineer role is critical in supporting technology advances in our eCommerce platforms.
- This individual will work on systems that must achieve a unique blend of ultra-low latency performance, the capacity to seamlessly handle the busiest days in the world economy, and rock-solid reliability and integrity, all while undergoing rapid release cycles.
- Achieving these goals will require an understanding of both the underlying technology and the continuous integration and delivery of the applications.
- The candidate will want to solve problems creatively, communicate effectively, and possess the ability to lead others to achieve the critical mission of the team.
- The individual is heavily involved in the development of automated systems to enable the scale and speed necessary to deliver high-performing applications.
Responsibilities:
- Engage with and strengthen the Continuous Integration/Continuous Delivery Platform Team in the design and development of software tools that reliably manage application delivery.
- Work closely with our Technical Operations and Business Operations teams on critical issues by evaluating the problem, proposing and implementing resolutions, and partnering with other teams as needed for the resolution.
- Develop tools and processes for different levels of support to eliminate issues more proactively.
- Support services before go-live by actively participating in design consulting, developing platforms and frameworks, capacity planning, and launch reviews.
- Maintain service once launched into production by measuring and monitoring availability, latency, and overall system health to help eliminate issues.
- Practice sustainable incident response and blameless postmortems.
Qualifications:
- In-depth Linux/Unix system administration experience, including root access in a large (10,000+ servers), critical production environment.
- Experience as a cross-functional team member with focus on accountability and collaboration.
- Experience codifying and automating Continuous Integration/Continuous Delivery (CI/CD) processes and solutions.
- Experience working in a cloud environment, supporting applications and deployments.
- Ability to provide effective and timely solutions.
- Excellent verbal and written communication skills.
- Experience with Kubernetes and a service mesh such as Istio.
- Any programming/development experience (any exposure to Java, C#, Golang, or scripting and automation tools such as Python, Ansible, and Terraform is helpful).