Position Details

About CoupangPay: Coupang Pay focuses on delivering innovative payment and financial services solutions to everyone who uses the Coupang app, from customers buying products on Coupang.com, to marketplace vendors, and restaurants that offer their services via Coupang Eats. We develop solutions with our latest tech innovations to serve the growing needs of Coupang’s customers in Korea and Taiwan. This includes Coupay, an online wallet with a proprietary one-touch payment capability.

About the role: As a Staff Site Reliability Engineer (SRE) in CoupangPay, you will play a pivotal role in ensuring the reliability, scalability, and performance of our critical systems and services. You will be a technical leader, driving the design, implementation, and optimization of complex systems that meet the demands of a high-availability environment. This role requires deep expertise in the Observability Engineering (OE) stack (Mimir, Loki, Tempo, and Grafana) and in Terraform-based automation. You should be experienced in setting up, tuning, and scaling observability platforms that support business-critical services with high reliability and performance. As a Staff SRE, you will collaborate with cross-functional teams to architect solutions, identify and resolve system bottlenecks, and establish best practices in operational excellence. With a focus on automation, observability, and incident management, you will also mentor junior engineers, foster a culture of reliability, and contribute to the strategic direction of our product engineering initiatives. This is a unique opportunity to make a significant impact on the stability and scalability of our technology ecosystem.

Key Responsibilities

System Reliability and Performance
• Ensure the reliability, availability, and performance of critical systems and services.
• Proactively identify and address system bottlenecks, failures, and performance issues.

Technical Leadership
• Lead the design, implementation, and optimization of scalable and fault-tolerant architectures.
• Provide guidance and mentorship to junior engineers, fostering technical growth.

Automation and Tooling
• Develop and enhance automation tools to streamline operational processes and improve efficiency.
• Champion automation-first principles to reduce manual toil and operational overhead.

Observability and Incident Management
• Build and operate the OE stack, and take part in performance tuning, cost optimization, and observability initiatives that best serve the interests of the business.
• Drive incident response, root cause analysis, and post-incident reviews to improve systems.

Collaboration and Best Practices
• Partner with cross-functional teams (e.g., development, product, and infrastructure) to build robust systems.
• Define and implement best practices for reliability engineering, including CI/CD pipelines and infrastructure as code.

Strategic Contributions
• Influence the strategic direction of infrastructure and platform engineering initiatives.
• Evaluate and implement new technologies to enhance system resilience and operational capabilities.

Operational Excellence
• Drive continuous improvement in operational processes, reducing time to resolution for incidents.
• Promote a culture of accountability, innovation, and reliability throughout the engineering organization.

Qualifications

• Strong proficiency in programming languages such as Python, Go, or similar.
• In-depth knowledge of Linux/Unix systems, networking, and distributed systems.
• Experience with cloud platforms (AWS, GCP, or Azure) and container orchestration tools (e.g., Kubernetes, Docker).
• Strong understanding of observability tools (e.g., Prometheus, Grafana, or Datadog).
• Proficiency in Infrastructure as Code (IaC) using Terraform.
• Expertise in scaling and tuning Mimir and Loki for high-throughput workloads.
• Familiarity with distributed tracing using Tempo.
• Knowledge of performance optimization techniques for high-availability systems.
• Strong collaboration skills with the ability to work across cross-functional teams.
• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
• 8+ years of experience in Site Reliability Engineering or related roles in high-availability environments.

[Coupang Pay] Staff, Back-end Engineer (Fintech SRE)