Senior Site Reliability Engineer
Location: Seattle, WA
Date Posted: 9/18/2018
Concord Technologies offers best-in-class Cloud Fax and Cloud Workflow services across software as a service (SaaS) and fax as a service (FaaS), helping organizations drive innovation and business transformation by lowering costs and reducing IT complexity.
Our team uses a follow-the-sun model to provide 24/7 coverage. Our platform is expanding rapidly, creating strong growth and highly rewarding opportunities in our expanding Platform Operations team.
This is a unique opportunity to join a rapidly growing team focused on maturing our platform operations and engineering the services and infrastructure that make up the Concord Cloud Platform.
As a Senior Site Reliability Engineer, you will be part of the Platform Operations team, whose mission is full-stack ownership of a collection of services and technology areas shared with our development team. A deep understanding of service topologies and service dependencies is a key requirement for troubleshooting incidents and defining mitigations.
Scope & Complexity
As a Sr. SRE, you will be responsible for solving complex problems in defining, designing, deploying, and troubleshooting Concord’s Cloud Service platform and infrastructure. You will be an expert at articulating the technical characteristics of your services and the dependencies between them, and you will guide development teams as they engineer and add features to the Concord Cloud Service.
- As a Sr. SRE, you are the ultimate authority on, and are accountable for, the end-to-end performance and operability of the services you own. You will understand the end-to-end design, configuration, technical dependencies, and overall behavior of the production services you own. In partnership with the development team, you will share responsibility for ensuring services are designed and delivered as mission-critical, with a focus on security, resiliency, scale, and performance.
- You will be called in during major incidents as a key subject matter expert when the source of a problem is unknown or unclear.
- You will partner with the development team in defining and implementing improvements in service architecture, both current and future.
- You will manage the platform with reliability, scalability, resilience, performance, and security at the forefront of your approach.
- You will understand and be able to communicate the scale, capacity, security, and performance attributes and requirements of the services you own.
- You will be able to understand and communicate every characteristic of your services, including:
  - degradation and behavior under load of the services and their dependencies
  - end-to-end tuning needs, optimizing resource utilization as load patterns fluctuate
  - instrumentation and metrics that clearly describe the service behaviors
  - scaling requirements and patterns
  - resiliency and recoverability, ensuring that backup/restore and disaster recovery capabilities are implemented, tested, and maintained
- You will have a clear understanding of automation and orchestration principles and will be eager to automate wherever and whenever the possibility arises, while also eliminating technical debt.
- After resolving an incident, you will investigate and document how to reach the root cause and resolve the problem more quickly next time.
Job-Specific Experience, Education & Skills
- BS or MS in Computer Science, or equivalent (real-world experience and skills count as much as a CS degree)
- 7+ years of experience running large-scale, customer-facing web services
- Expertise in incident and problem management, including timely problem identification, successful resolution, and root-cause analysis
- Expertise in defining and documenting the technical architecture of complex and highly scalable products
- Networking and TCP/IP
- Standard Internet services, such as DNS, HTTP, etc.
- Scripting languages, such as PowerShell, Python, Ruby, Bash, etc.
- REST APIs
- Cloud computing patterns
- Load balancing technologies, DNS and L7 routing
- IT Security and compliance
- Linux internals