Posted 13 Jul 2022, 0:15 pm

Site Reliability Engineer at Cordial

POSITION SUMMARY

We are looking for a motivated and talented Site Reliability Engineer to join our remote European team and help us monitor, develop, and scale the Cordial platform. Our goal is to give our clients a delightful experience in their day-to-day interaction with the platform and to build trust that the expected jobs and background processes will run without issue. You will work with our DevOps and Product teams to ensure that bugs are squashed, performance is optimized, and blind spots are revealed through comprehensive monitoring. This position is fully remote; Cordial has no physical office in Portugal.

YOU WILL

  • Use your knowledge of Web, App, Network, Server, Storage and Security technologies to administer, monitor and troubleshoot application and network components in our cloud-based environment
  • Actively contribute to Infrastructure Design and Implementation discussions
  • Provide production support for the Product Development teams
  • Participate in an on-call rotation
  • Work with the team to develop and deploy monitoring and alerting architecture, and implement monitoring/logging solutions
  • Troubleshoot complex issues in a timely manner as necessary to maintain the performance and stability of our Production Application environment
  • Help build out SLOs and document and monitor SLAs

ABOUT YOU

  • 3+ years of UNIX/Linux systems and network administration (DNS, IPsec, VPN, load balancing, process tracing)
  • Experience with AWS (we use EC2, EKS)
  • Experience with monitoring, logging and alerting tools
  • Previous experience in an SRE and/or DevOps role
  • Software development experience
  • Experience with Docker/containers & Kubernetes
  • Comfortable working in a globally distributed team across time zones
  • Strong teamwork and communication skills
  • A genuine desire to learn new technologies and grow
  • Fluent in verbal and written English

BONUS

  • Experience with MongoDB
  • Experience deploying and/or maintaining Kubernetes/EKS clusters
  • Experience with Prometheus/Grafana/Datadog
  • Experience implementing SLOs, reliability targets, and error budgets


Be sure to mention the word **CHEER** and tag RMTk1LjIwLjI0MS40OQ== when applying to show you read the job post completely. This is a beta feature to avoid spam applicants. Companies can search these words to find applicants who read this and see they're human.

The offering company is responsible for the content on this page / the job offer.
Source: Remote Ok