Posted 5 Jan 2023, 6:51 pm
Site Reliability Engineer at Syndica
At Syndica, big things happen. Every day, we’re translating vision into reality by tackling new and exciting challenges head-on. This is a breakthrough stage in our company, and you’ll experience firsthand the infectious enthusiasm of our employees and leadership team. You’ll have the opportunity to learn new skills, grow your career, and work with the smartest, most passionate people in crypto.
This role will have primary accountability for maintaining and operating Syndica’s blockchain infrastructure platform. Golang knowledge is a necessity! The team operates with a “run what you write” philosophy and each engineer is responsible for deploying and operating the code they write.
A successful candidate must have demonstrable experience in at least one programming language (preferably Go, Rust, or C++) and previous work in SaaS application development and operations. You will work closely with the Support and Development team on the architecture and configuration of our AWS- and GCP-hosted infrastructure, as well as the management of our bare metal RPC nodes. You will be responsible for ensuring the environment is configured, managed, and monitored correctly to support the business. You will drive decisions on right-sizing servers and storage, troubleshoot performance issues, ensure the highest level of reliability for the platform, and tune the environment for maximum scalability, cost efficiency, and security. The ideal candidate will also have prior experience developing applications on any of the three major cloud platforms (AWS, Azure, or GCP) using Kubernetes.
- Design, creation, and provisioning of infrastructure
- Administer overall site availability, security, latency and system health
- Provision, install, configure, operate, and maintain services, system software, and related infrastructure
- Administer the state of all components in our cloud and bare metal environments
- Deploy, manage, and operate the cloud environments
- Design, build, manage and operate the infrastructure and configuration of SaaS applications with a focus on automation and infrastructure as code
- Design, manage and operate the infrastructure as a service layer (hosted and cloud-based platforms) that supports the different platform services
- Develop comprehensive monitoring solutions that provide full visibility into the different platform components, using tools and services such as Kubernetes, Prometheus, Grafana, ELK, Datadog, and New Relic
- Create the environments and tooling that enable the development team to release code quickly and reliably
- Identify and troubleshoot availability and performance issues at every layer of the deployment, from hardware to operating environment, network, and application
- Evaluate performance trends and expected changes in demand and capacity, and establish the appropriate scalability plans
- Troubleshoot and solve customer RPC issues
- Ensure that SLAs are met in executing operational tasks
- Work with development teams to ensure best practices for scalability, reliability, and security are designed and implemented from the start
- Conduct periodic on-call duties
- Great collaborator with 5+ years of experience in a DevOps or SRE role
- Deep understanding of infrastructure-as-code (Terraform, etc.) and deploying large-scale systems reliably
- Strong experience with Infrastructure as Code and Configuration Management tools
- Experience with Prometheus/Grafana for metrics aggregation/visualization
- Configuration of CI/CD pipelines
- Experience using Kubernetes
- Experience with automation tools/platforms
- Experience with alerting and monitoring tools
- Strong knowledge of monitoring and performance analytics tools (Datadog, New Relic, etc.)
- Commitment to implementing reliability and security best practices
- Capacity planning experience, including resource optimization and load testing
- Experience working in a highly distributed company is a plus
- Ability to align a portion of your day with business hours in the Central Time Zone (UTC-6)
- Working knowledge of information security issues
- Experience building and managing virtualized systems (KVM, OVM, containers/Docker) and the ability to read and understand source code
- Systematic problem-solving approach, combined with a strong sense of ownership and drive
- Firm grasp of at least one modern programming language, beyond advanced scripting (Shell or Python)
- Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.)
- Experience writing automation tools & eagerness to "automate all the things"
What does success in this role look like?
- In three months, you have become our infrastructure administrator with respect to overall site availability, security, latency, system health, customer accounts, and billing. You have taken on independent code review responsibilities and are collaborating on the design of new features
- In six months, you have earned the trust of the team and are delivering tasks through the entire SDLC, from design through development with minimal guidance, and are helping to effectively mentor new engineers joining the team
- In twelve months, you have established a cadence of predictable, on-time delivery without cutting corners
Source: Remote Ok