Pyth: Reliability Engineer

Operating Pyth Network is a nontrivial challenge. Our price feeds run 24x7. DeFi applications depend on the accuracy and availability of these feeds; an inaccurate price or offline feed can cause serious financial losses. Each feed in turn depends on many different services, some of which are run by our data providers and some by us. It’s a complex system with many different failure modes, but it has to work correctly all the time.

We also run a variety of off-chain services, such as the backend for the pyth.network website and tools for logging historical data. These services run in a Kubernetes cluster that is managed using Terraform, and they too must be running and healthy at all times.

We’re looking for people to help us operate this system and improve its reliability over time. This job has many different aspects, including providing front-line support for incidents, developing automation to manage our infrastructure, and defining deployment plans for high availability.

About Us and the Job:

We are a small team. About half the team is technical; the other half manages relationships with data providers, developers, and the broader community. (Building a network requires talking to people!)
We are mostly remote. Team members live across the world, in the US, Europe, and Asia. We do have offices in some locations (Porto, Chicago, London, Amsterdam, Singapore) for those who prefer in-office work.
Our team communicates with each other and external developers in English. Strong spoken and written English skills are required.
We operate like a startup in the rapidly growing and changing DeFi ecosystem. To be successful, we must adapt to meet the current needs of the market. Good candidates will help our organization adapt; they are flexible problem solvers who are willing and able to take on whatever the occasion demands.
Most of our software development is open source. You can look at our GitHub repositories to understand what we typically work on.
We offer a competitive salary and generous benefits package. Furthermore, where applicable, employees may be eligible for token allocations as part of Pyth Network’s employee incentive program.

What You'll Do:

Provide front-line response to incidents and outages, such as unavailable price feeds or website downtime.
Develop automation tools to provision and manage our infrastructure, including cloud services and Kubernetes clusters. We currently use Terraform to manage our infrastructure, but we’re not married to it and may use different tools in the future. Some of our tools are written in Python and others in Go.
Design and implement operational plans to achieve high availability guarantees for our price feeds and web services. Build redundant service deployments, monitoring solutions, dashboards, and alerting tools to ensure that critical services are running continuously. Support services in development and production environments, from pre-launch through launch. Benchmark application resource consumption to allocate capacity.
Measure and monitor application metrics (availability, latency, etc.) to understand the health of the system. Work with developers to add metrics and logging to their applications to support Grafana dashboards and alerts. Develop logging practices and libraries to standardize metric reporting and alerting across multiple programming languages.

Skills You'll Need:

Comfortable developing software. Writing software is a big part of the job, as we write lots of tools to automate processes and monitor deployments.
Solid understanding of Linux fundamentals, such as processes and permissions, along with an understanding of containers (Docker) and cloud deployments.
Experience troubleshooting, monitoring and debugging cloud-native applications and distributed systems.
Ability to handle shared operational and periodic on-call duties.
1+ years of experience supporting critical production environments. Experience in financial or crypto markets is a plus.
Predictable and reliable availability.


Posted 12 May
