Site Reliability Engineer
- Marousi, Attica
- Permanent
- Full-time
- Ensure reliability and uptime of critical production services and infrastructure.
- Contribute to the design of scalable monitoring and alerting systems.
- Develop tools and automation to eliminate manual and repetitive tasks.
- Participate in the incident management process.
- Contribute to root cause analysis and postmortem documentation for production issues.
- Collaborate with software and system engineers to improve system design for reliability.
- Identify and fix system weaknesses or bottlenecks.
- 3+ years of experience as a Site Reliability Engineer, DevOps engineer, or in a similar role.
- Proficiency in scripting languages (e.g., Python).
- Experience with cloud platforms (e.g., Azure) and container orchestration (Kubernetes, Docker).
- Strong understanding of Linux systems, networking, and troubleshooting.
- Familiarity with infrastructure as code tools (Terraform, Ansible, etc.).
- Familiarity with observability stacks (Prometheus, Grafana, ELK, Datadog, etc.).
- Strong problem-solving skills and a passion for reliability and performance.