Site Reliability Engineer

HKT

Athens
Permanent
Full-time

Posted 1 month ago

PCCW Global is a leading telecommunications provider, offering the latest voice and data solutions to multi-national enterprises and communication service providers. Our truly global coverage combined with local, on the ground knowledge has helped us build best in class connections across the globe linking Asia-Pacific, Europe, the Americas, the Middle East and Africa.

Job Title: Site Reliability Engineer (SRE)
Location: EU/UK/US

About Us:

We are at the forefront of software defined networking innovation, dedicated to delivering exceptional digital experiences to our customers. Our commitment to reliability, scalability, and agility in our services is unwavering. We are seeking a Site Reliability Engineer (SRE) who shares our passion for excellence and innovation to join our team.

Job Description:

As an SRE, you will be instrumental in ensuring the optimal performance and reliability of our systems and services. Your role will bridge the gap between development and operations, focusing on creating automated solutions, managing system infrastructures, ensuring robust security, and maintaining high availability. You will be pivotal in enhancing our operational capabilities and driving continuous improvement across our platforms.

Key Responsibilities:

System Reliability: Oversee the availability, performance, and scalability of edge cloud services and infrastructure, ensuring they meet or exceed our service-level objectives and agreements.
Incident Management: Lead the response to service incidents and outages, including participating in on-call rotations, resolving issues efficiently, and conducting thorough post-incident analyses.
Performance Optimization: Continuously monitor and optimize system performance, identifying and addressing bottlenecks to improve efficiency and reduce latency.
Capacity Planning: Conduct capacity planning and forecasting to accommodate system growth and peak loads, ensuring system resilience and performance.
Automation: Develop and implement automation strategies for operational tasks and deployment processes to enhance system stability and reduce manual errors.
Disaster Recovery: Design and manage disaster recovery plans, ensuring data integrity and system resilience against potential threats.
Security: Enforce robust security policies and practices, regularly audit systems for vulnerabilities, and apply necessary security patches and updates.
Collaboration: Work closely with development teams and other stakeholders to ensure the reliability and scalability of systems and services.
Continuous Improvement: Lead initiatives to continuously improve processes, practices, and systems, ensuring the highest levels of reliability and efficiency.
Documentation: Create and maintain detailed documentation for system architectures, configurations, and operational procedures.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience.
3+ years of proven experience as a Site Reliability Engineer or DevOps Engineer, or in a similar role, in a complex networking environment.
Strong background in Linux/Unix administration and scripting languages such as Python or Bash.
Experience with automation, configuration management, and version control tools (e.g., Ansible, Git).
Familiarity with cloud services (AWS, GCP, Azure) and container orchestration tools (e.g., Kubernetes).
Deep understanding of network protocols and services (DNS, HTTP/S, SSH, FTP).
Excellent problem-solving, troubleshooting, and communication skills.
Ability to work in a fast-paced, evolving environment and collaborate effectively with a diverse team.

Ref ID: 47935
Location: Athens, A1, GR
Business Unit: PCCW Global
Full Time/Part Time: Full Time
Job Function: Technology
