Do IT Now provides High Performance Computing and Artificial Intelligence services, offering consulting, installation, optimization, and support to companies of all sizes, from SMEs to large multinationals. Our clients include Formula One teams, aerospace firms, and institutions in pharma and life sciences. We collaborate with the most innovative leaders in the high-tech industry.
Join us and be part of a technically oriented, independent team at the forefront of HPC innovation and research. Enjoy the flexibility of 100% remote work, thrive in a multinational and multicultural environment, and benefit from our strong growth. Participate in team-building activities and work with cutting-edge technologies to make a real impact in high-performance computing.
Job Definition
Join our dynamic and innovative team as a Site Reliability Engineer! Be part of our cutting-edge projects where you'll collaborate seamlessly with cross-functional teams to ensure the reliability, performance, and scalability of our infrastructure and services, with a special focus on our High-Performance Computing (HPC) environments and AI-driven applications. You'll play a crucial role in designing, implementing, and maintaining robust systems that support our company's growth and technological advancement in the realms of HPC, Cloud and AI.
As a passionate member of our team, you'll embody a continuous improvement mindset. Embrace the ever-evolving fields of DevOps, HPC, and AI infrastructure, seizing opportunities to optimize our systems and enhance the performance of our compute-intensive workloads. Be an integral part of our journey towards operational excellence, ensuring AI models and HPC clusters run efficiently and reliably. Let your enthusiasm for cutting-edge technology, reliability, and automation propel you to new heights in a collaborative and forward-thinking environment!
Skills and Experience
Essential Skills:
- Advanced degree (Master's or PhD level) in Computer Science, Information Technology, or a related field
- Minimum of 3 years’ experience in SRE or DevOps roles
- Proficiency in at least one scripting language (Python, Bash)
- Good knowledge of Linux systems and at least one cloud platform (AWS, GCP, or Azure)
- Experience with containerization technologies (Docker, Kubernetes)
- Expertise in monitoring and observability tools
- Solid understanding of networking concepts and protocols
- Excellent problem-solving and troubleshooting skills
Preferred qualifications:
- Experience with Infrastructure as Code (Terraform, Ansible)
- Knowledge of CI/CD pipelines and tools (Jenkins, GitLab CI, GitHub Actions)
- Familiarity with database systems and their optimization
- Experience with log management and analysis tools
- Understanding of security best practices in cloud environments
Language skills:
- Fluency in French and English for effective communication
Personal Attributes
- Team player with a proactive attitude and strong communication skills
- Ability to work independently (especially if remote) and manage multiple priorities
- Adaptability and eagerness to learn new technologies (mandatory)
Why work with us?
- Technology-driven company culture
- 100% remote work opportunity
- Rapid company growth and career advancement possibilities
- Continuous learning and development programs
- At the forefront of SRE practices
- Startup and multicultural environment
- Regular team-building activities
Join us in our mission to build and maintain highly reliable, scalable, and efficient systems that power our business. If you're passionate about automation, problem-solving, and creating robust infrastructure, we want to hear from you!
Conditions: Permanent – Montpellier/Remote
Remuneration: 50-60k€. We offer a competitive salary commensurate with the candidate's qualifications and experience, adjusted to the cost of living where the candidate is based.
Occasional on-call duties and business travel