A site reliability engineer (SRE) will:
· Spend 80% of their time handling incidents, tickets, and issues, such as booking issues and job failures that require review and manual intervention.
· Spend the remaining 20% of their time on development tasks, such as improving monitoring and automation, identifying code defects and deficiencies, and suggesting improvements.
· Ideally be a programmer who also has operational, systems, or networking knowledge and enjoys whittling complex tasks down to manageable pieces.
· Help developers and product owners move quickly by reducing the frequency, cost, and impact of failures.

General Responsibilities:
· Understand business processes and data flows for multiple systems.
· Triage support tickets, and debug and troubleshoot reported issues for a variety of products across the enterprise.
· Monitor, detect and troubleshoot issues during code rollouts to production systems.
· Analyze real-time data to determine issue severity and impact, and advise the appropriate product teams. Isolate the cause of failure through log analysis, using tools such as Dynatrace and Splunk.
· Find ad hoc solutions and workarounds for critical issues. Create test scripts to reproduce issues and prove out solutions.
· Relate new issues to known issues, and create defect summaries that isolate defects for repair by product development teams; identify duplicate incidents to avoid logging duplicate defects.
· Communicate incidents, solutions, and related information accurately, clearly, and effectively across interdisciplinary teams, in writing, in person, and by phone.
· Verify operational readiness prior to project deployments.
· Identify process gaps and implement process improvements to increase operational efficiency across production systems.
· Participate in the development of tools, systems and processes aimed at improving product supportability and overall support productivity.
· Work with different groups to develop and improve monitors for covered products and systems.
· Work with operations teams to ensure applications and services are highly available and reliable.
· Participate in a 24/7 on-call rotation and handle emergency support as needed.

Required Skills/Competencies:
· Strong skills in at least one programming language (C, Java, RPG, JavaScript, etc.).
· Understanding of relational database concepts and strong SQL skills.
· Able to formulate, communicate, and implement technical solutions.
· Strong debugging and problem-solving skills.
· End-to-end troubleshooting across the tech stack.

Preferred Skills:
· General iSeries (AS/400) experience
· RPG language skills (a major plus)
· Knowledge of some or all of the Spark, Mesos, Akka, Cassandra, and Kafka platforms
· NoSQL experience
· Server and cloud technologies (AWS)
· Microservices
· SOAP and RESTful web services
· MVC and mobile frameworks
· Linux/Unix basics
· Java 7+ and J2EE
· Microsoft SQL Server, JDBC, DB2, Oracle
· WebSphere Application Server (WAS) 7.0/8.0, Apache Tomcat
· Knowledge of Agile development practices
· Incident and ticketing tools (Remedy, ServiceNow, PagerDuty)
· Defect and development tracking tools (Jira, ALM)

Preferred Education and Experience:
· MS or BS in Computer Science
· 3-5 years of experience in production support or in development of web applications
· Demonstrated strong knowledge and capabilities within a specific area of responsibility