• Monitor multiple cloud services.
• Monitor any infrastructure-related issue and resolve it at the earliest opportunity.
• Incident management.
• Involve the required teams in PTB/TB/SWAT, direct activities, and provide next steps towards resolution whenever needed.
• Regularly provide status updates to stakeholders on chat as well as on conference bridges.
• Document the PTB/TB/SWAT details in the Incident Reports Database.
• Create, review, and modify the runbooks and situation documents, which include the technical solutions as well as the operational processes.
Job Description
• The person will work within the 24x7 Global Operations Control Center (OCC). Typical services include BSS, IBM Verse, Box Relay, Watson Workspace, WISPR, and DAC.
• Analytical, consulting, and communication skills.
• Linux, Cloud, and DevOps.
• Linux and shell scripting.
• Self-motivated individual who can work independently to identify and develop the alerts required for effective monitoring.
• These resources will monitor services and alerts and resolve any issues that arise.
• If the OCC is unable to resolve an issue, it engages the relevant teams to address it.
• Update all stakeholders regularly.
Key Technical Skill
Linux, Cloud and DevOps, Linux & Shell Scripting
Desired Skills
Candidates should know at least 4-5 of the following:
• AWS / SoftLayer • Cassandra • DB2 • MongoDB • Elasticsearch • PagerDuty • New Relic • Redis • Kafka • Messaging • SMTP • Microservices • Docker • Kubernetes • GitHub • Python • Node.js • AngularJS • Networking • F5 • Akamai • TCP/IP • LDAP • Jira • Grafana • Jenkins • Chef
• Ability to run conference calls and manage a service event
• Ability to troubleshoot
• Ability to track and monitor alerts
• Artistic skills in HTML / JS / CSS