Cloud/DevOps Production Support Specialist Resume
Plano, TX
SUMMARY
- Qualified professional with 5+ years of experience in DevOps, AWS cloud computing, VMware, Linux administration, and IT production support.
- Familiar with all phases of the software development life cycle (SDLC), including analysis, planning, development, testing, implementation, monitoring, and post-production analysis.
- Experience in UNIX and Windows environments, with expertise in several Linux distributions including Red Hat, CentOS, and Ubuntu.
- Good experience with DevOps tools such as Ant, Maven, Chef, Vagrant, VirtualBox, Jenkins, and Docker.
- Experience working with version control systems and platforms such as SVN, Git, GitHub, and GitLab.
- Proficient with the Amazon Web Services (AWS) platform and services such as EC2, S3, RDS, Route 53, EBS, ELB, SNS, Auto Scaling, AMI, IAM, CloudWatch, VPC, and CloudFormation, through both the AWS Console and the AWS CLI.
- Installed the AWS CLI to control various AWS services through shell/Bash scripting.
- Experienced in implementing software configuration management processes on projects, including setting up and supporting continuous integration.
- Experienced in configuring Puppet to perform automated deployments; expert in user management and plugin management for Puppet.
- Architected, planned, developed, and maintained infrastructure as code with Puppet, delivered through CI/CD deployments.
- Proficient with container systems such as Docker and container orchestration platforms such as Amazon EC2 Container Service (ECS) and Kubernetes; also worked with Terraform.
- Expert in object-oriented programming (OOP) concepts.
- Good understanding of the principles and best practices of software configuration management (SCM) in Agile, Scrum, and Waterfall methodologies.
- Experienced in configuring and administering tools such as Jenkins: setting up and configuring projects, defining scheduling policies, installing master/slave agents, and performing server upgrades.
- Experienced with database technologies such as SQL and MySQL, as well as NoSQL databases such as MongoDB.
- Expertise in installing, configuring, and managing Nexus Repository Manager and all of its repositories.
- Experience using Tomcat, JBoss, and WebSphere application servers for deployments.
- A highly motivated team player who works efficiently to deliver products on schedule.
- Experienced in working with customers to design and configure Jira and Confluence applications.
- Experience supporting multiple applications with a focus on 24x7 incident management, including troubleshooting, triage leadership (driving incident resolution calls), escalation, and execution of scheduled activities and rehydrations.
- Analyze production issues to determine root cause and provide fix recommendations to the development team.
- Support applications in production: note interruptions or bugs in operation and perform problem-solving to identify the cause and ensure continued use of the application.
- Create, develop, and track solutions to reported application errors.
- Assist with troubleshooting production incidents that require detailed analysis of issues in web and desktop applications, Autosys batch jobs, and databases (both relational and VSAM on the mainframe), and with resolving issues in current applications, providing assistance to the development team.
- Provide ongoing internal reporting of performance measures and service levels to continually improve service quality and customer satisfaction.
- Manage and coordinate hotfix and maintenance releases.
- Provide support to the business for day-to-day activities and ad hoc requests.
- Develop, implement, and improve the application production support knowledge management repositories to ensure that everything is documented, processes and procedures are clear, and periodic reviews are conducted.
- Working knowledge of Nagios monitoring software for remote and onsite devices, both physical and virtual.
- Experienced with monitoring dashboard tools such as Splunk, Kibana, and Datadog.
- Installed, configured, and managed monitoring tools for resource monitoring, network, and log trace servers.
- Experience setting up Nagios, Splunk, New Relic, and ELK monitoring on both Linux and Windows systems.
- Knowledgeable in providing technical and operational support for multiple solutions.
- Involved in administering, configuring, and troubleshooting services such as DNS, DHCP, NFS, LDAP, Apache Web Server, SSH, and package management environments.
- Good understanding of ticketing systems such as ServiceNow and HP Service Manager, and of the change management process workflow.
- Assist with planning and testing application, configuration, and database changes and with installing upgrades and patches; update production support documentation.
- Excellent verbal and writing skills, with expertise in documenting infrastructure standards and diagrams.
TECHNICAL SKILLS
Operating Systems: Linux, Red Hat, Ubuntu, Windows
Cloud Services: Amazon Web Services, Azure, Google Cloud Platform
Source Control/Versioning: SVN, Git
Web/Application Servers: WebLogic, Apache Tomcat, WebSphere, JBoss
Build Tools: Maven, Ant, MSBuild
CI Tools: Hudson, Jenkins
Automation Tools: Chef, Puppet, Ansible, APLUS, UrbanCode Deploy
Container Tools: Docker, Docker Compose, Kubernetes
Databases: MySQL, SQL Server, NoSQL, MongoDB
Programming: Python, Java, C#, Bash, Shell Scripting
Monitoring Tools: Splunk, Kibana, Nagios XI, Zabbix, Datadog, AppDynamics, CloudWatch, ELK, New Relic
PROFESSIONAL EXPERIENCE
Cloud/DevOps Production Support Specialist
Confidential, Plano, TX
Responsibilities:
- Working knowledge of public cloud providers (AWS/EC2, Google Cloud Platform) and their technology offerings, APIs, and enterprise integration points.
- Continuously monitor on-premises servers and servers hosted on AWS to detect and investigate problems before they impact the business.
- Leading and supporting production system deployment while ensuring SLAs are met.
- Manage troubleshooting and recovery of complex production incidents ranging from low to critical impact.
- Handle business-as-usual (BAU) work, which consists of monitoring, troubleshooting, validation, running SQL and Splunk queries, and working with AWS services.
- Support the software and systems behind Capital One’s external and internal customer-facing services, with an ever-watchful eye on their availability, latency, performance, and capacity.
- Run technical bridges to understand problems and their root causes; troubleshoot issues by reproducing them in the testing environment and review them periodically.
- Bring the relevant teams onto the call while the incident bridge is running, perform technical steps together with them, and narrow down the cause of the issue.
- Automate day-to-day activities using Python: write new scripts, modify and update existing ones, and push changes to GitHub for merge approval.
- Debug and refactor Python code to fix errors before deploying to production.
- Create and set up Splunk dashboards to monitor logs generated by systems and applications, and build charts, gauges, and maps within Splunk to visualize the logs.
- Experience working with other monitoring tools such as New Relic, Grafana, OneDash, Kibana, ELK, Datadog, and Zabbix.
- Experience monitoring EC2 instances using Nagios.
- Integrated PagerDuty with monitoring tools to push alerts, and integrated Slack with PagerDuty for notifications.
- Create, manage, and use technical procedural documentation in Confluence.
- Work within ITIL standards in structured, formal environments.
- Actively participate in the team's Agile stories to streamline and enhance day-to-day operations.
- Lead the team in designing, writing, and delivering technical and process automations that improve the availability, scalability, latency, and efficiency of Capital One’s services.
- Troubleshoot networking problems, drawing on in-depth knowledge of network theory, including networking protocols (TCP/IP, UDP), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
- Good understanding of ticketing systems such as ServiceNow and HP Service Manager, and of the change management process workflow.
- Maintain and manage AWS resources: create and manage EC2 instances, use CloudFormation templates to spin up resources, monitor resources with CloudWatch, create VPCs and deploy resources into them, and run technical meetings to mitigate AWS issues.
- Troubleshoot Jenkins issues such as build failures, connectivity issues from master to slave nodes, Docker image pull issues, plugin failures, input request issues, and Jenkinsfile webhook problems; work with the Jenkins team to resolve these issues and provide technical support for Jenkins.
- Configured MySQL replication as part of a high-availability (HA) solution.
- Contributed to database design, schema creation, data migration, and HA.
- Manage MySQL processes in the Linux environment, including security management and query optimization.
- Experience writing Dockerfiles to build custom Docker images, Docker Compose files to deploy multi-container applications, and using Docker Swarm to orchestrate and manage large numbers of containers.
- Experience working with Docker Hub as a public registry to upload custom-built images.
- Experience working with a private Docker registry in JFrog Artifactory to push and pull images.
- Dockerized multiple applications written in Python and Go; the applications run in custom-built Docker images, with container restart policies configured for high availability in case of failures or restarts.
Environment: AWS (EC2, CloudFormation, VPC, RDS, ELB, S3, Route 53, Elastic Beanstalk, Lambda, SNS, SES, CloudWatch), Terraform, Azure (Compute, Web & Mobile, Blobs, Resource Groups, Azure SQL, Cloud Services, ARM), Puppet, UNIX, JSON, Jenkins, Docker, Kubernetes, Git, Jira, Splunk, New Relic, Datadog, Kibana, IBM MQ, ServiceNow.
DevOps Engineer / Production Support
Confidential, Coppell, TX
Responsibilities:
- Experience designing, deploying, and monitoring AWS solutions using EC2, S3, EBS, Elastic Load Balancing (ELB), and Auto Scaling groups.
- Troubleshot and monitored various applications using CloudWatch, both in the Amazon Web Services (AWS) environment and on servers hosted on premises.
- Performed Linux and Windows operating system, disk management, and patch management configurations in the AWS environment.
- Maintained the Git repository and assisted developers with establishing and applying appropriate branching and merging conventions in Git.
- Developed build and deployment scripts using Ant as the build tool in Jenkins to promote builds from one environment to the next.
- Created virtual machine clusters and load balancing for VMs using PowerShell in SQL and Windows environments.
- Wrote PowerShell scripts to automate application deployment and data collection.
- Wrote, debugged, and automated PowerShell scripts to reduce manual administration tasks and streamline cloud deployments.
- Used the configuration management tool Puppet and created modules using manifests to automate system operations.
- Responsible for building and deploying the application in Jenkins using shell scripting automation on Linux servers; created Jenkins pipelines to build and validate code in lower environments.
- Worked on WCBD (WebSphere Commerce Build & Deployment) using Jenkins. Configured and maintained Jenkins to implement the CI process and integrated the tool with Ant to schedule the builds.
- Enabled continuous delivery through Jenkins deployments to Test, QA, Stress, and Production environments.
- Worked with developers and the Enterprise Configuration Management team to decide on changes to best practices and tools that eliminate inefficient practices and bottlenecks.
- Implemented project migration and portfolio migration tasks.
- Installed WebSphere applications and spearheaded deployment activities.
- Experience deploying and maintaining multi-container applications through Docker.
- Worked with Docker components such as Docker Engine, Docker Hub, Docker Machine, and Docker Registry, and created Docker images.
- Experience with blue-green deployment, which reduces downtime and risk by running two identical production environments.
- Worked on Docker container snapshots, attaching to running containers, removing images, managing directory structures, and managing containers.
- Managed and monitored the server and network infrastructure using Splunk.
- Installed Splunk on production servers for logging, built Splunk dashboards for application monitoring, and configured alerts for operational purposes.
- Used a Splunk data connector between Splunk Enterprise and relational databases.
- Worked with development/testing, deployment, systems/infrastructure and project teams to ensure continuous operation of build and test systems.
- Implemented and maintained assigned enterprise infrastructure systems to ensure successful deployment and 24x7 operational support; supported systems integration testing and user acceptance testing.
- Maintained environments by applying necessary fix packs/feature packs.
- Monitored application server performance and responded appropriately.
- Worked with DBAs to implement required DB2 configurations and data.
- Managed all bugs and changes to the production environment using the Jira tracking tool.
Environment: AWS (EC2, CloudFormation, VPC, RDS, ELB, S3, Route 53, Elastic Beanstalk, Lambda, SNS, SES, CloudWatch), Terraform, Puppet, JSON, Jenkins, Ansible, Docker, Kubernetes, Git, Jira, Splunk, ServiceNow.
DevOps Engineer
Confidential
Responsibilities:
- Provided Configuration Management and Build support for applications, built and deployed to the production and lower environments.
- Defined and Implemented Configuration Management and Release Management Processes, Policies and Procedures.
- Analyzed and resolved compilation and deployment errors related to code development, branching, merging, and building of source code.
- Designed and Provisioned Public Cloud Infrastructure using services like VPC, ELB, EC2 and RDS instances.
- Performed data dumps and provided system administration support for a client with 120 instances hosted in the AWS cloud environment.
- Set up and built AWS infrastructure using resources such as VPC, EC2, S3, IAM, EBS, security groups, Auto Scaling, and RDS, defined in CloudFormation YAML templates.
- Automated public cloud infrastructure monitoring using Lambda and scheduled CloudWatch Events.
- Implemented a build framework for new projects using Jenkins and Maven, integrated Docker builds into the continuous integration process, and deployed a local Docker registry server.
- Used Coded UI Builder to identify the Web elements in the application.
- Implemented a continuous delivery framework using Jenkins.
- Created and maintained Subversion repositories, branches, and tags, and administered SVN.
- Acted as an integrator for performing merge, rebase and baseline operations.
- Developed applications in Python for multiple platforms.
- Edited existing Ant/Maven build files in response to errors or changes in project requirements.
- Performed periodic system audits on all environments to ensure that only controlled versions of software reside in them.
- Coordinated with and assisted developers in establishing and applying appropriate branching and labeling/naming conventions using Subversion source control.
- Created and maintained continuous build and continuous integration environments in Scrum and Agile projects.
- Used scripts to replicate production build environments on local dev boxes using Vagrant and VirtualBox.
- Troubleshot the automation of installing and configuring applications in the testing and production environments.
- Deployed code on WebSphere application servers for Production, QA, and Development environments.
- Performed performance tuning of WebSphere Application Server, including JVM, garbage collection, and JDBC, and analyzed its server logs.
- Involved in sprints and planned releases with the team using JIRA and Confluence.
Environment: Maven, Java, SVN, Ant, CruiseControl, Tomcat, Eclipse, Linux, Windows.