Senior Linux System Administrator Resume Profile

SUMMARY:

  • Red Hat Certified Engineer (RHCE), Linux Professional Institute Certified (LPI), Certified Ethical Hacker (CEH)
  • Extensive experience architecting large-scale performance monitoring solutions with Nagios 1.x, 2.x, and 3.x, up to 10k hosts and 100k services per monitoring instance.
  • Expert in correlating seemingly unrelated abstract data through standard performance monitoring tools to quickly determine root causes of common system issues.
  • Author of an eight-level monitoring severity matrix, built to be compatible with any environment, big or small, be it 100 servers or 1 million servers.
  • Expert in open source performance monitoring using Nagios, NRPE, Cacti, and SNMP polls/traps.
  • Performance troubleshooting with sar, vmstat, iostat, tcpdump, aide, traceroute, iperf, hdparm, netperf, pathrate, tcptrace, pmap, iptraf, bonnie, dbench, fstress, wireshark, netstat, top, and esxtop.
  • Extensive experience profiling and performance monitoring different layers of the LAMP stack, such as Apache, MySQL, PHP, Perl, and Python.
  • Hands-on knowledge of RHEL ES/AS 2, 3, 4, and 5, Gentoo Linux, and Solaris 2.6, 5.8, 5.9, and 5.10.
  • Performance tuning through various load-testing tools: memslapd, siege, and JMeter.
  • Extensive familiarity with the Bash and Bourne shells.
  • Good familiarity with tuning GNU/Linux internals through different kernel parameters.
  • Automation experience with CFEngine, Puppet, Chef, and Cobbler.

EXPERIENCE:

CONFIDENTIAL

Linux Engineer DevOps

  • Created a Nagios instance to monitor the production provider portal, which gave the business deep insight into the overall health of the production environment.
  • Responsible for the uptime and maintenance of 200 Dev/QA/Staging and Production VMware servers.
  • Worked with developers and the business to enable a saner, standards-based agile release environment averaging about 10 releases a day to the JBoss QA environment.
  • Coordinated with developers through every phase of the development cycle in deploying JBoss artifacts to the production environment.
  • Assisted a team member in creating an automated WAR deployment web app to enable offshore resources to deploy JBoss releases during off hours.

CONFIDENTIAL

Infrastructure Engineer DevOps

  • Responsible for the uptime of a distributed Splunk instance which indexed 900 GB of data per day.
  • Automated the onboarding of PCI-compliant data to the company's internal Splunk instance using the Splunk Python SDK.
  • Created capacity planning graphs using reverse polish notation to predict when resources would become exhausted.
  • With no access to cron itself, automated the scheduling of tasks through the cron-like program AutoSys to ensure Splunk instance uptime.
  • Assisted in the strategy, deployment, and implementation of the Splunk Python software development kit.
  • Created a conduit between the IBM Netcool Perl API and Splunk to allow Splunk to send alerts to Netcool.
  • Crafted inputs for Splunk using Python to parse the JIRA API.
  • Worked with customers to help visualize mission-critical data originating from Red Hat, Solaris, and Windows production servers through the creation of custom Splunk dashboards.
  • Crafted and optimized numerous regular expressions while onboarding customer data to the firm's Splunk instance.

CONFIDENTIAL

Systems Engineer DevOps

  • Responsible for systems release management of proprietary company tools worldwide using puppet.
  • Created Python scripts to interact with BOFA's custom database model to deliver an accurate server inventory daily.
  • Created a Perl script to parse various performance metrics from an ESXi host, which were then made available through a web service to higher-level tools for the creation of heatmaps, capacity ceilings, and inventory capacity management.
  • Using Nagios as a transport layer, architected a performance graphing system that scaled horizontally in a linear fashion.
  • Utilized Vagrant to collaborate with developers to ensure consistent and speedy deployment of virtual machine builds.
  • Tasked with creating custom plugins to monitor the health of Puppet clients as well as the puppetmaster server.
  • Used Perl and the VMware API to facilitate common virtual machine operations within VMware Virtual Center.

CONFIDENTIAL

Systems Engineer

  • Creation of a version control system for Groundwork
  • Created numerous PHPWeathermaps to help visualize interesting traffic through the identification of choke points on the customer's network
  • Built the first version control system for a monitoring system to feature LUKS encryption to ensure protection of intellectual property.
  • Creation of custom reports for predictive analysis of various metrics being monitored on customer's sites.
  • Responsible for coordinating with project managers and network engineers to ensure uniformity of the devices being monitored.
  • Built RRD graphs to aid in customer capacity planning using rrdtool's Reverse Polish Notation (RPN) and the method of least squares.

CONFIDENTIAL

Senior Linux System Administrator Team Lead

  • Built a Nagios monitoring instance from scratch called NJN (NJN is Not Just Nagios), which monitors the entire company infrastructure (900 hosts and 15,000 services) with a 4-second average check latency.
  • Patched nagios to be able to multiplex active checks with nrpe, to allow for up to 10k hosts and 100k services from one Linux machine.
  • Nagios instance had 99.97% availability for the year 2010.
  • Created an 8-level severity matrix for the Nagios instance, assigning unique values and SLAs to each level so that users could quickly judge how fast to respond to each alert based on its severity.
  • Created numerous event handlers to display troubleshooting information through the use of sar and top when an application would register a soft or hard state alarm.
  • Displayed 15k graphs at a 1-minute resolution through the use of asynchronous writes to RRD graphs.
  • Wrote 100 custom Bash/Perl/Python/Ruby health checks for VMware vSphere, CentOS 3.x-5.x, VMware ESXi, and core network infrastructure.
  • Wrote a Bash shell script to determine the type of server and display ASCII graphs on the command line of KPIs such as disk I/O, context switching, processes being swapped in and out, and CPU usage.
  • Responsible for the availability of 100 VMware ESX 4.1 servers and their 700 guest VMs.
  • Created multi-line RRD graphs to measure the response time of Apache servers running Advance's custom front-end application, SSF.
  • Responsible for maintaining the uptime of Advance Internet's affiliate newspaper sites which take in 300 million hits monthly.
  • Tuned Apache web servers based on the amount of context switching observed.
  • Created Python scripts to monitor various statistics about a Citrix NetScaler through its SOAP API.
  • Used JMX to monitor Java application servers (Resin and Tomcat) for generic metrics such as GC frequency and heap memory usage.
  • Worked with developers to create a conduit through XML-RPC to embed links in Nagios alarms pointing to the corresponding Confluence wiki pages.

CONFIDENTIAL

Senior Linux System Administrator

  • Responsible for monitoring all servers, from Development and QA to Production.
  • Created Cacti graphs of the entire environment, from the CSS 10503 load balancer to the development servers; over 1, 5000 graphs in all.
  • Created custom cacti graphs to provide granular analysis of company revenue for the previous 15 minutes, through bash shell scripting and a mysql query.
  • Leveraged CFengine to provide system automation and uniformity throughout ad marketplace's infrastructure.
  • Used siege for performance optimization of the company's main Tomcat cluster, consisting of 25 Tomcat servers, which received on average 300 million HTTP requests a day.
  • Built a Nagios solution from scratch with over 800 metrics for 60 CentOS 4.x and 6 CentOS 5.x servers.
  • Doubled performance of Tomcat 5.x servers through Java heap memory tuning as well as Linux kernel parameter tuning.
  • Implemented a logging schema which enabled centralized logging of over 2 terabytes of tomcat logs.
  • Responsible for the research, purchasing, and deployment of a complete server and network migration for ad marketplace's infrastructure when it was decided an upgrade of the existing datacenter partner was necessary.

CONFIDENTIAL

Linux System Administrator

  • Responsible for maintaining uptime and stability for three different production environments, Dada USA, UPOC Networks, and a joint venture between Dada Entertainment and Sony.
  • Architected a distributed Nagios monitoring system for Dada USA, spanning its Italian, US, and Chinese Nagios branches.
  • Implemented a nagios instance from scratch on a gentoo Linux server for Dada USA which included over 1,110 monitoring metrics for 85 gentoo servers.
  • Implemented a nagios instance from scratch on a Solaris 5.10 server for UPOC Networks which included over 900 monitoring metrics for 110 Solaris servers.
  • Created an office Nagios instance to monitor QA, Staging, and Development servers, as well as other office peripherals.
  • Implemented two cacti servers for graphing various metrics, one for Dada USA, one for UPOC, for a combined 195 servers featuring 2,500 performance graphs.
  • Wrote bash scripts to automate nagios service check creation through usage of an snmp index as an array to monitor production SMS binds to major cell phone carriers.
  • Installed a GForge server on CentOS 5.0 for bug tracking and collaboration between developers and the operations department.
  • Created various event handlers to restart production services in case a critical failure was detected.
  • Led the effort, in conjunction with developers, to create a more uniform documentation schema using the MediaWiki wiki platform for more efficient troubleshooting.
  • Created a Visio diagram of how SMS is aggregated through UPOC's online SMS social community.
  • Created a custom Perl script to monitor if the SRP connection was ever lost between UPOC's BES server and the RIM network.

CONFIDENTIAL

Linux System Administrator IP Protection

  • Responsible for administration of over 700 gentoo production Linux servers.
  • Developed bash shell scripts for automating troubleshooting and repair of production servers.
  • Responsible for wiring, configuring, and networking 250 production servers during a planned move in the data center.
  • Rebuilt the production gentoo image from scratch and implemented on all 700 gentoo production servers.
  • Implemented Bugzilla on a Windows 2003 server.
  • In charge of extensive technical documentation pertaining to troubleshooting Overpeer's custom application.
  • Created a customized Linux live CD to further automate troubleshooting of downed production boxes.
  • Implemented distributed Nagios with active and passive monitoring of over 100 production metrics through a mixture of SSH, SNMP, NRPE, and NSCA.
