Hortonworks DataFlow and Data Platform (HDF and HDP) Engineer Resume
Sleepy Hollow, NY
SUMMARY:
Extensive experience in AWS Cloud and Hadoop administration, including installation and configuration of Apache and Hortonworks clusters with services such as HDFS, Spark, YARN, MapReduce2, HBase, Oozie, Hive, Ranger, Kafka, ZooKeeper, NiFi, Drill, etc., as well as cluster performance tuning. Amazon services include EC2, Route53, VPC, VPN, RDS, IAM, S3, ELB, CloudWatch, Auto Scaling, CloudFormation, SES, SNS, etc. Strong interpersonal skills; highly motivated, a fast learner, a good team player, and proactive in solving problems and delivering the best solutions.
PROFESSIONAL EXPERIENCE:
Confidential, Sleepy Hollow, NY
Hortonworks DataFlow and Data Platform (HDF and HDP) Engineer
Responsibilities:
- Review existing Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) clusters in AWS Cloud and perform NiFi performance tuning to improve performance and stability.
- Modify the AWS instance types of cluster nodes to meet the load of data flow processors. Add new DataNodes to the clusters to meet job, performance, and capacity-planning requirements.
- Provide support to developers with cluster-related issues; troubleshoot cluster problems and work with the network team to troubleshoot connectivity to other AWS service endpoints.
- Provision AWS instances using Ansible Tower and use Hortonworks Cloudbreak to build clusters on those instances.
- Responsible for the implementation and administration of the Hortonworks infrastructure.
- Document setup procedures, new parameters for optimal performance, and SOPs for cluster alerts and health checks. All documentation is edited and stored in Confluence.
- Set up alerts to monitor cluster health.
- Work with data delivery and processing teams to set up new users in Linux along with their Kerberos principals, and grant permissions to access services using Ranger (see the onboarding sketch after this list).
- Troubleshoot data flows to/from HDFS and S3 (see the DistCp sketch after this list), as well as failing processes and errors in Airflow.
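
A minimal sketch of the user-onboarding flow referenced above; the user name (jdoe), group, realm (EXAMPLE.COM), and keytab path are hypothetical placeholders:

    # Create the Linux account on the cluster nodes (hypothetical user/group)
    sudo useradd -m -G hadoop jdoe
    # Create the Kerberos principal and export a keytab (hypothetical realm/path)
    sudo kadmin.local -q "addprinc -randkey jdoe@EXAMPLE.COM"
    sudo kadmin.local -q "ktadd -k /etc/security/keytabs/jdoe.keytab jdoe@EXAMPLE.COM"
    sudo chown jdoe /etc/security/keytabs/jdoe.keytab
    # Service access (HDFS, Hive, Kafka, etc.) is then granted through Ranger
    # policies in the Ranger Admin UI rather than on the command line.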
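For HDFS/S3 troubleshooting, a quick way to reproduce a transfer is DistCp over the s3a connector; the NameNode host, bucket, and paths below are hypothetical:

    # Copy a directory from HDFS to S3 (s3a credentials come from
    # core-site.xml or the instance profile)
    hadoop distcp hdfs://namenode:8020/data/raw s3a://example-bucket/data/raw
    # Verify the copy landed
    hdfs dfs -ls s3a://example-bucket/data/raw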
Confidential, Lawrenceville, NJ
Senior AWS Infrastructure Engineer/Big Data System Administrator
Responsibilities:
- Create and edit CloudFormation templates to build Apache Hadoop, Apache Spark, Kafka/ZooKeeper, and Apache Drill clusters (see the deployment sketch after this list).
- Build ZooKeeper and Kafka clusters, create new topics (see the topic sketch after this list), and troubleshoot issues including data flow, capacity, etc. Work with developers to optimize performance.
- Manage Hadoop, Spark, Kafka, ZooKeeper, and Drill clusters in all environments: perform setup, administer the clusters, troubleshoot failed components, perform root-cause analysis, and implement corrective measures. Modify configuration files for Hadoop, Spark, and Kafka to improve performance and stability.
- Create an Auto Scaling group for Spark clusters: spin up additional Spark workers for nightly Spark jobs and spin them down after the jobs complete to reduce AWS costs (see the scheduled-scaling sketch after this list).
- Built a NameNode/ResourceManager HA cluster using Ambari server. Administer the Cloudera production cluster.
- Installed DataStax Cassandra clusters in AWS and modified Cassandra configuration files to improve performance. Upgrade DSE and OpsCenter, manage the clusters, and troubleshoot issues.
- Set up the Databricks Enterprise Platform environment and created a cross-account role in AWS for Databricks to provision Spark clusters.
- Created S3 buckets and configured and generated policies for the different environments (Dev, QA, Staging, and Production) in the Databricks platform. Added iam:PassRole to the cross-account role so Databricks Spark clusters can access the S3 buckets (see the policy sketch after this list).
- Deploy Spark jobs to the Databricks platform from corporate GitHub repositories using Python tools.
- Set up VPC peering so Databricks Spark clusters can receive streaming data from the Kafka clusters and access DataStax clusters in a different AWS production account (see the peering sketch after this list).
- Created users in the Databricks platform and assigned ACLs to them.
- Install and upgrade Tableau Server in AWS.
- Deploy Spark jobs to QA, Staging, and Production from corporate GitHub repositories.
- Set up and configure AWS VPCs and their components: subnets, internet gateways, security groups, and EC2 instances.
- Experienced in creating multiple VPCs with public and private subnets as required, distributed across the VPCs' availability zones.
- Created EBS volumes to store application files, attached and mounted to EC2 instances as needed.
- Experienced in creating RDS instances to serve application data in response to requests.
- Created snapshots to back up volumes, and AMIs to capture launch configurations of EC2 instances.
- Created NAT gateways and NAT instances to allow private instances to communicate with the internet, along with bastion hosts for administrative access.
- Experienced in installing the AWS CLI to control AWS services through shell/Bash scripting. Used IAM to create roles, users, and groups, and implemented MFA to provide additional security for the AWS account and its resources.
- AWS Cost Analysis & Control.
- Built virtual machines in Armor, and Hadoop/Spark clusters on VMware hosts (Dell servers) in the corporate datacenter for data containing PII.
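
A sketch of launching one of the CloudFormation-defined clusters from the CLI; the stack name, template file, and parameter values are hypothetical:

    # Create the stack and block until it finishes
    aws cloudformation create-stack \
      --stack-name spark-cluster-dev \
      --template-body file://spark-cluster.yaml \
      --parameters ParameterKey=InstanceType,ParameterValue=r4.2xlarge \
      --capabilities CAPABILITY_IAM
    aws cloudformation wait stack-create-complete --stack-name spark-cluster-dev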
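Topic creation as referenced above, using the ZooKeeper-based tooling of that Kafka era; host and topic names are hypothetical:

    # Create a topic with 6 partitions replicated across 3 brokers
    kafka-topics.sh --create --zookeeper zk1:2181 \
      --topic clickstream --partitions 6 --replication-factor 3
    # Confirm partition leaders and in-sync replicas
    kafka-topics.sh --describe --zookeeper zk1:2181 --topic clickstream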
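The nightly spin-up/spin-down can be expressed as scheduled Auto Scaling actions; the group name, schedule times, and capacities below are hypothetical:

    # Scale the Spark worker group up before the nightly jobs...
    aws autoscaling put-scheduled-update-group-action \
      --auto-scaling-group-name spark-workers \
      --scheduled-action-name nightly-scale-up \
      --recurrence "0 1 * * *" --desired-capacity 20
    # ...and back down once the jobs normally complete
    aws autoscaling put-scheduled-update-group-action \
      --auto-scaling-group-name spark-workers \
      --scheduled-action-name nightly-scale-down \
      --recurrence "0 6 * * *" --desired-capacity 2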
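A sketch of attaching iam:PassRole to the Databricks cross-account role; the account ID and role names are hypothetical placeholders:

    # passrole.json (hypothetical account ID and role ARN):
    # {
    #   "Version": "2012-10-17",
    #   "Statement": [{
    #     "Effect": "Allow",
    #     "Action": "iam:PassRole",
    #     "Resource": "arn:aws:iam::123456789012:role/databricks-s3-access"
    #   }]
    # }
    aws iam put-role-policy --role-name databricks-cross-account \
      --policy-name allow-passrole --policy-document file://passrole.json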
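VPC peering between the Databricks VPC and the production account, sketched with hypothetical VPC, peering, and account IDs; routes to the peer CIDRs must then be added to each VPC's route tables:

    # Request the peering connection from the Databricks-side VPC
    aws ec2 create-vpc-peering-connection \
      --vpc-id vpc-0aaa1111 --peer-vpc-id vpc-0bbb2222 \
      --peer-owner-id 123456789012
    # Accept it from the production account
    aws ec2 accept-vpc-peering-connection \
      --vpc-peering-connection-id pcx-0ccc3333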
Confidential, Poughkeepsie, NY
Senior Big Data and Analytics System Administrator/Linux System Administrator
Responsibilities:
- Use the xCAT toolkit to perform RHEL 6.5, RHEL 7.1, and RHEL 7.2 LE OS provisioning on Confidential POWER Systems and Tyan POWER systems for Hadoop clusters. Configure Ethernet bonding interfaces on all data nodes connected to 10 Gb or 40 Gb network switches. Test disk and network I/O throughput (see the sketch after this list). Install Confidential BigInsights 4.1 on the clusters and configure SSDs in each node as Spark scratch space. Perform performance tuning for Hadoop and Spark jobs.
- Install, set up, and administer Confidential BigInsights Hadoop clusters with Apache components including Ambari 2.1, HDFS, YARN, Spark, Flume, Kafka, HBase, Hive, Knox, Oozie, Pig, Solr, Sqoop, ZooKeeper, etc.
- Install, set up, and administer a data lake on a 1 PB HDFS Hadoop cluster; configure multi-tenancy (Capacity Scheduler) and permissions (including HDFS ACLs, sketched after this list) so multiple customers can submit YARN applications.
- Set up NameNode HA, ResourceManager HA, and multiple HBase Masters using the Hortonworks Ambari console.
- Perform cluster maintenance: add/remove nodes from Hadoop clusters, add/move Hadoop services, troubleshoot failed jobs, and create HDFS snapshots.
- Set up Spark environments and submit Spark TeraSort jobs for validation and performance tuning (see the spark-submit sketch after this list). Set up Spark master/workers and Spark on YARN in client mode.
- Create AWS VPCs and EC2 instances to run test cases. Set up NAT, IAM, AMIs, and S3 in AWS.
- Install and test Hortonworks HDP on Confidential Power systems.
- Evaluate the new POWER server line for IDE-HS (Confidential Data Engine for Hadoop and Spark): hardware and Hadoop performance tests. Suggest cluster configurations for IDE-HS to the Design and Management teams.
- Install Hortonworks on new server model and test performance.
- Upgrade Cloudera CDH and administer the cluster.
- Knowledge of Python scripting to handle and analyze data provided by customers. Install and set up Jupyter Notebook to increase productivity. Knowledge of WebHDFS, HBase, Pig, Hive, etc.
- Linux OS and Hadoop performance tuning: analyze performance data using nmon and Ganglia at the OS level, and tune YARN and MapReduce parameters at the Hadoop level.
- Administer PDOA v1.0 and v1.1 (PureData System for Operational Analytics) and work with the technical sales team to perform customer demos. Maintain and troubleshoot AIX 7.1 and Storwize V7000 in PDOA to keep the cluster in optimal condition.
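
A minimal sketch of the disk and network throughput checks run on new data nodes; the mount point, host name, and choice of iperf3 as the tool are assumptions:

    # Sequential write throughput on a data disk (hypothetical mount point),
    # bypassing the page cache for a truer device number
    dd if=/dev/zero of=/data01/ddtest bs=1M count=10240 oflag=direct
    # Network throughput over the bonded interface:
    iperf3 -s                   # run on the receiving node
    iperf3 -c datanode02 -P 4   # run on the sending node, 4 parallel streams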
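HDFS ACLs for multi-tenant access, as mentioned above; the paths, user, and group names are hypothetical:

    # Requires dfs.namenode.acls.enabled=true in hdfs-site.xml
    # Allow a tenant's analytics group read/execute on a shared dataset
    hdfs dfs -setfacl -m group:tenant1-analytics:r-x /data/shared
    # Give one user full access to a tenant directory and check the result
    hdfs dfs -setfacl -m user:alice:rwx /data/tenant1
    hdfs dfs -getfacl /data/tenant1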
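Submitting a TeraSort-style validation job on YARN in client mode, as referenced above; the jar, class name, sizing, and paths are hypothetical since the exact benchmark build varied:

    spark-submit --master yarn --deploy-mode client \
      --num-executors 20 --executor-cores 4 --executor-memory 8g \
      --class com.example.spark.TeraSort \
      spark-terasort.jar hdfs:///benchmark/terasort-in hdfs:///benchmark/terasort-out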
Confidential, Union, NJ
Senior Unix Consultant
Responsibilities:
- Designed, installed, and set up encrypted-POS servers integrated with Ab Initio in the corporate datacenter, and 4690 controllers in the stores. The encrypted-POS servers are Confidential p720s with AIX 7.1 and Confidential 4765 crypto cards.
- Administer the NIM server for the AIX environment to perform scheduled mksysb backups, new builds, AIX migrations (v5 to v6 and v6 to v7), and TL/SP upgrades. Upgrade firmware on p520 and p570 systems.
- Create and administer virtual RHEL machines on VMware ESXi 5.1.
- Install and configure RHEL Cluster servers for critical applications.
- Built a RHEL PXE server for new builds via network installation.
- Built a secondary HMC v7 connected to the p570s, p520s, and p720s for redundancy.
- Administer AIX 6.1/7.1 and RHEL 5.x/6.x servers with EMC SAN storage, running DB2, Oracle, WebSphere, SCI (Supply Chain Intelligence), i2, Confidential, Confidential, Ab Initio, Stibo, TIBCO, Teradata, etc.
- Administer Confidential Tivoli Workload Scheduler; add new users, jobs, and schedules.
- Administer Confidential Tivoli Storage Manager with a Confidential TS3500 tape library for AIX server, DB2, and Teradata backups. Manage offsite tapes for DR; create DR plans and run DR tests.
- Troubleshoot NetBackup backup issues on Linux and AIX. Create policies and add clients to them.
- Work with the EMC team on Data Domain storage and set up a NetBackup master server to perform Teradata, DB2, and AIX backups.
- Work with the application team to evaluate an Apache Hadoop cluster for corporate data.
Confidential, Bronx, NY
Senior AIX Consultant
Responsibilities:
- Accomplished projects involving P2V (physical-to-virtual) server moves.
- Planned datacenter moves of standalone physical servers to VIOS and LPARs.
- Planned the build-out of a DR datacenter at SunGard and worked with application teams on DR exercises.
- Accomplished storage data migrations across EMC Symmetrix, CLARiiON, VMAX, and Hitachi storage.
- Installation, configuration, administration, and maintenance of all v7 HMCs, with dual HMCs connected to all pSeries systems.
- Create and administer LPARs using HMC v7; reallocate CPU and memory among LPARs. Virtualize with VIOS servers: map virtual SCSI and virtual storage to VIO client LPARs and configure Shared Ethernet Adapters (SEA).
- Schedule weekly backups of HMC critical console data, VIO Server backups and system backups via NIM.
- Upgrade system firmware and AIX TLs; upgrade AIX from 5.3 to 6.1 using alt disk install to minimize downtime (see the sketch after this list). Set up NIM for AIX maintenance and upgrades on LPARs.
- Install, configure, and administer GPFS for the Oracle RAC environment.
- Administer Confidential servers (v5.2 and v5.4) for backup and restore of all AIX servers. Define new nodes, domain names, policies, and management classes. Define the tape library. Design and plan disaster recovery for the AIX environment in the datacenter.
- Administer and configure Red Hat and Oracle Linux servers for Oracle databases and applications.
- Create AIX network print queues so Oracle users can print from their application.
- Install and administrate VPSX print server.
- Install and set up PowerHA (formerly HACMP) for new systems; plan the migration from VCS clustering.
- Administer InterSystems Caché: manage 8 EMR systems (QuadraMed with Caché databases) for all NYC public hospitals. Daily tasks include creating users and roles, DB backups, integrity checks, running DataCheck to ensure the production and shadow databases stay in sync, monitoring, troubleshooting, and performing DR exercises between the two datacenters. Built a VPSX print server for secure prescription printing.
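
The alt disk install approach above, sketched with a hypothetical spare disk; the clone carries the upgrade while the original rootvg stays intact as a fallback:

    lspv                      # identify a free disk, e.g. hdisk1
    alt_disk_copy -d hdisk1   # clone the running rootvg to hdisk1 and point the bootlist at it
    # After applying the upgrade to the clone and rebooting, the system runs
    # from the new rootvg; booting back to the original disk reverts the change.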
Confidential
Consultant
Confidential, Tarrytown, NY
Responsibilities:
- Install, set up, and administer Windows 2003 cluster servers and manage accounts in Active Directory.
- Administer NetApp filers (NAS) running Data ONTAP 7.0.
- Design and implement Disaster Recovery process.
- Install, set up, and administer AIX 5.2 on pSeries. Duties include configuring and upgrading mirrored disks, configuring Gigabit EtherChannel, backup and restore, managing filesystems, user accounts, cron jobs, and paging space, setting up NFS, network troubleshooting, monitoring and performance tuning, and applying fix packs and updates.
- Administer Confidential 5.2.
- Duties included upgrading DLT tape drives to StorageTek 9940 tape drives in the StorageTek tape library, and building and executing a migration plan to move data from DLT to 9940 tapes.
- Administer Solaris systems running an in-house application for field engineers. Worked with field engineers to troubleshoot Solaris-based devices installed in hospitals and clinics.
Consultant
Confidential, East Fishkill, NY
Responsibilities:
- Administer and troubleshoot WebSphere Advanced Server, DB2, and Confidential HTTP Server. Duties: install, configure, and administer a test environment before production upgrades; coordinate with the development team on application deployments; use Log Analyzer and log files for troubleshooting and problem determination; perform WebSphere performance tuning; and monitor and administer WebSphere servers in a 24x7 production environment.
- Administer and troubleshoot 500+ servers across multiple OS platforms, including Confidential AIX with HACMP and SAN on pSeries with RAID, Red Hat Linux on xSeries with RAID, Windows 2000/NT Server on xSeries with RAID, and Sun Solaris, for an industry-leading Confidential 300mm chip fab facility and R&D center (24x7 production environment). Install the OS from CD and via NIM (AIX) and configure the environment. Confidential configuration and troubleshooting for both servers and clients.