Hadoop Admin/Dev Resume
PROFESSIONAL SUMMARY:
- US citizen with over 10 years of IT experience across the Finance, Banking, and Insurance domains, working with a wide range of technologies.
- Experience includes Big Data Hadoop with multiple distributions such as Cloudera, Hortonworks, and Apache, on platforms including VMware and cloud environments.
- Excellent understanding of Hadoop architecture and its underlying framework, including storage management in private and public clouds.
- Knowledge of multiple distributions/platforms (Apache, Cloudera, Hortonworks).
- Experienced in using various Hadoop ecosystem components such as MapReduce, HBase, Pig, Hive, MongoDB, Sqoop, and Flume.
- Worked with PySpark and Scala Spark for in-memory computing (a brief sketch follows this list).
- Knowledge of workflow scheduling and coordination tools such as Oozie and ZooKeeper.
- Expert in using Sqoop to fetch data from different RDBMS databases for analysis in HDFS.
- Developed MapReduce code per business requirements.
- Very good understanding of RDBMS, Informatica ETL, and data center technologies.
- Experienced in Java, Python, Ruby, XML, Scala, SQL, PL/SQL, and shell scripting.
- Configured Hadoop clusters in private and public clouds.
- Experienced with virtualization technologies; installed, configured, and administered VMware.
- Ability to learn and adapt quickly to emerging technology paradigms.
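A minimal PySpark sketch of the in-memory computing pattern mentioned above; it is illustrative only, and the path and column names (/data/raw/sales, region, amount) are hypothetical, not details from an actual engagement.

```python
# Minimal PySpark sketch: read raw data from HDFS, cache it in memory,
# and aggregate against the cached DataFrame.
# Path and column names (/data/raw/sales, region, amount) are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("in-memory-aggregation").getOrCreate()

raw = spark.read.option("header", True).csv("/data/raw/sales")
raw.cache()  # keep the DataFrame in memory so later actions avoid re-reading HDFS

totals = (raw.groupBy("region")
             .agg(F.sum(F.col("amount").cast("double")).alias("total_amount")))
totals.show()

spark.stop()
```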
TECHNICAL SKILLS:
Operating Systems: Windows Server 2012/2008/2005, UNIX/Linux, IBM Mainframe
Ecosystem: Hive, HBase, Pig, MongoDB, ZooKeeper, Oozie, Kafka, Sqoop, Flume, and Apache Spark
Methodologies: Agile/Scrum, Light Agile Development (LAD), Waterfall, Iterative
Database / DB Tools: Oracle, Microsoft SQL Server, MySQL, MongoDB
Languages: JavaScript, Python, Scala, Java, Ruby on Rails, SQL, XML
Cloud: AWS, VMware vCenter, private cloud, public cloud, hybrid cloud
Network Protocols: NFS, NTP, DNS, TCP/IP
Security: Active Directory, Kerberos, LDAP
Backup / Monitoring: Veeam, Splunk
PROFESSIONAL EXPERIENCE:
Hadoop Admin/Dev
Confidential
Responsibilities:
- Worked on setting up high availability for the major production cluster and designed automatic failover for the Cloudera cluster.
- Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW (an illustrative sketch follows this job entry).
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Developed MapReduce-style code for Hortonworks Spark in Python and Scala.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Prepared technical documentation of systems, processes and application logic for existing data sets.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Cloudera log files.
- Worked on YARN (MapReduce 2.0) in a cluster environment for interactive querying and parallel batch processing.
- Tested raw data and executed performance scripts.
- Analyzed and transformed data with Hive and Pig.
- Provided assistance for troubleshooting and resolution of problems relating to Hadoop jobs and custom applications.
- Assisted in the design, development, and architecture of the ecosystem and domain.
- Participated in installation, updating and maintenance of Cloudera software applications.
- Configured and maintained a multi-node cluster environment.
- Created and cloned Linux virtual machines and templates using the VMware Virtual Client.
- Added SAN storage using multipath and created physical volumes, volume groups, and logical volumes.
- Performed RPM and YUM package installations, patching, and other server management.
- Automated processes for applying new technologies as requirements changed.
- Configured access domains, required for Device Group and Template administrators.
- Configured Admin Role profiles, required when assigning a custom role to an administrator.
Environment: Cloudera Manager, MapReduce, HDFS, Hive, HBase, MongoDB, Java, Oracle, Pig, Sqoop, Oozie, Tableau, Apache Spark, Kafka, SparkR, MLlib, VMware vCenter, ESXi server.
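The parse / stage / partitioned-load pattern described in the bullets above can be illustrated with a small PySpark sketch; the table names (edw.stg_events, edw.events), the pipe-delimited record layout, and the event_date partition column are assumptions for illustration, not details from the project.

```python
# Illustrative PySpark sketch of the parse / stage / partitioned-load pattern
# described above. Table names (edw.stg_events, edw.events), the pipe-delimited
# layout, and the event_date partition column are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("edw-load").enableHiveSupport().getOrCreate()

raw = spark.read.text("/data/raw/events")  # one raw record per line

def parse(line):
    # assumed layout: event_id|event_date|payload
    event_id, event_date, payload = line.split("|", 2)
    return (event_id, event_date, payload)

parsed = (raw.rdd
             .map(lambda row: parse(row.value))
             .toDF(["event_id", "event_date", "payload"]))

parsed.write.mode("overwrite").saveAsTable("edw.stg_events")   # staging table
(parsed.write.mode("append")
       .partitionBy("event_date")
       .saveAsTable("edw.events"))                              # partitioned target table

spark.stop()
```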
Hadoop Admin/Dev
Confidential
Responsibilities:
- Installed and configured various components of the Hadoop ecosystem and maintained their integrity.
- Managed Hadoop clusters: setup, installation, monitoring, and maintenance.
- Planned production cluster hardware and software installation and coordinated with multiple teams to complete it.
- Designed, configured and managed the backup and disaster recovery for HDFS data.
- Commissioned DataNodes as data grew and decommissioned them when hardware degraded.
- Migrated data across clusters using DistCp.
- Created shell scripts for detecting and alerting on system problems.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance, and capacity planning using Ambari.
- Performed data analytics in Hive and exported the resulting metrics back to an Oracle database using Sqoop.
- Designed workflows by scheduling Hive processes for log file data streamed into HDFS using Flume and Kafka.
- Conducted root cause analysis and resolved production problems and data issues.
- Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.2.0 cluster.
- Implemented high availability and automatic failover infrastructure to overcome the single point of failure for the NameNode, using ZooKeeper services.
- Implemented the HDFS snapshot feature (see the sketch following this job entry).
- Performed a major upgrade of the production environment from HDP 1.3 to HDP 2.2.0.
- Worked with big data developers, designers, and scientists to troubleshoot MapReduce job failures and issues with Hive, Pig, and Flume.
- Configured custom interceptors in Flume agents for replicating and multiplexing data into multiple sinks.
- Administered Tableau Server, backing up reports and granting privileges to users.
- Worked on Tableau for generating reports on HDFS data.
- Installed Ambari on existing Hadoop cluster.
Environment: Hadoop, MapReduce, HDFS, Hive, HBase, MongoDB, Java, Oracle, Cloudera Manager, Pig, Sqoop, Oozie, Tableau.
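The snapshot and cross-cluster copy steps above use the standard hdfs and hadoop command-line tools; the sketch below simply wraps those commands in Python. The directory and NameNode addresses (/data/edw, nn1.example.com, nn2.example.com) are hypothetical.

```python
# Python wrapper around the standard HDFS/Hadoop CLI, illustrating the snapshot
# and DistCp steps described above. The directory and NameNode addresses
# (/data/edw, nn1.example.com, nn2.example.com) are hypothetical.
import subprocess
from datetime import date

def run(cmd):
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)

src_dir = "/data/edw"

# Allow snapshots on the source directory and create a dated snapshot.
run(["hdfs", "dfsadmin", "-allowSnapshot", src_dir])
run(["hdfs", "dfs", "-createSnapshot", src_dir, "edw-" + date.today().isoformat()])

# Copy the directory to the target cluster with DistCp, preserving file attributes.
run(["hadoop", "distcp", "-p",
     "hdfs://nn1.example.com:8020" + src_dir,
     "hdfs://nn2.example.com:8020" + src_dir])
```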
ETL Developer
Confidential, Falls Church, VA
Responsibilities:
- Modified and added edits in Informatica mappings, following the SDLC methodology.
- Created pseudocode based on client requirements, prepared the decision matrix, created test data, developed code, and performed testing; changes that passed testing were implemented in production.
- Prepared change code documents, technical peer review documents, functional demonstration documents, and all project-related deliverables.
- Created Pig scripts and retrofitted them into production.
- Validated transmission files and split out the valid files using AIX shell scripting and Informatica mappings (an illustrative sketch follows this job entry).
- Created and modified UNIX shell scripts and crontab entries to maintain automation and execute specific processes in the cycle run.
- Created mapping parameters to pass values into mappings and meet frequently changing business requirements.
- Created a reusable batch file to catalog the DB2 database.
- Loaded test data into the TED STG database and ran Informatica mappings to process the received records.
- Wrote SQL queries to verify test result data in the TED ODS database.
- Prepared test cases and test data and executed the test cases during unit, system, and user acceptance testing.
Environment: Informatica Power Center 9.5, IBM DB2, MS-Visio, Windows XP/2008, AIX 5.3, Business Objects, ERWin, Advanced Query Tool, Toad, Quest Central.
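The validate-and-split step in the ETL role above was implemented with AIX shell scripting and Informatica mappings; the following is only an equivalent sketch of the idea written in Python. The file names, pipe delimiter, and expected field count are hypothetical.

```python
# Illustrative Python sketch of the validate-and-split idea described above
# (the project itself used AIX shell scripting and Informatica mappings).
# File names, delimiter, and expected field count are hypothetical.
import csv

EXPECTED_FIELDS = 12      # assumed record layout
DELIMITER = "|"

with open("transmission.dat", newline="") as src, \
     open("valid.dat", "w", newline="") as good, \
     open("rejects.dat", "w", newline="") as bad:
    reader = csv.reader(src, delimiter=DELIMITER)
    good_writer = csv.writer(good, delimiter=DELIMITER)
    bad_writer = csv.writer(bad, delimiter=DELIMITER)
    for record in reader:
        # a record is valid when it has the expected field count and a non-empty key
        if len(record) == EXPECTED_FIELDS and record[0].strip():
            good_writer.writerow(record)
        else:
            bad_writer.writerow(record)
```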