Senior Hadoop Developer/Administrator Resume
Des Moines, Iowa
SUMMARY
- Over 8 years of experience in Information Technology, including Big Data and the Hadoop ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Oozie, Flume, and ZooKeeper.
- Cloudera Certified Administrator for Apache Hadoop (CCAH)
- 5 years of experience installing, configuring, and testing Hadoop ecosystem components.
- Experience writing MapReduce programs in Java.
- Hands-on experience with 5 TB of data covering data migration, preprocessing, validation, and analysis in HDFS.
- Experience managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance across different Hadoop distributions: Cloudera CDH and Hortonworks HDP.
- Strong knowledge of installing and administering the Hadoop ecosystem on multi-node clusters.
- Experience developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Good understanding of HDFS design, daemons, federation, and high availability (HA).
- Experience managing Hadoop clusters using Cloudera Manager and Apache Ambari.
- Experience importing and exporting data between relational database systems (RDBMS) and HDFS using Sqoop.
- Hands-on experience with Apache Maven as the build manager for Java projects.
- Knowledge of UNIX and shell scripting.
- Good knowledge of integrating various data sources such as RDBMS, spreadsheets, text files, and XML files.
- In-depth knowledge of object-oriented programming (OOP) concepts such as inheritance, polymorphism, exception handling, and templates, with development experience in Java technologies.
- Experienced in configuring workflow scheduling using Oozie.
- Developed Spark applications in Scala to ease the transition from existing Hadoop MapReduce workloads.
- Configured Spark Streaming in Scala to receive real-time data from Kafka and store the stream data in HDFS (a sketch of this pattern follows this summary).
- Used Flume extensively to gather and move log files from application servers to a central location in the Hadoop Distributed File System (HDFS).
- Designed and implemented a hybrid-cloud virtual data center on AWS providing servers, storage, networking, high availability, backup and disaster recovery, demand forecasting, capacity planning, and performance management.
- Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
- Planned and engineered both proof-of-concept (POC) and production clusters.
- Good experience optimizing MapReduce jobs using mappers, reducers, combiners, and partitioners to deliver the best results on large datasets.
- Used Apache Kafka to track data ingestion into the Hadoop cluster.
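The following minimal Scala sketch illustrates the Spark Streaming and Kafka ingestion pattern described above, assuming Spark 2.x with the spark-streaming-kafka-0-10 integration. The broker address, topic name, consumer group, and HDFS path are illustrative placeholders, not details taken from the engagements below.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// Minimal sketch: consume a Kafka topic and persist each micro-batch to HDFS.
object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",               // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weblog-ingest",               // placeholder consumer group
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("weblogs"), kafkaParams)
    )

    // Keep only the message payload and write each 30-second batch to HDFS.
    stream.map(_.value())
      .saveAsTextFiles("hdfs:///data/raw/weblogs/batch", "txt")

    ssc.start()
    ssc.awaitTermination()
  }
}
```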
TECHNICAL SKILLS
Operating systems: WINDOWS, LINUX (Fedora, CentOS), UNIX
Languages and Technologies: C, C++, Java, SQL, PL/SQL
Scripting Languages: Shell scripting
Databases: Oracle, MySQL, PostgreSQL
IDEs and Build Tools: Eclipse, NetBeans, SBT
Application Servers: Apache Tomcat, Apache HTTP Server
Hadoop Ecosystem: Hadoop MapReduce, HDFS, Flume, Sqoop, Hive, Pig, Oozie, Cloudera Manager, ZooKeeper, AWS EC2
Apache Spark: Spark, Spark SQL, Spark Streaming, Scala, Spark with Python (PySpark)
Cluster Mgmt. & Monitoring: Cloudera Manager, Hortonworks Ambari, Ganglia, and Nagios.
Security: Kerberos.
PROFESSIONAL EXPERIENCE
Confidential, Des Moines, Iowa
Senior Hadoop Developer/Administrator
Responsibilities:
- Installed, configured, and used Hadoop ecosystem components such as MapReduce, HDFS, Pig, Hive, Sqoop, and Flume.
- Implemented a nine-node CDH5 Hadoop cluster on Red Hat Linux.
- Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, data integration, and migration.
- Monitored workload, job performance, and capacity using Cloudera Manager.
- Worked in an AWS environment to develop and deploy custom Hadoop applications.
- Developed Pig UDFs to pre-process data for analysis.
- Implemented ETL/ELT processes with MapReduce, Pig, and Hive.
- Used Sqoop to import customer information from a MySQL database into HDFS for processing.
- Wrote Hive queries in Hive Query Language (HQL) to analyze data in the Hive warehouse.
- Used SequenceFile, Avro, and Parquet file formats; managed and reviewed Hadoop log files.
- Handled high volumes of data in the cluster.
- Collected log data from web servers and ingested it into HDFS using Apache Flume.
- Worked with Datameer versions 5 and 6.0.
- Tuned ETL jobs for better performance.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Commissioned and decommissioned nodes on the CDH5 Hadoop cluster on Red Hat Linux.
- Troubleshot Hue, which provides a GUI for developers and business users for day-to-day activities.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Used Oozie and ZooKeeper for workflow scheduling and monitoring.
- Created Hive managed and external tables defined with static and dynamic partitions.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Used Spark Streaming to receive web server log files.
- Integrated Apache Spark with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Developed multiple Kafka producers and consumers from scratch using the low-level and high-level APIs (a producer sketch follows this list).
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Migrated MapReduce programs to Spark transformations using Spark and Scala (a sketch of this pattern also follows this list).
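As a hedged illustration of the MapReduce-to-Spark migration work noted above, the Scala sketch below re-expresses a classic mapper/combiner/reducer aggregation as pair-RDD transformations and exposes the result through Spark SQL. The input path, column layout, and view name are assumptions made for the example only.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: a MapReduce-style count re-expressed as Spark transformations.
object OrdersByCustomer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-by-customer")
      .getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._

    // map + reduceByKey replaces the mapper/combiner/reducer chain.
    val counts = sc.textFile("hdfs:///data/raw/orders")   // placeholder input path
      .map(_.split(","))
      .filter(_.nonEmpty)
      .map(cols => (cols(0), 1L))                         // (customerId, 1)
      .reduceByKey(_ + _)

    // Cache in memory and expose the result to ad hoc Spark SQL queries.
    val df = counts.toDF("customer_id", "order_count").cache()
    df.createOrReplaceTempView("orders_by_customer")
    spark.sql(
      "SELECT customer_id, order_count FROM orders_by_customer ORDER BY order_count DESC LIMIT 10"
    ).show()

    spark.stop()
  }
}
```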
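The Kafka bullet above refers to the older low-level and high-level consumer APIs; as a simplified, hedged sketch, the example below uses the standard Java producer client from Scala instead. The broker address, topic, key, and payload are illustrative placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Minimal sketch: publish a clickstream event to a Kafka topic.
object ClickstreamProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")   // placeholder broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Keying by session id keeps events for one session on the same partition.
      producer.send(new ProducerRecord[String, String](
        "clickstream", "session-42", """{"page":"/home"}"""))
    } finally {
      producer.close()
    }
  }
}
```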
Environment: AWS, Hadoop, CDH4, CDH5, Cloudera Manager, Hortonworks Data Platform, MapReduce, Hive, Pig, Spark, Scala, Talend, SQL, Sqoop, Flume, and Eclipse.
Confidential, Seattle, WA
Hadoop Developer/Administrator
Responsibilities:
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
- Responsible for cluster maintenance and monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
- Wrote MapReduce jobs to cleanse data and copy it from our cluster to the AWS cluster.
- Added, installed, and removed components through Ambari.
- Monitored systems and services through the Ambari dashboard to keep the clusters available for the business.
- Analyzed data and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Applied Hive partitioning and bucketing concepts and designed both managed and external tables in Hive for optimized performance (illustrated in the sketch after this list).
- Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
- Worked with ETL workflows, analyzed big data, and loaded it into the Hadoop cluster.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Developed Spark applications using Scala.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Configured MySQL Database to store Hive metadata.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Responsible for loading unstructured data into the Hadoop Distributed File System (HDFS).
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Created tables and stored procedures for data manipulation and retrieval, and modified databases using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Worked on MapReduce joins to query multiple semi-structured datasets as required for analytics.
- Automated data extraction from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
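The Scala sketch below is a minimal illustration of the partitioned external and bucketed managed table design mentioned above, issued through Spark's Hive support. The table names, column names, bucket count, and HDFS location are assumptions made for the example, not details from this engagement.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: one partitioned external table and one bucketed managed table.
object HiveTableLayout {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-table-layout")
      .enableHiveSupport()
      .getOrCreate()

    // External table: Hive tracks only metadata; the data stays at the HDFS
    // location and is partitioned by date so queries can prune partitions.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        |  session_id STRING,
        |  url        STRING,
        |  ts         TIMESTAMP)
        |PARTITIONED BY (log_date STRING)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///data/warehouse/web_logs'""".stripMargin)

    // Managed table: written through the DataFrame API with bucketing on the join key.
    spark.table("staging_customer_profile")   // assumes a staging table already exists
      .write
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .saveAsTable("customer_profile")

    spark.stop()
  }
}
```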
Environment: Hadoop, Pig, Hive, Java, Sqoop, Kafka, HBase, NoSQL, Oracle, Spark, Scala, Storm, Elasticsearch, ZooKeeper, Oozie, Red Hat Linux, Tableau.
Confidential - Weehawken, NJ
Hadoop Developer/Administrator
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, HBase, ZooKeeper, and Sqoop.
- Extensively involved in installing and configuring the Cloudera Hadoop distribution: NameNode, Secondary NameNode, ResourceManager, NodeManagers, and DataNodes.
- Collected log data from web servers and ingested it into HDFS using Flume.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Worked with Kafka on a proof of concept for log processing on a distributed system.
- Developed a data pipeline using Flume, Sqoop, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Configured property files such as core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements.
- Worked with the Hue interface to query data.
- Automated system tasks using Puppet.
- Created Hive tables to store the processed results in a tabular format.
- Utilized cluster coordination services through ZooKeeper.
- Moved relational database data into Hive dynamic-partition tables via staging tables using Sqoop (see the sketch after this list).
- Involved in collecting metrics for Hadoop clusters using Ganglia and Nagios.
- Configured Sqoop and exported/imported data into HDFS.
- Configured NameNode high availability and NameNode federation.
- Loaded data from the UNIX local file system into HDFS.
- Managed and scheduled jobs on the Hadoop cluster using Oozie.
- Used Sqoop to import and export data between HDFS and relational databases.
- Performed data analysis by running Hive queries.
- Generated reports using the Tableau report designer.
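As a hedged sketch of the staging-table-to-dynamic-partition load mentioned above, the Scala example below shows the Hive dynamic-partition insert step through Spark's Hive support; the Sqoop import that populates the staging table is not shown, and the table and column names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: load a Hive dynamic-partition table from a staging table.
object DynamicPartitionLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-partition-load")
      .enableHiveSupport()
      .getOrCreate()

    // Target table partitioned by transaction date (illustrative schema).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS transactions (
        |  txn_id      BIGINT,
        |  customer_id BIGINT,
        |  amount      DOUBLE)
        |PARTITIONED BY (txn_date STRING)
        |STORED AS ORC""".stripMargin)

    // Let Hive derive partition values from the SELECT output.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Every distinct txn_date in the staging table becomes its own partition.
    spark.sql(
      """INSERT OVERWRITE TABLE transactions PARTITION (txn_date)
        |SELECT txn_id, customer_id, amount, txn_date
        |FROM transactions_staging""".stripMargin)

    spark.stop()
  }
}
```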
Environment: HDFS, Cloudera Manager, MapReduce, Spark, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Puppet, Tableau, and Java.
Confidential - Atlantic City, NJ
Hadoop/Java Developer
Responsibilities:
- Re-architected all the applications to use the latest infrastructure within a span of three months and helped the developers implement the changes successfully.
- Designed the Hadoop jobs that create product recommendations using collaborative filtering.
- Designed the COSA pretest utility framework using JSF MVC, JSF validation, tag libraries, and JSF backing beans.
- Integrated the Order Capture system with Sterling OMS using a JSON web service.
- Configured the ESB to transform the Order Capture XML into Sterling messages.
- Configured and implemented Jenkins, Maven, and Nexus for continuous integration.
- Mentored the team on and implemented test-driven development (TDD) strategies.
- Loaded the data from Oracle to HDFS (Hadoop) using Sqoop.
- Developed the data transformation scripts using Hive and MapReduce.
- Designed and developed user-defined functions (UDFs) for Hive (see the sketch after this list).
- Loaded data into HBase using bulk load and the HBase API.
- Designed and implemented the open API using Spring REST web services.
- Proposed the integration pipeline testing strategy using Cargo.
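The Hive UDF work above is illustrated with a minimal sketch below. The resume elsewhere notes Java UDFs; this version is written in Scala for consistency with the other sketches, and the masking logic, class name, jar path, and table name are purely illustrative assumptions rather than the actual UDF from this role.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Minimal sketch of a simple Hive UDF: mask the local part of an email address.
// The real UDF logic from this engagement is not described in the resume.
class MaskEmail extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) {
      null
    } else {
      val parts = input.toString.split("@", 2)
      if (parts.length != 2) input
      else new Text(parts(0).take(1) + "***@" + parts(1))
    }
  }
}

// After packaging the class into a jar (paths and names are placeholders):
//   ADD JAR hdfs:///user/hive/udfs/mask-email.jar;
//   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
//   SELECT mask_email(email) FROM customers LIMIT 10;
```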
Environment: Java, JSP, Spring, JSF, REST web services, IntelliJ, WebLogic, Subversion, Oracle, Hadoop, Sqoop, HBase, Hive, Sterling OMS, TDD, and Agile
Confidential
Linux Administrator
Responsibilities:
- Installed and configured Linux for new build environments.
- Installed and maintained Linux servers.
- Extensive use of Red Hat Enterprise Linux 5.x.
- Performed operating system backups and upgrades from RHEL 5.4 to 5.5/6.x.
- Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems.
- Monitored and troubleshot mission-critical Linux machines.
- Improved system performance by working with the development team to analyze, identify, and resolve issues quickly.
- Ensured data recovery by implementing system and application level backups.
- Performed various configurations including networking, iptables, hostname resolution, and passwordless SSH login.
- Managed disk file systems, server performance, user creation, file access permissions, and RAID configurations.
- Automated administration tasks through scripting and job scheduling with cron.
- Monitored system metrics and logs for problems.
- Added, removed, and updated user account information and reset passwords.
- Maintained the MySQL server and set up authentication for the required database users.
- Created and managed logical volumes using LVM.
- Installed and updated packages using yum.
- Supported pre-production and production support teams in analyzing critical services and assisted with maintenance operations.