Senior Hadoop Developer/Administrator Resume
Des Moines, Iowa
SUMMARY
- Over 8 years of experience in Information Technology, including Big Data and the Hadoop ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Oozie, Flume, and ZooKeeper.
- Cloudera Certified Administrator for Apache Hadoop (CCAH)
- 5 years of experience installing, configuring, and testing Hadoop ecosystem components.
- Experience writing MapReduce programs in Java.
- Hands-on experience with 5 TB of data covering data migration, preprocessing, validation, and analysis in HDFS.
- Experience managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance across different Hadoop distributions: Cloudera CDH and Hortonworks HDP.
- Strong knowledge of installing and administering the Hadoop ecosystem on multi-node clusters.
- Experience developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Good understanding of HDFS design, daemons, federation, and high availability (HA).
- Experience managing Hadoop clusters using Cloudera Manager and Apache Ambari.
- Experience importing and exporting data between relational database systems (RDBMS) and HDFS using Sqoop.
- Hands-on experience with Apache Maven as the build manager for Java projects.
- Knowledge of UNIX and shell scripting.
- Good knowledge of integrating various data sources such as RDBMS, spreadsheets, text files, and XML files.
- In-depth knowledge of object-oriented programming (OOP) concepts such as inheritance, polymorphism, exception handling, and templates, with development experience in Java technologies.
- Experienced in configuring workflow scheduling using Oozie.
- Developed Spark applications in Scala to ease the transition from existing Hadoop MapReduce workloads.
- Configured Spark Streaming in Scala to receive real-time data from Kafka and store the stream data in HDFS (a sketch of this pattern follows this summary).
- Used Flume extensively to gather and move log files from application servers to a central location in the Hadoop Distributed File System (HDFS).
- Designed and implemented a hybrid-cloud virtual data center on AWS providing servers, storage, networking, high availability, backup and disaster recovery, demand forecasting, capacity planning, and performance management.
- Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
- Planned and engineered both proof-of-concept (POC) and production clusters.
- Good experience optimizing MapReduce jobs using mappers, reducers, combiners, and partitioners to deliver the best results on large datasets.
- Used Apache Kafka to track data ingestion into the Hadoop cluster.
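The following minimal Scala sketch illustrates the Spark Streaming and Kafka ingestion pattern described above, assuming Spark 2.x with the spark-streaming-kafka-0-10 integration. The broker address, topic name, consumer group, and HDFS path are illustrative placeholders, not details taken from the engagements below.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// Minimal sketch: consume a Kafka topic and persist each micro-batch to HDFS.
object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",               // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weblog-ingest",               // placeholder consumer group
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("weblogs"), kafkaParams)
    )

    // Keep only the message payload and write each 30-second batch to HDFS.
    stream.map(_.value())
      .saveAsTextFiles("hdfs:///data/raw/weblogs/batch", "txt")

    ssc.start()
    ssc.awaitTermination()
  }
}
```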
TECHNICAL SKILLS
Operating systems: WINDOWS, LINUX (Fedora, CentOS), UNIX
Languages and Technologies: C, C++, Java, SQL, PL/SQL
Scripting Languages: Shell scripting
Databases: Oracle, MySQL, PostgreSQL
IDEs and Build Tools: Eclipse, NetBeans, SBT
Application Servers: Apache Tomcat, Apache HTTP Server
Hadoop Ecosystem: Hadoop MapReduce, HDFS, Flume, Sqoop, Hive, Pig, Oozie, Cloudera Manager, ZooKeeper, AWS EC2
Apache Spark: Spark, Spark SQL, Spark Streaming, Scala, Spark with Python (PySpark)
Cluster Mgmt. & Monitoring: Cloudera Manager, Hortonworks Ambari, Ganglia, and Nagios.
Security: Kerberos.
PROFESSIONAL EXPERIENCE
Confidential, Des Moines, Iowa
Senior Hadoop Developer/Administrator
Responsibilities:
- Installed, configured, and used Hadoop ecosystem components such as MapReduce, HDFS, Pig, Hive, Sqoop, and Flume.
- Implemented a nine-node CDH5 Hadoop cluster on Red Hat Linux.
- Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, data integration, and migration.
- Monitored workload, job performance, and capacity using Cloudera Manager.
- Worked in an AWS environment to develop and deploy custom Hadoop applications.
- Developed Pig UDFs to pre-process data for analysis.
- Implemented ETL/ELT processes with MapReduce, Pig, and Hive.
- Used Sqoop to import customer information from a MySQL database into HDFS for processing.
- Wrote Hive queries in Hive Query Language (HQL) to analyze data in the Hive warehouse.
- Used SequenceFile, Avro, and Parquet file formats; managed and reviewed Hadoop log files.
- Handled high volumes of data in the cluster.
- Collected log data from web servers and ingested it into HDFS using Apache Flume.
- Worked with Datameer versions 5 and 6.0.
- Tuned ETL jobs for better performance.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Commissioned and decommissioned nodes on the CDH5 Hadoop cluster on Red Hat Linux.
- Troubleshot Hue, which provides a GUI for developers and business users for day-to-day activities.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Used Oozie and ZooKeeper for workflow scheduling and monitoring.
- Created Hive managed and external tables defined with static and dynamic partitions.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Used Spark Streaming to receive web server log files.
- Integrated Apache Spark with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Developed multiple Kafka producers and consumers from scratch using the low-level and high-level APIs (a producer sketch follows this list).
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Migrated MapReduce programs to Spark transformations using Spark and Scala (a sketch of this pattern also follows this list).
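As a hedged illustration of the MapReduce-to-Spark migration work noted above, the Scala sketch below re-expresses a classic mapper/combiner/reducer aggregation as pair-RDD transformations and exposes the result through Spark SQL. The input path, column layout, and view name are assumptions made for the example only.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: a MapReduce-style count re-expressed as Spark transformations.
object OrdersByCustomer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-by-customer")
      .getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._

    // map + reduceByKey replaces the mapper/combiner/reducer chain.
    val counts = sc.textFile("hdfs:///data/raw/orders")   // placeholder input path
      .map(_.split(","))
      .filter(_.nonEmpty)
      .map(cols => (cols(0), 1L))                         // (customerId, 1)
      .reduceByKey(_ + _)

    // Cache in memory and expose the result to ad hoc Spark SQL queries.
    val df = counts.toDF("customer_id", "order_count").cache()
    df.createOrReplaceTempView("orders_by_customer")
    spark.sql(
      "SELECT customer_id, order_count FROM orders_by_customer ORDER BY order_count DESC LIMIT 10"
    ).show()

    spark.stop()
  }
}
```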
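The Kafka bullet above refers to the older low-level and high-level consumer APIs; as a simplified, hedged sketch, the example below uses the standard Java producer client from Scala instead. The broker address, topic, key, and payload are illustrative placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Minimal sketch: publish a clickstream event to a Kafka topic.
object ClickstreamProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")   // placeholder broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Keying by session id keeps events for one session on the same partition.
      producer.send(new ProducerRecord[String, String](
        "clickstream", "session-42", """{"page":"/home"}"""))
    } finally {
      producer.close()
    }
  }
}
```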
Environment: AWS, Hadoop, CDH4, CDH5, Cloudera Manager, Hortonworks Data Platform, MapReduce, Hive, Pig, Spark, Scala, Talend, SQL, Sqoop, Flume, and Eclipse.
Confidential, Seattle, WA
Hadoop Developer/Administrator
Responsibilities:
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
- Responsible for cluster maintenance and monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
- Wrote MapReduce jobs to cleanse data and copy it from our cluster to the AWS cluster.
- Added, installed, and removed components through Ambari.
- Monitored systems and services through the Ambari dashboard to keep the clusters available for the business.
- Analyzed data and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Applied Hive partitioning and bucketing concepts and designed both managed and external tables in Hive for optimized performance (illustrated in the sketch after this list).
- Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
- Worked with ETL workflows, analyzed big data, and loaded it into the Hadoop cluster.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Developed Spark applications using Scala.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Configured MySQL Database to store Hive metadata.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Responsible for loading unstructured data into the Hadoop Distributed File System (HDFS).
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Created tables and stored procedures for data manipulation and retrieval, and modified databases using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Worked on MapReduce joins to query multiple semi-structured datasets as required for analytics.
- Automated data extraction from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
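The Scala sketch below is a minimal illustration of the partitioned external and bucketed managed table design mentioned above, issued through Spark's Hive support. The table names, column names, bucket count, and HDFS location are assumptions made for the example, not details from this engagement.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: one partitioned external table and one bucketed managed table.
object HiveTableLayout {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-table-layout")
      .enableHiveSupport()
      .getOrCreate()

    // External table: Hive tracks only metadata; the data stays at the HDFS
    // location and is partitioned by date so queries can prune partitions.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        |  session_id STRING,
        |  url        STRING,
        |  ts         TIMESTAMP)
        |PARTITIONED BY (log_date STRING)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///data/warehouse/web_logs'""".stripMargin)

    // Managed table: written through the DataFrame API with bucketing on the join key.
    spark.table("staging_customer_profile")   // assumes a staging table already exists
      .write
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .saveAsTable("customer_profile")

    spark.stop()
  }
}
```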
Environment: Hadoop, Pig, Hive, Java, Sqoop, Kafka, HBase, NoSQL, Oracle, Spark, Scala, Storm, Elasticsearch, ZooKeeper, Oozie, Red Hat Linux, Tableau.
Confidential - Weehawken, NJ
Hadoop Developer/Administrator
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, HBase, ZooKeeper, and Sqoop.
- Extensively involved in installing and configuring the Cloudera Hadoop distribution: NameNode, Secondary NameNode, ResourceManager, NodeManagers, and DataNodes.
- Collected log data from web servers and ingested it into HDFS using Flume.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Worked with Kafka on a proof of concept for log processing on a distributed system.
- Developed a data pipeline using Flume, Sqoop, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Configured property files such as core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements.
- Worked with the Hue interface to query data.
- Automated system tasks using Puppet.
- Created Hive tables to store the processed results in a tabular format.
- Utilized cluster coordination services through ZooKeeper.
- Moved relational database data into Hive dynamic-partition tables via staging tables using Sqoop (see the sketch after this list).
- Involved in collecting metrics for Hadoop clusters using Ganglia and Nagios.
- Configured Sqoop and exported/imported data into HDFS.
- Configured NameNode high availability and NameNode federation.
- Loaded data from the UNIX local file system into HDFS.
- Managed and scheduled jobs on the Hadoop cluster using Oozie.
- Used Sqoop to import and export data between HDFS and relational databases.
- Performed data analysis by running Hive queries.
- Generated reports using the Tableau report designer.
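As a hedged sketch of the staging-table-to-dynamic-partition load mentioned above, the Scala example below shows the Hive dynamic-partition insert step through Spark's Hive support; the Sqoop import that populates the staging table is not shown, and the table and column names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: load a Hive dynamic-partition table from a staging table.
object DynamicPartitionLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-partition-load")
      .enableHiveSupport()
      .getOrCreate()

    // Target table partitioned by transaction date (illustrative schema).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS transactions (
        |  txn_id      BIGINT,
        |  customer_id BIGINT,
        |  amount      DOUBLE)
        |PARTITIONED BY (txn_date STRING)
        |STORED AS ORC""".stripMargin)

    // Let Hive derive partition values from the SELECT output.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Every distinct txn_date in the staging table becomes its own partition.
    spark.sql(
      """INSERT OVERWRITE TABLE transactions PARTITION (txn_date)
        |SELECT txn_id, customer_id, amount, txn_date
        |FROM transactions_staging""".stripMargin)

    spark.stop()
  }
}
```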
Environment: HDFS, Cloudera Manager, MapReduce, Spark, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Puppet, Tableau, and Java.
Confidential - Atlantic City, NJ
Hadoop/Java Developer
Responsibilities:
- Re-architected all the applications to use the latest infrastructure within a span of three months and helped the developers implement the changes successfully.
- Designed the Hadoop jobs that create product recommendations using collaborative filtering.
- Designed the COSA pretest utility framework using JSF MVC, JSF validation, tag libraries, and JSF backing beans.
- Integrated the Order Capture system with Sterling OMS using a JSON web service.
- Configured the ESB to transform the Order Capture XML into Sterling messages.
- Configured and implemented Jenkins, Maven, and Nexus for continuous integration.
- Mentored the team on and implemented test-driven development (TDD) strategies.
- Loaded the data from Oracle to HDFS (Hadoop) using Sqoop.
- Developed the data transformation scripts using Hive and MapReduce.
- Designed and developed user-defined functions (UDFs) for Hive (see the sketch after this list).
- Loaded data into HBase using bulk load and the HBase API.
- Designed and implemented the open API using Spring REST web services.
- Proposed the integration pipeline testing strategy using Cargo.
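The Hive UDF work above is illustrated with a minimal sketch below. The resume elsewhere notes Java UDFs; this version is written in Scala for consistency with the other sketches, and the masking logic, class name, jar path, and table name are purely illustrative assumptions rather than the actual UDF from this role.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Minimal sketch of a simple Hive UDF: mask the local part of an email address.
// The real UDF logic from this engagement is not described in the resume.
class MaskEmail extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) {
      null
    } else {
      val parts = input.toString.split("@", 2)
      if (parts.length != 2) input
      else new Text(parts(0).take(1) + "***@" + parts(1))
    }
  }
}

// After packaging the class into a jar (paths and names are placeholders):
//   ADD JAR hdfs:///user/hive/udfs/mask-email.jar;
//   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
//   SELECT mask_email(email) FROM customers LIMIT 10;
```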
Environment: Java, JSP, Spring, JSF, REST web services, IntelliJ, WebLogic, Subversion, Oracle, Hadoop, Sqoop, HBase, Hive, Sterling OMS, TDD, and Agile
Confidential
Linux Administrator
Responsibilities:
- Installed and configured Linux for new build environments.
- Installed and maintained Linux servers.
- Extensive use of Red Hat Enterprise Linux 5.x.
- Performed operating system backups and upgrades from RHEL 5.4 to 5.5/6.x.
- Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems.
- Monitored and troubleshot mission-critical Linux machines.
- Improved system performance by working with the development team to analyze, identify, and resolve issues quickly.
- Ensured data recovery by implementing system and application level backups.
- Performed various configurations including networking, iptables, hostname resolution, and passwordless SSH login.
- Managed disk file systems, server performance, user creation, file access permissions, and RAID configurations.
- Automated administration tasks through scripting and job scheduling with cron.
- Monitored system metrics and logs for problems.
- Added, removed, and updated user account information and reset passwords.
- Maintained the MySQL server and set up authentication for the required database users.
- Created and managed logical volumes using LVM.
- Installed and updated packages using yum.
- Supported pre-production and production support teams in analyzing critical services and assisted with maintenance operations.