
Hadoop Developer/administrator Resume


Fort Lauderdale, FL

PROFESSIONAL SUMMARY:

  • Over 4 years of experience in Information Technology, including Big Data and Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Oozie, Flume, and ZooKeeper.
  • 4 years of experience installing, configuring, and testing Hadoop ecosystem components.
  • Experience writing MapReduce programs in Java.
  • Hands-on experience with 30 TB of data covering data migration, preprocessing, validation, and analysis in HDFS.
  • Strong experience with the Hortonworks and Cloudera Hadoop distributions.
  • Experience developing shell and Python scripts for system management.
  • Extensive experience in extraction, transformation, and loading (ETL) of data from multiple sources into a data warehouse.
  • Strong knowledge of installing and administering the Hadoop ecosystem on a multi-node cluster.
  • Expertise in shell scripting on the UNIX platform.
  • Working experience importing and exporting data with Sqoop between relational database systems (RDBMS) and HDFS.
  • In-depth knowledge of object-oriented programming (OOP) concepts such as inheritance, polymorphism, exception handling, and templates, with development experience in Java technologies.
  • Experienced in configuring workflow scheduling using Oozie.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
  • Developed Spark applications in Scala to ease Hadoop transitions.
  • Implemented various machine learning techniques in Scala using a Scala machine learning library.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream in HDFS using Scala (a sketch of this pattern follows this list).
  • Used Flume extensively to gather and move log files from application servers to a central location in the Hadoop Distributed File System (HDFS).
  • Developed Pig Latin scripts to extract the required data using a JSON loader function.
  • Designed and implemented a hybrid-cloud virtual data center on AWS providing servers, storage, networking, high availability, backup and disaster recovery, demand forecasting, capacity planning, and performance management.
  • Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
  • Performed cluster planning and engineering for POC and production clusters.
  • Assisted in monitoring the Hadoop cluster using Ganglia.
  • Participated in building a CDH4 test cluster to implement Kerberos authentication.
  • Developed flow XML files using Apache NiFi, a dataflow automation tool, to ingest data into HDFS.
  • Performance-tuned Apache NiFi workflows to optimize data ingestion speeds.
  • Worked with Talend to import and export data between RDBMS and Hadoop.
  • Good experience optimizing MapReduce jobs using mappers, reducers, combiners, and partitioners to deliver the best results on large datasets.
  • Hands-on experience writing ad-hoc queries to move data from HDFS to Hive and analyze it using HiveQL.
  • Developed ETL parsing and analytics using Scala/Spark to build a structured data model in Elasticsearch for consumption by the API and UI.
  • Implemented a batch-processing solution for large volumes of unstructured data using the Hadoop MapReduce framework.
  • Experience in capacity planning, hardware recommendations, performance tuning and benchmarking.
  • Used Apache Kafka to track data ingestion into the Hadoop cluster.
  • Experience working both independently and collaboratively to solve problems and deliver high quality results in a fast-paced, unstructured environment.
  • Wrote Python scripts for internal testing that read data from a file and push it into a Kafka queue, which is in turn consumed by the Storm application.
  • Worked on Kafka and Kafka mirroring to ensure that data is replicated without any loss.
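The Kafka-to-HDFS Spark Streaming pattern referenced above is summarized in the following minimal Scala sketch. It assumes the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, batch interval, and output path are illustrative placeholders, not project values.

```scala
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    // 30-second micro-batches (placeholder interval).
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(30))

    // Placeholder broker, deserializers, and consumer group.
    val kafkaParams = Map[String, Object](
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG        -> "broker1:9092",
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG   -> classOf[StringDeserializer],
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.GROUP_ID_CONFIG                 -> "log-ingest"
    )

    // Direct stream from the "weblogs" topic (placeholder name).
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("weblogs"), kafkaParams)
    )

    // Persist each non-empty micro-batch to a time-stamped HDFS directory.
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/weblogs/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```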

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, HBase, Oozie

Programming Languages: Java, C, C++, Scala

Databases: NoSQL, MySQL, Oracle

Unix Tools: Apache, YUM

IDE Tools: Eclipse, NetBeans, STS, IntelliJ

Build Tools: Maven, Ant, SBT

Operating Systems: Windows, Linux, Unix

Methodologies: Agile, UML, Design Patterns.

PROFESSIONAL EXPERIENCE:

Confidential, Fort Lauderdale, FL

Hadoop Developer/Administrator

Responsibilities:

  • Experience using the Cloudera and Hortonworks platforms and their ecosystems; hands-on experience installing, configuring, and using ecosystem components such as MapReduce, HDFS, Pig, Hive, Sqoop, and Flume.
  • Extensive experience with both MapReduce v1 (MRv1) and MapReduce v2 (YARN).
  • Implemented a nine-node CDH4 Hadoop cluster on Red Hat Linux.
  • Worked in an AWS environment to develop and deploy custom Hadoop applications.
  • Wrote MapReduce jobs to generate reports on the number of activities created on a particular day, with data dumped from multiple sources and the output written back to HDFS.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Wrote and read data in CSV and Excel file formats with Perl/Python scripts.
  • Connected to an Oracle database and fetched data with Perl/Python.
  • Wrote Python/Perl scripts to parse XML/JSON documents and load the data into a database.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Used Sqoop to import customer information from a MySQL database into HDFS for data processing.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Managed and reviewed Hadoop log files to identify issues when jobs failed.
  • Used Spark Streaming to ingest web server log files.
  • Collected log data from web servers and integrated it into HDFS using Apache Flume.
  • Experience working with Datameer versions 4, 5, and 6.0.
  • Used Oozie and Zookeeper for workflow scheduling and monitoring.
  • Created Hive Managed and External tables defined with static and dynamic partitions.
  • Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Developed multiple Kafka producers and consumers from scratch using the low-level and high-level APIs.
  • Designed and created Hive external tables using a shared metastore instead of Derby, with static partitioning, dynamic partitioning, and buckets.
  • Wrote HiveQL scripts to create, load, and query Hive tables.
  • Worked with HiveQL on large volumes of log data to perform trend analysis of user behavior across various online modules.
  • Worked on migrating MapReduce programs to Spark transformations using Spark and Scala (a minimal sketch follows this list).
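As an illustration of the MapReduce-to-Spark migration noted above, here is a minimal Scala sketch of a daily activity-count report expressed as RDD transformations. The input path and record layout (tab-separated, date in the first field) are assumptions for the example, not the actual job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ActivityCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ActivityCounts"))

    // The mapper/reducer pair of the original MapReduce report job collapses
    // into two transformations and one action.
    sc.textFile("hdfs:///data/activities/")        // assumed input location
      .map(line => (line.split("\t")(0), 1L))      // map: emit (date, 1)
      .reduceByKey(_ + _)                          // reduce: sum counts per date
      .saveAsTextFile("hdfs:///reports/activities-per-day")

    sc.stop()
  }
}
```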

Environment: AWS, Hadoop, CDH4, CDH5, Cloudera Manager, Hortonworks Data Platform, MapReduce, Hive, Pig, Spark, Scala, Talend, SQL, Sqoop, Flume, and Eclipse.

Confidential, Marlborough, MA

HADOOP DEVELOPER/Administrator

Responsibilities:

  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
  • Analyzed and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Implemented CDH4 and CDH5 Hadoop clusters on CentOS; assisted with performance tuning and monitoring.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables for optimized performance (see the sketch after this list).
  • Configured MySQL Database to store Hive metadata.
  • Developed HTML reports with Perl CGI.
  • Experience reading and writing XML reports with Perl XML modules.
  • Extracted data from a Teradata server by creating Perl modules.
  • Implemented Hive tables from Datameer logic.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Responsible for loading unstructured data into the Hadoop Distributed File System (HDFS).
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Solved performance issues in Hive and Pig scripts by understanding joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Exported data from DB2 to HDFS using Sqoop.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
  • Worked on MapReduce joins to query multiple semi-structured datasets as per analytic needs.
  • Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
  • Generated final reporting data with Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
  • Used Apache Kafka for integration and data-processing pipelines.
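To illustrate the Hive partitioning and managed/external table work described above, the sketch below creates a partitioned external table and populates it with dynamic partitioning through Spark SQL with Hive support (Spark 2.x assumed). The table name, schema, HDFS location, and staging table are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionedTables {
  def main(args: Array[String]): Unit = {
    // Requires a Spark build with Hive support and a shared Hive metastore.
    val spark = SparkSession.builder()
      .appName("HivePartitionedTables")
      .enableHiveSupport()
      .getOrCreate()

    // External table over raw web logs; schema and location are illustrative.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        user_id STRING,
        url     STRING,
        ts      TIMESTAMP
      )
      PARTITIONED BY (log_date STRING)
      STORED AS ORC
      LOCATION 'hdfs:///warehouse/external/web_logs'
    """)

    // Dynamic partitioning: one INSERT populates many log_date partitions
    // from a hypothetical staging table.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT OVERWRITE TABLE web_logs PARTITION (log_date)
      SELECT user_id, url, ts, to_date(ts) AS log_date
      FROM staging_web_logs
    """)

    spark.stop()
  }
}
```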

Environment: CDH4, CDH5, Hadoop, Pig, Hive, Java, Sqoop, Kafka, HBase, NoSQL, Oracle, Spark, Scala, Storm, Elasticsearch, ZooKeeper, Oozie, Red Hat Linux, Tableau.

Confidential, Jacksonville, FL

Hadoop/Java Developer

Responsibilities:

  • Re-architected all the applications to utilize the latest infrastructure within a span of three months and helped the developers implement them successfully.
  • Designed the Hadoop jobs to create product recommendations using collaborative filtering.
  • Designed the COSA pretest utility framework using JSF MVC, JSF validation, tag libraries, and JSF backing beans.
  • Integrated the order capture system with Sterling OMS using a JSON web service.
  • Configured the ESB to transform the order capture XML into Sterling messages.
  • Configured and Implemented Jenkins, Maven and Nexus for continuous integration.
  • Mentored the team on and implemented test-driven development (TDD) strategies.
  • Loaded the data from Oracle to HDFS (Hadoop) using Sqoop.
  • Developed data transformation scripts using Hive and MapReduce.
  • Designed and developed user-defined functions (UDFs) for Hive.
  • Loaded data into HBase using bulk load and the HBase API (a minimal sketch follows this list).
  • Designed and implemented the open API using Spring REST web services.
  • Proposed the integration pipeline testing strategy using Cargo.
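For the HBase loading item above, here is a minimal sketch of the client-API path (bulk loading via HFiles is a separate mechanism not shown). The table name, column family, row key, and values are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseApiLoad {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()                        // picks up hbase-site.xml
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("orders"))  // placeholder table

    try {
      // Single Put against the "d" column family (placeholder names/values).
      val put = new Put(Bytes.toBytes("order#12345"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("SHIPPED"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("129.99"))
      table.put(put)
    } finally {
      table.close()
      connection.close()
    }
  }
}
```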

Environment: Java, JSP, Spring, JSF, REST web services, IntelliJ, WebLogic, Subversion, Oracle, Hadoop, Sqoop, HBase, Hive, Sterling OMS, TDD, and Agile.
