Hadoop Developer/Administrator Resume
Fort Lauderdale, FL
PROFESSIONAL SUMMARY:
- Over 4 years of experience in Information Technology, including Big Data and the Hadoop ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Oozie, Flume, and ZooKeeper.
- 4 years of experience installing, configuring, and testing Hadoop ecosystem components.
- Experience in writing MapReduce programs in Java.
- Hands-on experience with 30 TB of data covering data migration, preprocessing, validation, and analysis in HDFS.
- Strong experience with the Hortonworks and Cloudera Hadoop distributions.
- Experience in developing shell and Python scripts for system management.
- Extensive experience in extraction, transformation, and loading (ETL) of data from multiple sources into a data warehouse.
- Strong knowledge of installing and administering the Hadoop ecosystem on multi-node clusters.
- Expertise in shell scripting on UNIX platforms.
- Working experience importing and exporting data with Sqoop between relational database systems (RDBMS) and HDFS.
- In-depth knowledge of object-oriented programming (OOP) concepts such as inheritance, polymorphism, exception handling, and templates, with development experience in Java technologies.
- Experienced in configuring workflow scheduling using Oozie.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Implemented various machine learning techniques in Scala using Scala machine learning libraries.
- Configured Spark Streaming in Scala to receive real-time data from Kafka and persist the stream to HDFS (a brief sketch follows this summary).
- Used Flume extensively to gather and move log files from application servers to a central location in the Hadoop Distributed File System (HDFS).
- Developed Pig Latin scripts to extract required data using a JSON reader function.
- Designed and implemented a hybrid-cloud virtual data center on AWS providing servers, storage, networking, high availability, backup and disaster recovery, demand forecasting, capacity planning, and performance management.
- Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
- Planned and engineered both POC and production clusters.
- Assisted in monitoring Hadoop clusters using Ganglia.
- Participated in building a CDH4 test cluster to implement Kerberos authentication.
- Developed flow XML files using Apache NiFi, a dataflow automation tool, to ingest data into HDFS.
- Performance-tuned Apache NiFi workflows to optimize data ingestion speed.
- Worked with Talend to import and export data between RDBMS and Hadoop.
- Good experience optimizing MapReduce jobs using mappers, reducers, combiners, and partitioners to deliver the best results on large datasets.
- Hands-on experience writing ad-hoc queries to move data from HDFS into Hive and analyze it with HiveQL.
- Developed ETL parsing and analytics using Scala/Spark to build a structured data model in Elasticsearch for consumption by the API and UI.
- Implemented batch-processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
- Experience in capacity planning, hardware recommendations, performance tuning and benchmarking.
- Used Apache Kafka to track data ingestion into the Hadoop cluster.
- Experience working both independently and collaboratively to solve problems and deliver high quality results in a fast-paced, unstructured environment.
- Wrote Python scripts for internal testing that read data from a file and push it onto a Kafka queue, which is in turn consumed by the Storm application.
- Worked on Kafka and Kafka mirroring to ensure that data is replicated without loss.
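The Spark Streaming bullet above refers to a Kafka-to-HDFS ingestion pattern. Below is a minimal Scala sketch of that pattern using the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, and output path are placeholders, not values from the original projects.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc  = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",          // placeholder broker list
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-ingest",
      "auto.offset.reset"  -> "latest"
    )

    // Subscribe to a (placeholder) topic and pull records directly from Kafka
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Persist each non-empty micro-batch of raw message values to HDFS as text files
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/weblogs/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```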
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, HBase, Oozie
Programming Languages: Java, C, C++, Scala
Databases: NoSQL, MySQL, Oracle
Unix Tools: Apache, YUM
IDE Tools: Eclipse, NetBeans, STS, IntelliJ
Build Tools: Maven, Ant, SBT
Operating Systems: Windows, Linux, Unix
Methodologies: Agile, UML, Design Patterns.
PROFESSIONAL EXPERIENCE:
Confidential, Fort Lauderdale, FL
Hadoop Developer/Administrator
Responsibilities:
- Experience using the Cloudera and Hortonworks platforms and their ecosystems; hands-on experience installing, configuring, and using ecosystem components such as MapReduce, HDFS, Pig, Hive, Sqoop, and Flume.
- Extensive experience with both MapReduce v1 (MRv1) and MapReduce v2 (YARN).
- Implemented a nine-node CDH4 Hadoop cluster on Red Hat Linux.
- Worked in an AWS environment to develop and deploy custom Hadoop applications.
- Wrote MapReduce jobs to generate reports on the number of activities created on a given day from data dumped from multiple sources, with the output written back to HDFS.
- Developed Pig UDFs to pre-process the data for analysis.
- Wrote and read data in CSV and Excel file formats with Perl/Python scripts.
- Connected to Oracle databases and fetched data with Perl/Python.
- Wrote Python/Perl scripts to parse XML/JSON documents and load the data into the database.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Used Sqoop to import customer information from a MySQL database into HDFS for data processing.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Managed and reviewed Hadoop log files to identify issues when jobs failed.
- Used Spark Streaming to ingest web server log files.
- Collected log data from web servers and integrated it into HDFS using Apache Flume.
- Experience working with Datameer versions 4, 5, and 6.0.
- Used Oozie and ZooKeeper for workflow scheduling and monitoring.
- Created Hive managed and external tables defined with static and dynamic partitions.
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Developed multiple Kafka producers and consumers from scratch using the low-level and high-level APIs.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Wrote HiveQL scripts to create, load, and query Hive tables.
- Worked with HiveQL on large volumes of log data to perform trend analysis of user behavior across various online modules.
- Migrated MapReduce programs to Spark transformations using Spark and Scala (a brief sketch follows this list).
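As a rough illustration of the MapReduce-to-Spark migration and report-generation work mentioned above, the sketch below re-expresses a "count activities per day" MapReduce job as Spark RDD transformations in Scala. The input path, delimiter, and column position are assumptions for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ActivityCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ActivityCounts"))

    // Equivalent of a mapper emitting (date, 1) and a reducer summing the counts
    val counts = sc.textFile("hdfs:///data/activities/*.csv") // assumed input location
      .map(_.split(","))                                      // assumed comma-delimited records
      .filter(_.length > 1)
      .map(fields => (fields(1), 1L))                         // assume column 1 holds the activity date
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///reports/activities-per-day")
    sc.stop()
  }
}
```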
Environment: AWS, Hadoop, CDH4, CDH5, Cloudera Manager, Hortonworks Data Platform, MapReduce, Hive, Pig, Spark, Scala, Talend, SQL, Sqoop, Flume, and Eclipse.
Confidential, Marlborough, MA
HADOOP DEVELOPER/Administrator
Responsibilities:
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
- Analyzed and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Implemented CDH4 and CDH5 Hadoop clusters on CentOS; assisted with performance tuning and monitoring.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
- Configured a MySQL database to store Hive metadata.
- Developed HTML reports with Perl CGI.
- Experience reading and writing XML reports with Perl XML modules.
- Extracted data from a Teradata server by creating Perl modules.
- Implemented Hive tables from Datameer logic.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data (a brief sketch follows this list).
- Responsible for loading unstructured data into the Hadoop Distributed File System (HDFS).
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
- Exported data from DB2 to HDFS using Sqoop.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Worked on MapReduce joins to query multiple semi-structured datasets as per analytics needs.
- Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
- Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Used Apache Kafka for integration and data-processing pipelines.
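To illustrate the Spark SQL work over partitioned Hive tables described above, here is a minimal Scala sketch that queries a partitioned Hive table for a user-behavior trend and writes the result back as a Hive table for reporting. The table name, columns, and date filter are hypothetical, and the Spark 2.x SparkSession API is used for brevity.

```scala
import org.apache.spark.sql.SparkSession

object UserTrendAnalysis {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read tables defined in the shared metastore
    val spark = SparkSession.builder()
      .appName("UserTrendAnalysis")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical partitioned web-log table; partitioning by date keeps scans small
    spark.sql(
      """CREATE TABLE IF NOT EXISTS weblogs (
        |  user_id STRING, module STRING, action STRING, ts TIMESTAMP)
        |PARTITIONED BY (log_date STRING)
        |STORED AS ORC""".stripMargin)

    // Trend of distinct active users per module per day (partition pruning on log_date)
    val trend = spark.sql(
      """SELECT log_date, module, COUNT(DISTINCT user_id) AS active_users
        |FROM weblogs
        |WHERE log_date >= '2016-01-01'
        |GROUP BY log_date, module""".stripMargin)

    // Persist as a Hive table that a BI tool (e.g. Tableau via the Hive ODBC driver) can read
    trend.write.mode("overwrite").saveAsTable("module_trend_report")

    spark.stop()
  }
}
```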
Environment: CDH4, CDH5, Hadoop, Pig, Hive, Java, Sqoop, Kafka, HBase, NoSQL, Oracle, Spark, Scala, Storm, Elasticsearch, ZooKeeper, Oozie, Red Hat Linux, and Tableau.
Confidential, Jacksonville, FL
Hadoop/Java Developer
Responsibilities:
- Re-architected all the applications to utilize the latest infrastructure within a span of three months and helped the developers implement them successfully.
- Designed the Hadoop jobs to create product recommendations using collaborative filtering.
- Designed the COSA pretest utility framework using JSF MVC, JSF validation, the tag library, and JSF backing beans.
- Integrated the order capture system with Sterling OMS using a JSON web service.
- Configured the ESB to transform order capture XML into Sterling messages.
- Configured and implemented Jenkins, Maven, and Nexus for continuous integration.
- Mentored the team on and implemented test-driven development (TDD) strategies.
- Loaded the data from Oracle to HDFS (Hadoop) using Sqoop.
- Developed data transformation scripts using Hive and MapReduce.
- Designed and developed user-defined functions (UDFs) for Hive.
- Loaded data into HBase using bulk load and the HBase client API (a brief sketch follows this list).
- Designed and implemented the open API using Spring REST web services.
- Proposed the integration pipeline testing strategy using Cargo.
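The HBase loading mentioned above follows the standard HBase client API pattern; the sketch below shows that pattern. It is written in Scala for consistency with the other examples (the same calls apply from Java), assumes the HBase 1.x client API, and uses a placeholder table name, row key, and column values.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseLoader {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml from the classpath for the ZooKeeper quorum settings
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("orders")) // placeholder table name
    try {
      // One Put per row: row key plus column family / qualifier / value cells
      val put = new Put(Bytes.toBytes("order#12345"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("CAPTURED"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("channel"), Bytes.toBytes("web"))
      table.put(put)
    } finally {
      table.close()
      connection.close()
    }
  }
}
```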
Environment: Java, JSP, Spring, JSF, REST web services, IntelliJ, WebLogic, Subversion, Oracle, Hadoop, Sqoop, HBase, Hive, Sterling OMS, TDD, and Agile.