Hadoop Developer/Administrator Resume
Fort Lauderdale, FL
PROFESSIONAL SUMMARY:
- Over 4 years of experience in Information Technology, including Big Data and the Hadoop ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Oozie, Flume, and ZooKeeper.
- 4 years of experience installing, configuring, and testing Hadoop ecosystem components.
- Experience in writing MapReduce programs in Java.
- Hands-on experience with 30 TB of data covering data migration, preprocessing, validation, and analysis in HDFS.
- Strong experience with the Hortonworks and Cloudera Hadoop distributions.
- Experience in developing shell and Python scripts for system management.
- Extensive experience in extraction, transformation, and loading (ETL) of data from multiple sources into a data warehouse.
- Strong knowledge of installing and administering the Hadoop ecosystem on multi-node clusters.
- Expertise in shell scripting on UNIX platforms.
- Working experience importing and exporting data with Sqoop between relational database systems (RDBMS) and HDFS.
- In-depth knowledge of object-oriented programming (OOP) concepts such as inheritance, polymorphism, exception handling, and templates, with development experience in Java technologies.
- Experienced in configuring workflow scheduling using Oozie.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Implemented various machine learning techniques in Scala using Scala machine learning libraries.
- Configured Spark Streaming in Scala to receive real-time data from Kafka and persist the stream to HDFS (a brief sketch follows this summary).
- Used Flume extensively to gather and move log files from application servers to a central location in the Hadoop Distributed File System (HDFS).
- Developed Pig Latin scripts to extract required data using a JSON reader function.
- Designed and implemented a hybrid-cloud virtual data center on AWS providing servers, storage, networking, high availability, backup and disaster recovery, demand forecasting, capacity planning, and performance management.
- Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
- Planned and engineered both POC and production clusters.
- Assisted in monitoring Hadoop clusters using Ganglia.
- Participated in building a CDH4 test cluster to implement Kerberos authentication.
- Developed flow XML files using Apache NiFi, a dataflow automation tool, to ingest data into HDFS.
- Performance-tuned Apache NiFi workflows to optimize data ingestion speed.
- Worked with Talend to import and export data between RDBMS and Hadoop.
- Good experience optimizing MapReduce jobs using mappers, reducers, combiners, and partitioners to deliver the best results on large datasets.
- Hands-on experience writing ad-hoc queries to move data from HDFS into Hive and analyze it with HiveQL.
- Developed ETL parsing and analytics using Scala/Spark to build a structured data model in Elasticsearch for consumption by the API and UI.
- Implemented batch-processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
- Experience in capacity planning, hardware recommendations, performance tuning and benchmarking.
- Used Apache Kafka to track data ingestion into the Hadoop cluster.
- Experience working both independently and collaboratively to solve problems and deliver high quality results in a fast-paced, unstructured environment.
- Wrote Python scripts for internal testing that read data from a file and push it onto a Kafka queue, which is in turn consumed by the Storm application.
- Worked on Kafka and Kafka mirroring to ensure that data is replicated without loss.
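The Spark Streaming bullet above refers to a Kafka-to-HDFS ingestion pattern. Below is a minimal Scala sketch of that pattern using the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, and output path are placeholders, not values from the original projects.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc  = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",          // placeholder broker list
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-ingest",
      "auto.offset.reset"  -> "latest"
    )

    // Subscribe to a (placeholder) topic and pull records directly from Kafka
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Persist each non-empty micro-batch of raw message values to HDFS as text files
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/weblogs/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```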
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, HBase, Oozie
Programming Languages: Java, C, C++, Scala
Databases: NoSQL, MySQL, Oracle
Unix Tools: Apache, YUM
IDE Tools: Eclipse, NetBeans, STS, IntelliJ
Build Tools: Maven, Ant, SBT
Operating Systems: Windows, Linux, Unix
Methodologies: Agile, UML, Design Patterns.
PROFESSIONAL EXPERIENCE:
Confidential, Fort Lauderdale, FL
Hadoop Developer/Administrator
Responsibilities:
- Experience using the Cloudera and Hortonworks platforms and their ecosystems; hands-on experience installing, configuring, and using ecosystem components such as MapReduce, HDFS, Pig, Hive, Sqoop, and Flume.
- Extensive experience with both MapReduce v1 (MRv1) and MapReduce v2 (YARN).
- Implemented a nine-node CDH4 Hadoop cluster on Red Hat Linux.
- Worked in an AWS environment to develop and deploy custom Hadoop applications.
- Wrote MapReduce jobs to generate reports on the number of activities created on a given day from data dumped from multiple sources, with the output written back to HDFS.
- Developed Pig UDFs to pre-process the data for analysis.
- Wrote and read data in CSV and Excel file formats with Perl/Python scripts.
- Connected to Oracle databases and fetched data with Perl/Python.
- Wrote Python/Perl scripts to parse XML/JSON documents and load the data into the database.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Used Sqoop to import customer information from a MySQL database into HDFS for data processing.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Managed and reviewed Hadoop log files to identify issues when jobs failed.
- Used Spark Streaming to ingest web server log files.
- Collected log data from web servers and integrated it into HDFS using Apache Flume.
- Experience working with Datameer versions 4, 5, and 6.0.
- Used Oozie and ZooKeeper for workflow scheduling and monitoring.
- Created Hive managed and external tables defined with static and dynamic partitions.
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Developed multiple Kafka producers and consumers from scratch using the low-level and high-level APIs.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Wrote HiveQL scripts to create, load, and query Hive tables.
- Worked with HiveQL on large volumes of log data to perform trend analysis of user behavior across various online modules.
- Migrated MapReduce programs to Spark transformations using Spark and Scala (a brief sketch follows this list).
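As a rough illustration of the MapReduce-to-Spark migration and report-generation work mentioned above, the sketch below re-expresses a "count activities per day" MapReduce job as Spark RDD transformations in Scala. The input path, delimiter, and column position are assumptions for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ActivityCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ActivityCounts"))

    // Equivalent of a mapper emitting (date, 1) and a reducer summing the counts
    val counts = sc.textFile("hdfs:///data/activities/*.csv") // assumed input location
      .map(_.split(","))                                      // assumed comma-delimited records
      .filter(_.length > 1)
      .map(fields => (fields(1), 1L))                         // assume column 1 holds the activity date
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///reports/activities-per-day")
    sc.stop()
  }
}
```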
Environment: AWS, Hadoop, CDH4, CDH5, Cloudera Manager, Hortonworks Data Platform, MapReduce, Hive, Pig, Spark, Scala, Talend, SQL, Sqoop, Flume, and Eclipse.
Confidential, Marlborough, MA
HADOOP DEVELOPER/Administrator
Responsibilities:
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
- Analyzed and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Implemented CDH4 and CDH5 Hadoop clusters on CentOS; assisted with performance tuning and monitoring.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
- Configured a MySQL database to store Hive metadata.
- Developed HTML reports with Perl CGI.
- Experience reading and writing XML reports with Perl XML modules.
- Extracted data from a Teradata server by creating Perl modules.
- Implemented Hive tables from Datameer logic.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data (a brief sketch follows this list).
- Responsible for loading unstructured data into the Hadoop Distributed File System (HDFS).
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
- Exported data from DB2 to HDFS using Sqoop.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Worked on MapReduce joins to query multiple semi-structured datasets as per analytics needs.
- Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
- Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Used Apache Kafka for integration and data-processing pipelines.
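To illustrate the Spark SQL work over partitioned Hive tables described above, here is a minimal Scala sketch that queries a partitioned Hive table for a user-behavior trend and writes the result back as a Hive table for reporting. The table name, columns, and date filter are hypothetical, and the Spark 2.x SparkSession API is used for brevity.

```scala
import org.apache.spark.sql.SparkSession

object UserTrendAnalysis {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read tables defined in the shared metastore
    val spark = SparkSession.builder()
      .appName("UserTrendAnalysis")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical partitioned web-log table; partitioning by date keeps scans small
    spark.sql(
      """CREATE TABLE IF NOT EXISTS weblogs (
        |  user_id STRING, module STRING, action STRING, ts TIMESTAMP)
        |PARTITIONED BY (log_date STRING)
        |STORED AS ORC""".stripMargin)

    // Trend of distinct active users per module per day (partition pruning on log_date)
    val trend = spark.sql(
      """SELECT log_date, module, COUNT(DISTINCT user_id) AS active_users
        |FROM weblogs
        |WHERE log_date >= '2016-01-01'
        |GROUP BY log_date, module""".stripMargin)

    // Persist as a Hive table that a BI tool (e.g. Tableau via the Hive ODBC driver) can read
    trend.write.mode("overwrite").saveAsTable("module_trend_report")

    spark.stop()
  }
}
```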
Environment: CDH4, CDH5, Hadoop, Pig, Hive, Java, Sqoop, Kafka, HBase, NoSQL, Oracle, Spark, Scala, Storm, Elasticsearch, ZooKeeper, Oozie, Red Hat Linux, and Tableau.
Confidential, Jacksonville, FL
Hadoop/Java Developer
Responsibilities:
- Re-architected all the applications to utilize the latest infrastructure within a span of three months and helped the developers implement them successfully.
- Designed the Hadoop jobs to create product recommendations using collaborative filtering.
- Designed the COSA pretest utility framework using JSF MVC, JSF validation, the tag library, and JSF backing beans.
- Integrated the order capture system with Sterling OMS using a JSON web service.
- Configured the ESB to transform order capture XML into Sterling messages.
- Configured and implemented Jenkins, Maven, and Nexus for continuous integration.
- Mentored the team on and implemented test-driven development (TDD) strategies.
- Loaded the data from Oracle to HDFS (Hadoop) using Sqoop.
- Developed data transformation scripts using Hive and MapReduce.
- Designed and developed user-defined functions (UDFs) for Hive.
- Loaded data into HBase using bulk load and the HBase client API (a brief sketch follows this list).
- Designed and implemented the open API using Spring REST web services.
- Proposed the integration pipeline testing strategy using Cargo.
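The HBase loading mentioned above follows the standard HBase client API pattern; the sketch below shows that pattern. It is written in Scala for consistency with the other examples (the same calls apply from Java), assumes the HBase 1.x client API, and uses a placeholder table name, row key, and column values.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseLoader {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml from the classpath for the ZooKeeper quorum settings
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("orders")) // placeholder table name
    try {
      // One Put per row: row key plus column family / qualifier / value cells
      val put = new Put(Bytes.toBytes("order#12345"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("CAPTURED"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("channel"), Bytes.toBytes("web"))
      table.put(put)
    } finally {
      table.close()
      connection.close()
    }
  }
}
```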
Environment: Java, JSP, Spring, JSF, REST web services, IntelliJ, WebLogic, Subversion, Oracle, Hadoop, Sqoop, HBase, Hive, Sterling OMS, TDD, and Agile.