Sr. Hadoop Developer Resume
Phoenix, AZ
SUMMARY
- 7 years of experience in the analysis, design, development, integration, testing, and maintenance of applications using Java/J2EE technologies, along with around 5 years of Big Data/Hadoop experience.
- Experienced in building highly scalable Big Data solutions on Hadoop across multiple distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
- Expertise in Big Data architecture with the Hadoop Distributed File System and its ecosystem tools: MapReduce, HBase, Hive, Pig, Zookeeper, Oozie, Kafka, Flume, Avro, Impala, and Apache Spark.
- Hands-on experience performing data quality checks on petabytes of data.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architectures.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Developed, deployed, and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
- Set up GCP firewall rules to allow or deny traffic to and from VM instances based on specified configurations, and used GCP Cloud CDN (Content Delivery Network) to deliver content from GCP cache locations, drastically improving user experience and latency.
- Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
- Experience in practical implementation of cloud-specific AWS technologies including IAM, Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, and EBS.
- Experience in writing MapReduce programs and using the Apache Hadoop API to analyze data.
- Experience in designing, architecting, and implementing scalable cloud-based web applications using AWS and GCP.
- Strong experience in developing, debugging, and tuning MapReduce jobs in Hadoop environments.
- Built secure AWS solutions by creating VPCs with public and private subnets.
- Expertise in developing Pig and Hive scripts for data analysis.
- Hands-on experience in data mining, implementing complex business logic, optimizing queries with HiveQL, and controlling data distribution through partitioning and bucketing techniques to enhance performance (a brief sketch follows at the end of this summary).
- Experience working with Hive data and extending the Hive library with custom UDFs to query data in non-standard formats.
- Experience in performance tuning of MapReduce jobs, Pig jobs, and Hive queries.
- Involved in ingesting data from various databases such as DB2 and SQL Server using Sqoop.
- Experience working with Flume to handle large volumes of streaming data.
- Good working knowledge of the Hadoop Hue ecosystem.
- Extensive experience in migrating ETL operations into HDFS using Pig scripts.
- Good knowledge of big data analytics libraries (Spark MLlib) and of Spark SQL for data exploration.
- Experienced in using Apache Ignite for handling streaming data.
- Experience in implementing a distributed messaging queue integrated with Cassandra using Apache Kafka and Zookeeper.
- Expert in designing and creating data ingestion pipelines using technologies such as Spring Integration, Apache Storm, and Kafka.
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
- Worked with different file formats such as TextFile, Avro, and ORC for Hive querying and processing.
- Used compression techniques (Snappy) with these file formats to optimize storage in HDFS.
- Experienced with the build tools Maven and Ant and with continuous integration tools such as Jenkins.
- Hands-on experience in using relational databases like Oracle, MySQL, PostgreSQL and MS-SQL Server.
- Experience using IDE tools Eclipse 3.0, MyEclipse, RAD, and NetBeans.
- Hands-on development experience with RDBMS, including writing SQL queries, PL/SQL, views, stored procedures, triggers, etc.
- Participated in all Business Intelligence activities related to data warehouse, ETL and report development methodology.
- Expertise in Waterfall and Agile software development models and in project planning using Microsoft Project and JIRA.
- Highly motivated, dynamic self-starter with a keen interest in emerging technologies.
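The partitioning and bucketing approach mentioned above can be illustrated with a minimal sketch issued through Spark's Hive support; the database, table, column names, and bucket count are hypothetical placeholders rather than details of any specific project.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a partitioned, bucketed Hive table; all names are illustrative.
object HiveLayoutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveLayoutSketch")
      .enableHiveSupport()      // route DDL to the Hive metastore
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS sales")

    // Partitioning by load date lets queries filtered on dt prune whole
    // directories; bucketing by customer_id limits the files touched by
    // joins and sampling on that key.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales.transactions (
        |  txn_id      STRING,
        |  customer_id STRING,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (dt STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // The table would then typically be populated from Hive with a
    // dynamic-partition insert, e.g.:
    //   SET hive.exec.dynamic.partition.mode=nonstrict;
    //   INSERT INTO TABLE sales.transactions PARTITION (dt)
    //   SELECT txn_id, customer_id, amount, dt FROM sales.staging_transactions;

    spark.stop()
  }
}
```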
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, Hadoop Streaming, Zookeeper, Kafka, Impala, Apache Spark, Hue, Ambari, Apache Ignite
Hadoop Distributions: Cloudera (CDH4/CDH5), Hortonworks
Languages: Java, C, SQL, Python, PL/SQL, Pig Latin, HQL
Cloud Computing Tools: Amazon AWS (S3, EMR, EC2, Lambda, VPC, Route 53, CloudWatch), Google Cloud Platform (GCP)
IDE Tools: Eclipse, IntelliJ
Frameworks: Hibernate, Spring, Struts, JUnit
Operating Systems: Windows (XP, 7, 8), UNIX, Linux, Ubuntu, CentOS
Application Servers: JBoss, Tomcat, WebLogic, WebSphere, Servlets
Reporting/ETL Tools: Tableau, Power View for Microsoft Excel, Informatica
Databases: Oracle, MySQL, DB2, Derby, PostgreSQL, NoSQL databases (HBase, Cassandra)
PROFESSIONAL EXPERIENCE
Confidential, Phoenix, AZ
Sr. Hadoop Developer
Responsibilities:
- Developed Spark applications in Scala and Java to process terabytes to petabytes of customer credit card data received from sources such as mobile, internet banking, and merchant services.
- Used Spark SQL on DataFrames to access Hive tables from Spark for faster data processing (see the sketch below).
- Implemented static partitioning, dynamic partitioning, and bucketing in Hive using internal and external tables for more efficient data access.
- Used Spark DataFrame operations to perform the required validations on the data and to run analytics on the Hive data.
- Designed and developed Spark and HBase ingestion data pipelines for the various credit card offers data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed Oozie workflows to run multiple Hive, Pig, Sqoop, and Spark jobs.
- Streamed data in real time using Spark with Kafka.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Used Spark transformations for data wrangling and for ingesting real-time data in various file formats.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Supported UAT and production implementations.
- Built CI/CD pipelines.
- Developed common drivers to handle similar implementations such as read, write, and logging operations.
Environment: Hadoop, MapR, HDFS, Hive, HBase, Oozie, Kafka, Spark, Scala, Java, Bitbucket.
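A minimal sketch of the Spark SQL/DataFrame usage referenced above; the Hive database, table, and column names (txn_db.card_transactions, card_id, amount, txn_ts) are hypothetical placeholders, and the validation and aggregation logic is illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal sketch: read a Hive table as a DataFrame, validate it, and
// materialize a simple aggregate back into Hive. Names are illustrative.
object CardTxnValidation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CardTxnValidation")
      .enableHiveSupport()          // lets Spark SQL read existing Hive tables
      .getOrCreate()

    // Read the Hive table as a DataFrame
    val txns = spark.table("txn_db.card_transactions")

    // Basic validations: drop rows with null keys or non-positive amounts
    val valid = txns.filter(col("card_id").isNotNull && col("amount") > 0)

    // Simple analytic: daily spend per card
    val dailySpend = valid
      .groupBy(col("card_id"), to_date(col("txn_ts")).as("txn_date"))
      .agg(sum("amount").as("total_amount"))

    dailySpend.write.mode("overwrite").saveAsTable("txn_db.daily_card_spend")
    spark.stop()
  }
}
```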
Confidential, Phoenix, AZ.
Hadoop Developer
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, Zookeeper, and Sqoop.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Installed and configured a Hadoop cluster on Amazon Web Services (AWS) for POC purposes.
- Migrated an existing on-premises application to AWS.
- Migrated MongoDB sharded/replica clusters from one data center to another without downtime.
- Managed and monitored large production MongoDB sharded cluster environments holding terabytes of data.
- Worked with Amazon AWS services such as EMR and EC2 for fast and efficient processing of Big Data.
- Worked on importing and exporting data between RDBMS and HDFS, Hive, and Pig using Sqoop.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala and Python.
- Set up MongoDB profiling to identify slow queries.
- Configured Hive and Oozie to store metadata in Microsoft SQL Server.
- Experienced in migrating HiveQL to Impala to minimize query response time.
- Responsible for maintaining and expanding the AWS cloud infrastructure.
- Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per user needs.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Developed batch jobs to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework (see the sketch below).
- Developed Spark scripts using Scala shell commands as per the requirements.
- Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Developed a data pipeline to store data into HDFS.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Expertise in deploying Hadoop YARN, Spark, and Storm integrated with Cassandra, Ignite, Kafka, etc.
- Moved data between clusters using distributed copy (DistCp); supported and maintained Sqoop jobs and programs; designed and developed Spark RDDs and Spark SQL jobs.
- Implemented a log producer in Scala that watches application logs, transforms incremental log entries, and sends them to a Kafka- and Zookeeper-based log collection platform.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
Environment: Hadoop, MapReduce, HDFS, AWS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Teradata, Zookeeper, Kafka, Impala, Akka, Apache Spark, Spark Streaming, Hortonworks, HBase, MongoDB
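A minimal sketch of the S3 batch transformation referenced above; the bucket name, path layout, CSV schema, and timestamp format are illustrative assumptions, and S3 access is assumed to come from the cluster's Hadoop/S3A configuration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal sketch: read raw CSV data from S3, clean it, and write it back
// as partitioned Parquet. Bucket, paths, and column names are illustrative.
object S3BatchTransform {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("S3BatchTransform")
      .getOrCreate()

    // Read raw CSV files from S3 (credentials/endpoint assumed to be
    // provided by the cluster's Hadoop/S3A settings)
    val raw = spark.read
      .option("header", "true")
      .csv("s3a://example-bucket/raw/events/")

    // Example transformations: standardize a timestamp column and keep
    // only well-formed records
    val cleaned = raw
      .withColumn("event_ts", to_timestamp(col("event_time"), "yyyy-MM-dd HH:mm:ss"))
      .filter(col("event_ts").isNotNull)

    // Write the transformed data back to S3 as Parquet, partitioned by date
    cleaned
      .withColumn("event_date", to_date(col("event_ts")))
      .write.mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3a://example-bucket/curated/events/")

    spark.stop()
  }
}
```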
Confidential, Phoenix, AZ
Hadoop Developer/Admin
Responsibilities:
- Hands-on experience with Scala, Spark, Hive, Kafka, shell scripting, SQL, Tableau, and Rally.
- Used Spark Streaming to collect data from Kafka in near real time, performed the necessary transformations and aggregations to build the common learner data model, and stored the data in a NoSQL store (HBase); see the sketch below.
- Installed and configured various components of the Hadoop ecosystem such as Flume, Hive, Pig, Sqoop, Oozie, Zookeeper, Kafka, and Storm, and maintained their integrity.
- Good knowledge of Confidential internal data sources such as Cornerstone, WSDW, IDN, and SQL.
- Migrated an existing on-premises application to Amazon Web Services (AWS) and used services such as EC2 and S3 for small data set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR.
- Used Apache Kafka Connect to stream data between Apache Kafka and other systems.
- Experience in performing advanced procedures such as text analytics using the in-memory computing capabilities of Spark with Scala.
- Partitioned data streams using Kafka; designed and configured a Kafka cluster to accommodate heavy throughput.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Responsible for fetching real-time data using Kafka and processing it with Spark Streaming and Scala.
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
- Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
- Experienced in working with the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.
- Implemented Sqoop jobs to import and export large data sets between RDBMS and Hive.
- Worked with application teams to install Hadoop updates, patches, and version upgrades as required.
- Used Kafka Connect as a utility for streaming data between MapR Event Store for Apache Kafka and other storage systems.
- Worked with the visualization tool Tableau to visually analyze the data.
- Developed Scala scripts using DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Migrated MapReduce programs into Spark transformations using Scala.
- Deployed and scheduled applications on the cloud with Spark/Ambari.
- Created a user interface to access data from landing zone tables and automated the SQL queries to provide flexibility to the users.
Environment: MapReduce, Scala, Spring Framework 2.1.3, AWS, Oracle 11.2.0.3, Kafka connectors, Maven 4.0, Spark, HiveQL, Node.js v8.11.1, NoSQL, Java 1.8, Tableau, Ambari user views, Spark real-time data sources, cloud platform, consumers.
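A minimal sketch of the Kafka-to-Spark-Streaming pipeline referenced above, using the spark-streaming-kafka-0-10 direct stream API; the broker addresses, topic name, group id, and per-batch aggregation are illustrative assumptions (the actual pipeline wrote its results to HBase).

```scala
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

// Minimal sketch of a Kafka -> Spark Streaming job; all names are illustrative.
object KafkaLearnerStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaLearnerStream")
    val ssc  = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG       -> "broker1:9092,broker2:9092",
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG  -> classOf[StringDeserializer],
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.GROUP_ID_CONFIG                -> "learner-stream",
      ConsumerConfig.AUTO_OFFSET_RESET_CONFIG       -> "latest"
    )

    // Direct stream from a hypothetical "learner-events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams)
    )

    // Count events per message key in each micro-batch; in the real pipeline
    // this is where records would be transformed and written to HBase.
    stream
      .map(record => (record.key, 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```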
Confidential, Dearborn, Michigan
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs for data cleaning and preprocessing.
- Involved in importing data from MySQL to HDFS using Sqoop.
- Worked on analyzing/transforming the data with Hive and Pig.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Successfully loaded files to Hive and HDFS from traditional databases.
- Gained good experience with NoSQL databases such as HBase.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Experienced in managing and reviewing Hadoop log files.
- Involved in writing Hive queries to load and process data in Hadoop File System.
- Experience in writing custom UDFs for Hive and Pig to extend their functionality (see the sketch below).
- Exported data from Impala to the Tableau reporting tool and created dashboards on a live connection.
Environment: Cloudera CDH4.3, Hadoop, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), Impala, Tableau
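A minimal sketch of a custom Hive UDF of the kind referenced above, written here in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API (the equivalent Java class works the same way); the function name and normalization logic are illustrative.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Minimal sketch of a custom Hive UDF; the name and logic are illustrative.
class NormalizePhone extends UDF {
  // Hive calls evaluate() once per row; strip non-digit characters
  // from a phone number so values can be compared in a standard form.
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.replaceAll("[^0-9]", ""))
  }
}

// After packaging into a JAR, it would be registered in Hive roughly as:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_phone AS 'NormalizePhone';
//   SELECT normalize_phone(phone) FROM customers;
```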
Confidential
Java/ Hadoop Developer
Responsibilities:
- Involved in sprint planning as part of monthly deliveries.
- Involved in daily scrum calls and stand-up meetings as part of the Agile methodology.
- Good hands-on experience with the VersionOne tool to update work details and working hours for a task.
- Involved in designing views.
- Involved in writing Spring configuration files and business logic based on the requirements.
- Involved in code-review sessions.
- Implemented JUnit tests for the business logic associated with the assigned backlog items in the sprint plan.
- Implemented fixtures to execute FitNesse test tables.
- Good experience creating Jenkins CI jobs and Sonar jobs.
Environment: Core Java, Spring, Maven, XMF Services, JMS, Oracle 10g, PostgreSQL 9.2, Eclipse, SVN