Sr. Hadoop/Spark Developer Resume
GA
SUMMARY
- Over 5 years of IT experience as a Hadoop/Spark Developer covering all phases of software application requirement analysis, design, development, and maintenance for Hadoop/Big Data and web applications built with Java/J2EE technologies, specializing in the Finance, Healthcare, Insurance, Retail, and Telecom domains.
- Strong knowledge of the Software Development Life Cycle (SDLC) and the role of a Hadoop/Spark developer in development methodologies such as Agile and Waterfall.
- Expertise in all components of Hadoop Ecosystem - Hive, Pig, HBase, Impala, Sqoop, HUE, Flume, Zookeeper, Oozie and Apache Spark.
- In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce, with experience writing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
- Hands-on experience with the YARN (MapReduce 2.0) architecture and its components, including ResourceManager, NodeManager, Container, and ApplicationMaster, and with the execution of MapReduce jobs on YARN.
- Hands-on experience designing and developing applications in Spark using Scala, including comparing the performance of Spark with Hive.
- Experienced in integrating Kafka with Spark Streaming for high-speed data processing (a minimal sketch follows this summary).
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data.
- Exposure to working with Spark DataFrames.
- Experience in collecting log data from different sources (web servers and social media) using Flume and Kafka and storing it in HDFS for MapReduce processing.
- Worked with data serialization formats such as Avro, Parquet, and CSV for converting complex objects into serialized byte sequences.
- Strong knowledge of Pig and Hive's analytical functions, extending Hive and Pig core functionality by writing custom UDFs.
- Expertise in developing Pig Latin scripts and Hive Query Language (HiveQL) queries for data analytics.
- Well-versed in Hive partitioning, dynamic partitioning, and bucketing, and applied them to compute data metrics (see the Hive partitioning sketch after this summary).
- Integrated BI tools such as Tableau with Impala to analyze data.
- Experience with NoSQL databases like HBase, MongoDB and Cassandra.
- Experience in importing and exporting data between HDFS and relational/non-relational database systems using Sqoop.
- Used the Oozie job scheduler to schedule MapReduce jobs and automate job flows, and implemented cluster coordination services using Zookeeper.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Knowledge of working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Knowledge of designing and creating analytical reports and automated dashboards that help users identify critical KPIs and facilitate strategic planning in the organization.
- Experience in working with different relational databases like MySQL, MS SQL and Oracle.
- Strong experience in database design and in writing complex SQL queries and stored procedures.
- Expertise in all phases of software development, including analysis, design, development, and deployment of applications using Servlets, JSP, Java Beans, Struts, the Spring Framework, and JDBC.
- Experience with development IDEs such as Eclipse and NetBeans.
- Proficient in software documentation and technical report writing.
- Versatile team player with good communication, analytical, presentation and inter-personal skills.
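The snippet below is a minimal Scala sketch of the Kafka-to-Spark Streaming pattern referenced above, reading a topic with Structured Streaming and landing the events in HDFS as Parquet. The broker addresses, topic name, and paths are illustrative placeholders, not details from the projects listed here.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs")          // hypothetical application name
      .getOrCreate()

    // Read a stream of events from Kafka; broker list and topic are placeholders.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "customer-events")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Land the raw events in HDFS as Parquet for downstream Hive/Spark analysis.
    events.writeStream
      .format("parquet")
      .option("path", "/data/raw/customer_events")              // illustrative HDFS path
      .option("checkpointLocation", "/checkpoints/customer_events")
      .start()
      .awaitTermination()
  }
}
```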
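Below is a condensed sketch of the Hive dynamic-partitioning pattern mentioned above, issued as HiveQL through Spark SQL; the table names, columns, and bucket count are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")   // hypothetical app name
      .enableHiveSupport()
      .getOrCreate()

    // Allow Hive to derive partition values from the query output.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Date-partitioned metrics table (table/column names are illustrative).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_metrics (
        |  customer_id STRING,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (txn_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert: one partition per distinct txn_date in staging_sales.
    spark.sql(
      """INSERT OVERWRITE TABLE sales_metrics PARTITION (txn_date)
        |SELECT customer_id, amount, txn_date FROM staging_sales""".stripMargin)

    // Bucketed variant, run as HiveQL in Hive itself (Spark does not write
    // Hive-compatible buckets):
    //   CREATE TABLE sales_metrics_bucketed (customer_id STRING, amount DOUBLE)
    //   PARTITIONED BY (txn_date STRING)
    //   CLUSTERED BY (customer_id) INTO 32 BUCKETS
    //   STORED AS ORC;
  }
}
```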
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Impala, YARN, HUE, Oozie, Zookeeper, Apache Spark, Apache Storm, Apache Kafka, Sqoop, Flume
Operating Systems: Windows, Ubuntu, RedHat Linux, Unix
Programming Languages: C, C++, Java, Python, SCALA
Scripting Languages: Shell Scripting, JavaScript
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, SQL, PL/SQL, Teradata
NoSQL Databases: HBase, Cassandra, and MongoDB
Hadoop Distributions: Cloudera, Hortonworks
Build Tools: Ant, Maven, sbt
Development IDEs: NetBeans, Eclipse IDE
Web Servers: WebLogic, WebSphere, Apache Tomcat 6
Cloud: AWS
Version Control Tools: SVN, Git, GitHub, Bitbucket
Packages: Microsoft Office, PuTTY, MS Visual Studio
PROFESSIONAL EXPERIENCE
Confidential, GA
Sr. Hadoop/Spark Developer
Responsibilities:
- Developed data pipeline using Kafka, Sqoop, Hive and Java to ingest customer behavioral data and financial histories into HDFS for analysis.
- Developed Sqoop scripts for importing and exporting data between HDFS/Hive and relational databases.
- Developed design documents considering all possible approaches and identifying the best one.
- Developed services to run MapReduce jobs on an as-needed basis.
- Responsible for managing data coming from different sources.
- Developed business logic using Scala.
- Responsible for loading data from UNIX file systems into HDFS; installed and configured Hive and wrote Pig/Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Wrote MapReduce programs to convert text files into Avro format and load them into Hive tables.
- Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
- Developed scripts to automate end-to-end data management and synchronization between all the clusters.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop.
- Documented ETL test plans, test cases, test scripts, and validations based on design specifications for unit, system, and functional testing; prepared test data and performed error handling and analysis.
- Worked extensively with Dimensional modeling, Data migration, Data cleansing, ETL Processes for data warehouses.
- Extensively used Informatica Power Center Data Validation tool to unit test the ETL mappings.
- Imported data from sources such as HDFS and HBase into Spark RDDs (see the HBase-to-RDD sketch at the end of this role).
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (illustrative sketch at the end of this role).
- Implemented a Continuous Delivery pipeline with Docker, Bamboo, GitHub and AWS AMI’s such that whenever a new GitHub branch gets created, Bamboo automatically attempts to build a new Docker container from it.
- Wrote SQL queries and Stored Procedures for interacting with the PostgreSQL database.
- Migrated a microservice architecture to AWS Lambda functions so that instances spin up only when there is an event to consume data.
- Used Git and Bitbucket for source control and Bamboo for continuous integration; wrote test cases using the JUnit and Mockito frameworks.
- Worked with release management to deploy applications into production systems, build plans and deploy activities.
- Used Amazon CloudWatch to monitor AWS services and CloudWatch Logs to monitor the application.
- Developed Java API to interact with the Amazon SQS used in sending bulk emails.
- Selecting the appropriate AWS service based on compute, data, or security requirements.
- Integrated Amazon Web Services (AWS) with other application infrastructure.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Imported results into visualization BI tool Tableau to create dashboards.
- Worked in an Agile methodology and used JIRA to maintain project stories.
- Involved in gathering the requirements, designing, development and testing.
Environment: Hadoop, MapReduce, Hive, Java, Maven, Impala, Pig, Spark, Oozie, Oracle, YARN, GitHub, AWS, Tableau, Unix, Cloudera, Kafka, Sqoop, Scala, HBase.
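The following sketch illustrates one common way to load an HBase table into a Spark RDD, via newAPIHadoopRDD and TableInputFormat, as referenced above; the table name, column family, and qualifier are placeholders rather than project specifics.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object HBaseToRddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-to-rdd"))

    // Point the TableInputFormat at the source HBase table (name is illustrative).
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "customer_profiles")

    // Each record is a (row key, Result) pair scanned from HBase.
    val hbaseRdd = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Pull one column value out of each row (column family/qualifier are placeholders).
    val scores = hbaseRdd.map { case (rowKey, result) =>
      val key   = Bytes.toString(rowKey.copyBytes())
      val score = Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("risk_score")))
      (key, score)
    }

    scores.take(10).foreach(println)
    sc.stop()
  }
}
```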
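As a hedged illustration of converting a Hive query into Spark transformations, the sketch below shows one aggregation written as a DataFrame transformation and again on the underlying RDD. The table and column names are assumed for the example, not taken from the project.

```scala
import org.apache.spark.sql.SparkSession

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark")        // hypothetical app name
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL:
    //   SELECT customer_id, SUM(amount) FROM transactions GROUP BY customer_id
    // Equivalent DataFrame transformation (table/column names are illustrative):
    val totals = spark.table("transactions")
      .groupBy("customer_id")
      .sum("amount")
      .withColumnRenamed("sum(amount)", "total_amount")

    // The same aggregation expressed on the underlying RDD of Rows.
    val totalsRdd = spark.table("transactions").rdd
      .map(row => (row.getAs[String]("customer_id"), row.getAs[Double]("amount")))
      .reduceByKey(_ + _)

    totals.write.mode("overwrite").saveAsTable("customer_totals")  // illustrative output table
    totalsRdd.take(5).foreach(println)
  }
}
```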
Confidential, St. Louis, MO
Sr. Hadoop Developer
Responsibilities:
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data.
- Adept in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
- Developed data pipeline using Flume, Sqoop, Pig, Java map reduce and Spark Scala jobs to ingest claim data and financial histories into HDFS for analysis.
- Worked on importing data from HDFS to MySQL database and vice-versa using SQOOP.
- Extensive experience in writing HDFS & Pig Latin commands.
- Responsible for writing Pig Latin scripts.
- Applied Python concepts when writing Pig Latin scripts.
- Used Python scripting when transferring data from one source to another.
- Developed UDFs to provide custom Hive and Pig capabilities and apply business logic to the data.
- Created Hive internal/external tables with proper static and dynamic partitions (see the Hive DDL sketch at the end of this role).
- Implemented Spring Boot microservices to process messages into the Kafka cluster (a simplified producer sketch follows this role).
- Used the Spring Kafka API to process messages reliably on the Kafka cluster.
- Knowledgeable about Kafka message partitioning and setting replication factors in a Kafka cluster.
- Analyzed unified historical data in HDFS using Hive to identify issues and behavioral patterns.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Performance tuning using partitioning and bucketing of Hive tables.
- Experience in NoSQL database such as HBase.
- Created HBase tables and loaded large data sets coming from Linux, NoSQL, and MySQL sources.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Proficient with AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop/Spark jobs on AWS.
- Deploying and maintaining production environment using AWS EC2 instances and ECS with Docker.
- Amazon EC2 is also used for deploying and testing the lower environments such as Dev, INT and Test.
- Object storage service Amazon S3 is used to store and retrieve media files such as images.
- CloudWatch is used to monitor the application and to store logging information.
- Created Docker containers and Docker Swarm consoles for managing the application life cycle.
- Developed Docker images to support development and testing teams and their pipelines; introduced pipeline and automation best practices, putting together an introduction to Docker and Kubernetes on AWS.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, Zookeeper, and Pig jobs that run independently based on time and data availability.
- Experienced in requirements gathering and test planning; constructed and executed positive/negative test cases in order to surface and resolve all bugs within the QA environment.
Environment: HDFS, Map Reduce, CDH5, HIVE, PIG, HBase, Sqoop, Flume, Oozie, Zookeeper, AWS, MySQL, Java, Linux Shell Scripting, XML.
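The sketch below illustrates the kind of Hive DDL referenced above for an external, partitioned table, including registering a static partition; it is issued through Spark SQL only to keep all examples in one language, and the table, columns, and HDFS paths are illustrative.

```scala
import org.apache.spark.sql.SparkSession

object HiveExternalTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-external-table-sketch")   // hypothetical app name
      .enableHiveSupport()
      .getOrCreate()

    // External table over claim files already landed in HDFS
    // (table, columns and path are illustrative).
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS claims_raw (
        |  claim_id  STRING,
        |  member_id STRING,
        |  amount    DOUBLE
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS PARQUET
        |LOCATION '/data/claims/raw'""".stripMargin)

    // Static partition: register one day's directory explicitly.
    spark.sql(
      """ALTER TABLE claims_raw ADD IF NOT EXISTS
        |PARTITION (load_date='2017-06-01')
        |LOCATION '/data/claims/raw/load_date=2017-06-01'""".stripMargin)
  }
}
```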
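As a simplified illustration of producing messages to the Kafka cluster described above, the sketch below uses the plain Kafka producer client rather than Spring Kafka, purely to keep the example self-contained in Scala; the broker addresses, topic, key, and payload are placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

object ClaimEventProducer {
  def main(args: Array[String]): Unit = {
    // Broker list, serializers and topic name are illustrative placeholders.
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.ACKS_CONFIG, "all")   // wait for all in-sync replicas

    val producer = new KafkaProducer[String, String](props)
    try {
      // Key by claim id so related events land on the same partition.
      val record = new ProducerRecord[String, String](
        "claim-events", "claim-123", """{"status":"RECEIVED"}""")
      producer.send(record).get()   // block for the broker ack in this simple sketch
    } finally {
      producer.close()
    }
  }
}
```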
Confidential, Durham, NC
Hadoop Developer
Responsibilities:
- Interacted with the business and design teams on requirements and prepared the low-level and high-level design documents.
- Provided in-depth technical and business knowledge to ensure efficient design, programming, implementation, and ongoing support for the application.
- Involved in identifying possible ways to improve the efficiency of the system.
- Developed multiple MapReduce jobs in Java for log data cleaning and preprocessing, and scheduled the jobs to collect and aggregate the logs on an hourly basis.
- Implemented MapReduce programs using Java.
- Worked on the logical implementation of HBase and the application's interaction with it.
- Efficiently put and fetched data to/from HBase by writing MapReduce jobs (see the HBase client sketch at the end of this role).
- Developed Map Reduce jobs to automate transfer of data from/to HBase.
- Assisted with the addition of Hadoop processing to the IT infrastructure.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Developed analytical components using Scala, Spark, Apache Mesos and Spark Stream.
- Used knowledge of Kibana and Elasticsearch to identify Kafka message failure scenarios.
- Implemented reprocessing of failed Kafka messages using offset IDs (illustrative sketch at the end of this role).
- Used Flume and Kafka to collect web logs from the online ad servers and push them into HDFS.
- Implemented and executed MapReduce jobs to process the log data from the ad servers.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on MongoDB, and Cassandra.
- Prepared multi-cluster test harness to exercise the system for better performance.
Environment: Hadoop, HDFS, MapReduce, HBase, Hive, Kafka, Flume, Cassandra, Hortonworks and Cloudera Hadoop distributions, Eclipse (Juno), Java Batch, SQL*Plus, and Oracle 10g.
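The following is a minimal sketch of putting and fetching a row through the HBase client API, as referenced above; in the work described, this logic ran inside MapReduce jobs, and the table, row key, and column names here are illustrative.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBasePutGetSketch {
  def main(args: Array[String]): Unit = {
    // Table, column family and qualifiers are illustrative placeholders.
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = connection.getTable(TableName.valueOf("ad_log_summary"))

    try {
      // Put: write one aggregated row keyed by server id + hour.
      val put = new Put(Bytes.toBytes("server42#2017060110"))
      put.addColumn(Bytes.toBytes("stats"), Bytes.toBytes("hits"), Bytes.toBytes("12345"))
      table.put(put)

      // Get: read the same row back.
      val result = table.get(new Get(Bytes.toBytes("server42#2017060110")))
      val hits = Bytes.toString(result.getValue(Bytes.toBytes("stats"), Bytes.toBytes("hits")))
      println(s"hits = $hits")
    } finally {
      table.close()
      connection.close()
    }
  }
}
```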
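Below is a hedged sketch of reprocessing failed Kafka messages by seeking to a known offset with the Kafka consumer API (assuming a 2.x-style client); the broker, group, topic, partition, and offset values are placeholders rather than project specifics.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.TopicPartition
import scala.collection.JavaConverters._

object ReplayFromOffset {
  def main(args: Array[String]): Unit = {
    // Broker, group, topic, partition and offset values are illustrative.
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "adlog-replay")
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")

    val consumer = new KafkaConsumer[String, String](props)
    val partition = new TopicPartition("ad-logs", 0)

    // Assign the partition directly and seek back to the offset of the failed message.
    consumer.assign(Collections.singletonList(partition))
    consumer.seek(partition, 42000L)

    // Re-read and reprocess from that offset onwards.
    val records = consumer.poll(Duration.ofSeconds(1))
    records.asScala.foreach(r => println(s"replaying offset ${r.offset()}: ${r.value()}"))

    consumer.close()
  }
}
```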
Confidential
Programmer Analyst
Responsibilities:
- Involved in understanding the functional specifications of the project.
- Assisted the development team in designing the complete application architecture
- Involved in developing JSP pages for the web tier and validating the client data using JavaScript.
- Developed database connection components using JDBC (see the sketch at the end of this role).
- Designed Screens using HTML and images.
- Cascading Style Sheet (CSS) was used to maintain uniform look across different pages.
- Involved in creating Unit Test plans and executing the same.
- Performed document/code reviews and knowledge transfer as part of status updates for ongoing project development.
- Deployed web modules in Tomcat web server.
Environment: Java, JSP, J2EE, Servlets, Java Beans, HTML, JavaScript, JDeveloper, Tomcat Webserver, Oracle, JDBC, XML.
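As a small illustration of a JDBC connection component like the one referenced above, the sketch below opens a connection and runs a parameterized query. It is written in Scala only for consistency with the other examples (the original work was Java), and the URL, credentials, table, and column names are placeholders.

```scala
import java.sql.{Connection, DriverManager, ResultSet}

object CustomerDao {
  // Connection URL and credentials are illustrative placeholders.
  private val url = "jdbc:oracle:thin:@dbhost:1521:orcl"

  // Open a connection, run the given work, and always close the connection.
  def withConnection[T](work: Connection => T): T = {
    val conn = DriverManager.getConnection(url, "app_user", "secret")
    try work(conn) finally conn.close()
  }

  // Look up a customer name using a parameterized query (table/columns are hypothetical).
  def findName(customerId: Int): Option[String] = withConnection { conn =>
    val stmt = conn.prepareStatement("SELECT name FROM customers WHERE id = ?")
    try {
      stmt.setInt(1, customerId)
      val rs: ResultSet = stmt.executeQuery()
      if (rs.next()) Some(rs.getString("name")) else None
    } finally stmt.close()
  }
}
```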