Hadoop Developer Resume
Bentonville, Arkansas
SUMMARY:
- 7+ years of software development experience with Big Data technologies, the Hadoop ecosystem and Java/J2EE technologies.
- Developed ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools.
- Designed, configured and deployed Amazon Web Services (AWS) for a multitude of applications utilizing the AWS stack (including EC2, Route53, S3, RDS, CloudFormation, CloudWatch, SQS, IAM), focusing on high availability, fault tolerance and auto-scaling.
- In-depth and extensive knowledge of Hadoop architecture and its components, including HDFS, NameNode, DataNode, JobTracker, TaskTracker, YARN, ResourceManager, NodeManager and MapReduce.
- Good understanding of job workflow scheduling and monitoring tools like Oozie.
- Hands-on work with HDFS, NameNode, JobTracker, DataNode, TaskTracker and MapReduce concepts.
- Experience in building high-performance, scalable solutions using Hadoop ecosystem tools such as Pig, Hive, Sqoop, Spark, ZooKeeper, Solr and Kafka.
- Defined real-time data streaming solutions across the cluster using Spark Streaming, Apache Storm, Kafka, NiFi and Flume.
- Solid experience optimizing Hive queries with partitioning and bucketing techniques, which control data distribution to enhance performance (see the sketch after this summary).
- Experience importing and exporting data between HDFS/Hive and databases such as MySQL and Oracle using Sqoop.
- Experience working in environments using Agile (scrum) and Waterfall methodologies.
- Expertise in database modeling and development using SQL and PL/SQL with MySQL and Teradata.
- Experienced with Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
- Experience with NoSQL databases HBase and Cassandra and their integration with Hadoop clusters.
- Experienced in cloud integration with AWS, using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2 and Redshift, and with Microsoft Azure.
- Experienced with relational database management systems such as Teradata, PostgreSQL, DB2, Oracle and SQL Server.
- Experience in building, deploying and integrating applications in Application Servers with ANT, Maven and Gradle.
- Significant application development experience with REST Web Services, SOAP, WSDL, and XML.
- Strong hands-on development experience with Java, J2EE (Servlets, JSP, Java Beans, EJB, JDBC, JMS, Web Services) and related technologies.
- Experience in working with different data sources like Flat files, XML files and Databases.
- Experience in database design, entity relationships and database analysis; programming SQL, PL/SQL stored procedures, packages and triggers in Oracle; and working with MongoDB on Unix/Linux.
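The following is a minimal, illustrative sketch of the partitioning and bucketing approach referenced above, expressed here via Spark's DataFrameWriter against the Hive metastore rather than Hive DDL. The table names (sales_raw, sales_by_date), column names and bucket count are placeholders, not details from any actual project.

```scala
import org.apache.spark.sql.SparkSession

object PartitionBucketSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-bucket-sketch")
      .enableHiveSupport()          // use the cluster's Hive metastore
      .getOrCreate()

    // Placeholder source table assumed to exist in the Hive metastore.
    val raw = spark.table("sales_raw")

    // Partition by load date so date filters prune whole directories,
    // and bucket by customer_id so joins on that key shuffle less data.
    raw.write
      .partitionBy("load_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("sales_by_date")

    spark.stop()
  }
}
```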
TECHNICAL SUMMARY:
Programming Languages: Java, SQL, PL/SQL, Pig Latin, HiveQL, Python, Scala
Big Data Systems: Hadoop, MapReduce, YARN, Hive, Spark, Oozie, Kafka, ZooKeeper, Cloudera
Hadoop Platforms: Cloudera, Hortonworks, AWS EMR
Spark Technologies: RDDs, DataFrames, Datasets, Spark SQL, Spark Streaming
Web Technologies: Java EE (JDBC, JSP, Servlets, JSF, JSTL), AJAX, JavaScript
RDBMS: Oracle, MySQL, SQL Server, PostgreSQL, Teradata
NoSQL Databases: HBase, MongoDB, Cassandra
Frameworks: Struts 2, Hibernate, Spring 3.x
Version Control: Git, SVN, Bitbucket
IDEs: NetBeans, IntelliJ IDEA, PyCharm, Eclipse, Visual Studio
AWS Services: EC2, Lambda, S3, SNS, CloudWatch, RDS, etc.
Operating Systems: Windows, Linux, macOS
DevOps Tools: Jenkins, Jira, Docker
PROFESSIONAL EXPERIENCE:
Confidential, Bentonville, Arkansas
Hadoop Developer
Responsibilities:
- Used partitioning, bucketing, map-side joins and parallel execution to optimize Hive queries, cutting execution time from hours to minutes.
- Designed the AWS cloud migration, including EMR, DynamoDB, Redshift and event processing with Lambda functions.
- Worked with Amazon EMR to process data directly in S3 and to copy data from S3 to HDFS on the EMR cluster, setting up Spark Core for the analysis work.
- Responsible for building scalable distributed data solutions using Cloudera Hadoop.
- Loaded data into Spark RDDs and performed in-memory computation for faster response, and implemented Spark SQL queries on formats such as text, CSV and XML files.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Created RDDs and applied data filters in Spark, and created Cassandra and Hive tables for user access.
- Imported data from MySQL to HDFS and back using Sqoop, and configured the Hive metastore with MySQL to store the metadata for Hive tables.
- Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
- Designed Oozie workflows using different actions such as Sqoop, Pig, Hive and shell actions.
- Worked with major Hadoop distributions such as Hortonworks and Cloudera and numerous open source projects, and prototyped various applications that utilize modern Big Data tools.
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
- Developed Hive scripts, Pig scripts, Unix shell scripts and Spark programs for all ETL loading processes, converting files to Parquet in the Hadoop file system.
- Created applications that monitor consumer lag within Apache Kafka clusters.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (see the sketch after this role's environment line).
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Optimized existing algorithms in Hadoop using SparkContext, Hive SQL and DataFrames.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Developed shell, Perl and Python scripts to automate and provide control flow to Pig scripts; designed the Redshift data model and performed Redshift performance analysis and improvements.
- Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, Hive.
- Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig and MapReduce, and worked with Spark using both Scala and Python.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
Environment: Hadoop, Spark, Cassandra, Hive, Redshift, HDFS, MySQL, Sqoop, Oozie, Pig, Cloudera Manager, MapReduce, HBase, Zookeeper, Unix, Kafka, JSON, Python, Jenkins, AWS (EC2, RDS, S3, Lambda, EMR, etc.).
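Below is a minimal, hypothetical sketch of the Hive-to-Spark conversion work described in this role: a HiveQL-style aggregation re-expressed as DataFrame transformations and persisted as Parquet. The table name, columns, date filter and HDFS output path are illustrative placeholders, not details from the actual project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-dataframe-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Equivalent HiveQL: SELECT store_id, SUM(amount) AS total_sales
    //                    FROM transactions WHERE txn_date >= '2017-01-01'
    //                    GROUP BY store_id
    val totals = spark.table("transactions")            // placeholder Hive table
      .filter(col("txn_date") >= lit("2017-01-01"))
      .groupBy("store_id")
      .agg(sum("amount").alias("total_sales"))

    // Persist the result as Parquet in HDFS (placeholder path).
    totals.write.mode("overwrite").parquet("hdfs:///warehouse/reports/store_totals")

    spark.stop()
  }
}
```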
Confidential, Louisville, Kentucky
Hadoop Developer
Responsibilities:
- Developed Spark jobs written in Scala to perform operations like data aggregation, data processing and data analysis.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Involved in creating Hive tables and then applying HiveQL on those tables for data validation.
- Implemented various MapReduce jobs in custom environments and updated HBase tables from their output by generating Hive queries.
- Automated the process for extraction of data from warehouses and weblogs by developing work-flows and coordinator jobs in Oozie.
- Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
- Worked with Amazon EMR to process data directly in S3 and to copy data from S3 to HDFS on the EMR cluster, setting up Spark Core for the analysis work.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, manage and review data backups and log files.
- Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and monthly batch cycles.
- Used Spark for series of dependent jobs and for iterative algorithms, and developed a data pipeline using Kafka and Spark Streaming to store data in HDFS (see the sketch after this role's environment line).
- Tuned Hive and Pig job parameters along with native MapReduce parameters to avoid excessive disk spills, and enabled temp-file compression between jobs in the data pipeline to handle production-size data in a multi-tenant cluster environment.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
Environment: Hadoop HDFS, Flume, Pig, Hive, Oozie, Zookeeper, HBase, Spark, Storm, Spark SQL, Scala, Kafka, MongoDB, Linux, Sqoop, AWS.
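As a rough illustration of the Kafka-to-HDFS pipeline mentioned above, the sketch below uses Spark Structured Streaming's Kafka source (the original pipeline may well have used the older DStream-based Spark Streaming API). The broker address, topic name and HDFS paths are assumed placeholders.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    // Requires the spark-sql-kafka connector on the classpath.
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs-sketch")
      .getOrCreate()

    // Read a placeholder "customer-events" topic from a placeholder broker.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "customer-events")
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // Land the stream in HDFS as Parquet; the checkpoint tracks offsets and output files.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/landing/customer_events")
      .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
      .start()

    query.awaitTermination()
  }
}
```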
Confidential, Chicago, Illinois
Hadoop Developer
Responsibilities:
- Developed MapReduce jobs, Hive & PIG scripts for Data warehouse migration project.
- Designed and developed a system to collect data from multiple portals using Kafka and process it using Spark.
- Developing MapReduce jobs, Hive & PIG scripts for Risk & Fraud Analytics platform.
- Developed and designed application to process data using Spark.
- Developed Data ingestion platform using Sqoop and Flume to ingest Twitter and Facebook data for Marketing & Offers platform.
- Designed and developed automated processes using shell scripting for data movement and purging.
- Installation & Configuration Management of a small multi node Hadoop cluster.
- Installation and configuration of other open source software like Pig, Hive, Flume, Sqoop.
- Developed programs in Java and Scala (Spark) to reformat data extracted from HDFS for analysis.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Importing and exporting data into Impala, HDFS and Hive using Sqoop.
- Responsible for managing data coming from different sources.
- Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access, and developed Hive tables to transform and analyze the data in HDFS (see the sketch after this role's environment line).
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in running Hadoop Jobs for processing millions of records of text data.
- Developed the application by using the Struts framework.
- Created connection through JDBC and used JDBC statements to call stored procedures.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed Pig UDFs to pre-process the data for analysis.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Moved RDBMS data, exported as flat files from various channels, into HDFS for further processing.
- Developed job workflows in Oozie to automate the tasks of loading the data into HDFS.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop.
- Wrote script files for processing data and loading it into HDFS.
Environment: Hadoop, Spark, Scala, MapReduce, HDFS, Pig, Hive, Java (JDK 1.7), Oracle 11g/10g, PL/SQL, SQL*Plus, Linux, Sqoop.
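The sketch below is a hypothetical illustration of the log-parsing and partitioned-Hive-table work described above, done here with Spark rather than the Pig/Hive scripts actually used. The HDFS path, the space-delimited log layout (ip, timestamp, url, status) and the table name are assumptions for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object WebLogToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("weblog-to-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Split raw web-server log lines into columns (assumed layout).
    val logs = spark.read.text("hdfs:///data/raw/weblogs")
      .select(split(col("value"), " ").alias("f"))
      .select(
        col("f").getItem(0).alias("ip"),
        col("f").getItem(1).alias("event_ts"),
        col("f").getItem(2).alias("url"),
        col("f").getItem(3).cast("int").alias("status"),
        to_date(col("f").getItem(1)).alias("event_date"))

    // Write into a date-partitioned Hive table so daily queries prune partitions.
    logs.write
      .mode("append")
      .partitionBy("event_date")
      .format("parquet")
      .saveAsTable("weblogs_parsed")

    spark.stop()
  }
}
```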
Confidential, Houston, Texas
Java Developer
Responsibilities:
- Created enterprise deployment strategy and designed the enterprise deployment process to deploy Web Services, J2EE programs on more than 7 different SOA/WebLogic instances across development, test and production environments.
- Designed user interfaces using HTML, Swing, CSS, XML, JavaScript and JSP.
- Involved in developing, testing and implementation of the system using Struts, JSF, and Hibernate.
- Developed, modified, fixed, reviewed, tested and migrated Java, JSP, XML, Servlet, SQL and JSF code.
- Updated interactive web pages from JSP and CSS to HTML5, CSS and JavaScript for a better user experience, and developed Servlets, Session and Entity Beans handling business logic and data.
- Involved in requirement analysis, design, code testing, debugging and implementation activities.
- Involved in performance tuning of the database and Informatica; improved performance by identifying and rectifying bottlenecks.
- Applied technologies to solve big data problems and to develop innovative big data solutions.
- Designed and developed Job flows using Oozie.
- Developed Sqoop commands to pull the data from Teradata.
- Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded the results into HBase for further processing (see the sketch after this role's environment line).
- Wrote PL/SQL Packages and Stored procedures to implement business rules and validations.
Environment: Java, J2EE, Java Server Pages (JSP), JavaScript, Hadoop, Oozie, Hive, Teradata, Servlets, JDBC, PL/SQL, ODBC, Struts Framework, XML, CSS, HTML, DHTML, XSL, XSLT and MySQL.
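For illustration only, the following sketch shows one way standardized records could be written into HBase after the Avro-based processing step, using the standard HBase client API (shown here from Scala for consistency with the other sketches). The table name, column family, row key and column values are hypothetical placeholders.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseLoadSketch {
  def main(args: Array[String]): Unit = {
    // Connect using the cluster's hbase-site.xml on the classpath.
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    // "customer_profiles" and the column family "cf" are placeholder names.
    val table = connection.getTable(TableName.valueOf("customer_profiles"))
    try {
      // One Put per standardized record; row key and columns are illustrative.
      val put = new Put(Bytes.toBytes("customer#12345"))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("Jane Doe"))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("segment"), Bytes.toBytes("retail"))
      table.put(put)
    } finally {
      table.close()
      connection.close()
    }
  }
}
```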