Sr. Spark & Hadoop Developer Resume
Chicago, IL
SUMMARY:
- IT Professional with 8+ years of experience in the Software Development Life Cycle, including requirements gathering, documentation, analysis, development, testing and support.
- Over 4 years of extensive experience as a Hadoop Developer and Big Data Analyst, with expertise in HDFS, Scala, Spark, MapReduce, YARN, Hive, Sqoop, HBase, Flume, Oozie and Zookeeper.
- Experience in designing and deploying Hadoop clusters and various Big Data analytics tools, including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark and Impala, with the Cloudera distribution.
- Expertise in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Experience with Big Data and the Hadoop Distributed File System (HDFS).
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
- Experience in developing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
- Good understanding of Kafka architecture and experience writing Spark Streaming jobs that consume from Kafka.
- Experience creating scripts for data modeling, import and export. Extensive experience in deploying, managing and developing MongoDB clusters.
- Experience in integrating Spark, Kafka and HBase to power real-time dashboards.
- Excellent ability to use analytical tools to mine data and evaluate underlying patterns.
- Hands-on experience in developing MapReduce programs using Apache Hadoop for analyzing Big Data.
- Hands-on knowledge of RDD and DataFrame transformations in Spark (a brief sketch follows this list).
- Experience processing different file formats such as Avro, Parquet, CSV, JSON and SequenceFile using MapReduce programs and Spark.
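A minimal Scala/Spark sketch of the kind of file-format handling and RDD/DataFrame transformations listed above. All paths, table layouts and column names are hypothetical, and the Avro read assumes the spark-avro package is on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object FormatAndTransformSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("format-and-transform-sketch")
      .getOrCreate()
    import spark.implicits._

    // Reads illustrating the formats mentioned above (hypothetical paths).
    val events = spark.read.format("avro").load("/data/raw/events_avro")
    val orders = spark.read.parquet("/data/raw/orders_parquet")
    val clicks = spark.read.json("/data/raw/clicks_json")
    val users  = spark.read.option("header", "true").csv("/data/raw/users_csv")
    events.printSchema()

    // DataFrame transformations: filter, join, aggregate.
    val dailyOrderTotals = orders
      .filter(col("status") === "COMPLETED")
      .join(users, Seq("user_id"))
      .groupBy(col("order_date"))
      .agg(sum("amount").as("total_amount"))

    // Dropping down to the RDD API for a record-level transformation, then back to a DataFrame.
    val normalizedIds = clicks.rdd
      .map(row => row.getAs[String]("session_id").trim.toLowerCase)
      .toDF("session_id")
    normalizedIds.show(5)

    // Write results back out as Parquet (Snappy is Spark's default Parquet codec).
    dailyOrderTotals.write.mode("overwrite").parquet("/data/curated/daily_order_totals")

    spark.stop()
  }
}
```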
TECHNICAL SKILLS:
Programming Languages: Java, Scala, C/C++, PL/SQL, Shell
Hadoop Ecosystem: Spark, HDFS, MapReduce, Hive, HBase, Kafka, Zookeeper, Sqoop, Flume, Oozie, YARN, SOLR
Development Tools: Eclipse, Maven, DB Visualizer, Putty, Git, SBT
Databases: MySQL, Oracle 11g; NoSQL: HBase, MongoDB, Cassandra
Web Development: HTML5, CSS3, JavaScript, jQuery, Bootstrap
Frameworks: Spring, JUnit, Log4j
PROFESSIONAL EXPERIENCE:
Confidential, Chicago, IL
Sr. Spark & Hadoop Developer
Roles & Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing it in HDFS.
- Responsible for importing data into HDFS from different RDBMS servers and exporting it back to those servers using Sqoop.
- Developed data pipeline using Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet tables from Avro-backed Hive tables (see the sketch at the end of this role).
- Ran Hive scripts through Hive, Impala and Hive on Spark, and some through Spark SQL.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Developed analytical components using Scala, Spark, Apache Mesos and Spark Streaming.
- Collected JSON data from an HTTP source and developed Spark jobs that perform inserts and updates on Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Involved in HBase setup and in storing data into HBase for downstream analysis.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
Environment: Scala, HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera
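A minimal sketch of the partitioned, Snappy-compressed Parquet table work referenced above, written with Spark's DataFrame writer rather than the original Hive scripts. The database, table and column names are hypothetical, and note that bucketBy here produces Spark-managed bucketing metadata rather than Hive-native bucketing.

```scala
import org.apache.spark.sql.SparkSession

object ParquetHiveTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-hive-table-sketch")
      .enableHiveSupport()   // use the Hive metastore for table definitions
      .getOrCreate()

    // Read the existing Avro-backed Hive table (hypothetical database/table names).
    val purchasesAvro = spark.table("sales.purchases_avro")

    // Write it back as a partitioned, bucketed, Snappy-compressed Parquet table.
    purchasesAvro.write
      .format("parquet")
      .option("compression", "snappy")
      .partitionBy("purchase_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .mode("overwrite")
      .saveAsTable("sales.purchases_parquet")

    spark.stop()
  }
}
```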
Confidential, Denver, CO
Spark & Hadoop Developer
Roles & Responsibilities:
- Developed Spark applications to perform all data transformations on user behavioral data coming from multiple sources.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala (see the sketch below).
- Responsible for managing data coming from different sources.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Performed file system management and monitoring of Hadoop log files.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
- Performed masking on customer sensitive data using Flume interceptors.
- Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.
- Involved in migrating data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Worked on large sets of structured, semi-structured and unstructured data.
Environment: Apache Hadoop, HDFS, MapReduce, Hive, HBase, Sqoop, Oozie, Maven, Shell Scripting, Spark, Scala, Cloudera Manager
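A minimal sketch of a Kafka-to-HDFS Spark Streaming job like the one described above, using the spark-streaming-kafka-0-10 integration. Broker addresses, topic name, consumer group, batch interval and output paths are all assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.KafkaUtils

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Hypothetical broker list, topic and consumer group.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "user-behavior-consumer",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("user-events"), kafkaParams))

    // Persist each non-empty micro-batch to HDFS (one directory per batch).
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"hdfs:///data/streaming/user_events/batch_${time.milliseconds}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```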
Confidential, Boston, MA
Hadoop Developer
Roles & Responsibilities:
- Setup Hadoop cluster on Amazon EC2.
- Analyzed the Hadoop cluster and various big data tools, including Pig, HBase and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration.
- Managed Hadoop cluster resources, including adding/removing cluster nodes for maintenance and capacity needs.
- Involved in loading data from UNIX file system to HDFS.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Used Sqoop to load data from MySQL into HDFS on a regular basis (see the sketch below).
- Developed scripts and batch jobs to schedule various Hadoop programs.
Environment: Apache Hadoop, HDFS, Hive, Flume, HBase, Sqoop, PIG, Java, Eclipse, MySQL, Zookeeper, Amazon EC2, SOLR
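The MySQL-to-HDFS loads above were done with Sqoop, which is a command-line tool. To keep all code sketches in this document in Scala, the hypothetical sketch below shows an equivalent load using Spark's JDBC reader instead of Sqoop; the host, database, table, credentials and output layout are all assumptions, and the MySQL JDBC driver is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object MySqlToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mysql-to-hdfs-sketch")
      .getOrCreate()

    // Parallel JDBC read from a hypothetical MySQL table.
    val customers = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/retail")
      .option("dbtable", "customers")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("MYSQL_PASSWORD", ""))
      .option("numPartitions", "4")
      .option("partitionColumn", "customer_id")
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .load()

    // Land the snapshot in HDFS as Parquet under a date-stamped directory.
    val loadDate = java.time.LocalDate.now().toString
    customers.write
      .mode("overwrite")
      .parquet(s"hdfs:///data/landing/customers/load_date=$loadDate")

    spark.stop()
  }
}
```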
Confidential, NYC, NY
Hadoop Developer/Admin
Roles & Responsibilities:
- Installed and configured MapReduce, HIVE and the HDFS; implemented CDH3 Hadoop cluster on RHEL. Assisted with performance tuning and monitoring.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch below).
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Reviewed Hadoop Log files.
Environment: Apache Hadoop, HDFS, MapReduce, Hive, HBase, Sqoop, Maven, Shell Scripting, CDH3
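The cleaning jobs in this role were written as Java MapReduce; to keep every sketch in this resume in one language, below is a hypothetical Scala/Spark version of a comparable cleaning and preprocessing step. The pipe-delimited input layout, field names and paths are assumptions.

```scala
import org.apache.spark.sql.SparkSession

object CleaningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cleaning-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical pipe-delimited input: user_id|event|timestamp
    val raw = sc.textFile("hdfs:///data/raw/events")

    val cleaned = raw
      .map(_.trim)
      .filter(_.nonEmpty)                // drop blank lines
      .map(_.split("\\|", -1))
      .filter(_.length == 3)             // drop malformed records
      .map { case Array(userId, event, ts) =>
        s"${userId.trim.toLowerCase}|${event.trim.toUpperCase}|${ts.trim}"
      }

    cleaned.saveAsTextFile("hdfs:///data/clean/events")
    spark.stop()
  }
}
```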
Confidential, Irving, TX
Hadoop Developer
Roles & Responsibilities:
- Involved in gathering and analyzing business requirements, and designing Hadoop Stack as per the requirements.
- Developed UNIX shell scripts to load a large number of files into HDFS from the Linux file system.
- Worked with Sqoop import and export to handle large data set transfers between the Oracle database and HDFS.
- Wrote Flume configuration files for importing streaming log data into HBase with Flume.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Performed masking on customer-sensitive data using Flume interceptors (see the sketch below).
- Developed custom InputFormats in MapReduce jobs to handle custom file formats and convert them into key-value pairs.
- Involved in handling data in different file formats, including Text, SequenceFile, Avro and RCFile.
- Wrote MapReduce jobs for data processing and stored the results in HBase for BI reporting.
- Worked with BI teams to generate reports and design ETL workflows in Tableau.
- Involved in developing Pig Latin and HiveQL scripts, and used other Hadoop ecosystem tools, for trend analysis and pattern recognition on user data.
- Developed and executed shell scripts to automate the jobs.
Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, Impala, Flume, HBase, Pig, Java, SQL, CDH, UNIX, Shell Scripting
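In this project the masking was done with Flume interceptors configured in the Flume agent; those are not reproduced here. As a hypothetical illustration in Scala, the sketch below applies comparable column-level masking with Spark SQL functions; the table, column names and masking rules are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MaskingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("masking-sketch")
      .enableHiveSupport()   // assumes the table lives in the Hive metastore
      .getOrCreate()

    // Hypothetical customer table with sensitive columns.
    val customers = spark.table("crm.customers")

    val masked = customers
      // Keep only the last four digits of the SSN.
      .withColumn("ssn", regexp_replace(col("ssn"), "^\\d{3}-\\d{2}", "XXX-XX"))
      // Hash the email rather than storing it in the clear.
      .withColumn("email", sha2(col("email"), 256))
      // Blank out the card number entirely.
      .withColumn("card_number", lit("****"))

    masked.write.mode("overwrite").parquet("hdfs:///data/masked/customers")
    spark.stop()
  }
}
```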
Confidential
Java J2EE Developer
Roles & Responsibilities:
- Involved in all phases of the Software Development Life Cycle (SDLC).
- Involved in the team discussions regarding the modeling, architectural and performance issues.
- Using UML, developed Use Case, Class and Sequence Diagrams in Visual Paradigm to represent the dynamic view of the system.
- Followed agile methodology and participated in daily SCRUM meetings, sprint planning, showcases and retrospectives.
- Understood the business requirements of the project and coded in accordance with the technical design document.
- Prepared the high-level design document as well as test cases for unit testing of the project.
- Fixed bugs/defects raised during System Testing and User Acceptance Testing.
- Handled critical call logs quickly in production support, where turnaround time is critical.
- Provided project induction training to new hires on the project.
- Coordinated with the on-site team for timely project delivery and query resolution.
- Worked closely with the Transaction Team, which is responsible for creating the visual layouts of the screens.
Environment: Java 1.2/1.3, Applets, Servlets, JSP, Custom Tags, JDBC, XML, HTML, CSS, JavaScript, Oracle, DB2, PL/SQL, JUnit, Log4j, RDBMS