Sr. Spark & Hadoop Developer Resume

Chicago, IL

SUMMARY:

  • IT Professional with 8+ years of experience in the Software Development Life Cycle, including Requirements Gathering, Documentation, Analysis, Development, Testing and Support.
  • Over 4 years of extensive experience as a Hadoop Developer and Big Data Analyst with expertise in HDFS, Scala, Spark, MapReduce, YARN, Hive, Sqoop, HBase, Flume, Oozie and Zookeeper.
  • Good experience in designing and deploying Hadoop clusters and different Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark and Impala, with the Cloudera distribution.
  • Expertise in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Experience with Big Data and the Hadoop Distributed File System (HDFS).
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce paradigm.
  • Hands-on experience developing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
  • Good understanding of Kafka architecture and experience writing Spark Streaming jobs that consume from Kafka.
  • Experience creating scripts for data modeling and data import/export. Extensive experience in deploying, managing and developing MongoDB clusters.
  • Experience in integrating Spark, Kafka and HBase to power real-time dashboards.
  • Excellent ability to use analytical tools to mine data and evaluate the underlying patterns.
  • Hands-on knowledge of RDD and DataFrame transformations in Spark (see the sketch after this list).
  • Experience processing different file formats such as Avro, Parquet, CSV, JSON and SequenceFile using MapReduce programs and Spark.
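
As a concrete illustration of the RDD and DataFrame transformations mentioned above, here is a minimal Scala sketch, not taken from any specific project; the paths, column names and aggregation are hypothetical.

    import org.apache.spark.sql.SparkSession

    object TransformationsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("rdd-df-sketch").getOrCreate()
        import spark.implicits._

        // DataFrame transformations over a JSON source (hypothetical path/columns)
        val events = spark.read.json("hdfs:///data/events.json")
        val errorCounts = events
          .filter($"severity" === "ERROR")
          .groupBy($"service")
          .count()
        errorCounts.write.mode("overwrite").parquet("hdfs:///data/error_counts")

        // The equivalent RDD-style transformation on raw text: a word count
        val counts = spark.sparkContext
          .textFile("hdfs:///data/events.txt")
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.take(10).foreach(println)

        spark.stop()
      }
    }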

TECHNICAL SKILLS:

Programming Languages: Java, Scala, C/C++, PL/SQL, Shell

Hadoop Ecosystem: Spark, HDFS, MapReduce, Hive, HBase, Kafka, Zookeeper, Sqoop, Flume, Oozie, YARN, SOLR

Development Tools: Eclipse, Maven, DB Visualizer, Putty, Git, SBT

Databases: MySQL, Oracle 11g; NoSQL: HBase, MongoDB, Cassandra

Web Development: HTML5, CSS3, JavaScript, jQuery, Bootstrap

Frameworks: Spring, JUnit, Log4j

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Sr. Spark & Hadoop Developer

Roles & Responsibilities:

  • Involved in the complete Big Data flow of the application, from ingesting data from upstream into HDFS to processing and analyzing it in HDFS.
  • Responsible for importing data into HDFS from different RDBMS servers and exporting data from HDFS back to those servers using Sqoop.
  • Developed a data pipeline using Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded them from the existing Avro-backed Hive tables (see the sketch after this list).
  • Ran Hive scripts through Hive, Impala and Hive on Spark, and some through Spark SQL.
  • Involved in performance tuning of Hive from the design, storage and query perspectives.
  • Developed analytical components using Scala, Spark, Apache Mesos and Spark Streaming.
  • Collected JSON data from an HTTP source and developed Spark APIs that perform inserts and updates on Hive tables.
  • Developed Spark scripts to import large files from Amazon S3 buckets.
  • Developed shell scripts for running Hive scripts in Hive and Impala.
  • Involved in HBase setup and in storing data into HBase, which was used for analysis.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
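
A minimal sketch of the Avro-to-Parquet Hive load described above, run through Spark SQL with Hive support; the table and column names are hypothetical, and the bucketing clause is omitted since bucketed loads were typically run through Hive itself.

    import org.apache.spark.sql.SparkSession

    object AvroToParquet {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("avro-to-parquet")
          .enableHiveSupport()
          .getOrCreate()

        // Allow dynamic partitioning for the INSERT below
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Partitioned Parquet target with Snappy compression (hypothetical schema)
        spark.sql("""
          CREATE TABLE IF NOT EXISTS sales_parquet (
            order_id BIGINT,
            customer_id STRING,
            amount DOUBLE
          )
          PARTITIONED BY (order_date STRING)
          STORED AS PARQUET
          TBLPROPERTIES ('parquet.compression' = 'SNAPPY')
        """)

        // Load from the existing Avro-backed Hive table; the partition
        // column must come last in the SELECT for dynamic partitioning
        spark.sql("""
          INSERT OVERWRITE TABLE sales_parquet PARTITION (order_date)
          SELECT order_id, customer_id, amount, order_date
          FROM sales_avro
        """)

        spark.stop()
      }
    }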

Environment: Scala, HDFS, Yarn, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera

Confidential, Denver, CO

Spark & Hadoop Developer

Roles & Responsibilities:

  • Developed Spark applications to perform all the data transformations on user behavioral data coming from multiple sources.
  • Configured Spark Streaming to receive real-time data from Kafka and stored the stream data in HDFS using Scala (see the sketch after this list).
  • Responsible for managing data coming from different sources.
  • Installed and configured Hadoop; responsible for maintaining the cluster and for managing and reviewing Hadoop log files.
  • Performed file system management and monitoring on Hadoop log files.
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Performed masking of sensitive customer data using Flume interceptors.
  • Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.
  • Involved in migrating data from the existing RDBMS (Oracle and SQL Server) into Hadoop using Sqoop for processing.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Worked on large sets of structured, semi-structured and unstructured data.
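
A minimal sketch of the Kafka-to-HDFS Spark Streaming job described above, using the spark-streaming-kafka-0-10 integration; the broker address, topic name, group id and output path are hypothetical.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        // 30-second micro-batches
        val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hdfs"), Seconds(30))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",              // hypothetical broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "behavior-ingest",
          "auto.offset.reset" -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("user-behavior"), kafkaParams))

        // Persist each non-empty micro-batch to HDFS as plain text
        stream.map(_.value()).foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            rdd.saveAsTextFile(s"hdfs:///data/behavior/batch-${System.currentTimeMillis()}")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }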

Environment: Apache Hadoop, HDFS, MapReduce, Hive, HBase, Sqoop, Oozie, Maven, Shell Scripting, Spark, Scala, Cloudera Manager

Confidential, Boston, MA

Hadoop Developer

Roles & Responsibilities:

  • Set up a Hadoop cluster on Amazon EC2.
  • Analyzed the Hadoop cluster and different Big Data tools, including Pig, HBase and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration.
  • Handled resource management of the Hadoop cluster, including adding and removing cluster nodes for maintenance and capacity needs.
  • Involved in loading data from the UNIX file system into HDFS.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios (see the sketch after this list).
  • Imported data using Sqoop to load data from MySQL into HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
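
A minimal sketch of writing one PII record into an HBase table from Scala, using the standard HBase client API; the table name, column family, row-key scheme and columns are hypothetical.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object PiiWriter {
      def main(args: Array[String]): Unit = {
        val conf = HBaseConfiguration.create()   // reads hbase-site.xml from the classpath
        val connection = ConnectionFactory.createConnection(conf)
        try {
          // "pii_records" and the "d" column family are hypothetical names
          val table = connection.getTable(TableName.valueOf("pii_records"))
          // Row key prefixed with the portfolio so records cluster by source
          val put = new Put(Bytes.toBytes("portfolio42#cust-001"))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes("Jane Doe"))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("format"), Bytes.toBytes("json"))
          table.put(put)
          table.close()
        } finally {
          connection.close()
        }
      }
    }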

Environment: Apache Hadoop, HDFS, Hive, Flume, HBase, Sqoop, PIG, Java, Eclipse, MySQL, Zookeeper, Amazon EC2, SOLR

Confidential, NYC, NY

Hadoop Developer/Admin

Roles & Responsibilities:

  • Installed and configured MapReduce, Hive and HDFS; implemented a CDH3 Hadoop cluster on RHEL. Assisted with performance tuning and monitoring.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Supported code/design analysis, strategy development and project planning.
  • Created reports for the BI team, using Sqoop to import data into HDFS and Hive.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this list).
  • Assisted with data capacity planning and node forecasting.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Reviewed Hadoop log files.
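
A minimal sketch of a map-only data-cleaning step like the Java MapReduce jobs described above; it is written in Scala against the Hadoop MapReduce API to stay consistent with the other examples here, and the five-field pipe-delimited record layout is hypothetical.

    import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // Map-only cleaning step: trims each line, drops blank or malformed
    // records, and emits the cleaned line unchanged.
    class CleaningMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
      private val out = new Text()

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
        val line = value.toString.trim
        // A record is treated as well-formed if it has exactly five
        // pipe-delimited fields (a hypothetical layout)
        if (line.nonEmpty && line.split('|').length == 5) {
          out.set(line)
          context.write(NullWritable.get(), out)
        }
      }
    }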

Environment: Apache Hadoop, HDFS, MapReduce, Hive, HBase, Sqoop, Maven, Shell Scripting, CDH3

Confidential, Irving, TX

Hadoop Developer

Roles & Responsibilities:

  • Involved in gathering and analyzing business requirements and in designing the Hadoop stack to match them.
  • Developed UNIX shell scripts to load a large number of files into HDFS from the Linux file system.
  • Worked with Sqoop import and export functionality to handle large data set transfers between the Oracle database and HDFS.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Performed masking of sensitive customer data using Flume interceptors.
  • Developed custom InputFormats in MapReduce jobs to handle custom file formats and convert them into key-value pairs.
  • Involved in handling data in different file formats such as Text, SequenceFile, Avro and RCFile (see the sketch after this list).
  • Wrote MapReduce jobs for data processing, with the results stored in HBase for BI reporting.
  • Worked with BI teams in generating reports and designing ETL workflows on Tableau.
  • Involved in developing Pig Latin and HiveQL scripts and using other Hadoop ecosystem tools for trend analysis and pattern recognition on user data.
  • Developed and executed shell scripts to automate jobs.
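
A minimal sketch of reading two of those formats (plain text and SequenceFile) in Spark with Scala; the paths and the (Text, Text) key/value types are hypothetical.

    import org.apache.hadoop.io.Text
    import org.apache.spark.sql.SparkSession

    object FormatHandling {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("format-handling").getOrCreate()
        val sc = spark.sparkContext

        // Plain text: one record per line
        val lines = sc.textFile("hdfs:///data/raw/events.txt")

        // SequenceFile of (Text, Text) pairs; convert to plain Scala strings
        // immediately because Hadoop reuses the Writable objects
        val pairs = sc
          .sequenceFile("hdfs:///data/raw/events.seq", classOf[Text], classOf[Text])
          .map { case (k, v) => (k.toString, v.toString) }

        println(s"text records: ${lines.count()}, sequence records: ${pairs.count()}")
        spark.stop()
      }
    }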

Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, Impala, Flume, HBase, Pig, Java, SQL, CDH, UNIX, Shell Scripting

Confidential

Java J2EE Developer

Roles & Responsibilities:

  • Involved in all phases of the Software Development Life Cycle (SDLC).
  • Involved in team discussions regarding modeling, architectural and performance issues.
  • Using the UML methodology, developed Use Case, Class and Sequence diagrams in Visual Paradigm to represent the dynamic view of the system.
  • Followed agile methodology and took part in daily Scrum meetings, sprint planning, showcases and retrospectives.
  • Understood the business requirements of the project and coded in accordance with the technical design document.
  • Prepared the high-level design document as well as test cases for unit testing.
  • Fixed bugs/defects raised during System Testing and User Acceptance Testing.
  • Handled critical call logs within tight timeframes, as turnaround time is crucial in production support.
  • Provided project induction training to new joiners on the project.
  • Coordinated closely with the on-site team for timely project delivery and query resolution.
  • Worked very closely with the Transaction Team, which was responsible for creating the visual layouts of the screens.

Environment: Java 1.2/1.3, Applet, Servlet, JSP, custom tags, JDBC, XML, HTML, CSS, JavaScript, Oracle, DB2, PL/SQL, JUnit, Log4J, RDBMS
