
Spark & Hadoop Developer Resume


Memphis, Tennessee

SUMMARY

  • 7+ years of experience in IT, including Big Data technologies, the Hadoop ecosystem, data warehousing, and SQL-related technologies across the Retail, Manufacturing, Financial, and Communication sectors.
  • 5 years of experience in Big Data analytics using various Hadoop ecosystem tools and the Spark framework; currently working extensively on Spark and Spark Streaming with Scala as the main programming language.
  • Experience installing, configuring, and maintaining Apache Hadoop clusters for application development, along with Hadoop tools such as Sqoop, Hive, Pig, Flume, HBase, Kafka, Hue, Storm, ZooKeeper, Oozie, and Cassandra, plus Python.
  • Worked with major distributions such as Cloudera (CDH 3 & 4) and Hortonworks, as well as AWS; also worked on UNIX and DWH in support of these distributions.
  • Hands-on experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem such as Hadoop 2.X, YARN, Hive, Pig, MapReduce, Spark, Kafka, Storm, Oozie, HBase, Flume, Sqoop, and ZooKeeper.
  • Experience in handling large datasets using partitioning, Spark in-memory capabilities, and broadcast variables in Spark with Scala and Python, applying effective and efficient joins and transformations during the ingestion process itself (see the sketch after this list).
  • Experience in developing data pipelines using Pig, Sqoop, and Flume to extract data from weblogs and store it in HDFS, and in developing Pig Latin scripts and HiveQL queries for data analytics.
  • Worked extensively with Spark Streaming and Apache Kafka to ingest live streaming data.
  • Experience in converting Hive/SQL queries into Spark transformations using Java and in ETL development using Kafka, Flume, and Sqoop.
  • Good experience in writing Spark applications using Scala and Java; built Scala projects with SBT and executed them using spark-submit.
  • Experience working with NoSQL databases including HBase, Cassandra, and MongoDB, and using Sqoop to move data between RDBMS and HDFS in both directions.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Good experience in writing Sqoop queries for transferring bulk data between Apache Hadoop and structured data stores.
  • Substantial experience writing MapReduce jobs in Java and working with Pig, Flume, ZooKeeper, Hive, and Storm.
  • Created multiple MapReduce jobs using the Java API, Pig, and Hive for data extraction.
  • Strong expertise in troubleshooting and performance-tuning Spark, MapReduce, and Hive applications.
  • Good experience working with the Amazon EMR framework for processing data on EMR clusters and EC2 instances.
  • Created AWS VPC networks for the installed instances and configured security groups and Elastic IPs accordingly.
  • Developed AWS CloudFormation templates to create custom-sized VPCs, subnets, EC2 instances, ELBs, and security groups.
  • Extensive experience in developing applications that perform data processing tasks using Teradata, Oracle, SQL Server, and MySQL databases.
  • Worked on data warehousing, ETL, and reporting tools such as Informatica, Pentaho, and Tableau.
  • Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
  • Familiar with Agile and Waterfall methodologies; handled several client-facing meetings with strong communication skills.
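
A minimal Scala sketch of the kind of Spark ingestion job summarized above (partitioning plus a broadcast join during ingestion). All paths, table names, and column names are hypothetical and shown for illustration only.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object IngestJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ingest-join-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical inputs: a large fact dataset and a small dimension table
        val transactions = spark.read.parquet("/data/raw/transactions")
        val stores       = spark.read.parquet("/data/raw/stores")

        // Broadcasting the small side avoids shuffling the large side during the join
        val enriched = transactions.join(broadcast(stores), Seq("store_id"))

        // Partitioning the output by date lets downstream Hive/Impala queries prune partitions
        enriched.write
          .mode("overwrite")
          .partitionBy("ingest_date")
          .parquet("/data/curated/transactions_enriched")

        spark.stop()
      }
    }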

TECHNICAL SKILLS

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala

Hadoop Distribution: Cloudera, Hortonworks, Apache, AWS

Languages: Java, SQL, PL/SQL, Python, Pig Latin, HiveQL, Scala, Regular Expressions

Web Technologies: HTML, CSS, JavaScript, XML, JSP, Restful, SOAP

Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS

Portals/Application servers: WebLogic, WebSphere Application server, WebSphere Portal server, JBOSS

Build Automation tools: SBT, Ant, Maven

Version Control: GIT

IDE & Build Tools, Design: Eclipse, Visual Studio, NetBeans, Rational Application Developer, JUnit

Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL Database (HBase, Cassandra, MongoDB), Teradata.

PROFESSIONAL EXPERIENCE

Spark & Hadoop Developer

Confidential - Memphis, Tennessee

Responsibilities:

  • Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing the data in HDFS.
  • Developed Spark APIs to import data into HDFS from Teradata and created Hive tables.
  • Developed Sqoop jobs to import data in Avro file format from the Oracle database and created Hive tables on top of it.
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from the Avro Hive tables.
  • Involved in running the Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
  • Involved in performance tuning of Hive from design, storage and query perspectives.
  • Developed a Flume ETL (Informatica) job for handling data from an HTTP source with HDFS as the sink.
  • Collected JSON data from the HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
  • Developed Spark scripts to import large files from Amazon S3 buckets.
  • Developed Spark core and Spark SQL scripts using Scala for faster data processing.
  • Developed Kafka consumer APIs in Scala for consuming data from Kafka topics (see the sketch after this list).
  • Involved in designing and developing tables in HBase and storing aggregated data from Hive Table.
  • Integrated Hive and Tableau Desktop reports and published to Tableau Server.
  • Developed shell scripts for running Hive scripts in Hive and Impala.
  • Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
  • Administered all requests, analyzed issues, and provided efficient resolutions.
  • Designed program specifications and performed the required tests.
  • Prepared code for all modules according to the required specifications and client requirements.
  • Designed programs and systems and produced the associated documentation.
  • Prepared program and system implementations for all informatics programs.
  • Monitored all production issues and inquiries and provided efficient resolutions.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
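
A minimal sketch of the Kafka consumer work referenced above (see the Kafka bullet), written in Scala against the standard Kafka consumer API. The broker address, consumer group, and topic name are hypothetical; in the actual pipeline each JSON record would be parsed and applied as an insert or update to Hive.

    import java.time.Duration
    import java.util.{Collections, Properties}
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
    import org.apache.kafka.common.serialization.StringDeserializer

    object OrdersConsumerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")      // hypothetical broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "hive-loader")                // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

        val consumer = new KafkaConsumer[String, String](props)
        consumer.subscribe(Collections.singletonList("orders"))                 // hypothetical topic

        try {
          while (true) {
            val records = consumer.poll(Duration.ofMillis(500))
            for (record <- records.asScala) {
              // Placeholder: the real job parses the JSON value and upserts it into a Hive table
              println(s"${record.key} -> ${record.value}")
            }
          }
        } finally {
          consumer.close()
        }
      }
    }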

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Informatica, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.

Hadoop/Big Data Developer

Confidential - Lowell, Arkansas

Responsibilities:

  • Responsible for architecting Hadoop clusters with CDH3; involved in installing CDH3 and upgrading from CDH3 to CDH4.
  • Worked on creating a keyspace in Cassandra for saving the Spark batch output.
  • Worked on a Spark application to compact the small files in the Hive ecosystem so that file sizes align with the HDFS block size.
  • Managed migration of on-prem servers to AWS by creating golden images for upload and deployment.
  • Managed multiple AWS accounts with multiple VPCs for both production and non-production, where the primary objectives were automation, build-out, integration, and cost control.
  • Implemented real-time streaming ingestion using Kafka and Spark Streaming.
  • Loaded data using Spark Streaming with Scala and Python (see the sketch after this list).
  • Involved in the requirements and design phases to implement a streaming Lambda Architecture for real-time processing using Spark, Kafka, and Scala.
  • Experience in loading data into Spark RDDs and performing in-memory computation to generate output responses.
  • Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions.
  • Developed a full-text search platform using NoSQL, Logstash, and Elasticsearch, allowing for much faster, more scalable, and more intuitive user searches.
  • Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
  • Worked on Performance Enhancement in Pig, Hive and HBase on multiple nodes
  • Worked with Distributed n-tier architecture and Client/Server architecture
  • Supported MapReduce programs running on the cluster and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Developed MapReduce applications using Hadoop, MapReduce programming, and HBase.
  • Evaluated the use of Oozie for workflow orchestration and experienced in cluster coordination using ZooKeeper.
  • Developed ETL jobs following organization- and project-defined standards and processes.
  • Experienced in enabling Kerberos authentication in ETL process
  • Implemented data access using Hibernate persistence framework
  • Designed the GUI using the Model-View-Controller architecture (Struts framework).
  • Integrated Spring DAO for data access using Hibernate and involved in the Development of Spring Framework Controller.
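
A minimal Scala sketch of the Kafka plus Spark Streaming ingestion described above (see the streaming bullets). The broker, topic, batch interval, and HDFS landing path are hypothetical; each micro-batch of raw events is written to HDFS for the batch layer of the Lambda Architecture.

    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object StreamingIngestSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-streaming-ingest")
        val ssc  = new StreamingContext(conf, Seconds(10))          // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG        -> "broker1:9092",   // hypothetical broker
          ConsumerConfig.GROUP_ID_CONFIG                 -> "streaming-ingest",
          ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG   -> classOf[StringDeserializer],
          ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer]
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)  // hypothetical topic
        )

        // Land each non-empty micro-batch of raw events in HDFS for the batch layer
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty()) rdd.saveAsTextFile(s"/data/landing/events/${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }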

Environment: Hadoop 2.X, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Java, J2EE, Eclipse, HQL.

Sr. Hadoop/Spark Developer

Confidential - San Francisco, CA

Responsibilities:

  • Involved in the Complete Software development life cycle (SDLC) to develop the application.
  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and Map Reduce on EC2.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Worked with different source data file formats such as JSON, CSV, and TSV.
  • Imported data from various data sources such as MySQL and Netezza using Sqoop and SFTP, performed transformations using Hive and Pig, and loaded data back into HDFS.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.
  • Imported and exported data between environments such as MySQL and HDFS and deployed into production.
  • Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Worked on partitioning and bucketing in Hive tables and set tuning parameters to improve performance.
  • Involved in developing Impala scripts for ad-hoc queries.
  • Used Oozie workflow scheduler templates to manage various jobs such as Sqoop, MapReduce, Pig, Hive, and shell scripts.
  • Monitored and maintained data supporting internal applications and reports
  • Generated and distributed report packages for Navigant departments and clients
  • Developed and maintained documentation of ETL and reporting processes and controls
  • Involved in importing and exporting data from HBase using Spark.
  • Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment (see the sketch after this list).
  • Actively participated in code reviews and meetings and resolved technical issues.
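
A minimal Scala sketch of the Hive-to-Spark migration evaluated in the POC noted above: the same hypothetical HiveQL aggregation expressed once through spark.sql and once through the DataFrame API for the Spark-on-YARN comparison. Database, table, and column names are illustrative only.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, countDistinct, sum}

    object HiveToSparkPocSketch {
      def main(args: Array[String]): Unit = {
        // Typically submitted with spark-submit --master yarn
        val spark = SparkSession.builder()
          .appName("hive-to-spark-poc")
          .enableHiveSupport()
          .getOrCreate()

        // Original HiveQL aggregation, run unchanged against the Hive metastore ...
        val viaSql = spark.sql(
          """SELECT region, COUNT(DISTINCT customer_id) AS customers, SUM(amount) AS revenue
            |FROM sales.transactions
            |WHERE dt = '2017-01-01'
            |GROUP BY region""".stripMargin)
        viaSql.show(10)

        // ... and the equivalent DataFrame version used for the comparison
        val viaDf = spark.table("sales.transactions")
          .where(col("dt") === "2017-01-01")
          .groupBy("region")
          .agg(countDistinct("customer_id").as("customers"), sum("amount").as("revenue"))

        viaDf.write.mode("overwrite").saveAsTable("sales.daily_region_summary")
        spark.stop()
      }
    }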

Environment: Apache Hadoop, AWS, EMR, EC2, S3, Hortonworks, MapReduce, Hive, Pig, Sqoop, Apache Spark, ZooKeeper, HBase, Informatica, Java, Oozie, Oracle, MySQL, Netezza and UNIX Shell Scripting.

Hadoop Developer

Confidential

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
  • Importing and exporting data into HDFS and Hive using Sqoop
  • Experienced in defining job flows and managing and reviewing Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources and for implementing MongoDB to store and analyze unstructured data.
  • Supported MapReduce programs running on the cluster and involved in loading data from the UNIX file system into HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios (see the sketch after this list).
  • Implemented best income logic using Pig scripts
  • Cluster coordination services through Zookeeper
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Continuous monitoring and managing the Hadoop cluster using Cloudera Manager
  • Used Hibernate ORM framework with Spring framework for data persistence and transaction management and involved in templates and screens in HTML and JavaScript
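
A minimal sketch of writing PII records into an HBase table like the ones described above (see the HBase bullet), using the standard HBase client API. The table name, column family, and row-key scheme are hypothetical, and the sketch is in Scala for consistency with the rest of this page even though this role's environment was Java.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseWriteSketch {
      def main(args: Array[String]): Unit = {
        val conf = HBaseConfiguration.create()                 // picks up hbase-site.xml from the classpath
        val connection = ConnectionFactory.createConnection(conf)
        try {
          // Hypothetical table "customer_pii" with a single column family "cf"
          val table = connection.getTable(TableName.valueOf("customer_pii"))
          val put = new Put(Bytes.toBytes("portfolio1#cust-0001"))   // row key: portfolio + customer id
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("ssn_hash"), Bytes.toBytes("<hashed value>"))
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("source"), Bytes.toBytes("portfolio1"))
          table.put(put)
          table.close()
        } finally {
          connection.close()
        }
      }
    }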

Environment: Hadoop, HDFS, MapReduce, Pig, Sqoop, UNIX, HBase, Java, JavaScript, HTML
