
Spark & Hadoop Developer Resume


Memphis, Tennessee

SUMMARY

  • 7+ years of experience in IT, including Big Data technologies, the Hadoop ecosystem, data warehousing, and SQL-related technologies across the Retail, Manufacturing, Financial, and Communication sectors.
  • 5 years of experience in Big Data analytics using various Hadoop ecosystem tools and the Spark framework; currently working extensively on Spark and Spark Streaming with Scala as the main programming language.
  • Experience installing, configuring, and maintaining Apache Hadoop clusters for application development, along with Hadoop tools such as Sqoop, Hive, Pig, Flume, HBase, Kafka, Hue, Storm, ZooKeeper, Oozie, and Cassandra, plus Python.
  • Worked with major distributions such as Cloudera (CDH 3 & 4) and Hortonworks, as well as AWS; also worked on UNIX and DWH in support of these distributions.
  • Hands-on experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem such as Hadoop 2.X, YARN, Hive, Pig, MapReduce, Spark, Kafka, Storm, Oozie, HBase, Flume, Sqoop, and ZooKeeper.
  • Experience in handling large datasets using partitioning, Spark in-memory capabilities, and broadcast variables in Spark with Scala and Python, applying effective and efficient joins and transformations during the ingestion process itself (see the sketch after this list).
  • Experience in developing data pipelines using Pig, Sqoop, and Flume to extract data from weblogs and store it in HDFS, and in developing Pig Latin scripts and HiveQL queries for data analytics.
  • Worked extensively with Spark Streaming and Apache Kafka to ingest live streaming data.
  • Experience in converting Hive/SQL queries into Spark transformations using Java and in ETL development using Kafka, Flume, and Sqoop.
  • Good experience in writing Spark applications using Scala and Java; built Scala projects with SBT and executed them using spark-submit.
  • Experience working with NoSQL databases including HBase, Cassandra, and MongoDB, and using Sqoop to move data between RDBMS and HDFS in both directions.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Good experience in writing Sqoop queries for transferring bulk data between Apache Hadoop and structured data stores.
  • Substantial experience writing MapReduce jobs in Java and working with Pig, Flume, ZooKeeper, Hive, and Storm.
  • Created multiple MapReduce jobs using the Java API, Pig, and Hive for data extraction.
  • Strong expertise in troubleshooting and performance-tuning Spark, MapReduce, and Hive applications.
  • Good experience working with the Amazon EMR framework for processing data on EMR clusters and EC2 instances.
  • Created AWS VPC networks for the installed instances and configured security groups and Elastic IPs accordingly.
  • Developed AWS CloudFormation templates to create custom-sized VPCs, subnets, EC2 instances, ELBs, and security groups.
  • Extensive experience in developing applications that perform data processing tasks using Teradata, Oracle, SQL Server, and MySQL databases.
  • Worked on data warehousing, ETL, and reporting tools such as Informatica, Pentaho, and Tableau.
  • Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
  • Familiar with Agile and Waterfall methodologies; handled several client-facing meetings with strong communication skills.
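
A minimal Scala sketch of the kind of Spark ingestion job summarized above (partitioning plus a broadcast join during ingestion). All paths, table names, and column names are hypothetical and shown for illustration only.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object IngestJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ingest-join-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical inputs: a large fact dataset and a small dimension table
        val transactions = spark.read.parquet("/data/raw/transactions")
        val stores       = spark.read.parquet("/data/raw/stores")

        // Broadcasting the small side avoids shuffling the large side during the join
        val enriched = transactions.join(broadcast(stores), Seq("store_id"))

        // Partitioning the output by date lets downstream Hive/Impala queries prune partitions
        enriched.write
          .mode("overwrite")
          .partitionBy("ingest_date")
          .parquet("/data/curated/transactions_enriched")

        spark.stop()
      }
    }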

TECHNICAL SKILLS

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala

Hadoop Distribution: Cloudera, Hortonworks, Apache, AWS

Languages: Java, SQL, PL/SQL, Python, Pig Latin, HiveQL, Scala, Regular Expressions

Web Technologies: HTML, CSS, JavaScript, XML, JSP, Restful, SOAP

Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS

Portals/Application servers: WebLogic, WebSphere Application server, WebSphere Portal server, JBOSS

Build Automation tools: SBT, Ant, Maven

Version Control: GIT

IDE & Build Tools, Design: Eclipse, Visual Studio, NetBeans, Rational Application Developer, JUnit

Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL Database (HBase, Cassandra, MongoDB), Teradata.

PROFESSIONAL EXPERIENCE

Spark & Hadoop Developer

Confidential - Memphis, Tennessee

Responsibilities:

  • Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing the data in HDFS.
  • Developed Spark APIs to import data into HDFS from Teradata and created Hive tables.
  • Developed Sqoop jobs to import data in Avro file format from the Oracle database and created Hive tables on top of it.
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from the Avro Hive tables.
  • Involved in running the Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
  • Involved in performance tuning of Hive from design, storage and query perspectives.
  • Developed a Flume ETL (Informatica) job for handling data from an HTTP source with HDFS as the sink.
  • Collected JSON data from the HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
  • Developed Spark scripts to import large files from Amazon S3 buckets.
  • Developed Spark core and Spark SQL scripts using Scala for faster data processing.
  • Developed Kafka consumer APIs in Scala for consuming data from Kafka topics (see the sketch after this list).
  • Involved in designing and developing tables in HBase and storing aggregated data from Hive Table.
  • Integrated Hive and Tableau Desktop reports and published to Tableau Server.
  • Developed shell scripts for running Hive scripts in Hive and Impala.
  • Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
  • Administered all requests, analyzed issues, and provided efficient resolutions.
  • Designed program specifications and performed the required tests.
  • Prepared code for all modules according to the required specifications and client requirements.
  • Designed programs and systems and produced the associated documentation.
  • Prepared program and system implementations for all informatics programs.
  • Monitored all production issues and inquiries and provided efficient resolutions.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
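
A minimal sketch of the Kafka consumer work referenced above (see the Kafka bullet), written in Scala against the standard Kafka consumer API. The broker address, consumer group, and topic name are hypothetical; in the actual pipeline each JSON record would be parsed and applied as an insert or update to Hive.

    import java.time.Duration
    import java.util.{Collections, Properties}
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
    import org.apache.kafka.common.serialization.StringDeserializer

    object OrdersConsumerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")      // hypothetical broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "hive-loader")                // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

        val consumer = new KafkaConsumer[String, String](props)
        consumer.subscribe(Collections.singletonList("orders"))                 // hypothetical topic

        try {
          while (true) {
            val records = consumer.poll(Duration.ofMillis(500))
            for (record <- records.asScala) {
              // Placeholder: the real job parses the JSON value and upserts it into a Hive table
              println(s"${record.key} -> ${record.value}")
            }
          }
        } finally {
          consumer.close()
        }
      }
    }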

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Informatica, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.

Hadoop/Big Data Developer

Confidential - Lowell, Arkansas

Responsibilities:

  • Responsible for architecting Hadoop clusters with CDH3; involved in installing CDH3 and upgrading from CDH3 to CDH4.
  • Worked on creating a keyspace in Cassandra for saving the Spark batch output.
  • Worked on a Spark application to compact the small files in the Hive ecosystem so that file sizes align with the HDFS block size.
  • Managed migration of on-prem servers to AWS by creating golden images for upload and deployment.
  • Managed multiple AWS accounts with multiple VPCs for both production and non-production, where the primary objectives were automation, build-out, integration, and cost control.
  • Implemented real-time streaming ingestion using Kafka and Spark Streaming.
  • Loaded data using Spark Streaming with Scala and Python (see the sketch after this list).
  • Involved in the requirements and design phases to implement a streaming Lambda Architecture for real-time processing using Spark, Kafka, and Scala.
  • Experience in loading data into Spark RDDs and performing in-memory computation to generate output responses.
  • Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions.
  • Developed a full-text search platform using NoSQL, Logstash, and Elasticsearch, allowing for much faster, more scalable, and more intuitive user searches.
  • Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
  • Worked on Performance Enhancement in Pig, Hive and HBase on multiple nodes
  • Worked with Distributed n-tier architecture and Client/Server architecture
  • Supported MapReduce programs running on the cluster and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Developed MapReduce applications using Hadoop, MapReduce programming, and HBase.
  • Evaluated the use of Oozie for workflow orchestration and experienced in cluster coordination using ZooKeeper.
  • Developed ETL jobs following organization- and project-defined standards and processes.
  • Experienced in enabling Kerberos authentication in ETL process
  • Implemented data access using Hibernate persistence framework
  • Designed the GUI using the Model-View-Controller architecture (Struts framework).
  • Integrated Spring DAO for data access using Hibernate and involved in the Development of Spring Framework Controller.
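
A minimal Scala sketch of the Kafka plus Spark Streaming ingestion described above (see the streaming bullets). The broker, topic, batch interval, and HDFS landing path are hypothetical; each micro-batch of raw events is written to HDFS for the batch layer of the Lambda Architecture.

    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object StreamingIngestSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-streaming-ingest")
        val ssc  = new StreamingContext(conf, Seconds(10))          // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG        -> "broker1:9092",   // hypothetical broker
          ConsumerConfig.GROUP_ID_CONFIG                 -> "streaming-ingest",
          ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG   -> classOf[StringDeserializer],
          ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer]
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)  // hypothetical topic
        )

        // Land each non-empty micro-batch of raw events in HDFS for the batch layer
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty()) rdd.saveAsTextFile(s"/data/landing/events/${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }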

Environment: Hadoop 2.X, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Java, J2EE, Eclipse, HQL.

Sr. Hadoop/Spark Developer

Confidential - San Francisco, CA

Responsibilities:

  • Involved in the Complete Software development life cycle (SDLC) to develop the application.
  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and Map Reduce on EC2.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Worked with different source data file formats such as JSON, CSV, and TSV.
  • Imported data from various data sources such as MySQL and Netezza using Sqoop and SFTP, performed transformations using Hive and Pig, and loaded data back into HDFS.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.
  • Imported and exported data between environments such as MySQL and HDFS and deployed into production.
  • Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Worked on partitioning and bucketing in Hive tables and set tuning parameters to improve performance.
  • Involved in developing Impala scripts for ad-hoc queries.
  • Used Oozie workflow scheduler templates to manage various jobs such as Sqoop, MapReduce, Pig, Hive, and shell scripts.
  • Monitored and maintained data supporting internal applications and reports
  • Generated and distributed report packages for Navigant departments and clients
  • Developed and maintained documentation of ETL and reporting processes and controls
  • Involved in importing and exporting data from HBase using Spark.
  • Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment (see the sketch after this list).
  • Actively participated in code reviews and meetings and resolved technical issues.
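
A minimal Scala sketch of the Hive-to-Spark migration evaluated in the POC noted above: the same hypothetical HiveQL aggregation expressed once through spark.sql and once through the DataFrame API for the Spark-on-YARN comparison. Database, table, and column names are illustrative only.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, countDistinct, sum}

    object HiveToSparkPocSketch {
      def main(args: Array[String]): Unit = {
        // Typically submitted with spark-submit --master yarn
        val spark = SparkSession.builder()
          .appName("hive-to-spark-poc")
          .enableHiveSupport()
          .getOrCreate()

        // Original HiveQL aggregation, run unchanged against the Hive metastore ...
        val viaSql = spark.sql(
          """SELECT region, COUNT(DISTINCT customer_id) AS customers, SUM(amount) AS revenue
            |FROM sales.transactions
            |WHERE dt = '2017-01-01'
            |GROUP BY region""".stripMargin)
        viaSql.show(10)

        // ... and the equivalent DataFrame version used for the comparison
        val viaDf = spark.table("sales.transactions")
          .where(col("dt") === "2017-01-01")
          .groupBy("region")
          .agg(countDistinct("customer_id").as("customers"), sum("amount").as("revenue"))

        viaDf.write.mode("overwrite").saveAsTable("sales.daily_region_summary")
        spark.stop()
      }
    }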

Environment: Apache Hadoop, AWS, EMR, EC2, S3, Hortonworks, MapReduce, Hive, Pig, Sqoop, Apache Spark, ZooKeeper, HBase, Informatica, Java, Oozie, Oracle, MySQL, Netezza and UNIX Shell Scripting.

Hadoop Developer

Confidential

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
  • Importing and exporting data into HDFS and Hive using Sqoop
  • Experienced in defining job flows and managing and reviewing Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources and for implementing MongoDB to store and analyze unstructured data.
  • Supported MapReduce programs running on the cluster and involved in loading data from the UNIX file system into HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios (see the sketch after this list).
  • Implemented best income logic using Pig scripts
  • Cluster coordination services through Zookeeper
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Continuous monitoring and managing the Hadoop cluster using Cloudera Manager
  • Used Hibernate ORM framework with Spring framework for data persistence and transaction management and involved in templates and screens in HTML and JavaScript
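
A minimal sketch of writing PII records into an HBase table like the ones described above (see the HBase bullet), using the standard HBase client API. The table name, column family, and row-key scheme are hypothetical, and the sketch is in Scala for consistency with the rest of this page even though this role's environment was Java.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseWriteSketch {
      def main(args: Array[String]): Unit = {
        val conf = HBaseConfiguration.create()                 // picks up hbase-site.xml from the classpath
        val connection = ConnectionFactory.createConnection(conf)
        try {
          // Hypothetical table "customer_pii" with a single column family "cf"
          val table = connection.getTable(TableName.valueOf("customer_pii"))
          val put = new Put(Bytes.toBytes("portfolio1#cust-0001"))   // row key: portfolio + customer id
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("ssn_hash"), Bytes.toBytes("<hashed value>"))
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("source"), Bytes.toBytes("portfolio1"))
          table.put(put)
          table.close()
        } finally {
          connection.close()
        }
      }
    }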

Environment: Hadoop, HDFS, MapReduce, Pig, Sqoop, UNIX, HBase, Java, JavaScript, HTML
