Hadoop/Spark/Scala Developer Resume
FL
SUMMARY
- Over 6 years of experience in Information Technology, including experience in Big Data and the Hadoop ecosystem.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
- Worked on Big Data integration and analytics based on Hadoop, Spark, Kafka, Storm, and NoSQL databases.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Loaded streaming data using Kafka and processed it using Storm.
- Around 1 year of experience with Spark and Scala.
- Developed analytical components using Scala, Spark, and Spark Streaming.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Implemented different machine learning techniques in Scala using the Spark machine learning library.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Proficient in configuring, designing, implementing, and monitoring Kafka clusters and connectors.
- Experience in streaming data between Kafka and other data stores such as RDBMS and NoSQL databases.
- Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in Cassandra (a minimal sketch follows this summary).
- Used Spark on YARN and compared its performance results with MapReduce.
- Monitored YARN applications; troubleshot and resolved cluster-related system problems.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
- Created HBase tables to store various formats of PII data coming from different portfolios.
- Used Sqoop to import data from SQL Server into the Hadoop ecosystem.
- Experience working both independently and collaboratively to solve problems and deliver high-quality results in a fast-paced environment.
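The Kafka, Spark Streaming, and Cassandra work described above can be illustrated with a minimal Scala sketch. The topic name, keyspace/table, field names, and the CSV event format are assumptions for illustration only, not details from the original projects.

```scala
// Minimal sketch: consume learner events from Kafka, transform them in near real time,
// and persist each micro-batch to Cassandra. All names are hypothetical.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector._

object LearnerEventStream {
  // Assumed shape of the common learner data model persisted to Cassandra.
  case class LearnerEvent(learnerId: String, course: String, score: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("LearnerEventStream")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed Cassandra host
    val ssc = new StreamingContext(conf, Seconds(10))       // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",             // assumed Kafka broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "learner-stream",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams)
    )

    stream.map(_.value)                                      // raw CSV payload, e.g. "u1,scala-101,87.5"
      .map(_.split(","))
      .map(f => LearnerEvent(f(0), f(1), f(2).toDouble))
      .foreachRDD(rdd => rdd.saveToCassandra("learner", "events")) // persist each micro-batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```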
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume, Spark, Kafka.
Operating Systems: UNIX, Linux, Windows 2000 / NT / XP / Vista
Programming Languages: Java, Scala, XML, Unix shell scripting, Python, HTML.
Databases/technologies: Oracle 11g/10g, MS-SQL Server, DB2, MySQL, MS-Access.
PROFESSIONAL EXPERIENCE
Confidential, FL
Hadoop/Spark/Scala Developer
Responsibilities:
- Developed scripts to perform business transformations on the data using Hive and Pig.
- Developed UDFs in Java for Hive and Pig.
- Worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch at the end of this section).
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Responsible for managing data from multiple sources.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Used Apache Flume to store data into centralized stores such as HBase and HDFS.
Environment: Hadoop, HDFS, Pig, Hive, HBase, Spark, Scala, MapReduce, Sqoop, Linux, and Big Data
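As referenced above, here is a minimal sketch of converting a HiveQL aggregation into equivalent Spark SQL and Spark RDD transformations in Scala. The table name web_logs and its user_id column are hypothetical, illustrative placeholders rather than code from the project.

```scala
// Minimal sketch: run the same aggregation as Spark SQL and as RDD transformations.
// Assumes an existing Hive table web_logs(user_id STRING, url STRING, ts BIGINT).
import org.apache.spark.sql.SparkSession

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport()                 // reuse tables registered in the Hive metastore
      .getOrCreate()

    // Original HiveQL, executed through Spark SQL.
    val bySql = spark.sql(
      "SELECT user_id, COUNT(*) AS hits FROM web_logs GROUP BY user_id")

    // The same aggregation expressed as RDD transformations.
    val byRdd = spark.table("web_logs").rdd
      .map(row => (row.getAs[String]("user_id"), 1L))
      .reduceByKey(_ + _)

    bySql.show(10)
    byRdd.take(10).foreach(println)

    spark.stop()
  }
}
```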
Confidential
Hadoop Developer
Responsibilities:
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis
- Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce
- Worked on debugging and performance tuning of Hive and Pig jobs
- Created HBase tables to store various formats of PII data coming from different portfolios (see the sketch at the end of this section)
- Implemented test scripts to support test driven development and continuous integration
- Worked on tuning the performance of Pig queries
- Involved in loading data from the Linux file system to HDFS
- Imported and exported data into HDFS and Hive using Sqoop
- Experienced in processing unstructured data using Pig and Hive
- Supported MapReduce programs running on the cluster
- Gained experience in managing and reviewing Hadoop log files
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Oozie, Linux, and Big Data
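As referenced in the HBase bullet above, here is a minimal sketch of creating an HBase table for PII records and writing one row with the HBase client API from Scala. The table name, column family, qualifiers, and row key are hypothetical.

```scala
// Minimal sketch: create an HBase table with a single "pii" column family and insert one row.
// Table, family, qualifier, and row-key names are hypothetical.
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object PiiHBaseLoad {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()              // reads hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val admin = connection.getAdmin

    val tableName = TableName.valueOf("pii_portfolio")
    if (!admin.tableExists(tableName)) {
      val descriptor = new HTableDescriptor(tableName)
      descriptor.addFamily(new HColumnDescriptor("pii")) // one column family for PII fields
      admin.createTable(descriptor)
    }

    val table = connection.getTable(tableName)
    val put = new Put(Bytes.toBytes("cust-0001"))        // one row per customer
    put.addColumn(Bytes.toBytes("pii"), Bytes.toBytes("ssn"), Bytes.toBytes("xxx-xx-1234"))
    put.addColumn(Bytes.toBytes("pii"), Bytes.toBytes("dob"), Bytes.toBytes("1980-01-01"))
    table.put(put)

    table.close()
    admin.close()
    connection.close()
  }
}
```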
Confidential, Los Angeles, CA
Hadoop Developer
Responsibilities:
- Worked extensively on creating MapReduce jobs to prepare data for search and aggregation
- Designed a data warehouse using Hive
- Worked extensively with Sqoop for importing metadata from Oracle
- Extensively used Pig for data cleansing
- Created partitioned tables in Hive (see the sketch at the end of this section)
- Worked with business teams and created Hive queries for ad hoc access.
- Evaluated usage of Oozie for Workflow Orchestration
- Mentored analysts and the test team in writing Hive queries
- Gained very good business knowledge of health insurance, claim processing, fraud suspect identification, the appeals process, etc.
Environment: Hadoop, MapReduce, HDFS, Hive, Java (jdk1.6), Hadoop distribution of Hortonworks, Oozie, Oracle 11g/10g.
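As referenced in the partitioned-tables bullet above, here is a minimal sketch of creating a date-partitioned Hive table over JDBC from Scala. The HiveServer2 endpoint, table name, and columns are hypothetical and purely illustrative.

```scala
// Minimal sketch: issue partitioned-table DDL against HiveServer2 over JDBC.
// Endpoint, table, and column names are hypothetical.
import java.sql.DriverManager

object CreatePartitionedClaimsTable {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val connection = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    val statement = connection.createStatement()

    // Partitioning by claim_date lets ad hoc queries prune to a single day's data.
    statement.execute(
      """CREATE TABLE IF NOT EXISTS claims (
        |  claim_id  STRING,
        |  member_id STRING,
        |  amount    DOUBLE
        |)
        |PARTITIONED BY (claim_date STRING)
        |STORED AS ORC""".stripMargin)

    statement.close()
    connection.close()
  }
}
```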