Hadoop/Spark/Scala Developer Resume
FL
SUMMARY
- Over 6 years of experience in Information Technology, including experience in Big Data and the Hadoop ecosystem.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
- Worked on Big Data integration and analytics based on Hadoop, Spark, Kafka, Storm, and NoSQL databases.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Loaded streaming data using Kafka and processed it using Storm.
- Around 1 year of experience with Spark and Scala.
- Developed analytical components using Scala, Spark, and Spark Streaming.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Implemented different machine learning techniques in Scala using the Spark machine learning library.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Proficient in configuring, designing, implementing, and monitoring Kafka clusters and connectors.
- Experience in streaming data between Kafka and other data stores such as RDBMS and NoSQL databases.
- Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in Cassandra (a minimal sketch follows this summary).
- Used Spark on YARN and compared its performance results with MapReduce.
- Monitored YARN applications; troubleshot and resolved cluster-related system problems.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
- Created HBase tables to store various formats of PII data coming from different portfolios.
- Used Sqoop to import data from SQL Server into the Hadoop ecosystem.
- Experience working both independently and collaboratively to solve problems and deliver high-quality results in a fast-paced environment.
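The Kafka, Spark Streaming, and Cassandra work described above can be illustrated with a minimal Scala sketch. The topic name, keyspace/table, field names, and the CSV event format are assumptions for illustration only, not details from the original projects.

```scala
// Minimal sketch: consume learner events from Kafka, transform them in near real time,
// and persist each micro-batch to Cassandra. All names are hypothetical.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector._

object LearnerEventStream {
  // Assumed shape of the common learner data model persisted to Cassandra.
  case class LearnerEvent(learnerId: String, course: String, score: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("LearnerEventStream")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed Cassandra host
    val ssc = new StreamingContext(conf, Seconds(10))       // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",             // assumed Kafka broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "learner-stream",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams)
    )

    stream.map(_.value)                                      // raw CSV payload, e.g. "u1,scala-101,87.5"
      .map(_.split(","))
      .map(f => LearnerEvent(f(0), f(1), f(2).toDouble))
      .foreachRDD(rdd => rdd.saveToCassandra("learner", "events")) // persist each micro-batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```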
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume, Spark, Kafka.
Operating Systems: UNIX, Linux, Windows 2000 / NT / XP / Vista
Programming Languages: Java, Scala, XML, Unix shell scripting, Python, HTML.
Databases/technologies: Oracle 11g/10g, MS-SQL Server, DB2, MySQL, MS-Access.
PROFESSIONAL EXPERIENCE
Confidential, FL
Hadoop/Spark/Scala Developer
Responsibilities:
- Developed scripts to perform business transformations on the data using Hive and Pig.
- Developed UDFs in Java for Hive and Pig.
- Worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch at the end of this section).
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Responsible for managing data from multiple sources.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Used Apache Flume to store data into centralized stores such as HBase and HDFS.
Environment: Hadoop, HDFS, Pig, Hive, HBase, Spark, Scala, MapReduce, Sqoop, Linux, and Big Data
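As referenced above, here is a minimal sketch of converting a HiveQL aggregation into equivalent Spark SQL and Spark RDD transformations in Scala. The table name web_logs and its user_id column are hypothetical, illustrative placeholders rather than code from the project.

```scala
// Minimal sketch: run the same aggregation as Spark SQL and as RDD transformations.
// Assumes an existing Hive table web_logs(user_id STRING, url STRING, ts BIGINT).
import org.apache.spark.sql.SparkSession

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport()                 // reuse tables registered in the Hive metastore
      .getOrCreate()

    // Original HiveQL, executed through Spark SQL.
    val bySql = spark.sql(
      "SELECT user_id, COUNT(*) AS hits FROM web_logs GROUP BY user_id")

    // The same aggregation expressed as RDD transformations.
    val byRdd = spark.table("web_logs").rdd
      .map(row => (row.getAs[String]("user_id"), 1L))
      .reduceByKey(_ + _)

    bySql.show(10)
    byRdd.take(10).foreach(println)

    spark.stop()
  }
}
```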
Confidential
Hadoop Developer
Responsibilities:
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis
- Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce
- Worked on debugging and performance tuning of Hive and Pig jobs
- Created HBase tables to store various formats of PII data coming from different portfolios (see the sketch at the end of this section)
- Implemented test scripts to support test driven development and continuous integration
- Worked on tuning the performance of Pig queries
- Involved in loading data from the Linux file system to HDFS
- Imported and exported data into HDFS and Hive using Sqoop
- Experienced in processing unstructured data using Pig and Hive
- Supported MapReduce programs running on the cluster
- Gained experience in managing and reviewing Hadoop log files
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Oozie, Linux, and Big Data
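As referenced in the HBase bullet above, here is a minimal sketch of creating an HBase table for PII records and writing one row with the HBase client API from Scala. The table name, column family, qualifiers, and row key are hypothetical.

```scala
// Minimal sketch: create an HBase table with a single "pii" column family and insert one row.
// Table, family, qualifier, and row-key names are hypothetical.
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object PiiHBaseLoad {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()              // reads hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val admin = connection.getAdmin

    val tableName = TableName.valueOf("pii_portfolio")
    if (!admin.tableExists(tableName)) {
      val descriptor = new HTableDescriptor(tableName)
      descriptor.addFamily(new HColumnDescriptor("pii")) // one column family for PII fields
      admin.createTable(descriptor)
    }

    val table = connection.getTable(tableName)
    val put = new Put(Bytes.toBytes("cust-0001"))        // one row per customer
    put.addColumn(Bytes.toBytes("pii"), Bytes.toBytes("ssn"), Bytes.toBytes("xxx-xx-1234"))
    put.addColumn(Bytes.toBytes("pii"), Bytes.toBytes("dob"), Bytes.toBytes("1980-01-01"))
    table.put(put)

    table.close()
    admin.close()
    connection.close()
  }
}
```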
Confidential, Los Angeles, CA
Hadoop Developer
Responsibilities:
- Worked extensively on creating MapReduce jobs to prepare data for search and aggregation
- Designed a data warehouse using Hive
- Worked extensively with Sqoop for importing metadata from Oracle
- Extensively used Pig for data cleansing
- Created partitioned tables in Hive (see the sketch at the end of this section)
- Worked with business teams and created Hive queries for ad hoc access.
- Evaluated usage of Oozie for Workflow Orchestration
- Mentored analysts and the test team in writing Hive queries
- Gained very good business knowledge of health insurance, claim processing, fraud suspect identification, the appeals process, etc.
Environment: Hadoop, MapReduce, HDFS, Hive, Java (jdk1.6), Hadoop distribution of Hortonworks, Oozie, Oracle 11g/10g.
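As referenced in the partitioned-tables bullet above, here is a minimal sketch of creating a date-partitioned Hive table over JDBC from Scala. The HiveServer2 endpoint, table name, and columns are hypothetical and purely illustrative.

```scala
// Minimal sketch: issue partitioned-table DDL against HiveServer2 over JDBC.
// Endpoint, table, and column names are hypothetical.
import java.sql.DriverManager

object CreatePartitionedClaimsTable {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val connection = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    val statement = connection.createStatement()

    // Partitioning by claim_date lets ad hoc queries prune to a single day's data.
    statement.execute(
      """CREATE TABLE IF NOT EXISTS claims (
        |  claim_id  STRING,
        |  member_id STRING,
        |  amount    DOUBLE
        |)
        |PARTITIONED BY (claim_date STRING)
        |STORED AS ORC""".stripMargin)

    statement.close()
    connection.close()
  }
}
```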