Hadoop/Spark Developer Resume
Charlotte, NC
SUMMARY:
- Around 8 years of experience in the Information Technology industry, including 5+ years as a Hadoop/Spark Developer working with Big Data technologies across the Hadoop and Spark ecosystems, and 2+ years with Java/J2EE technologies and SQL.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, YARN, Sqoop, Flume, HBase, Impala, Oozie, ZooKeeper, Kafka, and Spark.
- In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, ResourceManager, NodeManager, NameNode, and DataNode, as well as MapReduce v1 and v2 concepts.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib, with real-time streaming use cases.
- Hands-on experience in the Analysis, Design, Coding, and Testing phases of the Software Development Life Cycle (SDLC).
- Hands-on experience with Amazon Web Services (AWS), including Elastic MapReduce (EMR), S3 storage, EC2 instances, and data warehousing.
- Worked extensively with AWS cloud services such as EC2, S3, and EBS.
- Migrated an existing on-premises application to AWS, using services such as EC2 and S3 for small data set processing and storage; experienced in maintaining Hadoop clusters on AWS EMR.
- Hands-on experience across Big Data application phases, including data ingestion, data analytics, and data visualization.
- Experience with Hadoop distributions such as Cloudera, Hortonworks, and Amazon AWS (EMR).
- Experience in transferring data from RDBMS to HDFS and Hive tables using Sqoop.
- Migrated existing Hive code to Apache Spark and Scala using Spark SQL and RDDs (a brief sketch follows this summary).
- Experience working with Flume to load log data from multiple sources directly into HDFS.
- Well versed in workflow scheduling and monitoring tools such as Oozie, Hue, and ZooKeeper.
- Good knowledge of Impala, Mahout, Spark SQL, Storm, Avro, Kafka, Hue, and AWS, and of IDE and build tools such as Eclipse, NetBeans, and Maven.
- Installed and configured MapReduce, Hive, and HDFS; implemented CDH5 and HDP clusters on CentOS; assisted with performance tuning, monitoring, and troubleshooting.
- Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Experience in processing streaming data into clusters using Kafka and Spark Streaming.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience with NoSQL column-oriented databases such as HBase and Cassandra, and their integration with Hadoop clusters.
- Involved in cluster coordination services through ZooKeeper.
- Good experience in Core Java and J2EE technologies such as JDBC, Servlets, and JSP.
- Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, multithreading, and serialization/deserialization.
- Experience in designing user interfaces using HTML, CSS, JavaScript, and JSP.
- Excellent interpersonal skills in areas such as teamwork, communication, and presentations to business users and management teams.
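As an illustration of the Hive-to-Spark migration noted above, here is a minimal sketch, assuming Spark 2.x with Hive support enabled; the sales table, its columns, and the output table name are hypothetical placeholders rather than details of any specific project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder()
  .appName("HiveToSparkSketch")
  .enableHiveSupport()                 // read tables registered in the Hive metastore
  .getOrCreate()

// Original HiveQL, for reference:
//   SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

// The same logic expressed with the DataFrame API (table and column names are placeholders).
val totals = spark.table("sales")
  .groupBy("region")
  .agg(sum("amount").as("total"))

// Persist the result back to the metastore for downstream Hive/Impala queries.
totals.write.mode("overwrite").saveAsTable("sales_totals_by_region")
```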
TECHNICAL SKILLS:
Programming Languages: C, C++, Java 1.4/1.5/1.6/1.7/1.8, SQL, PL/SQL, JavaScript
Big Data Technologies: Apache Hadoop, HDFS, Hive, Pig, Crunch, HBase, Sqoop, Oozie, ZooKeeper, Spark, Mahout
Web Technologies: HTML, HTML5, XML, XHTML, CSS3, JSON, AJAX, XSD, WSDL, ExtJS
RDBMS/Databases: Oracle 8i/9i/10g, MySQL, PostgreSQL, SQL Server 6.5, MS Access, MongoDB (NoSQL)
Server-side Frameworks and Libraries: Spring 2.5/3.0/3.2, Hibernate 3.x/4.x, MyBatis, Spring MVC, Spring Web Flow, Spring Batch, Spring Integration, Spring-WS, Struts, Jersey RESTful web services, XFire, Apache CXF, Mule ESB, ZooKeeper, Curator, Apache POI, JUnit, Mockito, PowerMock, SLF4J, Log4j, Gson, Jackson, UML, Selenium, Crystal Reports
UI Frameworks and Libraries: ExtJS, jQuery, jQuery UI, AngularJS, Thymeleaf, PrimeFaces, Bootstrap
Application Servers: BEA WebLogic, IBM WebSphere, Apache Tomcat
Build Tools and IDEs: Maven, Ant, IntelliJ, Eclipse, Spring Tool Suite, NetBeans, Jenkins
Operating Systems: Windows, UNIX, SUN Solaris, Linux, Mac OS X
Tools: SVN, JIRA, Toad, SQL Developer, Serena Dimensions, SharePoint, ClearCase, Perforce
Process & Concepts: Agile, Scrum, SDLC, Object-Oriented Analysis and Design, Test-Driven Development, Continuous Integration
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Hadoop/Spark Developer
Roles/Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing the data in HDFS.
- Followed Agile and Scrum principles in developing the project.
- Used the Spark API to import data from DB2 into HDFS and created Hive tables.
- Used the Spark API on Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Imported large data sets from DB2 into Hive tables using Sqoop.
- Used Impala to query HDFS data for better performance.
- Implemented Apache Pig scripts to load data from and store data into Hive.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet tables from Avro-based Hive tables.
- Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Developed Spark scripts using Scala shell commands as per requirements.
- Worked extensively with Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.
- Implemented an ETL process through Kafka-Spark-HBase integration per the requirements of a customer-facing API (see the streaming sketch after this list).
- Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, handling structured data with Spark SQL (see the sketch after this list).
- Worked on batch processing and real-time data processing with Spark Streaming using the Lambda architecture.
- Developed Spark code in Scala and Spark SQL for faster testing and processing of data, loading data into Spark RDDs and performing in-memory computation to generate output responses with lower memory usage.
- Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMS into Hive.
- Utilized the Spark Core, Spark Streaming, and Spark SQL APIs for faster data processing instead of MapReduce in Java.
- Responsible for data extraction and integration from different data sources into the Hadoop data lake by creating ETL pipelines using Spark, MapReduce, Pig, and Hive.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Worked with the team to fetch live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
- Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for large data volumes.
- Wrote Pig scripts to clean up ingested data and created partitions for daily data.
- Developed Spark programs in Scala, applying functional programming principles to process complex structured and unstructured data sets.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Analyzed SQL scripts and designed solutions implemented with PySpark.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Used Oozie workflows to coordinate Pig and Hive scripts.
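A minimal sketch of the JSON-to-Hive loading pattern referenced in the Spark SQL bullet above, assuming Spark 2.x with Hive support; the HDFS path and table name are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("JsonToHiveSketch")
  .enableHiveSupport()                 // register tables in the Hive metastore
  .getOrCreate()

// Infer a schema from JSON files landed on HDFS (placeholder path).
val events = spark.read.json("hdfs:///data/raw/events/")

// Inspect the inferred schema, then persist as a Parquet-backed table
// that Hive and Impala can query (placeholder table name).
events.printSchema()
events.write
  .mode("overwrite")
  .format("parquet")
  .saveAsTable("events_curated")
```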
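A minimal sketch of the Kafka-to-Spark Streaming ingestion pattern behind the Kafka-Spark-HBase and DB2-to-HBase bullets above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, and per-batch persistence step are placeholders, and the HBase write is only indicated by a comment since it depends on the client or connector in use.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("KafkaIngestSketch")
val ssc = new StreamingContext(conf, Seconds(10))    // 10-second micro-batches

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",             // placeholder broker list
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "ingest-sketch",
  "auto.offset.reset" -> "latest"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

stream.map(_.value)                                  // raw message payloads
  .foreachRDD { rdd =>
    // Parse each micro-batch and persist it; in a Kafka-Spark-HBase pipeline
    // this is where rows would be written out via the HBase client or a connector.
    rdd.foreach(record => println(record))
  }

ssc.start()
ssc.awaitTermination()
```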
Environment: HDFS, MapReduce, Hive, Sqoop, HBase, Oozie, Flume, Impala, Kafka, ZooKeeper, Spark SQL, Spark DataFrames, PySpark, Scala, AWS S3, Python, Java, JSON, SQL scripting, Linux shell scripting, Avro, Parquet, Hortonworks.
Confidential, Bedford, NH
Hadoop Developer
Roles/Responsibilities:
- In-depth understanding of Hadoop architecture and components such as HDFS, ApplicationMaster, NodeManager, ResourceManager, NameNode, and DataNode, as well as MapReduce concepts.
- Imported required tables from RDBMS into HDFS using Sqoop, and used Storm and Kafka for real-time streaming of data into HBase.
- Good experience with the NoSQL database HBase, creating HBase tables to load large sets of semi-structured data from various sources.
- Wrote Hive and Pig scripts as ETL tools to perform transformations, event joins, traffic filtering, and pre-aggregations before storing data in HDFS.
- Developed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
- Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
- Moved log files generated from various sources into HDFS for further processing using Flume.
- Developed Java code to generate, compare, and merge Avro schema files.
- Developed complex MapReduce streaming jobs in Java, complemented by Hive and Pig, and wrote MapReduce programs in Java to perform ETL, cleaning, and scrubbing tasks.
- Prepared validation report queries, executed them after every ETL run, and shared the results with business users during different phases of the project.
- Used Hive to analyze partitioned and bucketed data and compute metrics for reporting, applying Hive join optimization techniques and best practices when writing HiveQL scripts.
- Imported and exported data into HDFS and Hive using Sqoop, and wrote Hive queries to extract the processed data.
- Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per user needs.
- Teamed up with architects to design Spark models for the existing MapReduce jobs and migrated the MapReduce models to Spark using Scala (see the sketch after this list).
- Implemented Spark with Scala, utilizing the Spark Core, Spark Streaming, and Spark SQL APIs for faster data processing instead of MapReduce in Java.
- Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, handling structured data with Spark SQL.
- Handled data imports from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Integrated Apache Storm with Kafka to perform web analytics and move clickstream data from Kafka to HDFS.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Spark, and ZooKeeper.
- Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.
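Following up on the MapReduce-to-Spark migration bullet above, a minimal, hypothetical sketch of the classic pattern: a mapper/reducer pair collapsed into a couple of RDD transformations. The input and output paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("MapReduceToSparkSketch")
  .getOrCreate()
val sc = spark.sparkContext

// The old mapper (tokenize, emit (word, 1)) and reducer (sum the counts)
// become flatMap/map and reduceByKey on an RDD.
val counts = sc.textFile("hdfs:///data/input/")            // placeholder input path
  .flatMap(_.split("\\s+"))
  .filter(_.nonEmpty)
  .map(word => (word, 1))
  .reduceByKey(_ + _)

counts.saveAsTextFile("hdfs:///data/output/word_counts")   // placeholder output path
```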
Environment: Apache Hadoop, HDFS, MapReduce, HBase, Hive, YARN, Pig, Sqoop, Flume, ZooKeeper, Kafka, Impala, Spark SQL, Spark Core, Spark Streaming, NoSQL, MySQL, Cloudera, Java, JDBC, Spring, ETL, WebLogic, Web Analytics, Avro, Cassandra, Oracle, Shell Scripting, Ubuntu.
Confidential, Cambridge, MA
Hadoop Developer
Roles and Responsibilities:
- Installed and configured Hadoop ecosystem components such as JobTracker, TaskTracker, NameNode, and Secondary NameNode.
- Designed and developed multiple MapReduce jobs in Java for complex analysis.
- Imported and exported data between HDFS and relational database systems using Sqoop.
- Imported data into HDFS from different RDBMS servers using Sqoop and exported aggregated data back to the RDBMS servers using Sqoop for other ETL operations.
- Imported required tables from RDBMS into HDFS using Sqoop, and used Storm and Kafka for real-time streaming of data into HBase.
- Moved log files generated from various sources into HDFS for further processing using Flume.
- Developed MapReduce programs to extract and transform data sets, with results exported back to RDBMS using Sqoop.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Analyzed data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Created Hive and Pig tables, loaded data, and wrote Hive queries and Pig scripts.
- Moved data from Oracle and MS SQL Server into HDFS using Sqoop and imported flat files in various formats into HDFS.
- Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as RegEx, JSON, and Avro (see the DDL sketch after this list).
- Developed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, handling structured data with Spark SQL.
- Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, as well as sequence files for logs.
- Collected, aggregated, and moved data from servers to HDFS using Apache Flume.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Used ZooKeeper to provide coordination services to the cluster.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing, analyzing, and training the classifier using MapReduce, Pig, and Hive jobs.
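A brief, hypothetical sketch of the Hive DDL patterns mentioned in the partitioning/SerDe bullet above, issued here through Spark's Hive support; the table names, columns, and paths are illustrative assumptions, and the JSON SerDe class shown (org.apache.hive.hcatalog.data.JsonSerDe, from hive-hcatalog-core) is just one commonly used option.

```scala
// Assumes a SparkSession named `spark` built with .enableHiveSupport().

// External table over raw JSON logs, parsed by a Hive JSON SerDe
// (hive-hcatalog-core must be on the classpath for this SerDe).
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (
    event_time STRING,
    user_id    STRING,
    action     STRING)
  ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
  LOCATION 'hdfs:///data/raw/logs/'
""")

// Partitioned managed table for query performance. Bucketed (CLUSTERED BY)
// Hive-serde tables are usually created directly in Hive, since Spark's
// support for writing them has historically been limited.
spark.sql("""
  CREATE TABLE IF NOT EXISTS logs_curated (
    user_id STRING,
    action  STRING)
  PARTITIONED BY (event_date STRING)
  STORED AS PARQUET
""")
```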
Environment: Hadoop, Cloudera Manager, Linux (Red Hat, CentOS, Ubuntu), HDFS, MapReduce, Hive, HBase, Oozie, Pig, Sqoop, Flume, ZooKeeper, Kafka, Scala, Python, Java, JSON, Oracle, SQL, Avro.
Confidential, Irving, TX
Hadoop Developer
Responsibilities:
- Worked on loading disparate data sets from different sources into the BDPaaS (Hadoop) environment using Sqoop.
- Loaded datasets from two different sources, Oracle and MySQL, into HDFS and Hive respectively.
- Developed UNIX scripts to create batch loads for bringing large volumes of data from relational databases into the Big Data platform.
- Imported data from various sources, performed transformations, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Analyzed data coming from various sources and created meta files and control files to ingest the data into the data lake.
- Configured batch jobs to ingest the source files into the data lake.
- Developed Pig scripts to load data into HBase.
- Leveraged Hive queries to create ORC tables (see the sketch after this list).
- Developed Hive scripts to meet analysts' requirements for analysis.
- Worked extensively with Hive to create, alter, and drop tables, and wrote Hive queries.
- Created and altered HBase tables on top of data residing in the data lake.
- Created views from Hive tables on data residing in the data lake, and built reports with different selection criteria from those Hive tables.
- Worked closely with the Scrum Master and the team to gather information and carry out daily activities.
- Worked with systems analysts and business users to understand requirements.
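A small, hypothetical sketch of the ORC table creation mentioned above, issued here through Spark's Hive support rather than directly in Hive; the staging and target table names and the order_date column are placeholders.

```scala
// Assumes a SparkSession `spark` built with .enableHiveSupport(); table names are placeholders.
// HiveQL equivalent run in Hive/beeline would be:
//   CREATE TABLE orders_orc STORED AS ORC AS SELECT * FROM orders_staging;

// Read the staging table registered in the Hive metastore.
val ordersRaw = spark.table("orders_staging")

// Persist it as an ORC-backed table for faster analyst queries in Hive.
ordersRaw.write
  .mode("overwrite")
  .format("orc")
  .saveAsTable("orders_orc")

// Quick sanity check on the new table.
spark.table("orders_orc").groupBy("order_date").count().show()
```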
Environment: CDH, Hadoop, MapReduce, HDFS, Hive, Sqoop, DMX-h, Autosys, Spark, Scala, Oracle, Mainframe, Linux.