Hadoop/Spark Developer Resume
Charlotte, NC
SUMMARY:
- Around 8 years of experience in the Information Technology industry, including 5+ years as a Hadoop/Spark Developer working with Big Data technologies across the Hadoop and Spark ecosystems, and 2+ years with Java/J2EE technologies and SQL.
- Hands-on experience installing, configuring and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, YARN, Sqoop, Flume, HBase, Impala, Oozie, ZooKeeper, Kafka and Spark.
- In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, ResourceManager, NodeManager, NameNode and DataNode, as well as MRv1 and MRv2 concepts.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, Spark MLlib and real-time stream processing with Spark.
- Hands-on experience in the analysis, design, coding and testing phases of the Software Development Life Cycle (SDLC).
- Hands-on experience with Amazon Web Services (AWS) cloud services, including EC2, S3, EBS, Elastic MapReduce (EMR) and data warehousing.
- Migrated an existing on-premises application to AWS, using EC2 and S3 for processing and storing smaller data sets; experienced in maintaining Hadoop clusters on AWS EMR.
- Hands-on experience across Big Data application phases such as data ingestion, data analytics and data visualization.
- Experience with Hadoop distributions such as Cloudera, Hortonworks and Amazon AWS.
- Experience transferring data from RDBMS to HDFS and Hive tables using Sqoop.
- Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
- Experience working with Flume to load log data from multiple sources directly into HDFS.
- Well versed in workflow scheduling and monitoring tools such as Oozie, Hue and ZooKeeper.
- Good knowledge of Impala, Mahout, Spark SQL, Storm, Avro, Kafka, Hue and AWS, as well as IDE and build tools such as Eclipse, NetBeans and Maven.
- Installed and configured MapReduce, Hive and HDFS; implemented CDH5 and HDP clusters on CentOS and assisted with performance tuning, monitoring and troubleshooting.
- Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Experience streaming data into clusters and manipulating it with Kafka and Spark Streaming.
- Experience analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Experience with NoSQL column-oriented databases such as HBase and Cassandra, and their integration with Hadoop clusters.
- Involved in cluster coordination services through ZooKeeper.
- Good level of experience in Core Java and J2EE technologies such as JDBC, Servlets and JSP.
- Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, multithreading, and serialization/deserialization.
- Experience designing user interfaces using HTML, CSS, JavaScript and JSP.
- Excellent interpersonal skills in areas such as teamwork, communication and presentation to business users or management teams.
TECHNICAL SKILLS:
Languages: Java, Python, Scala, SQL, HiveQL, NoSQL, Pig Latin
Hadoop Ecosystem: HDFS, Hive, MapReduce, HBase, YARN, Sqoop, Flume, Oozie, ZooKeeper, Impala, Avro
Relational Databases (RDBMS): Oracle, DB2, SQL Server, MySQL
NoSQL Databases: HBase, MongoDB, Cassandra
Scripting Languages: JavaScript, AJAX, CSS, Python, Perl, Unix Shell Script
Programming Languages: C, C++, C#, Java, J2EE, JDBC, Python, Scala, Shell Scripting, PL/SQL, Android, Unix
Java Technologies: Java, J2EE, JDBC, Servlets, JSP, JSTL, JavaBeans, XML Parsers, EJB, Hibernate, Struts
Web Technologies: Servlets, HTML, JavaScript
Web Servers: WebLogic, WebSphere, Apache Tomcat, JBoss
Web Services: SOAP, RESTful APIs, WSDL
Operating Systems: Windows XP/Vista/7/8, Linux, Unix, Ubuntu
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Hadoop/Spark Developer
Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing that data in HDFS.
- Followed Agile and Scrum principles in developing the project.
- Developed Spark jobs to import data from DB2 into HDFS and created Hive tables.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
- Imported large data sets from DB2 into Hive tables using Sqoop.
- Used Impala for querying HDFS data to achieve better performance.
- Implemented Apache Pig scripts to load data from and store data into Hive.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet tables from Avro-backed Hive tables.
- Involved in running Hive scripts through Hive, Impala and Hive on Spark, and some through Spark SQL.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Developed Spark scripts using Scala shell commands as per requirements.
- Worked extensively with Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS and VPC.
- Responsible for implementing the ETL process through Kafka-Spark-HBase integration per the requirements of a customer-facing API.
- Used Spark SQL to load JSON data, create schema RDDs and load them into Hive tables, and handled structured data with Spark SQL (see the sketch after this list).
- Worked on batch processing and real-time data processing with Spark Streaming using a Lambda architecture.
- Developed Spark code in Scala and the Spark SQL environment for faster testing and processing of data, loading data into Spark RDDs and performing in-memory computation to generate output responses with lower memory usage.
- Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMS into Hive.
- Utilized the Spark Core, Spark Streaming and Spark SQL APIs for faster data processing instead of MapReduce in Java.
- Responsible for extracting and integrating data from different sources into the Hadoop data lake by creating ETL pipelines using Spark, MapReduce, Pig and Hive.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Worked with the team on fetching live streaming data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for large data volumes.
- Wrote Pig scripts to clean up ingested data and created partitions for daily data.
- Developed Spark programs in Scala and applied functional programming principles to process complex unstructured and structured data sets.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
- Analyzed SQL scripts and designed solutions to implement them using PySpark.
- Involved in converting MapReduce programs into Spark transformations using Spark RDD in Scala.
- Used Oozie workflows to coordinate Pig and Hive scripts.
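As a minimal illustration of the Spark SQL work above, the sketch below loads JSON into a partitioned, Snappy-compressed Parquet Hive table. It assumes Spark 2.x with Hive support; the S3 path, partition column and Hive table name are hypothetical placeholders, not project values.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: read semi-structured JSON, infer a schema, and append the result
// to a partitioned Parquet Hive table with Snappy compression.
object JsonToHiveParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JsonToHiveParquet")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical S3 location holding raw JSON events.
    val events = spark.read.json("s3a://example-bucket/raw/events/")

    events.write
      .mode("append")
      .format("parquet")
      .option("compression", "snappy")
      .partitionBy("event_date")               // hypothetical partition column
      .saveAsTable("analytics.events_parquet") // hypothetical Hive database.table

    spark.stop()
  }
}
```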
Environment: HDFS, MapReduce, Hive, Sqoop, HBase, Oozie, Flume, Impala, Kafka, ZooKeeper, Spark SQL, Spark DataFrames, PySpark, Scala, Amazon AWS S3, Python, Java, JSON, SQL scripting, Linux shell scripting, Avro, Parquet, Hortonworks.
Confidential, Bedford, NH
Sr. Hadoop Developer
Responsibilities:
- In-depth understanding of Hadoop architecture and components such as HDFS, ApplicationMaster, NodeManager, ResourceManager, NameNode and DataNode, along with MapReduce concepts.
- Imported required tables from RDBMS into HDFS using Sqoop, and used Storm and Kafka for real-time streaming of data into HBase.
- Good experience with the NoSQL database HBase, creating HBase tables to load large sets of semi-structured data from various sources.
- Wrote Hive and Pig scripts as ETL tools to perform transformations, event joins, traffic filtering and pre-aggregations before storing data in HDFS.
- Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis
- Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Involved in moving log files generated from various sources into HDFS through Flume for further processing.
- Developed Java code to generate, compare and merge Avro schema files.
- Developed complex MapReduce streaming jobs in Java, implemented alongside Hive and Pig, and wrote MapReduce programs in Java to perform ETL, cleaning and scrubbing tasks.
- Prepared validation report queries, executed them after every ETL run, and shared the results with business users in different phases of the project.
- Used Hive to analyze partitioned and bucketed data and compute metrics for reporting, applying Hive join optimization techniques and best practices when writing HiveQL scripts.
- Imported and exported data into HDFS and Hive using Sqoop, and wrote Hive queries to extract the processed data.
- Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per users' needs.
- Teamed up with architects to design a Spark model for the existing MapReduce model and migrated MapReduce models to Spark models using Scala.
- Implemented Spark using Scala, utilizing the Spark Core, Spark Streaming and Spark SQL APIs for faster data processing instead of MapReduce in Java (see the streaming sketch after this list).
- Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables and handle structured data.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Integrated Apache Storm with Kafka to perform web analytics and move clickstream data from Kafka to HDFS.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Spark and ZooKeeper.
- Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery and backups.
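The sketch below illustrates the Spark Streaming pattern referenced above, consuming records from Kafka with the spark-streaming-kafka-0-10 direct stream API and aggregating them per micro-batch. The broker address, topic, consumer group and record layout are assumptions for illustration only.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Sketch only: direct Kafka stream, counting events per key in 10-second batches.
object ClickStreamCounts {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("ClickStreamCounts"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",            // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clickstream-consumers",            // hypothetical consumer group
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("clickstream"), kafkaParams))

    // Assume tab-separated records whose first field is a page URL.
    stream.map(record => (record.value.split("\t")(0), 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```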
Environment: Apache Hadoop, HDFS, MapReduce, HBase, Hive, YARN, Pig, Sqoop, Flume, ZooKeeper, Kafka, Impala, Spark SQL, Spark Core, Spark Streaming, NoSQL, MySQL, Cloudera, Java, JDBC, Spring, ETL, WebLogic, Web Analytics, Avro, Cassandra, Oracle, Shell Scripting, Ubuntu.
Confidential, Cambridge, MA
Hadoop Developer
Responsibilities:
- Installed and configured various Hadoop ecosystem components such as JobTracker, TaskTracker, NameNode and Secondary NameNode.
- Designed and developed multiple MapReduce jobs in Java for complex analysis.
- Imported and exported data between HDFS and relational database systems using Sqoop.
- Responsible for importing data into HDFS from different RDBMS servers using Sqoop, and exporting aggregated data back to the RDBMS servers for other ETL operations.
- Imported required tables from RDBMS into HDFS using Sqoop, and used Storm and Kafka for real-time streaming of data into HBase.
- Involved in moving log files generated from various sources into HDFS through Flume for further processing.
- Developed MapReduce programs to extract and transform data sets, and exported the results back to RDBMS using Sqoop.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Analyzed data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.
- Moved data from Oracle and MS SQL Server into HDFS using Sqoop, and imported flat files in various formats into HDFS.
- Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as RegEx, JSON and Avro.
- Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis
- Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables and handle structured data.
- Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log data.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data, and analyzed it by running Hive queries and Pig scripts.
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms (see the sketch after this list).
- Used ZooKeeper to provide coordination services to the cluster.
- Developed Oozie workflows to automate loading data into HDFS, followed by pre-processing, analysis and classification using MapReduce, Pig and Hive jobs.
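The sketch below shows the kind of MapReduce-to-Spark migration mentioned above, expressing a classic map/shuffle/reduce aggregation as RDD transformations and an action. The HDFS paths and log record layout are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: the MapReduce map -> shuffle -> reduce pattern rewritten as
// Spark RDD transformations (map, filter, reduceByKey) and an action (saveAsTextFile).
object LogLevelCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogLevelCounts"))

    val counts = sc.textFile("hdfs:///data/logs/*")    // hypothetical input path
      .map(_.split("\\s+"))
      .filter(_.length > 2)
      .map(fields => (fields(2), 1L))                  // assumes the third field is the log level
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///data/output/log_level_counts") // hypothetical output path
    sc.stop()
  }
}
```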
Environment: Hadoop, Cloudera Manager, Linux, Red Hat, CentOS, Ubuntu, Scala, HDFS, MapReduce, Hive, HBase, Oozie, Pig, Sqoop, Flume, ZooKeeper, Kafka, Python, Java, JSON, Oracle, SQL, Avro
Confidential
Java / SQL Developer
Responsibilities:
- Involved in implementing designs through the key phases of the Software Development Life Cycle (SDLC), including development, testing, implementation and maintenance support.
- Developed applications in Waterfall methodology environments.
- Designed a web application using a Web API with AngularJS and populated data using a Java entity framework.
- Developed GUIs using HTML, CSS, JSP and AngularJS framework components.
- Wrote JavaScript, HTML, CSS, Servlets and JSP to design the application GUI.
- Used the Struts framework to design actions, action forms and related configuration for every use case.
- Used SOAP for the data exchange between the backend and user interface.
- Worked with application servers such as Apache Tomcat, WebSphere and WebLogic in the project, based on requirements.
- Used WebSphere Application Server to deploy the application.
- Used SQL queries to perform backend testing on the database.
- Created the database access layer using JDBC and SQL stored procedures.
- Worked on Java-based client connectivity requirements over JDBC connections.
- Developed code using design patterns such as Singleton, Front Controller, Adapter, DAO, MVC, Template, Builder and Factory.
- Developed stored procedures and triggers in PL/SQL, and wrote SQL scripts to create and maintain the database, roles, users, tables, views, procedures and triggers.
- Utilized Java and MySQL day to day to debug and fix issues with client processes.
- Used the JIRA tracking tool to manage and track issues reported by QA, prioritizing and acting on them based on severity.
- Wrote SQL statements, stored procedures and functions that are called from Java.
- Extensively used core Java features such as multithreading, exceptions and collections.
- Hands-on experience using JBoss for EJB and JTA, as well as for caching and clustering.
- Generated server-side SQL scripts for data manipulation, validation and materialized views.
Environment: Java, JSP, HTML, CSS, RAD, JDBC, AJAX, JavaScript, Struts, Servlets, Apache Tomcat, WebLogic, WebSphere, SOAP, JBoss, PL/SQL, Eclipse, EJB, XML, Windows XP, Linux, ANT.
Confidential
SQL Developer
Responsibilities:
- Created and managed schema objects such as tables, views, indexes and referential integrity constraints based on user requirements.
- Created database objects such as stored procedures, functions, packages, triggers, indexes and views using T-SQL.
- Performed data conversions from flat files into a normalized database structure
- Created database maintenance plans for SQL Server performance, covering database integrity checks, statistics updates and re-indexing.
- Created and maintained dynamic websites using HTML, CSS, jQuery and JavaScript.
- Created ETL packages with different data sources (SQL Server, flat files, Excel source files, XML files, etc.) and loaded the data into destination tables by performing various transformations using SSIS/DTS packages.
- Created several SSIS packages to perform ETL operations that transform data from a cube using MDX.
- Worked primarily on installation, configuration, development, maintenance, administration and upgrades.
- Participated in maintaining and modifying tables and constraints for the Premium database using MS SQL Server.
- Migrated data from different data sources to the SQL Server database using SSIS.
- Performed unit testing and tuned SQL statements using indexes and stored procedures.
- Created several SSIS packages for performing ETL operations to transform data from OLTP to OLAP systems.
- Built SSIS packages to load data into the OLAP environment and monitored the ETL package jobs.
- Developed custom reports such as sub-reports, matrix reports, charts and drill-down reports using SQL Server Reporting Services (SSRS) to review scorecards and business trends based on data from different locations.
- Created various kinds of reports, including drill-down, drill-through, parameterized and ad-hoc reports.
- Created checkpoints and configuration files in SSIS packages; experienced with slowly changing dimensions in SSIS.
- Responsible for backing up and restoring system and other databases as per requirements, and scheduled those backups.
- Developed and deployed SSIS packages that imported data daily from the OLTP system and staging area into the data warehouse and data marts.
Environment: MS SQL Server 2005, SQL Server Integration Services (SSIS), SSRS, Data Transformation Services (DTS), T-SQL, Visual Studio 2008, Windows 2007 Enterprise, MS Office 2007.