Hadoop Developer Resume
Charlotte, NC
SUMMARY
- 7+ years of experience in the Information Technology industry, including 4+ years as a Hadoop/Spark Developer working with Big Data technologies (the Hadoop and Spark ecosystems) and 3+ years with Java/J2EE technologies and SQL.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce programming, Hive, Pig, YARN, Sqoop, Flume, HBase, Impala, Oozie, ZooKeeper, Kafka, and Spark.
- In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, ResourceManager, NodeManager, NameNode, and DataNode, as well as MapReduce v1 and v2 concepts.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming (real-time), and Spark MLlib.
- Hands-on experience in the analysis, design, coding, and testing phases of the Software Development Life Cycle (SDLC).
- Hands-on experience with Amazon Web Services (AWS), including Elastic MapReduce (EMR), S3 storage, EC2 instances, and data warehousing.
- Worked with and gained strong familiarity with AWS cloud services such as EC2, S3, and EBS.
- Migrated an existing on-premises application to AWS, using EC2 and S3 for small data set processing and storage; experienced in maintaining Hadoop clusters on AWS EMR.
- Hands-on experience in Big Data application phases such as data ingestion, data analytics, and data visualization.
- Experience with Hadoop distributions such as Cloudera, Hortonworks, and Amazon AWS.
- Experience transferring data from RDBMS to HDFS and Hive tables using Sqoop.
- Migrated existing Hive code to Apache Spark and Scala using Spark SQL and RDDs (see the sketch following this summary).
- Experience working with Flume to load log data from multiple sources directly into HDFS.
- Well versed in workflow scheduling and monitoring tools such as Oozie, Hue, and ZooKeeper.
- Good knowledge of Impala, Mahout, Spark SQL, Storm, Avro, Kafka, Hue, and AWS, along with IDE and build tools such as Eclipse, NetBeans, and Maven.
- Installed and configured MapReduce, Hive, and HDFS; implemented CDH5 and HDP clusters on CentOS.
- Assisted with performance tuning, monitoring and troubleshooting.
- Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Experience streaming data into clusters through Kafka and Spark Streaming.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java
- Experience with NoSQL column-oriented databases such as HBase and Cassandra, and their integration with Hadoop clusters.
- Involved in Cluster coordination services through Zookeeper.
- Experience in working with different data sources like Flat files, Avro files and Databases.
- Experience importing and exporting data between databases such as MySQL and Oracle and HDFS using Sqoop.
- Good experience in Core Java and J2EE technologies such as JDBC, Servlets, and JSP.
- Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, multithreading, serialization, and deserialization.
- Experience in designing the User Interfaces using HTML, CSS, JavaScript and JSP.
- Excellent interpersonal skills in areas such as teamwork, communication and presentation to business users or management teams.
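Illustrative sketch (Scala) of the Hive-to-Spark migration noted above. This is a minimal example under assumptions, not code from a specific project; the gis.events table and its columns are hypothetical.

    import org.apache.spark.sql.SparkSession

    object HiveToSparkMigration {
      def main(args: Array[String]): Unit = {
        // Hive-enabled session so Spark SQL can read existing Hive tables
        val spark = SparkSession.builder()
          .appName("HiveToSparkMigration")
          .enableHiveSupport()
          .getOrCreate()

        // Original HiveQL, run unchanged through Spark SQL
        val viaSql = spark.sql(
          "SELECT event_type, COUNT(*) AS cnt FROM gis.events GROUP BY event_type")

        // The same aggregation expressed with the DataFrame API
        val viaDataFrame = spark.table("gis.events")
          .groupBy("event_type")
          .count()

        viaSql.show()
        viaDataFrame.show()
        spark.stop()
      }
    }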
TECHNICAL SKILLS
Hadoop Ecosystem Development: HDFS, Map Reduce, Hive, Pig, Oozie, HBase, Sqoop, Flume, Spark, Kafka
Languages: Java, C, C++, Scala
Scripting: Shell Script, Perl
Web Programming Languages: HTML, XML
Database: Oracle 11g, MySQL, Microsoft SQL Server (SQL Server Management Studio)
Operating Systems: Linux, Unix, Windows
PROFESSIONAL EXPERIENCE
Confidential, Charlotte, NC
Hadoop developer
Environment: CDH 5.9.1, Hadoop, YARN, HDFS, Hive, Map Reduce, Sqoop, LINUX, Scala, Kafka, Flume, Spark, Crontab.
Responsibilities:
- Responsible for establishing a real-time data feed aligned to a CSO goal to automate existing functionality, derive better insight into current data gaps, and deliver data to downstream applications.
- End-to-end experience in the design, development, maintenance, and analysis of various types of applications using data science methodologies in the Hadoop ecosystem.
- Involved in the complete Big Data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
- Developed and implemented data integration using Sqoop, Hadoop Streaming, and MapReduce, and built workflows using Oozie.
- Analyzed and processed large data sets in the Global Information Security (GIS) domain using MapReduce, Hive, and Pig.
- Developed MapReduce jobs for data cleansing and pre-processing.
- Used the Spark API to import data from DB2 into HDFS and created Hive tables.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Imported large data sets from DB2 into Hive tables using Sqoop.
- Used Impala for querying HDFS data to achieve better performance.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables.
- Involved in running Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
- Developed Spark scripts using Scala shell commands as per requirements.
- Responsible for implementing an ETL process through Kafka-Spark-HBase integration per the requirements of a customer-facing API.
- Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables, and handle structured data (see the JSON-to-Hive sketch at the end of this section).
- Worked on batch processing and real-time data processing with Spark Streaming using a Lambda architecture.
- Developed Spark code in Scala and the Spark SQL environment for faster testing and processing of data; loaded data into Spark RDDs and performed in-memory computation to generate output with lower memory usage.
- Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMS into Hive.
- Utilized Spark Core, Spark Streaming and Spark SQL API for faster processing of data instead of using MapReduce in Java.
- Responsible for data extraction and data integration from different data sources into the Hadoop data lake by creating ETL pipelines using Spark, MapReduce, Pig, and Hive.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Involved in building Data Pipelines and providing Automated Solutions to deliver analytical capabilities and enriched data to Operations team.
- Used Spark Streaming to ingest real-time data feeds from Kafka (see the Kafka ingestion sketch at the end of this section); also worked with Spark and Hive on Spark to ingest data and create reports.
- Involved in developing an application called Alerton Search, which creates automatic alerts and emails the relevant teams when a security breach matches preset rules.
- Performed data analytics in Hive for developing and automating several metrics in the Global Information security domain.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Migrating data from multiple source systems to Hadoop distributed file system for data analysis.
- Used Hive partitioning and bucketing, performed different types of joins on Hive tables, and implemented Hive SerDes such as Avro.
- Responsible for generating GIS domain-specific reports for various clients in the organization.
- Scheduling Hadoop jobs for processing millions of records of text data.
- Worked on loading data from LINUX file system to HDFS
- Developed Pig scripts and UDFs to perform data quality checks on the processed data.
- Responsible for managing data from multiple sources
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Followed Agile development methodology for the development cycle and assisted in managing product releases and sprints using Jira.
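Minimal sketch (Scala) of the Spark SQL JSON-to-Hive flow referenced in this section. The input path, column names, and target table gis.events_parquet are assumptions for illustration.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object JsonToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("JsonToHive")
          .enableHiveSupport()
          .getOrCreate()

        // Load JSON; Spark infers the schema into a DataFrame (schema RDD)
        val raw = spark.read.json("/data/raw/events.json")
        raw.createOrReplaceTempView("events_raw")

        // Handle the structured data with Spark SQL
        val cleaned = spark.sql(
          """SELECT event_id, event_type, src_ip, to_date(event_ts) AS dt
            |FROM events_raw
            |WHERE event_id IS NOT NULL""".stripMargin)

        // Write into a Hive table partitioned by date, stored as Snappy-compressed Parquet
        cleaned.write
          .mode(SaveMode.Overwrite)
          .partitionBy("dt")
          .format("parquet")
          .option("compression", "snappy")
          .saveAsTable("gis.events_parquet")

        spark.stop()
      }
    }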
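Minimal sketch (Scala) of the Kafka-to-Spark Streaming ingestion referenced in this section, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, and HDFS landing path are hypothetical.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object KafkaFeed {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaFeed")
        val ssc = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",           // hypothetical broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "cso-feed",                        // hypothetical consumer group
          "auto.offset.reset" -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          PreferConsistent,
          Subscribe[String, String](Seq("security-events"), kafkaParams))

        // Land each non-empty micro-batch of message values in HDFS for downstream processing
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty()) {
            rdd.saveAsTextFile(s"/data/landing/security_events/${time.milliseconds}")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }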
Confidential, New Jersey
Hadoop Developer
Environment: Hive, Pig, HBase, HDFS, MapReduce, MapR, Java, Scala, Spark, Elasticsearch, Sqoop, LINUX, Kafka, Flume.
Responsibilities:
- Involved in extracting the data from various sources into Hadoop HDFS for storage and processing
- Effectively used Sqoop to transfer data between different databases and HDFS
- Developed Map-Reduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
- Installed and configured various components of Hadoop Ecosystem like Job Tracker, Task Tracker, Name Node and Secondary Name Node.
- Designed and developed multiple MapReduce Jobs in Java for complex analysis.
- Importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa.
- Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations
- Imported required tables from RDBMS to HDFS using Sqoop and also used Storm and Kafka to get real time streaming of data into HBase
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Developed MapReduce programs to extract and transform the data sets and results were exported back to RDBMS using Sqoop.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Analyzed data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.
- Moved data from Oracle and MS SQL Server into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as RegEx, JSON, and Avro.
- Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis
- Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables, and handle structured data.
- Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for log files.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms (see the broadcast-join sketch after this list).
- Used Zookeeper for providing coordinating services to the cluster.
- Developed Oozie workflows to automate loading data into HDFS, and pre-processing, analyzing, and training the classifier using MapReduce, Pig, and Hive jobs.
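Minimal sketch (Scala) of migrating a MapReduce distributed-cache, map-side join to Spark RDD transformations with a broadcast variable, as referenced above; file paths and record layouts are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}

    object MapSideJoinMigration {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("MapSideJoinMigration"))

        // Small lookup data set (formerly shipped to mappers via the distributed cache)
        val lookup = sc.textFile("/data/lookup/customers.csv")
          .map(_.split(","))
          .map(cols => (cols(0), cols(1)))   // (customerId, customerName)
          .collectAsMap()

        // Broadcast the lookup so every executor can join locally (map-side)
        val lookupBc = sc.broadcast(lookup)

        val enriched = sc.textFile("/data/raw/transactions.csv")
          .map(_.split(","))
          .map { cols =>
            val customerId = cols(0)
            val amount = cols(1)
            val name = lookupBc.value.getOrElse(customerId, "UNKNOWN")
            s"$customerId,$name,$amount"
          }

        enriched.saveAsTextFile("/data/enriched/transactions")
        sc.stop()
      }
    }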
Confidential, Charlotte, NC
Hadoop developer
Environment: CDH 5.4.5, Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, LINUX, Crontab.
Responsibilities:
- End-to-end experience in the design, development, maintenance, and analysis of various types of applications using data science methodologies in the Hadoop ecosystem.
- Developed and implemented data integration using Sqoop, Hadoop Streaming, and MapReduce, and built workflows using Oozie.
- Analyzed and processed large data sets in the Global Information Security (GIS) domain using MapReduce, Hive, and Pig.
- Developed MapReduce jobs for data cleansing and pre-processing.
- Performed data analytics in Hive for developing and automating several metrics in the Global Information security domain.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Migrating data from multiple source systems to Hadoop distributed file system for data analysis.
- Created Hive tables and loaded and analyzed data using Hive queries.
- Used Hive partitioning and bucketing, performed different types of joins on Hive tables, and implemented Hive SerDes such as Avro.
- Responsible for generating GIS domain-specific reports for various clients in the organization.
- Scheduling Hadoop jobs for processing millions of records of text data.
- Worked on loading data from LINUX file system to HDFS
- Developed Pig scripts and UDFs to perform data quality checks on the processed data.
- Responsible for managing data from multiple sources
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Followed Agile development methodology for the development cycle and assisted in managing product releases and sprints using Jira.
- Attended presentations on Apache Spark, Apache Kafka.
Confidential, MD
Hadoop developer
Environment: Hadoop, HDFS, Sqoop, Pig, Hive, Oozie, Tableau, Map Reduce
Responsibilities:
- Involved in extracting customers' Big Data from various data sources into Hadoop HDFS, including data from databases as well as log data from servers.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Developed Map Reduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Created Hive tables as managed or external tables, per requirements.
- Developed Pig UDFs to create schemas dynamically in JSON format and used them as Avro schemas for building Hive tables.
- Implemented Partitioning, Bucketing in Hive for better organization of the data.
- Developed UDFs in Pig and Hive
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java Map-reduce, Hive and Sqoop as well as system specific jobs.
- Worked with BI teams in generating the reports on Tableau.
Confidential, Towson, MD
Hadoop Admin/Developer
Responsibilities:
- Installed and configured Hadoop Map Reduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and pre-processing
- Importing and exporting data into HDFS and Hive using Sqoop
- Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup and disaster recovery systems and procedures
- Installed and configured 20 node Apache Hadoop cluster.
- Used Sqoop to efficiently transfer data between MySQL and HDFS
- Developed Map Reduce jobs in Java for Data Cleansing.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Developed the Pig UDF'S to process the data for analysis.
- Designed an ETL process to FTP all transactional sources (Oracle, MS SQL, MS Access, Excel, text) from source systems onto a UNIX server, with transformation and loading into an Oracle data warehouse.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Analyzed existing automation scripts and tools for missing efficiencies or gaps.
- Supported internal and external teams on information security initiatives.
Confidential
JAVA/SQL Developer
Environment: JDBC, JUNIT, PL/SQL, Oracle, Eclipse
Responsibilities:
- Involved in the complete Software Development Life Cycle (SDLC) of the application, from requirement analysis to testing.
- Involved in designing Database Connections using JDBC.
- Developed the business components (in core Java) used for the calculation module (calculating various entitlement attributes).
- Created complex SQL queries, PL/SQL stored procedures, and functions for the back end.
- Prepared the Functional, Design and Test case specifications.
- Involved in writing Stored Procedures in Oracle to do some database side validations.
- Performed unit testing, system testing and integration testing
- Developed Unit Test Cases. Used JUNIT for unit testing of the application.
- Provided technical support for production environments: resolved issues, analyzed defects, and provided and implemented solutions; resolved high-priority defects per the schedule.
Confidential
Java Developer
Environment: JAVA, MVC, Struts, Servlets, JQuery, JBoss, SQL, XML, JUnit.
Responsibilities:
- Enhancing the application by implementing new functionality according to the business requirements specified.
- Involved in Requirement analysis, Design, Review and Deployment.
- Involved in writing struts action classes and DAO classes.
- Identified and fixed issues due to incorrect exception handling.
- Developed and updated stored procedures and SQL statements.
- Extensively worked on user interface using JSP, HTML, CSS and JavaScript.
- Developed Test cases using JUnit tool.
- Involved in direct interaction with the Client on requirement analysis and approach.
- Responsible for application deployment (using Jenkins or manually) across all environments, acting as deployment manager.
- Handled product life cycle and Live support.