Sr. Hadoop Developer Resume
Boston, MA
SUMMARY:
- 8+ years of experience in analysis, architecture, design, development, testing, maintenance and user training of software applications.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
- Good working knowledge of data transformation and loading using export and import.
- Hands on experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Used different Hive SerDes such as RegexSerDe and HBase SerDe.
- Proven expertise in performing analytics on big data using MapReduce, Hive and Pig.
- Experience in analyzing data using Hive, Pig Latin, and custom MR programs in Java.
- Hands on experience writing Spark SQL scripts (a minimal sketch appears at the end of this summary).
- Sound knowledge of programming Spark using Scala.
- Good understanding of real-time data processing using Spark.
- Hands on experience with workflow scheduling and coordination tools such as Oozie and ZooKeeper, and with Kafka for messaging.
- Developed small distributed applications using ZooKeeper and scheduled workflows using Oozie.
- Developed Hive/MapReduce/Spark Python modules for ML & predictive analytics in Hadoop/Hive/Hue on AWS.
- Used Pig as an ETL tool for transformations, event joins, filtering and pre-aggregation.
- Clear understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode and the MapReduce programming model.
- Expertise in writing custom UDFs to extend Hive and Pig core functionality.
- Hands on experience extracting data from log files and copying it into HDFS using Flume.
- Worked extensively with Amazon Web Services (AWS) offerings such as EC2, S3, EBS, RDS and VPC.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing; performed predictive analytics using Apache Spark Scala APIs.
- Knowledge of installing, configuring and using Hadoop components such as Hadoop MapReduce (MR1), YARN (MR2), HDFS, Hive, Pig, Flume and Sqoop.
- Experience in analyzing, designing and developing ETL strategies and processes, writing ETL specifications, and Informatica development.
- Hands on experience in AWS Cloud in various AWS services such as Redshift cluster, Route 53 domain configuration
- Extensively used Informatica PowerCenter for the extraction, transformation and loading (ETL) process.
- Experience in dimensional data modeling using star and snowflake schemas.
- Worked on reusable code known as tie-outs to maintain data consistency.
- More than 4 years of experience in Java, J2EE, Web Services, SOAP, HTML and XML related technologies, demonstrating strong analytical and problem-solving skills, computer proficiency and the ability to follow projects through from inception to completion.
- Extensive experience working with Oracle, DB2, SQL Server and MySQL databases, and with core Java concepts such as OOP, multithreading, collections and I/O.
- Hands on experience with JAX-WS, JSP, Servlets, Struts, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, Linux, Unix, WSDL, XML, HTML, AWS, Scala and Vertica.
- Developed applications using Java, RDBMS, and Linux shell scripting.
- Experience in complete project life cycle of Client Server and Web applications.
- Good understanding of Data Mining and Machine Learning techniques.
- Good interpersonal and communication skills, strong problem-solving skills, adopts new technologies with ease, and a good team member.
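A minimal Spark SQL scripting sketch in Scala, illustrating the kind of work summarized above; the input path, column names and view name are hypothetical placeholders, not from a specific project.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Register a raw CSV file as a temporary view (hypothetical path and schema)
    spark.read
      .option("header", "true")
      .csv("hdfs:///data/raw/orders")
      .createOrReplaceTempView("orders")

    // Query the view with Spark SQL and show a sample of the aggregation
    val daily = spark.sql(
      """SELECT order_date, COUNT(*) AS order_cnt
        |FROM orders
        |GROUP BY order_date""".stripMargin)

    daily.show(20)
    spark.stop()
  }
}
```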
PROFESSIONAL EXPERIENCE:
Sr. Hadoop Developer
Confidential, Boston, MA
Responsibilities:
- Involved in end-to-end data processing including ingestion, transformation, quality checks and splitting.
- Streamed data in real time using Spark Streaming with Kafka (a minimal sketch follows this list).
- Developed Spark scripts using Scala as per requirements.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Performed different types of transformations and actions on RDDs to meet business requirements.
- Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
- Worked on analyzing the Hadoop cluster and different big data analytics tools including Pig, HBase and Sqoop.
- Involved in loading data from UNIX file system to HDFS.
- Very good understanding/knowledge of Hadoop Architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary Namenode, and MapReduce concepts.
- Developed multiple MapReduce jobs in Java and Python for data cleaning and preprocessing.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented best offer logic using Pig scripts and Pig UDFs.
- Used AngularJS for data binding and Node.js for back-end API support.
- Created a single-page application that loads multiple views using route services, using the AngularJS framework to make the user experience more dynamic.
- Handled cluster configuration and inter- and intra-cluster data transfer using distcp and hftp.
- Responsible for managing data coming from various sources.
- Installed and configured Hive and wrote Hive UDFs.
- Experience in loading and transforming large sets of structured, semi-structured and unstructured data.
- Provided cluster coordination services through ZooKeeper.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Responsible for setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Installed and configured Hadoop MapReduce and HDFS.
- Installed and configured Pig.
- Built a REST web service with a Node.js server on the back end to handle requests sent from front-end jQuery Ajax calls.
- Imported and exported data between relational databases such as MySQL and HDFS/HBase using Sqoop.
- Involved in managing and reviewing Hadoop log files.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Responsible for writing Hive queries for data analysis to meet business requirements.
- Responsible for creating Hive tables and working on them using Hive QL.
- Responsible for importing and exporting data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
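A minimal sketch of the Spark Streaming with Kafka ingestion referenced above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group and HDFS output path are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaIngestSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-ingest-sketch"), Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",            // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "ingest-sketch",
      "auto.offset.reset" -> "latest"
    )

    // Direct stream from a hypothetical "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Basic quality check per micro-batch, then persist the raw records to HDFS
    stream.map(_.value)
      .filter(_.nonEmpty)
      .foreachRDD { rdd =>
        if (!rdd.isEmpty()) rdd.saveAsTextFile(s"/data/raw/events/${System.currentTimeMillis}")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```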
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Java, Node.js, Oozie, HBase, Kafka, Spark, Scala, Eclipse, Linux, Oracle, Teradata.
Hadoop Developer
Confidential, Charlotte, NC
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Worked on Hadoop cluster scaling from 4 nodes in development environment to 8 nodes in pre-production stage and up to 24 nodes in production.
- Facilitated knowledge transfer sessions.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
- Involved in developing JSP pages using Struts custom tags, jQuery and Tiles Framework.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Created a dynamic end-to-end REST API with the LoopBack Node.js framework.
- Experienced in managing and reviewing Hadoop log files.
- Maintained all services in the Hadoop ecosystem using ZooKeeper.
- Extracted files from RDBMS through Sqoop, placed them in HDFS and processed them.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from various sources.
- Gained good experience with NoSQL databases such as HBase.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
- Developed a custom file system plug-in for Hadoop so it can access files on the Data Platform.
- This plug-in allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Wrote Spark programs in Scala and used RDDs for transformations and actions (a minimal sketch follows this list).
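A minimal sketch of the RDD transformation-and-action pattern referenced in the last bullet; the HDFS paths and pipe-delimited record layout are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddTransformSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-transform-sketch"))

    // Load delimited records from HDFS into an RDD (hypothetical path and layout)
    val records = sc.textFile("hdfs:///data/input/records")

    // Transformations: parse, drop malformed rows, count occurrences per key
    val counts = records
      .map(_.split('|'))
      .filter(_.length >= 3)
      .map(fields => (fields(1), 1L))
      .reduceByKey(_ + _)

    // Action: materialize the result back to HDFS
    counts.saveAsTextFile("hdfs:///data/output/record-counts")

    sc.stop()
  }
}
```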
Environment: Java 6 (JDK 1.6), Eclipse, Red Hat Linux, MapReduce, Node.js, HDFS, Oozie, Hive, Spark, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, Elastic, Flume, Cloudera, UNIX Shell Scripting.
Hadoop Developer
Confidential, St. Louis, MO
Responsibilities:
- Responsible for Installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Involved in designing the Cassandra data model and used CQL (Cassandra Query Language) to perform CRUD operations on Cassandra tables.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Used GZIP compression with AWS CloudFront to forward compressed files to destination nodes/instances.
- Implemented jobs using Scala and SQL for faster testing and processing of data; streamed data in real time with Kafka.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning Hive tables and running scripts in parallel to reduce their run time.
- Worked on Agile methodology projects extensively.
- Worked on data serialization formats such as Avro, Parquet, JSON and CSV for converting complex objects into byte sequences (a minimal Parquet sketch follows this list).
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Scheduled data refresh on Tableau Server for weekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately.
- Installing, Upgrading and Managing Hadoop Clusters.
- Administered, installed, upgraded and managed distributions of Hadoop, Hive and HBase.
- Advanced knowledge in performance troubleshooting and tuning Hadoop clusters.
- Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Implemented an analytical platform that used HiveQL functions and different kinds of join operations such as map joins and bucketed map joins.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Extensively worked on end-to-end data pipeline orchestration using Oozie.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Optimized MapReduce jobs to use HDFS efficiently by using compression mechanisms such as LZO and Snappy.
- Processed source data into structured data and stored it in the NoSQL database Cassandra.
- Created alter, insert and delete queries involving lists, sets and maps in DataStax Cassandra.
- Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
- Responsible for continuously monitoring and managing the Elastic MapReduce cluster through the AWS console.
- Developed service (EJB) components for the middle tier and implemented business logic using J2EE design patterns on WebLogic Application Server.
- Evaluated the suitability of Hadoop and its ecosystem for the project and implemented/validated various proof-of-concept (POC) applications to adopt them as part of the Big Data Hadoop initiative.
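A minimal sketch of writing snappy-compressed, partitioned Parquet from Spark, in the spirit of the partitioning, serialization and compression work listed above; the source table, partition column and output path are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ParquetWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-write-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive staging table
    val txns = spark.table("staging.transactions")

    // Write snappy-compressed Parquet, partitioned by date so downstream
    // Hive queries can prune partitions instead of scanning everything
    txns.write
      .option("compression", "snappy")
      .partitionBy("txn_date")
      .mode("overwrite")
      .parquet("hdfs:///warehouse/curated/transactions")

    spark.stop()
  }
}
```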
Environment: MapReduce, HDFS, Hive, EJB 3, Pig, HBase, SQL, Sqoop, Flume, Oozie, Apache Kafka, Scala, Zookeeper, J2EE, Eclipse, Cassandra.
Hadoop Developer
Confidential, Bridgeton, NJ
Responsibilities:
- Installed and configured the Hadoop cluster.
- Worked with the Cloudera support team to fine-tune the cluster.
- Worked closely with the SA team to make sure all hardware and software were properly set up for optimal usage of resources.
- Developed a custom File System plugin for Hadoop so it can access files on Hitachi Data Platform.
- The plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
- The plugin also provided data locality for Hadoop across host nodes and virtual machines.
- Wrote data ingesters and MapReduce programs.
- Developed MapReduce jobs to analyze data and provide heuristic reports.
- Good experience in writing data ingesters and complex MapReduce jobs in Java for data cleaning and preprocessing, and fine-tuning them per data set.
- Performed extensive data validation using Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Experienced with scripting languages such as Python and shell; wrote extensive Python and shell scripts to provision and spin up virtualized Hadoop clusters.
- Added, decommissioned and rebalanced nodes.
- Worked with the HBase Java API to populate an operational HBase table with key-value data (a minimal sketch follows this list).
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per user needs.
- Applied patches and performed version upgrades.
- Handled incident management, problem management and change management.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Automated and scheduled Sqoop jobs in a timely manner using UNIX shell scripts.
- Scheduled MapReduce jobs using FIFO and Fair share schedulers.
- Installed and configured other open-source software such as Pig, Hive, HBase, Flume and Sqoop.
- Integrated with RDBMS using Sqoop and the JDBC connector.
- Worked with the dev team to tune jobs; knowledgeable in writing Hive jobs.
- Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
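A minimal sketch of populating an HBase table through the HBase client API (invoked from Scala here to keep the examples in one language); the table name, column family, row key and cell values are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBasePutSketch {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml from the classpath for ZooKeeper quorum settings
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    try {
      val table = connection.getTable(TableName.valueOf("ops_metrics"))   // hypothetical table
      val put = new Put(Bytes.toBytes("host01#2016-01-01"))               // hypothetical row key
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("cpu_pct"), Bytes.toBytes("87.5"))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```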
Environment: Windows 2000/2003, UNIX, Linux, Java, Apache HDFS, MapReduce, Avro, Storm, Cloudera, Pig, Hive, HBase, Flume, Sqoop, Cassandra, NoSQL.
SQL Developer
Confidential
Responsibilities:
- Involved in creating tables, indexes, sequences and constraints, and created stored procedures and triggers to implement business rules.
- Installation of SQL Server on Development and Production Servers, setting up databases, users, roles and permissions.
- Extensively involved in SQL joins, subqueries, tracing and performance tuning for better query execution.
- Provided documentation about database/data warehouse structures and Updated functional specification and technical design documents.
- Designed and created ETL packages using SSIS to transfer data from heterogeneous sources in different file formats (Oracle, SQL Server and flat files) to a SQL Server destination.
- Worked on several Data Flow transformations including Derived Column, Slowly Changing Dimension, Lookup, Fuzzy Lookup, Data Conversion and Conditional Split using SSIS controls.
- Created various reports with drill-down, drill-through and calculated members using SQL Server Reporting Services.
- Used various report items such as tables, subreports and charts to develop reports in SSRS and uploaded them to Report Manager.
- Created complex stored procedures, triggers, functions, indexes, tables, views, SQL joins and other T-SQL code to implement business rules.
- Used Performance Monitor and SQL Profiler to optimize queries and enhance the performance of database servers.
Environment: MS SQL Server 2012/2008R2/2008, T-SQL, SQL Server Reporting Services (SSRS), SSIS, SSAS, Business Intelligence Development Studio (BIDS), MS Excel, Visual Source Team Foundation Server, VBScript.
Java developer
Confidential
Responsibilities:
- Developed the company's complete website from scratch and deployed it.
- Involved in requirements gathering.
- Designed and developed user interface using HTML, CSS and JavaScript.
- Designed HTML screens with JSP for the front-end.
- Involved in Database Design by creating Data Flow Diagram (Process Model) and ER Diagram (Data Model).
- Designed, created and maintained the database using MySQL.
- Made JDBC calls from the servlets to the database to store user details.
- JavaScript was used for client-side validation.
- Servlets were used as controllers and Entity/Session Beans for business logic.
- Used Eclipse for project building
- Participated in User review meetings and used Test Director to periodically log the development issues, production problems and bugs.
- Used WebLogic to deploy applications on local and development environments of the application.
- Debugged and fixed the errors
- Implemented and supported the project through development and unit testing phases into the production environment.
- Involved in documenting the application.
- Involved in designing stored procedures to extract and calculate billing information, connecting to Oracle.
- Formatted the results from the database as HTML reports for the client.
- Used PVCS Version Manager for source control and PVCS Tracker for change control management.
- Implemented test-first unit testing driven using JUnit.
Environment: Java, JSP, Servlets, JDBC, JavaScript, HTML, CSS, WebLogic, Eclipse and Test Director.