Sr. Hadoop Developer Resume
Boston, MA
SUMMARY:
- 8+ years of experience in analysis, architecture, design, development, testing, maintenance and user training of software applications.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
- Good working knowledge of data transformation and loading using export and import.
- Hands on experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Used different Hive SerDes such as RegexSerDe and HBase SerDe.
- Proven expertise in performing analytics on big data using MapReduce, Hive and Pig.
- Experience in analyzing data using Hive, Pig Latin, and custom MR programs in Java.
- Hands on experience writing Spark SQL scripts (a minimal sketch appears at the end of this summary).
- Sound knowledge of programming Spark using Scala.
- Good understanding of real-time data processing using Spark.
- Hands on experience with workflow scheduling and coordination tools such as Oozie and ZooKeeper, and with Kafka for messaging.
- Developed small distributed applications using ZooKeeper and scheduled workflows using Oozie.
- Developed Hive/MapReduce/Spark Python modules for ML & predictive analytics in Hadoop/Hive/Hue on AWS.
- Used Pig as an ETL tool for transformations, event joins, filtering and pre-aggregation.
- Clear understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode and the MapReduce programming model.
- Expertise in writing custom UDFs to extend Hive and Pig core functionality.
- Hands on experience extracting data from log files and copying it into HDFS using Flume.
- Worked extensively with Amazon Web Services (AWS) offerings such as EC2, S3, EBS, RDS and VPC.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing; performed predictive analytics using Apache Spark Scala APIs.
- Knowledge of installing, configuring and using Hadoop components such as Hadoop MapReduce (MR1), YARN (MR2), HDFS, Hive, Pig, Flume and Sqoop.
- Experience in analyzing, designing and developing ETL strategies and processes, writing ETL specifications, and Informatica development.
- Hands on experience in AWS Cloud in various AWS services such as Redshift cluster, Route 53 domain configuration
- Extensively used Informatica PowerCenter for the extraction, transformation and loading (ETL) process.
- Experience in dimensional data modeling using star and snowflake schemas.
- Worked on reusable code known as tie-outs to maintain data consistency.
- More than 4 years of experience in Java, J2EE, Web Services, SOAP, HTML and XML related technologies, demonstrating strong analytical and problem-solving skills, computer proficiency and the ability to follow projects through from inception to completion.
- Extensive experience working with Oracle, DB2, SQL Server and MySQL databases, and with core Java concepts such as OOP, multithreading, collections and I/O.
- Hands on experience with JAX-WS, JSP, Servlets, Struts, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, Linux, Unix, WSDL, XML, HTML, AWS, Scala and Vertica.
- Developed applications using Java, RDBMS, and Linux shell scripting.
- Experience in complete project life cycle of Client Server and Web applications.
- Good understanding of Data Mining and Machine Learning techniques.
- Good interpersonal and communication skills, strong problem-solving skills, adopts new technologies with ease, and a good team member.
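A minimal Spark SQL scripting sketch in Scala, illustrating the kind of work summarized above; the input path, column names and view name are hypothetical placeholders, not from a specific project.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Register a raw CSV file as a temporary view (hypothetical path and schema)
    spark.read
      .option("header", "true")
      .csv("hdfs:///data/raw/orders")
      .createOrReplaceTempView("orders")

    // Query the view with Spark SQL and show a sample of the aggregation
    val daily = spark.sql(
      """SELECT order_date, COUNT(*) AS order_cnt
        |FROM orders
        |GROUP BY order_date""".stripMargin)

    daily.show(20)
    spark.stop()
  }
}
```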
PROFESSIONAL EXPERIENCE:
Sr. Hadoop Developer
Confidential, Boston, MA
Responsibilities:
- Involved in end-to-end data processing including ingestion, transformation, quality checks and splitting.
- Streamed data in real time using Spark Streaming with Kafka (a minimal sketch follows this list).
- Developed Spark scripts using Scala as per requirements.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Performed different types of transformations and actions on RDDs to meet business requirements.
- Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
- Worked on analyzing the Hadoop cluster and different big data analytics tools including Pig, HBase and Sqoop.
- Involved in loading data from UNIX file system to HDFS.
- Very good understanding/knowledge of Hadoop Architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary Namenode, and MapReduce concepts.
- Developed multiple MapReduce jobs in Java and Python for data cleaning and preprocessing.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented best offer logic using Pig scripts and Pig UDFs.
- Used AngularJS for data binding and Node.js for back-end API support.
- Created a single-page application that loads multiple views using route services, using the AngularJS framework to make the user experience more dynamic.
- Handled cluster configuration and inter- and intra-cluster data transfer using distcp and hftp.
- Responsible for managing data coming from various sources.
- Installed and configured Hive and wrote Hive UDFs.
- Experience in loading and transforming large sets of structured, semi-structured and unstructured data.
- Provided cluster coordination services through ZooKeeper.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Responsible for setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Installed and configured Hadoop MapReduce and HDFS.
- Installed and configured Pig.
- Built a REST web service with a Node.js server on the back end to handle requests sent from front-end jQuery Ajax calls.
- Imported and exported data between relational databases such as MySQL and HDFS/HBase using Sqoop.
- Involved in managing and reviewing Hadoop log files.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Responsible for writing Hive queries for data analysis to meet business requirements.
- Responsible for creating Hive tables and working on them using Hive QL.
- Responsible for importing and exporting data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
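A minimal sketch of the Spark Streaming with Kafka ingestion referenced above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group and HDFS output path are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaIngestSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-ingest-sketch"), Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",            // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "ingest-sketch",
      "auto.offset.reset" -> "latest"
    )

    // Direct stream from a hypothetical "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Basic quality check per micro-batch, then persist the raw records to HDFS
    stream.map(_.value)
      .filter(_.nonEmpty)
      .foreachRDD { rdd =>
        if (!rdd.isEmpty()) rdd.saveAsTextFile(s"/data/raw/events/${System.currentTimeMillis}")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```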
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Java, Node.js, Oozie, HBase, Kafka, Spark, Scala, Eclipse, Linux, Oracle, Teradata.
Hadoop Developer
Confidential, Charlotte, NC
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Worked on Hadoop cluster scaling from 4 nodes in development environment to 8 nodes in pre-production stage and up to 24 nodes in production.
- Facilitated knowledge transfer sessions.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
- Involved in developing JSP pages using Struts custom tags, jQuery and Tiles Framework.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Created a dynamic end-to-end REST API with the LoopBack Node.js framework.
- Experienced in managing and reviewing Hadoop log files.
- Maintained all services in the Hadoop ecosystem using ZooKeeper.
- Extracted files from RDBMS through Sqoop, placed them in HDFS and processed them.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from various sources.
- Gained good experience with NoSQL databases such as HBase.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
- Developed a custom file system plug-in for Hadoop so it can access files on the Data Platform.
- This plug-in allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Wrote Spark programs in Scala and used RDDs for transformations and actions (a minimal sketch follows this list).
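A minimal sketch of the RDD transformation-and-action pattern referenced in the last bullet; the HDFS paths and pipe-delimited record layout are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddTransformSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-transform-sketch"))

    // Load delimited records from HDFS into an RDD (hypothetical path and layout)
    val records = sc.textFile("hdfs:///data/input/records")

    // Transformations: parse, drop malformed rows, count occurrences per key
    val counts = records
      .map(_.split('|'))
      .filter(_.length >= 3)
      .map(fields => (fields(1), 1L))
      .reduceByKey(_ + _)

    // Action: materialize the result back to HDFS
    counts.saveAsTextFile("hdfs:///data/output/record-counts")

    sc.stop()
  }
}
```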
Environment: Java 6 (JDK 1.6), Eclipse, Red Hat Linux, MapReduce, Node.js, HDFS, Oozie, Hive, Spark, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, Elastic, Flume, Cloudera, UNIX Shell Scripting.
Hadoop Developer
Confidential, St. Louis, MO
Responsibilities:
- Responsible for Installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Involved in designing the Cassandra data model and used CQL (Cassandra Query Language) to perform CRUD operations on Cassandra tables.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Used GZIP compression with AWS CloudFront to forward compressed files to destination nodes/instances.
- Implemented jobs using Scala and SQL for faster testing and processing of data; streamed data in real time with Kafka.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning Hive tables and running scripts in parallel to reduce their run time.
- Worked on Agile methodology projects extensively.
- Worked on data serialization formats such as Avro, Parquet, JSON and CSV for converting complex objects into byte sequences (a minimal Parquet sketch follows this list).
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Scheduled data refresh on Tableau Server for weekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately.
- Installing, Upgrading and Managing Hadoop Clusters.
- Administered, installed, upgraded and managed distributions of Hadoop, Hive and HBase.
- Advanced knowledge in performance troubleshooting and tuning Hadoop clusters.
- Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Implemented an analytical platform that used HiveQL functions and different kinds of join operations such as map joins and bucketed map joins.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Extensively worked on end-to-end data pipeline orchestration using Oozie.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Optimized MapReduce jobs to use HDFS efficiently by using compression mechanisms such as LZO and Snappy.
- Processed source data into structured data and stored it in the NoSQL database Cassandra.
- Created alter, insert and delete queries involving lists, sets and maps in DataStax Cassandra.
- Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
- Responsible for continuously monitoring and managing the Elastic MapReduce cluster through the AWS console.
- Developed service (EJB) components for the middle tier and implemented business logic using J2EE design patterns on WebLogic Application Server.
- Evaluated the suitability of Hadoop and its ecosystem for the project and implemented/validated various proof-of-concept (POC) applications to adopt them as part of the Big Data Hadoop initiative.
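A minimal sketch of writing snappy-compressed, partitioned Parquet from Spark, in the spirit of the partitioning, serialization and compression work listed above; the source table, partition column and output path are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ParquetWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-write-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive staging table
    val txns = spark.table("staging.transactions")

    // Write snappy-compressed Parquet, partitioned by date so downstream
    // Hive queries can prune partitions instead of scanning everything
    txns.write
      .option("compression", "snappy")
      .partitionBy("txn_date")
      .mode("overwrite")
      .parquet("hdfs:///warehouse/curated/transactions")

    spark.stop()
  }
}
```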
Environment: MapReduce, HDFS, Hive, EJB 3, Pig, HBase, SQL, Sqoop, Flume, Oozie, Apache Kafka, Scala, Zookeeper, J2EE, Eclipse, Cassandra.
Hadoop Developer
Confidential, Bridgeton, NJ
Responsibilities:
- Installed and configured the Hadoop cluster.
- Worked with the Cloudera support team to fine-tune the cluster.
- Worked closely with the SA team to make sure all hardware and software were properly set up for optimal usage of resources.
- Developed a custom File System plugin for Hadoop so it can access files on Hitachi Data Platform.
- The plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
- The plugin also provided data locality for Hadoop across host nodes and virtual machines.
- Wrote data ingesters and MapReduce programs.
- Developed MapReduce jobs to analyze data and provide heuristic reports.
- Good experience in writing data ingesters and complex MapReduce jobs in Java for data cleaning and preprocessing, and fine-tuning them per data set.
- Performed extensive data validation using Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Experienced with scripting languages such as Python and shell; wrote extensive Python and shell scripts to provision and spin up virtualized Hadoop clusters.
- Added, decommissioned and rebalanced nodes.
- Worked with the HBase Java API to populate an operational HBase table with key-value data (a minimal sketch follows this list).
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per user needs.
- Applied patches and performed version upgrades.
- Handled incident management, problem management and change management.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Automated and scheduled Sqoop jobs in a timely manner using UNIX shell scripts.
- Scheduled MapReduce jobs using FIFO and Fair share schedulers.
- Installed and configured other open-source software such as Pig, Hive, HBase, Flume and Sqoop.
- Integrated with RDBMS using Sqoop and the JDBC connector.
- Worked with the dev team to tune jobs; knowledgeable in writing Hive jobs.
- Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
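A minimal sketch of populating an HBase table through the HBase client API (invoked from Scala here to keep the examples in one language); the table name, column family, row key and cell values are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBasePutSketch {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml from the classpath for ZooKeeper quorum settings
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    try {
      val table = connection.getTable(TableName.valueOf("ops_metrics"))   // hypothetical table
      val put = new Put(Bytes.toBytes("host01#2016-01-01"))               // hypothetical row key
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("cpu_pct"), Bytes.toBytes("87.5"))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```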
Environment: Windows 2000/2003, UNIX, Linux, Java, Apache HDFS, MapReduce, Avro, Storm, Cloudera, Pig, Hive, HBase, Flume, Sqoop, Cassandra, NoSQL.
SQL Developer
Confidential
Responsibilities:
- Involved in creating tables, indexes, sequences and constraints, and created stored procedures and triggers to implement business rules.
- Installation of SQL Server on Development and Production Servers, setting up databases, users, roles and permissions.
- Extensively involved in SQL joins, subqueries, tracing and performance tuning for better query execution.
- Provided documentation about database/data warehouse structures and Updated functional specification and technical design documents.
- Designed and created ETL packages using SSIS to transfer data from heterogeneous sources in different file formats (Oracle, SQL Server and flat files) to a SQL Server destination.
- Worked on several Data Flow transformations including Derived Column, Slowly Changing Dimension, Lookup, Fuzzy Lookup, Data Conversion and Conditional Split using SSIS controls.
- Created various reports with drill-down, drill-through and calculated members using SQL Server Reporting Services.
- Used various report items such as tables, subreports and charts to develop reports in SSRS and uploaded them to Report Manager.
- Created complex stored procedures, triggers, functions, indexes, tables, views, SQL joins and other T-SQL code to implement business rules.
- Used Performance Monitor and SQL Profiler to optimize queries and enhance the performance of database servers.
Environment: MS SQL Server 2012/2008R2/2008, T-SQL, SQL Server Reporting Services (SSRS), SSIS, SSAS, Business Intelligence Development Studio (BIDS), MS Excel, Visual Source Team Foundation Server, VBScript.
Java developer
Confidential
Responsibilities:
- Developed the company's complete website from scratch and deployed it.
- Involved in requirements gathering.
- Designed and developed user interface using HTML, CSS and JavaScript.
- Designed HTML screens with JSP for the front-end.
- Involved in Database Design by creating Data Flow Diagram (Process Model) and ER Diagram (Data Model).
- Designed, created and maintained the database using MySQL.
- Made JDBC calls from the servlets to the database to store user details.
- JavaScript was used for client-side validation.
- Servlets were used as controllers and Entity/Session Beans for business logic.
- Used Eclipse for project building
- Participated in User review meetings and used Test Director to periodically log the development issues, production problems and bugs.
- Used WebLogic to deploy applications on local and development environments of the application.
- Debugged and fixed the errors
- Implemented and supported the project through development and unit testing phases into the production environment.
- Involved in documenting the application.
- Involved in designing stored procedures to extract and calculate billing information, connecting to Oracle.
- Formatted the results from the database as HTML reports for the client.
- Used PVCS Version Manager for source control and PVCS Tracker for change control management.
- Implemented test-first unit testing driven using JUnit.
Environment: Java, JSP, Servlets, JDBC, JavaScript, HTML, CSS, WebLogic, Eclipse and Test Director.