
Hadoop Developer Resume


Houston, TX

SUMMARY:

  • 6 years of overall IT experience across a variety of industries, including 2.5+ years of hands-on experience in Big Data technologies and in designing and implementing MapReduce jobs.
  • Very good experience with the complete project life cycle (design, development, testing, and implementation).
  • Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle (a sketch of this kind of comparison appears after this list).
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Experience in data analysis using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Good understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase, Cassandra and MongoDB.
  • Experience in migrating data with Sqoop between HDFS and relational database systems, in either direction, according to client requirements.
  • Expertise with the tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and ZooKeeper.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
  • Strong experience with data warehousing and ETL concepts using Informatica PowerCenter, OLAP, OLTP, and AutoSys.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
  • Excellent implementation knowledge of enterprise, web, and client-server applications using Java and J2EE.
  • Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
  • Worked in large and small teams on systems requirements, design, and development.
  • Experience using the build tools Ant and Maven.
  • Prepared standard coding guidelines as well as analysis and testing documentation.
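
As an illustration of the Spark-versus-Hive comparison mentioned above, the following is a minimal Scala sketch; the table name and query are hypothetical placeholders, and the same query would be timed separately from the Hive CLI for comparison.

    import org.apache.spark.sql.SparkSession

    object SparkVsHivePoc {
      def main(args: Array[String]): Unit = {
        // Spark session with Hive support, so the same Hive table can be
        // queried from Spark and from Hive for a timing comparison
        val spark = SparkSession.builder()
          .appName("spark-vs-hive-poc")
          .enableHiveSupport()
          .getOrCreate()

        // Time a representative aggregation ("sales" is a hypothetical table)
        val start = System.nanoTime()
        spark.sql("SELECT region, SUM(amount) FROM sales GROUP BY region").collect()
        println(f"Spark SQL elapsed: ${(System.nanoTime() - start) / 1e6}%.0f ms")

        spark.stop()
      }
    }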

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Storm, Kafka, Oozie, Impala, MongoDB, Cassandra

Languages: C, Core Java, Unix shell, SQL, Python, C#, Scala

J2EE Technologies: Servlets, JSP, JDBC, JavaBeans.

Methodologies: Agile, UML, Design Patterns (Core Java and J2EE).

NoSQL Technologies: Cassandra, MongoDB, HBase

Operating Systems: Windows XP/10, Linux, Sandbox.

Software Package: MS Office 2010.

Tools & Utilities: Eclipse, NetBeans, MyEclipse, SVN, Git, Maven, SOAP UI, JMX Explorer, XML Spy, QC, QTP, Jira

Web Servers: WebLogic, WebSphere, Apache Tomcat.

Web Technologies: HTML, XML, JavaScript, jQuery, AJAX, SOAP, and WSDL.

PROFESSIONAL EXPERIENCE:

Confidential, Richardson, TX

Hadoop Developer/Spark Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment, working with both traditional and non-traditional source systems, and with RDBMS and NoSQL data stores for data access and analysis.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Gained good experience working with Amazon AWS to set up Hadoop clusters.
  • Pulled data from Amazon S3 buckets into the data lake, built Hive tables on top of it, created Spark DataFrames over that data, and performed further analysis (the first sketch after this list illustrates the flow).
  • Involved in creating Hive tables and in loading and analyzing data using Hive queries.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (partitioning also appears in the first sketch after this list).
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts using Scala shell commands, as per requirements.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and appropriate memory settings (the third sketch after this list shows these knobs).
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs (see the second sketch after this list).
  • Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcasts, effective and efficient joins, and transformations during the ingestion process itself.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to adopting Impala in the project.
  • Used DbVisualizer to query Hive tables for better visualization.
  • Used PuTTY (an SSH client) to connect remotely to the servers.
  • Involved in daily Scrum meetings to discuss development progress, and was active in making those meetings more productive.
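
A minimal Scala sketch of the S3-to-data-lake flow and the dynamically partitioned Hive table described above; the bucket, paths, table, and column names are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("s3-ingest-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Pull raw data from an S3 bucket into a DataFrame
    val raw = spark.read
      .option("header", "true")
      .csv("s3a://example-bucket/landing/events/")

    // Persist it as a partitioned Hive table in the data lake
    raw.write
      .mode("overwrite")
      .partitionBy("event_date")      // one dynamic partition per date value
      .saveAsTable("datalake.events")

    // Further analysis on top of the same data via Spark SQL
    spark.sql("SELECT event_type, COUNT(*) FROM datalake.events GROUP BY event_type").show()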
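
A second sketch, reusing the SparkSession above, of the join and pair-RDD optimizations referred to in the list (table and column names are again hypothetical): broadcasting the small side of a join avoids a shuffle, and keying an RDD turns it into a pair RDD for aggregation.

    import org.apache.spark.sql.functions.broadcast

    // Large fact table and small lookup table from the data lake
    val facts = spark.table("datalake.events")
    val types = spark.table("datalake.event_types")

    // Broadcast the small side so the join is done map-side, with no shuffle
    val joined = facts.join(broadcast(types), Seq("event_type"))

    // The same data as a pair RDD, keyed for a reduceByKey aggregation
    val counts = facts.rdd
      .map(row => (row.getAs[String]("event_type"), 1L))
      .reduceByKey(_ + _)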
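
The third sketch shows where the tuning knobs mentioned in the list live: parallelism and executor memory are SparkConf settings, while the batch interval is fixed when the StreamingContext is created. The values are purely illustrative, not recommendations.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("tuning-sketch")
      .set("spark.default.parallelism", "48")   // roughly total executor cores
      .set("spark.executor.memory", "4g")

    // The batch interval should be long enough that each batch finishes
    // before the next one begins
    val ssc = new StreamingContext(conf, Seconds(5))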

Environment: Hadoop YARN, Spark-Core, Spark-SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Impala, Tableau, Talend, Cloudera, Oracle 10g, Linux

Confidential, Houston, TX

Hadoop Developer

Responsibilities:

  • Worked on the proof of concept for initiating the Apache Hadoop 1.20.2 framework.
  • Installed and configured Hadoop clusters and ecosystem components.
  • Developed automated scripts to install Hadoop clusters
  • Involved in all phases of the big data implementation, including requirement analysis, design, development, building, testing, and deployment of the Hadoop cluster in fully distributed mode.
  • Mapped DB2 V9.7 and V10.x data types to Hive data types and validated the mappings (the first sketch after this list shows the idea).
  • Loaded and retrieved unstructured data.
  • Developed Hive jobs to transfer 8 years of bulk data from DB2 to the HDFS layer.
  • Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts.
  • Built a job automation framework to support and operationalize data loads.
  • Automated the DDL creation process in Hive by mapping the DB2 data types (see the first sketch after this list).
  • Monitored Hadoop cluster job performance and capacity planning.
  • Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
  • Gained experience implementing processing with the Hadoop framework, HDFS, and MapReduce.
  • Tuned Hadoop performance for high availability and was involved in the recovery of Hadoop clusters.
  • Responsible for coding Java batch programs, RESTful services, MapReduce programs, and Hive queries, as well as testing, debugging, peer code review, troubleshooting, and maintaining status reports.
  • Used Avro and Parquet file formats for serialization of data.
  • Gained good experience with Informatica PowerCenter.
  • Developed several test cases using MRUnit for testing MapReduce applications.
  • Responsible for troubleshooting and resolving performance issues in the Hadoop cluster.
  • Used bzip2 compression to compress files before loading them into Hive (see the second sketch after this list).
  • Supported and troubleshot Hive programs running on the cluster, and was involved in fixing issues uncovered during duration testing.
  • Prepared daily and weekly project status reports and shared them with the client.
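
A hypothetical Scala sketch of the DB2-to-Hive DDL automation described above: a type-mapping table plus a small generator that emits a CREATE TABLE statement. The mapping is simplified; a real one would also carry precision and scale for DECIMAL, for example.

    // Simplified DB2-to-Hive type mapping
    val db2ToHive: Map[String, String] = Map(
      "VARCHAR"   -> "STRING",
      "CHAR"      -> "STRING",
      "INTEGER"   -> "INT",
      "BIGINT"    -> "BIGINT",
      "TIMESTAMP" -> "TIMESTAMP",
      "DATE"      -> "DATE"
    )

    def hiveDdl(table: String, columns: Seq[(String, String)]): String = {
      val cols = columns
        .map { case (name, db2Type) => s"  $name ${db2ToHive.getOrElse(db2Type, "STRING")}" }
        .mkString(",\n")
      s"CREATE TABLE IF NOT EXISTS $table (\n$cols\n) STORED AS PARQUET"
    }

    // Columns as they might come from DB2's SYSCAT.COLUMNS catalog view
    println(hiveDdl("claims", Seq("claim_id" -> "BIGINT", "status" -> "CHAR", "created" -> "TIMESTAMP")))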
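
A second sketch (with hypothetical paths) of the bzip2 step referenced above, using the Hadoop compression API; bzip2 is attractive here because it is splittable, so Hive can still parallelize work over the compressed file.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.io.IOUtils
    import org.apache.hadoop.io.compress.BZip2Codec

    val conf = new Configuration()
    val fs = FileSystem.get(conf)
    val codec = new BZip2Codec()
    codec.setConf(conf)

    // Compress a staged file; a Hive LOAD DATA can then point at the .bz2
    val in  = fs.open(new Path("/staging/claims.txt"))
    val out = codec.createOutputStream(fs.create(new Path("/staging/claims.txt.bz2")))
    IOUtils.copyBytes(in, out, conf, true)   // final 'true' closes both streams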

Environment: Hadoop, MapReduce, Flume, Sqoop, Hive, Pig, RESTful services, Linux, Core Java, HBase, Informatica, Avro, Cloudera, MRUnit, MS SQL Server, DB2

Confidential

Java Developer

Responsibilities:

  • Key responsibilities included requirements gathering, designing and developing the Java application.
  • Identified and fixed transactional issues due to incorrect exception handling, and concurrency issues due to unsynchronized blocks of code.
  • Created Java application module for providing authentication to the users for using this application and to synchronize handset with the Exchange server.
  • Performed unit testing, system testing and user acceptance test.
  • Built web applications using the Struts MVC framework.
  • Gathered specifications for the Library site from different departments and users of the services.
  • Developed stored procedures and triggers in PL/SQL, and wrote SQL scripts to create and maintain the database, roles, users, tables, views, procedures, and triggers.
  • Designed and implemented the UI using HTML and Java.
  • Worked on the database interaction layer for insert, update, and retrieval operations on data (a sketch of the pattern follows this list).
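
A minimal sketch of the kind of JDBC interaction layer described above, written in Scala for consistency with the other sketches; the connection details, table, and column names are hypothetical.

    import java.sql.DriverManager

    // Connection details are placeholders
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret")
    try {
      // Retrieval with a parameterized query
      val select = conn.prepareStatement("SELECT title FROM books WHERE author = ?")
      select.setString(1, "Knuth")
      val rs = select.executeQuery()
      while (rs.next()) println(rs.getString("title"))

      // Update through the same layer
      val update = conn.prepareStatement("UPDATE books SET checked_out = 1 WHERE id = ?")
      update.setInt(1, 42)
      update.executeUpdate()
    } finally {
      conn.close()
    }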

Environment: Core Java, JDBC, Struts, HTML, SQL, Oracle 10g, PL/SQL, IBM Rational, Eclipse IDE

Confidential

Programmer Analyst/ SQL Developer

Responsibilities:

  • Developed SQL scripts to perform different joins, subqueries, nested queries, and insert/update/delete operations on data in MS SQL database tables.
  • Experience in writing PL/SQL and in developing and implementing stored procedures, packages, and triggers.
  • Experience with modeling principles, database design and programming, and creating E-R diagrams and data relationships to design a database.
  • Responsible for designing advanced SQL queries, procedures, cursors, and triggers.
  • Built data connections to the database using MS SQL Server.
  • Worked on a project to extract data from XML files into SQL tables and generate data file reports using SQL Server 2008.

Environment: PL/SQL, MySQL, SQL Server 2008 (SSRS & SSIS), Visual Studio 2000/2005, MS Excel.
