Sr. Hadoop Developer Resume
Madison, WA
SUMMARY:
- Hadoop Developer with over 6 years of professional IT experience, including implementing, developing, and maintaining various web-based applications using Java, Python, J2EE technologies, and the Big Data ecosystem.
- Excellent understanding of Hadoop architecture, HDFS, YARN, and MapReduce.
- Hands-on experience writing MapReduce jobs and working with Hadoop ecosystem tools, including Hive and Pig.
- Experience installing, configuring, supporting, and managing Cloudera Hadoop platforms, including CDH3 and CDH4.
- Hands-on experience with SequenceFiles, RCFiles, combiners, counters, dynamic partitions, and bucketing for best practices and performance improvement.
- Knowledge of job/workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Experience working with the open-source Apache Hadoop distribution.
- Designed Hive queries and Pig scripts to perform data analysis, data transfer, and table design.
- Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list).
- Proficient in using Apache Sqoop to import and export data between HDFS/Hive and relational databases.
- Highly proficient in classic MapReduce and YARN architectures, along with SQL, ETL, orchestration, and distributed processing.
- Developed MapReduce jobs and applied various optimization techniques to improve their performance.
- Experience with configuration management tools such as Chef and Puppet, including deploying Hadoop clusters with Puppet.
- Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing; experienced in optimizing ETL workflows.
- Performed ETL operations using Pig to join, clean, aggregate, and analyze data.
- Experience implementing Spark applications using Scala and Spark SQL for faster data analysis and processing.
- Experience using Flume and Kafka to load log data from multiple sources into HDFS.
- Experience in NoSQL databases such as HBase and Cassandra.
- Experienced with workflow scheduling tools such as Oozie.
- Experienced in managing Hadoop clusters using Cloudera Manager.
- Experienced in backend database programming using SQL, PL/SQL, stored procedures, functions, macros, indexes, joins, views, packages, and database triggers.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
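The Hive partitioning and bucketing pattern mentioned above can be illustrated with a minimal PySpark sketch. The table names (sales_raw, sales_part) and columns (region, sale_date) are hypothetical, chosen only to show the managed/external split and a dynamic-partition insert; this is not the actual production schema.

```python
# Minimal sketch (PySpark): external vs. managed Hive tables, with
# partitioning and bucketing. All names here are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table over raw data already sitting in HDFS.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
        id BIGINT, region STRING, sale_date STRING, amount DOUBLE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/sales_raw'
""")

# Managed table, partitioned by region and bucketed by id for faster scans/joins.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_part (
        id BIGINT, sale_date STRING, amount DOUBLE)
    PARTITIONED BY (region STRING)
    CLUSTERED BY (id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Dynamic partition insert: Hive routes each row to its region partition.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE sales_part PARTITION (region)
    SELECT id, sale_date, amount, region FROM sales_raw
""")
```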
TECHNICAL SKILLS:
IDE Tools: Eclipse, NetBeans
Programming languages: Java/J2EE, Python, Linux shell scripts, C++
Databases: Oracle, MySQL, DB2, MS SQL Server, Teradata
Web Technologies: HTML, JavaScript, XML, ODBC, JDBC, JSP, Servlets, Struts, JUnit, REST API, Spring, Hibernate
Visualization: MS Excel, RAW, Tableau
PROFESSIONAL EXPERIENCE:
Confidential, Madison, WA
Sr. Hadoop Developer
Responsibilities:
- Developed MapReduce, Pig and Hive scripts to cleanse, validate and transform data.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, Avro, and SequenceFile-formatted log files.
- Developed Sqoop jobs to import and store massive volumes of data in HDFS and Hive.
- Designed and developed Pig data transformation scripts to work against unstructured data from various data points and created a baseline.
- Developed Python mapper and reducer scripts and ran them with Hadoop Streaming (see the sketch after this list).
- Implemented Spark RDD transformations and actions to support business analysis.
- Designed a data quality framework to perform schema validation and data profiling on Spark (PySpark).
- Leveraged Spark (PySpark) to manipulate unstructured data and apply text mining to users' table-utilization data.
- Installed and configured the cluster, and set up Puppet for centralized configuration management.
- Created Hive UDFs to encapsulate complex and reusable logic for the end users.
- Developed predictive analytics using the Apache Spark Scala APIs.
- Migrated HiveQL queries to Impala to minimize query response time.
- Designed an agent-based computational framework using Scala and Breeze to scale computations for many simultaneous users in real time.
- Successfully migrated a legacy application to a Big Data application using Hive, Pig, and HBase at the production level.
- Orchestrated Oozie workflow engine to run multiple Hive and Pig jobs.
- Experienced with compression codecs such as LZO, Snappy, Bzip2, and Gzip, combined with Avro, Parquet, and ORC file formats, to save storage and optimize data transfer over the network.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Implemented data ingestion systems by creating Kafka brokers, Java producers, consumers, and custom encoders (see the Kafka sketch after this list).
- Used Bash shell scripting, Sqoop, Avro, Hive, HDP, Redshift, Pig, Java, and MapReduce daily to develop ETL, batch processing, and data storage functionality.
- Configured Kafka to write data into Elasticsearch via a dedicated consumer.
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
- Developed Spark code using Scala and Spark SQL/Spark Streaming for faster testing and processing of data.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Developed some utility helper classes to get data from HBase tables.
- Good experience in troubleshooting performance issues and tuning Hadoop cluster.
- Knowledge of Spark Core, Spark Streaming, DataFrames and Spark SQL, MLlib, and GraphX.
- Implemented caching of Spark transformations and actions for reuse as components.
- Extracted data from Cassandra through Sqoop, placed it in HDFS, and processed it.
- Used Maven to build and deploy the JARs for MapReduce, Pig, and Hive UDFs.
- Developed workflows in Oozie.
- Extensively used the Hue browser for interacting with Hadoop components.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Worked on Amazon Web Services.
- Developed Hive and Impala queries using partitioning, bucketing, and windowing functions.
- Proficient in using version control tools such as Git, VSS, SVN, and PVCS.
- Provided cluster coordination services through ZooKeeper.
- Involved in Agile methodologies, including daily scrum meetings and sprint planning.
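The Python mapper/reducer pattern used with Hadoop Streaming (referenced above) follows the shape sketched below. The column layout and the "count by status code" logic are illustrative assumptions, not the actual job.

```python
#!/usr/bin/env python
# mapper.py -- Hadoop Streaming mapper sketch: emit one (key, 1) pair per
# input record. Treating column 2 as an HTTP status code is an assumption.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 2:
        print("%s\t1" % fields[2])  # emit key and a count of 1
```

```python
#!/usr/bin/env python
# reducer.py -- Hadoop Streaming reducer sketch: input arrives sorted by key,
# so counts can be accumulated over each contiguous run of the same key.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key == current_key:
        count += int(value)
    else:
        if current_key is not None:
            print("%s\t%d" % (current_key, count))
        current_key, count = key, int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))
```

A job of this shape is launched with the distribution's streaming jar, e.g. `hadoop jar hadoop-streaming.jar -input /logs -output /out -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py` (the jar path varies by distribution).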
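The Kafka ingestion path described above can be sketched with the kafka-python client standing in for the Java producer actually used; the broker address, topic name, and event fields are assumptions for illustration.

```python
# Sketch of a Kafka ingestion producer, using kafka-python in place of the
# Java producer described above. Broker, topic, and payload are assumptions.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # custom encoder
)

# Each log event is published to a topic; a downstream consumer (or a
# Kafka-to-HDFS/Elasticsearch sink) picks it up from there.
event = {"host": "web01", "status": 200, "path": "/index.html"}
producer.send("weblogs", value=event)
producer.flush()
producer.close()
```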
Confidential - Richmond, VA
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Good experience with NoSQL databases.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Created utility scripts using bash to standardize and automate the whole process.
- Designed and developed POCs using Scala, Spark SQL, and the MLlib libraries, then deployed them on the YARN cluster (a sketch follows this list).
- Worked with Git repositories, version tagging, and pull requests.
- Installed and configured Hive, and wrote Hive UDFs.
- Used Spring Framework for Dependency injection and integrated with Hibernate Framework.
- Prepared an ETL framework using Sqoop, Pig, and Hive to frequently bring in data from the source and make it available for consumption.
- Implemented CDH3 Hadoop cluster on CentOS.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented best income logic using Pig scripts.
- Created a SOLR schema from the indexer settings.
- Implemented SOLR index cron jobs.
- Experience in writing SOLR queries for various search documents.
- Provided cluster coordination services through ZooKeeper.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Helped set up the QA environment and update configurations for implementing Pig and Sqoop scripts.
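The Spark SQL and MLlib POC work mentioned above followed the general shape below, shown here in PySpark for brevity (the resume's POCs used Scala). The input path, schema, feature columns, and label column are all hypothetical.

```python
# Minimal PySpark sketch of a Spark SQL + MLlib proof of concept like the one
# described above (the original used Scala). Paths and columns are assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-poc-sketch").getOrCreate()

# Load a structured dataset and explore it with Spark SQL.
df = spark.read.csv("hdfs:///data/events.csv", header=True, inferSchema=True)
df.createOrReplaceTempView("events")
spark.sql("SELECT label, COUNT(*) FROM events GROUP BY label").show()

# Assemble feature columns into a vector and fit a simple classifier.
features = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train = features.transform(df).select("features", "label")
model = LogisticRegression(maxIter=10).fit(train)
model.transform(train).select("label", "prediction").show(5)

spark.stop()
```

A job of this shape would be deployed to the cluster with `spark-submit --master yarn`.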
Confidential, Irvine, CA
Hadoop Developer
Responsibilities:
- Collected log data and staging data using Apache Flume and stored in HDFS for analysis.
- Implemented helper classes that access HBase directly from Java, using the Java API to perform CRUD operations.
- Handled time-series data in HBase, storing data and performing time-based analytics to improve query retrieval time.
- Involved in loading data from external sources into target tables with Impala queries.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Performed debugging and fine-tuning in Hive & Pig for improving performance.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day (see the sketch after this list).
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Involved in migrating HiveQL into Impala to minimize query response time.
- Performed map-side joins on data in Hive to explore business insights.
- Involved in forecasting based on present results and insights derived from data analysis.
- Integrated MapReduce with HBase, importing bulk data into HBase using MapReduce programs.
- Implemented Spring MVC for designing and implementing the UI Layer for the application.
- Worked on Spring Batch for asynchronous transaction processing; established efficient exception handling and logging using Spring AOP.
- Developed several REST web services supporting both XML and JSON, leveraged by both web and mobile applications.
- Participated in team discussions to develop useful insights from big data processing results.
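The daily-unique-visitors analysis mentioned above reduces to a short HiveQL aggregation, sketched here through PySpark's Hive interface. The table name (weblogs) and columns (visitor_ip, log_date) are hypothetical.

```python
# Sketch of the HiveQL unique-visitors-per-day analysis mentioned above.
# Table and column names (weblogs, visitor_ip, log_date) are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("unique-visitors-sketch")
         .enableHiveSupport()
         .getOrCreate())

daily = spark.sql("""
    SELECT to_date(log_date) AS day,
           COUNT(DISTINCT visitor_ip) AS unique_visitors
    FROM weblogs
    GROUP BY to_date(log_date)
    ORDER BY day
""")
daily.show()
```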
Confidential, Somerset, NJ
Jr. Java/ Hadoop Developer
Responsibilities:
- Involved in Analysis, Design, Development and Testing of application modules.
- Analyzed complex system relationships and improved the performance of various screens.
- Developed various user interface screens using the Struts framework.
- Worked with the Spring framework for dependency injection.
- Developed JSP pages using JavaScript, jQuery, and AJAX for client-side validation, and CSS for data formatting.
- Created Spring-based Camel routes to create CamelContext objects.
- Accessed and manipulated the Oracle 7.0 database environment by writing SQL queries and PL/SQL Stored procedures, functions and triggers.
- Worked with SQL queries to store and retrieve the data in MS SQL server.
- Wrote domain, mapper, and DTO classes and hbm.xml files to access data from DB2 tables.
- Developed various reports using Adobe APIs and Web services.
- Wrote test cases using JUnit and coordinated with the testing team on integration tests.
- Fixed bugs and improved performance using root cause analysis in production support.