Sr. Hadoop Developer Resume

Madison, WA

SUMMARY:

  • Hadoop Developer with over 6 years of professional IT experience, including implementing, developing, and maintaining various web-based applications using Java, Python, J2EE technologies, and the Big Data ecosystem.
  • Excellent knowledge of Hadoop architecture, including HDFS, YARN, and MapReduce.
  • Hands-on experience writing MapReduce jobs and working with Hadoop ecosystem tools, including Hive and Pig.
  • Experience installing, configuring, supporting, and managing Cloudera's Hadoop platforms, including CDH3 and CDH4.
  • Hands-on experience with sequence files, RC files, combiners, counters, dynamic partitions, and bucketing for best practices and performance improvement.
  • Knowledge of job/workflow scheduling and coordination tools such as Oozie and ZooKeeper.
  • Experience working with the open-source Apache Hadoop distribution.
  • Designed Hive queries and Pig scripts to perform data analysis, data transfer, and table design.
  • Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the table-design sketch after this list).
  • Proficient in using Apache Sqoop to import and export data between HDFS/Hive and relational databases.
  • Highly proficient in Classic MapReduce and YARN architectures along with SQL, ETL, orchestration, and distributed processing.
  • Developed MapReduce jobs and applied various optimization techniques to improve their performance.
  • Experience with tools such as Chef and Puppet, including deployment of Hadoop clusters using Puppet.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing; experienced in optimizing ETL workflows.
  • Performed ETL operations using Pig to join, clean, aggregate, and analyze data.
  • Experience implementing Spark using Scala and Spark SQL for faster analysis and processing of data.
  • Experience using Flume and Kafka to load log data from multiple sources into HDFS.
  • Experience in NoSQL databases such as HBase and Cassandra.
  • Experienced in managing Hadoop clusters using Cloudera Manager.
  • Experienced in backend database programming using SQL, PL/SQL, stored procedures, functions, macros, indexes, joins, views, packages, and database triggers.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
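
As an illustration of the Hive table-design pattern summarized above, here is a minimal PySpark sketch; it assumes a Hive-enabled SparkSession, and the table and column names (web_logs, raw_logs, event_date, user_id) are hypothetical:

    # Minimal sketch: a managed Hive table with partitioning and bucketing,
    # plus an external table; all names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-table-design")
             .enableHiveSupport()
             .getOrCreate())

    # Partitioning by event_date lets queries prune whole directories;
    # bucketing by user_id enables bucketed map-side joins on that key.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS web_logs (
            user_id STRING,
            url     STRING,
            status  INT
        )
        PARTITIONED BY (event_date STRING)
        CLUSTERED BY (user_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # An external table leaves the data in place and survives DROP TABLE.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (line STRING)
        LOCATION '/data/raw/logs'
    """)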

TECHNICAL SKILLS:

IDE Tools: Eclipse, NetBeans

Programming languages: Java/J2EE, Python, Linux shell scripts, C++

Databases: Oracle, MySQL, DB2, MS SQL Server, Teradata

Web Technologies: HTML, JavaScript, XML, ODBC, JDBC, JSP, Servlets, Struts, JUnit, REST API, Spring, Hibernate

Visualization: MS Excel, RAW, Tableau

PROFESSIONAL EXPERIENCE:

Confidential, Madison, WA

Sr. Hadoop Developer

Responsibilities:

  • Developed MapReduce, Pig and Hive scripts to cleanse, validate and transform data.
  • Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, Avro, and sequence files from logs.
  • Developed Sqoop jobs to import and store massive volumes of data in HDFS and Hive.
  • Designed and developed Pig data transformation scripts to work against unstructured data from various data points and established a baseline.
  • Developed Python mapper and reducer scripts and implemented them using Hadoop Streaming (a streaming sketch follows this list).
  • Implemented Spark RDD transformations and actions to support business analysis.
  • Designed a data quality framework to perform schema validation and data profiling on Spark (PySpark); see the PySpark sketch after this list.
  • Leveraged Spark (PySpark) to manipulate unstructured data and apply text mining to users' table-utilization data.
  • Installed and configured the cluster and set up Puppet for centralized configuration management.
  • Created Hive UDFs to encapsulate complex and reusable logic for the end users.
  • Developed predictive analytics using Apache Spark's Scala APIs.
  • Migrated HiveQL queries to Impala to minimize query response time.
  • Designed an agent-based computational framework on Scala and Breeze to scale computations for many simultaneous users in real time.
  • Successfully migrated a legacy application to a Big Data application using Hive, Pig, and HBase in production.
  • Orchestrated multiple Hive and Pig jobs using the Oozie workflow engine.
  • Used compression techniques such as LZO, Snappy, Bzip2, and Gzip, together with Avro, Parquet, and ORC file formats, to save storage and optimize data transfer over the network.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Implemented data ingestion systems by creating Kafka brokers, Java producers, consumers, and custom encoders.
  • Used Bash shell scripting, Sqoop, Avro, Hive, HDP, Redshift, Pig, Java, and MapReduce daily to develop ETL, batch processing, and data storage functionality.
  • Configured Kafka to write data into Elasticsearch via a dedicated consumer (see the Kafka sketch after this list).
  • Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
  • Developed Spark code using Scala, Spark SQL, and Spark Streaming for faster testing and processing of data.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Developed utility helper classes to retrieve data from HBase tables.
  • Good experience in troubleshooting performance issues and tuning Hadoop cluster.
  • Knowledge of Spark Core, Spark Streaming, DataFrames and SQL, MLlib, and GraphX.
  • Implemented caching of Spark transformations and actions for reuse across jobs.
  • Extracted data from Cassandra through Sqoop, placed it in HDFS, and processed it.
  • Used Maven to build and deploy the JARs for MapReduce, Pig, and Hive UDFs.
  • Developed workflows in Oozie.
  • Extensively used the Hue browser for interacting with Hadoop components.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Worked on Amazon Web Services.
  • Developed Hive and Impala queries using partitioning, bucketing, and windowing functions.
  • Proficient with version control tools such as Git, VSS, SVN, and PVCS.
  • Provided cluster coordination services through ZooKeeper.
  • Followed agile methodologies, including daily scrum meetings and sprint planning.
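
A minimal sketch of the Hadoop Streaming pattern referenced in the Python mapper/reducer bullet above; the tab-separated field layout and HDFS paths are hypothetical:

    # mapper.py -- cleanse log lines and emit one (ip, 1) pair per valid record.
    import sys

    for line in sys.stdin:
        fields = line.strip().split("\t")
        if len(fields) >= 3:                # drop malformed records (cleansing)
            print("%s\t1" % fields[0])      # fields[0] assumed to hold the IP

    # reducer.py -- Streaming sorts mapper output by key, so equal keys arrive
    # contiguously and can be summed with a running counter.
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.strip().split("\t")
        if key != current_key and current_key is not None:
            print("%s\t%d" % (current_key, count))
            count = 0
        current_key = key
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

    # Run with the streaming jar, e.g.:
    # hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py \
    #     -mapper mapper.py -reducer reducer.py -input /logs/raw -output /logs/clean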
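
The data quality framework bullet above, sketched in PySpark; the expected schema, column names, and input path are assumptions for illustration:

    # Minimal PySpark sketch: schema validation plus basic data profiling.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    spark = SparkSession.builder.appName("dq-checks").getOrCreate()

    # Hypothetical contract for the incoming usage feed.
    expected = StructType([
        StructField("user_id", StringType()),
        StructField("table_name", StringType()),
        StructField("query_count", LongType()),
    ])
    names = [f.name for f in expected]

    df = spark.read.json("/data/usage/")    # hypothetical input path

    # Schema validation: fail fast when incoming fields drift from the contract.
    missing = set(names) - set(df.columns)
    if missing:
        raise ValueError("missing columns: %s" % sorted(missing))

    # Profiling: null counts and distinct cardinality per contracted column.
    profile = df.agg(*(
        [F.count(F.when(F.col(c).isNull(), 1)).alias(c + "_nulls") for c in names] +
        [F.countDistinct(c).alias(c + "_distinct") for c in names]))
    profile.show()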
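
A hedged sketch of the Kafka-to-Elasticsearch hop described above, using the kafka-python and elasticsearch-py client libraries; the broker address, topic, and index names are hypothetical:

    import json
    from kafka import KafkaProducer, KafkaConsumer
    from elasticsearch import Elasticsearch

    # Producer side: publish JSON-encoded log events to a topic.
    producer = KafkaProducer(
        bootstrap_servers="broker1:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"))
    producer.send("app-logs", {"level": "INFO", "msg": "user login"})
    producer.flush()

    # Dedicated consumer: drain the topic and index each event (elasticsearch-py
    # 7.x style call).
    es = Elasticsearch(["http://es-node:9200"])
    consumer = KafkaConsumer(
        "app-logs",
        bootstrap_servers="broker1:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")))
    for message in consumer:
        es.index(index="app-logs", body=message.value)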

Confidential - Richmond, VA 

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experienced in defining job flows.
  • Experienced in managing and reviewing Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources.
  • Good experience with NoSQL databases.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Created utility scripts using bash to standardize and automate the whole process.
  • Experience developing and designing POCs using Scala, Spark SQL, and MLlib, then deploying them on the YARN cluster.
  • Worked on Git repositories, version tagging, and pull requests.
  • Installed and configured Hive and wrote Hive UDFs.
  • Used the Spring framework for dependency injection and integrated it with the Hibernate framework.
  • Prepared an ETL framework with Sqoop, Pig, and Hive to frequently bring in data from the source and make it available for consumption.
  • Implemented CDH3 Hadoop cluster on CentOS.
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Created HBase tables to store variable formats of PII data coming from different portfolios (an HBase sketch follows this list).
  • Implemented best income logic using Pig scripts.
  • Created a Solr schema from the indexer settings.
  • Implemented Solr index cron jobs.
  • Wrote Solr queries for various search documents (a query sketch follows this list).
  • Provided cluster coordination services through ZooKeeper.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Helped set up the QA environment and updated configurations for implementing scripts with Pig and Sqoop.
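
An illustrative sketch of the HBase table work above, using the happybase Python client via the HBase Thrift gateway; the host, table, and column-family names are hypothetical:

    import happybase

    connection = happybase.Connection("hbase-thrift-host")  # Thrift gateway

    # One column family per portfolio feed keeps variable PII formats separate.
    connection.create_table("pii_records",
                            {"portfolio_a": {}, "portfolio_b": {}})

    table = connection.table("pii_records")
    table.put(b"cust#0001", {b"portfolio_a:ssn_hash": b"ab12...",
                             b"portfolio_a:format": b"json"})
    row = table.row(b"cust#0001")           # point read by row key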
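
The Solr query bullet above, sketched with the pysolr client; the core URL, fields, and query are assumptions for illustration:

    import pysolr

    solr = pysolr.Solr("http://solr-host:8983/solr/documents", timeout=10)

    # Lucene-syntax query with a filter query, field list, and row limit.
    results = solr.search("title:hadoop", **{
        "fq": "doc_type:report",
        "fl": "id,title,score",
        "rows": 10,
    })
    for doc in results:
        print(doc["id"], doc["title"])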

Confidential, Irvine, CA 

Hadoop Developer

Responsibilities:

  • Collected log data and staging data using Apache Flume and stored in HDFS for analysis.
  • Implemented helper classes that access HBase directly from Java using the HBase Java API to perform CRUD operations.
  • Stored different kinds of time-series data in HBase and performed time-based analytics to improve query retrieval times.
  • Involved in data loading from external sources with Impala queries to target tables.
  • Developed MapReduce programs to parse the raw data and store the refined data in tables.
  • Performed debugging and fine-tuning in Hive & Pig for improving performance.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Analyzed web log data using HiveQL to extract the number of unique visitors per day (see the sketch after this list).
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Involved in migrating HiveQL into Impala to minimize query response time.
  • Performed map-side joins on data in Hive to explore business insights.
  • Involved in forecasting based on current results and insights derived from data analysis.
  • Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
  • Implemented Spring MVC for designing and implementing the UI Layer for the application.
  • Worked on Spring Batch for asynchronous transaction processing; established efficient exception handling and logging using Spring AOP.
  • Developed several REST web services supporting both XML and JSON, leveraged by both web and mobile applications.
  • Participated in team discussions to develop useful insights from big data processing results.
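
A minimal sketch of the unique-visitors-per-day analysis above, expressed as HiveQL run through a Hive-enabled SparkSession; the web_logs table and its columns are hypothetical:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("daily-uniques")
             .enableHiveSupport()
             .getOrCreate())

    # Count distinct visitor IPs per calendar day from the web log table.
    daily_uniques = spark.sql("""
        SELECT to_date(event_time) AS day,
               COUNT(DISTINCT visitor_ip) AS unique_visitors
        FROM web_logs
        GROUP BY to_date(event_time)
        ORDER BY day
    """)
    daily_uniques.show()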

Confidential, Somerset, NJ 

Jr. Java/ Hadoop Developer

Responsibilities:

  • Involved in Analysis, Design, Development and Testing of application modules.
  • Analyzed complex system relationships and improved the performance of various screens.
  • Developed various user interface screens using the Struts framework.
  • Worked with the Spring framework for dependency injection.
  • Developed JSP pages using JavaScript, jQuery, and AJAX for client-side validation and CSS for data formatting.
  • Created Spring-based Camel routes to create CamelContext objects.
  • Accessed and manipulated the Oracle 7.0 database environment by writing SQL queries and PL/SQL Stored procedures, functions and triggers.
  • Worked with SQL queries to store and retrieve the data in MS SQL server.
  • Wrote domain, mapper, and DTO classes and hbm.xml files to access data from DB2 tables.
  • Developed various reports using Adobe APIs and Web services.
  • Wrote test cases using JUnit and coordinated with the testing team on integration tests.
  • Fixed bugs and improved performance through root cause analysis in production support.
