Sr. Hadoop Developer Resume
Madison, WA
SUMMARY:
- Hadoop Developer with over 6 years of professional IT experience, including implementing, developing, and maintaining various web-based applications using Java, Python, J2EE technologies, and the Big Data ecosystem.
- Excellent understanding of Hadoop architecture, HDFS, YARN, and MapReduce.
- Hands-on experience writing MapReduce jobs and working with Hadoop ecosystem tools, including Hive and Pig.
- Experience installing, configuring, supporting, and managing Cloudera Hadoop platforms, including CDH3 and CDH4.
- Hands-on experience with SequenceFiles, RCFiles, combiners, counters, dynamic partitions, and bucketing for best practices and performance improvement.
- Knowledge of job/workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Experience working with the open-source Apache Hadoop distribution.
- Designed Hive queries and Pig scripts to perform data analysis, data transfer, and table design.
- Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list).
- Proficient in using Apache Sqoop to import and export data between HDFS/Hive and relational databases.
- Highly proficient in classic MapReduce and YARN architectures, along with SQL, ETL, orchestration, and distributed processing.
- Developed MapReduce jobs and applied various optimization techniques to improve their performance.
- Experience with configuration management tools such as Chef and Puppet, including deploying Hadoop clusters with Puppet.
- Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing; experienced in optimizing ETL workflows.
- Performed ETL operations using Pig to join, clean, aggregate, and analyze data.
- Experience implementing Spark applications using Scala and Spark SQL for faster data analysis and processing.
- Experience using Flume and Kafka to load log data from multiple sources into HDFS.
- Experience in NoSQL databases such as HBase and Cassandra.
- Experienced with workflow scheduling tools such as Oozie.
- Experienced in managing Hadoop clusters using Cloudera Manager.
- Experienced in backend database programming using SQL, PL/SQL, stored procedures, functions, macros, indexes, joins, views, packages, and database triggers.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
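The Hive partitioning and bucketing pattern mentioned above can be illustrated with a minimal PySpark sketch. The table names (sales_raw, sales_part) and columns (region, sale_date) are hypothetical, chosen only to show the managed/external split and a dynamic-partition insert; this is not the actual production schema.

```python
# Minimal sketch (PySpark): external vs. managed Hive tables, with
# partitioning and bucketing. All names here are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table over raw data already sitting in HDFS.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
        id BIGINT, region STRING, sale_date STRING, amount DOUBLE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/sales_raw'
""")

# Managed table, partitioned by region and bucketed by id for faster scans/joins.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_part (
        id BIGINT, sale_date STRING, amount DOUBLE)
    PARTITIONED BY (region STRING)
    CLUSTERED BY (id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Dynamic partition insert: Hive routes each row to its region partition.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE sales_part PARTITION (region)
    SELECT id, sale_date, amount, region FROM sales_raw
""")
```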
TECHNICAL SKILLS:
IDE Tools: Eclipse, NetBeans
Programming languages: Java/J2EE, Python, Linux shell scripts, C++
Databases: Oracle, MySQL, DB2, MS SQL Server, Teradata
Web Technologies: HTML, JavaScript, XML, ODBC, JDBC, JSP, Servlets, Struts, JUnit, REST API, Spring, Hibernate
Visualization: MS Excel, RAW, Tableau
PROFESSIONAL EXPERIENCE:
Confidential, Madison, WA
Sr. Hadoop Developer
Responsibilities:
- Developed MapReduce, Pig and Hive scripts to cleanse, validate and transform data.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, Avro, and SequenceFile-formatted log files.
- Developed Sqoop jobs to import and store massive volumes of data in HDFS and Hive.
- Designed and developed Pig data transformation scripts to work against unstructured data from various data points and created a baseline.
- Developed Python mapper and reducer scripts and ran them with Hadoop Streaming (see the sketch after this list).
- Implemented Spark RDD transformations and actions to support business analysis.
- Designed a data quality framework to perform schema validation and data profiling on Spark (PySpark).
- Leveraged Spark (PySpark) to manipulate unstructured data and apply text mining to users' table-utilization data.
- Installed and configured the cluster, and set up Puppet for centralized configuration management.
- Created Hive UDFs to encapsulate complex and reusable logic for the end users.
- Developed predictive analytics using the Apache Spark Scala APIs.
- Migrated HiveQL queries to Impala to minimize query response time.
- Designed an agent-based computational framework using Scala and Breeze to scale computations for many simultaneous users in real time.
- Successfully migrated a legacy application to a Big Data application using Hive, Pig, and HBase at the production level.
- Orchestrated Oozie workflow engine to run multiple Hive and Pig jobs.
- Experienced with compression codecs such as LZO, Snappy, Bzip2, and Gzip, combined with Avro, Parquet, and ORC file formats, to save storage and optimize data transfer over the network.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Implemented data ingestion systems by creating Kafka brokers, Java producers, consumers, and custom encoders (see the Kafka sketch after this list).
- Used Bash shell scripting, Sqoop, Avro, Hive, HDP, Redshift, Pig, Java, and MapReduce daily to develop ETL, batch processing, and data storage functionality.
- Configured Kafka to write data into Elasticsearch via a dedicated consumer.
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
- Developed Spark code using Scala and Spark SQL/Spark Streaming for faster testing and processing of data.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Developed some utility helper classes to get data from HBase tables.
- Good experience in troubleshooting performance issues and tuning Hadoop cluster.
- Knowledge of Spark Core, Spark Streaming, DataFrames and Spark SQL, MLlib, and GraphX.
- Implemented caching of Spark transformations and actions for reuse as components.
- Extracted data from Cassandra through Sqoop, placed it in HDFS, and processed it.
- Used Maven to build and deploy the JARs for MapReduce, Pig, and Hive UDFs.
- Developed workflows in Oozie.
- Extensively used the Hue browser for interacting with Hadoop components.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Worked on Amazon Web Services.
- Developed Hive and Impala queries using partitioning, bucketing, and windowing functions.
- Proficient in using version control tools such as Git, VSS, SVN, and PVCS.
- Provided cluster coordination services through ZooKeeper.
- Involved in Agile methodologies, including daily scrum meetings and sprint planning.
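The Python mapper/reducer pattern used with Hadoop Streaming (referenced above) follows the shape sketched below. The column layout and the "count by status code" logic are illustrative assumptions, not the actual job.

```python
#!/usr/bin/env python
# mapper.py -- Hadoop Streaming mapper sketch: emit one (key, 1) pair per
# input record. Treating column 2 as an HTTP status code is an assumption.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 2:
        print("%s\t1" % fields[2])  # emit key and a count of 1
```

```python
#!/usr/bin/env python
# reducer.py -- Hadoop Streaming reducer sketch: input arrives sorted by key,
# so counts can be accumulated over each contiguous run of the same key.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key == current_key:
        count += int(value)
    else:
        if current_key is not None:
            print("%s\t%d" % (current_key, count))
        current_key, count = key, int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))
```

A job of this shape is launched with the distribution's streaming jar, e.g. `hadoop jar hadoop-streaming.jar -input /logs -output /out -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py` (the jar path varies by distribution).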
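The Kafka ingestion path described above can be sketched with the kafka-python client standing in for the Java producer actually used; the broker address, topic name, and event fields are assumptions for illustration.

```python
# Sketch of a Kafka ingestion producer, using kafka-python in place of the
# Java producer described above. Broker, topic, and payload are assumptions.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # custom encoder
)

# Each log event is published to a topic; a downstream consumer (or a
# Kafka-to-HDFS/Elasticsearch sink) picks it up from there.
event = {"host": "web01", "status": 200, "path": "/index.html"}
producer.send("weblogs", value=event)
producer.flush()
producer.close()
```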
Confidential - Richmond, VA
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Good experience with NoSQL databases.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Created utility scripts using bash to standardize and automate the whole process.
- Designed and developed POCs using Scala, Spark SQL, and the MLlib libraries, then deployed them on the YARN cluster (a sketch follows this list).
- Worked with Git repositories, version tagging, and pull requests.
- Installed and configured Hive, and wrote Hive UDFs.
- Used Spring Framework for Dependency injection and integrated with Hibernate Framework.
- Prepared an ETL framework using Sqoop, Pig, and Hive to frequently bring in data from the source and make it available for consumption.
- Implemented CDH3 Hadoop cluster on CentOS.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented best income logic using Pig scripts.
- Created a SOLR schema from the indexer settings.
- Implemented SOLR index cron jobs.
- Experience in writing SOLR queries for various search documents.
- Provided cluster coordination services through ZooKeeper.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Helped set up the QA environment and update configurations for implementing Pig and Sqoop scripts.
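The Spark SQL and MLlib POC work mentioned above followed the general shape below, shown here in PySpark for brevity (the resume's POCs used Scala). The input path, schema, feature columns, and label column are all hypothetical.

```python
# Minimal PySpark sketch of a Spark SQL + MLlib proof of concept like the one
# described above (the original used Scala). Paths and columns are assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-poc-sketch").getOrCreate()

# Load a structured dataset and explore it with Spark SQL.
df = spark.read.csv("hdfs:///data/events.csv", header=True, inferSchema=True)
df.createOrReplaceTempView("events")
spark.sql("SELECT label, COUNT(*) FROM events GROUP BY label").show()

# Assemble feature columns into a vector and fit a simple classifier.
features = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train = features.transform(df).select("features", "label")
model = LogisticRegression(maxIter=10).fit(train)
model.transform(train).select("label", "prediction").show(5)

spark.stop()
```

A job of this shape would be deployed to the cluster with `spark-submit --master yarn`.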
Confidential, Irvine, CA
Hadoop Developer
Responsibilities:
- Collected log data and staging data using Apache Flume and stored in HDFS for analysis.
- Implemented helper classes that access HBase directly from Java, using the Java API to perform CRUD operations.
- Handled time-series data in HBase, storing data and performing time-based analytics to improve query retrieval time.
- Involved in loading data from external sources into target tables with Impala queries.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Performed debugging and fine-tuning in Hive & Pig for improving performance.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day (see the sketch after this list).
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Involved in migrating HiveQL into Impala to minimize query response time.
- Performed map-side joins on data in Hive to explore business insights.
- Involved in forecasting based on present results and insights derived from data analysis.
- Integrated MapReduce with HBase, importing bulk data into HBase using MapReduce programs.
- Implemented Spring MVC for designing and implementing the UI Layer for the application.
- Worked on Spring Batch for asynchronous transaction processing; established efficient exception handling and logging using Spring AOP.
- Developed several REST web services supporting both XML and JSON, leveraged by both web and mobile applications.
- Participated in team discussions to develop useful insights from big data processing results.
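The daily-unique-visitors analysis mentioned above reduces to a short HiveQL aggregation, sketched here through PySpark's Hive interface. The table name (weblogs) and columns (visitor_ip, log_date) are hypothetical.

```python
# Sketch of the HiveQL unique-visitors-per-day analysis mentioned above.
# Table and column names (weblogs, visitor_ip, log_date) are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("unique-visitors-sketch")
         .enableHiveSupport()
         .getOrCreate())

daily = spark.sql("""
    SELECT to_date(log_date) AS day,
           COUNT(DISTINCT visitor_ip) AS unique_visitors
    FROM weblogs
    GROUP BY to_date(log_date)
    ORDER BY day
""")
daily.show()
```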
Confidential, Somerset, NJ
Jr. Java/ Hadoop Developer
Responsibilities:
- Involved in Analysis, Design, Development and Testing of application modules.
- Analyzed complex system relationships and improved the performance of various screens.
- Developed various user interface screens using the Struts framework.
- Worked with the Spring framework for dependency injection.
- Developed JSP pages using JavaScript, jQuery, and AJAX for client-side validation, and CSS for data formatting.
- Created Spring-based Camel routes to create CamelContext objects.
- Accessed and manipulated the Oracle 7.0 database environment by writing SQL queries and PL/SQL Stored procedures, functions and triggers.
- Worked with SQL queries to store and retrieve the data in MS SQL server.
- Wrote domain, mapper, and DTO classes and hbm.xml files to access data from DB2 tables.
- Developed various reports using Adobe APIs and Web services.
- Wrote test cases using JUnit and coordinated with the testing team on integration tests.
- Fixed bugs and improved performance using root cause analysis in production support.