
Senior Hadoop Developer Resume


Richardson, TX

SUMMARY:

  • 7 years of IT industry experience, including 5 years working with Hadoop ecosystem components and related technologies such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Oozie, ZooKeeper, HBase, Cassandra, MongoDB, and Amazon Web Services.
  • 3+ years of experience in application development and maintenance across the full SDLC using Java technologies.
  • Good experience working with the Hortonworks, Cloudera, and MapR distributions.
  • Very good understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • Developed applications for distributed environments using Hadoop, MapReduce, and Python.
  • Experience in data extraction and transformation using MapReduce jobs.
  • Proficient in working with Hadoop, HDFS, writing PIG scripts and Sqoop scripts.
  • Performed data analysis using Hive and Pig.
  • Expert in creating Pig and Hive UDFs in Java to analyze data efficiently.
  • Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice versa.
  • Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
  • Strong understanding of Spark real-time streaming and SparkSQL, with experience loading data from external sources such as MySQL and Cassandra into Spark applications (a brief sketch follows this summary).
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX systems, NoSQL stores, and a variety of portfolios.
  • Well versed with job workflow scheduling and monitoring tools such as Oozie.
  • Developed MapReduce jobs to automate transfer of data from HBase.
  • Practical knowledge on implementing Kafka with third-party systems, such as Spark and Hadoop.
  • Loaded streaming log data from various web servers into HDFS using Flume.
  • Experience in using Sqoop, Oozie and Cloudera Manager.
  • Hands on experience in application development using RDBMS, and Linux shell scripting.
  • Experience working with Amazon EMR and EC2 Spot Instances.
  • Experience in integrating Hadoop with Ganglia and have good understanding of Hadoop metrics and visualization using Ganglia.
  • Support development, testing, and operations teams during new system deployments.
  • Extensively worked with the Unified Modeling Language (UML), designing use cases, activity diagrams, class diagrams, sequence diagrams, and object diagrams using Rational Rose and MS Visio.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Good Knowledge on Hadoop Cluster architecture and monitoring the cluster.
  • Hands-on experience using Tableau to generate reports on Hadoop data.
  • Good team player and can work efficiently in multiple team environments and multiple products. Easily adaptable to the new systems and environments.
  • Possess excellent communication and analytical skills along with a can-do attitude.
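
As a brief illustration of the MySQL-to-Spark loading described in this summary, the sketch below shows one way such a job could be written in Scala; the connection URL, table name, credentials, and output path are placeholders rather than details from any specific project.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical job: pull a MySQL table into Spark over JDBC and land it on HDFS.
// Requires the MySQL JDBC driver on the classpath; all names below are placeholders.
object MySqlToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mysql-to-hdfs").getOrCreate()

    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales")
      .option("dbtable", "orders")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Persist as Parquet on HDFS for downstream Hive/Spark consumers.
    orders.write.mode("overwrite").parquet("hdfs:///data/raw/orders")

    spark.stop()
  }
}
```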

TECHNICAL SKILLS:

Programming languages: C, C++, Java, Python, Scala, R

HADOOP/BIG DATA: MapReduce, Spark, SparkSQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, Zookeeper

Databases: MySQL, PL/SQL, MongoDB, HBase, Cassandra.

Operating Systems: Windows, Unix, Linux, Ubuntu.

Web Development: HTML, JSP, JavaScript, jQuery, CSS, XML, AJAX.

Reporting Tools: Tableau

Web/Application Servers: Apache Tomcat, Sun Java Application Server

IDE Tools: IntelliJ, Eclipse, NetBeans

Scripting: BASH, JavaScript

Version Controls: GIT, SVN

Cloud Services: Amazon Web Services

Monitoring Tools: Nagios, Ganglia

Build Tools: Maven

PROFESSIONAL EXPERIENCE:

Senior Hadoop Developer

Confidential, Richardson, TX

Responsibilities:

  • Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice versa.
  • Knowledge in converting Hive or SQL queries into Spark transformations using Python and Scala.
  • Experience in using Sequence file, RCFile, ORC, and Avro file formats; managing and reviewing Hadoop log files.
  • Experience in creating Sqoop jobs with incremental load to populate Hive External tables.
  • Built a Spark-Scala application that generates a daily monitoring report tracking the status of file ingestions and scheduled runs in the data lake.
  • Experience working with Tivoli Workload Scheduler (TWS) and good knowledge of composing and scheduling jobs.
  • Experience migrating Perl, Python, and shell scripts to Spark-Scala code to improve performance.
  • Experience parsing data from custom formats into CSV and cleaning such data using Spark jobs.
  • Good experience in writing shell scripts to support additional functionality for the application.
  • Good experience in storing intermediate data and metadata in SQL tables to track ingestion status in the data lake.
  • Developed a Spark-Scala and shell application to track small files in the cluster and merge them.
  • Created a statistics report that shows the number of small files merged and the storage space currently occupied.
  • Experience in using SVN with TortoiseSVN as the code repository.
  • Strong expertise in internal and external tables of HIVE and created Hive tables to store the processed results in a tabular format.
  • Firm knowledge of HDFS commands to perform basic to advanced activities.
  • Provided production support to resolve issues for deployed applications.
  • Strong knowledge on the process of creating complex data pipelines using transformations, aggregations, cleansing and filtering
  • Experience in writing TWS schedules to run at regular intervals.
  • Knowledge on incremental import, free-form query import, export and Hadoop ecosystem integration using Sqoop.
  • Experience in working with Hortonworks Distribution.
  • Good understanding of partitioning, bucketing, join optimizations, and query optimizations in Hive.
  • Developed a JSON parser that converts JSON files to flat files using Spark and Scala (see the sketch at the end of this list).
  • Good experience communicating with the offshore team through daily status calls.
  • Experience in dealing with production issues and good exposure to the telecom domain.
  • Practical experience in developing Spark applications in IntelliJ with Maven.
  • Involved in loading data from UNIX file system to HDFS and developed UNIX scripts for job scheduling, process management and for handling logs from Hadoop.
  • Developed test cases for Unit testing and performed integration and system testing.
  • Developed applications using ATT proprietary software, coordinating with various teams both offshore and onshore.
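
A minimal sketch of the JSON-to-flat-file conversion referenced above, assuming a Spark 2.x-style DataFrame API; the input path, nested field names, and output location are invented for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical "JSON parser" job: read nested JSON from HDFS, flatten a few
// fields, and write a delimited flat file. Paths and field names are invented.
object JsonToFlatFile {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("json-to-flat-file").getOrCreate()
    import spark.implicits._

    val raw = spark.read.json("hdfs:///ingest/events/*.json")

    // Promote nested attributes to top-level columns.
    val flat = raw.select(
      $"event_id",
      $"payload.customer.id".as("customer_id"),
      $"payload.amount".as("amount"),
      $"ingest_ts"
    )

    // Pipe-delimited output suitable for a Hive external table.
    flat.write
      .option("sep", "|")
      .option("header", "true")
      .mode("overwrite")
      .csv("hdfs:///staging/events_flat")

    spark.stop()
  }
}
```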

Environment: Hadoop 2.7.3, Java 1.7, Spark 1.6.0, SparkSQL, Python, Scala 2.10.5, MongoDB, Apache Pig 0.12.0, Apache Hive 1.1.0, HDFS, Sqoop, Maven, IntelliJ, UNIX Shell scripting, Oracle 11g/10g, Linux, SVN

Senior Hadoop Developer

Confidential, Cincinnati, OH

Responsibilities:

  • Strong understanding and practical experience in developing Spark applications with Scala.
  • Developed Spark scripts by using Spark shell commands as per the requirement.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, SparkSQL, DataFrames, and pair RDDs.
  • Experience in developing SparkSQL applications using both the SQL and DSL APIs.
  • Extensively worked with the Parquet file format and gained practical knowledge in writing Spark and Hive applications that read and write Parquet.
  • Experience in using various compression techniques along with the Parquet file format.
  • Experience in managing large retail datasets from Kroger and good experience in creating test datasets for development purposes.
  • Experience in building dimensional and fact tables using Spark-Scala applications (see the sketch at the end of this list).
  • Practical knowledge on writing applications in Scala to interact with the Hive through the Spark application.
  • Extensively used Hive partitioned tables, map joins, and bucketing, and gained a good understanding of dynamic partitioning.
  • Performed a POC on writing Spark applications in Scala, Python, and R.
  • Good hands-on experience with Hive to perform data queries and analysis as part of QA.
  • Practical experience in using Pig to perform QA by calculating statistics on the final output.
  • Experience in designing both time driven and data driven automated workflows using Oozie
  • Experience in writing Sqoop scripts to import data from Exadata to HDFS.
  • Good exposure to MongoDB, its functionality, and its use cases.
  • Gained good exposure to the Hue interface for monitoring job status, managing HDFS files, tracking scheduled jobs, and managing Oozie workflows.
  • Performed optimizations and performance tuning in Spark and Hive
  • Developed Unix script to automate data load into HDFS
  • Strong knowledge of HDFS commands to manage files and good understanding of managing the file system through Spark-Scala applications.
  • Extensive use of shell aliases for Oozie and HDFS commands.
  • Experienced in managing and reviewing Hadoop log files.
  • Experience controlling logging for Spark applications, with extensive use of Log4j to log the respective phases of the application.
  • Good knowledge of GIT commands, version tagging, and pull requests.
  • Performed unit and integration testing after development and participated in code reviews.
  • Experience in writing JUnit test cases for testing Spark and SparkSQL applications.
  • Practical experience with developing applications in IntelliJ and Maven
  • Good exposure to Agile environment. Participated in daily standups, Big Room Planning, Sprint meetings and Team Retrospectives
  • Interact with business analysts to understand the business requirements and translate them to technical requirements
  • Collaborate with various technical experts, architects and developers for design and implementation of technical requirements
  • Documented business requirements, technical specifications, and process flows.
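
A hedged sketch of the kind of Spark-Scala fact-table build described above, writing a partitioned Parquet table that Hive can query; the database, table, and column names are assumptions, not actual project artifacts.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical fact-table build: aggregate raw transactions to one row per
// store per day and save as a Hive table partitioned by date. All database,
// table, and column names are illustrative.
object DailySalesFact {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-sales-fact")
      .enableHiveSupport()
      .getOrCreate()

    val transactions = spark.table("retail_raw.transactions")

    val dailyFact = transactions
      .groupBy(col("store_id"), col("txn_date"))
      .agg(
        sum("amount").as("total_sales"),
        countDistinct("basket_id").as("basket_count")
      )

    // Partitioning by txn_date lets Hive prune partitions at query time.
    dailyFact.write
      .mode("overwrite")
      .partitionBy("txn_date")
      .format("parquet")
      .saveAsTable("retail_mart.daily_sales_fact")

    spark.stop()
  }
}
```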

Environment: Hadoop 2.6.0-cdh5.7.0, Java 1.8.0_92, Spark 1.6.0, SparkSQL, R programming, Python, Scala 2.10.5, MongoDB, Apache Pig 0.12.0, Apache Hive 1.1.0, HDFS, Sqoop, Oozie, Maven, IntelliJ, GIT, UNIX Shell scripting, Oracle 11g/10g, Log4j, Linux, Agile development

Senior Hadoop Developer

Confidential, Long Beach, CA

Responsibilities:

  • Involved in the review of functional and non-functional requirements.
  • Practical experience in developing Spark applications in Eclipse with Maven.
  • Strong understanding of Spark real time streaming and SparkSQL.
  • Loading data from external data sources such as MySQL and Cassandra for Spark applications (see the sketch at the end of this list).
  • Firm understanding of optimizations and performance-tuning practices while working with Spark.
  • Good knowledge on compression and serialization to improve performance in Spark applications
  • Performed interactive querying using SparkSQL.
  • Practical knowledge on Apache Sqoop to import datasets from MySQL to HDFS and vice-versa.
  • Good knowledge on building predictive models focusing on customer service using R programming.
  • Practical knowledge of implementing Internet of Things (IoT) solutions.
  • Experience in reviewing and managing Hadoop log files.
  • Used Cassandra Query Language to design Cassandra database and tables with various configuration options.
  • Debug CQL queries and implement performance enhancement practices.
  • Strong knowledge on Apache Oozie for scheduling the tasks.
  • Practical knowledge on implementing Kafka with third-party systems, such as Spark and Hadoop.
  • Experience in configuring Kafka brokers, consumers and producers for optimal performance.
  • Knowledge of creating Apache Kafka consumers and producers in Java.
  • Developed Pig UDFs for manipulating data according to business requirements and worked on developing custom Pig loaders.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Practical knowledge of monitoring a Hadoop cluster using Nagios and Ganglia.
  • Experience in integrating Hadoop with Ganglia and have good understanding of Hadoop metrics and visualization using Ganglia.
  • Experience with GIT as the version control system.
  • Involved in loading data from UNIX file system to HDFS and developed UNIX scripts for job scheduling, process management and for handling logs from Hadoop.
  • Understanding technical specifications and documenting technical design documents.
  • Strong skills in Agile development and Test-Driven development.
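
Purely as an illustration of loading Cassandra data for interactive SparkSQL querying, the sketch below assumes the DataStax spark-cassandra-connector is on the classpath; the host, keyspace, table, and column names are made up.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical load of a Cassandra table into Spark followed by an interactive
// SparkSQL query. Assumes the DataStax spark-cassandra-connector is available;
// host, keyspace, table, and column names are placeholders.
object CassandraSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-sparksql")
      .config("spark.cassandra.connection.host", "cassandra-host")
      .getOrCreate()

    val claims = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "member_data", "table" -> "claims"))
      .load()

    claims.createOrReplaceTempView("claims")

    // Interactive query over the Cassandra-backed temp view.
    spark.sql(
      """SELECT member_id, COUNT(*) AS claim_count
        |FROM claims
        |WHERE claim_status = 'OPEN'
        |GROUP BY member_id""".stripMargin
    ).show(20)

    spark.stop()
  }
}
```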

Environment: Hadoop Cloudera Distribution (CDH4), Java 7, Hadoop 2.5.2, Spark, SparkSQL, MLlib, R programming, Scala, Cassandra, IoT, MapReduce, Apache Pig 0.14.0, Apache Hive 1.0.0, HDFS, Sqoop, Oozie, Kafka, Maven, Eclipse, Nagios, Ganglia, Zookeeper, GIT, UNIX Shell scripting, Oracle 11g/10g, Linux, Agile development.

Senior Hadoop Developer

Confidential, Austin, TX

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Good knowledge of implementing image processing with Spark.
  • Experience in building batch and streaming applications with Apache Spark and Python.
  • Experience in tackling parallel computing to support the Spark Machine Learning Applications.
  • Experience in deploying machine learning algorithms and models and scale them for real-time events.
  • Experienced in running Apache Pig Scripts to convert XML data to JSON data.
  • Used Scala extensively for the processing and for extracting the images.
  • Good knowledge of dimensionality reduction techniques in MLlib using Scala and Java.
  • Understanding of the matplotlib library for displaying images and experience in extracting images as vectors.
  • Experience with Java Abstract Window Toolkit (AWT) which is used for basic image processing functions.
  • Strong understanding of mapping, search queries, filters and validating queries in ElasticSearch application.
  • Practical experience in defining queries on JSON data using Query DSL provided by ElasticSearch.
  • Experience in improving the search focus and quality in ElasticSearch by using aggregations and Python scripts.
  • Analyzed the data by performing Hive queries and running Pig scripts.
  • Implemented Partitioning, Dynamic Partitions, and Buckets in Hive.
  • Experience in optimizing an HBase cluster using different Hadoop and HBase parameters.
  • Good knowledge of the HBase data model and its operations, along with various troubleshooting and maintenance techniques.
  • Good understanding of data storage, replication, data scanning, and data filtration in HBase.
  • Experience in reading data from and writing data to Amazon S3 in Spark applications (see the sketch at the end of this list).
  • Experience in selecting and configuring the right Amazon EC2 instances and accessing AWS services using client tools and AWS SDKs.
  • Knowledge of using AWS Identity and Access Management (IAM) to secure access to EC2 instances and configuring Auto Scaling groups using CloudWatch.
  • Good understanding of the internals of Kafka design, message compression and replication.
  • Experience in maintaining and operating Kafka and monitor it consistently and effectively using cluster management tools.
  • Experience in integrating Kafka with other tools for logging and packaging.
  • Experience in transferring data between HDFS and RDBMS using Sqoop.
  • Knowledge of adding and describing third-party connectors in Sqoop.
  • Knowledge on incremental import, free-form query import, export and Hadoop ecosystem integration using Sqoop.
  • Ran machine learning Spark jobs on Hadoop using Oozie and created quick Oozie jobs using Hue.
  • Scheduled Sqoop jobs through Oozie to import data from databases into HDFS.
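
A minimal sketch of reading from and writing back to Amazon S3 from a Spark application, as referenced above; bucket names and prefixes are placeholders, and the s3a:// scheme assumes the hadoop-aws module is available.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical S3 round trip: read raw JSON events from one bucket, trim the
// columns, and write Parquet to another bucket. Bucket names and prefixes are
// invented; credentials are expected from the EC2 instance profile.
object S3RoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("s3-round-trip").getOrCreate()

    // s3a:// paths require the hadoop-aws module on the classpath.
    val events = spark.read.json("s3a://example-raw-bucket/events/")

    val trimmed = events.select("event_id", "user_id", "event_type", "event_ts")

    trimmed.write
      .mode("overwrite")
      .parquet("s3a://example-curated-bucket/events_parquet/")

    spark.stop()
  }
}
```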

Environment: Amazon Web Services, Java 7, Hadoop 2.4.0, Spark, MLlib, Python, Scala, HBase, ElasticSearch, Apache Pig 0.12.0, Apache Hive 0.13.0, MapReduce, HDFS, Sqoop, Oozie, Kafka, Zookeeper, Maven, Eclipse, Nagios, Ganglia, GIT, UNIX Shell scripting, Oracle 11g/10g, Linux, Agile development.

Hadoop Developer

Confidential, NJ

Responsibilities:

  • Developed several advanced MapReduce programs to process incoming data files.
  • Developed MapReduce programs for data analysis and data cleaning.
  • Firm knowledge of summarization patterns used to calculate aggregate statistical values over a dataset.
  • Experience in implementing joins in dataset analysis to discover interesting relationships.
  • Completely involved in the requirement analysis phase.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Worked on partitioning Hive tables and running the scripts in parallel to reduce script run time.
  • Strong expertise in internal and external tables of Hive and created Hive tables to store the processed results in a tabular format.
  • Implemented Partitioning, Dynamic Partitions, and Buckets in Hive.
  • Developed Pig Scripts and Pig UDFs to load data files into Hadoop.
  • Analyzed the data by performing Hive queries and running Pig scripts.
  • Developed Pig Latin scripts for the analysis of semi-structured and unstructured data.
  • Strong knowledge on the process of creating complex data pipelines using transformations, aggregations, cleansing and filtering
  • Experience in writing cron jobs to run at regular intervals.
  • Developed MapReduce jobs for log analysis, recommendations, and analytics (see the sketch at the end of this list).
  • Experience in using Flume to efficiently collect, aggregate and move large amounts of log data.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Experience in managing and reviewing Hadoop log files.
  • Responsible for cluster maintenance: adding and removing cluster nodes, monitoring and troubleshooting the cluster, and managing and reviewing data backups and Hadoop log files.
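
The log-analysis MapReduce work above was done in Java; as a rough illustration only, a comparable summarization job (counting web log records per HTTP status code) is sketched here in Scala against the Hadoop MapReduce API. The log layout, field positions, and class names are assumptions.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConverters._

// Mapper: emit (statusCode, 1) for each web server log line.
class StatusMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val outKey = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split(" ")
    if (fields.length > 8) {      // assumes a combined-log-format layout
      outKey.set(fields(8))       // HTTP status code column (assumed position)
      ctx.write(outKey, one)
    }
  }
}

// Reducer: sum the per-status counts.
class StatusReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    val total = values.asScala.foldLeft(0)(_ + _.get)
    ctx.write(key, new IntWritable(total))
  }
}

object LogStatusCounts {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "log-status-counts")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[StatusMapper])
    job.setReducerClass(classOf[StatusReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```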

Environment: Hadoop 1.1.1, Java, Apache Pig 0.10.0, Apache Hive 0.10.0, MapReduce, HDFS, Flume 1.4.0, GIT, UNIX Shell scripting, PostgreSQL, Linux.

Hadoop Developer

Confidential, San Ramon, CA

Responsibilities:

  • Coordinated with business customers to gather business requirements and interacted with technical peers to derive technical requirements.
  • Experienced in loading and transforming large sets of structured and semi-structured data.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Developed Pig scripts in the areas where extensive coding needs to be reduced.
  • Experience in using Redis in managing large datasets and have knowledge on scaling Redis to multiple servers.
  • Researched Apache Hadoop and the MapReduce model and its implementation.
  • Experience in job chaining and job merging in MapReduce programming.
  • Firm knowledge on HDFS commands to perform basic to advanced activities.
  • Worked on setting up Hadoop over multiple nodes and designed and developed Java MapReduce jobs.
  • Integrating with content management system database to load data to HDFS.
  • Developed a basic design to solve the problem by decomposing it into many small MapReduce jobs.
  • Build and support standard-based infrastructure capable of supporting tens of thousands of computers in multiple locations.
  • Used Pig as ETL tool to do transformations, event joins, filter bot traffic and some pre-aggregations before storing the data onto HDFS.

Environment: Hadoop 1.0, Java, Apache Pig 0.9.2, MapReduce, HDFS, GIT, UNIX Shell scripting, MySQL, Linux.

Java Developer

Confidential

Responsibilities:

  • Involved in Analysis, Design, Implementation and Bug Fixing Activities.
  • Designed the initial web/WAP pages for a better UI per the requirements.
  • Involved in reviewing functional and technical specification documents as well as code reviews.
  • Underwent training on the domain knowledge.
  • Involved in design of basic Class Diagrams, Sequence Diagrams and Event Diagrams as a part of Documentation.
  • Held discussions and meetings with business analysts to understand the functionality involved in test case reviews.
  • Developed SQL queries and stored procedures using PL/SQL to retrieve data from and insert data into multiple database schemas.
  • Prepared the Support Guide containing the complete functionality.

Environment: Core Java, Apache Tomcat 5.1, Oracle 9i, JavaScript, HTML, PL/SQL, Rational Rose, Windows XP, UNIX.
