
Sr Hadoop Developer Resume


MN

SUMMARY

  • 6+ years of professional experience designing, developing, and debugging web-based and enterprise applications using OOA, OOD, OOP, and Java/J2EE technologies.
  • Over 4 years of experience as a Hadoop developer with strong knowledge of the Hadoop framework, the Hadoop Distributed File System, and parallel processing, including the ecosystem components HDFS, YARN, MapReduce, Hive, Pig, Python, HBase, Sqoop, Hue, Oozie, Impala, and Spark.
  • Built and deployed industrial-scale data lakes on on-premises and cloud platforms.
  • Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experienced in handling different file formats, including text, Avro, SequenceFile, XML, and JSON.
  • Extensively worked with Spark Core, numeric RDDs, pair RDDs, DataFrames, and caching while developing Spark applications.
  • Expertise in deploying Hadoop, YARN, and Spark, and in integrating Spark with Cassandra.
  • Expertise in ETL, data analysis, and designing data warehouse strategies.
  • Good knowledge of Apache NiFi for automating data movement between Hadoop systems.
  • Industry experience building applications in Python, Java, Scala, and JavaScript (AngularJS, NodeJS), as well as with SQL Server.
  • Extensive experience writing custom MapReduce programs for data processing and Hive and Pig UDFs in Java; worked extensively with both the MRv1 and MRv2 (YARN) Hadoop architectures.
  • Strong experience analyzing large data sets by writing PySpark scripts and Hive queries.
  • Experience with Python packages such as xlrd, NumPy, pandas, SciPy, and scikit-learn, and with IDEs and environments including PyCharm, Spyder, Anaconda, Jupyter, and IPython.
  • Extensive experience working with structured data using HiveQL, join operations, and custom UDFs, and in optimizing Hive queries.
  • Extensive experience processing semi-structured and unstructured data by implementing complex MapReduce programs built on standard design patterns.
  • Experience importing and exporting data between HDFS and relational databases using Sqoop.
  • Experience with Apache Flume for collecting, aggregating, and moving large volumes of data from sources such as web servers and telnet sources.
  • Experience with the Oozie workflow engine, running workflows with actions that execute Hadoop MapReduce, Hive, and Spark jobs.
  • Involved in moving all log files generated from various sources to HDFS and Spark for further processing.
  • Excellent understanding of NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Experienced with Apache Flink for streaming and batch processing; integrated various storage systems with Flink.
  • Experience implementing the Kerberos authentication protocol in Hadoop for data security.
  • Working knowledge of and experience with Agile and Waterfall methodologies.
  • Team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvement.
  • Experience in converting MapReduce applications to Spark.
  • Knowledge of running Hive queries through Spark SQL in Spark applications implemented in Scala.
  • Experience processing data on AWS and with the ELK stack.
  • Experienced in using Agile software methodology (Scrum).
  • Knowledge of Linux, UNIX, and Windows operating systems and of UNIX shell scripting.
  • Captured data from existing databases that provide SQL interfaces using Sqoop.
  • Loaded files into Hive and HDFS from MongoDB, Cassandra, and HBase.
  • Loaded datasets into Hive for ETL operations.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Used Spark SQL to perform transformations and actions on data residing in Hive (a minimal sketch follows this list).
  • Experience collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Experience using Kafka brokers with Spark Streaming to process live streaming data as RDDs.
  • Hands-on experience migrating complex MapReduce programs to Apache Spark RDD transformations.
  • Experience integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Worked with WebLogic and Tomcat web servers for development and deployment of Java/J2EE applications.
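
Illustrative sketch for the Spark SQL on Hive item above: a minimal Scala job, assuming Spark with Hive support enabled; the database, table, and column names (sales.orders, amount, order_date) are hypothetical.

```scala
// Minimal sketch: query a Hive table with Spark SQL, transform it as a
// DataFrame, and write the result back to Hive. All names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-spark-sql-sketch")
      .enableHiveSupport() // read/write tables registered in the Hive metastore
      .getOrCreate()

    // Read an existing Hive table as a DataFrame
    val orders = spark.sql(
      "SELECT order_id, customer_id, amount, order_date FROM sales.orders")

    // Typical transformation/action pattern: aggregate, cache, write back to Hive
    val dailyTotals = orders
      .groupBy(col("order_date"))
      .agg(sum(col("amount")).as("daily_amount"))
      .cache()

    dailyTotals.write.mode("overwrite").saveAsTable("sales.daily_order_totals")

    spark.stop()
  }
}
```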

TECHNICAL SKILLS

Languages: Java, JavaScript, SQL, XML, HTML, Scala, Python.

J2EE Technologies: Servlets/JSP, JavaBeans, JDBC, JMS, EJB, Web Services, GWT

Databases: Oracle 10g, DB2.

Big Data Technologies: Hadoop, Hive, Impala, MapReduce, Spark, Kafka, Sqoop, Elasticsearch (ELK), Flink

NoSQL: Cassandra, HBase

EAI Technologies: Oracle SOA, BPEL, TIBCO BW, TIBCO EMS, Apache Camel

Application Servers: Tomcat 6, WebLogic 12.x, WildFly

Frameworks: Struts 1.2, Spring, Hibernate, Axis2, JAX-WS

Operating Systems: Linux, UNIX, Windows 98/NT/2000/XP/Vista

Java IDEs: Eclipse, EditPlus, JDeveloper

Testing Tools: SoapUI

Configuration Tools: Git, VSS, ClearCase, StarTeam, SVN

Design Tools: Microsoft Visio

PROFESSIONAL EXPERIENCE

Confidential, MN

Sr Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Migrated complex MapReduce programs to Spark RDD transformations and actions.
  • Developed Spark jobs and Hive jobs to summarize and transform data.
  • Converted Hive/SQL queries into Spark transformations using Spark DataFrames, Scala, and Python.
  • Identified job dependencies to design Oozie workflows and YARN resource management.
  • Worked on a product team using Agile Scrum methodology to design, develop, deploy, and support solutions that leverage the client's big data platform.
  • Worked on moving the on-premises Hadoop environment to Amazon EMR, with S3 as optional storage.
  • Wrote Scala scripts to make Spark Streaming work with Kafka as part of Spark-Kafka integration efforts (see the streaming sketch after this list).
  • Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
  • Implemented complex Hive UDFs to execute business logic within Hive queries.
  • Bulk-loaded large volumes of data into HBase with MapReduce by generating HFiles and loading them directly.
  • Developed custom filters and applied predefined filters on HBase data using the HBase API.
  • Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
  • Implemented Spark jobs in Scala, using the DataFrame and Spark SQL APIs for faster data processing.
  • Imported data from different sources into HDFS using Sqoop and performed transformations using Hive and MapReduce.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Responsible for developing data pipeline by implementing Kafka producers and consumers and configuring brokers.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Involved in managing and reviewing Hadoop Log files.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Set up Spark on EMR to process large datasets stored in Amazon S3.
  • Developed Pig UDFs to manipulate data per business requirements and worked on developing custom Pig loaders.
  • Connected to AWS EC2 over SSH and ran spark-submit jobs.
  • Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
  • Worked on Apache Flink for streaming and batch processing at runtime and integrated various storage systems with Flink (a minimal sketch follows the environment line below).
  • Worked with Apache NiFi for data ingestion; triggered and scheduled shell scripts using NiFi.
  • Monitored NiFi flows to receive notifications when no data moves through a flow for longer than a specified time.
  • Created NiFi flows to trigger Spark jobs, with email notifications sent on any failures.
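
Illustrative sketch for the Kafka and Spark Streaming items above: a minimal Scala job, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, and consumer group are hypothetical.

```scala
// Minimal sketch: consume a Kafka topic with Spark Streaming and run a
// simple word count over the incoming records. All endpoints are hypothetical.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-streaming-sketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "sketch-consumer",
      "auto.offset.reset" -> "latest"
    )

    // Direct stream from the hypothetical "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Each micro-batch arrives as an RDD of consumer records
    stream.map(_.value)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```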

Environment: Hadoop, HDFS, Hive, Cassandra, Impala, Cloudera, SQL Server, UNIX shell scripting, Amazon S3, AWS, Oozie, Pig, Flink, NiFi, Python, Scala, Spark, Spark SQL, Sqoop, Kafka.
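
Illustrative sketch for the Apache Flink item above: a minimal Scala job using Flink's DataStream API; the socket source and port are hypothetical stand-ins for whichever storage system actually fed the stream.

```scala
// Minimal sketch: a Flink streaming word count. The socket source is a
// hypothetical stand-in for the real upstream storage/ingest system.
import org.apache.flink.streaming.api.scala._

object FlinkStreamingSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Hypothetical source: lines of text arriving on a local socket
    val lines = env.socketTextStream("localhost", 9999)

    lines
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .keyBy(_._1)   // key the stream by word
      .sum(1)        // running count per word
      .print()

    env.execute("flink-streaming-sketch")
  }
}
```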

Confidential, MA

Hadoop Developer

Responsibilities:

  • Involved in ETL, data integration, and migration.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra (see the connector sketch after this list).
  • Streamed data in real time using Spark with Kafka.
  • Built Apache Spark applications using Scala.
  • Managed and scheduled jobs to remove duplicate log files from HDFS using Oozie.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Exported the analyzed data to relational databases using Sqoop for visualization and for generating reports for the BI team.
  • Extensively used HiveQL queries to search for particular strings in Hive tables stored in HDFS.
  • Developed functional programs in Scala to connect the streaming data application, gather web data as JSON and XML, and pass it to Flume.
  • Implemented Spark jobs using Python and Spark SQL for faster data processing.
  • Developed MapReduce and Spark jobs to discover trends in data usage by users.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for large data volumes.
  • Involved in the complete implementation lifecycle, specializing in writing custom MapReduce, Pig, and Hive programs.
  • Used Apache Oozie for scheduling and managing Hadoop jobs; knowledge of HCatalog for Hadoop-based storage management.
  • Designed and created data ingest pipelines using technologies such as Spring Integration and Apache Storm with Kafka.
  • Extensively used Sqoop to pull data from RDBMS sources such as Teradata and Netezza.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Responsible for continuous monitoring and managing Elastic MapReduce (EMR) cluster through AWS console.
  • Involved in Optimization of Hive Queries.
  • Implemented Spark POCs using Scala.
  • Created MapReduce jobs to process survey data and log data stored in HDFS.
  • Used Pig scripts and MapReduce jobs for data cleaning and preprocessing.
  • Transformed and aggregated data for analysis by implementing workflow management of Sqoop, Hive, and Pig scripts.
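
Illustrative sketch for the Spark-Cassandra Connector item above: a minimal Scala job, assuming the DataStax spark-cassandra-connector is on the classpath; the host, keyspace, and table names are hypothetical.

```scala
// Minimal sketch: read a Cassandra table into a DataFrame and write a
// filtered result back to another table. All names are hypothetical.
import org.apache.spark.sql.SparkSession

object CassandraConnectorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-connector-sketch")
      .config("spark.cassandra.connection.host", "cassandra-host")
      .getOrCreate()

    // Load data from Cassandra via the connector's data source
    val users = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "users"))
      .load()

    // Write a transformed result back to Cassandra
    users.filter("active = true")
      .write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "active_users"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```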

Environment: Hadoop, MapReduce, HDFS, Cloudera distribution of Hadoop, Spark, Hive, Pig, HBase, Storm, Scala, Flume, AWS, Sqoop, Cassandra

Confidential

Hadoop Developer

Responsibilities:

  • Designed and developed Hadoop MapReduce jobs in Java for batch processing to search and match scores.
  • Developed Hadoop MapReduce jobs for merging and appending repository data.
  • Developed applications with Hadoop big data technologies: Pig, Hive, MapReduce, and Oozie.
  • Enabled faster reviews by using Oozie workflows to automate the data-loading process into the Hadoop Distributed File System (HDFS) and using Pig to preprocess the data.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, Sqoop, Flume).
  • Worked on Oozie workflow engine for Job scheduling.
  • Imported and exported large sets of data into and out of HDFS using Sqoop.
  • Transferred log files from the log-generating servers into HDFS.
  • Read log data from HDFS using advanced HiveQL (SerDe serialization/deserialization).
  • Executed HiveQL commands on the CLI (command-line interface) and transferred the required output data back to HDFS.
  • Worked with Hive partitioning and bucketing concepts and created partitioned external and internal Hive tables (see the table sketch after this list).
  • Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
  • Converted ETL operations to the Hadoop system using Pig Latin operations, transformations, and functions.
  • Ingested structured, semi-structured, and unstructured datasets on an open-source Hadoop distribution using Apache Flume and Sqoop, loading into Hive and NoSQL databases such as HBase.
  • Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
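
Illustrative sketch for the Hive partitioning and bucketing item above. The work described used the Hive CLI; for consistency with the other sketches the same ideas are shown here through Spark's Hive support in Scala, with hypothetical table, column, and path names.

```scala
// Minimal sketch: a partitioned external Hive table over raw log files,
// plus a bucketed, partitioned managed table built from it.
// All names and paths are hypothetical.
import org.apache.spark.sql.SparkSession

object HiveTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: data stays under /data/logs_raw, partitioned by date
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS logs_raw (
        |  host STRING,
        |  url STRING,
        |  status INT
        |)
        |PARTITIONED BY (log_date STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        |LOCATION '/data/logs_raw'""".stripMargin)

    // Register a partition after new files land under the external location
    spark.sql(
      "ALTER TABLE logs_raw ADD IF NOT EXISTS PARTITION (log_date = '2016-01-01')")

    // Managed table: partitioned by date and bucketed by host for faster joins
    spark.table("logs_raw")
      .write
      .partitionBy("log_date")
      .bucketBy(16, "host")
      .sortBy("host")
      .format("parquet")
      .saveAsTable("logs_curated")

    spark.stop()
  }
}
```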

Environment: UNIX, Linux, Java, Apache HDFS, MapReduce, Oozie, Hive, Cassandra, Sqoop.

Confidential

Java/J2EE Developer

Responsibilities:

  • Implemented Servlets, JSP and Ajax to design the user interface.
  • Used Servlets, JSP, JavaScript, HTML5, CSS, and RESTful services for manipulating, validating, and customizing error messages in the user interface.
  • Implemented object-relational mapping in the persistence layer using the Hibernate framework.
  • Used EJB session beans to implement business logic, JMS for sending updates to various other applications, and MDBs for routing priority requests.
  • Wrote the business logic of all modules in core Java.
  • Wrote SOAP web services for sending data to and receiving data from external interfaces.
  • Used XSL/XSLT for transforming and displaying reports; developed XML schemas.
  • Developed a web-based reporting and monitoring system with HTML and Tiles using the Struts framework.
  • Used design patterns such as Business Delegate, Service Locator, Model View Controller, Session, and DAO.
  • Worked with JSP, Servlets, and JDBC in creating web components.
  • Used HTML, XHTML, JavaScript, jQuery, DHTML, and Ajax to improve the interactive front end.
  • Used EJB entity and session beans to implement business logic, session handling, and transactions.
  • Designed, Implemented, Tested and Deployed Enterprise Java Beans using WebLogic as Application Server.
  • Designed the database tables and indexes used for the project.
  • Developed stored procedures, packages and database triggers to enforce data integrity.
  • Used the JDBC API with Statements and PreparedStatements to interact with the database using SQL.
Environment: Java, Collections, J2EE, EJB, UML, SQL, PHP, Sybase, Eclipse, JavaScript, WebLogic, JBoss, HTML5, DHTML, CSS, XML, Log4j, Ant, Struts 1.3.8, JUnit, JSP, Servlets, Rational Rose, Hibernate.
