
Sr. Big Data Engineer Resume


Washington, DC

SUMMARY:

  • Over 11 years of experience as a Big Data Engineer/Hadoop and Java Developer, with skills in analyzing, designing, developing, testing, and deploying various software applications.
  • Design and development experience with Big Data, AWS, Apache Spark, Python, Cassandra (NoSQL), Scala, and Hadoop ecosystem components such as Pig, Hive, Sqoop, HDFS, shell scripting, and BI reporting.
  • Experience in analyzing data using Hive, Pig Latin, and custom MapReduce programs in Java.
  • Hands-on experience writing Spark SQL scripts and implementing Spark RDD transformations and actions using Python/Scala (a minimal sketch follows this summary).
  • Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
  • Excellent Java development skills using J2EE frameworks such as Spring, Hibernate, Web Services, RESTful Web Services, and microservices.
  • Experience building platforms and deploying cloud-based tools and solutions with technologies such as AWS EMR, RDS, and Kinesis.
  • Experience in developing applications using enterprise J2EE technologies such as Java Servlets and JSP.
  • Hands-on experience using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Well versed in developing and implementing Spark programs using Python/Scala and Spark Streaming to work with big data.
  • Hands-on experience writing custom UDFs to extend Hive and Pig core functionality.
  • Hands-on experience extracting data from log files and copying it into HDFS using Flume.
  • Hands-on experience in provisioning and managing multi-tenant Cassandra clusters on public cloud environments - Amazon Web Services (AWS) EC2 and OpenStack.
  • Good experience working with real-time streaming applications using tools such as Spark Streaming, Storm, and Kafka.
  • Experience working with cloud platforms, setting up environments and applications on AWS, and automating code and infrastructure (DevOps) using Chef, Jenkins, and Deploy.
  • Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases, and with core Java concepts such as OOP, multithreading, collections, and I/O.
  • Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker and TaskTracker, NameNode and DataNode, Secondary NameNode, and MapReduce programming.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera distributions.
  • Knowledge of installing, configuring, and using Hadoop components such as Hadoop MapReduce (MR1), YARN (MR2), HDFS, Hive, Pig, Flume, and Sqoop.
  • Interested in exploring new technologies and experimenting with them to improve existing infrastructure and applications.
  • Good understanding of Data Mining and Machine Learning techniques.
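
A minimal Scala sketch of the kind of Spark RDD pipeline referenced above: transformations build the lineage lazily and a final action triggers the computation. The HDFS paths and application name are hypothetical placeholders.

import org.apache.spark.{SparkConf, SparkContext}

object RddWordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddWordCount"))

    // Transformations (flatMap, map, reduceByKey) are lazy; the final
    // action (saveAsTextFile) triggers the actual computation.
    val counts = sc.textFile("hdfs:///data/input/logs")        // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///data/output/word_counts")   // hypothetical output path
    sc.stop()
  }
}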

TECHNICAL SKILLS:

Big Data Ecosystem: MapReduce, HDFS, YARN, HBase, ZooKeeper, Hive, Pig, Sqoop, MongoDB, Flume.

Programming Languages: JDK 7/6, Java/J2EE, R, Pig, Hive, SQL, Linux.

Web Technologies: AngularJS, Hibernate, HTML 5/4, JavaScript, CSS 3/2, JSP, DHTML, XML, XSLT, AJAX, jQuery and ExtJS.

Databases: MySQL, Oracle 12c/11g, SQL Server 2016/2014, NoSQL, AWS, Cassandra.

Scheduling Tools: Autosys, Control-M, Informatica Scheduler, Zena.

IDEs: Eclipse 4.6/4.2, Visual Studio 2014/2008, NetBeans 8.2/7.5.

Operating Systems: Windows 7/8/10, Linux (Ubuntu).

Application/Web Servers: WebSphere Application Server 6.1, Tomcat 7.0.

Build Tools & Others: ANT, Maven, Visio, Gliffy, iReport 4.5.1, SQL Developer.

Testing Tools: WinRunner, LoadRunner, QuickTest Professional (QTP).

PROFESSIONAL EXPERIENCE:

Confidential, Washington, DC

Sr. Big data Engineer

Responsibilities:

  • Implemented a POC comparing Spark with Hive on big data sets by performing aggregations and observing response times.
  • Worked with Business Analysts to capture business domain details and prepared low-level and high-level documentation.
  • Created Hive tables and Sqoop jobs to import data from Oracle to HDFS, and scheduled them in Autosys by creating Oozie workflows.
  • Designed and developed applications with an AngularJS-based UI and RESTful APIs, backed by Cassandra DB, in a secured AWS environment.
  • Imported data from sources such as HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Developed scripts that load data into Spark RDDs and perform in-memory computation to generate the output response.
  • Involved in converting Hive queries into Spark transformations using Spark RDDs and Scala; gained good experience with Spark Shell and Spark Streaming.
  • Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing (see the sketch after this list).
  • Developed Spark programs, scripts, and UDFs using Spark SQL for aggregation operations as per requirements.
  • Used the Spark DataFrame API to perform analytics on Hive data and checkpointed RDDs to disk to handle job failures and aid debugging.
  • Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into HDFS and to run multiple Hive and Pig jobs.
  • Involved in converting Hive queries into Spark transformations using Spark SQL and Scala.
  • Automated workflows using shell scripts and Control-M jobs to pull data from various databases into the Hadoop data lake.
  • Followed a story-driven Agile development methodology and actively participated in daily scrum meetings.
  • Leveraged DevOps practices such as continuous integration, continuous deployment, test automation, build automation, and test-driven development to enable rapid delivery of end-user capabilities.
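
A minimal sketch of the Spark Streaming consumer described above, using the Kafka 0.10 direct stream API in Scala; the broker address, consumer group, topic name, and HDFS landing path are hypothetical.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfsStream"), Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",              // hypothetical broker list
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "event-consumers",            // hypothetical consumer group
      "auto.offset.reset"  -> "latest"
    )

    // Subscribe to a hypothetical "events" topic and land each micro-batch in HDFS.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    stream.map(_.value).foreachRDD { rdd =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/events/batch_${System.currentTimeMillis}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}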

Environment: MapReduce, HBase, HDFS, Hive, Pig, Java, Kafka, SQL, Hortonworks, Spark, Sqoop, Storm, Flume, AWS, Tableau, YARN, Oozie, Eclipse, Cloudera, Cassandra, Python, Scala, Shell Scripting, Hadoop, Oracle, UNIX, NoSQL.

Confidential, NJ

Sr. Big data Engineer

Responsibilities:

  • Used the Spark API over Hadoop YARN to perform analytics on data in Hive (a sketch follows this list).
  • Involved in developing Spark applications using Scala and Spark SQL for faster data processing, and explored optimizations using SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
  • Implemented data ingestion and handled clusters for real-time processing using Apache Kafka.
  • Integrated Splunk with Hadoop and set up jobs to export data to and from Splunk.
  • Developed generic Sqoop scripts for importing and exporting data between HDFS and relational systems such as Oracle and Teradata.
  • Created Cassandra tables to load large sets of structured, semi-structured, and unstructured data coming from Linux, NoSQL, and a variety of portfolios.
  • Involved in loading data from the UNIX file system to HDFS and responsible for writing generic UNIX scripts.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
  • Implemented optimized Hive joins to gather data from different sources and run ad-hoc queries on top of them.
  • Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume.
  • Worked with vendor and client support teams to assist with critical production issues based on SLAs.
  • Used Splunk to capture, index, and correlate real-time data in a searchable repository, from which reports and alerts can be generated.
  • Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
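
A minimal Scala sketch of running Spark SQL over Hive on YARN with a dynamically partitioned target table, as described above; the database, table, and column names are hypothetical.

import org.apache.spark.sql.SparkSession

object HiveOnYarnAnalytics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveOnYarnAnalytics")
      .config("hive.exec.dynamic.partition", "true")
      .config("hive.exec.dynamic.partition.mode", "nonstrict")
      .enableHiveSupport()                                 // lets Spark SQL read/write Hive tables
      .getOrCreate()

    // Hypothetical partitioned target table; partitioning by load_date keeps
    // ad-hoc queries scanning only the partitions they need.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.txn_summary
        |(account_id STRING, total_amount DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC""".stripMargin)

    spark.table("staging.transactions")                    // hypothetical source table
      .groupBy("account_id", "load_date")
      .sum("amount")
      .withColumnRenamed("sum(amount)", "total_amount")
      .select("account_id", "total_amount", "load_date")   // partition column last, as insertInto expects
      .write.mode("append")
      .insertInto("analytics.txn_summary")

    spark.stop()
  }
}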

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Oozie, Cassandra, HBase, Sqoop, Apache Kafka, Linux, Talend, Tableau, AWS, Teradata, Oracle, JIRA, Confluence, GitHub, Bitbucket, Sourcetree, Jenkins, UNIX.

Confidential, Tampa, FL

Sr. Hadoop/J2EE Developer

Responsibilities:

  • Used Sqoop extensively to import data from RDBMS sources into HDFS.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Provisioned Cloudera Director AWS instances and added the Cloudera Manager repository to scale up the Hadoop cluster in AWS.
  • Developed Java MapReduce programs to analyze sample log files stored in the cluster.
  • Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis.
  • Involved in SDLC requirements gathering, analysis, design, development, and testing of the application, developed using Agile methodology.
  • Developed multiple scripts for analyzing data using Hive and Pig and integrating with HBase.
  • Involved in loading data from the UNIX file system to HDFS using Flume, Kettle, and the HDFS API.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Utilized Puppet for configuration management of hosted instances within AWS.
  • Used Spark machine learning techniques implemented in Scala.
  • Involved in continuous monitoring of operations using Storm.
  • Involved in creating a generic Sqoop import script for loading data into Hive tables from RDBMS.
  • Involved in the development of Web Services using SOAP for sending and receiving data from the external interface in XML format.
  • Developed the technical strategy for integrating Spark for pure streaming and more general data-computation needs.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
  • Exported result sets from Hive to MySQL using Kettle (the Pentaho data integration tool).
  • Worked on the implementation of a log producer in Scala that watches application logs, transforms incremental log entries, and sends them to a Kafka and ZooKeeper based log collection platform (see the sketch after this list).
  • Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
  • Worked with the business team to create Hive queries for ad-hoc access.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
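
A minimal Scala sketch of the log producer described above, which ships application log lines to a Kafka topic; the broker address, log path, and topic name are hypothetical, and a production version would tail the file for incremental lines rather than reading it once.

import java.util.Properties
import scala.io.Source
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LogProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")        // hypothetical broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    val source   = Source.fromFile("/var/log/app/application.log")   // hypothetical log file

    // Send each current log line to a hypothetical "app-logs" topic.
    try source.getLines().foreach { line =>
      producer.send(new ProducerRecord[String, String]("app-logs", line))
    } finally {
      source.close()
      producer.flush()
      producer.close()
    }
  }
}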

Environment: Hadoop (Cloudera), HDFS, MapReduce, Kafka, Hive, Scala, Pig, Sqoop, Oozie, AWS, Solaris, DB2, Spark SQL, Spark Streaming, Spark, UNIX Shell Scripting.

Confidential, Atlanta, GA

Sr. Java/J2EE Developer

Responsibilities:

  • Worked with Java, J2EE, Struts, Spring, Web Services, and Hibernate in a fast-paced development environment.
  • Developed code using newer Java features such as annotations, generics, the enhanced for loop, and enums.
  • Actively participated in object-oriented analysis and design sessions for the project, which is based on MVC architecture using the Spring Framework.
  • Used Spring and Hibernate to implement IoC, AOP, and ORM for the back-end tiers.
  • Used Spring bean inheritance to develop beans from already developed parent beans.
  • Used the DAO pattern with Hibernate to fetch data from the database and carry out various database operations.
  • Created and consumed SOAP Web Services using JAX-WS.
  • Involved in unit and integration testing with frameworks such as JUnit, Mockito, TestNG, Jersey Test, and PowerMock.
  • Proficient in developing applications with Java, JSP, UML, Servlets, Struts, Swing, Oracle (SQL, PL/SQL), HTML, JUnit, JSF, JavaScript, and CSS.
  • Worked on evaluating and comparing different tools for test data management with Hadoop.
  • Developed session beans that encapsulate the workflow logic; handled application deployment on the Tomcat web server and WebLogic application server.
  • Involved in defect tracking and planning using JIRA.
  • Developed an enterprise inter-process communication framework using Spring RESTful Web Services.
  • Developed the application front end using Bootstrap, JavaScript, and the AngularJS (Model-View-Controller) framework.
  • Used Hibernate transaction management, Hibernate batch transactions, and caching concepts.
  • Modified the Spring controller and service classes to support the introduction of the Spring framework.
  • Used Rational Application Developer (RAD), which is based on Eclipse, to develop and debug application code.
  • Created user-friendly GUI interfaces and web pages using HTML, AngularJS, jQuery, and JavaScript.
  • Involved in full life cycle object-oriented application development: object modeling, database mapping, and GUI design.

Environment: J2EE, Spring, Spring MVC, Hibernate, jQuery, JSON, JSF, Servlets, JDBC, AJAX, Web services, SOAP, XML, Java Beans, CSS, HTML, AngularJS, Bootstrap, JIRA, JavaScript, Oracle, IBM RAD, WebSphere

Confidential, Sacramento, CA

Java/J2EE Developer

Responsibilities:

  • Actively involved in setting coding standards and writing related documentation.
  • Published and consumed Web Services using SOAP and WSDL and deployed them on the WebLogic web server.
  • Used SOAP-based web services to develop interfaces that integrate front-end and back-end systems.
  • Developed WSDL and XSD definitions for creating interfaces between different systems using SOAP-based web services.
  • Used the Spring Framework for authentication and authorization, and its ORM components to support the Hibernate tool.
  • Responsible for writing and reviewing server-side code using Spring JDBC and Spring's DAO module to execute stored procedures and SQL queries.
  • Developed web services that retrieve data from external systems to process requests from the client side.
  • Designed and developed the business logic layer and data access layer using different kinds of EJBs and Data Access Objects.
  • Developed new screens for the application using HTML, CSS, JSP, JavaScript, and AJAX.
  • Developed the application using Eclipse as the IDE, using its standard features for editing, debugging, and running code.
  • Built Maven scripts that compile the code, pre-compile the JSPs, build an EAR file, and deploy the application on the WebLogic application server.
  • Used SVN as a documentation repository and version control tool.
  • Created design documents with use case diagrams, class diagrams, and sequence diagrams using Rational Rose.
  • Participated in and contributed to design reviews and code reviews.
  • Created the web service using the top-down approach and tested it using the SOAP UI tool.
  • Developed JSPs and Servlets to dynamically generate HTML and display data on the client side.
  • Designed web applications using the MVC design pattern.
  • Developed shell scripts to retrieve vendor files dynamically and used crontab to execute these scripts periodically.
  • Designed the batch process for processing vendor data files using the IBM WebSphere Application Server Task Manager framework.
  • Used the Log4j logging framework to debug the code.
  • Performed unit testing using the JUnit testing framework and Log4j to monitor error logs.

Environment: Core Java, J2EE, JSON, JSP, Maven, Eclipse, Hibernate, Spring, JavaScript, HTML, CSS, JUnit, Web services, SOAP, Oracle, UML, WebLogic, WSDL, EJB, SOAP UI, Jenkins CI, Windows.
