
Sr. Data Engineer Resume


Hartford, CT

SUMMARY:

  • 8 years of extensive experience, including 3 years in Big Data and Big Data analytics across the E-commerce, Education, Financial, and Healthcare domains, and over 5 years of professional experience in the design, development, and support of Enterprise, Web, and Client-Server applications using Java, J2EE (JSP, Servlets, Spring, JSF, Struts, Web Services (SOAP, REST), Hibernate), JDBC, HTML, and JavaScript.
  • Experience in developing Apache Spark jobs using Scala for faster data processing and using Spark SQL for querying. Excellent understanding of Hadoop architecture and daemons such as HDFS, NameNode, DataNode, JobTracker, and TaskTracker.
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Impala, Solr, Elasticsearch, Oozie, ZooKeeper, Kafka, Spark, and Cassandra with the Cloudera and Hortonworks distributions.
  • Created custom Solr query segments to optimize search matching.
  • Developed Spark applications using Python (PySpark).
  • Used Solr Search & MongoDB for querying and storing data.
  • Extracted data from Cassandra through Sqoop, placed it in HDFS, and processed it.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala.
  • Analyzed the Cassandra/SQL scripts and designed the solution to implement using Scala.
  • Expertise in Big Data Technologies and Hadoop Ecosystem tools like Flume, Sqoop, HBase, Zookeeper, Oozie, MapReduce, Hive, PIG and YARN.
  • Extracted and updated data in MongoDB using the mongoimport and mongoexport command-line utilities.
  • Developed collections in MongoDB and performed aggregations on them.
  • Hands-on experience in installation, configuration, management, and deployment of Big Data solutions and the underlying infrastructure of a Hadoop cluster using the Cloudera and Hortonworks distributions.
  • In-depth Knowledge of Data Structures, Design and Analysis of Algorithms.
  • Good understanding of Data Mining and Machine Learning techniques.
  • Hands-on experience with various Hadoop distributions: IBM BigInsights, Cloudera, Hortonworks, and MapR.
  • In-depth understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming, Spark MLlib.
  • Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data and in performing data transformations using Spark Core (a brief Scala sketch follows this list).
  • Expertise in developing Real-Time Streaming Solutions using Spark Streaming.
  • Proficient in big data ingestion and streaming tools like Flume, Sqoop, Spark, Kafka and Storm.
  • Hands-on experience in developing MapReduce programs using Apache Hadoop for analyzing Big Data.
  • Expertise in implementing ad-hoc MapReduce programs using Pig scripts.
  • Experience in importing and exporting data from RDBMS to HDFS, Hive tables and HBase by using Sqoop.
  • Experience in importing streaming data into HDFS using Flume sources and sinks and transforming the data with Flume interceptors.
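To make the Spark RDD/DataFrame bullet above concrete, here is a minimal Scala sketch, assuming Spark 2.x. The `Event` case class, its field layout, and the HDFS path are purely illustrative placeholders, not details from any project described in this resume.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative record layout; the field names are hypothetical.
case class Event(userId: String, category: String, amount: Double)

object RddAndDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-dataframe-sketch")
      .getOrCreate()
    import spark.implicits._

    // RDD transformations (map, filter, reduceByKey) followed by an action (take).
    val lines = spark.sparkContext.textFile("hdfs:///data/events.csv") // placeholder path
    val parsed = lines.map(_.split(",")).filter(_.length == 3)

    val totalsByUser = parsed
      .map(fields => (fields(0), fields(2).toDouble))
      .reduceByKey(_ + _)
    totalsByUser.take(10).foreach(println)

    // The same data as a typed Dataset via the case class, then a DataFrame aggregation.
    val events = parsed.map(f => Event(f(0), f(1), f(2).toDouble)).toDS()
    events.groupBy("category").sum("amount").show()

    spark.stop()
  }
}
```

The same pattern extends to the Spark SQL and Spark Streaming work listed above; only the source and sink change.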

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: HDFS, MapReduce, YARN, Pig, HBase, Spark, Zookeeper, Hive, Oozie, Sqoop, Flume, Kafka, Storm, Impala

Hadoop Distribution Systems: Apache, Hortonworks, Cloudera, MapR

Programming Languages: Java (JDK 1.6/1.8), Python, Scala, C/C++, HTML, SQL, PL/SQL, AVS & JVS

Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x

Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey

Operating Systems: UNIX, Windows, LINUX

Web/Application Servers: IBM WebSphere, Apache Tomcat, WebLogic, JBOSS

Web technologies: JSP, Servlets, JNDI, JDBC, Java Beans, JavaScript

Databases: Oracle, MS-SQL Server, MySQL

NoSQL Databases: HBase, Cassandra, MongoDB

IDE: Eclipse 3.x

Version Control: Git, SVN

AWS services: AWS EC2, S3, VPC

PROFESSIONAL EXPERIENCE:

Confidential, Hartford, CT

Sr. Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked in Agile Iterative sessions to create Hadoop Data Lake for the client.
  • Worked closely with the customer to understand the business requirements and implemented them.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs and Scala/Python (see the sketch after this list).
  • Extracted data from a NoSQL database (MongoDB) and processed it with Spark using the MongoDB Spark connector.
  • Involved in creating Hive tables, and loading and analyzing data using hive queries.
  • Wrote Hive queries on the analyzed data for aggregation and reporting.
  • Imported and exported data from different databases into HDFS and Hive using Sqoop.
  • Used HUE and Aginity workbench for Hive Query execution.
  • Hands-on design and development of an application using Hive UDFs.
  • Developed simple to complex MapReduce streaming jobs in Python and implemented them with Hive and Pig.
  • Used Sqoop to load existing metadata from Oracle into HDFS.
  • Developed Python scripts and UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
  • Wrote Python applications on Apache Spark to parse and convert TXT, XML, and JSON files.
  • Developed a collect UDF to collect arrays of structs using the Brickhouse implementation.
  • Collected the necessary data into one central big data lake; this centralized data lake feeds Tableau dashboards to provide clear reporting.
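As referenced in the bullet on converting Cassandra/Hive/SQL queries into Spark transformations, below is a minimal Scala sketch of how a Hive aggregation query can be re-expressed as DataFrame transformations and persisted back as a Hive table. The `claims` table, its columns, and the target table name are hypothetical placeholders, not details from this engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()   // read Hive tables through the metastore
      .getOrCreate()

    // Original Hive-style query (illustrative):
    //   SELECT member_id, COUNT(*) AS claim_cnt, SUM(amount) AS total
    //   FROM claims WHERE status = 'PAID' GROUP BY member_id;
    val claims = spark.table("claims")          // hypothetical Hive table

    // The same logic expressed as DataFrame transformations.
    val summary = claims
      .filter(col("status") === "PAID")
      .groupBy("member_id")
      .agg(count("*").as("claim_cnt"), sum("amount").as("total"))

    // Persist the aggregate back to Hive for downstream reporting.
    summary.write.mode("overwrite").saveAsTable("claims_summary")

    spark.stop()
  }
}
```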

Environment: MapReduce, HDFS, Hive, Pig, Spark, Spark Streaming, Spark SQL, YARN, Linux, Sqoop, Java, Scala, Tableau, Python, SOAP, REST, CDH4, CDH5, AWS, Eclipse, Oracle, Git, Shell Scripting and Cassandra.

Confidential, Boston, MA

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Extensively worked on Elasticsearch querying and indexing to retrieve documents at high speed.
  • Ingested data into Elasticsearch for lightning-fast search.
  • Loaded JSON from upstream systems using Spark Streaming and indexed it into Elasticsearch (see the sketch after this list).
  • Wrote various key queries in Elasticsearch for effective data retrieval.
  • Used Spark-Streaming APIs to perform necessary transformations.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts; implemented the data-loading part (XML load).
  • Wrote Python applications on Apache Spark to parse and convert TXT and XLS files.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark SQL, Python and Scala.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Extensively worked on the core and Spark SQL modules of Spark.
  • Used Spark API over Hadoop YARN to perform analytics on data in Hive.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Used Reporting tools like Kibana to connect with Hive for generating daily reports of data.
  • Involved in developing a Storm topology to ingest data through XML payloads and load it into various distributed stores.
  • Extensively worked on MongoDB, including CRUD operations, sharding, etc.
  • Developed REST services which processed several requests triggered from UI.
  • Built data pipeline using Pig and Java Map Reduce to store onto HDFS in the initial version of the product.
  • Stored the output files for export onto HDFS and later these files are picked up by downstream systems.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data, and developed quick POCs on Spark in the initial stages of the product.
  • Managed and supported Infoworks, the data ingestion and integration tool for the Data Lake.
  • Planned the data ingestion and integration process from the EDW environment into a data lake in HDFS and tested Solr for index search.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, Pair RDDs, and Spark on YARN.
  • Used RESTful web services with JSON payloads to develop server applications.
  • Used Kafka with Spark (YARN-based) to pump and analyze real-time data in the Data Lake.
  • Experienced working with HDP 2.4.
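The Kafka, Spark Streaming, and Elasticsearch bullets above fit together as one pipeline; the sketch below shows one plausible wiring in Scala, assuming the spark-streaming-kafka-0-10 and elasticsearch-spark (ES-Hadoop) libraries. The broker address, topic, consumer group, and index name are placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.kafka.common.serialization.StringDeserializer
import org.elasticsearch.spark.rdd.EsSpark

object KafkaToEsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-es-sketch")
      .set("es.nodes", "es-host:9200")              // placeholder Elasticsearch endpoint

    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-host:9092",     // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "es-ingest",
      "auto.offset.reset" -> "latest"
    )

    // Direct stream over the upstream JSON topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("upstream-json"), kafkaParams)
    )

    // Each record value is assumed to already be a JSON document; index it as-is.
    stream.map(_.value()).foreachRDD { rdd =>
      EsSpark.saveJsonToEs(rdd, "events/doc")       // placeholder index/type
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each micro-batch of JSON records pulled from Kafka is indexed into Elasticsearch; in practice, the map step is where payload validation or enrichment would go.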

Environment: Hortonworks, Linux, Java, Python, MapReduce, HDFS, Hive, Pig, Sqoop, Apache Spark, Apache Storm, Elasticsearch, Kafka, Zookeeper, and Kibana.

Confidential, San Francisco, CA

Hadoop Developer

Responsibilities:

  • Responsible for managing data coming from different sources using Sqoop.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Developed Simple to complex MapReduce Jobs using Hive and Pig.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Developed Pig Latin scripts to extract data from the output files and load it into HDFS.
  • Developed Oozie workflows to automate and schedule the tasks of loading data into HDFS and pre-processing it with Pig.
  • Responsible for designing and developing solutions for data extraction, cleansing, and transformation using tools such as MSBI, Azure DW, Azure Data Lake, Azure PolyBase, and Azure HDInsight (Spark).
  • Implemented UDFs in Java for Hive to process data in ways that Hive's built-in functions cannot.
  • Developed simple to complex UNIX shell/Bash scripts as part of the framework development process.
  • Developed complex Talend job mappings to load data from various sources using different components.
  • Designed, developed, and implemented solutions using Talend Integration Suite.
  • Implemented Flume to import streaming log data and aggregate it into HDFS.
  • Worked on POCs to integrate Spark with other tools.
  • Involved in installing AWS EMR framework.
  • Setup Amazon EC2 multinode Hadoop cluster with PIG, Hive, Sqoop ecosystem tools.
  • Experience in moving data to Amazon S3; also performed EMR programs on data stored in S3.
  • Created Parquet Hive tables with complex data types corresponding to the Avro schema (one possible Spark-based approach is sketched after this list).
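For the Parquet Hive tables built from an Avro schema (last bullet above), one possible Spark-based approach is sketched below, assuming Spark 2.4+ with the spark-avro module on the classpath; the input path and table name are placeholders. This illustrates the general technique rather than the exact tooling used on the project.

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-to-parquet-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read Avro files; nested records, arrays, and maps in the Avro schema
    // come through as Spark complex types (struct, array, map).
    val records = spark.read
      .format("avro")                       // requires the spark-avro package
      .load("s3://bucket/raw/claims/")      // placeholder input path

    // Persist as a Parquet-backed Hive table, preserving the complex types.
    records.write
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("analytics.claims_parquet")   // placeholder Hive table

    spark.stop()
  }
}
```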

Environment: HDFS, Sqoop, Hive, HBase, Pig, Flume, Yarn, Oozie, Spark, Talend ETL, Apache Parquet, Amazon EC2, AWS EMR, Amazon S3, UNIX/Linux Shell Scripting, NoSQL, JIRA.

Confidential, NJ

Hadoop Developer

Responsibilities:

  • Analyze large datasets to provide strategic direction to the company.
  • Collected the logs from the physical machines and integrated into HDFS using Flume.
  • Involved in analyzing the system and business requirements.
  • Developed SQL statements to improve back-end communications.
  • Loaded unstructured data into Hadoop File System (HDFS).
  • Created ETL jobs to load Twitter JSON data and server data into MongoDB and transported MongoDB into the Data Warehouse.
  • Created reports and dashboards using structured and unstructured data.
  • Involved in importing data from MySQL to HDFS using Sqoop.
  • Involved in writing Hive queries to load and process data in the Hadoop File System.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in working with Impala for data retrieval process.
  • Exported data from Impala to Tableau reporting tool, created dashboards on live connection.
  • Performed sentiment analysis on reviews of the products on the client’s website.
  • Exported the resulting sentiment analysis data to Tableau for creating dashboards.
  • Experienced in Agile processes and delivered quality solutions in regular sprints.
  • Developed custom MapReduce programs to extract the required data from the logs.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts; implemented the data-loading part (XML load).

Environment: Hadoop, HDFS, Pig, Sqoop, Oozie, HBase, Shell Scripting, Ubuntu, Linux Red Hat.

Confidential, New York, NY

Java/J2EE Developer

Responsibilities:

  • Responsible for all stages of design, development, and deployment of applications.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Implemented the application using Struts2 Framework which is based on Model View Controller design pattern.
  • Developed custom tags to simplify the JSP 2.0 code. Designed UI screens using JSP 2.0, Ajax, and HTML. Used JavaScript for client-side validation.
  • Actively involved in designing and implementing Value Object, Service Locator, and MVC and DAO design patterns.
  • Developed and used JSP custom tags in the web tier to dynamically generate web pages.
  • Used Java Message Service (JMS) for reliable, asynchronous exchange of important information such as order submissions; consumed messages from the Java message queue and generated emails to be sent to customers.
  • Designed and developed stateless session beans (EJB 3).
  • Used jQuery as a JavaScript library.
  • Used Data Access Object (DAO) pattern to introduce an abstraction layer between the business logic tier (Business object) and the persistent storage tier (data source).
  • Implemented Session EJB’s at a middle tier level to house the business logic.
  • Used Restful Web services for sending and getting data from different applications using Jersey Framework.
  • Developed stored procedures and complex packages extensively using PL/SQL and shell programs.
  • Used DB2 as database and developed complex SQL queries.
  • Used the JUnit framework for unit testing of the application and Maven to build it; deployed on WebSphere 8.5 and used RAD 7.5 as the IDE.
  • Used HP Quality Center for defect reporting and tracking.
  • Prepared Low-Level Design, High-Level Design, and Unit Testing Results documents.
  • Used Log4J for logging.

Environment: Struts2, EJB 3, WebSphere 8.5, jQuery, Java 1.6, REST (Jersey), JSP 2.0, Servlets 2.5, JMS, XML, JavaScript, UML, HTML5, JNDI, CVS, Log4J, JUnit, Eclipse.

Confidential

Java Developer

Responsibilities:

  • Analyzed Business requirements based on the Business Requirement Specification document.
  • Involved in System Requirements study and conceptual design.
  • Created UML diagrams like activity diagrams, sequence diagrams, and Use case diagrams.
  • Developed presentation layer of the project using HTML, JSP 2.0, and JSTL and JavaScript technologies.
  • Used a microservices-based architecture to develop microservices from a large monolithic application.
  • Used the Hibernate 3.0 object/relational mapping framework as the persistence layer for interacting with Oracle 9i.
  • Used various Java and J2EE APIs including XML, Servlets, JSP and JavaBeans.
  • Designed and developed Application based on Struts Framework using MVC design pattern.
  • Developed Struts Action classes using Struts controller component.
  • Written complex SQL queries, stored procedures, functions and triggers in PL/SQL.
  • Configured and used Log4j for logging all the debugging and error information.
  • Developed Ant build scripts for compiling and building the project. Used SVN for version control of the application.
  • Created test plans and JUnit test cases and test suite for testing the application.
  • Participated in the production support and maintenance of the project.

Environment: GWT, Java, Web Logic, UNIX OS, CSS, JavaScript, AJAX, Eclipse, Perforce, Maven, Hudson, HP Client for Automation, Argo UML, Putty, HP Quality Center.

Confidential

Jr. Application Developer

Responsibilities:

  • Played a critical role in the production support and customization of the application, covering requirement gathering, analysis, troubleshooting, administration, production deployment, and development following Agile principles.
  • Involved in the elaboration, construction and transition phases of the Rational Unified Process.
  • Designed and developed necessary UML Diagrams like Use Case, Class, Sequence, State and Activity diagrams using IBM Rational Rose.
  • Used IBM Rational Application Developer (RAD) for development.
  • Extensively applied various design patterns such as MVC-2, Front Controller, Factory, Singleton, Business Delegate, Session Façade, Service Locator, DAO etc. throughout the application for a clear and manageable distribution of roles.
  • Implemented the project as a multi-tier application using Jakarta Struts Framework along with JSP for the presentation tier.
  • Used the Struts Validation Framework for validation and Struts Tiles Framework for reusable presentation components at the presentation tier.
  • Developed various Action Classes that route requests to appropriate handlers.
  • Developed Session Beans to process user requests and Entity Beans to load and store information from the IBM DB2 database.
  • Wrote Stored Procedures and complicated queries for IBM DB2.

Environment: Struts 2.5, MQ Series, JSP 2.0, JMS, JNDI, JDBC, PL/SQL, JavaScript, IBM DB2, IBM Rational Rose, JUnit, CVS, log4j, and LINUX.
