Sr. Data Engineer Resume
Hartford, CT
SUMMARY
- 8 years of extensive experience, including 3 years in Big Data and Big Data analytics across the E-commerce, Education, Financial, and Healthcare domains, and over 5 years of professional experience in the design, development, and support of Enterprise, Web, and Client-Server applications using Java, J2EE (JSP, Servlets, Spring, JSF, Struts, Web Services (SOAP, REST), Hibernate), JDBC, HTML, and JavaScript.
- Experience in developing Apache Spark jobs using Scala for faster data processing and Spark SQL for querying. Excellent understanding of Hadoop architecture and daemons such as HDFS NameNode, DataNode, JobTracker, and TaskTracker.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop, HDFS, MapReduce programming, Hive, Pig, Sqoop, HBase, Impala, Solr, Elasticsearch, Oozie, ZooKeeper, Kafka, Spark, and Cassandra with Cloudera and Hortonworks distributions.
- Created custom Solr Query segments to optimize ideal search matching.
- Developed Spark applications using Python (PySpark).
- Used Solr Search & MongoDB for querying and storing data.
- Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala.
- Analyzed the Cassandra/SQL scripts and designed the solution to implement using Scala.
- Expertise in Big Data Technologies and Hadoop Ecosystem tools like Flume, Sqoop, HBase, Zookeeper, Oozie, MapReduce, Hive, PIG and YARN.
- Extracted and updated data in MongoDB using the mongoimport and mongoexport command-line utilities.
- Developed collections in MongoDB and performed aggregations on the collections.
- Hands-on experience in installation, configuration, management, and deployment of Big Data solutions and the underlying infrastructure of a Hadoop cluster using Cloudera and Hortonworks distributions.
- In-depth Knowledge of Data Structures, Design and Analysis of Algorithms.
- Good understanding of Data Mining and Machine Learning techniques.
- Hands-on experience with various Hadoop distributions: IBM BigInsights, Cloudera, Hortonworks, and MapR.
- In-depth understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming, Spark MLlib.
- Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and performed the data transformations using Spark Core (see the sketch after this list).
- Expertise in developing Real-Time Streaming Solutions using Spark Streaming.
- Proficient in big data ingestion and streaming tools like Flume, Sqoop, Spark, Kafka and Storm.
- Hands-on experience in developing MapReduce programs using Apache Hadoop for analyzing Big Data.
- Expertise in implementing ad-hoc MapReduce programs using Pig scripts.
- Experience in importing and exporting data from RDBMS to HDFS, Hive tables and HBase by using Sqoop.
- Experience in importing streaming data into HDFS using Flume sources and Flume sinks, and transforming the data using Flume interceptors.
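A minimal PySpark sketch of the RDD-to-DataFrame pattern referenced above; the resume mentions both Scala and PySpark, and PySpark is used here, with all paths and column names being illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("rdd-dataframe-sketch").getOrCreate()

# RDD transformations and an action: parse pipe-delimited text (hypothetical path).
lines = spark.sparkContext.textFile("hdfs:///data/input/events.txt")
records = (lines.map(lambda line: line.split("|"))
                .filter(lambda f: len(f) == 3)
                .map(lambda f: (f[0], f[1], float(f[2]))))
print("record count:", records.count())  # the action triggers the computation

# The same data expressed through the DataFrame / Spark SQL API.
df = records.toDF(["customer_id", "event_type", "amount"])
summary = df.groupBy("event_type").agg(F.sum("amount").alias("total_amount"))
summary.show()
```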
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, YARN, Pig, HBase, Spark, Zookeeper, Hive, Oozie, Sqoop, Flume, Kafka, Storm, Impala
Hadoop Distribution Systems: Apache, Hortonworks, Cloudera, MapR
Programming Languages: Java (JDK 1.6/1.8), Python, Scala, C/C++, HTML, SQL, PL/SQL, AVS & JVS
Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x
Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey
Operating Systems: UNIX, Windows, LINUX
Web/Application Servers: IBM WebSphere, Apache Tomcat, WebLogic, JBOSS
Web technologies: JSP, Servlets, JNDI, JDBC, Java Beans, JavaScript
Databases: Oracle, MS-SQL Server, MySQL
NoSQL Databases: HBase, Cassandra, MongoDB
IDE: Eclipse 3.x
Version Control: Git, SVN
AWS services: AWS EC2, S3, VPC
PROFESSIONAL EXPERIENCE
Confidential, Hartford, CT
Sr. Data Engineer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked in Agile iterative sessions to create a Hadoop Data Lake for the client.
- Worked closely with the customer to understand the business requirements and implemented them.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs and Scala/Python.
- Extracted files from a NoSQL database (MongoDB) and processed them with Spark using the MongoDB Spark connector (see the sketch after this list).
- Involved in creating Hive tables, and loading and analyzing data using hive queries.
- Wrote Hive queries on the analyzed data for aggregation and reporting.
- Imported and exported data from different databases into HDFS and Hive using Sqoop.
- Used HUE and Aginity workbench for Hive Query execution.
- Hands on design and development of an application using Hive (UDF).
- Developed simple to complex MapReduce streaming jobs in Python that are implemented using Hive and Pig.
- Used Sqoop for loading existing metadata from Oracle to HDFS.
- Developed Python scripts and UDFs using both DataFrames/Spark SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
- Worked on Apache Spark, writing Python applications to parse and convert TXT, XML, and JSON files.
- Developed a collect UDF to collect arrays of structs using the Brickhouse implementation.
- Collected the necessary data into one central big data lake; this centralized data lake feeds a Tableau dashboard to provide clear reporting.
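A minimal sketch of the MongoDB read mentioned above, assuming the MongoDB Spark connector is on the classpath; the format name and option keys differ between connector versions, and the URI, collection, and column names are illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("mongo-spark-sketch")
         # Hypothetical connection string: database "claims", collection "events".
         .config("spark.mongodb.input.uri", "mongodb://localhost:27017/claims.events")
         .getOrCreate())

# "mongo" is the data source name for connector 2.x/3.x (10.x uses "mongodb").
events = spark.read.format("mongo").load()

# Ordinary DataFrame transformations on the loaded collection.
daily_counts = (events
                .withColumn("event_date", F.to_date("event_ts"))  # assumed timestamp field
                .groupBy("event_date")
                .count())
daily_counts.show()
```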
Environment: MapReduce, HDFS, Hive, Pig, Spark, Spark Streaming, Spark SQL, YARN, Linux, Sqoop, Java, Scala, Tableau, Python, SOAP, REST, CDH4, CDH5, AWS, Eclipse, Oracle, Git, Shell Scripting, and Cassandra.
Confidential, Boston, MA
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Extensively worked on Elasticsearch querying and indexing to retrieve documents at high speed.
- Ingested data into Elasticsearch for fast search.
- Loaded JSON from upstream systems using Spark Streaming and wrote it to Elasticsearch (see the sketch at the end of this list).
- Wrote various key queries in Elasticsearch for effective data retrieval.
- Used Spark-Streaming APIs to perform necessary transformations.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts. Implemented the data loading part (XML load).
- Worked on Apache Spark, writing Python applications to parse and convert TXT and XLS files.
- Involved in converting Hive/SQL queries into Spark transformations using Spark SQL, Python, and Scala.
- Developed Spark scripts using Scala shell commands as per the requirement.
- Extensively worked on the core and Spark SQL modules of Spark.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Used reporting tools like Kibana to connect with Hive for generating daily reports of data.
- Involved in development of a Storm topology for ingestion of data through XML payloads and then loading them into various distributed stores.
- Extensively worked on MongoDB, including CRUD operations, sharding, etc.
- Developed REST services which processed several requests triggered from UI.
- Built a data pipeline using Pig and Java MapReduce to store data onto HDFS in the initial version of the product.
- Stored the output files for export onto HDFS; these files are later picked up by downstream systems.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data, and developed very quick POCs on Spark in the initial stages of the product.
- Managed and supported Infoworks, the data ingestion and integration tool for the Data Lake.
- Planned the data ingestion and integration process from the EDW environment into a data lake in HDFS and tested Solr for index search.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
- Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Used RESTful web services in JSON format to develop server applications.
- Used Kafka with Spark (YARN-based) to pump and analyze real-time data in the data lake.
- Experienced working with HDP 2.4.
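A minimal sketch of the Kafka-to-Elasticsearch streaming path referenced above, assuming the Kafka source and the Elasticsearch-Hadoop connector are on the classpath; broker, topic, index, and field names are illustrative, and Structured Streaming is used here although the original work relied on the Spark Streaming (DStream) APIs:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-es-sketch").getOrCreate()

# Hypothetical schema for the upstream JSON payload.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

# Read JSON events from a Kafka topic (broker and topic names are assumptions).
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "events")
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(F.from_json("json", schema).alias("e"))
             .select("e.*"))

# Write each micro-batch to Elasticsearch through the ES-Hadoop Spark data source.
def write_to_es(batch_df, batch_id):
    (batch_df.write
        .format("org.elasticsearch.spark.sql")
        .option("es.nodes", "es-host:9200")   # hypothetical cluster address
        .mode("append")
        .save("events-index"))                # hypothetical index name

query = (parsed.writeStream
         .foreachBatch(write_to_es)
         .option("checkpointLocation", "/tmp/kafka-es-checkpoint")
         .start())
query.awaitTermination()
```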
Environment: Hortonworks, Linux, Java, Python, MapReduce, HDFS, Hive, Pig, Sqoop, Apache Spark, Apache Storm, Elasticsearch, Kafka, Zookeeper, and Kibana.
Confidential, San Francisco, CA
Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources using Sqoop.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed Simple to complex MapReduce Jobs using Hive and Pig.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Developed Pig Latin scripts to extract data from the output files and load it into HDFS.
- Developed workflows in the Oozie workflow scheduler to automate and schedule the tasks of loading data into HDFS and pre-processing it with Pig.
- Responsible for designing and developing solutions for data extraction, cleansing, and transformation using tools like MSBI, Azure DW, Azure Data Lake, Azure PolyBase, and Azure HDInsight (Spark).
- Implemented UDFs in Java for Hive to process data transformations that cannot be performed using Hive's built-in functions.
- Developed simple to complex UNIX shell/Bash scripts as part of the framework development process.
- Developed complex Talend job mappings to load data from various sources using different components.
- Designed, developed, and implemented solutions using Talend Integration Suite.
- Worked on implementing Flume to import streaming data logs and aggregate the data to HDFS through Flume.
- Worked on POCs to integrate Spark with other tools.
- Involved in installing the AWS EMR framework.
- Set up a multi-node Hadoop cluster on Amazon EC2 with Pig, Hive, and Sqoop ecosystem tools.
- Experienced in moving data to Amazon S3 and running EMR programs on data stored in S3.
- Created Parquet Hive tables with complex data types corresponding to the Avro schema (see the sketch after this list).
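A minimal sketch of the Avro-to-Parquet Hive table step referenced above, assuming the spark-avro package is available and Hive support is enabled; the paths and database/table name are illustrative:

```python
from pyspark.sql import SparkSession

# Hive support lets saveAsTable register the result in the Hive metastore.
spark = (SparkSession.builder
         .appName("avro-to-parquet-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read Avro files; complex types (records, arrays, maps) become Spark
# structs, arrays, and maps, so the nested schema is preserved.
avro_df = spark.read.format("avro").load("hdfs:///data/raw/customers_avro")

# Persist as a Parquet-backed Hive table (hypothetical db.table name).
(avro_df.write
    .format("parquet")
    .mode("overwrite")
    .saveAsTable("analytics.customers_parquet"))
```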
Environment: HDFS, Sqoop, Hive, HBase, Pig, Flume, YARN, Oozie, Spark, Talend ETL, Apache Parquet, Amazon EC2, AWS EMR, Amazon S3, UNIX/Linux Shell Scripting, NoSQL, JIRA.
Confidential, NJ
Hadoop Developer
Responsibilities:
- Analyzed large datasets to provide strategic direction to the company.
- Collected logs from the physical machines and integrated them into HDFS using Flume.
- Involved in analyzing the system and the business.
- Developed SQL statements to improve back-end communications.
- Loaded unstructured data into Hadoop File System (HDFS).
- Created ETL jobs to load Twitter JSON data and server data into MongoDB and transferred the MongoDB data into the data warehouse (see the sketch at the end of this list).
- Created reports and dashboards using structured and unstructured data.
- Involved in importing data from MySQL to HDFS using SQOOP.
- Involved in writing Hive queries to load and process data in Hadoop File System.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in working with Impala for the data retrieval process.
- Exported data from Impala to Tableau reporting tool, created dashboards on live connection.
- Performed sentiment analysis on reviews of the products on the client's website.
- Exported the resulting sentiment analysis data to Tableau for creating dashboards.
- Experienced in Agile processes and delivered quality solutions in regular sprints.
- Developed custom MapReduce programs to extract the required data from the logs.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts. Implemented the data loading part (XML load).
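A minimal sketch of the JSON-to-MongoDB load referenced earlier in this list; the resume does not name the client library, so pymongo is assumed, and the connection string, database, collection, and file path are illustrative:

```python
import json

from pymongo import MongoClient  # assumed client library

# Hypothetical connection and target names.
client = MongoClient("mongodb://localhost:27017")
collection = client["social"]["tweets"]

# Load newline-delimited Twitter JSON and insert it in batches.
batch, batch_size = [], 1000
with open("/data/raw/tweets.json", "r", encoding="utf-8") as fh:
    for line in fh:
        line = line.strip()
        if not line:
            continue
        batch.append(json.loads(line))
        if len(batch) >= batch_size:
            collection.insert_many(batch)
            batch = []
if batch:
    collection.insert_many(batch)

print("documents loaded:", collection.count_documents({}))
```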
Environment: Hadoop, HDFS, Pig, Sqoop, Oozie, HBase, Shell Scripting, Ubuntu, Red Hat Linux.
Confidential, New York, NY
Java/J2EE Developer
Responsibilities:
- Responsible for all stages of design, development, and deployment of applications.
- Used Agile (SCRUM) methodologies for Software Development.
- Implemented the application using the Struts2 framework, which is based on the Model-View-Controller design pattern.
- Developed custom tags to simplify the JSP 2.0 code. Designed UI screens using JSP 2.0, Ajax, and HTML. Used JavaScript for client-side validation.
- Actively involved in designing and implementing the Value Object, Service Locator, MVC, and DAO design patterns.
- Developed and used JSP custom tags in the web tier to dynamically generate web pages.
- Used Java Message Service for reliable and asynchronous exchange of important information such as order submission; consumed the messages from the Java message queue and generated emails to be sent to the customers.
- Designed and developed stateless session beans (EJB 3).
- Used jQuery as a JavaScript library.
- Used the Data Access Object (DAO) pattern to introduce an abstraction layer between the business logic tier (business objects) and the persistent storage tier (data source).
- Implemented session EJBs at the middle tier to house the business logic.
- Used RESTful web services for sending and getting data from different applications using the Jersey framework.
- Developed stored procedures and complex packages extensively using PL/SQL and shell programs.
- Used DB2 as the database and developed complex SQL queries.
- Used the JUnit framework for unit testing of the application and Maven to build the application, which was deployed on WebSphere 8.5. Used RAD 7.5 as the IDE.
- Used HP Quality Center for defect reporting and tracking.
- Prepared Low-Level Design, High-Level Design, and Unit Testing Results documents.
- Used Log4J for logging.
Environment: Struts2, EJB 3, WebSphere 8.5, jQuery, Java 1.6, REST (Jersey), JSP 2.0, Servlets 2.5, JMS, XML, JavaScript, UML, HTML5, JNDI, CVS, Log4J, JUnit, Eclipse.
Confidential
Java Developer
Responsibilities:
- Analyzed business requirements based on the Business Requirement Specification document.
- Involved in System Requirements study and conceptual design.
- Created UML diagrams like activity diagrams, sequence diagrams, and Use case diagrams.
- Developed the presentation layer of the project using HTML, JSP 2.0, JSTL, and JavaScript technologies.
- Used a microservices-based architecture to develop microservices from a large monolithic application.
- Used the Hibernate 3.0 object/relational mapping framework as the persistence layer for interacting with Oracle 9i.
- Used various Java and J2EE APIs including XML, Servlets, JSP and JavaBeans.
- Designed and developed Application based on Struts Framework using MVC design pattern.
- Developed Struts Action classes using Struts controller component.
- Wrote complex SQL queries, stored procedures, functions, and triggers in PL/SQL.
- Configured and used Log4j for logging all the debugging and error information.
- Developed Ant build scripts for compiling and building the project. Used SVN for version control of the application.
- Created test plans, JUnit test cases, and a test suite for testing the application.
- Participated in the production support and maintenance of the project.
Environment: GWT, Java, WebLogic, UNIX OS, CSS, JavaScript, AJAX, Eclipse, Perforce, Maven, Hudson, HP Client for Automation, ArgoUML, Putty, HP Quality Center.
Confidential
JR Application Developer
Responsibilities:
- Played a critical role in the production support and customization of the application, covering requirement gathering, analysis, troubleshooting, administration, production deployment, and development following Agile principles.
- Involved in the elaboration, construction, and transition phases of the Rational Unified Process.
- Designed and developed necessary UML Diagrams like Use Case, Class, Sequence, State and Activity diagrams using IBM Rational Rose.
- Used IBM Rational Application Developer (RAD) for development.
- Extensively applied various design patterns such as MVC-2, Front Controller, Factory, Singleton, Business Delegate, Session Façade, Service Locator, and DAO throughout the application for a clear and manageable distribution of roles.
- Implemented the project as a multi-tier application using the Jakarta Struts framework along with JSP for the presentation tier.
- Used the Struts Validation Framework for validation and the Struts Tiles Framework for reusable presentation components at the presentation tier.
- Developed various Action classes that route requests to appropriate handlers.
- Developed session beans to process user requests and entity beans to load and store information from the IBM DB2 database.
- Wrote stored procedures and complicated queries for IBM DB2.
Environment: Struts 2.5, MQ Series, JSP 2.0, JMS, JNDI, JDBC, PL/SQL, JavaScript, IBM DB2, IBM Rational Rose, JUnit, CVS, log4j, and LINUX.