Sr. Big Data Engineer Resume
New Jersey
PROFESSIONAL SUMMARY:
- Technology professional with 7 years of progressive and diverse experience developing Java, Spark, and big data applications, with an emphasis on Hadoop ecosystem tools and technologies, using industry-accepted methodologies and procedures
- Strong experience with the object-oriented and functional features of JVM-based languages such as Java 8 and Scala, and good knowledge of Python
- Experienced in Apache Spark, implementing advanced procedures such as text analytics and processing in Scala using its in-memory computing capabilities
- Hands-on experience with multiple tools in the Hadoop ecosystem, including Spark SQL, Spark Streaming, Hive, Pig, Sqoop, Kafka, and MapReduce, on Hadoop distributions such as Hortonworks, Cloudera, and EMR
- Experience with the full life cycle of migrating data from an existing in-house data center to a cloud platform built on AWS, including the use of multiple compression techniques
- Experience working with multiple RDBMS such as MySQL and Oracle, and with wide-column NoSQL databases such as Cassandra and HBase
- Experience with DevOps tools: Jenkins for continuous integration, Git for source control, and Chef for automating system-level patching and upgrades
TECHNICAL SKILLS:
Big Data Ecosystem: Spark Streaming, Spark SQL, Kafka, MapReduce, Hive, Impala, Sqoop
Operating Systems: Windows, Linux distributions (Ubuntu, Mint, Fedora)
Languages: Java 8, Scala
Scripting Languages: Unix shell scripting, Python scripting
RDBMS: Oracle, MySQL
NoSQL: HBase, Cassandra
Servers: Tomcat, JBoss
Operations: Maven, Jenkins, SVN, Git
Web Services: REST, SOAP
Markup Languages: HTML/HTML5, XML, XML Schema, CSS/CSS3
PROFESSIONAL EXPERIENCE:
Sr. Big Data Engineer
Confidential, New Jersey
Responsibilities:
- Actively participated from the design phase of the data lake, performing POCs with multiple big data tools (Cassandra, HBase, Kafka, AWS) to identify the ones that best solved the business problem at hand.
- Responsible for identifying the scope of testing, project estimation, and resource planning for the project.
- Responsible for preparing the test strategy during the requirement-gathering phase and maintaining testing standards.
- Validated the source table data migrated into the Hive database.
- Performed data analysis, validation, verification, cleansing, and completeness checks, and identified data mismatches.
- Wrote Impala DDL to define tables and map them to equivalent tables in HBase.
- Developed a Spark Streaming application in Scala for one of the data sources, applying the required transformations (a minimal sketch of this pipeline appears at the end of this list).
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Developed a Scala script to read all the Parquet tables in a database and write them out as JSON files, and another to load them as structured Hive tables (a Parquet-to-JSON sketch follows this role's environment line).
- Extensively involved in writing Spark applications with Spark SQL and Spark Streaming for faster data processing.
- Responsible for the design and creation of Hive tables, including partitioning, bucketing, loading data, and writing Hive queries; migrated existing Hive scripts to Spark SQL for better performance.
- Created RESTful services and converted data formats to make the data consumable by those services.
- Developed Spark Streaming jobs in Java 8 to receive real-time data from Kafka, process it, and store it in HDFS.
- Validated and cleansed data using Pig statements, with hands-on experience developing Pig macros.
- Used the Agile (Scrum) methodology for software development.
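The streaming bullets above describe a Kafka-to-HDFS pipeline. Below is a minimal Scala sketch of that shape, assuming Spark 2.x with the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, and output path are illustrative assumptions, not the actual project values.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(30))

        // Hypothetical broker list and consumer group; substitute real values.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "events-ingest",
          "auto.offset.reset"  -> "latest"
        )

        // Direct stream over a hypothetical "events" topic.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
        )

        // Transform each micro-batch and persist every non-empty batch to HDFS.
        stream.map(_.value)
          .foreachRDD { (rdd, time) =>
            if (!rdd.isEmpty())
              rdd.saveAsTextFile(s"hdfs:///data/raw/events/batch-${time.milliseconds}")
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }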
Environment: Java 8, Sqoop, Impala, Kerberos, Spark SQL, Spark Streaming, Kafka, Scala, Amazon Web Services (AWS EMR), Hive, Pig, REST, Oozie, Maven, Control-M, Jenkins.
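A sketch of the Parquet-to-JSON conversion script mentioned above, assuming a Spark 2.x SparkSession with Hive support; the database name and output path are hypothetical.

    import org.apache.spark.sql.SparkSession

    object ParquetToJson {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ParquetToJson")
          .enableHiveSupport()
          .getOrCreate()

        // Enumerate every table in a hypothetical database and rewrite each as JSON on HDFS.
        spark.sql("SHOW TABLES IN analytics_db").collect().foreach { row =>
          val table = row.getAs[String]("tableName")
          spark.table(s"analytics_db.$table")
            .write
            .mode("overwrite")
            .json(s"hdfs:///data/json/$table")
        }
      }
    }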
Big Data Engineer
Confidential, New Jersey
Responsibilities:
- Actively participated in the complete software development life cycle (scope, design, implement, deploy, test), including design and code reviews.
- Moved bulk data into HBase using MapReduce integration.
- Developed Pig programs for loading and filtering streaming data from Kafka into HDFS.
- Developed Scala scripts and UDFs, using both DataFrames/SQL and RDDs/MapReduce in Spark 1.6, for data aggregation and queries, writing data back to the OLTP system through Sqoop (an aggregation sketch appears at the end of this list).
- Performed advanced procedures such as text analytics and processing in Scala, using Spark's in-memory computing capabilities.
- Developed an HBase data model on top of HDFS data to support real-time analytics.
- Used MapReduce and Spark with Scala for operations such as clickstream analysis and analysis of batch data.
- Handled Avro data files in HDFS, passing schemas using Avro tools and MapReduce.
- Optimized Hive queries using partitioning and bucketing techniques to control data distribution (a DDL sketch follows this role's environment line).
- Developed visualizations and dashboards using BI tools such as Tableau and Platfora.
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce, Sqoop, Hive, and Pig jobs that extract data on a schedule.
- Followed a story-driven Agile development methodology and actively participated in daily Scrum meetings.
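A minimal Spark 1.6 sketch of the aggregation pattern described above, using a HiveContext, a DataFrame UDF, and a write to HDFS staged for a downstream Sqoop export. The table, column, and path names, and the normalization UDF, are hypothetical illustrations.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions._

    object DailyAggregates {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("DailyAggregates"))
        val sqlContext = new HiveContext(sc)

        // Hypothetical UDF normalizing a country-code column before aggregation.
        val normalize = udf((s: String) => if (s == null) "UNKNOWN" else s.trim.toUpperCase)

        val daily = sqlContext.table("staging.orders")
          .withColumn("country", normalize(col("country")))
          .groupBy(col("order_date"), col("country"))
          .agg(sum("amount").as("total_amount"), count("*").as("order_count"))

        // Stage the result in HDFS; a separate Sqoop export pushes it to the OLTP system.
        daily.write.mode("overwrite").parquet("hdfs:///data/aggregates/daily_orders")
      }
    }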
Environment: MapReduce, Spark with Scala, Hive, Pig, Sqoop, Oozie, HBase, Platfora, Redis, REST Services, Linux, Maven, Jenkins, HDFS.
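The partitioning/bucketing bullet above refers to Hive table design; the DDL below sketches the idea, issued here through a Spark 1.6 HiveContext though it could equally be run in the Hive shell. The database, table, column names, and bucket count are illustrative assumptions.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object CreateBucketedTable {
      def main(args: Array[String]): Unit = {
        val sqlContext = new HiveContext(new SparkContext(new SparkConf().setAppName("CreateBucketedTable")))

        // Partition by load date so queries filtering on load_date prune whole partitions;
        // bucket by customer id to spread rows evenly for joins and sampling.
        sqlContext.sql("""
          CREATE TABLE IF NOT EXISTS warehouse.events (
            event_id    STRING,
            customer_id BIGINT,
            payload     STRING
          )
          PARTITIONED BY (load_date STRING)
          CLUSTERED BY (customer_id) INTO 32 BUCKETS
          STORED AS ORC
        """)
      }
    }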
Java/J2EE Developer
Confidential
Responsibilities:
- Designed, developed, and validated the user interface using HTML, JavaScript, and XML.
- Handled database access by implementing a controller servlet.
- Implemented PL/SQL stored procedures and triggers.
- Extensively worked on application design and development with Struts, Hibernate, and Spring (Spring Core, Spring MVC).
- Experience using design patterns such as Business Delegate, Session Facade, Service Locator, Singleton, and Model-View-Controller.
- Extensively worked with web services, including SOAP over JMS and HTTP, and REST (JAX-RS); a minimal JAX-RS sketch appears at the end of this section.
- Experience with multithreading and Enterprise JavaBeans (EJB), working with session, entity, and message-driven beans.
- Prepared test procedures, test scenarios, test cases, and test data.
- Created test cases using element locators and Selenium WebDriver methods.
- Developed RDS client code on top of Spring-based SOAP web services to insert, search, retrieve, update, and delete records.
- Involved in regression testing and automation infrastructure development using Selenium.
- Implemented automation frameworks using Selenium, LoadRunner, and UFT.
- Developed applications on WebSphere 8.5 and 7, JBoss, Tomcat, and BEA WebLogic; developed server startup scripts for automation on the JBoss 6.3 application server.
- Strong experience in SOA (Service-Oriented Architecture), EAI (Enterprise Application Integration), and ESB (Enterprise Service Bus).
- Extensive experience developing web pages quickly and effectively using JSF, Ajax, jQuery, JavaScript, HTML, and CSS, and making them cross-browser compatible.
- Strong unit-testing skills with the JUnit framework, including functional JUnit tests that capture user-entered data and map it back to the database to verify accurate results.
- Worked with the application architect to design the workflow and service integration on top of the Spring MVC, Ajax, and web services layers.
- Set up WebSphere Application Server and used Ant to build and deploy the application on WebSphere.
- Used the Spring Framework for dependency injection and integrated it with Hibernate.
- Involved in writing JUnit Test Cases.
- Used Log4j to log errors in the application.
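A minimal sketch of a JAX-RS REST resource like those described above, written in Scala to stay consistent with the other sketches in this document (the project itself was Java/J2EE). The resource path, method, and payload are hypothetical; a JAX-RS implementation such as Jersey discovers the class via its annotations.

    import javax.ws.rs.{GET, Path, PathParam, Produces}
    import javax.ws.rs.core.MediaType

    // Minimal JAX-RS resource exposing a read endpoint over a hypothetical entity.
    @Path("/customers")
    class CustomerResource {

      // GET /customers/{id} returns an illustrative JSON payload.
      @GET
      @Path("/{id}")
      @Produces(Array(MediaType.APPLICATION_JSON))
      def byId(@PathParam("id") id: String): String =
        s"""{"id": "$id", "status": "active"}"""
    }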
Environment: Java, Spring Framework, J2EE, HTML, JUnit, XML, JavaScript, Eclipse, WebLogic, PL/SQL, Maven, Oracle.