Sr. Big Data Engineer Resume
McLean, VA
SUMMARY
- Around 7 years of professional experience involving project development, implementation, deployment and maintenance using Java/J2EE and Big Data related technologies.
- Hadoop Developer with 3 years of working experience in designing and implementing complete end-to-end Hadoop-based data analytical solutions using HDFS, MapReduce, Spark, Yarn, Kafka, PIG, HIVE, Sqoop, Storm, Flume, Oozie, Impala, HBase.
- Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real-time streaming at an enterprise level.
- Experience in working with different Hadoop distributions such as CDH and Hortonworks.
- Experienced in using Spark to improve the performance and optimize existing algorithms in Hadoop, working with Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Experience developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, and extended the default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom, data-specific processing.
- Strong knowledge of the architecture of distributed systems and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HiveQL queries (see the sketch at the end of this summary).
- In-depth understanding of Hadoop architecture and its various components such as the Resource Manager, Application Master, Name Node, Data Node, HBase design principles, etc.
- Experience migrating data between RDBMS/unstructured sources and HDFS using Sqoop.
- Experience in job workflow scheduling and monitoring tools like Oozie and good knowledge on Zookeeper.
- Profound understanding of Partitions and Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
- Worked on NoSQL databases including HBase and MongoDB.
- Experienced in performing CRUD operations using the HBase Java Client API and Solr API.
- Good experience in working with cloud environments like Amazon Web Services (AWS) EC2 and S3.
- Experience in Implementing Continuous Delivery pipeline with Maven, Ant, Jenkins and AWS.
- Experienced in Java Application Development, Client/Server Applications, Internet/Intranet based applications using Core Java, J2EE patterns, Spring, Hibernate, Struts, Web Services (SOAP/REST), Oracle, SQL Server and other relational databases.
- Experience writing Shell scripts in Linux OS and integrating them with other solutions.
- Strong Experience in working with Databases like Oracle 10g, DB2, SQL Server 2008 and MySQL and proficiency in writing complex SQL queries.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Excellent communication, interpersonal and analytical skills and a highly motivated team player with the ability to work independently.
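For illustration, a minimal sketch of the Hive partitioning and bucketing pattern described above, issued through a Hive-enabled Spark session; the database, table, columns and location are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveLayoutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-layout-sketch")
      .enableHiveSupport()               // connect to the Hive metastore
      .getOrCreate()

    // External table partitioned by date and bucketed by customer id (names are hypothetical)
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders (
        |  order_id    BIGINT,
        |  customer_id BIGINT,
        |  amount      DECIMAL(18,2))
        |PARTITIONED BY (order_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC
        |LOCATION '/data/enterprise/sales/orders'""".stripMargin)

    // Partition pruning: only files under order_date='2019-01-01' are scanned
    spark.sql(
      """SELECT customer_id, SUM(amount) AS total
        |FROM sales.orders
        |WHERE order_date = '2019-01-01'
        |GROUP BY customer_id""".stripMargin).show()

    spark.stop()
  }
}
```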
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, Spark, Yarn, Kafka, PIG, HIVE, Sqoop, Storm, Flume, Oozie, Impala, HBase, Hue, Zookeeper.
Programming Languages: Java, PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL
Java/J2EE & Web Technologies: J2EE, EJB, JSF, Servlets, JSP, JSTL, CSS, HTML, XHTML, XML, Angular JS, AJAX, JavaScript, JQuery.
Development Tools: Eclipse, SVN, Git, Ant, Maven, SOAP UI
Databases: Greenplum, Oracle 11g/10g/9i, Teradata, MS SQL
NoSQL Databases: Apache HBase, MongoDB
Frameworks: Struts, Hibernate and Spring MVC.
Distributed platforms: Hortonworks, Cloudera.
Operating Systems: UNIX, Ubuntu Linux and Windows 2000/XP/Vista/7/8
PROFESSIONAL EXPERIENCE
Confidential - McLean VA
Sr. Big Data Engineer
Responsibilities:
- Integrating transaction data with the Mhub (360Science) engine and Loqate (for address verification) to build the customer master database.
- Worked on the name-matching process to generate customer IDs for Hilton customers, using the Mhub inline database to generate unique customer IDs. Worked with the Loqate tool to find the most accurate address code based on the rank each address received.
- Designing and developing Spark/Scala jobs to ingest the transaction data into the enterprise zone, which resides on Hive and Amazon S3 (see the sketch after this list).
- Integrating the required business rules using the Scala programming language and bundling them within the Spark jobs.
- Designing and developing Apache NiFi jobs to move files from transaction systems into the data lake raw zone.
- Designing and developing ingestion scripts to load the transaction data from Amazon S3 into the business zone, which resides on Amazon Redshift.
- Collaborating with Quality Assurance team to get the Spark/Scala code and data certified.
- Troubleshooting the production incidents reported by Business Team.
- Support and enhance existing enterprise information management frameworks.
- Closely collaborating with source- and target-system build managers and analysts to understand those systems' integration capabilities during all phases of implementation, demonstrating knowledge of various ETL integrations.
- Demonstrating technical expertise and analytical skills in data architecture and integration; working closely with product owners, data architects, build managers and other IT partners to deliver project and operational changes to data integrations/transformations within the data lake and analytics services.
- Work with architecture and business teams to schedule, deliver and manage workloads.
- Work with cross functional consulting teams within the data science and analytics team to design, develop, and execute solutions to derive business insights and solve clients' operational and strategic problems.
- Assisting business teams with data feeds and testing.
- Implementing real-time data analytics and visualizations using data streams and business intelligence APIs.
- Implementing complex business rules and common services for enterprise data analytics.
- Working closely with development, engineering and operations teams to jointly develop key deliverables, ensuring production scalability and stability.
- Evaluate and introduce technology tools and processes that enable the organization to develop products and solutions, embrace business opportunities and/or improve operational efficiency.
- Work with Architecture and Development teams to understand usage patterns and workload requirements of new projects in order to ensure the Hadoop platform can effectively meet the performance requirements and service levels of applications.
- Develop and enhance platform best practices and educate developers on the same.
- Good understanding of establishing analytic environments required for structured, semi-structured and unstructured data.
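Below is a minimal sketch of such a Spark/Scala ingestion job, assuming a Hive-enabled SparkSession and S3 access through the s3a connector; the bucket, paths, columns and business rule are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TransactionIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transaction-ingest")
      .enableHiveSupport()
      .getOrCreate()

    // Read raw transaction files from the data lake raw zone on S3 (hypothetical bucket/path)
    val raw = spark.read.option("header", "true").csv("s3a://raw-zone/transactions/")

    // Example business rule: keep settled transactions and standardize the amount column
    val curated = raw
      .filter(col("status") === "SETTLED")
      .withColumn("amount", col("amount").cast("decimal(18,2)"))
      .withColumn("load_dt", current_date())

    // Write to the enterprise zone as a partitioned Hive table backed by S3
    curated.write
      .mode("append")
      .partitionBy("load_dt")
      .format("parquet")
      .saveAsTable("enterprise.transactions")   // hypothetical database/table

    spark.stop()
  }
}
```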
Environment: Apache Spark, Scala, Hive, HDFS, Hortonworks, Aurora MySQL, 360 Science Mhub, Loqate, Apache HBase, AWS S3, AWS Redshift, Maven, Oozie, Apache NiFi, IntelliJ and UNIX Shell Scripting.
Confidential, NY
Hadoop Developer
Responsibilities:
- Worked on the Lily HBase Indexer to index data added, updated or deleted in the HBase database into a Solr collection; indexing allows data stored in HBase to be queried with the Solr service.
- Working on the Spark stack to develop preprocessing jobs that use the RDD, Dataset and DataFrame APIs to transform data for upstream consumption.
- Used the Lily Indexer to support flexible, custom, application-specific rules to extract, transform and load HBase data into Solr.
- Ran Spark on Treadmill to deploy a cluster from scratch in a couple of minutes.
- Implemented Moving averages, Interpolations and Regression analysis on input data using Spark with Scala.
- Worked on a POC for streaming data using Kafka and Spark Streaming.
- Implemented a Kafka consumer with Spark Streaming and Spark SQL using Scala (see the sketch after this list).
- Pulled data from an Amazon S3 bucket into the data lake, built Hive tables on top of it, created Spark DataFrames over that data and performed further analysis.
- Created HBase tables to store variable data formats of input data coming from different portfolios
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
- Solved performance issues in Hive and Pig scripts by understanding joins, grouping and aggregation and how they translate to MapReduce jobs.
- Created, modified and executed DDL and ETL scripts for de-normalized tables to load data into Hive and AWS Redshift tables.
- Extensively used the Talend Big Data tool to load large volumes of source files from S3 to Redshift.
- Designed and managed External tables with right partition strategies to optimize performance in Hive.
- Responsible for gathering the business requirements for the Initial POCs to load the enterprise data warehouse data to Greenplum databases.
- Developed and maintained large-scale distributed data platforms, with experience in data warehouses, data marts and data lakes.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Applied data science and machine learning techniques using Zeppelin to improve the search engine at a wealth management firm.
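A minimal sketch of the Kafka consumer with Spark Streaming and Spark SQL mentioned above, written against the Spark 1.6-era direct-stream API; the broker addresses, topic name and target table are hypothetical.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaSparkSqlPoc {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("kafka-spark-sql-poc"))
    val ssc = new StreamingContext(sc, Seconds(30))        // 30-second micro-batches
    val hiveContext = new HiveContext(sc)

    // Direct stream from Kafka; broker list and topic are hypothetical
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("transactions"))

    stream.map(_._2).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        // Parse each JSON message into a DataFrame and aggregate with Spark SQL
        val df = hiveContext.read.json(rdd)
        df.registerTempTable("txn_batch")
        hiveContext
          .sql("SELECT account_id, SUM(amount) AS total FROM txn_batch GROUP BY account_id")
          .write.mode("append").saveAsTable("stage.txn_agg")   // hypothetical target table
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```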
Environment: Java 1.8, Scala 2.11, Apache Spark 1.6.0, Apache Zeppelin, AWS Redshift, GreenPlum 4.3 (PostgreSQL), Treadmill, CDH 5.8.2, ivy 2.0, Gradle 2.13, Hive, HDFS, Sqoop 1.4.3, Apache SOLR, Apache HBase, UNIX Shell Scripting, AWS S3, Jenkins.
Confidential, Eagan MN
Hadoop/Spark Developer
Responsibilities:
- Active member in developing POCs for real-time data processing applications using Scala; implemented Apache Spark Streaming from our streaming source, the WSO2 JMS message broker. JSON data for real-time processing was made available as event streams on the WSO2 ESB for streaming ingestion into Spark.
- Involved in developing a POC that configured Kafka brokers to pipeline server log data into Spark Streaming for real-time processing.
- Worked on Spark processing to load data from HDFS into Hive managed tables in the stage layer using data processing jobs.
- Involved in adding data to new partitions of the Hive external staging table to read data from those partitions, and loaded the external Hive ORC tables with Snappy compression (see the sketch after this list).
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
- Developed HQL queries to implement select, insert, update and delete operations against the database by creating HQL named queries.
- Used IIG (Confidential Information Grid) jobs so that batch files received via the PRIME data intake process are ingested into the HDFS raw layer.
- Worked on the job-scheduling workflow designed in the IIG tool, supported by the Airflow scheduling platform.
- Involved in designing and developing tables in HBase and storing aggregated data from Hive Table.
- Extensively worked on creating roles and providing grant permissions for specific active directory groups.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries. Implemented Partitioning and bucketing in Hive based on the requirement.
- Created tables in HiveQL and used a SerDe to analyze the JSON files from HBase.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive.
- Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes. Strong experience in Data Warehousing and ETL using DataStage.
- Evaluate and introduce technology tools and processes that enable the organization to develop products and solutions, embrace business opportunities and/or improve operational efficiency.
- Work with Architecture and Development teams to understand usage patterns and workload requirements of new projects in order to ensure the Hadoop platform can effectively meet the performance requirements and service levels of applications.
- Good understanding of establishing analytic environments required for structured, semi-structured and unstructured data.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Handled onsite-offshore synchronization, keeping teams at both ends well connected to maintain a smooth project flow and resolve roadblocks.
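A minimal sketch of the staging-partition and ORC/Snappy load pattern described above, issued through a HiveContext as in Spark 1.6; the database, tables, columns and partition values are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object StageToOrcLoad {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("stage-to-orc-load"))
    val hiveContext = new HiveContext(sc)

    // Register the newly arrived files as a partition of the external staging table
    hiveContext.sql(
      """ALTER TABLE stage.claims_ext ADD IF NOT EXISTS
        |PARTITION (load_dt = '2018-06-01')
        |LOCATION '/data/raw/claims/2018-06-01'""".stripMargin)

    // Target table stored as ORC with Snappy compression
    hiveContext.sql(
      """CREATE TABLE IF NOT EXISTS stage.claims_orc (
        |  claim_id  BIGINT,
        |  member_id BIGINT,
        |  amount    DECIMAL(18,2))
        |PARTITIONED BY (load_dt STRING)
        |STORED AS ORC
        |TBLPROPERTIES ('orc.compress' = 'SNAPPY')""".stripMargin)

    // Load the new partition from staging into the compressed ORC table
    hiveContext.sql(
      """INSERT OVERWRITE TABLE stage.claims_orc PARTITION (load_dt = '2018-06-01')
        |SELECT claim_id, member_id, amount
        |FROM stage.claims_ext
        |WHERE load_dt = '2018-06-01'""".stripMargin)
  }
}
```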
Environment: Scala 2.11, Apache Spark 1.6, WSO2, IIG (Confidential Information Grid), Airflow, Hadoop 2.6, Hive, HDFS, Sqoop, SQL, Apache HBase, AWS, UNIX Shell Scripting, Agile Methodology, AutoSys and Subversion.
Confidential, IN
Java/SQL Developer
Responsibilities:
- Involved in projects utilizing Java and Java EE web applications to create fully integrated client management systems.
- Developed the UI using HTML, JavaScript and JSP, and developed business logic and interfacing components using Business Objects, JDBC and XML.
- Designed, developed and analyzed the front end and back end using JSP, Servlets and Spring.
- Developed several SOAP web services supporting XML to expose information from the Customer Registration System.
- Created Maven archetypes for generating fully functional SOAP web services supporting XML message transformation.
- Implemented Log4j to log errors and messages for ease of debugging.
- Designed and developed a Struts-like MVC 2 web framework using the front-controller design pattern, which is used successfully in several production systems.
- Developed SQL scripts to perform various joins, subqueries, nested queries and Insert/Update/Delete operations on MS SQL database tables.
- Normalized the Oracle database, conforming to design concepts and best practices.
- Created database programs in SQL Server to manipulate data accumulated by internet transactions.
- Wrote Servlet classes to generate dynamic HTML pages.
- Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
- Developed the XML Schema and Web services for the data maintenance and structures. Wrote test cases in JUnit for unit testing of classes.
- Used the DOM and DOM functions with Firefox and the IE Developer Toolbar for IE.
- Debugged the application using Firebug to traverse the documents.
- Experience in modeling principles, database design and programming, creating E-R diagrams and data relationships to design databases.
- Involved in developing web pages using HTML and JSP.
- Involved in writing procedures and complex queries using PL/SQL to extract data from the database, delete data and reload data into the Oracle database.
- Integrated SSRS reports into SharePoint using various web parts and delivery mechanisms.
- Provided technical support for production environments, resolving issues, analyzing defects, and providing and implementing solutions for defects.
- Involved in writing SQL queries and stored procedures, and used JDBC for database connectivity with MySQL Server (see the sketch after this list).
- Developed the presentation layer using CSS and HTML based on Bootstrap to support multiple browsers.
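A minimal sketch of the JDBC connectivity and stored-procedure usage mentioned above, written in Scala for consistency with the other sketches even though the project itself used Java; the connection details, table, columns and procedure name are hypothetical.

```scala
import java.sql.DriverManager

object CustomerDao {
  def main(args: Array[String]): Unit = {
    // Hypothetical MySQL connection details
    val conn = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/crm", "app_user", "app_password")
    try {
      // Parameterized query against a hypothetical customers table
      val stmt = conn.prepareStatement(
        "SELECT customer_id, name FROM customers WHERE status = ?")
      stmt.setString(1, "ACTIVE")
      val rs = stmt.executeQuery()
      while (rs.next())
        println(s"${rs.getLong("customer_id")}: ${rs.getString("name")}")

      // Calling a hypothetical stored procedure through a CallableStatement
      val call = conn.prepareCall("{call archive_closed_accounts(?)}")
      call.setInt(1, 90)   // e.g. accounts closed more than 90 days ago
      call.execute()
    } finally {
      conn.close()
    }
  }
}
```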
Environment: Java, XML, HTML, JavaScript, JDBC, CSS, PL/SQL, MS SQL Server Reporting Services, MS SQL Server Analysis Services, SQL Server 2008 (SSRS & SSIS), Oracle 10g, Web MVC, Eclipse, Ajax, JQuery, Log4j, Spring with Hibernate and Apache Tomcat.