Sr. Big Data Engineer Resume
Redmond, WA
PROFESSIONAL SUMMARY:
- Over 7 years of IT experience in software analysis, design, development, testing and implementation of Big Data, Hadoop, NoSQL and Java/J2EE technologies.
- In-depth experience and good knowledge of Hadoop ecosystem tools such as MapReduce, HDFS, Pig, Hive, Kafka, YARN, Sqoop, Storm, Spark, Oozie and Zookeeper.
- Excellent understanding and extensive knowledge of Hadoop architecture and its ecosystem components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Experienced in using Apache Hadoop along with the enterprise distributions from Cloudera and Hortonworks.
- Good knowledge of the MapR distribution and Amazon EMR.
- Good knowledge of data modeling, use-case design and object-oriented concepts.
- Well versed in installing, configuring, supporting and managing Hadoop clusters and the underlying Big Data infrastructure.
- Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming and GraphX.
- Extensively worked on Spark Streaming and Apache Kafka to ingest live streaming data.
- Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala and Python.
- Implemented dynamic partitions and buckets in Hive for efficient data access (see the sketch following this summary).
- Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Integrated Hive queries into the Spark environment using Spark SQL.
- Hands-on experience performing real-time analytics on big data using HBase and Cassandra on Kubernetes and Hadoop clusters.
- Experience in using Flume to stream data into HDFS.
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Good knowledge of developing data pipelines using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
- Good knowledge of workflow scheduling and cluster coordination tools such as Oozie and Zookeeper.
- Hands-on experience working with NoSQL databases including HBase, Cassandra and MongoDB, and their integration with Hadoop and Kubernetes clusters.
- Proficient in cluster management and configuring Cassandra databases.
- Extensive experience in developing Pig Latin Scripts and using Hive Query Language for data analytics.
- Good working experience with different file formats (Parquet, TextFile, Avro, ORC) and compression codecs (Gzip, Snappy, LZO).
- Built secure AWS solutions by creating VPCs with private and public subnets.
- Expertise in configuring Amazon RDS (Relational Database Service).
- Worked extensively on configuring Auto Scaling for high availability.
- Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho.
- Experience working with Java/J2EE, JDBC, ODBC, JSP, Eclipse, JavaBeans, EJB and Servlets.
- Expert in developing web page interfaces using JSP, Java Swing and HTML.
- Experience working with the Spring and Hibernate frameworks for Java.
- Experience in using IDEs such as Eclipse, NetBeans and IntelliJ IDEA.
- Proficient using version control tools like GIT, VSS, SVN and PVCS.
- Experience with web-based UI development using jQuery UI, jQuery, CSS, HTML, HTML5, XHTML and JavaScript.
- Development experience in DBMSs such as Oracle, MS SQL Server, Teradata and MySQL.
- Developed stored procedures and queries using PL/SQL.
- Hands-on experience with best practices of web services development and integration (both REST and SOAP).
- Experience working with build tools such as Ant, Maven, SBT and Gradle to build and deploy applications to servers.
- Expertise in Object Oriented Analysis and Design (OOAD) and knowledge in Unified Modeling Language (UML).
- Expertise in the complete Software Development Life Cycle (SDLC) in Waterfall and Agile/Scrum models.
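The Hive dynamic-partitioning and bucketing item above can be illustrated with a minimal Spark SQL sketch in Scala. It assumes a Hive-enabled SparkSession; the table and column names (raw_web_events, web_events, event_date, user_id) are illustrative placeholders rather than details from any specific project.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so Spark SQL can create and load Hive tables.
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Enable dynamic partitioning so partition values come from the data itself.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Date-partitioned target table (names are illustrative).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS web_events (user_id BIGINT, url STRING)
        |PARTITIONED BY (event_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert: each distinct event_date lands in its own partition.
    spark.sql(
      """INSERT INTO TABLE web_events PARTITION (event_date)
        |SELECT user_id, url, event_date FROM raw_web_events""".stripMargin)

    // Bucketing via Spark's native writer: 32 buckets on user_id to speed up joins.
    spark.table("web_events")
      .write
      .bucketBy(32, "user_id")
      .sortBy("user_id")
      .format("orc")
      .saveAsTable("web_events_bucketed")

    spark.stop()
  }
}
```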
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Oozie, Zena/Zeke scheduling, Zookeeper, Flume, Kafka, Spark Core, Spark SQL, Spark Streaming, AWS, Azure Data Lake
NoSQL Databases: HBase, Cassandra, MongoDB
Cloud Platforms: AWS (EC2, S3), MS Azure, Azure Data Lake
Build Management Tools: Maven, Apache Ant
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, UNIX shell scripting
Frameworks: MVC, Spring, Hibernate, Struts 1/2, EJB, JMS, JUnit, MRUnit
Version Control & CI: GitHub, Jenkins
IDE and Tools: Eclipse 4.6, NetBeans 8.2, BlueJ
Databases: Oracle 12c/11g, Confidential SQL Server 2016/2014, DB2 & MySQL 4.x/5.x
Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, STLC (Software Testing Life cycle), UML, Design Patterns (Core Java and J2EE)
Web Technologies: HTML5/4, DHTML, AJAX, JavaScript, jQuery and CSS3/2, JSP, Bootstrap 3/3.5
WORK EXPERIENCE:
Confidential - Redmond, WA
Sr. Big Data Engineer
Responsibilities:
- As a Big Data Engineer, followed the Agile Scrum methodology to help manage and organize a team of 4 developers, with regular code review sessions.
- Participated in code reviews, enhancement discussions, maintenance of existing pipelines and systems, testing and bug-fix activities on an ongoing basis.
- Worked closely with business analysts to convert business requirements into technical requirements and prepared low-level and high-level documentation.
- Interacted with the ETL team to understand ingestion of data from ETL into Azure Data Lake in order to develop predictive analytics.
- Built a prototype Azure Data Lake application that accesses 3rd party data services via Web Services.
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Created various documents such as the source-to-target data mapping document, unit test cases and the data migration document.
- Imported data from structured data source into HDFS using Sqoop incremental imports.
- Created Hive tables, partitions and implemented incremental imports to perform ad-hoc queries on structured data.
- Worked with Azure ExpressRoute to create private connections between Azure datacenters and infrastructure for on-premises and co-location environments.
- Improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL and Spark on YARN.
- Built a Data Sync job on Windows Azure to synchronize data from SQL Server 2012 databases to SQL Azure.
- Developed SQL scripts using Spark to handle different data sets and verified their performance against MapReduce jobs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs with Scala and Python.
- Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
- Wrote complex SQL and PL/SQL queries for stored procedures.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
- Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.
- Integrated Kafka with Spark Streaming for high-throughput, reliable processing (see the sketch at the end of this section).
- Tuned Hive and Pig to improve performance and solved performance issues in both types of scripts.
- Created Azure Event Hubs for application instrumentation and for user experience and workflow processing.
- Implemented Security in Web Applications using Azure and Deployed Web Applications to Azure.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
Environment: Agile, Hive, MS SQL Server 2012, Sqoop, Azure Data Lake, Storm, Kafka, HDFS, AWS, Data mapping, Hadoop, YARN, MapReduce, RDBMS, Data Lake, Python, Scala, DynamoDB, Flume, Pig
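A minimal sketch of the Kafka-to-Spark streaming integration referenced in this section, written here with Spark Structured Streaming in Scala; the broker address, topic name and HDFS paths are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming-sketch")
      .getOrCreate()

    // Subscribe to an assumed clickstream topic; broker address is illustrative.
    val clicks = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "clickstream")
      .load()
      .selectExpr("CAST(value AS STRING) AS event", "timestamp")

    // Write micro-batches to HDFS as Parquet, with checkpointing for reliability.
    val query = clicks.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/clickstream/")
      .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
      .start()

    query.awaitTermination()
  }
}
```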
Confidential - Sunnyvale, CA
Big Data/Hadoop Developer
Responsibilities:
- Involved in Agile methodologies, daily scrum meetings and sprint planning.
- Ingested data into HDFS using Sqoop, wrote custom input adaptors (Network Adapter, FTP Adapter and S3 Adapter), analyzed the data using Spark (DataFrames and Spark SQL), and built a series of Hive scripts to produce summarized results from Hadoop for downstream systems.
- Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Hive.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Used Spark DataFrames, Spark SQL and Spark MLlib extensively.
- Developed RDDs/DataFrames in Spark using Scala and Python and applied several transformations to load data from the Hadoop Data Lake into Cassandra (see the sketch at the end of this section).
- Worked on Hive partitioning and bucketing, performed joins on Hive tables, and utilized Hive SerDes such as Regex, JSON and Avro.
- Integrated Kafka with Spark streaming for real time data processing.
- Worked with the NoSQL database HBase to deliver real-time data analytics using Apache Spark with both Scala and Python.
- Closely worked with the data science team in building Spark MLlib applications to build various predictive models.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Uploaded streaming data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Analyzed the web log data using the HiveQL to extract number of unique visitors per day, page views, visit duration, most visited page on website.
- Supported data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud and performed export and import of data into S3.
- Worked on MongoDB by using CRUD (Create, Read, Update and Delete), Indexing, Replication and Sharding features.
- Designed the HBase row key to store text and JSON as key values, structuring the key so that get/scan operations return rows in sorted order (see the row-key sketch at the end of this section).
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Worked on custom Talend jobs to ingest, enrich and distribute data in Cloudera Hadoop ecosystem.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Created Hive tables and worked on them using HiveQL.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, compared the performance of Spark with Hive and SQL, and was involved in the end-to-end implementation of the ETL logic.
- Developed syllabus/curriculum data pipelines from the Syllabus/Curriculum Web Services into HBase and Hive tables.
- Worked on cluster coordination services through Zookeeper.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Built applications using Maven and integrated with CI servers like Jenkins to build jobs.
- Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
- Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solutions over disparate sources.
- Created cubes in Talend to produce different types of aggregations of the data and to visualize them.
Environment: Hadoop, HDFS, Spark, AWS, S3, Scala, Zookeeper, MapReduce, Hive, Pig, Sqoop, HBase, Cassandra, MongoDB, Tableau, Java, Maven, UNIX Shell Scripting.
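A minimal sketch of re-expressing a Hive aggregation as Spark DataFrame transformations and loading the result into Cassandra, as referenced in this section. It assumes the DataStax spark-cassandra-connector is on the classpath; the host, keyspace and table names are illustrative.

```scala
import org.apache.spark.sql.SparkSession

object HiveToCassandraSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-cassandra-sketch")
      .config("spark.cassandra.connection.host", "cassandra-host") // illustrative host
      .enableHiveSupport()
      .getOrCreate()

    // Equivalent of: SELECT visitor_id, count(*) AS page_views FROM web_logs GROUP BY visitor_id
    val pageViews = spark.table("web_logs")
      .groupBy("visitor_id")
      .count()
      .withColumnRenamed("count", "page_views")

    // Load the summarized result into a Cassandra table (keyspace/table names assumed).
    pageViews.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "visitor_page_views"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```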
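The HBase row-key design mentioned above can be sketched as a composite key of entity id plus reversed timestamp, so that a prefix scan returns rows for one entity in sorted (newest-first) order. The table, column family and payload used here are illustrative assumptions.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put, Scan}
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

object HBaseRowKeySketch {
  def main(args: Array[String]): Unit = {
    // Standard HBase client connection; the ZooKeeper quorum comes from hbase-site.xml.
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = connection.getTable(TableName.valueOf("events"))

    // Composite key <entityId>#<reversedTimestamp>: rows for one entity cluster together
    // and the newest row sorts first within that prefix.
    val entityId = "user-42"
    val reversedTs = Long.MaxValue - System.currentTimeMillis()
    val rowKey = Bytes.toBytes(s"$entityId#$reversedTs")

    // Store a JSON payload under column family "d" (assumed to exist on the table).
    val put = new Put(rowKey)
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("json"),
      Bytes.toBytes("""{"action":"click","page":"/home"}"""))
    table.put(put)

    // Prefix scan returns all rows for the entity in sorted order.
    val scanner = table.getScanner(new Scan().setRowPrefixFilter(Bytes.toBytes(entityId + "#")))
    scanner.asScala.foreach(result => println(Bytes.toString(result.getRow)))

    scanner.close()
    table.close()
    connection.close()
  }
}
```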
Confidential - Plano, TX
Sr. Java/Hadoop Developer
Responsibilities:
- Developed Pig UDFs to manipulate data according to business requirements and also worked on developing custom Pig loaders.
- Developed Java MapReduce programs to transform raw log data into a structured form and derive user location, age group and time spent (see the sketch at the end of this section).
- Implemented Row Level Updates and Real time analytics using CQL on Cassandra Data.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data in HDFS for analysis.
- Wrote shell scripts for key Hadoop services such as Zookeeper and automated them to run using cron.
- Developed Pig scripts for the analysis of semi-structured data.
- Worked on the ingestion of files into HDFS from remote systems using MFT (Managed File Transfer).
- Used Hibernate Transaction Management, Hibernate Batch Transactions, and cache concepts.
- Analyzed the web log data using the HiveQL to extract number of unique visitors per day, page views, visit duration, most purchased product on website.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
- Implemented the Capacity Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Designed and implemented MapReduce based large-scale parallel processing.
- Developed and updated the web tier modules using Struts 2.1 Framework.
- Modified the existing JSP pages using JSTL.
- Implemented Struts Validator for automated validation.
- Utilized Hibernate for object/relational mapping purposes for transparent persistence onto SQL Server.
- Performed building and deployment of EAR, WAR and JAR files on test and stage systems in WebLogic Application Server.
- Developed Java and J2EE applications using Rapid Application Development (RAD), Eclipse.
- Used Singleton, DAO, DTO, Session Facade and MVC design patterns.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Wrote complex SQL and PL/SQL queries for stored procedures.
- Developed a reference architecture for the e-commerce SOA environment.
- Used UDFs to implement business logic in Hadoop.
- Custom table creation and population, custom and package index analysis and maintenance in relation to process performance.
- Used CVS for version controlling and JUnit for unit testing.
Environment: Eclipse, Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, MySQL, Cassandra, Java, Shell Scripting, SQL.
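A minimal sketch of the log-aggregation MapReduce pattern referenced in this section. The original work was in Java; this version is written in Scala against the standard Hadoop MapReduce API for consistency with the other sketches, and the log layout (tab-delimited, location in the fourth field) is an assumption.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConverters._

// Mapper: parse a tab-delimited log line and emit (location, 1).
class LocationMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val outKey = new Text()

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split("\t")
    if (fields.length > 3) {        // assumed layout: location is the 4th field
      outKey.set(fields(3))
      context.write(outKey, one)
    }
  }
}

// Reducer: sum the per-location counts.
class LocationReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    val total = values.asScala.map(_.get).sum
    context.write(key, new IntWritable(total))
  }
}

// Driver: wires the mapper and reducer together and points at HDFS input/output paths.
object LogLocationCounts {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "log-location-counts")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[LocationMapper])
    job.setReducerClass(classOf[LocationReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```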
Confidential - Gwinn, MI
Sr. Java/J2EE Developer
Responsibilities:
- Involved in Requirement Analysis, Design, Development and Testing of the JDA Demand product.
- Involved in implementing the design across the key phases of the Software Development Life Cycle (SDLC), including development, testing, implementation and maintenance support.
- Developed front-end screens using JSP, HTML, AJAX, JavaScript, ExtJs, JSON and CSS.
- Involved in overall system support and maintenance services such as bug fixing, enhancements, testing and documentation.
- Developed the persistence layer using Hibernate ORM to transparently store objects in the database.
- Responsible for coding all the JSPs and Servlets used for the module.
- Developed JSPs, Servlets and various beans deployed on WebSphere Application Server.
- Wrote Java utility classes common to all of the applications.
- Analyzed and fine-tuned RDBMS/SQL queries to improve the application's database performance.
- Implemented XSLT transformations of the XML in the Spring Web Flow.
- Developed a POJO-based programming model using the Spring framework.
- Used the IoC (Inversion of Control) pattern and Dependency Injection of the Spring framework for wiring and managing business objects.
- Handled Java multithreading in the back-end component, with one thread running per user to serve that user's requests.
- Used Web Services to connect to mainframe for the validation of the data.
- WSDL has been used to expose the Web Services.
- Participated in multiple WebEx sessions with clients/support as part of the bug-fixing process.
- Developed stored procedures, Triggers and functions to process the data using PL/SQL and mapped it to Hibernate Configuration File and also established data integrity among all tables.
- Involved in the upgrade of the WebLogic and SQL servers.
- Participated in Code Reviews of other modules, documents, test cases.
- Performed unit testing using JUnit and performance and volume testing.
- Wrote UNIX shell scripts to deploy the application.
- Used Oracle database for data persistence.
- Used the Log4j framework for logging debug, info and error data.
- Extensively worked on UNIX operating systems.
- Used GIT as version control system.
- Implemented the Business Services and Persistence Services to perform Business Logic.
- Responsibilities include design for future user requirements by interacting with users, as well as new development and maintenance of the existing source code.
Environment: JDA, SDLC, JSP, HTML, AJAX, JavaScript, JSON, Backbone JS, XSLT, XML, Spring framework, Java, Hibernate, JUnit, UNIX Shell, Oracle, Log4j, GIT and CSS
Confidential
Java/J2EE Developer
Responsibilities:
- Developed interactive GUI screens using HTML, Bootstrap and JSP, and implemented data validation using JavaScript.
- Responsible for designing rich user interface applications using Servlets, JavaScript, CSS, HTML, XHTML and AJAX.
- Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC).
- Developed code using Core Java to implement technical enhancement following Java Standards.
- Implemented Hibernate utility classes, session factory methods and different annotations to work with back-end database tables.
- Identified requirement gaps and communicated with analysts to fill them.
- Established a JSON contract to enable communication between the JavaScript pages and Java classes.
- Implemented an asynchronous, AJAX and jQuery UI component based rich client to improve customer experience.
- Extensively used Maven to manage project dependencies and build management.
- Developed the UI panels using Spring MVC, XHTML, CSS, JavaScript and jQuery.
- Used Hibernate for object-relational mapping and used JPA annotations.
- Integrated Hibernate with Struts using HibernateTemplate and used the provided methods to implement CRUD operations.
- Used JDBC and Hibernate for persisting data to different relational databases.
- Established database connectivity using JDBC and Hibernate O/R mapping for MySQL Server.
- Involved in creating tables using SQL, with connectivity handled via JDBC.
- Used JPA (Java Persistence API) with Hibernate as Persistence provider for Object Relational mapping.
- Wrote various SQL queries for data retrieval using JDBC.
- Wrote various stored procedures in PL/SQL and JDBC routines to update tables.
- Involved in building and parsing XML documents using SAX parser.
- Followed good coding standards with usage of JUnit, EasyMock and Checkstyle.
- Handled build, integration and deployment using Maven 2 and Jenkins.
- Consumed Web Services to interact with other external interfaces in order to exchange the data in the form of XML and by using SOAP.
- Involved in splitting of big Maven projects to small projects for easy maintainability.
- Involved in deploying and testing the application in JBoss application server.
Environment: GUI, HTML, Bootstrap, JavaScript, Angular JS, JSP, AJAX, Struts, Servlets, Java, Hibernate, jQuery, Maven, MVC, XHTML, CSS, JPA, CRUD, JDBC