Big Data Engineer Resume
Austin, Texas
SUMMARY:
- Over 10 years of experience in the IT industry, including Big Data environments and the Hadoop ecosystem, covering the design, development, and maintenance of various applications.
- Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HQL (HiveQL); an illustrative UDF sketch follows this summary.
- Expertise in core Java and JDBC, and proficient in using Java APIs for application development.
- Experience includes development of web-based applications using Core Java, JDBC, Java Servlets, JSP, Struts Framework, Hibernate, HTML, JavaScript, XML, and Oracle.
- Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns, and AJAX calls.
- Good experience with Tableau for data visualization and analysis of large data sets, drawing various conclusions.
- Leveraged and integrated Google Cloud Storage and BigQuery applications, which connected to Tableau for end-user web-based dashboards and reports.
- Good working experience with application and web servers such as JBoss and Apache Tomcat.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing of Teradata big data analytics.
- Expertise in Big Data architectures such as distributed Hadoop systems (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL.
- Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Experience in the development of Big Data projects using Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce open-source tools.
- Experience in the installation, configuration, support, and management of Hadoop clusters.
- Experience in working with MapReduce programs using Apache Hadoop for working with Big Data.
- Experience in the installation, configuration, support, and monitoring of Hadoop clusters using Apache and Cloudera distributions and AWS.
- Strong hands-on experience with AWS services, including but not limited to EMR, S3, EC2, Route 53, RDS, ELB, DynamoDB, and CloudFormation.
- Hands-on experience with the Hadoop ecosystem, including Spark, Kafka, HBase, Scala, Pig, Impala, Sqoop, Oozie, Flume, and Storm.
- Worked with Spark SQL, Spark Streaming, and the core Spark API to explore Spark features and build data pipelines.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2; successfully loaded files into HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
- Excellent knowledge of Big Data infrastructure, distributed file systems (HDFS), and parallel processing (the MapReduce framework).
- Extensive knowledge of IDE tools such as MyEclipse, RAD, IntelliJ, and NetBeans.
- Expert in Amazon EMR, Spark, Kinesis, S3, ECS, ElastiCache, DynamoDB, and Redshift.
- Experience in installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Experience in working with different data sources like Flat files, XML files and Databases.
- Experience in database design, entity relationships, database analysis, SQL programming, PL/SQL stored procedures, packages, and triggers in Oracle.
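
The following is a minimal sketch of the kind of custom Hive UDF referenced above, written in Scala (which compiles to JVM bytecode that Hive can load); the class name, lookup values, and column semantics are purely illustrative assumptions.

```scala
// Illustrative custom Hive UDF: normalizes free-text state names to two-letter codes.
// Assumes the hive-exec and hadoop-common dependencies are on the classpath.
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

class NormalizeState extends UDF {
  // Hypothetical lookup table; a real UDF would load this from a resource or config.
  private val codes = Map("texas" -> "TX", "california" -> "CA", "georgia" -> "GA")

  // Hive invokes evaluate() reflectively for each input row.
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val cleaned = input.toString.trim.toLowerCase
    new Text(codes.getOrElse(cleaned, cleaned.toUpperCase.take(2)))
  }
}
```

Packaged as a JAR, such a function would typically be registered in HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION before being called from a query; an equivalent Pig UDF would extend EvalFunc instead.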
TECHNICAL SKILLS:
Big Data Ecosystem: MapReduce, HDFS, Hive 2.3, HBase 1.2, Pig, Sqoop, Flume 1.8, HDP, Oozie, ZooKeeper, Spark, Kafka, Storm, Hue
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks
Cloud Platform: Amazon AWS, EC2, Redshift
Databases: Oracle 12c/11g, MySQL, MS SQL Server 2016/2014
Version Control: GIT, GitLab, SVN
Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS
NoSQL Databases: HBase and MongoDB
Programming Languages: Java, Python, SQL, PL/SQL, HiveQL, UNIX Shell Scripting, Scala.
Methodologies: Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC (Software Testing Life cycle) & UML, Design Patterns (Core Java and J2EE)
Web Technologies: JavaScript, CSS, HTML and JSP.
Operating Systems: Windows, UNIX/Linux and Mac OS.
Build Management Tools: Maven, Ant.
IDE & Command line tools: Eclipse, IntelliJ, Toad and NetBeans.
PROFESSIONAL EXPERIENCE:
Confidential, Austin, Texas
Big Data Engineer
Responsibilities:
- Contributing to the development of key data integration and advanced analytics solutions leveraging Apache Hadoop and other big data technologies for leading organizations, using major Hadoop distributions such as Hortonworks.
- Experienced in all stages of the SDLC (Agile, Waterfall): writing technical design documents, development, testing, and implementation of enterprise-level data marts and data warehouses.
- Developed end-to-end architecture designs for big data solutions based on a variety of business use cases.
- Worked on an end-to-end Hadoop implementation in a large enterprise environment, integrating with multiple legacy applications in heterogeneous technologies (Microsoft, Java, PowerBuilder, Oracle, SQL Server, mainframe, GIS (point cloud), sensors, etc.).
- Designed and implemented a Hadoop ecosystem enabling a big data storage repository, data warehouse and data mart capabilities, and business intelligence (BI) plus big data analytics.
- Implemented partitioning, dynamic partitions, and bucketing in Hive to improve query performance and organize data logically; see the HiveQL sketch after this section.
- Worked on Amazon AWS (EMR, EC2, RDS, S3, Redshift) and the Hortonworks Hadoop ecosystem: Hadoop, Hive, Pig, Sqoop, Oozie, HBase, Flume, Spark, MapReduce, HCatalog, Tez, Phoenix, Presto, Accumulo, Storm, Kafka, Falcon, Atlas, Ambari, Hue; security: Kerberos, Ranger, Knox, Oracle ASO, HDFS encryption, AD/LDAP; hosting platform: AWS.
- Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
- Provided overall architecture responsibilities, including roadmaps, leadership, planning, technical innovation, security, and IT governance.
- Designed, laid out, and deployed Hadoop clusters in the cloud using the Hadoop ecosystem and open-source platforms.
- Designed and implemented geospatial big data ingestion, processing, and delivery.
- Provided cloud-computing infrastructure solutions on Amazon Web Services (AWS - EC2, VPCs, S3, IAM)
- Involved in installing Hadoop Ecosystem components (Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Flume, Zookeeper and HBase).
Environment: AWS S3, RDS, EC2, Redshift, Hadoop 3.0, Hive 2.3, Pig, Sqoop 1.4.6, Oozie, HBase 1.2, Flume 1.8, Hortonworks, MapReduce, Kafka, HDFS, Oracle 12c, Microsoft, Java, GIS, Spark 2.2, Zookeeper
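
As a companion to the Hive partitioning and bucketing bullet above, here is a minimal Scala sketch using a Hive-enabled SparkSession; the database, table, and column names are assumptions made for illustration.

```scala
// Illustrative Hive partitioning + bucketing DDL issued through Spark SQL.
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partition by event_date so date filters prune whole directories;
    // bucket by customer_id so joins on that key shuffle less data.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS analytics.events_part (
        customer_id BIGINT,
        event_type  STRING,
        payload     STRING
      )
      PARTITIONED BY (event_date STRING)
      CLUSTERED BY (customer_id) INTO 32 BUCKETS
      STORED AS ORC
    """)

    // Dynamic partitioning derives the partition value from the data itself.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT INTO TABLE analytics.events_part PARTITION (event_date)
      SELECT customer_id, event_type, payload, event_date
      FROM analytics.events_staging
    """)

    spark.stop()
  }
}
```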
Confidential, Rocky Hill, CT
Sr. Big Data Developer
Responsibilities:
- Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, and Hive.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Designed and deployed full SDLC of AWS Hadoop cluster based on client's business need
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Responsible for importing log files from various sources into HDFS using Flume.
- Analyzed data using HiveQL to generate payer reports and payment summaries for transmission to payers.
- Designed AWS architecture, cloud migration, AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
- Used the DataFrame API in Scala to convert the distributed collection of data into named columns; see the sketch after this section.
- Performed data profiling and transformation on the raw data using Pig, Python, and Java.
- Developed predictive analytics using the Apache Spark Scala APIs.
- Performed big data analysis using Pig and user-defined functions (UDFs).
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Implemented Spark GraphX application to analyze guest behavior for data science segments.
- Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
- Involved in the migration of data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for data processing.
- Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
- Designed and developed UI screens using Struts, DOJO, JavaScript, JSP, HTML, DOM, CSS, and AJAX.
- Developed prototype for Big Data analysis using Spark, RDD, Data Frames and Hadoop eco system with .csv, JSON, parquet and HDFS files.
- Developed HiveQL scripts for performing transformation logic and for loading data from the staging zone to the landing and semantic zones.
- Involved in creating Oozie workflow and Coordinator jobs for Hive jobs to kick off the jobs on time for data availability.
- Worked on the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract data in a timely manner.
- Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Responsible for designing and deploying EDW application solutions, optimizing processes, and defining and implementing best practices.
- Managed and led the development effort with the help of a diverse internal and overseas team.
Environment: Big Data, Spark, YARN, Hive, Pig, JavaScript, JSP, HTML, Ajax, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL.
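
A minimal Scala sketch of the Sqoop-to-Spark pattern described in this section: CSV files landed in HDFS by Sqoop are read into a DataFrame with named, typed columns and written out as a Hive table. The HDFS path, schema, and table names are illustrative assumptions.

```scala
// Illustrative: read Sqoop-imported CSV from HDFS into a typed DataFrame and load a Hive table.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object SqoopCsvToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sqoop-csv-to-hive")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // An explicit schema gives the distributed collection named, typed columns.
    val claimsSchema = StructType(Seq(
      StructField("claim_id",   LongType,   nullable = false),
      StructField("payer_code", StringType, nullable = true),
      StructField("amount",     DoubleType, nullable = true),
      StructField("claim_date", DateType,   nullable = true)
    ))

    val claims = spark.read
      .schema(claimsSchema)
      .option("dateFormat", "yyyy-MM-dd")
      .csv("hdfs:///data/staging/claims/")

    // Simple cleansing step before loading the landing zone.
    claims.filter($"amount" > 0)
      .write.mode("overwrite")
      .format("parquet")
      .saveAsTable("landing.claims")

    spark.stop()
  }
}
```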
Confidential, Tampa, FL
Sr. Big Data/Hadoop Engineer
Responsibilities:
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
- Developed Sqoop scripts for the extraction of data from various RDBMS databases into HDFS.
- Developed scripts to automate the workflow of various processes using Python and shell scripting.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used AWS cloud and on-premise environments with infrastructure provisioning and configuration.
- Worked on writing Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
- Developed data pipelines using Pig and Hive from Teradata and DB2 data sources; these pipelines used customized UDFs to extend the ETL functionality.
- Used UDFs to implement business logic in Hadoop, using Hive to read, write, and query the Hadoop data in HBase.
- Installed and configured Hadoop ecosystem components such as Hive, Oozie, and Sqoop on a Cloudera Hadoop cluster, helping with performance tuning and monitoring.
- Used the Oozie workflow engine to run multiple Hive and Pig scripts, used Kafka for real-time processing of data, and loaded log file data directly into HDFS using Flume.
- Developed an end-to-end workflow to build a real time dashboard using Kibana, Elastic Search, Hive and Flume.
- Used Hive to analyze data ingested into HBase via Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Used Oozie for designing workflows and scheduling various jobs in the Hadoop ecosystem.
- Developed MapReduce programs in Java to apply business rules on the data, optimizing them using various compression formats and combiners.
- Used Spark SQL to create DataFrames by loading JSON data and analyzing it; see the sketch after this section.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
Environment: Hadoop, Hive, Zookeeper, MapReduce, Sqoop, Pig 0.10 and 0.11, JDK 1.6, HDFS, Flume, Oozie, DB2, HBase, Mahout, Scala.
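
A minimal Scala sketch of the Spark SQL usage mentioned above: JSON data is loaded into a DataFrame, registered as a view, and queried for dashboard-style metrics. The log path and field names are assumptions.

```scala
// Illustrative: load JSON logs into a DataFrame, register a view, and query it with Spark SQL.
import org.apache.spark.sql.SparkSession

object JsonSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("json-sparksql-sketch").getOrCreate()

    // Spark infers the schema of the semi-structured JSON records.
    val events = spark.read.json("hdfs:///data/logs/events/")
    events.createOrReplaceTempView("events")

    // Per-day metrics of the kind surfaced on a reporting dashboard.
    val dailyCounts = spark.sql("""
      SELECT to_date(event_time) AS event_day, event_type, COUNT(*) AS cnt
      FROM events
      GROUP BY to_date(event_time), event_type
    """)
    dailyCounts.show(20, truncate = false)

    spark.stop()
  }
}
```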
Confidential, Atlanta, GA
Sr. Big Data/Hadoop Engineer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Designed the projects using MVC architecture, providing multiple views from the same model and thereby achieving efficient modularity and scalability.
- Built custom Talend jobs to ingest and distribute data in the Cloudera Hadoop ecosystem.
- Improved the performance and optimization of existing algorithms in Hadoop using Spark context, Spark-SQL, and Spark on YARN with Scala.
- Implemented Spark Core in Scala to process data in memory.
- Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
- Used Hadoop, Pig, Hive, and MapReduce to analyze data and extract data sets for meaningful information.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Handled importing of data from various data sources, performed transformations using MapReduce, Spark and loaded data into HDFS.
- Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data, such as merging many small files into a handful of very large, compressed files using pig pipelines in the data preparation stage.
- Used Pig for three distinct workloads: pipelines, iterative processing, and research.
- Wrote Pig UDFs in Python and Java and used sampling of large data sets.
- Involved in moving all log files generated from various sources to HDFS through Flume for further processing, and processed the files.
- Extensively used Pig to communicate with Hive using HCatalog and with HBase using handlers.
- Created Pig Latin scripts and Sqoop scripts.
- Involved in transforming data from legacy tables to HDFS and HBase tables using Sqoop.
- Implemented exception tracking logic using Pig scripts
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it
- Followed the Test-Driven Development (TDD) process, with extensive experience in the Agile and Scrum programming methodologies.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala; see the sketch after this section.
- Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Involved in Cluster maintenance, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
- Implemented GUI screens for viewing using Servlets, JSP, Tag Libraries, JSTL, JavaBeans, HTML, JavaScript and Struts framework using MVC design pattern.
- Built, configured, and deployed web components on the WebLogic application server.
- Built the application on a Java financial platform that integrates several technologies, such as Struts and Spring Web Flow.
- Used Spring Framework modules such as the core container module, application context module, Spring AOP module, Spring ORM, and Spring MVC module.
- Developed the presentation layer using Model-View-Controller architecture implemented by Spring MVC.
- Performed Unit testing using JUnit.
- Used SVN as the version control tool to maintain the code repository.
Environment: Hadoop, MapReduce, Spark, Shark, Kafka, HDFS, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Flume, Cloudera, Oracle 10g, UNIX Shell Scripting, Scala, MongoDB, Cassandra, Python.
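
A minimal Scala sketch of the MapReduce-to-Spark POC referenced above, using a classic word count: the map and reduce phases become flatMap/map and reduceByKey RDD transformations. The input and output paths are illustrative.

```scala
// Illustrative: a word-count MapReduce job rewritten as Spark RDD transformations.
import org.apache.spark.{SparkConf, SparkContext}

object WordCountRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mr-to-spark-poc"))

    // mapper -> flatMap/map, reducer -> reduceByKey: the same shuffle boundary as the MR job,
    // but intermediate data stays in memory rather than being written between phases.
    sc.textFile("hdfs:///data/input/")
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word.toLowerCase, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///data/output/wordcount")

    sc.stop()
  }
}
```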
Confidential, Paso Robles, CA
Sr. Java/J2EE Developer
Responsibilities:
- Involved in a full life cycle Object Oriented application development - Object Modeling, Database Mapping, GUI Design.
- Developed the J2EE application based on the Service Oriented Architecture.
- Used Design Patterns like Singleton, Factory, Session Facade and DAO.
- Developed using new features of Java Annotations, Generics, enhanced for loop and Enums.
- Developed Use Case diagrams, Class diagrams and Sequence diagrams to express the detail design.
- Worked with EJB (Session and Entity beans) to implement the business logic handling various interactions with the database.
- Implemented a high-performance, highly modular, load-balancing broker in C with ZeroMQ and Redis.
- Used Spring and Hibernate for implementing IOC, AOP and ORM for back end tiers.
- Created and injected Spring services, Spring controllers, and DAOs to achieve dependency injection and to wire up business-class objects.
- Used Spring Inheritance to develop beans from already developed parent beans.
- Used the DAO pattern to fetch data from the database using Hibernate and to carry out various database operations.
- Used the SOAP::Lite module to communicate with different web services based on a given WSDL.
- Worked on Evaluating, comparing different tools for test data management with Hadoop.
- Helped and directed testing team to get up to speed on Hadoop Application testing.
- Used Hibernate Transaction Management, Hibernate Batch Transactions, and cache concepts.
- Modified the Spring controller and service classes to support the introduction of the Spring framework.
- Created complex SQL Queries, PL/SQL Stored procedures, Functions for back end.
- Developed various generic JavaScript functions used for validations.
- Developed screens using HTML5, CSS, jQuery, JSP, JavaScript, AJAX and ExtJS.
- Used Aptana Studio and Sublime to develop and debug application code.
- Used Rational Application Developer (RAD) which is based on Eclipse, to develop and debug application code.
- Created user-friendly GUI interfaces and web pages using HTML, AngularJS, jQuery, and JavaScript.
- Used Log4j utility to generate run-time logs.
- Wrote SAX and DOM XML parsers and used SOAP for sending and getting data from the external interface.
- Deployed business components into WebSphere Application Server.
- Developed Functional Requirement Document based on users' requirement.
Environment: Core Java, J2EE, JDK 1.6, Spring 3.0, Hibernate 3.2, Tiles, AJAX, JSP 2.1, Eclipse 3.6, IBM WebSphere 7.0, XML, XSLT, SAX, DOM Parser, HTML, UML, Oracle 10g, PL/SQL, JUnit.
Confidential
Java Developer
Responsibilities:
- Implemented Spring MVC architecture and Spring Bean Factory using IOC, AOP concepts.
- Gathered the requirements and designed the application flow for the application.
- Used HTML, JavaScript, JSF 2.0, AJAX and JSP to create the User Interface.
- Involved in writing Maven for building and configuring the application.
- Developed Action classes for the system as a feature of Struts.
- Performed both server-side and client-side validations.
- Developed EJB components to implement business logic using Session and Message-Driven Beans.
- Developed the code using Core Java concepts, the Spring Framework, JSP, Hibernate 3.0, JavaScript, XML, and HTML.
- Used the Spring Framework to integrate with the Struts web framework and Hibernate.
- Extensively worked with Hibernate to connect to database for data persistence.
- Integrated Activate Catalog to get parts using JMS.
- Used Log4j to log both user-interface and domain-level messages.
- Extensively worked with Struts for middle tier development with Hibernate as ORM and Spring IOC for Dependency Injection for the application based on MVC design paradigm.
- Created the struts-config.xml file to manage the page flow.
- Developed HTML views with HTML, CSS, and JavaScript.
- Performed unit testing for modules using JUnit.
- Played an active role in preparing documentation for future reference and upgrades.
- Implemented the front end using JSP, HTML, CSS, JavaScript, jQuery, and AJAX for dynamic web content.
- Worked in an Agile Environment used Scrum as the methodology wherein I was responsible for delivering potentially shippable product increments at the end of each Sprint.
- Involved in Scrum meetings that allow clusters of teams to discuss their work, focusing especially on areas of overlap and integration.
Environment: Java 1.4, JSP, Servlets, JavaScript, HTML 5, AJAX, JDBC, JMS, EJB, Struts 2.0, Spring 2.0, Hibernate 2.0, Eclipse 3.x, WebLogic 9, Oracle 9i, JUnit, Log4j