Big Data Engineer Resume
Atlanta, GA
SUMMARY
- Around 6 years of IT experience, including analysis, design, and development of Big Data solutions using Hadoop; design and development of applications using Java and J2EE; and database and data warehousing development using MySQL, Oracle, Teradata, and Informatica.
- Experience in a Snowflake cloud data warehousing shared technology environment, providing stable infrastructure, secured environments, reusable generic frameworks, robust design architecture, technology expertise, and best practices.
- Around 4 years of work experience in Big Data analytics, with hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume, Cassandra, Kafka, and Spark.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, and DataNode, as well as MapReduce concepts and the HDFS framework.
- Designed dimensional models, data lake architecture, and Data Vault 2.0 on Snowflake, and used Snowflake logical data warehouses for compute.
- Experience in using Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters (CDH4 & CDH5).
- Extensive experience in migrating data from legacy systems into the AWS cloud environment.
- Experience in analyzing data using Hive, Pig and custom MR programs in Java.
- Experience in scheduling and monitoring jobs using Oozie, with ZooKeeper for cluster coordination.
- Experienced in writing MapReduce programs and UDFs in Java for both Pig and Hive.
- Used in-memory analytics with Apache Spark on Amazon EMR (Elastic MapReduce).
- Experience in processing log files to extract data and copy it into HDFS using Flume.
- Experience in integrating Hive and HBase for effective operations.
- Developed Pig UDFs to pre-process data for analysis.
- Experience in Impala, Solr, MongoDB, HBase and Spark.
- Hands on knowledge of writing code in Scala.
- Proficient in Core Java, J2EE, JDBC, Servlets, JSP, Exception Handling, Multithreading, EJB, XML, HTML5, CSS3, JavaScript, AngularJS.
- Processed the large datasets present in Amazon S3 using Apache Spark on Amazon EMR.
- Experience in testing and documenting software for client applications.
- Writing code to create single-threaded, multi-threaded, or user-interface event-driven applications, both stand-alone and those that access servers or services.
- Good experience in using data modelling techniques and deriving results with SQL and PL/SQL queries.
- Good working knowledge of the Spring Framework.
- Strong Experience in writing SQL queries.
- Experience working with different databases, such as Oracle, SQL Server, and MySQL, and writing stored procedures, functions, joins, and triggers for different data models.
- Expertise in implementing Service-Oriented Architectures (SOA) with XML-based web services (SOAP/REST).
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, HBase, Spark, and Hadoop distributions (Cloudera CDH, Hortonworks)
Programming Languages: Java (5, 6, 7), Python, Scala
Databases/RDBMS: MySQL, SQL/PL-SQL, MS-SQL Server 2005, Oracle 9i/10g/11g, Snowflake
Scripting/ Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell
NoSQL/Search Stores: Cassandra, HBase, Elasticsearch
Operating Systems: Linux, Windows XP/7/8
Software Life Cycles: SDLC, Waterfall and Agile models
Office Tools: MS-Office, MS-Project and Risk Analysis tools, Visio
Utilities/Tools: Eclipse, NetBeans, Tomcat, JUnit, MRUnit, SQL, SVN, Log4j, SoapUI, Ant, Maven, and automation tools
Cloud Platforms: AWS (Amazon EC2, Amazon EMR)
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Big Data Engineer
Responsibilities:
- Engineered programs in Spark using Scala and Spark SQL for data processing.
- Analyzed Customer Claims data to detect anomalies among the data.
- Developed multiple POCs using Spark with Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Used the Spark API over Cloudera CDH 6.2 Hadoop YARN to perform analytics on data in Hive; involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend.
- Created various Hive external and staging tables and joined them as per requirements.
- Wrote transformations and actions on DataFrames and used Spark SQL on DataFrames to access Hive tables in Spark for faster data processing (see the sketch after this section).
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Used Control-M workload automation for orchestrating the application workflow.
- Created shell scripts for handling file uploads into the HDFS directory and for submitting the Spark jobs.
Environment: Cloudera CDH V6.2, Hadoop 3, Hive, Shell Scripting, Spark 2.4, Scala, Control-M, Eclipse, Maven.
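A minimal Scala sketch of the Spark SQL over Hive pattern referenced above; the database, table, and column names (staging.customer_claims, claim_amount, customer_id, analytics.claims_summary) are hypothetical placeholders rather than the actual project schema:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClaimsAnalysis {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets the SparkSession resolve tables in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("claims-analysis")
      .enableHiveSupport()
      .getOrCreate()

    // Load a Hive table as a DataFrame (hypothetical staging table).
    val claims = spark.table("staging.customer_claims")

    // Transformations: filter invalid rows, then aggregate per customer.
    val perCustomer = claims
      .filter(col("claim_amount") > 0)
      .groupBy("customer_id")
      .agg(sum("claim_amount").as("total_amount"),
           count(lit(1)).as("claim_count"))

    // Action: persist the result back to Hive for downstream queries.
    perCustomer.write.mode("overwrite").saveAsTable("analytics.claims_summary")

    spark.stop()
  }
}

The point of enableHiveSupport() is that query logic previously expressed as HiveQL (and executed as MapReduce) can be re-expressed as DataFrame transformations and run on Spark executors instead.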
Confidential, Irving, TX
Data Engineer
Responsibilities:
- Engineered a Customer 360 view by analyzing data across different customer domains using Hive, Spark, and Oozie.
- Developed multiple POCs using PySpark and deployed machine learning models on the Yarn cluster.
- Designed dimensional models, data lake architecture, and Data Vault 2.0 on Snowflake, and used Snowflake logical data warehouses for compute.
- Processed incoming files using the Spark native API.
- Used the Spark API over a Hortonworks Hadoop cluster.
- Implemented static partitioning, dynamic partitioning, and bucketing in Hive using internal and external tables.
- Used Hive for transformations, joins, filters, and some pre-aggregations after storing the data in HDFS.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (see the streaming sketch after this section).
- Used the Spark Streaming and Spark SQL APIs to process the files.
- Developed Spark scripts using Scala shell commands as per requirements.
- Processed schema-oriented and non-schema-oriented data using Scala and Spark.
- Designed and developed automated processes using shell scripting for data movement and purging.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
- Handled large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations.
- Worked on Snowflake modeling; highly proficient in data warehousing techniques for data cleansing, Slowly Changing Dimensions, surrogate key assignment, and change data capture.
- Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment, with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
Environment: Apache Hadoop, HDFS, Hive, Java, Sqoop, Spark, VMG, Teradata, MySQL, Apache Oozie, SFTP
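A sketch of the Kafka to Cassandra flow described above. It uses the newer Structured Streaming API rather than the Spark 1.6 DStream API of this role, and it assumes the DataStax Spark-Cassandra connector is on the classpath; the broker address, topic, event schema, keyspace, and table names are all placeholders:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object LearnerStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("learner-stream").getOrCreate()

    // Hypothetical event schema; the real learner data model has more fields.
    val schema = new StructType()
      .add("learner_id", StringType)
      .add("event_type", StringType)
      .add("event_ts", TimestampType)

    // Consume the Kafka topic as a streaming DataFrame.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "learner-events")
      .load()
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")

    // Persist each micro-batch to Cassandra through the connector.
    val query = events.writeStream
      .foreachBatch { (batch: DataFrame, batchId: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .option("keyspace", "learning")
          .option("table", "learner_events")
          .mode("append")
          .save()
      }
      .option("checkpointLocation", "/tmp/checkpoints/learner-events")
      .start()

    query.awaitTermination()
  }
}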
Confidential, Plano, TX
Hadoop/Scala Developer
Responsibilities:
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this section).
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed solutions to implement them using Scala.
- Developed analytical components using Scala, Spark, and Spark Streaming.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Developed scripts and automated end-to-end data management and synchronization between all the clusters.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend.
- Developed UDFs in Java for Hive and Pig, and worked on reading multiple data formats on HDFS using Scala.
- Involved in migration from Livelink to SharePoint using Scala through RESTful web services.
- Extensively involved in developing RESTful APIs using the JSON library of the Play framework.
- Used the Scala collections framework to store and process complex consumer information.
- Used Scala functional programming concepts to develop business logic.
- Designed and implemented an Apache Spark application on Cloudera.
- Analyzed affected code objects and designed suitable algorithms to address the problems.
- Assisted in performing unit testing of MapReduce jobs using MRUnit.
- Assisted in exporting data into Cassandra and writing column families to provide fast listing outputs.
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce extraction jobs.
- Used ZooKeeper to provide coordination services to the cluster.
- Worked with the Hue GUI for easy job scheduling, file browsing, job browsing, and metastore management.
Environment: Scala, Spark, Cloudera Manager, Pig, Sqoop, ZooKeeper, Teradata, PL/SQL, MySQL, Windows, HBase.
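As a concrete, hypothetical example of the Hive/SQL-to-RDD conversion referenced above: the equivalent of SELECT category, COUNT(*) FROM weblogs GROUP BY category, written against a pipe-delimited log extract staged in HDFS by Flume; the HDFS paths and the field position of the category column are assumptions:

import org.apache.spark.{SparkConf, SparkContext}

object LogCategoryCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("log-category-counts"))

    // Pipe-delimited weblog extract (hypothetical path).
    val lines = sc.textFile("hdfs:///data/weblogs/part-*")

    // Equivalent of: SELECT category, COUNT(*) FROM weblogs GROUP BY category
    val counts = lines
      .map(_.split('|'))
      .filter(_.length > 2)            // drop malformed records
      .map(fields => (fields(2), 1L))  // assume the third field is the category
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///output/category_counts")
    sc.stop()
  }
}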
Confidential, Bellevue, WA
Big Data Developer
Responsibilities:
- Developed Spark Programs for Batch processing and managing data coming from different sources.
- Developed Spark scripts using Java and Scala shell commands as per requirements.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Used Spark SQL with Scala to create DataFrames and performed transformations on them.
- Implemented Spark SQL to access Hive tables in Spark for faster data processing.
- Experienced in defining job flows and in managing and reviewing Hadoop log files.
- Supported MapReduce programs running on the cluster.
- Managed jobs using the Fair Scheduler and provided cluster coordination services through ZooKeeper.
- Hands-on experience in Oozie job scheduling.
- Built dimensional models and data vault architecture on Snowflake.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in running Hadoop Jobs for processing millions of records of text data.
- Developed the application by using the Struts framework.
- Created connections through JDBC and used JDBC statements to call stored procedures.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Designed dimensional models, data lake architecture, and Data Vault 2.0 on Snowflake.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Moved RDBMS data, exported as flat files from various channels, into HDFS for further processing (see the sketch after this section).
- Involved in creating Hive tables, loading data and writing Hive queries.
- Imported and exported data to and from HDFS using Sqoop, including incremental loading.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java (jdk1.7), Flat files, Oracle 11g, PL/SQL, SQL, Sqoop, Snowflake.
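A minimal Scala/Spark sketch of the flat-file-to-HDFS flow referenced above; the landing and warehouse paths, the pipe delimiter, and the account_id key column are hypothetical, and partitioned Parquet output is one reasonable layout rather than the project's actual one:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object FlatFileLoader {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("flat-file-loader").getOrCreate()

    // Delimited extracts dropped by upstream channels (hypothetical path).
    val raw = spark.read
      .option("header", "true")
      .option("delimiter", "|")
      .csv("hdfs:///landing/channel_extracts/*.dat")

    // Light cleansing plus a load-date column used for partitioning.
    val cleansed = raw
      .na.drop(Seq("account_id"))
      .withColumn("load_date", current_date())

    // Write partitioned Parquet so downstream Hive/Spark jobs can prune by date.
    cleansed.write
      .mode("append")
      .partitionBy("load_date")
      .parquet("hdfs:///warehouse/channel_extracts")

    spark.stop()
  }
}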
Confidential, Morgantown, WV
Hadoop Developer
Responsibilities:
- Analyzed the data using MapReduce, Pig, and Hive, and produced summary results from Hadoop for downstream systems.
- Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing the data in HDFS.
- Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Created Hive tables and involved in data loading and writing Hive UDFs.
- Exported the analyzed data to the relational database MySQL using Sqoop for visualization and to generate reports.
- Created HBase tables to load large sets of structured data.
- Managed and reviewed Hadoop log files.
- Worked extensively with Hive DDL and the Hive Query Language (HQL).
- Developed UDF, UDAF, and UDTF functions and used them in Hive queries (see the UDF sketch after this section).
- Implemented Sqoop for large dataset transfers between Hadoop and RDBMSs.
- Created MapReduce jobs to convert periodic XML messages into partitioned Avro data.
- Used Sqoop widely to import data from various systems/sources (like MySQL) into HDFS.
- Created components such as Hive UDFs to supply functionality missing from Hive for analytics.
- Developed scripts and batch jobs to schedule a bundle (a group of coordinators).
- Used different file formats such as text files, SequenceFiles, and Avro.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Assisted in cluster maintenance, cluster monitoring, and adding and removing cluster nodes.
- Installed and configured Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, HBase, Shell Scripting, Oozie, Oracle 11g.
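An illustrative Hive UDF of the kind referenced above. This role describes UDFs written in Java; the sketch below shows the same simple-UDF pattern in Scala (any JVM language compiles to a jar Hive can load), and the masking logic, class name, and function name are all hypothetical:

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Masks everything before the @ in an email address, e.g. "jdoe@x.com" -> "****@x.com".
class MaskEmail extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val at = s.indexOf('@')
    if (at <= 0) new Text(s) else new Text("*" * at + s.substring(at))
  }
}

After packaging, the jar would be added in Hive with ADD JAR and the class registered via CREATE TEMPORARY FUNCTION before being called from HQL queries.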
Confidential
Software Developer
Responsibilities:
- Excellent Java/J2EE application development skills with strong experience in object-oriented analysis; extensively involved throughout the Software Development Life Cycle (SDLC).
- Implemented various J2EE standards and an MVC framework, using AJAX and servlets for UI design.
- Used SOAP/REST for data exchange between the backend and the user interface.
- Utilized Java and MySQL day to day to debug and fix issues with client processes.
- Developed, tested, and implemented a financial-services application to bring multiple clients into a standard database format.
- Assisted in designing, building, and maintaining a database to analyze the life cycle of checking and debit transactions.
- Created web service components using SOAP, XML, and WSDL to receive XML messages and apply business logic.
- Involved in configuring WebSphere variables, queues, data sources, and servers, and in deploying EARs to the servers.
- Involved in developing the business Logic using Plain Old Java Objects (POJOs) and Session EJBs.
- Developed authentication through LDAP by JNDI.
- Developed and debugged the application using Eclipse IDE.
- Involved in Hibernate mappings, configuration property setup, creating sessions and transactions, and second-level cache setup.
- Involved in backing up databases, creating dump files, and creating DB schemas from dump files; wrote and executed developer test cases and prepared the corresponding scope and traceability matrix.
- Used JUnit and JAD for debugging and to develop test cases for all the modules.
- Hands-on experience with Sun ONE Application Server, WebLogic Application Server, WebSphere Application Server, WebSphere Portal Server, and J2EE application deployment technology.
Environment: Java (multithreading, collections), JDBC, Hibernate, Struts, Spring, Servlets, JSP, SOAP, Maven, Subversion, JUnit, SQL, Oracle, XML, PuTTY, and Eclipse.