Big Data Engineer Resume
SUMMARY
- Over 9 years of IT experience with extensive knowledge of the Software Development Life Cycle (SDLC), including requirements gathering, architecture, design, analysis, development, maintenance, implementation and testing.
- Experience as a Hadoop architect/developer, with knowledge of Hive, Sqoop, MapReduce, Storm, Pig, HBase, Flume and Spark.
- Architected, designed and developed a Big Data solutions practice, including setting up the Big Data roadmap and building the supporting infrastructure and team to deliver Big Data solutions.
- Experienced in application development using Java, J2EE, JDBC, Spring and JUnit.
- Architected and implemented a Portfolio Recommendation Analytics Engine using Hadoop MapReduce, Oozie, Spark SQL, Spark MLlib and Cassandra.
- Experienced in the Hadoop Big Data ecosystem, including MapReduce, MapReduce 2 (YARN), Flume, Sqoop, Hive, Apache Spark and Scala.
- Excellent understanding of Hadoop architecture and underlying framework including storage management.
- Expertise in Big Data tools such as MapReduce, HiveQL, HPL/SQL, Impala, Pig, Spark Core, YARN and Sqoop.
- Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS and other services of the AWS family.
- Selected appropriate AWS services to design and deploy applications based on given requirements.
- Expertise in distributed processing frameworks such as MapReduce, Spark and Tez.
- Expertise in architecting Big Data solutions spanning data ingestion and data storage.
- Experienced with NoSQL databases (HBase, Cassandra and MongoDB), including database performance tuning and data modeling.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Architected, designed and modeled DI (Data Integrity) platforms using Sqoop, Flume, Kafka, Spark Streaming, Spark MLlib and Cassandra.
- Expertise in NoSQL databases such as HBase and MongoDB.
- Strong expertise with Amazon AWS EC2, DynamoDB, S3, Kinesis and other services.
- Expertise in data analysis, design and modeling using tools such as Erwin.
- Expertise in Big Data architectures, including Hadoop distributions (Azure, Hortonworks, Cloudera) and NoSQL stores such as MongoDB.
- Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing and analysis of data.
- Experienced in using Hadoop ecosystem components such as MapReduce, Hive, Sqoop and Oozie.
- Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin and Airflow.
- Strong experience in front-end technologies such as JSP, HTML5, jQuery, JavaScript and CSS3.
- Experienced in testing data in HDFS and Hive for each data transaction.
- Experienced in importing and exporting data with Sqoop between HDFS and relational database systems.
- Strong experience working with databases such as Oracle 11g/10g/9i, DB2, SQL Server 2008 and MySQL, with proficiency in writing complex SQL queries.
- Experienced in using database tools like SQL Navigator, TOAD.
- Experienced in using Spark to improve the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
- Experienced in using Flume to transfer log data files to the Hadoop Distributed File System (HDFS).
- Experienced with Akka for building high-performance, reliable distributed applications in Java and Scala.
- Knowledge of and experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Good experience in Shell programming.
- Knowledge of configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
- Experienced in developing web-based GUIs using JavaScript, JSP, HTML, jQuery, XML and CSS.
- Experienced in developing enterprise applications with J2EE/MVC architecture on application and web servers such as JBoss and Apache Tomcat 6.0/7.0/8.0.
TECHNICAL SKILLS
Hadoop/Big Data: MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Scala, Akka, Kafka, Storm.
Java/J2EE Technologies: JDBC, JavaScript, JSP, Servlets, jQuery
NoSQL Databases: Cassandra, MongoDB
Web Technologies: HTML, DHTML, XML, XHTML, JavaScript, CSS, XSLT
Web/Application servers: Apache Tomcat 6.0/7.0/8.0, JBoss.
AWS: EC2, EMR, S3, ECS, DynamoDB
Languages: Java, J2EE, PL/SQL, Pig Latin, HQL, R, Python, XPath, Spark
Databases: Oracle 12c/11g/10g, Microsoft Access, MS SQL Server, MongoDB.
Frameworks: Struts (MVC), Spring, Hibernate.
Operating Systems: UNIX, Ubuntu Linux, Windows, CentOS, Sun Solaris.
Network protocols: TCP/IP fundamentals, LAN and WAN.
PROFESSIONAL EXPERIENCE
Confidential
Big Data Engineer
Responsibilities:
- Architected, designed and developed business applications and data marts for the Marketing and IT departments to facilitate departmental reporting.
- Worked on the implementation and maintenance of a Cloudera Hadoop cluster.
- Created Hive external tables to stage data and then moved the data from staging into the main tables.
- Worked on exporting data from Hive 2.0.0 tables into a Netezza 7.2.x database.
- Implemented the Big Data solution using Hadoop, Hive and Informatica 9.5.1 to pull and load the data into HDFS.
- Pulled the data from the data lake (HDFS) and massaged it with various RDD transformations.
- Actively involved in design, new development and SLA-based support tickets for BigMachines applications.
- Experience in server infrastructure development on Gateway, ELB, Auto Scaling, DynamoDB, Elasticsearch and Virtual Private Cloud (VPC).
- Involved in evaluating Kafka and building use cases relevant to our environment.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation and queries, writing data back into the RDBMS through Sqoop (a minimal Spark sketch follows this list).
- Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
- Developed Oozie 3.1.0 workflow jobs to execute Hive 2.0.0, Sqoop 1.4.6 and MapReduce actions.
- Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, actively driving Proof of Concept (POC) and Proof of Technology (POT) evaluations and implementing Big Data solutions.
- Developed numerous MapReduce jobs in Scala 2.10.x for data cleansing and analyzed data in Impala 2.1.0.
- Created a data pipeline with processor groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC on Amazon EC2.
- Built Hadoop solutions for big data problems using MR1 and MR2 (YARN).
- Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
- Implemented the installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Ran proofs of concept to determine feasibility and evaluate Big Data products.
- Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Kafka and Storm for real-time data ingestion and processing.
- Hands-on experience developing Elasticsearch integrations in several programming languages; knowledge of advanced reporting using Elasticsearch and Node.js.
- Worked in AWS cloud and on-premise environments with infrastructure provisioning and configuration.
- Wrote Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
- Used Hive to analyze data ingested into HBase via the Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Involved in developing the MapReduce framework, writing queries and scheduling MapReduce jobs.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Developed shell, Perl and Python scripts to automate and provide control flow to Pig scripts.
- Designed the Redshift data model and carried out Redshift performance improvements and analysis.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Worked on configuring and managing disaster recovery and backup for Cassandra data.
- Performed File system management and monitoring on Hadoop log files.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Used Flume to collect, aggregate and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Implemented partitioning, dynamic partitions and bucketing in Hive (the Hive staging sketch after this list shows the pattern).
- Developed customized classes for serialization and deserialization in Hadoop.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
- Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for data processing.
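The Spark 2.x aggregation work described above (Scala with DataFrames/Spark SQL and pair RDDs) can be illustrated with a minimal sketch; the database, table and column names below are hypothetical placeholders rather than actual project objects, and the Sqoop export of the result table to the RDBMS would run as a separate step.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object PortfolioAggregation {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so staged tables are visible to Spark SQL
    val spark = SparkSession.builder()
      .appName("portfolio-aggregation")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Read the staged transactions (hypothetical table)
    val txns = spark.table("staging.transactions")

    // DataFrame/Spark SQL aggregation: totals per account and day
    val daily = txns
      .filter($"amount".isNotNull)
      .groupBy($"account_id", $"txn_date")
      .agg(sum($"amount").as("total_amount"), count(lit(1)).as("txn_count"))

    // Equivalent pair-RDD style aggregation of the same data
    val byAccount = txns
      .select($"account_id", $"amount").as[(String, Double)]
      .rdd
      .reduceByKey(_ + _)

    // Persist the aggregate back to the warehouse; Sqoop can then export
    // this table to the RDBMS in a separate job
    daily.write.mode("overwrite").saveAsTable("marts.daily_account_totals")

    byAccount.take(5).foreach(println)
    spark.stop()
  }
}
```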
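The Hive external staging tables and dynamic-partition loads mentioned above follow a pattern along these lines. The HiveQL is submitted here through a Hive-enabled SparkSession, though the same statements can be run from the Hive CLI; databases, paths and columns are hypothetical, and a bucketed variant would add a CLUSTERED BY clause to the main table DDL.

```scala
import org.apache.spark.sql.SparkSession

object HiveStagingLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-staging-load")
      .enableHiveSupport()
      .getOrCreate()

    // External table over the raw landing directory (hypothetical path and columns);
    // assumes the staging and warehouse databases already exist
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS staging.web_events (
        event_id STRING, user_id STRING, event_ts TIMESTAMP, event_date STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/data/landing/web_events'""")

    // Main table partitioned by date; a bucketed variant would add
    // CLUSTERED BY (user_id) INTO 32 BUCKETS to this DDL
    spark.sql("""
      CREATE TABLE IF NOT EXISTS warehouse.web_events (
        event_id STRING, user_id STRING, event_ts TIMESTAMP)
      PARTITIONED BY (event_date STRING)
      STORED AS ORC""")

    // Dynamic-partition insert moves data from staging into the main table
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT OVERWRITE TABLE warehouse.web_events PARTITION (event_date)
      SELECT event_id, user_id, event_ts, event_date FROM staging.web_events""")

    spark.stop()
  }
}
```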
Environment: Pig, Sqoop, Kafka, Apache Cassandra, Elasticsearch, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.
Confidential - Seattle, WA
Sr. Big Data Engineer
Responsibilities:
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Implemented the Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) with a cloud architecture.
- Experience in BI reporting with AtScale OLAP for Big Data.
- Implemented solutions for ingesting data from various sources and processing the data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive.
- Designed and developed a real-time stream processing application using Spark, Kafka, Scala and Hive to perform streaming ETL and apply machine learning (see the streaming sketch after this list).
- Experience in AWS, implementing solutions using services such as EC2, S3, RDS, Redshift and VPC.
- Worked as a Hadoop consultant on MapReduce, Pig, Hive and Sqoop.
- Worked with Spark and Python.
- Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig and MapReduce.
- Identified query duplication, complexity and dependencies to minimize migration effort.
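The real-time streaming ETL described above can be sketched with Spark Structured Streaming reading from Kafka; the broker addresses, topic name and event schema are hypothetical, and the console sink stands in for whatever Hive or warehouse sink the actual pipeline targets.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object StreamingEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming-etl")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical JSON payload schema for events on the Kafka topic
    val schema = new StructType()
      .add("user_id", StringType)
      .add("event_type", StringType)
      .add("amount", DoubleType)
      .add("event_time", TimestampType)

    // Read the raw stream from Kafka (placeholder brokers and topic)
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "events")
      .load()

    // Parse the JSON value and aggregate per event type over 1-minute windows
    val parsed = raw
      .select(from_json($"value".cast("string"), schema).as("e"))
      .select("e.*")

    val counts = parsed
      .withWatermark("event_time", "5 minutes")
      .groupBy(window($"event_time", "1 minute"), $"event_type")
      .agg(count(lit(1)).as("events"), sum($"amount").as("total_amount"))

    // Write the rolling aggregates out; a Hive/Parquet sink would replace
    // the console sink in the real pipeline
    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .option("truncate", "false")
      .start()

    query.awaitTermination()
  }
}
```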
Environment: Pig, Sqoop, Kafka, Apache Cassandra, Elasticsearch, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.