Big Data Engineer Resume
SUMMARY
- Over 9 years of IT experience with extensive knowledge of the Software Development Life Cycle (SDLC), including requirements gathering, architecture, design, analysis, development, maintenance, implementation and testing.
- Experience as a Hadoop architect/developer, with knowledge of Hive, Sqoop, MapReduce, Storm, Pig, HBase, Flume and Spark.
- Architected, designed and developed a Big Data solutions practice, including setting up the Big Data roadmap and building the supporting infrastructure and team to deliver Big Data solutions.
- Experienced in application development using Java, J2EE, JDBC, Spring and JUnit.
- Architected and implemented a Portfolio Recommendation Analytics Engine using Hadoop MapReduce, Oozie, Spark SQL, Spark MLlib and Cassandra.
- Experienced in the Hadoop Big Data ecosystem, including MapReduce, MapReduce 2 (YARN), Flume, Sqoop, Hive, Apache Spark and Scala.
- Excellent understanding of Hadoop architecture and underlying framework including storage management.
- Expertise in Big Data tools such as MapReduce, HiveQL, HPL/SQL, Impala, Pig, Spark Core, YARN and Sqoop.
- Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS and other services of the AWS family.
- Selected appropriate AWS services to design and deploy applications based on given requirements.
- Expertise in distributed processing frameworks such as MapReduce, Spark and Tez.
- Expertise in architecting Big Data solutions spanning data ingestion and data storage.
- Experienced with NoSQL databases (HBase, Cassandra and MongoDB), including database performance tuning and data modeling.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Architected, designed and modeled DI (Data Integrity) platforms using Sqoop, Flume, Kafka, Spark Streaming, Spark MLlib and Cassandra.
- Expertise in NoSQL databases such as HBase and MongoDB.
- Strong expertise with Amazon AWS EC2, DynamoDB, S3, Kinesis and other services.
- Expertise in data analysis, design and modeling using tools such as Erwin.
- Expertise in Big Data architectures, including Hadoop distributions (Azure, Hortonworks, Cloudera) and NoSQL stores such as MongoDB.
- Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing and analysis of data.
- Experienced in using Hadoop ecosystem components such as MapReduce, Hive, Sqoop and Oozie.
- Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin and Airflow.
- Strong experience in front-end technologies such as JSP, HTML5, jQuery, JavaScript and CSS3.
- Experienced in testing data in HDFS and Hive for each data transaction.
- Experienced in importing and exporting data with Sqoop between HDFS and relational database systems.
- Strong experience working with databases such as Oracle 11g/10g/9i, DB2, SQL Server 2008 and MySQL, with proficiency in writing complex SQL queries.
- Experienced in using database tools like SQL Navigator, TOAD.
- Experienced in using Spark to improve the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
- Experienced in using Flume to transfer log data files to the Hadoop Distributed File System (HDFS).
- Experienced with Akka for building high-performance, reliable distributed applications in Java and Scala.
- Knowledge of and experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Good experience in Shell programming.
- Knowledge of configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
- Experienced in developing web-based GUIs using JavaScript, JSP, HTML, jQuery, XML and CSS.
- Experienced in developing enterprise applications with J2EE/MVC architecture on application and web servers such as JBoss and Apache Tomcat 6.0/7.0/8.0.
TECHNICAL SKILLS
Hadoop/Big Data: MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Scala, Akka, Kafka, Storm.
Java/J2EE Technologies: JDBC, JavaScript, JSP, Servlets, jQuery
NoSQL Databases: Cassandra, MongoDB
Web Technologies: HTML, DHTML, XML, XHTML, JavaScript, CSS, XSLT
Web/Application servers: Apache Tomcat 6.0/7.0/8.0, JBoss.
AWS: EC2, EMR, S3, ECS, DynamoDB
Languages: Java, J2EE, PL/SQL, Pig Latin, HQL, R, Python, XPath, Spark
Databases: Oracle 12c/11g/10g, Microsoft Access, MS SQL Server, MongoDB.
Frameworks: Struts (MVC), Spring, Hibernate.
Operating Systems: UNIX, Ubuntu Linux, Windows, CentOS, Sun Solaris.
Network protocols: TCP/IP fundamentals, LAN and WAN.
PROFESSIONAL EXPERIENCE
Confidential
Big Data Engineer
Responsibilities:
- Architected, designed and developed business applications and data marts for the Marketing and IT departments to facilitate departmental reporting.
- Worked on the implementation and maintenance of a Cloudera Hadoop cluster.
- Created Hive external tables to stage data and then moved the data from staging into the main tables.
- Worked on exporting data from Hive 2.0.0 tables into a Netezza 7.2.x database.
- Implemented the Big Data solution using Hadoop, Hive and Informatica 9.5.1 to pull and load the data into HDFS.
- Pulled the data from the data lake (HDFS) and massaged it with various RDD transformations.
- Actively involved in design, new development and SLA-based support tickets for BigMachines applications.
- Experience in server infrastructure development on Gateway, ELB, Auto Scaling, DynamoDB, Elasticsearch and Virtual Private Cloud (VPC).
- Involved in evaluating Kafka and building use cases relevant to our environment.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation and queries, writing data back into the RDBMS through Sqoop (a minimal Spark sketch follows this list).
- Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
- Developed Oozie 3.1.0 workflow jobs to execute Hive 2.0.0, Sqoop 1.4.6 and MapReduce actions.
- Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, actively driving Proof of Concept (POC) and Proof of Technology (POT) evaluations and implementing Big Data solutions.
- Developed numerous MapReduce jobs in Scala 2.10.x for data cleansing and analyzed data in Impala 2.1.0.
- Created a data pipeline with processor groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC on Amazon EC2.
- Built Hadoop solutions for big data problems using MR1 and MR2 (YARN).
- Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
- Implemented the installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Ran proofs of concept to determine feasibility and evaluate Big Data products.
- Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Kafka and Storm for real-time data ingestion and processing.
- Hands-on experience developing Elasticsearch integrations in several programming languages; knowledge of advanced reporting using Elasticsearch and Node.js.
- Worked in AWS cloud and on-premise environments with infrastructure provisioning and configuration.
- Wrote Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
- Used Hive to analyze data ingested into HBase via the Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Involved in developing the MapReduce framework, writing queries and scheduling MapReduce jobs.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Developed shell, Perl and Python scripts to automate and provide control flow to Pig scripts.
- Designed the Redshift data model and carried out Redshift performance improvements and analysis.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Worked on configuring and managing disaster recovery and backup for Cassandra data.
- Performed File system management and monitoring on Hadoop log files.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Used Flume to collect, aggregate and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Implemented partitioning, dynamic partitions and bucketing in Hive (the Hive staging sketch after this list shows the pattern).
- Developed customized classes for serialization and deserialization in Hadoop.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
- Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for data processing.
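The Spark 2.x aggregation work described above (Scala with DataFrames/Spark SQL and pair RDDs) can be illustrated with a minimal sketch; the database, table and column names below are hypothetical placeholders rather than actual project objects, and the Sqoop export of the result table to the RDBMS would run as a separate step.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object PortfolioAggregation {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so staged tables are visible to Spark SQL
    val spark = SparkSession.builder()
      .appName("portfolio-aggregation")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Read the staged transactions (hypothetical table)
    val txns = spark.table("staging.transactions")

    // DataFrame/Spark SQL aggregation: totals per account and day
    val daily = txns
      .filter($"amount".isNotNull)
      .groupBy($"account_id", $"txn_date")
      .agg(sum($"amount").as("total_amount"), count(lit(1)).as("txn_count"))

    // Equivalent pair-RDD style aggregation of the same data
    val byAccount = txns
      .select($"account_id", $"amount").as[(String, Double)]
      .rdd
      .reduceByKey(_ + _)

    // Persist the aggregate back to the warehouse; Sqoop can then export
    // this table to the RDBMS in a separate job
    daily.write.mode("overwrite").saveAsTable("marts.daily_account_totals")

    byAccount.take(5).foreach(println)
    spark.stop()
  }
}
```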
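The Hive external staging tables and dynamic-partition loads mentioned above follow a pattern along these lines. The HiveQL is submitted here through a Hive-enabled SparkSession, though the same statements can be run from the Hive CLI; databases, paths and columns are hypothetical, and a bucketed variant would add a CLUSTERED BY clause to the main table DDL.

```scala
import org.apache.spark.sql.SparkSession

object HiveStagingLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-staging-load")
      .enableHiveSupport()
      .getOrCreate()

    // External table over the raw landing directory (hypothetical path and columns);
    // assumes the staging and warehouse databases already exist
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS staging.web_events (
        event_id STRING, user_id STRING, event_ts TIMESTAMP, event_date STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/data/landing/web_events'""")

    // Main table partitioned by date; a bucketed variant would add
    // CLUSTERED BY (user_id) INTO 32 BUCKETS to this DDL
    spark.sql("""
      CREATE TABLE IF NOT EXISTS warehouse.web_events (
        event_id STRING, user_id STRING, event_ts TIMESTAMP)
      PARTITIONED BY (event_date STRING)
      STORED AS ORC""")

    // Dynamic-partition insert moves data from staging into the main table
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT OVERWRITE TABLE warehouse.web_events PARTITION (event_date)
      SELECT event_id, user_id, event_ts, event_date FROM staging.web_events""")

    spark.stop()
  }
}
```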
Environment: Pig, Sqoop, Kafka, Apache Cassandra, Elasticsearch, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.
Confidential - Seattle, WA
Sr. Big Data Engineer
Responsibilities:
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Implemented the Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) with a cloud architecture.
- Experience in BI reporting with AtScale OLAP for Big Data.
- Implemented solutions for ingesting data from various sources and processing the data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive.
- Designed and developed a real-time stream processing application using Spark, Kafka, Scala and Hive to perform streaming ETL and apply machine learning (see the streaming sketch after this list).
- Experience in AWS, implementing solutions using services such as EC2, S3, RDS, Redshift and VPC.
- Worked as a Hadoop consultant on MapReduce, Pig, Hive and Sqoop.
- Worked with Spark and Python.
- Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig and MapReduce.
- Identified query duplication, complexity and dependencies to minimize migration effort.
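The real-time streaming ETL described above can be sketched with Spark Structured Streaming reading from Kafka; the broker addresses, topic name and event schema are hypothetical, and the console sink stands in for whatever Hive or warehouse sink the actual pipeline targets.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object StreamingEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming-etl")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical JSON payload schema for events on the Kafka topic
    val schema = new StructType()
      .add("user_id", StringType)
      .add("event_type", StringType)
      .add("amount", DoubleType)
      .add("event_time", TimestampType)

    // Read the raw stream from Kafka (placeholder brokers and topic)
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "events")
      .load()

    // Parse the JSON value and aggregate per event type over 1-minute windows
    val parsed = raw
      .select(from_json($"value".cast("string"), schema).as("e"))
      .select("e.*")

    val counts = parsed
      .withWatermark("event_time", "5 minutes")
      .groupBy(window($"event_time", "1 minute"), $"event_type")
      .agg(count(lit(1)).as("events"), sum($"amount").as("total_amount"))

    // Write the rolling aggregates out; a Hive/Parquet sink would replace
    // the console sink in the real pipeline
    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .option("truncate", "false")
      .start()

    query.awaitTermination()
  }
}
```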
Environment: Pig, Sqoop, Kafka, Apache Cassandra, Elasticsearch, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.