
Big Data Engineer Resume

SUMMARY

  • Over 9 years of IT experience with extensive knowledge of the Software Development Life Cycle (SDLC), covering requirements gathering, architecture, design, analysis, development, implementation, testing and maintenance.
  • Experience as a Hadoop Architect/Developer, with knowledge of Hive, Sqoop, MapReduce, Storm, Pig, HBase, Flume and Spark.
  • Architected, designed and developed a Big Data solutions practice, including setting up the Big Data roadmap and building the supporting infrastructure and team.
  • Experienced in application development using Java, J2EE, JDBC, Spring and JUnit.
  • Architected and implemented a Portfolio Recommendation Analytics Engine using Hadoop MapReduce, Oozie, Spark SQL, Spark MLlib and Cassandra.
  • Experienced in the Big Data Hadoop ecosystem, including MapReduce, MapReduce 2 (YARN), Flume, Sqoop, Hive, Apache Spark and Scala.
  • Excellent understanding of Hadoop architecture and its underlying framework, including storage management.
  • Expertise in Big Data tools such as MapReduce, HiveQL, HPL/SQL, Impala, Pig, Spark Core, YARN and Sqoop.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS and other services of the AWS family.
  • Experienced in selecting appropriate AWS services to design and deploy applications based on given requirements.
  • Expertise in distributed processing frameworks such as MapReduce, Spark and Tez.
  • Expertise in architecting Big Data solutions across data ingestion and data storage.
  • Experienced with NoSQL databases (HBase, Cassandra and MongoDB), including database performance tuning and data modeling.
  • Experience in using PL/SQL to write stored procedures, functions and triggers.
  • Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
  • Architected, solutioned and modeled DI (Data Integration) platforms using Sqoop, Flume, Kafka, Spark Streaming, Spark MLlib and Cassandra.
  • Expertise in NoSQL databases like HBase and MongoDB.
  • Strong expertise in Amazon AWS EC2, DynamoDB, S3, Kinesis and other services.
  • Expertise in data analysis, design and modeling using tools like Erwin.
  • Expertise in Big Data architectures such as Hadoop distributions (Azure, Hortonworks, Cloudera), MongoDB and NoSQL.
  • Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
  • Experienced in using various Hadoop components such as MapReduce, Hive, Sqoop and Oozie.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin and Airflow.
  • Strong experience in front-end technologies like JSP, HTML5, jQuery, JavaScript and CSS3.
  • Experienced in testing data in HDFS and Hive for each transaction of data.
  • Experienced in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
  • Strong experience working with databases such as Oracle 11g/10g/9i, DB2, SQL Server 2008 and MySQL, and proficiency in writing complex SQL queries.
  • Experienced in using database tools like SQL Navigator and TOAD.
  • Experienced in using Spark to improve performance and optimize existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN (see the sketch after this list).
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
  • Experienced in using Flume to transfer log data files to the Hadoop Distributed File System (HDFS).
  • Experienced with Akka for building high-performance, reliable distributed applications in Java and Scala.
  • Knowledge and experience in job workflow scheduling and monitoring with Oozie and cluster coordination with Zookeeper.
  • Good experience in shell scripting.
  • Knowledge of configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
  • Experienced in developing web-based GUIs using JavaScript, JSP, HTML, jQuery, XML and CSS.
  • Experienced in developing enterprise applications with J2EE/MVC architecture on application and web servers such as JBoss and Apache Tomcat 6.0/7.0/8.0.
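
As a minimal illustration of the Spark work referenced above (SparkContext, Spark SQL, DataFrames, pair RDDs), the Scala sketch below computes the same per-key aggregation two ways. The HDFS path, file layout and column names (userId, amount) are hypothetical placeholders rather than details from any project listed here.

import org.apache.spark.sql.SparkSession

object RddAndSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-and-spark-sql-sketch").getOrCreate()

    // Pair-RDD path: sum a numeric amount per user from comma-separated HDFS files.
    val totalsRdd = spark.sparkContext.textFile("hdfs:///data/clickstream/")
      .filter(line => !line.startsWith("userId"))   // skip header rows
      .map(_.split(","))
      .map(cols => (cols(0), cols(1).toDouble))     // (userId, amount)
      .reduceByKey(_ + _)
    totalsRdd.take(10).foreach(println)

    // DataFrame / Spark SQL path: the same aggregation with schema-on-read.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/clickstream/")
    df.createOrReplaceTempView("clicks")
    spark.sql("SELECT userId, SUM(amount) AS total FROM clicks GROUP BY userId").show(10)

    spark.stop()
  }
}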

TECHNICAL SKILLS

Hadoop/Big Data: MapReduce, HDFS, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie, Flume, Scala, Akka, Kafka, Storm.

Java/J2EE Technologies: JDBC, JavaScript, JSP, Servlets, jQuery

NoSQL Databases: Cassandra, MongoDB

Web Technologies: HTML, DHTML, XML, XHTML, JavaScript, CSS, XSLT, DynamoDB

Web/Application servers: Apache Tomcat 6.0/7.0/8.0, JBoss.

AWS: EC2, EMR, S3, ECS

Languages: Java, J2EE, PL/SQL, Pig Latin, HQL, R, Python, XPath, Spark

Databases: Oracle 12c/11g/10g, Microsoft Access, MS SQL, MongoDB.

Frameworks: Struts (MVC), Spring, Hibernate.

Operating Systems: UNIX, Ubuntu Linux, Windows, CentOS, Sun Solaris.

Network protocols: TCP/IP fundamentals, LAN and WAN.

PROFESSIONAL EXPERIENCE

Confidential

Big Data Engineer

Responsibilities:

  • Architected, designed and developed business applications and data marts for the Marketing and IT departments to facilitate departmental reporting.
  • Worked on the implementation and maintenance of a Cloudera Hadoop cluster.
  • Created Hive external tables to stage data and then moved the data from the staging tables to the main tables.
  • Worked on exporting data from Hive 2.0.0 tables into a Netezza 7.2.x database.
  • Implemented the Big Data solution using Hadoop, Hive and Informatica 9.5.1 to pull/load the data into HDFS.
  • Pulled the data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Actively involved in design, new development and SLA-based support tickets for BigMachines applications.
  • Experience in server infrastructure development on Gateway, ELB, Auto Scaling, DynamoDB, Elasticsearch and Virtual Private Cloud (VPC).
  • Involved in evaluating Kafka and building use cases relevant to our environment.
  • Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation and queries, writing data back into the RDBMS through Sqoop (see the sketch after this list).
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Developed Oozie 3.1.0 workflow jobs to execute Hive 2.0.0, Sqoop 1.4.6 and MapReduce actions.
  • Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, actively driving Proof of Concept (POC) and Proof of Technology (POT) evaluations and implementing Big Data solutions.
  • Developed numerous MapReduce jobs in Scala 2.10.x for data cleansing and analyzed data in Impala 2.1.0.
  • Created a data pipeline using process groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC on Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 (YARN).
  • Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Implemented installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Ran proofs of concept to determine feasibility and evaluate Big Data products.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and computed various metrics for reporting on the dashboard.
  • Used Kafka and Storm for real-time data ingestion and processing.
  • Hands-on experience developing integrations with Elasticsearch in several programming languages; knowledge of advanced reporting using Elasticsearch and Node.js.
  • Worked in AWS cloud and on-premise environments with infrastructure provisioning and configuration.
  • Wrote Perl scripts covering data feed handling, MarkLogic integration, and communication with web services through the SOAP::Lite module and WSDL.
  • Used Hive to analyze data ingested into HBase using Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Involved in developing the MapReduce framework, writing queries and scheduling MapReduce jobs.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hadoop and was responsible for maintaining the cluster and managing and reviewing Hadoop log files.
  • Developed Shell, Perl and Python scripts to automate and provide control flow to Pig scripts.
  • Designed the Redshift data model and worked on Redshift performance improvements and analysis.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Worked on configuring and managing disaster recovery and backups of Cassandra data.
  • Performed file system management and monitoring of Hadoop log files.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile and network devices, and pushed it to HDFS.
  • Implemented partitioning, dynamic partitions and bucketing in Hive (see the sketch after this list).
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
  • Involved in migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for data processing.
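
The Spark 2.0 Scala UDF and DataFrame aggregation described above (writing results back to an RDBMS through Sqoop) can be sketched as follows. The Hive source table (staging.orders), the columns (region, amount) and the Oracle/Sqoop export target are illustrative assumptions; the Sqoop export that would push the staged result into the RDBMS appears only as a comment.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

object AggregateAndStageForSqoop {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("aggregate-and-stage")
      .enableHiveSupport()
      .getOrCreate()

    // UDF that normalizes a free-text region code before grouping (hypothetical column).
    val normalizeRegion = udf((r: String) => Option(r).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    val totals = spark.table("staging.orders")                 // hypothetical Hive source table
      .withColumn("region", normalizeRegion(col("region")))
      .groupBy("region")
      .agg(sum("amount").as("total_amount"))

    // Stage the result on HDFS as CSV; a Sqoop export would then push it to the RDBMS, e.g.:
    //   sqoop export --connect jdbc:oracle:thin:@//dbhost:1521/ORCL --table REGION_TOTALS \
    //     --export-dir /staging/region_totals --input-fields-terminated-by ','
    totals.write.mode("overwrite").csv("/staging/region_totals")

    spark.stop()
  }
}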
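
The Hive partitioning work noted above can likewise be sketched; here the HiveQL is driven through Spark SQL with Hive support rather than the Hive CLI, and the table and column names (weblogs_part, staging_weblogs, event_date) are illustrative. A bucketed variant would add a CLUSTERED BY (...) INTO n BUCKETS clause to the CREATE TABLE statement.

import org.apache.spark.sql.SparkSession

object HiveDynamicPartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-dynamic-partitioning")
      .enableHiveSupport()
      .getOrCreate()

    // Allow Hive-style dynamic partition inserts.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Target table partitioned by event date.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS weblogs_part (
        user_id STRING,
        url     STRING,
        bytes   BIGINT
      )
      PARTITIONED BY (event_date STRING)
      STORED AS ORC
    """)

    // Dynamic-partition insert: the partition value comes from the source rows,
    // so one statement loads every date present in the staging table.
    spark.sql("""
      INSERT OVERWRITE TABLE weblogs_part PARTITION (event_date)
      SELECT user_id, url, bytes, date_format(event_ts, 'yyyy-MM-dd') AS event_date
      FROM staging_weblogs
    """)

    spark.stop()
  }
}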

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Elasticsearch, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, Zookeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.

Confidential - Seattle, WA

Sr. Big Data Engineer

Responsibilities:

  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Implemented a Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) with a cloud architecture.
  • Experience in BI reporting with AtScale OLAP for Big Data.
  • Implemented solutions for ingesting data from various sources and processing data-at-rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala and Hive to perform streaming ETL and apply machine learning (see the sketch after this list).
  • Experience in AWS, implementing solutions using services such as EC2, S3, RDS, Redshift and VPC.
  • Worked as a Hadoop consultant on MapReduce, Pig, Hive and Sqoop.
  • Worked with Spark and Python.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig and MapReduce.
  • Identified query duplication, complexity and dependencies to minimize migration effort.
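
A hedged sketch of the real-time Spark/Kafka/Scala streaming ETL described above, using Spark Structured Streaming. The broker address, the transactions topic and the "userId,amount" payload format are assumptions, the console sink stands in for a real Hive/HBase/serving-store target, and the job requires the spark-sql-kafka-0-10 connector on the classpath.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split, sum}

object KafkaStreamingEtlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-streaming-etl").getOrCreate()

    // Subscribe to the (hypothetical) transactions topic.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "transactions")
      .load()

    // Parse "userId,amount" payloads from the Kafka value column.
    val parsed = raw
      .selectExpr("CAST(value AS STRING) AS line")
      .withColumn("cols", split(col("line"), ","))
      .select(
        col("cols").getItem(0).as("userId"),
        col("cols").getItem(1).cast("double").as("amount"))

    // Keep a running total per user.
    val totals = parsed.groupBy("userId").agg(sum("amount").as("total"))

    // Console sink for the sketch; a production job would write to Hive, HBase or another store.
    val query = totals.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}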

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Elasticsearch, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, Zookeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.
