
Sr. Tech Lead (Data Engineer) Resume


SUMMARY

  • 8+ years of IT experience, including 5 years in Big Data covering the analysis, design, coding, testing, and implementation of Hadoop components such as the Hadoop framework, MapReduce programming, Pig, Hive, HBase, Scala, Spark, Flume, Sqoop, YARN, and Impala.
  • Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
  • Extended Hive and Pig core functionality by writing custom UDFs (see the UDF sketch following this summary).
  • Experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
  • Knowledge of job/workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Created an AWS VPC network for the provisioned instances and configured security groups and Elastic IPs accordingly.
  • Passionate about Hadoop and Big Data technologies, including the Hortonworks and Cloudera distributions.
  • Hands-on experience with AWS databases such as RDS (Aurora), Redshift, and DynamoDB.
  • Expertise in creating Hive internal/external tables and views with Hive analytical functions, and in writing HiveQL scripts.
  • Experience with the NoSQL database HBase.
  • Experience creating Terraform scripts for EC2 instances, Redshift, Athena, Elastic Load Balancers, and S3 buckets.
  • Hands-on experience in application development using Java, Scala, Spark, RDBMSs, and Linux shell scripting.
  • Expertise in AWS services such as Athena, EMR, and EC2, which provide fast and efficient processing of Big Data.
  • Performed data analysis using Hive and Pig.
  • Loaded streaming log data from various web servers into HDFS using Flume.
  • Experience importing and exporting terabytes of data between HDFS and relational database systems using Sqoop.
  • Successfully loaded data into Hive and HDFS from Oracle and SQL Server using Sqoop.
  • Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team (see the relational-ingest sketch following this summary).
  • Documented and explained implemented processes and configurations during upgrades.
  • Supported development, testing, and operations teams during new system deployments.
  • Strong working knowledge of Collections, multithreading, and other core and advanced Java concepts.
  • Experience with client-side technologies such as HTML, DHTML, CSS, JavaScript, AJAX, jQuery, and JSON.
  • Basic knowledge of Confidential WMB 6.5 and 7.1, including workflows and nodes.
  • Experience using RDBMS database applications in Oracle 8i, DB2, MS Access, and SQL Server.
  • Very good understanding of EDI/XML/RosettaNet implementation guidelines.
  • Well experienced in testing large, complex databases as well as reporting and ETL tools such as Informatica and DataStage.
  • Well experienced in testing data loads, data transformations, and data quality.
  • Developed AWS CloudFormation templates to create custom-sized VPCs, subnets, EC2 instances, ELBs, and security groups.
  • Substantial experience writing MapReduce jobs in Java and working with Pig, Flume, ZooKeeper, Hive, and Storm.
  • Created multiple MapReduce jobs using the Java API, Pig, and Hive for data extraction.
  • Strong expertise in troubleshooting and performance-tuning Spark, MapReduce, and Hive applications.
  • Good experience with the Amazon EMR framework for processing data on EMR and EC2 instances.
  • Extensive experience developing applications that perform data processing tasks using Teradata, Oracle, SQL Server, and MySQL databases.
  • Worked on data warehousing, BI, and ETL tools such as Informatica, Tableau, and Pentaho.
  • Experience understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
  • Experience with Agile and Waterfall methodologies; handled numerous client-facing meetings with strong communication skills.
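
The custom-UDF and HiveQL work noted above can be illustrated with a minimal PySpark sketch. The table name (web_logs), column name (user_agent), and UDF logic below are hypothetical placeholders rather than details taken from any engagement described here.

    # Minimal PySpark sketch: registering a custom UDF and using it in a HiveQL-style query.
    # The table (web_logs) and column (user_agent) are hypothetical examples.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-example").enableHiveSupport().getOrCreate()

    def browser_family(user_agent):
        # Toy classification logic standing in for real custom-UDF business rules.
        if user_agent is None:
            return "unknown"
        return "chrome" if "chrome" in user_agent.lower() else "other"

    spark.udf.register("browser_family", browser_family, StringType())

    # Use the registered UDF inside an ordinary HiveQL query against an existing table.
    spark.sql("""
        SELECT browser_family(user_agent) AS browser, COUNT(*) AS hits
        FROM web_logs
        GROUP BY browser_family(user_agent)
    """).show()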
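
The Sqoop transfers listed above are command-line jobs; purely as a related illustration (not the Sqoop commands themselves), the same kind of Oracle-to-Hive load can be sketched with Spark's JDBC reader. The connection URL, credentials, and table names are hypothetical placeholders.

    # Illustrative alternative to a Sqoop import: reading an RDBMS table over JDBC
    # with Spark and persisting it as a Hive table. URL, credentials, and table
    # names are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-ingest").enableHiveSupport().getOrCreate()

    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")
              .option("dbtable", "SALES.ORDERS")
              .option("user", "etl_user")
              .option("password", "********")
              .option("fetchsize", "10000")
              .load())

    # Persist into the Hive warehouse for downstream analysis by the BI team.
    orders.write.mode("overwrite").saveAsTable("staging.orders")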

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, Cloudera, MapReduce, Hive, Pig, Sqoop, Solr, Flume, Oozie, ZooKeeper, Python

NoSQL Databases: HBase, MongoDB, Cassandra

Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, Scala, HiveQL, Perl, Unix shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Spring XD, Hibernate

Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux, and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, NetBeans, Toad, Maven, Ant, Hudson, Sonar, JDeveloper, Assent PMD, DbVisualizer

Version control: SVN, CVS

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Development Methodologies: Agile/ Scrum, Waterfall

PROFESSIONAL EXPERIENCE

Confidential

Sr. Tech Lead (Data Engineer)

Responsibilities:

  • Built Spark jobs for data transformation and aggregation (see the pipeline sketch after this list).
  • Developed Spark programs in two languages: Scala and Python.
  • Built data pipelines using PySpark.
  • Used EC2 for load balancing and traffic handling, and EMR to trigger Spark DataFrame transformations, masking data with Hive queries.
  • Built Python wrappers for healthcare analytical products developed in Java Spark.
  • Contributed to the development of a common framework intended for use across all PySpark applications.
  • Implemented solutions using advanced AWS components (EMR, EC2, etc.) integrated with Big Data/Hadoop frameworks such as ZooKeeper, YARN, Spark, Scala, and NiFi.
  • Produced JUnit tests for Spark transformations and helper methods.
  • Cleaned, transformed, and analyzed vast amounts of raw data from various systems using Spark to provide ready-to-use data to developers and business analysts.
  • Experienced with Microsoft Azure; enhanced and implemented backup and disaster recovery.
  • Developed Spark scripts to import large files from Amazon S3 buckets.
  • Experience in physical table design in Big Data environments.
  • Experience with the full SDLC and Lean/Agile development methodologies.
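
A minimal sketch of the kind of S3-based transformation-and-aggregation job described in the bullets above; the bucket paths, column names, and aggregation logic are hypothetical stand-ins rather than project specifics.

    # Minimal PySpark sketch of an S3-to-S3 transformation and aggregation job.
    # Bucket names and columns are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("claims-aggregation").getOrCreate()

    # Read raw files landed in S3 (EMR resolves s3:// paths through EMRFS).
    raw = spark.read.option("header", "true").csv("s3://example-raw-bucket/claims/")

    # Clean and transform: drop malformed rows and normalize types.
    cleaned = (raw.dropna(subset=["claim_id", "amount"])
                  .withColumn("amount", F.col("amount").cast("double")))

    # Aggregate into a ready-to-use summary for analysts.
    summary = (cleaned.groupBy("provider_id")
                      .agg(F.count("claim_id").alias("claim_count"),
                           F.sum("amount").alias("total_amount")))

    summary.write.mode("overwrite").parquet("s3://example-curated-bucket/claims_summary/")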

Environment: Hadoop 2.7, Python, Scala, SQL, Windows 10, PyCharm IDE, IntelliJ IDE, Java, cloud technologies, AWS, Redshift, S3, EC2, Athena, RDBMS, EMR, JDBC, JUnit, Miniconda 4.5.11, virtualenv, Spark 2.3.1

Confidential

PySpark Developer / Java Spark Developer

Responsibilities:

  • Built Spark jobs for data transformation and aggregation.
  • Developed Spark programs in two languages: Java and Python.
  • Designed data processing pipelines and integrated them into a common prep pipeline.
  • Designed feature functionality using Java Spark for high performance and robust data handling.
  • Built a custom hybrid software delivery framework drawing on established delivery principles.
  • Built data pipelines using Java Spark.
  • Built Python wrappers for healthcare analytical products developed in Java Spark.
  • Contributed to the development of a common framework intended for use across all PySpark applications.
  • Implemented solutions using advanced AWS components (EMR, EC2, etc.) integrated with Big Data/Hadoop frameworks such as ZooKeeper, YARN, Spark, Scala, and NiFi.
  • Produced JUnit tests for Spark transformations and helper methods.
  • Cleaned, transformed, and analyzed vast amounts of raw data from various systems using Spark to provide ready-to-use data to developers and business analysts.
  • Produced pytest suites to verify that the Python wrappers reproduce the feature functionality implemented in Java Spark for the analytical products (see the testing sketch after this list).
  • Experienced with data masking, data obfuscation, creation of data for sensitive fields, and loading masked data into non-production environments (see the masking sketch after this list).
  • Experienced with Microsoft Azure; enhanced and implemented backup and disaster recovery.
  • Developed Spark scripts to import large files from Amazon S3 buckets; coded and developed a custom Elasticsearch Java-based wrapper client using the Jest API.
  • Developed infrastructure on AWS using services such as EC2, RDS, Redshift, CloudFront, CloudWatch, and VPC.
  • Worked on NiFi data pipelines to process large data sets and configured lookups for data validation and integrity.
  • Worked with management frameworks and cloud administration tools.
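
The pytest coverage mentioned above could look roughly like the sketch below; the transformation under test (add_claim_flag) and its expected behavior are hypothetical examples, not the actual wrapper API.

    # Sketch of a pytest for a PySpark transformation, run against a local SparkSession.
    # The function under test (add_claim_flag) is a hypothetical stand-in.
    import pytest
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    @pytest.fixture(scope="session")
    def spark():
        return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

    def add_claim_flag(df):
        # Stand-in for a real transformation exposed by the Python wrapper.
        return df.withColumn("high_value", F.col("amount") > 1000)

    def test_add_claim_flag(spark):
        df = spark.createDataFrame([(1, 500.0), (2, 2500.0)], ["claim_id", "amount"])
        result = add_claim_flag(df).collect()
        assert [row.high_value for row in result] == [False, True]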
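
The masking and obfuscation work for non-production loads is only summarized above; under the assumption that one-way hashing and constant substitution are acceptable masking rules, a minimal PySpark sketch could look like this (bucket paths and column names are placeholders).

    # Minimal data-masking sketch: hashing and substituting sensitive columns
    # before loading a non-production environment. Paths and columns are
    # hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("masking").getOrCreate()

    members = spark.read.parquet("s3://example-prod-bucket/members/")

    masked = (members
              .withColumn("ssn", F.sha2(F.col("ssn"), 256))        # one-way hash
              .withColumn("email", F.lit("masked@example.com"))    # constant substitution
              .drop("phone_number"))                                # drop fields not needed downstream

    masked.write.mode("overwrite").parquet("s3://example-nonprod-bucket/members_masked/")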

Environment: Hadoop 2.7, Python, pytest, SQL, Windows 10, PyCharm IDE, IntelliJ IDE, Java, cloud technologies, AWS, Redshift, S3, EC2, JDBC, JUnit, Miniconda 4.5.11, virtualenv, Spark 2.3.1, pandas 0.23.4, NumPy, Git CLI, DB2, Oracle, Groovy, Azure SQL, Databricks, ML
