Data Engineer Resume
Durham, NC
SUMMARY
- Over 12 years of experience as an Application, Database and Big Data developer, including 4+ years of experience in web application development using Hadoop and related Big Data technologies within the medical, pharmaceutical and recruitment industries
- Hands-on experience in big data and application development using Oracle, MongoDB, J2EE, the Cloudera and Hortonworks Hadoop ecosystem technologies, and distributed systems
- CCA175 Cloudera Certified Spark and Hadoop developer
- Experience with Hadoop APIs (Spark Scala, PySpark) and its ecosystem components (HDFS, HBase, Hive, Impala, Sqoop, Flume, Oozie, Zookeeper, Pig, Spark, Hue)
- Good knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, Job Tracker, Task Tracker, Name Node and Data Node
- Experience extending Pig and Hive functionality with custom UDFs for data analysis and file processing, by running Pig Latin scripts and using Hive Query Language
- Experience working in fast-paced Agile environments, including Scrum, XP and TDD. Good exposure to all phases of the SDLC (analysis, development, testing and implementation), unit testing, Git, and continuous integration and delivery using Jenkins
- Hands-on development in RDBMS (Oracle, SQL Server) and NoSQL databases (MongoDB, HBase), Amazon AWS (EMR, S3, CLI, Redshift, Glue, Lambda, Kinesis), data warehousing, ETL, ELK and Unix shell scripting
- Extensive professional experience in application support, testing and investigating complex data-related issues. Proficient in the ITIL best-practices framework, in addition to performance tracking and evaluation
- Experience analyzing log files for Hadoop and ecosystem services and debugging issues
TECHNICAL SKILLS
Programming Technologies: Java, Scala, Python, SAS Base, JavaScript, Bash, C-Shell, Perl, R, jQuery, J2EE, JSP, Servlet, EJB, Spring Boot, Struts, JDBC, Web Services (SOAP, WSDL), REST API
Data platform and Data Science: Oracle, Postgres, Redshift, RDS, MongoDB, HBase, DynamoDB, MySQL, MS SQL Server, ELK, SAS, Machine Learning, SparkML, Scikit-learn, TensorFlow, Mahout
OS/Web/Cloud Platforms: Linux, UNIX, AWS, GCP, Windows Server, IIS, Apache, Node.js, CentOS
Hadoop ecosystem technologies: HortonWorks (2.6.5.0), Cloudera (5.8), HDFS, MapReduce, Spark, PySpark, Scala, Hive, Sqoop, Pig, HBase, RDD, SparkSQL, DataFrame, Flume, ZooKeeper, Kafka, Impala, Oozie, Hue, Spark Streaming, Storm, Ambari, YARN, Avro, Flink
Tools: Eclipse, Talend, PyCharm, CA Erwin, PDI, RStudio, Toad, NiFi, Tableau, QlikView
Testing Hadoop: MRUnit testing, Quality Center, Hive testing
Project management/DevOps: Jira, AWS DevOps tools, Git, Jenkins, Maven, Ant
PROFESSIONAL EXPERIENCE
Confidential, Durham NC
Data Engineer
Responsibilities:
- Performed bulk and incremental loads of large-scale data migrated from different sources to a data warehouse
- Developed modules for data procurement and aggregation using Spark SQL on AWS EMR
- Redesigned the data transformation pipeline, moving it from Pentaho/Postgres to AWS Glue/Redshift
- Collected business requirements to set data mapping rules for proper data transfer from data source to data target, and used ETL tools (AWS Glue/Pentaho) to load data into the staging tables in the data marts
- Built event-driven triggers for AWS Lambda functions and Glue jobs calling REST APIs, creating a fully automated data cataloging and ETL pipeline to transform the data
- Handled column-oriented file formats (Parquet, ORC) for data storage
- Implemented bulk and delta load processing using Glue, Spark SQL and DataFrames (see the sketch after this list)
- Integrated the application with the Redshift data mart using PySpark
- Fetched data from various upstream applications and made it available for reporting in Redshift.
- Created Python and UNIX shell scripts while interacting with different AWS services.
- Visualized data from the Confidential data mart API in Qlik Sense
- Performed and supported daily data loads pushed to the AWS data lake
- Implemented industry-level best practices in defining and designing the application architecture using different AWS technologies and methodologies
- Extensively used Agile methodology as the Organization Standard to implement ETL and cloud data warehouse best practices
- Analyzed business requirements and cross-verified them against the functionality and features of SQL/NoSQL databases (HBase, DynamoDB, Cassandra, Redshift) to determine the optimal database to migrate the data to
- Hands-on experience with Redshift Spectrum and the AWS Athena query service for reading and analyzing data from S3 in different file formats and compression codecs
- Performed data analysis of large datasets using Python (pandas, NumPy, multiprocessing and other data processing libraries); multiprocessing reduced data processing time on AWS EC2
- Implemented CI/CD pipelines for serverless AWS Glue ETL applications, as part of a DevOps role, using AWS developer tools
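Below is a minimal PySpark sketch of the bulk/delta load pattern described above. The dataset name (orders), the updated_at watermark column and the S3 paths are hypothetical placeholders, not the production Glue job.

```python
# Minimal delta-load sketch in PySpark (hypothetical paths, table and column names).
# The full source is read once for the bulk load; incremental (delta) runs filter
# on an `updated_at` watermark and append partitioned Parquet output to S3.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("delta-load-sketch").getOrCreate()

SOURCE_PATH = "s3://example-raw-bucket/orders/"        # assumed input location
TARGET_PATH = "s3://example-curated-bucket/orders/"    # assumed output location

def load(last_watermark=None):
    df = spark.read.parquet(SOURCE_PATH)
    if last_watermark is not None:
        # Delta load: keep only rows changed since the previous run.
        df = df.where(F.col("updated_at") > F.lit(last_watermark))
    (df.withColumn("load_date", F.current_date())
       .write.mode("append")
       .partitionBy("load_date")
       .parquet(TARGET_PATH))

# Bulk load on the first run, then delta loads driven by a stored watermark.
load()                         # initial bulk load
load("2019-06-01 00:00:00")    # subsequent delta load (example watermark)
```

In practice the watermark would be persisted (for example in a control table or the Glue job bookmark) between runs so that only changed rows are appended.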
Environment: AWS (EMR 5.23, Lambda, Glue, Athena, API Gateway, S3, Redshift, Elastic Beanstalk, DynamoDB, CloudWatch, CloudTrail, SNS, Kinesis, DMS, RDS), Pentaho Data Integration (PDI), Hadoop 2.8.5, Scala, Java, Spark 2.4.3, Kafka, Sqoop, Hive, HBase, Impala, Presto, Zookeeper, Pig, CI/CD AWS DevOps, Jira, Postgres, REST API, Cloudera v5.13, CA Erwin Data Modeler, Cassandra, Python 3.7 (pandas, NumPy), PySpark, Agile environment, data warehouse, data mart, QlikView, Qlik Sense, Alation, Tableau, Zeppelin, Jupyter, Linux (RedHat 7), SAP BDO, MongoDB, PHP, Apache Server, JavaScript
Confidential
Application and Big Data developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop suitable programs.
- Engineered ETL data pipelines for clickstream data from Adobe (Omniture) data dumps into HDFS; after cleansing, transformed it into a structured format and stored it in the data warehouse to recommend market expansion by visualizing national bundles' sales in Tableau
- Integrated pipelines to upload, process and analyze data from web logs and job search logs to provide the marketing team with weekly performance insights on posted jobs
- Reduced the completion time of an ETL load process from 20 hours to 4 hours using Spark jobs and Sqoop cron jobs, which shortened the customer business review process by 40% through a self-service Tableau solution
- Implemented procedures to move log files generated from various sources to HDFS for further processing through Flume 1.5.2
- Hands-on writing Hive and Pig (0.16) jobs and extending their functionality using UDFs, UDTFs and UDAFs
- Imported and exported data from MongoDB using the Hadoop connector, and from different RDBMSs such as MySQL and Oracle into HDFS (2.7.3), Hive (1.2.1) and HBase (1.1.2) using Sqoop (1.4.6)
- Experienced in transferring data from different data sources into HDFS using Kafka (1.0.0)
- Practical exposure on Hortonworks and Cloudera distributions
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and buckets; loaded data and wrote Hive queries (see the sketch after this list)
- Used Oozie (4.2.0) operational services for batch processing and scheduling workflows dynamically, and created UDFs to store specialized data structures in HBase and Cassandra
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing
- Involved in working with Spark on top of YARN for interactive and batch analysis
- Good understanding of the DAG cycle for the entire Spark application flow via the Spark application Web UI
- Worked with various HDFS file formats such as Avro, SequenceFile, ORC and JSON
- Created tables using Impala and wrote queries against data stored in HBase
- Mentored analysts and the test team in writing Hive queries
- Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing and reviewing Hadoop log files
- Experience analyzing log files for Hadoop and ecosystem services and finding the root cause of issues
- Optimized Hive queries and improved performance by configuring Hive query parameters
- Implemented test scripts to support test driven development and continuous integration.
- Worked on the ORC file format, bucketing and partitioning for Hive performance enhancement and storage improvement
- Used MongoDB and Oracle as datastores for message persistence
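A minimal sketch of the partitioned Hive external table pattern referenced above, run through PySpark with Hive support. The table, column and path names (weblogs, staging_weblogs, hdfs:///data/weblogs) are hypothetical placeholders, not the actual project schema.

```python
# Sketch: Hive external table on ORC files with dynamic partitioning,
# executed via PySpark with Hive support (all names are hypothetical).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-external-table-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table backed by ORC files in HDFS, partitioned by load date.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS weblogs (
        user_id STRING,
        url     STRING,
        ts      TIMESTAMP
    )
    PARTITIONED BY (load_date STRING)
    STORED AS ORC
    LOCATION 'hdfs:///data/weblogs'
""")

# Allow dynamic partitioning so each load_date value lands in its own partition.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Load from an assumed staging table already registered in the metastore.
spark.sql("""
    INSERT INTO TABLE weblogs PARTITION (load_date)
    SELECT user_id, url, ts, CAST(to_date(ts) AS STRING) AS load_date
    FROM   staging_weblogs
""")
```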
Technologies: HDFS, MapReduce, Spark, PySpark, Scala, Hive, Sqoop, Pig, HBase, RDD, SparkSQL, DataFrame, Flume, Kafka, Oozie, Ambari, HortonWorks HDP, Tableau, MongoDB, Shell scripting, Cassandra, Zookeeper
Database Administration and Developer
Confidential
Responsibilities:
- Migrated Oracle 11g to Amazon Web Services (AWS); ran database setup projects on AWS using EC2 instances and EBS volumes, and set up NAT instances and bastion hosts for connecting to the EC2 instances
- Optimized PL/SQL procedures, functions and packages using advanced database features like global temporary tables, table functions, collections, bulk loading techniques to improve performance.
- Developed database packages, stored procedures and triggers using PL/SQL; optimized SQL queries (hints); created and managed cron jobs using UNIX shell scripting; created indexes; and used Oracle features such as Import/Export, SQL*Loader, collections and bulk loading techniques to improve the performance of data loading and retrieval
- Performed daily DBA tasks (shell scripts, monitoring, tuning and troubleshooting queries and DB issues) on Oracle and MongoDB (see the monitoring sketch after this list)
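As an illustration of the daily monitoring tasks above, here is a minimal Python sketch of a tablespace-usage check. It assumes the cx_Oracle driver, a hypothetical DSN and an account with access to DBA views; the actual checks were shell-script based and scheduled via cron.

```python
# Sketch: daily tablespace-usage check against Oracle (hypothetical connection details).
import cx_Oracle

# Assumed host, port and service name for illustration only.
DSN = cx_Oracle.makedsn("db-host.example.com", 1521, service_name="ORCLPDB1")

def tablespace_usage(user: str, password: str):
    """Return (tablespace_name, used_pct) rows above an 85% usage threshold."""
    query = """
        SELECT tablespace_name, ROUND(used_percent, 1) AS used_pct
        FROM   dba_tablespace_usage_metrics
        WHERE  used_percent > 85
        ORDER  BY used_percent DESC
    """
    with cx_Oracle.connect(user=user, password=password, dsn=DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()

if __name__ == "__main__":
    # Credentials are placeholders; a real cron job would read them from a vault or env vars.
    for name, pct in tablespace_usage("monitor_user", "secret"):
        print(f"WARNING: tablespace {name} is {pct}% full")
```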
Technologies: Oracle, MongoDB, Toad, Python, PyCharm, Linux CentOS, Shell scripting (cron jobs), MongoDB Compass, Studio 3T, JSON, MapReduce, JavaScript, AWS EC2, PL/SQL
Java Developer
Confidential
Responsibilities:
- Developed, enhanced and supported the Niche Network platform, which manages more than 40 private-label job boards for associations and publications
- Developed and customized the pay-per-post job posting flow on the Niche J2EE application, creating a new sales lead capture system that delivers scrubbed and qualified leads, based on on-site activity, into the sales funnel, increasing leads by 200%
- Developed and implemented procedures for encrypting sensitive information (users' passwords) at the database (Oracle and MongoDB) and application levels
- Implemented test cases and performed unit testing using JUnit
- Developed a batch post-process for incoming XML FileFeed requests from the web service API to create new jobs, which resulted in better and faster integration with ATSs
- Achieved Payment Card Industry (PCI) compliance by redesigning and implementing a new e-commerce customer flow, which resulted in higher customer trust and safety
- Worked on refactoring the existing code for better maintainability, scalability and efficiency
Technologies: Agile, Java 8, Struts, Spring Boot, JUnit, Python, J2EE, XML, XSLT, jQuery, JavaScript, HTML, CSS3, EJB, JSP, JDBC, Servlet, REST API, Oracle, MongoDB, PL/SQL, Git, AWS (EC2, S3), NetBeans, PyCharm, GlassFish, Apache, Omniture, Toad, Web Services, Jenkins, Bash (cron jobs), ELK Stack (Elasticsearch, Logstash, Kibana)
Confidential
Clinical trials Data manager
Responsibilities:
- Solid understanding of Phase I, II, and III of clinical trials from study start to database lock for RDC and Paper studies, including database design and clinical data management process
- Development/Testing of Case Report Forms, Annotated CRFs, Edit Check Specification, Completion Guidelines and Data Handling Plan for paper or electronic studies, AE reconciliation, and database lock
- Integrated, reviewed and reconciled uploaded data with Oracle Clinical and ensured readiness for analysis with SAS
- Knowledge of CDISC, GCP, ICH and FDA regulatory requirements applied to clinical data management.
- Develop and maintain general data management standard operating procedures (SOPs) as well as study-specific SOPs and working practices related to the data management needs of the projects.
- Track and process all SAEs according to the sponsor's regulatory requirements
Technologies: SAS 9.3, Oracle Clinical v4.5, SAS/MACRO, SAS/STAT, SAS/GRAPH, SAS/ODS, SAS/CONNECT and SAS/ACCESS, Windows, Linux