Senior Data Engineer Resume
Charlotte, NC
SUMMARY
- Over 13 years of extensive experience in varied data warehousing technologies, Big Data and Hadoop ecosystems, data modeling, data integration, data quality, data migration, and OLAP reporting.
- Extensive experience in Requirements gathering, Profiling, Design, Modeling, Development and Testing of Enterprise Data warehouse, Data Marts, ODS and Data Quality Frameworks.
- 7 years of extensive data engineering experience on Hadoop ecosystems and Spark, with hands-on work in multiple greenfield and migration projects under Agile and Waterfall methodologies.
- Lifetime learner, quick to adapt to new technologies.
- Extensively performed senior data engineer activities such as data ingestion, data profiling, and data analysis for the Enterprise Retail and Wholesale Credit Authorized Data Sources.
- Excellent knowledge of data sourcing, data processing, and distribution.
- Extensively worked on requirements gathering, profiling and analysis, design, development, and testing for Wholesale Credit data marts.
- Extensive knowledge of building new data pipelines, identifying existing data gaps, and providing automated solutions that deliver analytical capabilities and enriched data to applications.
- Built scalable and reusable components in Scala and Python for the most commonly used ETL operations such as SourceOp, TargetOp, JoinerOp, SCD1, AggregatorOp, and FilterOp (an illustrative sketch follows this summary).
- Extensive use of shell scripting to integrate and schedule jobs with schedulers such as Autosys and crontab.
- Excellent knowledge of AWS cloud services such as S3, Glue, Redshift, and Athena.
- Proficient in writing packages, stored procedures, functions, views, and database triggers using T-SQL, Netezza, and Oracle.
- Sound knowledge of data warehousing concepts: dimensional data modeling, relational data modeling, data aggregation, and star and snowflake schemas.
- Sound knowledge of Hadoop components such as Sqoop, Hive, Beeline, Spark SQL, Oozie, and Hue.
- Developed reusable components for logging framework, workflow execution, purging and archival processes using Python, SQL and Scala.
- Worked with different file formats such as Avro, Parquet, and text, stored on HDFS and S3.
- Built CI/CD and automated deployment processes using shell scripting.
- Worked with version control tools such as SVN and Bitbucket.
- Sound knowledge of Teradata and Netezza architecture; expertise in Teradata utilities such as TPT, FastLoad, MultiLoad (MLoad), and BTEQ.
- Expertise in Netezza utilities such as nzsql, nzload, and nzmigrate.
- Expertise in Informatica PowerCenter administration and development; brought innovative ideas into scalable products that are still in use today.
- Streamlined code merges and deployments for Informatica and OBIEE objects, resolving recurring issues.
- Profound experience loading data from flat files into Oracle using Oracle SQL*Loader.
- Expert knowledge of UNIX shell scripting.
- Excellent knowledge of Python programming.
- Excellent knowledge of performance tuning for optimal performance in SQL Server, Oracle, and Spark 3.0; sound knowledge of collect statistics, join strategies, join types, and explain/optimizer plans.
- Excellent presentation skills; prepared presentations and numerous data flow and process flow diagrams.
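Code sketch (illustrative): a minimal Scala outline of the reusable Spark ETL operations mentioned above. The Op trait and the FilterOp/JoinerOp signatures are assumptions for illustration, not the proprietary framework itself.

    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Each operation is a small, composable unit that consumes and returns DataFrames.
    trait Op {
      def run(spark: SparkSession, inputs: Seq[DataFrame]): DataFrame
    }

    // FilterOp: keeps rows matching a SQL predicate supplied through configuration.
    class FilterOp(predicate: String) extends Op {
      override def run(spark: SparkSession, inputs: Seq[DataFrame]): DataFrame =
        inputs.head.filter(predicate)
    }

    // JoinerOp: joins the first two inputs on the given keys with the configured join type.
    class JoinerOp(keys: Seq[String], joinType: String = "inner") extends Op {
      override def run(spark: SparkSession, inputs: Seq[DataFrame]): DataFrame =
        inputs(0).join(inputs(1), keys, joinType)
    }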
TECHNICAL SKILLS
Big Data and Hadoop Ecosystem: Apache Spark 3.0/2.0.0, CDH 5.8.3, HDFS 2.7.3, Hive 2.0.0, Impala 2.7.0, Sqoop 1.4.6, MapReduce, Oozie 3.1.0
RDBMS: SQL Server 2017, Oracle 12c, Netezza, Teradata 14.11.0.1
Programming: PySpark, Python, Scala, Shell Scripting, Bash
Cloud Services: AWS Glue, EMR, Redshift, Athena
ETL Tools: Informatica PowerCenter 10.2/9.6.1 (Administration & Development), Informatica Cloud Services, SSIS
IDE: MS Visual Studio, Jupyter, IntelliJ, PyCharm
Scheduling: Autosys, Crontab, Airflow
Reporting Tools: OBIEE 11.1.1.7.x/11.1.1.6.x, MSBI
EIM, DQ & DP Tools: Informatica Data Explorer, Informatica Data Quality, MDM
Other Tools: Git, JIRA, Bitbucket, PuTTY, WinSCP
PROFESSIONAL EXPERIENCE
Confidential, Charlotte, NC
Senior Data Engineer
Responsibilities:
- Worked on the end-to-end process, from requirements gathering through delivery, of building a reusable logging framework that can be embedded in all Spark jobs to emit custom logging messages (see the sketch at the end of this role's responsibilities).
- Used the datamover framework to integrate with log4j logging and customized it to generate messages related to the job flow at every stage of the flow.
- Wrote Scala APIs to customize the logger classes and integrated them into the various operations (transformations) such as Source, Target, Filter, and Joiner.
- Wrote a Scala API to generate complete log information at different levels of depth based on the log level, create a CSV, and finally save it to a Hive table for auditing.
- Worked on the ETL data pipelines for the batch processing, involving various transformations.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory settings.
- Experienced in handling large datasets using partitioning, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself.
- Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Developed PySpark and Spark SQL code to process DataFrames in Apache Spark.
- Created a data quality job in Python to compare two DataFrames.
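Code sketch (illustrative): a minimal Scala outline of the stage-level job logging described above, using a log4j logger and appending an audit row to a Hive table. The JobLogger object and the audit.job_stage_log table name are assumptions; the actual datamover integration is proprietary.

    import org.apache.log4j.Logger
    import org.apache.spark.sql.SparkSession

    object JobLogger {
      private val log = Logger.getLogger(getClass.getName)

      // Wraps one stage of the job flow, logging start/end and recording an audit row.
      def stage[T](spark: SparkSession, jobName: String, stageName: String)(body: => T): T = {
        log.info(s"$jobName - $stageName started")
        val start = System.currentTimeMillis()
        val result = body
        val elapsedMs = System.currentTimeMillis() - start
        log.info(s"$jobName - $stageName finished in $elapsedMs ms")
        // Append the audit record to a Hive table (table name is illustrative).
        import spark.implicits._
        Seq((jobName, stageName, elapsedMs)).toDF("job_name", "stage_name", "elapsed_ms")
          .write.mode("append").saveAsTable("audit.job_stage_log")
        result
      }
    }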
Environment: Spark 2.3.2, Spark SQL, Python 3.0, PySpark, SQL Server, Scala, Hive, Git, Linux, Shell Scripting.
Confidential - Charlotte, NC
Sr. Big data Developer
Responsibilities:
- Gathered requirements with the business and defined low-level design documents.
- Re-architected the current sourcing and distribution system from Talend and Netezza to Hadoop and Spark.
- Worked with large datasets, sourcing from disparate source systems such as Teradata, flat files, SQL Server, SFTP pulls, and trigger-based feeds.
- Developed the end-to-end process for building active customers for the wholesale credit risk application, performing various transformations to build the real, local, secured, and inactive customer sets.
- Performed the CDC process for customers in both the daily and monthly flows (see the CDC sketch at the end of this role's responsibilities).
- Developed the wholesale credit risk application using Scala and Hive.
- Contributed to the proprietary big data processing framework that Confidential uses across multiple teams, built with big data technologies such as Hadoop, Scala, Python, Hive, and Impala.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Analyzed issues in the production environment of the credit risk application during daily/monthly batch runs and provided technical solutions to defects raised during the batch.
- Built the DAG flow for parallel execution of the Spark jobs.
- Created conf (configuration) files for sequential execution of the jobs, creating persistent tables in the process where required.
- Created various DataFrames to move data through extraction, standardization, transformation, and loading to the target.
- Wrote shell scripts for ETL and deployment jobs; created Autosys JILs to schedule command jobs and box jobs and to establish dependencies between jobs.
- Updated the code repository (Bitbucket) and maintained the golden copy of the code.
- Built minimum viable products, based on the tasks in each sprint, using big data technologies.
- Maintained coding standards for easy understanding and maintenance; debugged failures and identified technical solutions for bugs.
- Performance-tuned processes by identifying bottlenecks and ensured quick job execution.
- Developed ETL framework using Python and Hive (including daily runs, error handling, and logging) to clean useful data and improve vendor negotiations.
- Involved in the complete SDLC of multiple assignments, from requirements gathering and FSD through design, development, testing, deployment, and production support.
- Developed various mappings using HTTP, Web Service Consumer, Application Source Qualifier, Aggregator, Filter, Expression, Lookup, SQL, Router, and Update Strategy transformations.
- Developed an integration solution to implement access management, synchronizing Salesforce access control with IES access control and paving the way for a cross-platform unified user management experience.
- Developed SCD mappings to identify new, modified, and disabled records between the LDAP and Salesforce systems.
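Code sketch (illustrative): a minimal Scala outline of the daily CDC comparison described above, splitting the current snapshot into inserts, updates, and deletes against the prior snapshot. The cust_id key and the hash-over-all-attributes approach are assumptions for illustration.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, concat_ws, sha2}

    object Cdc {
      // Adds a hash over all non-key columns so changed rows can be detected cheaply.
      private def withHash(df: DataFrame, key: String): DataFrame =
        df.withColumn("row_hash", sha2(concat_ws("|", df.columns.filter(_ != key).map(col): _*), 256))

      // Returns (inserts, updates, deletes) between the previous and current snapshots.
      def delta(prev: DataFrame, curr: DataFrame, key: String = "cust_id"): (DataFrame, DataFrame, DataFrame) = {
        val p = withHash(prev, key).select(col(key), col("row_hash").as("prev_hash"))
        val c = withHash(curr, key)
        val inserts = c.join(p, Seq(key), "left_anti")   // new keys
        val deletes = p.join(c, Seq(key), "left_anti")   // keys no longer present
        val updates = c.join(p, Seq(key), "inner").filter(col("row_hash") =!= col("prev_hash"))
        (inserts, updates, deletes)
      }
    }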
Environment: Informatica PowerCenter 9.6.1 and 10.2, Salesforce.com, Salesforce Marketing Cloud, LDAP, Linux, SQL Server, Spark 2.0.0, Hive 2.0.0, Impala 2.7.0, Sqoop 1.4.6
Confidential, Charlotte, NC
Sr. Application Developer
Responsibilities:
- Gathered requirements with the business and defined low-level design documents.
- Re-architected the current sourcing and distribution system from Talend and Netezza to Hadoop (HaaS).
- Worked on structured and semi-structured data with daily incremental loads of 1 TB and monthly/quarterly loads of several TBs.
- Developed the sourcing logic, including data staging, cleansing, standardization, archiving, and purging, through Pig, Sqoop, Hive, and Oozie workflows for multiple SORs in the financial domain.
- Optimized Hive 2.0.0 scripts to use HDFS efficiently by using various compression mechanisms.
- Extracted data from Netezza, Teradata, and flat files into HDFS using Sqoop (see the ingestion sketch at the end of this role's responsibilities).
- Worked on FastLoad and FastExport to move data from one environment to another.
- Wrote BTEQ scripts to transform data from Netezza to the Teradata staging environment.
- Implemented authentication using Kerberos and authorization using Apache Sentry.
- Created a data quality framework to comply with the bank's Data Governance (EDM) team in establishing the Peaks application as an Authorized Data Source (ADS) for Confidential.
- Created environments such as Dev, SIT, UAT, and Pre-Prod to support quality assurance.
- Created automation scripts to refresh the UAT, SIT, and Pre-Prod Hadoop environments, covering source code, DDLs, environment variables, and configuration.
- Worked with business and QA teams to resolve issues during testing and to explain gaps/differences between Prod and other lanes.
- Used Netezza and Teradata windowing functions and load utilities; coordinated with the offshore team for timely execution of projects.
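Code sketch (illustrative): the production ingestion above used Sqoop; this Spark JDBC variant in Scala only illustrates the same pattern of landing an RDBMS table into HDFS as Parquet. The connection URL, credentials, table name, and HDFS path are placeholders, and the Netezza JDBC driver is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession

    object LandTable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("land-table").enableHiveSupport().getOrCreate()

        // Read one source table over JDBC (all connection details are placeholders).
        val df = spark.read
          .format("jdbc")
          .option("url", "jdbc:netezza://nz-host:5480/SALES")
          .option("dbtable", "STG.CUSTOMER")
          .option("user", sys.env.getOrElse("DB_USER", ""))
          .option("password", sys.env.getOrElse("DB_PASS", ""))
          .load()

        // Land a dated partition in the HDFS staging area (path is illustrative).
        df.write.mode("overwrite").parquet("hdfs:///data/staging/customer/dt=2020-01-01")
        spark.stop()
      }
    }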
Environment: Informatica PowerCenter 9.5.1, Talend, Netezza 14.11.0.1, Sqoop, Hive, Flume, Oozie, Pig, RHEL Linux, SQL Server.