
Senior Data Engineer Resume


Charlotte, NC

SUMMARY

  • Over 13 years of extensive experience in varied data warehousing technologies, Big Data and Hadoop ecosystems, data modeling, data integration, data quality, data migration and OLAP reporting.
  • Extensive experience in Requirements gathering, Profiling, Design, Modeling, Development and Testing of Enterprise Data warehouse, Data Marts, ODS and Data Quality Frameworks.
  • 7 years of extensive data engineering experience on Hadoop ecosystems and Spark, with hands-on work in multiple greenfield and migration projects using Agile and Waterfall methodologies.
  • Lifelong learner, quick to adapt to new technologies.
  • Extensively performed Senior Data Engineer activities such as data ingestion, data profiling and data analysis for the Enterprise Retail and Wholesale Credit Authorized Data Source.
  • Excellent knowledge of data sourcing, data processing and distribution.
  • Has extensively worked on activities like Requirements Gathering, Profiling and Analysis, Design, Development and Testing in Wholesale Credit data marts.
  • Extensive knowledge in building new data pipelines, identifying existing data gaps and providing automated solutions to deliver analytical capabilities and enriched data to applications.
  • Built scalable and reusable components using Scala and Python for the most commonly used ETL operations such as SourceOp, TargetOp, JoinerOp, SCD1, AggregatorOp and FilterOp (see the sketch after this list).
  • Extensive use of shell scripting to integrate and schedule jobs with schedulers such as Autosys and crontab.
  • Excellent knowledge of AWS cloud services such as S3, Glue, Redshift and Athena.
  • Proficient in writing packages, stored procedures, functions, views and database triggers using T-SQL, Netezza and Oracle.
  • Sound knowledge of data warehousing concepts: dimensional data modeling, relational data modeling, data aggregation, and star and snowflake schemas.
  • Sound knowledge of Hadoop components such as Sqoop, Hive, Beeline, Spark SQL, Oozie and Hue.
  • Developed reusable components for logging framework, workflow execution, purging and archival processes using Python, SQL and Scala.
  • Worked with different file storage formats such as Avro, Parquet and text, on HDFS and S3. Built CI/CD and auto-deployment processes using shell scripting.
  • Worked with versioning tools such as SVN and Bitbucket.
  • Sound knowledge of Teradata and Netezza architecture; expertise in using Teradata utilities such as TPT, FastLoad, MultiLoad (MLoad) and BTEQ.
  • Expertise in using Netezza utilities such as NZSQL, NZLOAD and NZMIGRATE.
  • Expertise in Informatica PowerCenter administration and development; brought innovative ideas into scalable products that are still in use today.
  • Resolved code merge and deployment issues for Informatica and OBIEE objects and streamlined the process.
  • Profound experience in loading from flat files to Oracle using Oracle SQL*Loader; expert knowledge of UNIX shell scripting.
  • Excellent knowledge of Python programming.
  • Excellent knowledge of performance tuning for optimal performance in SQL Server, Oracle and Spark 3.0; sound knowledge of collect statistics, join strategies, join types and explain/optimizer plans.
  • Excellent presentation skills; prepared numerous presentations, data flow diagrams and process flow diagrams.
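As a rough illustration of the reusable ETL operation components mentioned above, the following is a minimal Scala sketch, assuming Spark DataFrames as the common interchange format; the trait and class names (Op, FilterOp, JoinerOp, runPipeline) are illustrative, not the actual framework's API.

```scala
import org.apache.spark.sql.DataFrame

// Minimal sketch of the reusable-operation idea: every ETL step implements a
// common interface so pipelines are assembled from SourceOp / FilterOp /
// JoinerOp-style building blocks. All names here are illustrative only.
object EtlOps {

  trait Op {
    def apply(df: DataFrame): DataFrame
  }

  // Keeps only rows matching a SQL-style condition, e.g. "status = 'ACTIVE'".
  case class FilterOp(condition: String) extends Op {
    override def apply(df: DataFrame): DataFrame = df.filter(condition)
  }

  // Joins the incoming DataFrame to a reference DataFrame on the given keys.
  case class JoinerOp(right: DataFrame, keys: Seq[String], joinType: String = "inner") extends Op {
    override def apply(df: DataFrame): DataFrame = df.join(right, keys, joinType)
  }

  // Applies the operations left to right over the input DataFrame.
  def runPipeline(input: DataFrame, ops: Seq[Op]): DataFrame =
    ops.foldLeft(input)((df, op) => op(df))
}
```

A pipeline then becomes a sequence of small, independently testable operations folded over the input, which is what makes components like a SourceOp or AggregatorOp reusable across jobs.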

TECHNICAL SKILLS

Big Data and Hadoop Ecosystem: Apache Spark 3.0, CDH 5.8.3, HDFS 2.7.3, Spark 2.0.0, Hive 2.0.0, Impala 2.7.0, Sqoop 1.4.6, MapReduce, Oozie 3.1.0

RDBMS: SQL Server 2017, Oracle 12c, Netezza, Teradata 14.11.0.1

Programming: PySpark, Python, Scala, Shell Scripting (Bash)

Cloud Services: AWS Glue, EMR, Redshift, Athena

ETL Tools: Informatica PowerCenter 10.2/9.6.1 (Administration & Development), Informatica Cloud Services, SSIS

IDEs: MS Visual Studio, Jupyter, IntelliJ, PyCharm

Scheduling: Autosys, Crontab, Airflow

Reporting Tools: OBIEE 11.1.1.7.x, 11.1.1.6.x, MSBI

EIM, DQ & DP Tools: Informatica Data Explorer, Informatica Data Quality, MDM

Other Tools: Git, JIRA, Bitbucket, PuTTY, WinSCP

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Senior Data Engineer

Responsibilities:

  • Worked on the end-to-end process, from requirements gathering through delivery, of building a reusable logging framework that can be embedded in all Spark jobs to emit custom logging messages.
  • Used the datamover framework to integrate with log4j logging and customized it to generate messages related to the job flow at every stage of the flow.
  • Wrote Scala APIs to customize the logger classes and integrated them into the various operations (transformations) such as source, target, filter and joiner (see the sketch after this list).
  • Wrote a Scala API to generate the complete log information at different levels of depth based on the log level, write it to CSV and finally save it to a Hive table for auditing.
  • Worked on ETL data pipelines for batch processing, involving various transformations.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism and memory tuning.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself.
  • Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment, with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and Pair RDDs.
  • Developed PySpark and Spark SQL code to process DataFrames in Apache Spark.
  • Created a data quality job in Python to compare two DataFrames.
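A minimal sketch of the stage-level logging wrapper described above, assuming log4j as bundled with Spark; the object and method names (JobLogger, stage) are hypothetical rather than the actual datamover framework API, and the Hive audit write is omitted for brevity.

```scala
import org.apache.log4j.Logger

// Minimal sketch of a stage-level logging wrapper that can be embedded in a
// Spark job. Names are hypothetical; the real framework plugs into the
// datamover flow and persists the resulting audit rows to a Hive table.
object JobLogger {
  private val logger = Logger.getLogger("datamover.job")

  // Wraps one stage of the flow (source, filter, joiner, target, ...) and logs
  // start / completion / failure messages around it.
  def stage[T](name: String)(body: => T): T = {
    logger.info(s"stage=$name status=STARTED")
    try {
      val result = body
      logger.info(s"stage=$name status=COMPLETED")
      result
    } catch {
      case e: Exception =>
        logger.error(s"stage=$name status=FAILED message=${e.getMessage}")
        throw e
    }
  }
}
```

A transformation would then be wrapped as JobLogger.stage("filter-active-accounts") { df.filter("status = 'ACTIVE'") }, so every operation emits the same start/complete/fail messages that feed the audit table.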

Environment: Spark 2.3.2, Spark SQL, Python 3.0, PySpark, SQL Server, Scala, Hive, Git, Linux, Shell Scripting.

Confidential - Charlotte, NC

Sr. Big data Developer

Responsibilities:

  • Gathered requirements with the business and defined low-level design documents.
  • Re-architected the current sourcing and distribution system from Talend and Netezza to Hadoop/Spark.
  • Worked with large data sets, sourcing from disparate source systems such as Teradata, flat files, SQL Server, SFTP pulls and trigger-based feeds.
  • Developed the end-to-end process for building the active customers for the wholesale credit risk application, performing various transformations to build the Real, Local, Secured and Inactive customer sets.
  • Performed the CDC process for the customers for both the daily and monthly flows (see the sketch after this list).
  • Developed the wholesale credit risk application using Scala and Hive.
  • Contributed to the proprietary big data framework for data processing that Confidential uses across multiple teams, using big data technologies such as Hadoop, Scala, Python, Hive and Impala.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Analyzed issues in the production environment of the credit risk application during the daily/monthly batch runs and provided technical solutions to the defects raised during the batch.
  • Built the DAG flow for executing the Spark jobs in parallel.
  • Created configuration (conf) files for sequential execution of the jobs, creating persistent tables in the process where required.
  • Created various DataFrames to move the data through extraction, standardization, transformation and loading to the target.
  • Wrote shell scripts for the ETL and deployment jobs; created Autosys JILs for scheduling the command jobs and box jobs and establishing dependencies between jobs.
  • Updated the code repository (Bitbucket) and maintained a golden copy of the code.
  • Built minimum viable products, based on the tasks in the sprint, using big data technologies.
  • Maintained coding standards for easy understanding and maintenance; debugged failures and found technical solutions for the bugs.
  • Performance-tuned the processes by identifying the bottlenecks and ensured quick job executions.
  • Developed ETL framework using Python and Hive (including daily runs, error handling, and logging) to clean useful data and improve vendor negotiations.
  • Involved in the complete SDLC of multiple assignments, from requirements gathering and FSD through design, development, testing, deployment and production support.
  • Developed various mappings using transformations such as HTTP, Web Services Consumer, Application Source Qualifier, Aggregator, Filter, Expression, Lookup, SQL, Router and Update Strategy.
  • Developed an integration solution to implement an access management solution that synchronizes Salesforce access control with IES access control and paves the way for a cross-platform unified experience for user management.
  • Developed SCD mappings to identify the new, modified and disabled records between the LDAP and Salesforce systems.
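A rough Scala/Spark sketch of the daily CDC pass referenced above: it compares today's customer extract against the previous snapshot on a business key and classifies each row. The column name (customer_id), the hashing approach and the helper itself are assumptions for illustration, not the project's actual code.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Minimal CDC sketch: compare today's customer extract against the previous
// snapshot on a business key and tag each row as INSERT, UPDATE or NO_CHANGE.
object CustomerCdc {

  // Hash all non-key columns so changed attributes can be detected in one compare.
  private def withRowHash(df: DataFrame): DataFrame = {
    val attributeCols = df.columns.filter(_ != "customer_id").map(col)
    df.withColumn("row_hash", sha2(concat_ws("||", attributeCols: _*), 256))
  }

  def classifyChanges(today: DataFrame, previous: DataFrame): DataFrame = {
    val current = withRowHash(today)
    val prior = withRowHash(previous)
      .select(col("customer_id"), col("row_hash").as("prev_hash"))

    current
      .join(prior, Seq("customer_id"), "left")
      .withColumn("change_type",
        when(col("prev_hash").isNull, lit("INSERT"))
          .when(col("row_hash") =!= col("prev_hash"), lit("UPDATE"))
          .otherwise(lit("NO_CHANGE")))
  }
}
```

An SCD1-style target can then be refreshed by upserting only the INSERT and UPDATE rows while NO_CHANGE rows are skipped.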

Environment: Informatica PowerCenter 9.6.1 and 10.2, Salesforce.com, Salesforce Marketing Cloud, LDAP, Linux, SQL Server, Spark 2.0.0, Hive 2.0.0, Impala 2.7.0, Sqoop 1.4.6.

Confidential, Charlotte, NC

Sr. Application Developer

Responsibilities:

  • Gathered requirements with the business and defined low-level design documents.
  • Re-architected the current sourcing and distribution system from Talend and Netezza to Hadoop (HaaS).
  • Worked on structured data and semi structured data with daily incremental loads of 1 TB in size and monthly, quarterly loads of several TBs.
  • Developed the sourcing logic, including data staging, cleansing, standardization, archiving and purging, through Pig, Sqoop, Hive and Oozie workflows for multiple SORs in financial domains.
  • Optimized Hive 2.0.0 scripts to use HDFS efficiently by using various compression mechanisms.
  • Extracted the data from Netezza, Teradata and flat files into HDFS using Sqoop (see the sketch after this list).
  • Worked on FastLoad and FastExport to move data from one environment to another.
  • Wrote BTEQ scripts to transform data from Netezza to the Teradata staging environment.
  • Implemented authentication using Kerberos and authorization using Apache Sentry.
  • Created a data quality framework to comply with the bank's Data Governance team (EDM) in establishing the Peaks application as an Authorized Data Source (ADS) for Confidential.
  • Created environments such as Dev, SIT, UAT and Pre-Prod to support quality assurance.
  • Created automation scripts for refreshing the UAT, SIT and Pre-Prod Hadoop environments, covering source code, DDLs, environment variables and configuration.
  • Worked with business and QA teams to resolve issues during testing and to explain gaps/differences between Prod and the other lanes.
  • Used Netezza and Teradata windowing functions and load utilities; coordinated with the offshore team for timely execution of projects.
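The Sqoop-based extraction referenced above followed a straightforward source-to-HDFS movement pattern; purely as an illustration, the sketch below shows the same pattern using Spark's JDBC reader with a compressed, partitioned Parquet write. The connection string, table name, credentials and load_date column are placeholders, and the project itself used Sqoop, Hive and Oozie as stated.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Illustrative only: the project used Sqoop for Netezza/Teradata-to-HDFS
// extraction; this sketch shows the equivalent movement pattern with Spark's
// JDBC reader plus a compressed, partitioned Parquet write. The URL, table
// name, credentials and load_date column are all placeholders.
object SourceToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("source-to-hdfs")
      .getOrCreate()

    val source = spark.read
      .format("jdbc")
      .option("url", "jdbc:netezza://<host>:5480/<db>")   // placeholder connection string
      .option("dbtable", "staging.customer_accounts")     // placeholder source table
      .option("user", sys.env.getOrElse("DB_USER", ""))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    source.write
      .mode(SaveMode.Overwrite)
      .option("compression", "snappy")                     // keep the HDFS footprint small
      .partitionBy("load_date")                            // assumes a load_date column exists
      .parquet("/data/raw/customer_accounts")

    spark.stop()
  }
}
```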

Environment: Informatica PowerCenter 9.5.1, Talend, Netezza 14.11.0.1, Sqoop, Hive, Flume, Oozie, Pig, RHEL Linux, SQL Server.
