
Big Data Developer Resume


Charlotte

SUMMARY

  • Overall 11 years of experience in development and support of applications using Hadoop, Scala, Python, and RPA automation.
  • Certified CCA Spark and Hadoop Developer (CCA-175).
  • Certified AWS Solutions Architect.
  • Certified Scrum Master.
  • Domain expertise in supply chain and in energy transfer and storage systems.
  • Expertise in data wrangling and data cleaning, including maintaining the data dictionary for the model.
  • Expertise with tools in the Hadoop ecosystem, including Spark, MapReduce, Hive, Airflow, Impala, HDFS, ZooKeeper, Sqoop, Flume, and HBase.
  • Substantial experience integrating Spark 3.0 with Kafka 2.4 (see the streaming sketch after this list).
  • Experience working with the Lambda architecture to design batch and streaming ETL pipelines.
  • Sustained BigQuery, PySpark, and Hive code by fixing bugs and delivering enhancements requested by business users.
  • Working with AWS using S3 buckets, Athena, AWS Glue ETL, Step Functions, Lambda functions, Redshift, data lakes, RDS, and EC2 instances with EMR clusters; working knowledge of GCP as well.
  • Knowledge of the Cloudera platform and Apache Hadoop 0.20.
  • Very good exposure to OLAP and OLTP.
  • Experienced in managing onshore and offshore teams, including hiring, mentoring, and performance appraisals of team members.
  • Extensive experience in SDLC and STLC process development and implementation.
  • Worked on core Java application development and maintenance support of AMS.
  • Project management activities and audits such as CMMI, Lean, and project-level configuration audits (IPWC).
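
The following is a minimal Scala sketch of the Spark-Kafka streaming integration described above: Structured Streaming reads a Kafka topic and lands the events as Parquet for downstream batch reprocessing. The broker address, topic name, and paths are illustrative placeholders, and the job assumes the spark-sql-kafka connector is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming-sketch")
      .getOrCreate()

    // Read the raw event stream from Kafka (broker and topic are placeholders).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "orders")
      .load()
      .selectExpr("CAST(value AS STRING) AS json", "timestamp")

    // Land micro-batches as Parquet so the batch layer of the Lambda
    // architecture can reprocess the same data later.
    events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streaming/orders")
      .option("checkpointLocation", "hdfs:///data/checkpoints/orders")
      .start()
      .awaitTermination()
  }
}
```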

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, Hive, Sqoop, Flume, ZooKeeper (Cloudera platform), Spark 2.0, DataFrames and Spark SQL, Impala, Airflow, Stonebranch, NiFi.

Python/Scala Technologies: Python, Pandas, Java; PCA, dimensionality reduction, t-SNE, CDF, regression and classification, Naive Bayes, KNN.

Cloud: AWS/GCP storage, S3 buckets, EC2, Route 53, RDS, EMR, AWS CI/CD, Redshift, Kinesis API, Cloud Spanner, BigQuery, Bigtable.

Automation Tools: RPA automation (UiPath and WinAutomation), Maven, Jenkins, Git, and AWS cloud CI/CD.

Servers: Tomcat 5.0/6.0, WebLogic, WebSphere 7.0/6.1.

Databases: SQL, MySQL, DB2, SQLDbx, Oracle 9i/10g.

OS: DOS, Windows 98/2000/NT, UNIX.

Tools: PuTTY, SSH, FileZilla, WinSCP, ManageNow, IT2B, VSS, RCC, Build Forge, Informatica versioning tool, Tivoli, ICD Tool, ServiceNow, Splunk, Jira.

PROFESSIONAL EXPERIENCE

Confidential

Big Data Developer

Responsibilities:

  • Working as a developer in Hive and Impala for parallel data processing on Cloudera systems.
  • Working with big data technologies such as Spark 2.3 and 3.0, Scala, Hive, and Hadoop clusters (Cloudera platform).
  • Building data pipelines with Data Fabric jobs, Sqoop, Spark, Scala, and Kafka; in parallel, working on the database side with Oracle and MySQL Server on source-to-target data design.
  • Worked on parsing nested-struct JSON, flattening and exploding complex file structures.
  • Designed and implemented Spark SQL tables and Hive script jobs, using Stonebranch for scheduling and for creating workflows and task flows.
  • Used partitioning and bucketing in Hive to speed up queries as part of Hive optimization.
  • Broad experience in AWS, using the Lambda component for design and triggering.
  • Used Step Functions to design and create workflow scheduling and implement ETL pipelines.
  • Worked closely with AWS Service Catalog and EMR services, running Spark transformations against a Redshift database.
  • Wrote Spark programs to move data from an input storage location to an output location, applying data loading, validation, and transformation (see the sketch at the end of this section).
  • Used Scala functions and data structures (arrays, lists, maps) for better code reusability.
  • Performed unit testing for each development change.
  • Prepared Technical Release Notes (TRN) for application deployment into the DEV/STAGE/PROD environments.

Environment: HDFS, Hive, Spark, Linux, Kafka, Python, Stonebranch, Cloudera, Oracle 11g/10g, PL/SQL, UNIX, JSON and Parquet file formats.
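
Below is a minimal sketch of the kind of Spark load-validate-transform job described in this section. The S3 paths and column names (order_id, amount) are hypothetical placeholders, not details of the actual project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MoveAndValidate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("move-and-validate").getOrCreate()

    // Load the raw JSON drop from the input storage location (path is a placeholder).
    val raw = spark.read.json("s3a://input-bucket/raw/orders/")

    // Validation: drop records missing the key or carrying a non-positive amount.
    val valid = raw
      .filter(col("order_id").isNotNull && col("amount") > 0)
      .withColumn("load_date", current_date())

    // Write the cleaned data to the output location, partitioned by load date.
    valid.write
      .mode("overwrite")
      .partitionBy("load_date")
      .parquet("s3a://output-bucket/curated/orders/")

    spark.stop()
  }
}
```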

Confidential, Charlotte

Big Data Engineer

Responsibilities:

  • Analyzing issues and performing impact analysis for them.
  • Data ingestion via Sqoop and Data Fabric jobs from Oracle, DB2, and Salesforce.
  • Working on implementing the various stages of data flow in the Hadoop ecosystem: ingestion, processing, and consumption.
  • Responsible for wide-ranging data ingestion using Sqoop and HDFS commands; accumulated partitioned data in storage formats such as text, JSON, and Parquet; involved in loading data from the Linux file system into HDFS.
  • Storing data files in S3 buckets on a daily basis; using EC2, EMR, S3, and Redshift to develop and maintain AWS cloud-based solutions.
  • Started working with AWS for storage and handling of terabytes of data for customer BI reporting tools.
  • Wrote programs using Spark 2.4 to create DataFrames and process data with transformations and actions (see the sketch after this list).
  • Working on tickets opened by users for various incidents and requests.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Automated all job scheduling using CA7 and checked the job logs.
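
A minimal sketch of the Spark 2.4 DataFrame work referenced above: read a daily extract from S3, apply a transformation and an action, and persist Parquet for the BI reporting layer. Bucket names, paths, and columns are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyS3Aggregate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("daily-s3-aggregate").getOrCreate()

    // Create a DataFrame from the daily extract landed in S3 (bucket and schema are placeholders).
    val sales = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("s3a://landing-bucket/sales/2020-01-01/")

    // Transformation: aggregate revenue per region; action: count() as a sanity check.
    val byRegion = sales.groupBy("region").agg(sum("amount").alias("revenue"))
    println(s"regions aggregated: ${byRegion.count()}")

    // Persist the result in Parquet for the downstream BI reporting tools.
    byRegion.write.mode("overwrite").parquet("s3a://reporting-bucket/sales_by_region/")

    spark.stop()
  }
}
```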

Confidential, Minnesota

Senior System Engineer

Responsibilities:

  • Requirements analysis, coding, and application support.
  • Analyzing requirements and performing impact analysis for them.
  • Installation and configuration of WebSphere Application Server 7.0.
  • Integrating with various web servers and databases, and configuring connection pooling.
  • Developed standard, reusable mappings and mapplets using various transformations such as expression, aggregator, joiner, source qualifier, router, lookup, and update strategy.
  • Working on the Cloudera platform with Hadoop 2.0 big data.
  • Working closely on Hive and Sqoop ETL data pipelines and designing the data warehouse (see the sketch at the end of this section).
  • Created unit test cases for the mappings developed and verified the data.
  • Supporting ETL testing and working on various data warehouse tickets.
  • Working in QA and DEV environments during the ETL development phase.

Environment: Core Java, L2 and L3 support activities, PuTTY, RCC, WebSphere 7.0, SSH, Build Forge, Informatica 9.6, Oracle 10g, UNIX, SQL Developer, flat files, Hadoop, Hive, Sqoop.
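
A small sketch, under assumed table and path names, of the Hive/Sqoop ETL pattern mentioned above: Sqoop-landed HDFS files are exposed as an external Hive staging table and queried through Spark SQL.

```scala
import org.apache.spark.sql.SparkSession

object ExternalStagingTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("external-staging-table")
      .enableHiveSupport()
      .getOrCreate()

    // Expose the Sqoop-landed HDFS directory as an external Hive staging table.
    // Database, table, column names, and the path are illustrative placeholders.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS staging.customers (
        |  customer_id BIGINT,
        |  name        STRING,
        |  state       STRING
        |)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |LOCATION '/data/sqoop/customers'""".stripMargin)

    // Simple warehouse-side check on the ingested data.
    spark.sql("SELECT state, COUNT(*) AS customers FROM staging.customers GROUP BY state")
      .show(20, truncate = false)

    spark.stop()
  }
}
```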

Confidential

Senior System Engineer

Responsibilities:

  • Supporting the middleware team's applications.
  • Analyzing issues and performing impact analysis for them.
  • Providing L2 and L3 support for the application across the CC&B system and external systems.
  • Monitoring the CC&B system and IBM APPS team components.
  • Monitoring CDC jobs for IVR as part of the RDW team.
  • Worked extensively on data extraction, transformation, and loading with RDBMS and flat files.
  • Data ingestion via Sqoop and Flume from an Oracle database.
  • Working on implementing the various stages of data flow in the Hadoop ecosystem: ingestion, processing, and consumption.
  • Working with Hive for structured data: creating tables and loading data into them using Hive.
  • Involved in loading data from the Linux file system into HDFS.
  • Importing and exporting data to HDFS and Hive using Sqoop.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch below).
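
A minimal sketch of Hive partitioning with a dynamic-partition insert, issued here through Spark SQL for consistency with the other examples; the database, table, and column names are hypothetical. Bucketed tables add a CLUSTERED BY ... INTO n BUCKETS clause to the DDL, typically maintained directly in Hive, since older Spark versions do not populate Hive buckets on write.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionedLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioned-load")
      .enableHiveSupport()
      .getOrCreate()

    // Hive table partitioned by load date (names are illustrative placeholders).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS billing.usage (
        |  customer_id BIGINT,
        |  meter_id    STRING,
        |  usage_kwh   DOUBLE
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert: Hive derives load_date from the selected data.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT INTO TABLE billing.usage PARTITION (load_date)
        |SELECT customer_id, meter_id, usage_kwh, load_date
        |FROM staging.usage_raw""".stripMargin)

    spark.stop()
  }
}
```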

Confidential

System Engineer

Responsibilities:

  • The Gas Transfer System (GTS) is a web-based, standalone Java application.
  • GTS primarily provides TRUenergy with the capability to manage gas customer transfers among distributors through market systems. In addition, GTS provides functionality to perform MIRN Discovery and Standing Data requests with participating distribution businesses.
  • GTS sends transaction acknowledgments to market systems for transfer transactions received. GTS is built on a three-tier client-server architecture: Internet Explorer is used for presentation on client workstations, a Java application server provides all processing capabilities, and an Oracle database on a separate server machine fulfils the data requirements. This three-tier approach minimizes deployment complexity, distributes workload, and utilizes existing infrastructure.
