
Senior Data Analyst Resume


SUMMARY:

  • 8+ years of experience as a Data Analyst/Big Data Engineer.
  • Proficient in data analysis, cleansing, transformation, migration, integration, import, and export using ETL tools such as Informatica.
  • Analyzed data and provided insights with R and Python pandas (a short pandas sketch follows this list).
  • Expertise in Business Intelligence, data warehousing technologies, ETL, and Big Data technologies.
  • Experience creating ETL mappings in Informatica to move data from multiple sources such as flat files and Oracle into a common target area such as a data warehouse.
  • Experience writing PL/SQL: stored procedures, functions, triggers, and packages.
  • Expertise in Hadoop components (HDFS, YARN, NameNode, DataNode) and Apache Spark.
  • Implemented large-scale technical solutions in Python using object-oriented design and programming concepts.
  • Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for processing and storage of smaller data sets, and experienced in maintaining the Hadoop cluster on AWS EMR.
  • Hands-on experience with Spark Core, Spark SQL, and Spark Streaming, and with creating DataFrames in Spark with Scala.
  • Experience with NoSQL databases, including table row-key design and loading/retrieving data for real-time processing, with performance improvements based on data access patterns.
  • Extensive experience with Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Experience building large-scale, highly available web applications; working knowledge of web services and other integration patterns.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including key-value and document data stores.
  • Involved in creating database objects such as tables, views, procedures, triggers, and functions in T-SQL to provide definition and structure and to maintain data efficiently.
  • Hands on experience
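As a hedged illustration of the R/pandas analysis work listed above, the sketch below shows a typical cleanse-then-aggregate pass in pandas. The file name and column names (orders.csv, order_date, region, amount) are assumptions made up for the example, not details from the resume.

```python
import pandas as pd

# Hypothetical input file and column names, for illustration only.
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])

# Basic cleansing: drop exact duplicates, standardize text, fill missing amounts.
orders = orders.drop_duplicates()
orders["region"] = orders["region"].str.strip().str.title()
orders["amount"] = orders["amount"].fillna(0.0)

# Simple insight: monthly revenue by region, highest-earning regions first.
monthly_revenue = (
    orders
    .assign(month=orders["order_date"].dt.to_period("M"))
    .groupby(["month", "region"], as_index=False)["amount"]
    .sum()
    .sort_values(["month", "amount"], ascending=[True, False])
)
print(monthly_revenue.head())
```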

PROFESSIONAL EXPERIENCE:

Confidential

Senior Data Analyst

Responsibilities:

  • Evaluated client needs and translated business requirements into functional specifications, onboarding clients onto the Hadoop ecosystem. Extracted and updated data in HDFS using Sqoop import and export.
  • Developed Hive UDFs to incorporate external business logic into Hive scripts and developed join data-set scripts using Hive join operations. Created various Hive external and staging tables and joined them as per the requirements.
  • Implemented static partitioning, dynamic partitioning, and bucketing in Hive (see the partitioning/validation sketch after this list). Worked with HDFS file formats such as Parquet, IAM, and JSON for serializing and deserializing.
  • Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, Azure, PySpark, Impala, Tealeaf, pair RDDs, NiFi, and Spark on YARN.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases handling large volumes of data.
  • Good experience using relational databases such as Oracle, SQL Server, and PostgreSQL (including on GCP). Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data from various sources; implemented a cluster for the NoSQL tool HBase as part of a POC to address HBase limitations.
  • Used Spark DataFrame operations to perform the required validations on the data and to run analytics on the Hive data.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL DW) and processed it in Azure Databricks.
  • Used IAM rules, machine learning, and other statistical algorithms to detect and stop risky identity behaviors.
  • Managed data coming from different sources through Kafka; installed Kafka producers on different servers and scheduled them to produce data every 10 seconds (see the producer sketch after this list).
  • Implemented data quality checks in the ETL tool Talend; good knowledge of data warehousing.
  • Developed Apache Spark applications for data processing from various streaming sources. Strong knowledge of the architecture and components of Tealeaf; efficient in working with Spark Core and Spark SQL.
  • Designed and developed RDD seeds using Scala and Cascading; streamed data to Spark Streaming using Kafka. Exposure to Spark, Spark Streaming, Spark MLlib, Snowflake, and Scala, creating DataFrames in Spark with Scala.
  • Good exposure to MapReduce programming in Java, Pig Latin scripting, distributed applications, and HDFS.
  • Good understanding of NoSQL databases and hands-on experience writing applications against HBase, Cassandra, and MongoDB.
  • Very good implementation experience with object-oriented concepts, multithreading, and Java/Scala. Experienced with the S
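The partitioning/bucketing and DataFrame-validation bullets above can be illustrated with a short PySpark sketch. This is not the project's actual code: the database, table, column names, and bucket count are hypothetical, and the exact DDL depends on the Hive and Spark versions in use.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical database, table, and column names, for illustration only.
spark = (
    SparkSession.builder
    .appName("hive-partitioning-example")
    .enableHiveSupport()
    .getOrCreate()
)

# External staging table over raw files, partitioned by load date and bucketed by customer.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.events (
        customer_id STRING,
        event_type  STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS PARQUET
    LOCATION '/data/staging/events'
""")

# Dynamic-partition settings are typically needed before dynamic-partition inserts.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# DataFrame-level validations before the data is exposed downstream.
events = spark.table("staging.events")
bad_rows = events.filter(F.col("customer_id").isNull() | (F.col("amount") < 0))
print("rows failing validation:", bad_rows.count())
```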
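For the Kafka bullet above (producers scheduled to emit a record every 10 seconds), here is a minimal sketch using the kafka-python client. The broker address, topic name, and message payload are assumptions for illustration; the original producers may have used a different client or language.

```python
import json
import time
from datetime import datetime, timezone

from kafka import KafkaProducer  # kafka-python client

# Hypothetical broker address, for illustration only.
producer = KafkaProducer(
    bootstrap_servers=["broker1:9092"],
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

while True:
    # Hypothetical payload; the real producers would emit application data.
    event = {
        "source": "server-01",
        "emitted_at": datetime.now(timezone.utc).isoformat(),
        "payload": "heartbeat",
    }
    producer.send("events", value=event)  # topic name is an assumption
    producer.flush()
    time.sleep(10)  # produce a record every 10 seconds, as in the bullet above
```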

Confidential

Sr. Data Engineer/Big Data Developer

Responsibilities:

  • As a Big Data Developer, worked on a Hadoop cluster scaling from 4 nodes in the development environment to 8 nodes in pre-production and up to 24 nodes in production.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Responsible for data extraction and ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive. Built pipelines to move hashed and un-hashed data from XML files into the data lake.
  • Developed Spark scripts in Python on Azure HDInsight for data aggregation and validation and verified their performance against MR jobs.
  • Strong understanding of data warehousing principles: fact tables, dimension tables, and star/snowflake schema modeling.
  • Extensively worked with the Spark SQL context to create DataFrames and Datasets for preprocessing model data.
  • Experience with cloud service providers such as Amazon AWS, Microsoft Azure, and Google GCP. Experience in change implementation, monitoring, and troubleshooting of AWS Snowflake databases and cluster-related issues.
  • Data analysis: expertise in analyzing data using Pig scripting, Hive queries, Spark (Python), and Impala.
  • Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzing them by running Hive queries and Pig scripts.
  • Involved in designing the HBase row key to store text and JSON as key-values, designing the key so that rows can be get/scanned in sorted order.
  • Wrote JUnit tests and integration test cases for those microservices.
  • Worked in the Azure environment for development and deployment of custom Hadoop applications; developed and deployed outcomes using Spark and Scala code on a Hadoop cluster running on GCP.
  • Worked heavily with Python, C++, Spark, SQL, Airflow, and Looker.
  • Developed a NiFi workflow to pick up multiple files from an FTP location and move them to HDFS on a daily basis. Scripting: expertise in Hive, Pig, Impala, shell scripting, Perl scripting, and Python.
  • Worked with developer teams on a NiFi workflow to pick up data from a REST API server, the data lake, and an SFTP server and send it to Kafka.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the streaming sketch after this list).
  • Proven experience with ETL frameworks (Airflow, Luigi, or our own open-sourced garcon); see the Airflow DAG sketch after this list.
  • Created Hive schemas using performance techniques like partitioning and bucketing. Used Hadoop YARN to perform analytics on data in Hive.
  • Developed and maintained batch data f
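The Kafka Direct Stream work above used Spark Streaming's DStream API; as a hedged sketch of the same pattern on current Spark versions, the example below reads from Kafka with Structured Streaming and applies a simple business transformation. The broker, topic, and JSON field names are assumptions, and the job needs the spark-sql-kafka connector on the classpath.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical broker, topic, and field names, for illustration only.
spark = SparkSession.builder.appName("kafka-streaming-example").getOrCreate()

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers bytes; cast the value and apply a simple business transformation.
orders = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(
        F.get_json_object("json", "$.order_id").alias("order_id"),
        F.get_json_object("json", "$.amount").cast("double").alias("amount"),
    )
    .filter(F.col("amount") > 0)
)

query = (
    orders.writeStream
    .outputMode("append")
    .format("console")           # console sink just for the sketch
    .trigger(processingTime="10 seconds")
    .start()
)
query.awaitTermination()
```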
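For the ETL-framework bullet above, here is a minimal Airflow DAG sketch. The DAG id, task names, and commands are hypothetical; they only illustrate the daily ingest-then-validate pattern described in the NiFi and batch bullets.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

# Hypothetical DAG, task names, and commands, for illustration only.

def validate_load(**context):
    # Placeholder validation step; real logic would check row counts, nulls, etc.
    print("validating daily load")

with DAG(
    dag_id="daily_file_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    land_files = BashOperator(
        task_id="land_files",
        bash_command="echo 'pull files from the SFTP landing zone'",
    )
    validate = PythonOperator(
        task_id="validate_load",
        python_callable=validate_load,
    )
    land_files >> validate
```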

Confidential

Big Data Engineer

Responsibilities:

  • Responsibilities included gathering business requirements, developing the strategy for data cleansing and data migration, writing functional and technical specifications, creating source-to-target mappings, designing data profiling and data validation jobs in Informatica, and creating ETL jobs in Informatica.
  • Worked on a Hadoop cluster which ranged from 4 to 8 nodes during the pre-production stage and was sometimes extended up to 24 nodes during production.
  • Built APIs that allow customer service representatives to access the data and answer queries. Designed changes to transform current Hadoop jobs to HBase.
  • Fixed defects efficiently and worked with the QA and BA teams for clarifications.
  • Responsible for cluster maintenance and monitoring, commissioning and decommissioning data nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
  • Extended the functionality of Hive with custom UDFs and UDAFs.
  • The new Business Data Warehouse (BDW) improved query/report performance, reduced the time needed to develop reports, and established a self-service reporting model in Cognos for business users.
  • Implemented bucketing and partitioning in Hive to assist users with data analysis; implemented partitioning, dynamic partitions, and buckets in Hive.
  • Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
  • Developed database management systems for easy access, storage, and retrieval of data. Performed DB activities such as indexing, performance tuning, and backup and restore.
  • Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (a data-flow language), and custom MapReduce programs in Java.
  • Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop (a JDBC-based ingestion sketch follows this list).
  • Implemented AJAX, JSON, and JavaScript to create interactive web screens.
  • Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries.
  • Processed image data through the Hadoop distributed system using Map and Reduce, then stored it in HDFS.
  • Created session beans and controller servlets for handling HTTP requests from Talend.
  • Performed data visualization and designed dashboards with Tableau, generating complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.
  • Used Git for version control with the Data Engineer team and Data Scientist colleagues.
  • Created Tableau dashboards and stories as needed using Tableau Desktop and Tableau Server, using stacked bars, bar graphs, scatter plots, geographical maps, Gantt charts, etc., via the Show Me functionality.
  • Performed statistical analysis usi
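The Oracle/Teradata-to-HDFS loads above were done with Sqoop; purely as a hedged sketch of the same ingestion pattern, the example below uses Spark's JDBC reader and writes Parquet to HDFS instead. The connection URL, schema/table, credentials, and partition column are assumptions, and the Oracle JDBC driver would need to be on the Spark classpath.

```python
from pyspark.sql import SparkSession

# Hypothetical connection details and table names, for illustration only.
spark = SparkSession.builder.appName("oracle-to-hdfs").getOrCreate()

customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/BDW")
    .option("dbtable", "SALES.CUSTOMERS")
    .option("user", "etl_user")
    .option("password", "********")
    .option("fetchsize", "10000")
    .load()
)

# Land the extract on HDFS as Parquet, partitioned for later Hive queries.
(customers
 .write
 .mode("overwrite")
 .partitionBy("REGION")   # hypothetical partition column
 .parquet("hdfs:///data/landing/customers"))
```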

Confidential

Data Engineer / Hadoop Developer

Responsibilities:

  • Experience in Big Data analytics and design in the Hadoop ecosystem using MapReduce programming, Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, and Kafka.
  • Built the Oozie pipeline which performs several actions such as file-move processes, Sqooping the data from the source (Teradata or SQL), exporting it into the Hive staging tables, performing aggregations per business requirements, and loading into the main tables.
  • Ran Apache Hadoop, CDH, and MapR distros, dubbed Elastic MapReduce (EMR), on EC2.
  • Performed forking actions whenever there was scope for parallel processing, to optimize data latency.
  • Worked on different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Wrote a Pig script which picks up data from one HDFS path, performs aggregation, and loads it into another path that later populates another domain table; converted this script into a jar and passed it as a parameter in the Oozie script.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
  • Built an ETL job which uses a Spark jar that executes the business analytical model.
  • Hands-on experience with Git bash commands: git pull to pull the code from source and develop it per the requirements, git add to add files, git commit after the code builds, and git push to the pre-prod environment for code review; later used screwdriver.yaml, which builds the code and generates artifacts that are released into production.
  • Created the logical data model from the conceptual model and converted it into the physical database design using Erwin.
  • Involved in transforming data from legacy tables to HDFS and HBase tables using Sqoop.
  • Connected to AWS Redshift through Tableau to extract live data for real-time analysis.
  • Developed data mapping, transformation, and cleansing rules for data management involving OLTP and OLAP. Involved in creating UNIX shell scripts.
  • Performed defragmentation of tables, partitioning, compression, and indexing for improved performance and efficiency.
  • Developed reusable objects such as PL/SQL program units and libraries, database procedures and functions, and database triggers to be used by the team while satisfying the business rules.
  • Used SQL Server Integration Services (SSIS) for extracting, transforming, and loading data into the target system from multiple sources.
  • Developed and implemented an R and Shiny application which showcases machine learning for business forecasting.
  • Developed predictive models using Python and R to predict customer churn and classify customers (see the churn-model sketch after this list).
  • Partnered with infrastructure and platform teams to configure and tune tools, automate tasks, and guide the evolution of the internal big data ecosystem; served as a bridge between data scientists and infrastructure/platform teams.
  • Implemented Big Data analytics and advanced data science techniques to identify trends, patterns, and
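For the churn-prediction bullet above, here is a minimal scikit-learn sketch. The input file, feature columns, and label column are assumptions; the resume does not specify which algorithms or features were actually used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Hypothetical feature set and file name, for illustration only.
churn = pd.read_csv("customer_churn.csv")
features = ["tenure_months", "monthly_charges", "support_tickets"]
X = churn[features]
y = churn["churned"]  # 1 = customer churned, 0 = retained

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Report precision/recall for the churn class, which is what the business cares about.
print(classification_report(y_test, model.predict(X_test)))
```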

Confidential

Java/Hadoop Developer

Responsibilities:

  • Involved in the review of functional and non-functional requirements.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs (a Hadoop Streaming sketch of the same map/reduce pattern follows this list).
  • Installed and configured Pig and wrote Pig Latin scripts; wrote MapReduce jobs using Pig Latin.
  • Involved in ETL, data integration, and migration. Imported data using Sqoop to load data from Oracle to HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements; created Hive tables and worked on them using HiveQL. Experienced in defining job flows.
  • Utilized various utilities such as Struts tag libraries, JSP, JavaScript, HTML, and CSS.
  • Built and deployed WAR files on WebSphere Application Server.
  • Implemented patterns such as Singleton, Factory, Facade, Prototype, Decorator, Business Delegate, and MVC.
  • Involved in frequent meetings with clients to gather business requirements and convert them into technical specifications for the development team.
  • Adopted Agile methodology with the pair-programming technique and addressed issues during system testing.
  • Involved in bug fixing and the enhancement phase; used the FindBugs tool. Version control was handled using SVN.
  • Developed the application in the Eclipse IDE. Experience in developing Spring Boot applications for transformations.
  • Primarily involved in front-end UI using HTML5, CSS3, JavaScript, jQuery, and AJAX. Used the Struts framework to build an MVC architecture and separate presentation from business logic.
  • Involved in rewriting the middle tier on WebLogic application server.
  • Actively involved in code reviews and coding standards, unit testing, and integration testing.
  • Imported and exported data between HDFS and Oracle Database using Sqoop. Involved in creating Hive tables, loading the data, and writing Hive queries that run internally as MapReduce jobs.
  • Developed a custom FileSystem plugin for Hadoop so it can access files on the data platform; the plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system; set up and benchmarked Hadoop/HBase clusters for internal use.
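The MapReduce jobs in this role were written in Java and Pig; purely as an illustration of the map/reduce pattern, the sketch below shows a Hadoop Streaming style mapper and reducer in Python that count records per event type. The input layout (tab-delimited, with the event type in the second column) is an assumption.

```python
#!/usr/bin/env python3
# Hadoop Streaming sketch: mapper emits (event_type, 1); reducer sums per key.
# Hypothetical tab-delimited input with the event type in the second column.
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1:
            print(f"{fields[1]}\t1")

def reducer():
    # Hadoop Streaming sorts mapper output by key before the reducer sees it.
    current_key, total = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        print(f"{current_key}\t{total}")

if __name__ == "__main__":
    # In practice the mapper and reducer live in separate scripts passed to
    # `hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py ...`;
    # they are combined here only to keep the sketch self-contained.
    reducer() if "reduce" in sys.argv else mapper()
```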

Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hadoop distribution of Cloudera, Pig, HBase, Linux, XML, Eclipse, Oracle 10g, PL/SQL, MongoDB, Toad.
