
PySpark/Hadoop Developer Resume


Jacksonville, FL

SUMMARY:

  • Overall 9 years of IT experience, including Hadoop ecosystem components and Java development.
  • 5 years of experience with the Big Data Hadoop ecosystem: Hive, Impala, MapReduce, Sqoop, Oozie, Spark SQL, Spark, Python, Pig, HBase, and Kafka.
  • Experience in deployment of Hadoop ecosystem components such as MapReduce, YARN, Sqoop, Flume, PySpark, Pig, Hive, HBase, Spark, Scala, Cassandra, ZooKeeper, Storm, Impala, and Kafka.
  • Drawing on experience in all aspects of analytics/data warehousing solutions (database issues, data modeling, data mapping, ETL development, metadata management, data migration, and reporting solutions), I have been key in delivering innovative database/data warehousing solutions to the retail, pharma, and finance industries.
  • Expertise in NoSQL and relational databases (MySQL, HBase, Oracle) and in integrating data with Hive.
  • Worked on creating receivers and topics in Kafka streaming.
  • Experience in reporting tools such as SAP BO and Tableau, integrated with the Hadoop ecosystem for visualization.
  • Good experience with the Python scripting language across the Spark development life cycle.
  • Developed multiple programs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
  • Good experience with Git and Maven.
  • Worked on Spark SQL to create DataFrames over data coming from HDFS in different file formats such as ORC, JSON, Parquet, and Avro, and to store the data back to HDFS (see the sketch after this list).
  • Worked on Spark RDDs to perform transformations on the data.
  • Experience importing data from different sources such as HDFS and HBase into Spark RDDs.
  • Exported the required data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Solid understanding of how the Hadoop Distributed File System (HDFS) handles data arriving from other sources.
  • Worked on data processing, transformations, and actions in Spark using Python (PySpark).
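
A minimal PySpark sketch of the Spark SQL work summarized above: reading HDFS data in different file formats into DataFrames and writing the result back to HDFS. The paths, column name, and application name are hypothetical, and the Avro reader assumes the spark-avro package is on the classpath.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("file-format-demo").getOrCreate()

    # Create DataFrames from HDFS data in different file formats (hypothetical paths)
    orc_df = spark.read.orc("hdfs:///data/raw/transactions_orc")
    json_df = spark.read.json("hdfs:///data/raw/transactions_json")
    parquet_df = spark.read.parquet("hdfs:///data/raw/transactions_parquet")
    avro_df = spark.read.format("avro").load("hdfs:///data/raw/transactions_avro")  # needs spark-avro

    # Combine the sources and apply a simple cleanup transformation
    combined = orc_df.unionByName(json_df).unionByName(parquet_df).unionByName(avro_df)
    cleaned = combined.dropDuplicates().filter(combined["amount"] > 0)  # "amount" is a hypothetical column

    # Store the result back to HDFS
    cleaned.write.mode("overwrite").parquet("hdfs:///data/curated/transactions")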

TECHNICAL SKILLS:

Environments: Hadoop/Big Data, Hortonworks, Cloudera, AWS

Programming Languages: Python, Scala, Shell Scripting, SQL, Java

Reporting Tools: Business Objects, Tableau

Databases/NoSQL: MySQL, Oracle 11i, SQL Server

Hadoop: PySpark, Spark Streaming, Hive, Pig, HBase, Impala, Scala, Spark, Sqoop, Oozie, Storm, Kafka, TeamCity, RabbitMQ, Java services, PCF

Operating Systems: Linux, Ubuntu

PROFESSIONAL EXPERIENCE:

Confidential, Jacksonville, FL

PySpark/Hadoop Developer

Responsibilities:

  • Brought structured data from MySQL into HDFS using Sqoop.
  • Integrated Spark with Kafka to consume messages from the Kafka servers.
  • Worked on consuming Kafka messages in Spark and created DStreams for data aggregations (see the sketch below).
  • Processed customer transaction data and developed daily, weekly, and monthly transaction summary views by customer, branch, and zone.
  • Applied Hive views to extract the different transaction types required.
  • Created RDDs to transform the data received from different data sources.
  • Developed Spark accumulators to collect metrics across the data transformations.
  • Created Kafka producers to collect data from different servers and publish it to topics.
  • Used broadcast variables to share common data with all executors for the entire process.

Environment: Spark, Python, Scala, DB2, HDFS, Sqoop, Cloudera, PySpark, Jira, GitHub
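
A minimal sketch of the Kafka-to-Spark Streaming consumption described above, assuming the Spark 2.x DStream API with the spark-streaming-kafka package; the broker addresses, topic name, and JSON record fields are hypothetical.

    import json
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # requires the spark-streaming-kafka-0-8 package

    sc = SparkContext(appName="kafka-dstream-demo")
    ssc = StreamingContext(sc, batchDuration=60)  # one micro-batch per minute

    # Direct DStream over a hypothetical "transactions" topic
    stream = KafkaUtils.createDirectStream(
        ssc, ["transactions"], {"metadata.broker.list": "broker1:9092,broker2:9092"})

    # Each message value is assumed to be a JSON transaction record
    records = stream.map(lambda kv: json.loads(kv[1]))

    # Day-wise totals per branch; customer-wise and zone-wise summaries follow the same pattern
    daily_by_branch = (records
                       .map(lambda r: ((r["txn_date"], r["branch_id"]), r["amount"]))
                       .reduceByKey(lambda a, b: a + b))

    daily_by_branch.pprint()  # or persist to Hive/HDFS inside foreachRDD
    ssc.start()
    ssc.awaitTermination()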

Confidential, Cincinnati, OH

PySpark/Hadoop Developer

Responsibilities:

  • Brought structured data from MySQL into HDFS using Sqoop.
  • Transformed and analyzed the data using PySpark and Hive based on ETL mappings.
  • Tuned application performance to optimize resource and time utilization.
  • Designed the application flow and implemented it end to end: gathering requirements, building code, performing testing, and deploying to production.
  • Developed PySpark programs, created DataFrames, and worked on transformations (see the sketch below).
  • Applied Spark transformations to source files to load the data into HDFS.
  • Performance-tuned Spark programs for different source system domains and loaded the results into the harmonized layer.
  • Automated scripts using Oozie and implemented them in production.
  • Developed atomic scripts for scheduling Oozie and Sqoop jobs on a daily or weekly basis.
  • Worked in an Agile environment with Jira, GitHub version control, and TeamCity for continuous builds.

Environment: Hive, Spark, Python, Spark SQL, HDFS, SAP BO, Sqoop, Cloudera, PySpark
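
A minimal PySpark sketch of an ETL-mapping-style transformation as described above: read a source file from HDFS, rename/cast/derive columns, and write the result back to HDFS. The paths, column names, and partition count are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("etl-mapping-demo").getOrCreate()

    # Hypothetical landing-zone file with a header row
    source = spark.read.option("header", True).csv("hdfs:///landing/customer_txn/")

    # Apply the ETL mapping: rename, cast, and derive columns
    mapped = (source
              .withColumnRenamed("cust_id", "customer_id")
              .withColumn("txn_amount", F.col("txn_amount").cast("decimal(18,2)"))
              .withColumn("load_date", F.current_date())
              .filter(F.col("txn_amount").isNotNull()))

    # Basic tuning: control the number of output files before writing to HDFS
    (mapped.repartition(20)
           .write.mode("append")
           .partitionBy("load_date")
           .parquet("hdfs:///curated/customer_txn/"))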

Confidential, Cincinnati, OH

Spark/Hadoop Developer

Responsibilities:

  • Ingested data from different sources into the BDA to build an enterprise big data warehouse.
  • Migrated legacy applications to Informatica IDQ, leveraging the big data cluster and its ecosystem.
  • Transformed and analyzed the data using Spark and Hive based on ETL mappings.
  • Used DataStage, Informatica BDM, and Exadata to perform ETL and prepare data lakes for various domains.
  • Extracted data from Teradata/Exadata to HDFS using Sqoop for the settlement and billing domains.
  • Tuned application performance to optimize resource and time utilization.
  • Designed the application flow and implemented it end to end: gathering requirements, building code, and performing testing.
  • Performed functional and regression testing in support of the quality of IT products for business users.
  • Developed Spark programs and created DataFrames for Hive tables (see the sketch below).
  • Applied Spark transformations to source tables to load the data into harmonized tables in Hive.
  • Performance-tuned Spark programs for different source system domains and loaded the results into the harmonized layer.
  • Automated scripts using Oozie and implemented them in production.

Environment: Hive, Spark, Python, Spark SQL, HDFS, SAP BO, Sqoop, Cloudera, PySpark
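
A minimal sketch of loading a harmonized Hive table from a source Hive table with Spark DataFrames, as described above. The database, table, and column names and the source-system label are hypothetical, and the target table is assumed to already exist with a matching column order.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("harmonized-load-demo")
             .enableHiveSupport()
             .getOrCreate())

    # DataFrame over a hypothetical staging table landed via Sqoop
    settlement = spark.table("staging.settlement_txn")

    # Conform the source columns to the harmonized layer
    harmonized = (settlement
                  .select("txn_id", "account_id", "txn_amt", "txn_dt")
                  .withColumn("source_system", F.lit("exadata"))
                  .dropDuplicates(["txn_id"]))

    # Append into the existing harmonized Hive table
    harmonized.write.mode("append").insertInto("harmonized.settlement_txn")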

Confidential

Spark/Hadoop Developer

Responsibilities:

  • Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, Oozie, Zookeeper, Sqoop and Spark with Cloudera distribution.
  • Hands-on experience with Cloudera Hue to import data through its graphical user interface.
  • Developed Spark Applications by using Scala and Implemented Apache Spark data processing project to handle data from various RDBMS sources.
  • Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files (see the sketch below).
  • Used HiveQL to analyze the partitioned and bucketed data; executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business specification logic.
  • Experience in using Avro, Parquet, RCFile, and JSON file formats; developed UDFs in Hive and Pig.
  • Worked with the Log4j framework for logging debug, info, and error data.
  • Developed Sqoop and Kafka Jobs to load data from RDBMS, External Systems into HDFS and HIVE.
  • Developed Oozie coordinators to schedule Sqoop, Hive scripts to create Data pipelines.
  • Generated various kinds of reports using Power BI and Tableau based on Client specification.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.

Environment: Spark, Spark SQL, HDFS, Hive, Sqoop, Scala, Shell scripting, Linux, Oozie, Tableau.
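
A minimal sketch of the JSON-flattening preprocessing job mentioned above, using Spark DataFrames to explode a nested document into flat records. The input path and the nested field names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("json-flatten-demo").getOrCreate()

    raw = spark.read.json("hdfs:///incoming/orders/*.json")

    # Flatten nested structs and explode the array of line items
    flat = (raw
            .select("order_id",
                    F.col("customer.id").alias("customer_id"),
                    F.col("customer.name").alias("customer_name"),
                    F.explode("items").alias("item"))
            .select("order_id", "customer_id", "customer_name",
                    F.col("item.sku").alias("sku"),
                    F.col("item.qty").alias("qty")))

    # Write the flattened records out as a delimited flat file
    flat.write.mode("overwrite").option("header", True).csv("hdfs:///flat/orders/")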


Confidential

Spark/Hadoop Lead Developer

Responsibilities:

  • Brought structured data from MySQL into HDFS using Sqoop.
  • Integrated Spark with Kafka to consume messages from the Kafka servers.
  • Worked on consuming Kafka messages in Spark and created DStreams for data aggregations.
  • Processed customer transaction data and developed daily, weekly, and monthly transaction summary views by customer, branch, and zone.
  • Applied Hive views to extract the different transaction types required.
  • Created RDDs to transform the data received from different data sources.
  • Developed Spark accumulators to collect metrics across the data transformations.
  • Created Kafka producers to collect data from different servers and publish it to topics.
  • Used broadcast variables to share common data with all executors for the entire process (see the sketch below).
  • Used persist and cache to keep the required RDDs in memory.
  • Developed shell scripts to pull data from HDFS and apply incremental and full loads to the Hive tables.
  • Responsible for preparing and presenting data metrics to senior management on users based on age, demographics, and other criteria.
  • Created shell scripts to log failed transactions and find their root cause.
  • Uploaded the processed data into SAP BO for report generation.
  • Analyzed customer buying patterns from customer logs in JSON format and sent the analyzed data to HDFS for further use.
  • Analyzed test results, including user interface data presentation, output documents, and database field values, for accuracy and consistency.
  • Developed data requirements, performed database queries to identify test data, and created data procedures with expected results.

Environment: Hive, Spark, Scala, Spark SQL, HDFS, SAP BO, Sqoop, Kafka, Spark Streaming, Cloudera
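
A minimal RDD-level sketch of the broadcast variable, accumulator, and persist/cache usage described above; the lookup data, file path, and record layout are hypothetical.

    from pyspark import SparkContext, StorageLevel

    sc = SparkContext(appName="broadcast-accumulator-demo")

    # Broadcast a small branch-to-zone lookup once to every executor
    branch_zone = sc.broadcast({"B001": "NORTH", "B002": "SOUTH"})

    # Accumulator: a write-only counter summed across tasks
    bad_records = sc.accumulator(0)

    def enrich(line):
        fields = line.split(",")  # expected layout: customer_id,branch_id,amount
        if len(fields) != 3:
            bad_records.add(1)
            return None
        cust, branch, amount = fields
        return (branch_zone.value.get(branch, "UNKNOWN"), float(amount))

    txns = sc.textFile("hdfs:///data/transactions.csv")
    zone_amounts = txns.map(enrich).filter(lambda r: r is not None)

    # Persist because the enriched RDD feeds more than one aggregation
    zone_amounts.persist(StorageLevel.MEMORY_ONLY)

    zone_totals = zone_amounts.reduceByKey(lambda a, b: a + b).collect()
    print(zone_totals, "bad records:", bad_records.value)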

Confidential

Spark/Hadoop Lead Developer

Responsibilities:

  • Responsible for writing Spark jobs to handle files in multiple formats (JSON, Text, and Avro)
  • Created external and managed tables as per the requirements.
  • Extensively used Korn shell scripts for manipulating the flat files provided by the share brokers.
  • Prepared ETL standards, naming conventions, and ETL flow documentation for the Stage, ODS, and Mart layers.
  • Wrote custom support modules for upgrade implementation using PL/SQL and Unix shell scripts.
  • Developed Sqoop scripts to import and export data from relational sources, handling incremental loading of the customer transaction data by date.
  • Worked with Spark accumulators, variables that are only "added" to (such as counters and sums), to analyze the data transformations.
  • Wrote Python code for different transformations in Spark and created RDDs over the data (see the sketch below).
  • Used persist and cache to keep the required RDDs in memory for reuse in later transformations.
  • Worked on different RDDs to transform data coming from different sources into the required format.
  • Developed different Spark actions to retrieve results from the data sources in the required transformed format.
  • Developed Spark code in Python to analyze the data arriving from different sources.
  • Created DataFrames in Spark SQL from data in HDFS, performed transformations, analyzed the data, and stored it back into HDFS.
  • Developed transformations and actions using Python.
  • Integrated Spark with the Hadoop ecosystem and stored the data in the Hadoop Distributed File System (HDFS).
  • Worked extensively on combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
  • Used broadcast variables to share common data with all executors for the entire process.
  • Involved in loading and transforming large sets of structured and semi-structured data from databases into HDFS using Sqoop imports.
  • Worked on data serialization formats (Avro, JSON, CSV) for converting complex objects into byte sequences.
  • Stayed current with developments in Hadoop and worked with multiple data sources.
  • Imported and exported different kinds of data (incremental, updated, and column-based) from RDBMS to Hive.

Environment: Spark, Python, HDFS, Spark SQL, Oracle, PySpark, Kafka, Tableau, Impala, Hive, Hortonworks
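
A minimal sketch of RDD transformations and actions in Python over broker flat files, as described above; the file path and pipe-delimited record layout are hypothetical.

    from pyspark import SparkContext

    sc = SparkContext(appName="rdd-transform-demo")

    # Hypothetical layout: broker_id|symbol|quantity|price
    lines = sc.textFile("hdfs:///broker/trades/*.dat")

    # Transformations: parse, filter, and reshape into (symbol, notional) pairs
    trades = (lines
              .map(lambda l: l.split("|"))
              .filter(lambda f: len(f) == 4)
              .map(lambda f: (f[1], int(f[2]) * float(f[3]))))

    # Cache because more than one action reuses the same RDD
    trades.cache()

    # Actions: bring results to the driver or write them back to HDFS
    top_symbols = trades.reduceByKey(lambda a, b: a + b).takeOrdered(10, key=lambda kv: -kv[1])
    trades.map(lambda kv: "%s,%.2f" % kv).saveAsTextFile("hdfs:///broker/trades_notional")

    print(top_symbols)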

Confidential

Java Developer

Responsibilities:

  • Involved in product development and maintenance and fixed issues.
  • Implemented a new functional module using J2EE and a customized framework (OA).
  • Developed new screens using JSP and Servlets.
  • Customized the business module using EJB and Java.
  • Worked on database table creation and transformations using SQL.
  • Good experience with SQL and PL/SQL.
  • Participated in Scrum meetings, developed features, and fixed issues.
  • Developed Framework Manager Models and analyzed those models in analysis studio.
  • Fixed the standard issues and client generated issues.
  • Involved in maintaining and developing the metadata model using Framework Manager.
  • Installed and configured applications.

Environment: Java, Open Architecture, Linux, Core Java (OOP and collections), J2EE framework, JSP/Servlets, Ant, Maven, Git, JavaScript, shell scripting, Oracle SQL

Confidential

Java Developer

Responsibilities:

  • Analysis of the specifications provided by the clients.
  • Involved in bug fixes as well as enhancements to the existing project.
  • Prepared the high-level design as per the requirements.
  • Worked on exception handling using Core Java.
  • Worked on multithreading in Java.
  • Worked on SQL queries for database development and alterations.
  • Worked on stored procedures using SQL.
  • Proficiency in SQL and PL/SQL.
  • Developed the application and performed unit testing.
  • Planned and scheduled various releases based on customer requirements.

Environment: Core Java, C++, SQL, TortoiseSVN, JDBC, Hibernate, J2EE

Confidential

PL/SQL Programmer

Responsibilities:

  • Worked on building up the database in Oracle
  • Created data structures (i.e., tables and views) and applied referential integrity.
  • Worked as an administrator and assigned rights to the users, groups for accessing the database.
  • Responsible for creating and modifying the PL/SQL Procedure, Function, Triggers according to the business requirement.
  • Created Indexes, Sequences and Constraints.
  • Created Materialized views for summary tables for better Query performance.
  • Identified source system, their connectivity, related tables and fields and ensured data consistency for mapping.
  • Worked closely with users, decision makers to develop the Transformation Logic to be used in Informatica Power Center.
  • Converted the business rules into technical specifications for ETL process for populating fact and dimension table of data warehouse.
  • Created mappings, transformations using Designer, and created sessions using Workflow Manager.
  • Created staging tables to do validations against data before loading data into original fact and dimension tables.
  • Involved in loading large amounts of data using utilities such as SQL Loader.
  • Designed and developed Oracle Reports for the analysis of the data.

Environment: Visual Basic 6.0, Oracle 8i, PL/SQL, Crystal Reports 6, Erwin, Windows NT
