PySpark/Hadoop Developer Resume
Jacksonville, FL
SUMMARY:
- Over 9 years of IT experience, including Hadoop ecosystem components and Java development.
- 5 years of experience with the Big Data Hadoop ecosystem: Hive, Impala, MapReduce, Sqoop, Oozie, Spark SQL, Spark, Python, Pig, HBase, and Kafka.
- Experience in deploying Hadoop ecosystem components such as MapReduce, YARN, Sqoop, Flume, PySpark, Pig, Hive, HBase, Spark, Scala, Cassandra, ZooKeeper, Storm, Impala, and Kafka.
- Experience in all aspects of analytics/data warehousing solutions (database issues, data modeling, data mapping, ETL development, metadata management, data migration, and reporting); key contributor in delivering innovative database/data warehousing solutions to the retail, pharma, and finance industries.
- Expertise in NoSQL and relational databases (MySQL, HBase, Oracle) and in integrating them with Hive.
- Worked on creating receivers and topics for Kafka streaming.
- Experience with reporting tools such as SAP BO and Tableau for visualization and integration with the Hadoop ecosystem.
- Good experience using the Python scripting language throughout the Spark development life cycle.
- Developed multiple programs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Good experience with Git and Maven.
- Worked on Spark SQL to create DataFrames over data coming from HDFS in different file formats (ORC, JSON, Parquet, Avro) and to store the data back to HDFS (see the sketch after this list).
- Worked with Spark RDDs to perform transformations on the data.
- Experience importing data from different sources such as HDFS and HBase into Spark RDDs.
- Exported the required data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Solid understanding of how the Hadoop Distributed File System handles data arriving from other sources.
- Worked on data processing with Spark transformations and actions using Python (PySpark).
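A minimal PySpark sketch of the Spark SQL file-format handling described above; the HDFS paths, column names, and aggregation are illustrative assumptions rather than details from an actual engagement:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative example only: paths, column names, and formats below are assumed.
spark = SparkSession.builder.appName("file-format-demo").getOrCreate()

# Read the same logical dataset from different HDFS file formats.
# The ORC and JSON reads are shown only to cover the other formats.
orc_df = spark.read.orc("hdfs:///data/raw/transactions_orc")
json_df = spark.read.json("hdfs:///data/raw/transactions_json")
parquet_df = spark.read.parquet("hdfs:///data/raw/transactions_parquet")

# A simple transformation on one of them: daily transaction totals per customer.
daily_totals = (
    parquet_df
    .groupBy("customer_id", F.to_date("txn_ts").alias("txn_date"))
    .agg(F.sum("amount").alias("daily_amount"))
)

# Store the result back to HDFS, here in Parquet format.
daily_totals.write.mode("overwrite").parquet("hdfs:///data/curated/daily_totals")

spark.stop()
```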
TECHNICAL SKILLS:
Environments: Hadoop/Big Data, Hortonworks, Cloudera, AWS
Programming Languages: Python, Scala, Shell Scripting, SQL, Java
Reporting Tools: Business Objects, Tableau
Databases/NoSQL: MySQL, Oracle 11i, SQL Server
Hadoop: PySpark, Spark Streaming, Hive, Pig, HBase, Impala, Scala, Spark, Sqoop, Oozie, Storm, Kafka, TeamCity, RabbitMQ, Java services, PCF
Operating Systems: Linux, Ubuntu
PROFESSIONAL EXPERIENCE:
Confidential, Jacksonville, FL
PySpark/Hadoop Developer
Responsibilities:
- Imported structured data from MySQL into HDFS using Sqoop.
- Integrated Spark with Kafka to consume messages from the Kafka servers.
- Consumed Kafka messages in Spark and created DStreams to run data aggregations on them (see the sketch following this section).
- Processed customer transaction data and developed daily, weekly, and monthly transaction summary views by customer, branch, and zone.
- Applied Hive views to retrieve the different transaction types required.
- Created RDDs to transform data received from different data sources.
- Developed Spark accumulators to collect counters and sums over the data transformations.
- Created Kafka producers to collect data from different servers and publish it to topics.
- Used broadcast variables to share reference data in a single variable across the entire process.
Environment: Spark, Python, Scala, DB2, HDFS, Sqoop, Cloudera, PySpark, Jira, GitHub
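A minimal sketch of the Kafka-to-Spark-Streaming consumption and DStream aggregation described above, using the older DStream API; the broker list, topic name, and record fields are illustrative assumptions:

```python
import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # older DStream API (Spark 1.x/2.x)

# Illustrative example only: brokers, topic, and record fields are assumed.
sc = SparkContext(appName="txn-stream")
ssc = StreamingContext(sc, batchDuration=60)  # one-minute micro-batches

stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["transactions"],
    kafkaParams={"metadata.broker.list": "broker1:9092,broker2:9092"},
)

# Each Kafka record arrives as a (key, value) pair; the value holds a JSON transaction.
transactions = stream.map(lambda kv: json.loads(kv[1]))

# Aggregate transaction amounts per branch within each micro-batch.
branch_totals = (
    transactions
    .map(lambda txn: (txn["branch_id"], txn["amount"]))
    .reduceByKey(lambda a, b: a + b)
)

branch_totals.pprint()

ssc.start()
ssc.awaitTermination()
```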
Confidential, Cincinnati, OH
PySpark/Hadoop Developer
Responsibilities:
- Imported structured data from MySQL into HDFS using Sqoop.
- Transformed and analyzed the data using PySpark and Hive based on ETL mappings.
- Tuned application performance to optimize resource and time utilization.
- Designed the application flow and implemented it end to end: gathering requirements, building code, performing testing, and deploying into production.
- Developed PySpark programs, created DataFrames, and worked on transformations.
- Applied Spark transformations on source files to load the data into HDFS.
- Performance-tuned Spark programs for different source-system domains and inserted the results into the harmonized layer (see the sketch following this section).
- Automated scripts using Oozie and implemented them in production.
- Developed scripts for scheduling Oozie and Sqoop jobs on a daily or weekly basis.
- Worked in an Agile environment with Jira, GitHub for version control, and TeamCity for continuous builds.
Environment: Hive, Spark, Python, Spark SQL, HDFS, SAP BO, Sqoop, Cloudera, PySpark
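A minimal PySpark sketch of loading transformed source-system data into a harmonized Hive layer as described above; the database, table, column names, and harmonization rules are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative example only: database, table, and column names are assumed.
spark = (
    SparkSession.builder
    .appName("harmonized-load")
    .enableHiveSupport()
    .getOrCreate()
)

# Read a raw source extract that Sqoop landed in HDFS.
raw = spark.read.parquet("hdfs:///data/raw/source_system_a/customers")

# Simple harmonization rules: trim strings, stamp the load date, tag the source system.
harmonized = (
    raw
    .withColumn("customer_name", F.trim(F.col("customer_name")))
    .withColumn("load_date", F.current_date())
    .withColumn("source_system", F.lit("source_system_a"))
)

# Append into the partitioned harmonized Hive table.
(
    harmonized.write
    .mode("append")
    .partitionBy("load_date")
    .saveAsTable("harmonized.customers")
)

spark.stop()
```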
Confidential, Cincinnati, OH
Spark/Hadoop Developer
Responsibilities:
- Ingested data from different sources into the BDA to build an enterprise big data warehouse.
- Migrated legacy applications to Informatica IDQ, leveraging the big data cluster and its ecosystem.
- Transformed and analyzed the data using Spark and Hive based on ETL mappings.
- Used DataStage, Informatica BDM, and Exadata to perform ETL and prepare data lakes for various domains.
- Extracted data from Teradata/Exadata to HDFS using Sqoop for the settlement and billing domain.
- Tuned application performance to optimize resource and time utilization.
- Designed the application flow and implemented it end to end: gathering requirements, building code, and performing testing.
- Performed functional and regression testing in support of the quality of IT products for business users.
- Developed Spark programs and created DataFrames for Hive tables.
- Applied Spark transformations on source tables to load the data into harmonized Hive tables.
- Performance-tuned Spark programs for different source-system domains and inserted the results into the harmonized layer.
- Automated scripts using Oozie and implemented them in production.
Environment: Hive, Spark, Python, Spark SQL, HDFS, SAP BO, Sqoop, Cloudera, PySpark
Confidential
Spark/Hadoop Developer
Responsibilities:
- Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
- Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, and Spark, with the Cloudera distribution.
- Hands-on experience using Cloudera Hue to import data through its graphical user interface.
- Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS sources.
- Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files (see the sketch following this section).
- Used HiveQL to analyze the partitioned and bucketed data; executed Hive queries on Parquet tables stored in Hive to perform data analysis that meets the business specification logic.
- Experience in using Avro, Parquet, RCFile, and JSON file formats; developed UDFs in Hive and Pig.
- Worked with the Log4j framework for logging debug, info, and error data.
- Developed Sqoop and Kafka jobs to load data from RDBMSs and external systems into HDFS and Hive.
- Developed Oozie coordinators to schedule Sqoop and Hive scripts that create data pipelines.
- Generated various kinds of reports using Power BI and Tableau based on client specifications.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
Environment: Spark, Spark SQL, HDFS, Hive, Sqoop, Scala, Shell scripting, Linux, Oozie, Tableau.
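A minimal PySpark sketch of flattening nested JSON documents into a flat file as described above (written in Python here for consistency with the rest of this resume, although this project itself used Scala); the input path and field names are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative example only: the input path and JSON field names are assumed.
spark = SparkSession.builder.appName("flatten-json").getOrCreate()

# Read nested JSON documents from HDFS.
orders = spark.read.json("hdfs:///data/raw/orders_json")

# Flatten nested structs and arrays into plain columns: one output row per line item.
flat = (
    orders
    .withColumn("item", F.explode("order.items"))
    .select(
        F.col("order.order_id").alias("order_id"),
        F.col("customer.id").alias("customer_id"),
        F.col("item.sku").alias("sku"),
        F.col("item.qty").alias("qty"),
    )
)

# Write the flattened result out as a delimited flat file.
flat.write.mode("overwrite").option("header", "true").csv("hdfs:///data/flat/orders_csv")

spark.stop()
```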
Confidential
Spark/Hadoop lead Developer
Responsibilities :
- Imported structured data from MySQL into HDFS using Sqoop.
- Integrated Spark with Kafka to consume messages from the Kafka servers.
- Consumed Kafka messages in Spark and created DStreams to run data aggregations on them.
- Processed customer transaction data and developed daily, weekly, and monthly transaction summary views by customer, branch, and zone.
- Applied Hive views to retrieve the different transaction types required.
- Created RDDs to transform data received from different data sources.
- Developed Spark accumulators to collect counters and sums over the data transformations.
- Created Kafka producers to collect data from different servers and publish it to topics.
- Used broadcast variables to share reference data in a single variable across the entire process (see the sketch following this section).
- Used persist and cache to keep the required RDDs in memory.
- Developed shell scripts to pull data from HDFS and apply incremental and full loads to the Hive tables.
- Responsible for preparing and presenting data metrics to senior management on user data segmented by age, demographics, and other user criteria.
- Created shell scripts to log failed transactions and find their root cause.
- Uploaded the processed data into SAP BO for report generation.
- Analyzed customer buying patterns from customer logs in JSON format and sent the analyzed data to HDFS for further use.
- Analyzed test results, including user interface data presentation, output documents, and database field values, for accuracy and consistency.
- Developed data requirements, performed database queries to identify test data, and created data procedures with expected results.
Environment: Hive, Spark, Scala, Spark SQL, HDFS, SAP BO, Sqoop, Kafka, Spark Streaming, Cloudera
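A minimal PySpark sketch of the broadcast variable and persist/cache usage described above; the lookup values and sample records are illustrative assumptions:

```python
from pyspark import SparkContext, StorageLevel

# Illustrative example only: the branch lookup and transaction records are assumed.
sc = SparkContext(appName="broadcast-cache-demo")

# Small reference data shipped once to every executor via a broadcast variable.
branch_zones = sc.broadcast({"B001": "NORTH", "B002": "SOUTH"})

# Transaction RDD of (branch_id, amount) pairs, reused by two actions, so persist it.
transactions = sc.parallelize([("B001", 120.0), ("B002", 75.5), ("B001", 40.0)])
transactions.persist(StorageLevel.MEMORY_ONLY)

# Use the broadcast lookup inside a transformation to tag each record with its zone.
zone_amounts = transactions.map(
    lambda rec: (branch_zones.value.get(rec[0], "UNKNOWN"), rec[1])
)

# Two actions over the same persisted RDD avoid recomputing 'transactions'.
totals_by_zone = zone_amounts.reduceByKey(lambda a, b: a + b).collect()
record_count = transactions.count()

print(totals_by_zone, record_count)
sc.stop()
```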
Confidential
Spark/Hadoop Lead developer
Responsibilities:
- Responsible for writing Spark jobs to handle files in multiple formats (JSON, Text, and Avro)
- Created external and managed tables as per the requirements.
- Extensively used Korn shell scripts to manipulate the flat files provided by the share brokers.
- Prepared ETL standards and naming conventions and wrote ETL flow documentation for the Stage, ODS, and Mart layers.
- Wrote custom support modules for upgrade implementation using PL/SQL and Unix shell scripts.
- Developed Sqoop scripts to import and export data from relational sources, handling incremental loading of the customer transaction data by date.
- Worked with Spark accumulators, variables that are only "added" to (such as counters and sums), to track metrics across the data transformations (see the sketch following this section).
- Wrote Python code for different Spark transformations and created RDDs over the data.
- Used persist and cache to keep the required RDDs in memory for reuse in other transformations.
- Worked with different RDDs to transform data coming from different sources into the required format.
- Developed different Spark actions to retrieve the results from the data sources in the required transformed format.
- Developed Spark code in Python to analyze the data arriving from different sources.
- Created DataFrames in Spark SQL from data in HDFS, performed transformations, analyzed the data, and stored it back into HDFS.
- Developed transformations and actions using Python.
- Integrated Spark with the Hadoop ecosystem and stored the data in the Hadoop Distributed File System (HDFS).
- Worked extensively on creating combiners, partitioning, and distributed cache to improve the performance of MapReduce jobs.
- Used broadcast variables to share reference data in a single variable across the entire process.
- Involved in loading and transforming large sets of structured and semi-structured data from databases into HDFS using Sqoop imports.
- Worked with data serialization formats (Avro, JSON, CSV) for converting complex objects into sequences of bits.
- Stayed current with developments in Hadoop and worked with multiple data sources.
- Imported and exported different kinds of data (incremental, updated, and column-based) from RDBMS to Hive.
Environment: Spark, Python, HDFS, Spark SQL, Oracle, PySpark, Kafka, Tableau, Impala, Hive, Hortonworks
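A minimal PySpark sketch of using an accumulator as a counter inside RDD transformations, as described above; the sample records and validation rule are illustrative assumptions:

```python
from pyspark import SparkContext

# Illustrative example only: the sample records and validation rule are assumed.
sc = SparkContext(appName="accumulator-demo")

# Accumulator: a variable that executors can only add to, used here as a counter.
bad_records = sc.accumulator(0)

records = sc.parallelize(["100", "250", "oops", "75", ""])

def parse_amount(value):
    """Parse a numeric record, counting records that fail validation."""
    try:
        return float(value)
    except ValueError:
        bad_records.add(1)
        return 0.0

amounts = records.map(parse_amount)

# Accumulator values are only reliable after an action has run.
total = amounts.sum()

print("total amount:", total)
print("bad records:", bad_records.value)
sc.stop()
```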
Confidential
Java Developer
Responsibilities :
- Involved in product development and maintenance and fixed issues.
- Implemented a new functional module using J2EE and a customized framework (OA).
- Developed new screens using JSP and Servlets.
- Customized the business module using EJB and Java.
- Worked on database table creation and transformations using SQL.
- Good experience with SQL/PL/SQL.
- Involved in Scrum meetings and developed features and fixed issues.
- Developed Framework Manager models and analyzed those models in Analysis Studio.
- Fixed standard issues and client-reported issues.
- Involved in maintaining and developing the metadata model using Framework Manager.
- Installed and configured applications.
Environment: Java, Open Architecture, Linux, Core Java (OOP and collections), J2EE Framework, JSP/Servlets, Ant, Maven, Git, JavaScript, Shell scripting, Oracle SQL
Confidential
Java developer
Responsibilities:
- Analysis of the specifications provided by the clients.
- Involved in bug fixes as well as enhancements to the existing project.
- Prepared the high-level design as per the requirements.
- Worked on exception handling using Core Java.
- Worked on multithreading in Java.
- Worked on SQL queries for database development and alterations.
- Worked on stored procedures using SQL.
- Proficiency in SQL/PL/SQL.
- Developed the application and performed unit testing.
- Planned and scheduled various releases based on the customer requirements.
Environment: Core Java, C++, SQL, Tortoise SVN, JDBC, Hibernate, J2EE
Confidential
PL/SQL Programmer
Responsibilities:
- Worked on building up the database in Oracle.
- Created data structures, i.e., tables and views, and applied referential integrity.
- Worked as an administrator and assigned rights to users and groups for accessing the database.
- Responsible for creating and modifying PL/SQL procedures, functions, and triggers according to the business requirements.
- Created Indexes, Sequences and Constraints.
- Created Materialized views for summary tables for better Query performance.
- Identified source system, their connectivity, related tables and fields and ensured data consistency for mapping.
- Worked closely with users, decision makers to develop the Transformation Logic to be used in Informatica Power Center.
- Converted the business rules into technical specifications for ETL process for populating fact and dimension table of data warehouse.
- Created mappings, transformations using Designer, and created sessions using Workflow Manager.
- Created staging tables to do validations against data before loading data into original fact and dimension tables.
- Involved in loading large amounts of data using utilities such as SQL Loader.
- Designed and developed Oracle Reports for the analysis of the data.
Environment: Visual Basic 6.0, Oracle 8i, PL/SQL, Crystal Reports 6, Erwin, Windows NT