PySpark/Hadoop Developer Resume
Jacksonville, FL
SUMMARY:
- Over 9 years of IT experience, including Hadoop ecosystem components and Java development.
- 5 years of experience with the Big Data Hadoop ecosystem: Hive, Impala, MapReduce, Sqoop, Oozie, Spark SQL, Spark, Python, Pig, HBase, and Kafka.
- Experience in deploying Hadoop ecosystem components such as MapReduce, YARN, Sqoop, Flume, PySpark, Pig, Hive, HBase, Spark, Scala, Cassandra, ZooKeeper, Storm, Impala, and Kafka.
- Experience in all aspects of analytics/data warehousing solutions (database issues, data modeling, data mapping, ETL development, metadata management, data migration, and reporting); key contributor in delivering innovative database/data warehousing solutions to the retail, pharma, and finance industries.
- Expertise in NoSQL and relational databases (MySQL, HBase, Oracle) and in integrating them with Hive.
- Worked on creating receivers and topics for Kafka streaming.
- Experience with reporting tools such as SAP BO and Tableau for visualization and integration with the Hadoop ecosystem.
- Good experience using the Python scripting language throughout the Spark development life cycle.
- Developed multiple programs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Good experience with Git and Maven.
- Worked on Spark SQL to create DataFrames over data coming from HDFS in different file formats (ORC, JSON, Parquet, Avro) and to store the data back to HDFS (see the sketch after this list).
- Worked with Spark RDDs to perform transformations on the data.
- Experience importing data from different sources such as HDFS and HBase into Spark RDDs.
- Exported the required data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Solid understanding of how the Hadoop Distributed File System handles data arriving from other sources.
- Worked on data processing with Spark transformations and actions using Python (PySpark).
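A minimal PySpark sketch of the Spark SQL file-format handling described above; the HDFS paths, column names, and aggregation are illustrative assumptions rather than details from an actual engagement:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative example only: paths, column names, and formats below are assumed.
spark = SparkSession.builder.appName("file-format-demo").getOrCreate()

# Read the same logical dataset from different HDFS file formats.
# The ORC and JSON reads are shown only to cover the other formats.
orc_df = spark.read.orc("hdfs:///data/raw/transactions_orc")
json_df = spark.read.json("hdfs:///data/raw/transactions_json")
parquet_df = spark.read.parquet("hdfs:///data/raw/transactions_parquet")

# A simple transformation on one of them: daily transaction totals per customer.
daily_totals = (
    parquet_df
    .groupBy("customer_id", F.to_date("txn_ts").alias("txn_date"))
    .agg(F.sum("amount").alias("daily_amount"))
)

# Store the result back to HDFS, here in Parquet format.
daily_totals.write.mode("overwrite").parquet("hdfs:///data/curated/daily_totals")

spark.stop()
```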
TECHNICAL SKILLS:
Environments: Hadoop/Big Data, Hortonworks, Cloudera, AWS
Programming Languages: Python, Scala, Shell Scripting, SQL, Java
Reporting Tools: Business Objects, Tableau
Databases/NoSQL: MySQL, Oracle 11i, SQL Server
Hadoop: PySpark, Spark Streaming, Hive, Pig, HBase, Impala, Scala, Spark, Sqoop, Oozie, Storm, Kafka, TeamCity, RabbitMQ, Java services, PCF
Operating Systems: Linux, Ubuntu
PROFESSIONAL EXPERIENCE:
Confidential, Jacksonville, FL
PySpark/Hadoop Developer
Responsibilities:
- Imported structured data from MySQL into HDFS using Sqoop.
- Integrated Spark with Kafka to consume messages from the Kafka servers.
- Consumed Kafka messages in Spark and created DStreams to run data aggregations on them (see the sketch following this section).
- Processed customer transaction data and developed daily, weekly, and monthly transaction summary views by customer, branch, and zone.
- Applied Hive views to retrieve the different transaction types required.
- Created RDDs to transform data received from different data sources.
- Developed Spark accumulators to collect counters and sums over the data transformations.
- Created Kafka producers to collect data from different servers and publish it to topics.
- Used broadcast variables to share reference data in a single variable across the entire process.
Environment: Spark, Python, Scala, DB2, HDFS, Sqoop, Cloudera, PySpark, Jira, GitHub
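A minimal sketch of the Kafka-to-Spark-Streaming consumption and DStream aggregation described above, using the older DStream API; the broker list, topic name, and record fields are illustrative assumptions:

```python
import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # older DStream API (Spark 1.x/2.x)

# Illustrative example only: brokers, topic, and record fields are assumed.
sc = SparkContext(appName="txn-stream")
ssc = StreamingContext(sc, batchDuration=60)  # one-minute micro-batches

stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["transactions"],
    kafkaParams={"metadata.broker.list": "broker1:9092,broker2:9092"},
)

# Each Kafka record arrives as a (key, value) pair; the value holds a JSON transaction.
transactions = stream.map(lambda kv: json.loads(kv[1]))

# Aggregate transaction amounts per branch within each micro-batch.
branch_totals = (
    transactions
    .map(lambda txn: (txn["branch_id"], txn["amount"]))
    .reduceByKey(lambda a, b: a + b)
)

branch_totals.pprint()

ssc.start()
ssc.awaitTermination()
```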
Confidential, Cincinnati, OH
PySpark/Hadoop Developer
Responsibilities:
- Imported structured data from MySQL into HDFS using Sqoop.
- Transformed and analyzed the data using PySpark and Hive based on ETL mappings.
- Tuned application performance to optimize resource and time utilization.
- Designed the application flow and implemented it end to end: gathering requirements, building code, performing testing, and deploying into production.
- Developed PySpark programs, created DataFrames, and worked on transformations.
- Applied Spark transformations on source files to load the data into HDFS.
- Performance-tuned Spark programs for different source-system domains and inserted the results into the harmonized layer (see the sketch following this section).
- Automated scripts using Oozie and implemented them in production.
- Developed scripts for scheduling Oozie and Sqoop jobs on a daily or weekly basis.
- Worked in an Agile environment with Jira, GitHub for version control, and TeamCity for continuous builds.
Environment: Hive, Spark, Python, Spark SQL, HDFS, SAP BO, Sqoop, Cloudera, PySpark
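A minimal PySpark sketch of loading transformed source-system data into a harmonized Hive layer as described above; the database, table, column names, and harmonization rules are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative example only: database, table, and column names are assumed.
spark = (
    SparkSession.builder
    .appName("harmonized-load")
    .enableHiveSupport()
    .getOrCreate()
)

# Read a raw source extract that Sqoop landed in HDFS.
raw = spark.read.parquet("hdfs:///data/raw/source_system_a/customers")

# Simple harmonization rules: trim strings, stamp the load date, tag the source system.
harmonized = (
    raw
    .withColumn("customer_name", F.trim(F.col("customer_name")))
    .withColumn("load_date", F.current_date())
    .withColumn("source_system", F.lit("source_system_a"))
)

# Append into the partitioned harmonized Hive table.
(
    harmonized.write
    .mode("append")
    .partitionBy("load_date")
    .saveAsTable("harmonized.customers")
)

spark.stop()
```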
Confidential, Cincinnati, OH
Spark/Hadoop Developer
Responsibilities:
- Ingested data from different sources into the BDA to build an enterprise big data warehouse.
- Migrated legacy applications to Informatica IDQ, leveraging the big data cluster and its ecosystem.
- Transformed and analyzed the data using Spark and Hive based on ETL mappings.
- Used DataStage, Informatica BDM, and Exadata to perform ETL and prepare data lakes for various domains.
- Extracted data from Teradata/Exadata to HDFS using Sqoop for the settlement and billing domain.
- Tuned application performance to optimize resource and time utilization.
- Designed the application flow and implemented it end to end: gathering requirements, building code, and performing testing.
- Performed functional and regression testing in support of the quality of IT products for business users.
- Developed Spark programs and created DataFrames for Hive tables.
- Applied Spark transformations on source tables to load the data into harmonized Hive tables.
- Performance-tuned Spark programs for different source-system domains and inserted the results into the harmonized layer.
- Automated scripts using Oozie and implemented them in production.
Environment: Hive, Spark, Python, Spark SQL, HDFS, SAP BO, Sqoop, Cloudera, PySpark
Confidential
Spark/Hadoop Developer
Responsibilities:
- Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
- Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, and Spark, with the Cloudera distribution.
- Hands-on experience using Cloudera Hue to import data through its graphical user interface.
- Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS sources.
- Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files (see the sketch following this section).
- Used HiveQL to analyze the partitioned and bucketed data; executed Hive queries on Parquet tables stored in Hive to perform data analysis that meets the business specification logic.
- Experience in using Avro, Parquet, RCFile, and JSON file formats; developed UDFs in Hive and Pig.
- Worked with the Log4j framework for logging debug, info, and error data.
- Developed Sqoop and Kafka jobs to load data from RDBMSs and external systems into HDFS and Hive.
- Developed Oozie coordinators to schedule Sqoop and Hive scripts that create data pipelines.
- Generated various kinds of reports using Power BI and Tableau based on client specifications.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
Environment: Spark, Spark SQL, HDFS, Hive, Sqoop, Scala, Shell scripting, Linux, Oozie, Tableau.
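A minimal PySpark sketch of flattening nested JSON documents into a flat file as described above (written in Python here for consistency with the rest of this resume, although this project itself used Scala); the input path and field names are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative example only: the input path and JSON field names are assumed.
spark = SparkSession.builder.appName("flatten-json").getOrCreate()

# Read nested JSON documents from HDFS.
orders = spark.read.json("hdfs:///data/raw/orders_json")

# Flatten nested structs and arrays into plain columns: one output row per line item.
flat = (
    orders
    .withColumn("item", F.explode("order.items"))
    .select(
        F.col("order.order_id").alias("order_id"),
        F.col("customer.id").alias("customer_id"),
        F.col("item.sku").alias("sku"),
        F.col("item.qty").alias("qty"),
    )
)

# Write the flattened result out as a delimited flat file.
flat.write.mode("overwrite").option("header", "true").csv("hdfs:///data/flat/orders_csv")

spark.stop()
```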
Confidential
Spark/Hadoop lead Developer
Responsibilities :
- Imported structured data from MySQL into HDFS using Sqoop.
- Integrated Spark with Kafka to consume messages from the Kafka servers.
- Consumed Kafka messages in Spark and created DStreams to run data aggregations on them.
- Processed customer transaction data and developed daily, weekly, and monthly transaction summary views by customer, branch, and zone.
- Applied Hive views to retrieve the different transaction types required.
- Created RDDs to transform data received from different data sources.
- Developed Spark accumulators to collect counters and sums over the data transformations.
- Created Kafka producers to collect data from different servers and publish it to topics.
- Used broadcast variables to share reference data in a single variable across the entire process (see the sketch following this section).
- Used persist and cache to keep the required RDDs in memory.
- Developed shell scripts to pull data from HDFS and apply incremental and full loads to the Hive tables.
- Responsible for preparing and presenting data metrics to senior management on user data segmented by age, demographics, and other user criteria.
- Created shell scripts to log failed transactions and find their root cause.
- Uploaded the processed data into SAP BO for report generation.
- Analyzed customer buying patterns from customer logs in JSON format and sent the analyzed data to HDFS for further use.
- Analyzed test results, including user interface data presentation, output documents, and database field values, for accuracy and consistency.
- Developed data requirements, performed database queries to identify test data, and created data procedures with expected results.
Environment: Hive, Spark, Scala, Spark SQL, HDFS, SAP BO, Sqoop, Kafka, Spark Streaming, Cloudera
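A minimal PySpark sketch of the broadcast variable and persist/cache usage described above; the lookup values and sample records are illustrative assumptions:

```python
from pyspark import SparkContext, StorageLevel

# Illustrative example only: the branch lookup and transaction records are assumed.
sc = SparkContext(appName="broadcast-cache-demo")

# Small reference data shipped once to every executor via a broadcast variable.
branch_zones = sc.broadcast({"B001": "NORTH", "B002": "SOUTH"})

# Transaction RDD of (branch_id, amount) pairs, reused by two actions, so persist it.
transactions = sc.parallelize([("B001", 120.0), ("B002", 75.5), ("B001", 40.0)])
transactions.persist(StorageLevel.MEMORY_ONLY)

# Use the broadcast lookup inside a transformation to tag each record with its zone.
zone_amounts = transactions.map(
    lambda rec: (branch_zones.value.get(rec[0], "UNKNOWN"), rec[1])
)

# Two actions over the same persisted RDD avoid recomputing 'transactions'.
totals_by_zone = zone_amounts.reduceByKey(lambda a, b: a + b).collect()
record_count = transactions.count()

print(totals_by_zone, record_count)
sc.stop()
```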
Confidential
Spark/Hadoop Lead developer
Responsibilities:
- Responsible for writing Spark jobs to handle files in multiple formats (JSON, Text, and Avro)
- Created external and managed tables as per the requirements.
- Extensively used Korn shell scripts to manipulate the flat files provided by the share brokers.
- Prepared ETL standards and naming conventions and wrote ETL flow documentation for the Stage, ODS, and Mart layers.
- Wrote custom support modules for upgrade implementation using PL/SQL and Unix shell scripts.
- Developed Sqoop scripts to import and export data from relational sources, handling incremental loading of the customer transaction data by date.
- Worked with Spark accumulators, variables that are only "added" to (such as counters and sums), to track metrics across the data transformations (see the sketch following this section).
- Wrote Python code for different Spark transformations and created RDDs over the data.
- Used persist and cache to keep the required RDDs in memory for reuse in other transformations.
- Worked with different RDDs to transform data coming from different sources into the required format.
- Developed different Spark actions to retrieve the results from the data sources in the required transformed format.
- Developed Spark code in Python to analyze the data arriving from different sources.
- Created DataFrames in Spark SQL from data in HDFS, performed transformations, analyzed the data, and stored it back into HDFS.
- Developed transformations and actions using Python.
- Integrated Spark with the Hadoop ecosystem and stored the data in the Hadoop Distributed File System (HDFS).
- Worked extensively on creating combiners, partitioning, and distributed cache to improve the performance of MapReduce jobs.
- Used broadcast variables to share reference data in a single variable across the entire process.
- Involved in loading and transforming large sets of structured and semi-structured data from databases into HDFS using Sqoop imports.
- Worked with data serialization formats (Avro, JSON, CSV) for converting complex objects into sequences of bits.
- Stayed current with developments in Hadoop and worked with multiple data sources.
- Imported and exported different kinds of data (incremental, updated, and column-based) from RDBMS to Hive.
Environment: Spark, Python, HDFS, Spark SQL, Oracle, PySpark, Kafka, Tableau, Impala, Hive, Hortonworks
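A minimal PySpark sketch of using an accumulator as a counter inside RDD transformations, as described above; the sample records and validation rule are illustrative assumptions:

```python
from pyspark import SparkContext

# Illustrative example only: the sample records and validation rule are assumed.
sc = SparkContext(appName="accumulator-demo")

# Accumulator: a variable that executors can only add to, used here as a counter.
bad_records = sc.accumulator(0)

records = sc.parallelize(["100", "250", "oops", "75", ""])

def parse_amount(value):
    """Parse a numeric record, counting records that fail validation."""
    try:
        return float(value)
    except ValueError:
        bad_records.add(1)
        return 0.0

amounts = records.map(parse_amount)

# Accumulator values are only reliable after an action has run.
total = amounts.sum()

print("total amount:", total)
print("bad records:", bad_records.value)
sc.stop()
```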
Confidential
Java Developer
Responsibilities :
- Involved in product development and maintenance and fixed issues.
- Implemented a new functional module using J2EE and a customized framework (OA).
- Developed new screens using JSP and Servlets.
- Customized the business module using EJB and Java.
- Worked on database table creation and transformations using SQL.
- Good experience with SQL/PL/SQL.
- Involved in Scrum meetings and developed features and fixed issues.
- Developed Framework Manager models and analyzed those models in Analysis Studio.
- Fixed standard issues and client-reported issues.
- Involved in maintaining and developing the metadata model using Framework Manager.
- Installed and configured applications.
Environment: Java, Open Architecture, Linux, Core Java (OOP and collections), J2EE Framework, JSP/Servlets, Ant, Maven, Git, JavaScript, Shell scripting, Oracle SQL
Confidential
Java developer
Responsibilities:
- Analysis of the specifications provided by the clients.
- Involved in bug fixes as well as enhancements to the existing project.
- Prepared the high-level design as per the requirements.
- Worked on exception handling using Core Java.
- Worked on multithreading in Java.
- Worked on SQL queries for database development and alterations.
- Worked on stored procedures using SQL.
- Proficiency in SQL/PL/SQL.
- Developed the application and performed unit testing.
- Planned and scheduled various releases based on the customer requirements.
Environment: Core Java, C++, SQL, Tortoise SVN, JDBC, Hibernate, J2EE
Confidential
PL/SQL Programmer
Responsibilities:
- Worked on building up the database in Oracle.
- Created data structures, i.e., tables and views, and applied referential integrity.
- Worked as an administrator and assigned rights to users and groups for accessing the database.
- Responsible for creating and modifying PL/SQL procedures, functions, and triggers according to the business requirements.
- Created Indexes, Sequences and Constraints.
- Created Materialized views for summary tables for better Query performance.
- Identified source system, their connectivity, related tables and fields and ensured data consistency for mapping.
- Worked closely with users, decision makers to develop the Transformation Logic to be used in Informatica Power Center.
- Converted the business rules into technical specifications for ETL process for populating fact and dimension table of data warehouse.
- Created mappings, transformations using Designer, and created sessions using Workflow Manager.
- Created staging tables to do validations against data before loading data into original fact and dimension tables.
- Involved in loading large amounts of data using utilities such as SQL Loader.
- Designed and developed Oracle Reports for the analysis of the data.
Environment: Visual Basic 6.0, Oracle 8i, PL/SQL, Crystal Reports 6, Erwin, Windows NT