Sr. Hadoop & Cloud Developer Resume
Beaverton, Oregon
SUMMARY:
- 5.7 years of experience as a professional Hadoop developer in batch and real-time data processing using various Hadoop components: Spark, Solr, Kafka, HBase, Hive, NiFi, Sqoop, Storm, and Java
- Experience in building a Hortonworks Hadoop cluster (HDP 2.5)
- Working experience in PySpark with AWS cloud components such as S3 and Redshift
- Deputed to TCS Singapore for a period of 6 months to work closely with Confidential on building a business layer over various transactional sources using Hadoop components
- Experience working with PySpark, Spark, Spark Streaming, and Spark SQL, and in integrating Spark with various components: Solr, Kafka, HDFS, HBase, Hive, and Amazon Kinesis
- Extensive experience in building SolrCloud clusters, Confidential, and Banana dashboards, and in preparing custom Solr schemas and configurations
- Utilized Storm, Kafka, and Amazon Kinesis for processing large volumes of data
- Experience in importing and exporting data with Sqoop between HDFS/Hive/HBase and RDBMS
- Experience working with MapReduce, Pig scripts, Hive query language, and HCatalog, and in extending Hive and Pig functionality by writing custom UDFs
- Experience in analyzing data using HiveQL, Pig, Spark, and custom MapReduce programs in Java
- Hands-on experience with NiFi (HDF) in building data routing and transformation dataflows integrated with various components (HDFS, Hive, HBase, Solr, MS SQL, and Kafka) as sources/targets
- Experience working with multiple data formats: Avro, Parquet, JSON, XML, and CSV
- Utilized Oozie workflows to schedule Sqoop, Java, Hive, Hive2, Pig, MapReduce, and shell script actions in a Kerberized HDP cluster
- Experienced in HBase and Phoenix
- Hands-on experience with Azure and AWS cloud services: EC2, S3, Data Pipeline, and EMR
- In-depth understanding of Hadoop architecture and its components, including HDFS, YARN, ZooKeeper, and MapReduce
- Work experience with multiple Hadoop distributions (Cloudera, Hortonworks) and cloud platforms (AWS, Microsoft Azure)
TECHNICAL SKILLS:
Hadoop Distributions: Cloudera, Hortonworks
Cloud Platforms: Amazon AWS and Microsoft Azure
Data Movement and Integration: NiFi (HDF), Sqoop, Kafka, Amazon Kinesis
Search Engines: Solr, Elasticsearch
Processing/Computing Frameworks: PySpark, Spark, Spark Streaming, MapReduce, Storm
Query Languages: HiveQL, Spark SQL, SQL, Impala
Security: Kerberos, Ranger
File formats: Avro, Parquet, XML, JSON, CSV, XLSX
Workflow schedulers: Oozie, Unix Cron, APScheduler
Other Big Data Components: YARN, Zookeeper, Ambari, Hue, Tez, Pig
Cluster Installation: Hortonworks HDP 2.5 Using Ambari 2.4
Databases: HBase, Oracle, MS SQL Server, Redshift
Languages: Java, Python, D3.js
Development / Build Tools: Eclipse, Maven, SVN, Jira, Bitbucket, Confluence
Java Frameworks: Hibernate, JBoss Drools Engine
Operating Systems: Linux, Windows
EXPERIENCE:
Sr. Hadoop & Cloud Developer
Confidential, Beaverton, Oregon
Responsibilities:
- Involved in discussions with business users to gather requirements
- Analyzed the requirements to design and develop the framework
- Developed PySpark scripts to perform incremental updates on Hive data
- Developed Airflow DAGs to automate the PySpark, Hive, and Athena scripts at regular intervals (see the sketch after this list)
- Performed continuous integration and deployment using Jenkins
Tools/Components: AWS S3, Python 2.7, Spark 2.1.2, Airflow, Hive, AWS EMR, Athena
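Below is a minimal sketch of the kind of Airflow 1.x DAG used to chain these steps; the DAG id, script paths, schedule, and table name are illustrative assumptions rather than the actual project code.

```python
# Illustrative Airflow 1.x DAG; DAG id, paths, schedule, and table are assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="incremental_hive_refresh",        # hypothetical DAG id
    default_args=default_args,
    start_date=datetime(2018, 1, 1),
    schedule_interval="0 */6 * * *",          # assumed interval: every 6 hours
    catchup=False,
) as dag:

    # Run the PySpark incremental-update job via spark-submit on EMR.
    pyspark_incremental = BashOperator(
        task_id="pyspark_incremental_update",
        bash_command="spark-submit --master yarn /home/hadoop/jobs/incremental_update.py",
    )

    # Pick up new partitions in Hive/Athena once the PySpark job lands data in S3.
    repair_partitions = BashOperator(
        task_id="msck_repair",
        bash_command="hive -e 'MSCK REPAIR TABLE analytics.events'",
    )

    pyspark_incremental >> repair_partitions
```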
Confidential, CT
Sr. Hadoop & Cloud Developer
Responsibilities:
- Involved in discussions with business users to gather requirements
- Analyzed the requirements to develop the framework
- Developed Java Spark Streaming jobs to load raw files and the corresponding processed metadata files into AWS S3 and an Elasticsearch cluster
- Developed Python scripts to fetch the most recent S3 keys from Elasticsearch
- Developed Python scripts to download S3 files using the Boto3 module (see the sketch after this list)
- Implemented PySpark logic to transform and process various data formats such as XLSX, XLS, JSON, and TXT
- Built scripts to load PySpark-processed files into Redshift
- Developed scripts to monitor and capture the state of each file as it moves through the PySpark pipeline
- Implemented shell scripts to automate the end-to-end process
Tools: AWS S3, Java 1.8, Maven, Python 2.7, Spark 1.6.1, Kafka, Elasticsearch 5.3, MapR Cluster, Amazon Redshift, Shell Script
Python Modules: boto3, pandas, elasticsearch, certifi, pyspark, psycopg2, json, io
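A minimal sketch of the S3 fetch and Redshift load steps, assuming boto3 and psycopg2 as listed above; the bucket, keys, table, IAM role, and connection details are placeholders.

```python
# Illustrative sketch: bucket, keys, table, IAM role, and credentials are placeholders.
import boto3
import psycopg2

s3 = boto3.client("s3")

# Download one processed file from S3 to local disk for inspection/processing.
s3.download_file("processed-bucket", "daily/part-00000.csv", "/tmp/part-00000.csv")

# Load the processed data into Redshift using the usual COPY-from-S3 pattern.
conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",   # hypothetical endpoint
    port=5439, dbname="analytics", user="loader", password="***",
)
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY analytics.daily_events
        FROM 's3://processed-bucket/daily/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
        FORMAT AS CSV;
    """)
conn.close()
```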
Confidential, MI
Sr. Hadoop Developer
Responsibilities:
- Analyzed the requirements to develop the framework
- Imported plant event data from various plants using Apache NiFi
- Implemented transformation logic on plant events using Apache NiFi
- Built an HBase data lake and created secondary indexes using Phoenix (see the sketch after this list)
- Managed and tuned HBase to improve performance
Tools/Components: Apache NiFi 1.0, JSON Expression Language, HBase 1.1.2, Phoenix 4.7, Microsoft Azure HDP 2.5 Cluster
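A minimal sketch of creating a Phoenix secondary index over the HBase plant-events data via the phoenixdb client; the Query Server URL, table name, and columns are assumptions.

```python
# Illustrative sketch; Query Server URL, table name, and columns are assumptions.
import phoenixdb

conn = phoenixdb.connect("http://phoenix-queryserver:8765/", autocommit=True)
cur = conn.cursor()

# Phoenix table over the HBase plant-events data (assumed schema).
cur.execute("""
    CREATE TABLE IF NOT EXISTS PLANT_EVENTS (
        EVENT_ID   VARCHAR PRIMARY KEY,
        PLANT_CODE VARCHAR,
        EVENT_TS   TIMESTAMP,
        PAYLOAD    VARCHAR
    )
""")

# Secondary index so lookups by plant and time avoid full row-key scans.
cur.execute("""
    CREATE INDEX IF NOT EXISTS IDX_PLANT_TS
    ON PLANT_EVENTS (PLANT_CODE, EVENT_TS)
""")

conn.close()
```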
Confidential
Hadoop Developer
Responsibilities:
- Built a SolrCloud cluster with an external ZooKeeper quorum
- Indexed real-time HDFS plant events using Solr and Spark Streaming (see the sketch after this list)
- Indexed HBase cycle-time events using the Lucidworks HBase indexer
- Built Banana dashboards
- Made configuration changes in Banana to make the dashboards available to end users
Tools/Components: Lucidworks Solr 5.5.2, Apache Spark 1.6, HBase 1.1.2, Banana 1.6 Dashboard, Java 1.8, Shell Script, Microsoft Azure HDP 2.5 Cluster
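The indexing jobs here were written in Java with Spark Streaming; the sketch below shows the same push-to-Solr pattern in Python with pysolr, with the collection URL and document fields as assumptions.

```python
# Illustrative pysolr sketch; collection URL and document fields are assumptions.
import pysolr

solr = pysolr.Solr("http://solr-node:8983/solr/plant_events", timeout=10)

docs = [
    {"id": "evt-001", "plant_code": "P10",
     "event_ts": "2017-05-01T10:15:00Z", "status": "OK"},
    {"id": "evt-002", "plant_code": "P12",
     "event_ts": "2017-05-01T10:16:00Z", "status": "ALARM"},
]

# Add the documents and commit so they are immediately searchable in Banana.
solr.add(docs, commit=True)
```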
Confidential
Hadoop Developer
Responsibilities:
- Imported data using Sqoop from MS SQL Server into HDFS (see the sketch after this list)
- Built Hive scripts to perform queries and transformations
- Built Oozie coordinator workflows of Sqoop and Hive actions to schedule daily and incremental jobs
- Helped the team build SAP BO reports on Hive using the ODBC driver
Tools/Components: Sqoop 1.4, Hive 1.2, Oozie 4.2, Shell Script
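A minimal sketch of a Sqoop incremental import, wrapped in a small Python runner purely for illustration; the JDBC URL, credentials, table, and HDFS paths are assumptions.

```python
# Illustrative Sqoop incremental import; JDBC URL, table, and paths are assumptions.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:sqlserver://mssql-host:1433;databaseName=sales",
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop.password",
    "--table", "ORDERS",
    "--target-dir", "/data/raw/orders",
    "--incremental", "lastmodified",
    "--check-column", "LAST_UPDATED",
    "--last-value", "2016-01-01 00:00:00",
    "-m", "4",
]

# Fail loudly so the scheduler (Oozie/cron) can retry the import.
subprocess.check_call(sqoop_cmd)
```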
Confidential
Hadoop Developer
Responsibilities:
- Aligned the accounting entries from source systems to populate a standard set of PSGL chartfields (Ledger, Business Unit, Account, Product, PC Code, Chartfield3, Original CCY, Base CCY)
- Summarized accounting entries to be sent to PSGL in order to maintain PSGL performance and EOD processing
- Sent detailed (non-summarized) accounting entries to ODS and any other downstream systems that require such information
- Reduced manual journal entries (MJE) posted across operations
- Decommissioned legacy mainframe systems
- Enabled faster book closing
- Involved in discussions with business users to gather requirements
- Analyzed the requirements to develop the framework
- Imported data using Sqoop
- Ingested data into Hive
- Processed Hive data using Spark and Spark SQL (see the sketch after this list)
- Integrated JBoss Drools with Spark transformations
- Sent files to PSGL and ODS
Tools/Components: Cloudera 5.4.3 Cluster, Apache Spark 1.3, JBoss Drools, Java, Maven, Sqoop, HDFS
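The Spark jobs in this project were written in Java on Spark 1.3; the sketch below restates the summarization step in PySpark against the modern SparkSession API for brevity, with the database, table, and chartfield column names as assumptions.

```python
# Illustrative PySpark summarization of accounting entries for the PSGL feed;
# the original jobs were Java/Spark 1.3, and table/column names are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("psgl-summarization")
         .enableHiveSupport()
         .getOrCreate())

# Detailed entries go to ODS as-is; the PSGL feed carries amounts summarized
# by the standard chartfields.
summary = spark.sql("""
    SELECT ledger, business_unit, account, product, pc_code,
           chartfield3, original_ccy, base_ccy,
           SUM(amount) AS total_amount,
           COUNT(*)    AS entry_count
    FROM finance.accounting_entries
    GROUP BY ledger, business_unit, account, product, pc_code,
             chartfield3, original_ccy, base_ccy
""")

summary.write.mode("overwrite").saveAsTable("finance.psgl_summary")
```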
Confidential
Hadoop Developer
Responsibilities:
- Analyzed the requirements to develop the framework
- Ingested data into HDFS and then integrated it into Hive
- Developed scripts to integrate HBase with Hive data
- Built scripts to index data using the HBase Lily indexer and Solr
- Developed Solr Java code to surface the relationships among materials
- Built logic to fetch the hierarchical data and provide search by component number using Java and JSP integration (see the sketch after this list)
- Visualized the results using D3 JavaScript dashboards
Tools/Components: Cloudera CDH 5.2, Solr, Java, Sqoop, Hive, HBase, D3.js, JSP
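The search layer here was Java/JSP; the sketch below shows an equivalent component-number lookup against the materials Solr index using pysolr, with the collection and field names as assumptions.

```python
# Illustrative component-number lookup; collection and field names are assumptions.
import pysolr

solr = pysolr.Solr("http://solr-node:8983/solr/materials", timeout=10)

# Search by component number and pull back the related material hierarchy fields.
results = solr.search("component_number:COMP-1001",
                      fl="material_id,parent_material_id,description",
                      rows=50)

for doc in results:
    print(doc["material_id"], doc.get("parent_material_id"), doc["description"])
```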
Confidential
Hadoop Developer
Responsibilities:
- Analyzed the requirements to develop the framework
- Developed Sqoop scripts and Data Services jobs to pull delta data and store it in HDFS
- Developed Hive scripts to merge delta data with existing Hive data (see the sketch after this list)
- Worked on Oozie scripts to schedule the above process every 30 minutes
- Developed a reconciliation Java framework used for record-level comparison
Tools/Components: Cloudera CDH 5.2, Java, Sqoop, Hive
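A minimal sketch of the delta-merge pattern, executed here through PyHive for illustration; the original logic lived in HiveQL scripts scheduled by Oozie, and the table and column names are assumptions.

```python
# Illustrative delta merge via PyHive; tables and columns are assumptions.
from pyhive import hive

conn = hive.Connection(host="hive-server", port=10000, username="etl_user")
cur = conn.cursor()

# Union the base table with the delta feed and keep the newest row per key;
# the merged result is written to a staging table and then swapped in.
cur.execute("""
    INSERT OVERWRITE TABLE sales.orders_merged
    SELECT order_id, customer_id, amount, last_updated
    FROM (
        SELECT merged.*,
               ROW_NUMBER() OVER (PARTITION BY order_id
                                  ORDER BY last_updated DESC) AS rn
        FROM (
            SELECT * FROM sales.orders
            UNION ALL
            SELECT * FROM sales.orders_delta
        ) merged
    ) ranked
    WHERE rn = 1
""")
conn.close()
```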
Confidential
Hadoop Developer
Responsibilities:
- Analyzed the requirements to develop the framework
- Developed Sqoop scripts to import data from Oracle to HDFS
- Developed MapReduce programs for cleansing and validating the imported HDFS data (see the sketch after this list)
- Implemented custom key and partitioning techniques in MapReduce programming
- Developed Hive table structures to ingest the cleansed data
- Made configuration changes in Hive and MapReduce programs as part of performance tuning
- Executed queries in Impala for better query performance
- Built a workflow of Sqoop, MapReduce, and Hive scripts and scheduled them in Oozie via Hue
- Helped the Tableau team build reports by connecting to Impala
Tools/Components: Cloudera 4, Java, Sqoop, MapReduce, Hive, Impala, Oozie, Hue
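The cleansing and validation MapReduce programs were written in Java; the sketch below shows the same idea with the mrjob Python library to keep these examples in one language, and the comma-separated record layout is an assumption.

```python
# Illustrative cleanse-and-validate MapReduce job using mrjob; the original
# programs were Java, and the 4-field CSV layout is an assumption.
from mrjob.job import MRJob


class CleanseRecords(MRJob):
    """Drop malformed rows and de-duplicate valid ones by record id."""

    def mapper(self, _, line):
        fields = line.strip().split(",")
        # Expected layout (assumed): id, customer_id, amount, order_date
        if len(fields) != 4:
            return  # reject rows with the wrong number of fields
        try:
            float(fields[2])  # validate that the amount column is numeric
        except ValueError:
            return  # reject rows with a non-numeric amount
        yield fields[0], ",".join(fields)

    def reducer(self, record_id, rows):
        # Keep a single cleansed row per record id.
        for row in rows:
            yield record_id, row
            break


if __name__ == "__main__":
    CleanseRecords.run()
```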