Sr. Hadoop & Cloud Developer Resume
Beaverton, Oregon
SUMMARY:
- 5.7 years of experience as a professional Hadoop developer in batch and real-time data processing using various Hadoop components: Spark, Solr, Kafka, HBase, Hive, NiFi, Sqoop, Storm, and Java
- Experience in building a Hortonworks Hadoop cluster (HDP 2.5)
- Working experience in PySpark with AWS cloud components such as S3 and Redshift
- Deputed to TCS Singapore for a period of 6 months to work closely with Confidential on building a business layer over various transactional sources using Hadoop components
- Experience working with PySpark, Spark, Spark Streaming, and Spark SQL, and in integrating Spark with various components: Solr, Kafka, HDFS, HBase, Hive, and Amazon Kinesis
- Extensive experience in building SolrCloud clusters, Confidential, and Banana dashboards, and in preparing custom Solr schemas and configurations
- Utilized Storm, Kafka, and Amazon Kinesis for processing large volumes of data
- Experience in importing and exporting data with Sqoop between HDFS/Hive/HBase and RDBMS
- Experience working with MapReduce, Pig scripts, Hive query language, and HCatalog, and in extending Hive and Pig functionality by writing custom UDFs
- Experience in analyzing data using HiveQL, Pig, Spark, and custom MapReduce programs in Java
- Hands-on experience with NiFi (HDF) in building data routing and transformation dataflows integrated with various components (HDFS, Hive, HBase, Solr, MS SQL, and Kafka) as sources/targets
- Experience working with multiple data formats: Avro, Parquet, JSON, XML, and CSV
- Utilized Oozie workflows to schedule Sqoop, Java, Hive, Hive2, Pig, MapReduce, and shell script actions in a Kerberized HDP cluster
- Experienced in HBase and Phoenix
- Hands-on experience with Azure and AWS cloud services: EC2, S3, Data Pipeline, and EMR
- In-depth understanding of Hadoop architecture and its components, including HDFS, YARN, ZooKeeper, and MapReduce
- Work experience with multiple Hadoop distributions (Cloudera, Hortonworks) and cloud platforms (AWS, Microsoft Azure)
TECHNICAL SKILLS:
Hadoop Distributions: Cloudera, Hortonworks
Cloud Platforms: Amazon AWS and Microsoft Azure
Data Movement and Integration: NiFi (HDF), Sqoop, Kafka, Amazon Kinesis
Search Engines: Solr, Elasticsearch
Processing/Computing Frameworks: PySpark, Spark, Spark Streaming, MapReduce, Storm
Query Languages: HiveQL, Spark SQL, SQL, Impala
Security: Kerberos, Ranger
File formats: Avro, Parquet, XML, JSON, CSV, XLSX
Workflow schedulers: Oozie, Unix Cron, APScheduler
Other Big Data Components: YARN, Zookeeper, Ambari, Hue, Tez, Pig
Cluster Installation: Hortonworks HDP 2.5 Using Ambari 2.4
Databases: HBase, Oracle, MS SQL Server, Redshift
Languages: Java, Python, D3.js
Development / Build Tools: Eclipse, Maven, SVN, Jira, Bitbucket, Confluence
Java Frameworks: Hibernate, JBoss Drools Engine
Operating Systems: Linux, Windows
EXPERIENCE:
Sr. Hadoop & Cloud Developer
Confidential, Beaverton, Oregon
Responsibilities:
- Involved in discussions with business users to gather requirements
- Analyzed the requirements to design and develop the framework
- Developed PySpark scripts to perform incremental updates on Hive data
- Developed Airflow DAGs to automate the PySpark, Hive, and Athena scripts at regular intervals (see the sketch after this list)
- Performed continuous integration and deployment using Jenkins
Tools/Components: AWS S3, Python 2.7, Spark 2.1.2, Airflow, Hive, AWS EMR, Athena
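Below is a minimal sketch of the kind of Airflow 1.x DAG used to chain these steps; the DAG id, script paths, schedule, and table name are illustrative assumptions rather than the actual project code.

```python
# Illustrative Airflow 1.x DAG; DAG id, paths, schedule, and table are assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="incremental_hive_refresh",        # hypothetical DAG id
    default_args=default_args,
    start_date=datetime(2018, 1, 1),
    schedule_interval="0 */6 * * *",          # assumed interval: every 6 hours
    catchup=False,
) as dag:

    # Run the PySpark incremental-update job via spark-submit on EMR.
    pyspark_incremental = BashOperator(
        task_id="pyspark_incremental_update",
        bash_command="spark-submit --master yarn /home/hadoop/jobs/incremental_update.py",
    )

    # Pick up new partitions in Hive/Athena once the PySpark job lands data in S3.
    repair_partitions = BashOperator(
        task_id="msck_repair",
        bash_command="hive -e 'MSCK REPAIR TABLE analytics.events'",
    )

    pyspark_incremental >> repair_partitions
```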
Confidential, CT
Sr. Hadoop & Cloud Developer
Responsibilities:
- Involved in discussions with business users to gather requirements
- Analyzed the requirements to develop the framework
- Developed Java Spark Streaming jobs to load raw files and the corresponding processed metadata files into AWS S3 and an Elasticsearch cluster
- Developed Python scripts to fetch the most recent S3 keys from Elasticsearch
- Developed Python scripts to download S3 files using the Boto3 module (see the sketch after this list)
- Implemented PySpark logic to transform and process various data formats such as XLSX, XLS, JSON, and TXT
- Built scripts to load PySpark-processed files into Redshift
- Developed scripts to monitor and capture the state of each file as it moves through the PySpark pipeline
- Implemented shell scripts to automate the end-to-end process
Tools: AWS S3, Java 1.8, Maven, Python 2.7, Spark 1.6.1, Kafka, Elasticsearch 5.3, MapR Cluster, Amazon Redshift, Shell Script
Python Modules: boto3, pandas, elasticsearch, certifi, pyspark, psycopg2, json, io
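A minimal sketch of the S3 fetch and Redshift load steps, assuming boto3 and psycopg2 as listed above; the bucket, keys, table, IAM role, and connection details are placeholders.

```python
# Illustrative sketch: bucket, keys, table, IAM role, and credentials are placeholders.
import boto3
import psycopg2

s3 = boto3.client("s3")

# Download one processed file from S3 to local disk for inspection/processing.
s3.download_file("processed-bucket", "daily/part-00000.csv", "/tmp/part-00000.csv")

# Load the processed data into Redshift using the usual COPY-from-S3 pattern.
conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",   # hypothetical endpoint
    port=5439, dbname="analytics", user="loader", password="***",
)
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY analytics.daily_events
        FROM 's3://processed-bucket/daily/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
        FORMAT AS CSV;
    """)
conn.close()
```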
Confidential, MI
Sr. Hadoop Developer
Responsibilities:
- Analyzed the requirements to develop the framework
- Imported plant event data from various plants using Apache NiFi
- Implemented transformation logic on plant events using Apache NiFi
- Built an HBase data lake and created secondary indexes using Phoenix (see the sketch after this list)
- Managed and tuned HBase to improve performance
Tools/Components: Apache NiFi 1.0, JSON Expression Language, HBase 1.1.2, Phoenix 4.7, Microsoft Azure HDP 2.5 Cluster
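A minimal sketch of creating a Phoenix secondary index over the HBase plant-events data via the phoenixdb client; the Query Server URL, table name, and columns are assumptions.

```python
# Illustrative sketch; Query Server URL, table name, and columns are assumptions.
import phoenixdb

conn = phoenixdb.connect("http://phoenix-queryserver:8765/", autocommit=True)
cur = conn.cursor()

# Phoenix table over the HBase plant-events data (assumed schema).
cur.execute("""
    CREATE TABLE IF NOT EXISTS PLANT_EVENTS (
        EVENT_ID   VARCHAR PRIMARY KEY,
        PLANT_CODE VARCHAR,
        EVENT_TS   TIMESTAMP,
        PAYLOAD    VARCHAR
    )
""")

# Secondary index so lookups by plant and time avoid full row-key scans.
cur.execute("""
    CREATE INDEX IF NOT EXISTS IDX_PLANT_TS
    ON PLANT_EVENTS (PLANT_CODE, EVENT_TS)
""")

conn.close()
```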
Confidential
Hadoop Developer
Responsibilities:
- Built a SolrCloud cluster with an external ZooKeeper quorum
- Indexed real-time HDFS plant events using Solr and Spark Streaming (see the sketch after this list)
- Indexed HBase cycle-time events using the Lucidworks HBase indexer
- Built Banana dashboards
- Made configuration changes in Banana to make the dashboards available to end users
Tools/Components: Lucidworks Solr 5.5.2, Apache Spark 1.6, HBase 1.1.2, Banana 1.6 Dashboard, Java 1.8, Shell Script, Microsoft Azure HDP 2.5 Cluster
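The indexing jobs here were written in Java with Spark Streaming; the sketch below shows the same push-to-Solr pattern in Python with pysolr, with the collection URL and document fields as assumptions.

```python
# Illustrative pysolr sketch; collection URL and document fields are assumptions.
import pysolr

solr = pysolr.Solr("http://solr-node:8983/solr/plant_events", timeout=10)

docs = [
    {"id": "evt-001", "plant_code": "P10",
     "event_ts": "2017-05-01T10:15:00Z", "status": "OK"},
    {"id": "evt-002", "plant_code": "P12",
     "event_ts": "2017-05-01T10:16:00Z", "status": "ALARM"},
]

# Add the documents and commit so they are immediately searchable in Banana.
solr.add(docs, commit=True)
```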
Confidential
Hadoop Developer
Responsibilities:
- Imported data using Sqoop from MS SQL Server into HDFS (see the sketch after this list)
- Built Hive scripts to perform queries and transformations
- Built Oozie coordinator workflows of Sqoop and Hive actions to schedule daily and incremental jobs
- Helped the team build SAP BO reports on Hive using the ODBC driver
Tools/Components: Sqoop 1.4, Hive 1.2, Oozie 4.2, Shell Script
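A minimal sketch of a Sqoop incremental import, wrapped in a small Python runner purely for illustration; the JDBC URL, credentials, table, and HDFS paths are assumptions.

```python
# Illustrative Sqoop incremental import; JDBC URL, table, and paths are assumptions.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:sqlserver://mssql-host:1433;databaseName=sales",
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop.password",
    "--table", "ORDERS",
    "--target-dir", "/data/raw/orders",
    "--incremental", "lastmodified",
    "--check-column", "LAST_UPDATED",
    "--last-value", "2016-01-01 00:00:00",
    "-m", "4",
]

# Fail loudly so the scheduler (Oozie/cron) can retry the import.
subprocess.check_call(sqoop_cmd)
```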
Confidential
Hadoop Developer
Responsibilities:
- Aligned the accounting entries from source systems to populate a standard set of PSGL chartfields (Ledger, Business Unit, Account, Product, PC Code, Chartfield3, Original CCY, Base CCY)
- Summarized accounting entries to be sent to PSGL in order to maintain PSGL performance and EOD processing
- Sent detailed (non-summarized) accounting entries to ODS and any other downstream systems that require such information
- Reduced manual journal entries (MJE) posted across operations
- Decommissioned legacy mainframe systems
- Enabled faster book closing
- Involved in discussions with business users to gather requirements
- Analyzed the requirements to develop the framework
- Imported data using Sqoop
- Ingested data into Hive
- Processed Hive data using Spark and Spark SQL (see the sketch after this list)
- Integrated JBoss Drools with Spark transformations
- Sent files to PSGL and ODS
Tools/Components: Cloudera 5.4.3 Cluster, Apache Spark 1.3, JBoss Drools, Java, Maven, Sqoop, HDFS
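The Spark jobs in this project were written in Java on Spark 1.3; the sketch below restates the summarization step in PySpark against the modern SparkSession API for brevity, with the database, table, and chartfield column names as assumptions.

```python
# Illustrative PySpark summarization of accounting entries for the PSGL feed;
# the original jobs were Java/Spark 1.3, and table/column names are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("psgl-summarization")
         .enableHiveSupport()
         .getOrCreate())

# Detailed entries go to ODS as-is; the PSGL feed carries amounts summarized
# by the standard chartfields.
summary = spark.sql("""
    SELECT ledger, business_unit, account, product, pc_code,
           chartfield3, original_ccy, base_ccy,
           SUM(amount) AS total_amount,
           COUNT(*)    AS entry_count
    FROM finance.accounting_entries
    GROUP BY ledger, business_unit, account, product, pc_code,
             chartfield3, original_ccy, base_ccy
""")

summary.write.mode("overwrite").saveAsTable("finance.psgl_summary")
```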
Confidential
Hadoop Developer
Responsibilities:
- Analyzed the requirements to develop the framework
- Ingested data into HDFS and then integrated it into Hive
- Developed scripts to integrate HBase with Hive data
- Built scripts to index data using the HBase Lily indexer and Solr
- Developed Solr Java code to surface the relationships among materials
- Built logic to fetch the hierarchical data and provide search by component number using Java and JSP integration (see the sketch after this list)
- Visualized the results using D3 JavaScript dashboards
Tools/Components: Cloudera CDH 5.2, Solr, Java, Sqoop, Hive, HBase, D3.js, JSP
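The search layer here was Java/JSP; the sketch below shows an equivalent component-number lookup against the materials Solr index using pysolr, with the collection and field names as assumptions.

```python
# Illustrative component-number lookup; collection and field names are assumptions.
import pysolr

solr = pysolr.Solr("http://solr-node:8983/solr/materials", timeout=10)

# Search by component number and pull back the related material hierarchy fields.
results = solr.search("component_number:COMP-1001",
                      fl="material_id,parent_material_id,description",
                      rows=50)

for doc in results:
    print(doc["material_id"], doc.get("parent_material_id"), doc["description"])
```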
Confidential
Hadoop Developer
Responsibilities:
- Analyzed the requirements to develop the framework
- Developed Sqoop scripts and Data Services jobs to pull delta data and store it in HDFS
- Developed Hive scripts to merge delta data with existing Hive data (see the sketch after this list)
- Worked on Oozie scripts to schedule the above process every 30 minutes
- Developed a reconciliation Java framework used for record-level comparison
Tools/Components: Cloudera CDH 5.2, Java, Sqoop, Hive
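A minimal sketch of the delta-merge pattern, executed here through PyHive for illustration; the original logic lived in HiveQL scripts scheduled by Oozie, and the table and column names are assumptions.

```python
# Illustrative delta merge via PyHive; tables and columns are assumptions.
from pyhive import hive

conn = hive.Connection(host="hive-server", port=10000, username="etl_user")
cur = conn.cursor()

# Union the base table with the delta feed and keep the newest row per key;
# the merged result is written to a staging table and then swapped in.
cur.execute("""
    INSERT OVERWRITE TABLE sales.orders_merged
    SELECT order_id, customer_id, amount, last_updated
    FROM (
        SELECT merged.*,
               ROW_NUMBER() OVER (PARTITION BY order_id
                                  ORDER BY last_updated DESC) AS rn
        FROM (
            SELECT * FROM sales.orders
            UNION ALL
            SELECT * FROM sales.orders_delta
        ) merged
    ) ranked
    WHERE rn = 1
""")
conn.close()
```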
Confidential
Hadoop Developer
Responsibilities:
- Analyzed the requirements to develop the framework
- Developed Sqoop scripts to import data from Oracle to HDFS
- Developed MapReduce programs for cleansing and validating the imported HDFS data (see the sketch after this list)
- Implemented custom key and partitioning techniques in MapReduce programming
- Developed Hive table structures to ingest the cleansed data
- Made configuration changes in Hive and MapReduce programs as part of performance tuning
- Executed queries in Impala for better query performance
- Built a workflow of Sqoop, MapReduce, and Hive scripts and scheduled them in Oozie via Hue
- Helped the Tableau team build reports by connecting to Impala
Tools/Components: Cloudera 4, Java, Sqoop, MapReduce, Hive, Impala, Oozie, Hue
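The cleansing and validation MapReduce programs were written in Java; the sketch below shows the same idea with the mrjob Python library to keep these examples in one language, and the comma-separated record layout is an assumption.

```python
# Illustrative cleanse-and-validate MapReduce job using mrjob; the original
# programs were Java, and the 4-field CSV layout is an assumption.
from mrjob.job import MRJob


class CleanseRecords(MRJob):
    """Drop malformed rows and de-duplicate valid ones by record id."""

    def mapper(self, _, line):
        fields = line.strip().split(",")
        # Expected layout (assumed): id, customer_id, amount, order_date
        if len(fields) != 4:
            return  # reject rows with the wrong number of fields
        try:
            float(fields[2])  # validate that the amount column is numeric
        except ValueError:
            return  # reject rows with a non-numeric amount
        yield fields[0], ",".join(fields)

    def reducer(self, record_id, rows):
        # Keep a single cleansed row per record id.
        for row in rows:
            yield record_id, row
            break


if __name__ == "__main__":
    CleanseRecords.run()
```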