Senior Hadoop Developer Resume
New York
PROFESSIONAL SUMMARY:
- 10 years of IT industry experience encompassing a wide range of skill sets, roles, and industry verticals.
- Certified Big Data (Hadoop) developer; certified IBM Database Developer; trained in Lean concepts.
- Hands-on experience in development and data analytics using Hadoop (Big Data) tools and technologies, including HDFS, MapReduce, Hive, Pig, HBase, Spark, Flume, Sqoop, DMX SyncSort, HDInsight, Azure, and Oozie.
- Strong database skills in DB2, Hive, Oracle, PL/SQL, MySQL, Big SQL, and NoSQL databases such as HBase; familiar with Cassandra.
- Experienced in installing, configuring, managing, and testing Hadoop ecosystem components.
- Experienced in developing MapReduce programs in Java.
- Used Apache Spark with Scala/Python for large-scale data processing, real-time analytics, and ETL design.
- Experienced in data warehouse concepts and ETL tools (Teradata).
- Experienced with Teradata SQL Assistant, data import/export, and data loading with utilities such as BTEQ, MultiLoad, FastLoad, and FastExport in UNIX environments.
- Experienced with stored procedures, functions, triggers, macros, and SQL*Loader.
- Experienced in UNIX shell scripting.
- Good knowledge of Maestro, StarTeam, Build Forge, and the TWS scheduler.
- Experienced with workflow schedulers and data architecture, including data ingestion pipeline design and data modeling.
- Functional knowledge of insurance, financial, banking, and healthcare systems.
- Good experience in all phases of the systems life cycle: development, testing (unit, system, integration, and regression), and pre-production support.
- Proficient in analyzing and translating business requirements to technical requirements and architecture.
- Performed knowledge management in the form of AIDs, project knowledge documents, and change documents.
- Experienced in handling internal and external functional, process and data audits.
TECHNICAL SKILLS:
Big Data Ecosystems: HDFS, Hive, Pig, MapReduce, Spark, Sqoop, HBase, Cassandra, ZooKeeper, Flume, DMX, Oozie, Avro, and Hue
Languages: C#, .NET, Java, PL/SQL, Python, Scala, UNIX shell scripting, HiveQL, Pig Latin
Databases: MySQL, Big SQL, NoSQL, SQL Server, Oracle, Exadata, DMS 1100, DB2, PostgreSQL
Operating Systems: UNIX, Windows, MVS/ESA, z/OS
ETL/Reporting: Teradata
Methodologies: Waterfall, Scrum, and Agile
Tools: RPM, MPP, TestDirector, TWS scheduler, Clarity, Quality Center, Service Center, SFTP, Teradata SQL Assistant, Toad, SSH, Hue, Eclipse, Maven, PuTTY, BigInsights, Cloudera, Beeline, Visual Studio, VS Code, CuteFTP, SQL Server Management Studio, Azure DevOps, Team Server, Power BI
PROFESSIONAL EXPERIENCE:
Confidential
Senior Hadoop Developer
Responsibilities:
- Planned, designed, and performed end-to-end setup of Azure sandbox instances.
- Actively involved in setting up CDH and integrating it with Azure Data Lake Store (ADLS).
- Developed numerous Spark jobs in Scala 2.10.x for cleansing data and analyzing it in Impala 2.1.0 (a cleanse-and-load sketch follows this list).
- Developed FTP scripts to bring files from different sources into the Hadoop data lake.
- Responsible for ingestion of data from Blob to Kusto and maintaining the PPE and PROD pipelines.
- Developed Python scripts for NLP pattern search.
- Responsible for creating Hive tables and partitions, loading data, and writing Hive queries.
- Imported and exported data using Sqoop between the Hadoop Distributed File System (HDFS) and relational database systems.
- Responsible for building reporting dashboards in Power BI and publishing them to the cloud for users.
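The cleanse-and-load pattern behind the Spark jobs above can be sketched as follows. The sketch is in PySpark rather than the original Scala, and every database, table, and column name (staging.raw_events, event_id, event_ts, analytics.events_clean) is an illustrative assumption, not a project artifact.

```python
# Minimal PySpark sketch of a cleanse-and-load job; names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("cleanse-and-load")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.table("staging.raw_events")

clean = (raw
         .dropDuplicates(["event_id"])               # drop duplicate records
         .filter(F.col("event_ts").isNotNull())      # discard rows missing a timestamp
         .withColumn("event_date", F.to_date("event_ts")))

# Write to a date-partitioned Hive table for downstream Hive/Impala queries.
(clean.write
      .mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("analytics.events_clean"))
```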
Environment: Azure, HDFS, Azure DevOps, Hive, Scala, Python, Oozie, Java, SQL Server, Ubuntu/UNIX, VS Code, Scrum/Agile, Power BI, Hue, PuTTY, Cloudera, Beeline, TWS Scheduler.
Confidential
Lead Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hive, Impala, Scala, and DMX.
- Imported and exported data into HDFS and Hive using Sqoop.
- Developed FTP scripts to bring files from different sources into Hadoop.
- Used DMX SyncSort to build ETL pipelines for packed data.
- Implemented A/B switch logic with Oracle Exadata for parallel processing.
- Implemented partitioning in Hive (a partitioned-table sketch follows this list).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Deployed algorithms in PySpark on complex datasets.
- Experience using SequenceFile, Avro, and Parquet file formats.
- Produced project plans and estimations.
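A minimal sketch of the Hive partitioning pattern referenced above, issued through SparkSession.sql; the warehouse.transactions and staging.transactions_raw tables and their columns are hypothetical.

```python
# Hive partitioned-table DDL plus a dynamic-partition load, via Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS warehouse.transactions (
        txn_id     STRING,
        account_id STRING,
        amount     DECIMAL(12,2)
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
""")

# Allow dynamic partitions, then insert one partition per load date.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE warehouse.transactions PARTITION (load_date)
    SELECT txn_id, account_id, amount, load_date
    FROM staging.transactions_raw
""")
```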
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Scala, Python, HBase, Oozie, DMX SyncSort, YARN, Spark, Core Java, Oracle Exadata, Ubuntu/UNIX, Eclipse, JDBC drivers, MySQL, Linux, XML, CRM, SVN, Hue, PuTTY, Cloudera, Beeline, TWS Scheduler.
Confidential, New York
Lead Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hive, Impala, Spark, and Greenplum.
- Implemented partitioning and bucketing in Hive.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Deployed algorithms in Scala with Spark on complex datasets; performed Spark-based development in Scala.
- Created Java UDFs for Pig and Hive (a PySpark analog is sketched after this list).
- Experience using SequenceFile, Avro, Parquet, and text file formats.
- Good working knowledge of Amazon Web Services components such as EC2, EMR, S3, EBS, and ELB.
- Produced estimations and technical design specifications for projects.
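The Java UDFs themselves are not reproduced here; the sketch below shows the same idea as a PySpark UDF (a custom column-level function). The phone-number column and input path are hypothetical.

```python
# PySpark analog of a column-level UDF; column/path names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

@F.udf(returnType=StringType())
def normalize_phone(raw):
    # Keep digits only, e.g. "(212) 555-0100" -> "2125550100".
    return "".join(ch for ch in raw if ch.isdigit()) if raw else None

contacts = spark.read.parquet("/data/contacts/")      # hypothetical Parquet input
contacts.withColumn("phone_clean", normalize_phone("phone")).show()
```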
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Scala, HBase, Oozie, YARN, Spark, Core Java, Teradata, SQL, Ubuntu/UNIX, Eclipse, Maven, JDBC drivers, MySQL, Linux, AWS, XML, CRM, SVN, Hue, PuTTY, Cloudera, Beeline, TWS Scheduler.
Confidential, CA
Lead Hadoop Developer
Responsibilities:
- Created the project using Hive, Big SQL, and Pig.
- Involved in data modeling in Hadoop.
- Created Hive tables and worked with them using HiveQL.
- Wrote Apache Pig scripts to process HDFS data (a PySpark analog is sketched after this list).
- Automated tasks using UNIX shell scripts.
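The original Pig scripts are not available; under that caveat, the same style of HDFS processing (read, group, aggregate, write) is sketched below as a PySpark analog, with hypothetical paths and columns.

```python
# Read raw HDFS data, aggregate per day, and write the result back out.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

clicks = spark.read.csv("hdfs:///data/raw/clickstream/", header=True)

daily_users = (clicks.groupBy("visit_date")
                     .agg(F.countDistinct("user_id").alias("unique_users")))

daily_users.write.mode("overwrite").csv("hdfs:///data/out/daily_users/", header=True)
```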
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Scala, Python, HBase, Oozie, YARN, Spark, Core Java, Oracle, SQL, Ubuntu/UNIX, Eclipse, Maven, JDBC drivers, Mainframe, MySQL, Linux, AWS, XML, CRM, SVN, PDSH, PuTTY, BigInsights
Confidential
Senior Developer
Responsibilities:
- Understood the requirements and built the HBase data model.
- Loaded historical data as well as incremental customer and other data into Hadoop through Hive.
- Imported and exported large data sets between various data sources and HDFS using Sqoop (see the sketch after this list).
- Performed load balancing of data across the cluster and performance tuning of various jobs running on it.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Developed applications using Eclipse.
- Performed process enhancement through SQL tuning.
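A hedged sketch of a Sqoop import driven from Python; the JDBC URL, credentials file, source table, and target directory are placeholders rather than actual project values.

```python
# Launch a Sqoop import of an RDBMS table into HDFS from Python.
import subprocess

cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com/sales",   # placeholder source DB
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_pwd",          # HDFS-resident password file
    "--table", "orders",
    "--target-dir", "/data/raw/orders",                 # HDFS landing directory
    "--num-mappers", "4",                               # parallel import tasks
]
subprocess.run(cmd, check=True)
```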
Environment: Hadoop, HDFS, MapReduce, Java, Hive, Hue, Pig, Flume, Sqoop, HBase, Oozie, YARN, ZooKeeper, Eclipse, Maven, BigInsights
Confidential
Senior Developer
Responsibilities:
- Designed the technical design document (TDD, low level) from the software requirements specification (SRS, high level).
- Used Python scripts to transform the data (a pre-load transform is sketched after this list).
- Fixed issues with existing FastLoad/MultiLoad scripts for smoother, more effective loading of data into the warehouse.
- Created BTEQ scripts with data transformations for loading the base tables.
- Generated reports using Teradata BTEQ.
- Optimized and tuned Teradata SQL to improve batch performance and data response times for users.
- Used the FastExport utility to extract large volumes of data and send files to downstream applications.
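A minimal sketch of the kind of pre-load Python transform mentioned above: reformat dates and re-delimit an extract before the loader picks it up. The file names and the three-column layout are assumptions.

```python
# Normalize dates and re-delimit a CSV extract for a Teradata load utility.
import csv
from datetime import datetime

with open("extract_raw.csv", newline="") as src, \
     open("extract_clean.txt", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter="|")             # pipe-delimited for the loader
    for acct, amount, posted in reader:
        # Rewrite MM/DD/YYYY dates as ISO YYYY-MM-DD and trim stray whitespace.
        posted = datetime.strptime(posted, "%m/%d/%Y").strftime("%Y-%m-%d")
        writer.writerow([acct.strip(), amount.strip(), posted])
```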
Environment: Teradata V2R12, Teradata SQL Assistant, MultiLoad, FastLoad, BTEQ, Erwin, UNIX shell scripting, macros, stored procedures, DB2, COBOL, Python, SAS, PL/SQL, FileZilla
Confidential
Developer
Responsibilities:
- Created and reorganized all types of database objects including tables, views, indexes, sequences, synonyms and setting proper parameters and values for all the objects.
- Wrote database triggers, stored procedures, stored functions, and stored packages to perform various automated tasks for better performance.
- Created Shell Scripts for invoking SQL scripts.
- Made effective use of table functions, indexes, table partitioning, analytic functions, and materialized views.
- Experience with performance tuning for the Oracle RDBMS using EXPLAIN PLAN and optimizer hints.
- Involved in continuous enhancements and fixing of production problems.
- Verified and validated data using SQL queries (a validation sketch follows this list).
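An illustrative version of the SQL-based validation above, using the cx_Oracle driver; the connection details and table names are placeholders, not values from the actual engagement.

```python
# Reconcile row counts between staging and target tables after a load.
import cx_Oracle

conn = cx_Oracle.connect("app_user", "app_pwd", "db.example.com/ORCLPDB1")
cur = conn.cursor()

cur.execute("SELECT COUNT(*) FROM staging_orders")
staged, = cur.fetchone()
cur.execute("SELECT COUNT(*) FROM orders")
loaded, = cur.fetchone()

if staged != loaded:
    raise RuntimeError(f"row-count mismatch: {staged} staged vs {loaded} loaded")
conn.close()
```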
Environment: Oracle 10g, .NET, SQL, PL/SQL, UNIX, SQL*Loader, SQL Navigator, Toad, SQL Developer.