Sr. Hadoop Developer/Spark Developer Resume

Vancouver, WA

SUMMARY:

  • 7 years of extensive IT experience steering projects from inception to delivery, with a passion for turning data into products, actionable insights, and meaningful stories.
  • Strong experience working with the Apache Hadoop ecosystem, Apache Spark, and AWS.
  • 3 years of relevant experience in the Hadoop ecosystem and architecture (MapReduce, YARN, HDFS, HBase, Impala, Drill, Hive, Pig, Oozie, Kylo, AWS, Apache Spark) for ingestion, storage, querying, processing, and analysis of data.
  • Performance tuning in Hive and Impala using methods including, but not limited to, dynamic partitioning, bucketing, indexing, file compression, and cost-based optimization (see the sketch after this list).
  • Hands-on experience with data ingestion tools such as Sqoop, Kafka, and Flume, and the workflow management tool Oozie.
  • Hands-on experience handling file formats such as JSON, Avro, ORC, and Parquet and compression codecs such as Snappy, zlib, and LZO.
  • Hands-on experience using Kylin for OLAP cube building and Drill for low-latency queries for business users.
  • Experience creating and analyzing HBase tables to load large data sets coming from a variety of portfolios.
  • Hands-on experience creating dashboards and worksheets in Tableau.
  • Extensively used Scala and Spark to improve the performance of existing algorithms and queries in Hadoop and Hive using SparkContext, Spark SQL (DataFrames and Datasets), and pair RDDs.
  • Hands-on experience with AWS (Amazon Web Services): using Elastic MapReduce (EMR), creating and storing data in S3 buckets, and creating Elastic Load Balancers (ELB) for Hadoop front-end web UIs.
  • Extensive knowledge of creating Hadoop clusters on multiple EC2 instances in AWS, configuring them through Ambari, and using IAM (Identity and Access Management) to create groups and users and assign permissions.
  • Extensively used ETL methodology to support data extraction, transformation, and loading in a corporate-wide ETL solution using SAP BW, with strong knowledge of OLAP, OLTP, and extended star, star, and snowflake schema methodologies.
  • Good knowledge of in-memory database technology, i.e. SAP HANA.
  • Extensive programming experience with C# concepts such as OOP, multithreading, collections, and I/O.
  • Experience using Remedy for issue ticketing and Jenkins for continuous integration.
  • Work successfully in fast-paced, collaborative environments as a team player with excellent interpersonal skills, with an exceptional ability to quickly master new concepts and technologies.
  • Seasoned in Agile/Scrum methodologies with a focus on quality and improving data availability.
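
A minimal sketch of the Hive tuning approach above, illustrating dynamic partitioning and Snappy-compressed Parquet. The table and column names (sales_stage, sales_optimized, event_date) are hypothetical, and the HiveQL is issued here through a Hive-enabled PySpark session rather than the Hive CLI:

    from pyspark.sql import SparkSession

    # Hive-enabled session; all table/column names below are illustrative only.
    spark = (SparkSession.builder
             .appName("hive-tuning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Let Hive derive partition values from the data itself.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    # Partitioned, Snappy-compressed Parquet table: queries that filter on
    # event_date scan only the matching partition directories.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_optimized (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
        )
        PARTITIONED BY (event_date STRING)
        STORED AS PARQUET
        TBLPROPERTIES ('parquet.compression'='SNAPPY')
    """)

    # Dynamic-partition insert from a hypothetical staging table; the partition
    # column must be the last column in the SELECT.
    spark.sql("""
        INSERT OVERWRITE TABLE sales_optimized PARTITION (event_date)
        SELECT order_id, customer_id, amount, event_date
        FROM sales_stage
    """)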

TECHNICAL SKILLS:

Big Data Ecosystems: HDFS, MapReduce, Pig, Hive, HBase, Sqoop, Impala, Oozie, Drill, Kylin, ZooKeeper, Flume, Kafka, Kylo, Elasticsearch, YARN, and Spark.

Databases: Microsoft SQL Server, MySQL, Oracle.

NoSQL: HBase.

Scripting Languages: PHP, JavaScript, HTML, Python.

Tools: Eclipse, IntelliJ IDEA, Maven and SBT.

Platforms: Windows (2000/XP), Linux, CentOS, and macOS.

Programming Languages: Java, Scala, C# and C/C++.

Currently Exploring: Apache Kylo, NiFi, Flink, and Alluxio.

PROFESSIONAL EXPERIENCE:

Confidential, Vancouver, WA

Sr. Hadoop Developer/Spark Developer

Responsibilities:

  • Responsible for designing and implementing the data pipeline using Big Data tools including Hive, Oozie, Airflow, Spark, Drill, Kylin, Sqoop, Kylo, NiFi, EC2, ELB, S3, and EMR.
  • Used Sqoop to extract and load both incremental and full data from RDBMS sources into Hadoop.
  • Created a Hive storage plugin in Apache Drill and exposed it to the BI tool Birst for low-latency queries for business users.
  • Created streaming cubes from Kafka data and persisted them into HBase for building OLAP cubes in Kylin.
  • Used the Parquet file format with Snappy compression for performance and solved the Hive small-files problem using Hive's merge settings for map-only and MapReduce job outputs.
  • Converted existing snowflake-schema data into a star schema in Hive for building OLAP cubes in Kylin.
  • Used AWS EMR (Elastic MapReduce) for resource-intensive transformation jobs.
  • Extensively used Hive optimization techniques such as partitioning, bucketing, map joins, and parallel execution.
  • Converted some existing Sqoop and Hive jobs into Spark SQL applications that read data from Oracle over JDBC and write it to Hive tables (see the first sketch after this list).
  • Developed shell scripts for removing orphan partitions from Hive tables and for archive retention in HDFS.
  • Extensively used Spark Core and Spark SQL to improve the performance of existing Hadoop jobs.
  • Installed and configured Apache Airflow for workflow management and created workflows in Python (see the DAG sketch after this list).
  • Working on a POC using Apache Kylo, a data lake framework based on Apache Spark and NiFi; Kylo automates many of the tasks associated with data lakes, such as data ingest, preparation, discovery, profiling, and management.
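
A minimal sketch of the Sqoop-to-Spark SQL conversion above. The connection details, table names, and column names are hypothetical; the original jobs were written in Scala, and PySpark is shown here for brevity (the Oracle JDBC driver jar is assumed to be on the classpath, e.g. via --jars):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("oracle-to-hive-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Parallel JDBC read from a hypothetical Oracle table, split on ORDER_ID.
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")
              .option("dbtable", "SALES.ORDERS")
              .option("user", "etl_user")
              .option("password", "********")
              .option("driver", "oracle.jdbc.OracleDriver")
              .option("fetchsize", "10000")
              .option("numPartitions", "8")
              .option("partitionColumn", "ORDER_ID")
              .option("lowerBound", "1")
              .option("upperBound", "10000000")
              .load())

    # Light rename, then register the result as a partitioned table in the metastore.
    (orders.withColumnRenamed("ORDER_DT", "order_date")
           .write.mode("overwrite")
           .partitionBy("order_date")
           .saveAsTable("analytics.orders"))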
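
And a minimal Airflow DAG sketch for the Python workflows above. The DAG id, schedule, job paths, and commands are illustrative, written against the classic Airflow 1.x API:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    default_args = {
        "owner": "etl",
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
    }

    # Daily pipeline: ingest from Oracle with Spark, then run Hive transformations.
    dag = DAG(
        dag_id="daily_ingest_and_transform",
        default_args=default_args,
        start_date=datetime(2017, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    )

    ingest = BashOperator(
        task_id="spark_ingest_from_oracle",
        bash_command="spark-submit /opt/jobs/oracle_to_hive.py",  # hypothetical job path
        dag=dag,
    )

    transform = BashOperator(
        task_id="hive_transformations",
        bash_command="hive -f /opt/jobs/build_marts.hql",  # hypothetical script path
        dag=dag,
    )

    # Run the Hive step only after ingestion succeeds.
    ingest >> transform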

Environment: Hive, Spark, Drill, S3, AWS, IAM, Impala, Tableau, Git, Kafka, ZooKeeper, YARN, Unix shell scripting, Kylo, Zeppelin, Kylin, HBase.

Confidential, Portland, OR

Hadoop Developer

Responsibilities:

  • Created a data pipeline for different events of Adidas applications to filter consumer response data from Urban Airship in an AWS S3 bucket and load it into Hive external tables at an HDFS location.
  • Worked with file formats such as JSON, Avro, and Parquet and compression codecs such as Snappy.
  • Developed Impala scripts against end-user/analyst requirements for ad hoc analysis.
  • Used various Hive optimization techniques such as partitioning, bucketing, and map joins.
  • Developed Python code for the tasks, dependencies, SLA watchers, and time sensors of each job for workflow management and automation with Airflow.
  • Developed shell scripts for adding dynamic partitions to Hive stage tables, verifying JSON schema changes in source files, and detecting duplicate files in the source location.
  • Developed UDFs in Spark to capture the values of key-value pairs in encoded JSON strings.
  • Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions, and used Spark to extract the schema of the JSON files (see the sketch after this list).
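
A minimal sketch of the Spark JSON handling above. The bucket, paths, field names (event_type, payload, device_type, event_date), and filter values are hypothetical, and PySpark is shown for brevity:

    import json

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("s3-json-filter-sketch").getOrCreate()

    # Spark infers (extracts) the JSON schema while reading from S3.
    events = spark.read.json("s3a://hypothetical-bucket/urban-airship/events/")
    events.printSchema()

    # UDF that pulls a single value out of a JSON string stored inside a field.
    def extract_key(encoded, key):
        try:
            return json.loads(encoded).get(key)
        except (TypeError, ValueError):
            return None

    extract_device = udf(lambda s: extract_key(s, "device_type"), StringType())

    # Keep only the event types of interest, add the decoded field, and write
    # to HDFS partitioned by event date.
    (events.filter(col("event_type").isin("OPEN", "SEND"))
           .withColumn("device_type", extract_device(col("payload")))
           .write.mode("append")
           .partitionBy("event_date")
           .parquet("hdfs:///data/urban_airship/events/"))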

Environment: Hive, Spark, AWS S3, EMR, Cloudera, Jenkins, Shell scripting, HBase, Airflow, IntelliJ IDEA, Sqoop, Impala.

Confidential

Hadoop Developer

Responsibilities:

  • Developed MapReduce programs in Java to perform various transformation, cleaning, and scrubbing tasks, and analyzed the results with Hive (a streaming-style sketch follows this list).
  • Maintained and monitored clusters; loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Wrote customized user-defined functions (UDFs) in Java to ease further processing in Hive.
  • Stored data from HDFS in Hive databases and migrated data from Hive into SQL Server using Sqoop.
  • Worked on performance tuning of Hadoop jobs by applying techniques such as map-side joins and partitioning.
  • Addressed issues arising from the huge volume of data and transitions.
  • Designed both time-driven and data-driven automated workflows using Oozie.
  • Created design and technical documentation of the solution.
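
The MapReduce cleaning jobs above were written in Java; purely to illustrate the map/reduce pattern, here is a Hadoop Streaming equivalent sketched in Python against a hypothetical tab-delimited record layout (not the original implementation):

    #!/usr/bin/env python
    # mapper.py - drops malformed records and emits "key<TAB>1" per clean record.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[2].strip():  # hypothetical layout: key in column 3
            continue                                  # scrub malformed or empty records
        print("%s\t1" % fields[2].strip().lower())

    #!/usr/bin/env python
    # reducer.py - sums counts per key; input arrives grouped and sorted by key.
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key == current_key:
            count += int(value)
        else:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

Such a job would be submitted roughly as: hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /raw/events -output /clean/events (jar and paths hypothetical).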

Environment: Hadoop, (HDFS & MapReduce), Hive, Sqoop, Flume, Oozie, Java, SQL Server.

Confidential

Java Developer

Responsibilities:

  • Responsible and active in the analysis, design, implementation, and deployment phases of the full software development lifecycle.
  • Designed and developed user interface using JSP, JavaScript, HTML and CSS.
  • Implemented the application using the Spring MVC framework, which is based on the MVC design pattern.
  • Extensively used the Struts framework as the controller to handle client requests and invoke the model based on user requests.
  • Defined search criteria to pull customer records from the database, make the required changes, and save the updated records back to the database.
  • Validated the fields of user registration screen and login screen by writing JavaScript validations.
  • Developed build and deployment scripts using Maven to customize WAR and EAR files.
  • Used Data Access Objects and JDBC for database access.
  • Developed stored procedures and triggers in PL/SQL to calculate values and update tables, implementing the business logic.
  • Designed and developed XML processing components for dynamic menus in the application.
  • Involved in postproduction support and maintenance of the application.

Environment: Oracle 11g, Java 1.5, Struts, Servlets, HTML, XML, SQL, J2EE, Maven, Tomcat 6.

Confidential

Java Developer

Responsibilities:

  • Involved in the design and development of the entire application. Created UML diagrams (use case, class, sequence, and activity) based on the business requirements.
  • Designed and developed dynamic Web pages using HTML and JSP with Struts tag libraries.
  • Designed JSP layout using MVC. Used JavaScript for client-side validation and Struts Validator Framework for form validations.
  • Created data sources and helper classes utilized by all the interfaces to access and manipulate data.
  • Used Oracle DB, writing SQL scripts and PL/SQL code for procedures and functions.
  • Wrote JUnit test cases to test the functionality of each method in the DAO layer. Used CVS for version control. Configured and deployed WebSphere Application Server.
  • Used Log4j for tracking errors and bugs in the project source code.
  • Prepared technical reports and documentation manuals for efficient program development.
  • Was awarded “Most Valuable Player” by the client-side Senior Director for rendering exemplary service.
  • Gave presentations and demos on Java and J2EE to cross-functional teams and stakeholders.

Environment: JSP, HTML, Servlets, Struts Framework, JavaScript, XML, JDBC, Oracle9i, PL/SQL, WebSphere, Eclipse, JUnit, CVS, Log4j.
