Hadoop and Spark Developer Resume
New York
SUMMARY
- Overall 5+ years of professional experience as a Hadoop Developer using the Apache Spark framework, and as an Oracle Database Administrator.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Apache Spark, HDFS, HBase, Spark SQL, Sqoop, ZooKeeper, Kafka, and Flume.
- Good knowledge of Apache Cassandra and MongoDB.
- Hands-on experience with Spark's fundamental building blocks, RDDs, and the transformations, actions, and functions used to implement business logic on them (see the sketch after this list).
- In-depth understanding of DataFrames and Datasets in Spark SQL.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Created Hive external tables, views, and scripts for transformations such as filtering, aggregation, and partitioning.
- Expert in writing business-analytics scripts in Hive SQL.
- Worked with IDEs such as Eclipse and IntelliJ IDEA for developing, deploying, and debugging applications.
- Good knowledge of data warehousing, ETL development, distributed computing, and large-scale data processing.
- Experienced in working with different file formats such as text, SequenceFile, XML, and JSON.
- Expertise in working with relational databases such as Oracle 10g and SQL Server 2012.
- Good knowledge of writing stored procedures and functions in SQL and PL/SQL.
- Configured Oracle Data Guard for disaster recovery implementations.
- Planned and supported Oracle upgrades from 10g to 11g to 12c; designed RMAN backup and recovery strategies and Oracle high-availability solutions.
- Hands-on experience with data analysis, logical and physical design, backup and recovery, performance tuning, database installation, and upgrades.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Strong knowledge of the software development life cycle and expertise in detailed design documentation.
- Excellent communication skills; able to perform at a high level and meet deadlines.
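A minimal Scala sketch of the RDD and Dataset work described above; the record type, data, and logic are illustrative assumptions rather than project code:

```scala
import org.apache.spark.sql.SparkSession

object RddAndDatasetSketch {
  // Hypothetical record type for the Dataset example.
  case class Order(orderId: Long, category: String, amount: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-dataset-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // RDD: transformations (map, reduceByKey) are lazy; the collect() action
    // triggers execution of the whole lineage.
    val totals = spark.sparkContext
      .parallelize(Seq("books,10", "toys,4", "books,3"))
      .map(_.split(","))                        // transformation
      .map(parts => (parts(0), parts(1).toInt)) // pair RDD of (key, value)
      .reduceByKey(_ + _)                       // shuffle transformation
      .collect()                                // action
    totals.foreach(println)

    // Dataset: the same aggregation through the typed Spark SQL API.
    val orders = Seq(Order(1L, "books", 10.5), Order(2L, "books", 4.0)).toDS()
    orders.groupBy($"category").sum("amount").show()

    spark.stop()
  }
}
```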
TECHNICAL SKILLS
Big Data: HDFS, Apache Spark, Spark SQL, Spark Streaming, ZooKeeper, Hive, Sqoop, HBase, Kafka, Flume, YARN, Cassandra, MongoDB
Languages: Java, Scala, SQL/PL-SQL, Shell Scripting
Java Technologies: JSP, Servlets, JDBC, OOPS Concept
Database: MySQL, MongoDB, Cassandra, Oracle 10g/11g, Microsoft SQL Server 2014
IDE / Testing Tools: Eclipse, IntelliJ IDEA
Operating System: Windows, UNIX, Linux
Tools: SQL Developer, Maven, Hue, TOAD
PROFESSIONAL EXPERIENCE
Confidential, New York
Hadoop and Spark Developer
Responsibilities:
- Involved in requirements gathering, working closely with business analysts.
- Responsible for creating technical documents such as high-level and low-level design specifications.
- Installed and configured Cloudera Manager for easier management of the existing Hadoop cluster.
- Configured property files such as core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements.
- Used Sqoop to transfer data between RDBMS and HDFS.
- Worked with the business functional lead to review and finalize requirements and data-profiling analysis.
- Implemented complex Spark programs to perform joins across different tables (see the join sketch after this position).
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Scala, Spark SQL, DataFrames, and pair RDDs.
- Responsible for creating tables based on business requirements.
- Produced data visualizations and generated reports to present results clearly.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, XML, and JSON.
- Utilized Agile Scrum methodology to help manage and organize the project, with regular code review sessions.
Environment: Hadoop HDFS, Apache Spark, Spark-Core, Spark-SQL, Scala, JDK 1.8, CDH 5, Sqoop, MySQL, CentOS Linux
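The following Scala sketch illustrates the kind of multi-table join described above; table contents, column names, and the broadcast choice are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object JoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical fact and dimension tables.
    val transactions = Seq((1L, 101L, 250.0), (2L, 102L, 75.0), (3L, 101L, 30.0))
      .toDF("txn_id", "customer_id", "amount")
    val customers = Seq((101L, "NY"), (102L, "CA")).toDF("customer_id", "state")

    // Broadcasting the small dimension side avoids shuffling the large fact
    // table, a common optimization over a plain shuffle join.
    val joined = transactions.join(broadcast(customers), Seq("customer_id"), "inner")
    joined.groupBy("state").sum("amount").show()

    spark.stop()
  }
}
```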
Confidential
Hadoop and Spark Developer
Responsibilities:
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed iterative algorithms using Spark Streaming in Scala for near real-time dashboards.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Applied Hive partitioning and bucketing concepts, designing both managed and external tables to optimize performance (see the sketch after this position).
- Built Hive external tables, views, and scripts for transformations such as filtering, aggregation, and partitioning.
- Handled importing of data from various sources, performed transformations using Hive, and loaded data from Teradata into HDFS.
- Wrote business-analytics scripts in Hive SQL.
- Responsible for building and scheduling automation jobs using the Automic scheduler with the Aorta framework.
- Worked with data in multiple file formats, including Parquet, SequenceFile, and text/CSV.
- Expertise in creating ThoughtSpot pinboards and bringing in all data needed for reports per business requirements.
- Participated in meetings with internal and external clients, helping to frame projects and design solutions based on client needs and the problems to be solved.
- Followed Agile methodology with Scrum meetings to track, optimize, and tailor features to customer needs.
- Gained strong business knowledge of different product categories and their designs.
- Involved in developing ThoughtSpot reports and automating workflows to load data.
Environment: Hadoop HDFS, Apache Spark, Spark-Core, Spark-SQL, Scala, JDK 1.7, Sqoop, Eclipse, MySQL, AWS EC2, HBase, CentOS Linux and ZooKeeper
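A sketch of the Hive table design described above, expressed through Spark SQL in Scala; it assumes a Spark build with Hive support and a configured metastore, and all table names, paths, and columns are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSketch {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() requires Spark compiled with Hive classes.
    val spark = SparkSession.builder()
      .appName("hive-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: Hive owns only the metadata; dropping the table
    // leaves the files at the HDFS location intact.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        txn_id BIGINT,
        customer_id BIGINT,
        amount DOUBLE
      )
      PARTITIONED BY (sale_date STRING)
      STORED AS PARQUET
      LOCATION '/data/warehouse/sales_ext'
    """)

    // Register partition directories that were written directly to HDFS.
    spark.sql("MSCK REPAIR TABLE sales_ext")

    // Bucketing via the DataFrame writer: rows are hashed on customer_id into
    // a fixed number of buckets, which helps joins and sampling on that key.
    spark.table("sales_ext")
      .write
      .mode("overwrite")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .saveAsTable("sales_bucketed") // managed table, unlike sales_ext

    // A filter on the partition column prunes directories instead of scanning
    // the full dataset.
    spark.sql(
      "SELECT sale_date, SUM(amount) FROM sales_ext WHERE sale_date = '2017-01-01' GROUP BY sale_date"
    ).show()

    spark.stop()
  }
}
```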
Confidential
Hadoop and Spark Developer
Responsibilities:
- Involved in requirements gathering in coordination with business analysts.
- Worked closely with business analysts and the client to create technical documents such as high-level and low-level design specifications.
- Implemented best-income logic using Spark SQL.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Developed RDDs to schedule various Hadoop programs.
- Wrote Spark SQL queries for data analysis to meet business requirements.
- Experienced in defining job flows.
- Managed cluster coordination services for Kafka through ZooKeeper.
- Serialized JSON data and stored it in tables using Spark SQL.
- Wrote shell scripts to automate the process flow.
- Stored the extracted data in HDFS using Flume.
- Experienced with multiple file formats, including XML, JSON, CSV, and various compressed formats.
- Experienced in Kafka and Spark integration for real-time data processing (see the sketch after this position).
- Developed Kafka producer and consumer components for real-time data processing.
- Experienced in writing Spark SQL queries in Scala.
- Communicated all issues and participated in weekly strategy meetings.
Environment: Hadoop HDFS, Apache Spark, Spark-Core, Spark-SQL, Scala, JDK 1.7, Sqoop, Eclipse, MySQL, CentOS Linux, ZooKeeper
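A minimal Scala sketch of Kafka-to-Spark integration with JSON parsing, as described above; it assumes Spark Structured Streaming with the spark-sql-kafka package on the classpath, and the broker address, topic, and schema are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DoubleType, StringType, StructType}

object KafkaJsonSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-json-sketch")
      .getOrCreate()

    // Expected shape of each JSON event (hypothetical fields).
    val schema = new StructType()
      .add("event_id", StringType)
      .add("category", StringType)
      .add("amount", DoubleType)

    // Consume from Kafka; broker and topic names are placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers raw bytes: cast the value column to a string, then
    // parse the JSON payload into typed columns.
    val events = raw
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")

    // The console sink keeps the sketch self-contained; a real job would
    // write to a table or another topic.
    val query = events.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```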
Confidential
Oracle Database Administrator
Responsibilities:
- Managed over 20 critical applications single-handedly.
- Configured Data Guard for OLTP databases.
- Upgraded databases to 12c.
- Worked with application teams on performance-related issues.
- Rebuilt indexes for better performance as part of ongoing Oracle database maintenance.
- Generated performance reports and performed daily database health checks, using utilities such as AWR and Statspack to gather performance statistics.
- Identified and tuned poorly performing SQL statements using EXPLAIN PLAN, SQL Trace, and TKPROF; analyzed tables and indexes to improve query performance.
- Troubleshot various issues such as user database connectivity and privilege problems.
- Created users and allocated appropriate tablespace quotas with the necessary privileges and roles across all databases.
- Wrote database-monitoring scripts in shell and PL/SQL (procedures, functions, and packages).
- Created and cloned Oracle instances and databases on ASM; performed database cloning and relocation activities.
- Managed tablespaces, data files, redo logs, tables, and their segments.
- Maintained data integrity; managed profiles, resources, password security, users, privileges, and roles.
- Performed RMAN backups, restores, cloning, and database and application refreshes.
- Monitored and planned ASM storage across all databases.
Environment: Oracle 11g/12c, TOAD, Linux, UNIX, PuTTY, Enterprise Manager, SQL Server, Windows Server, web services, WebLogic