Big Data Developer Resume
Phoenix, AZ
SUMMARY
- Performance-oriented Big Data Developer with a proven record of delivering projects in the financial, retail, healthcare, aerospace and consumer electronics domains.
- 8 years of IT industry experience in designing, implementing and maintaining very large, complex Big Data and data warehouse solutions.
- Possess strong analytical and problem-solving skills, with a proven ability to mentor and provide technical leadership to other team members.
- Design and development of streaming and batch-oriented Big Data analytics solutions, with hands-on experience using a wide range of cutting-edge technologies including Apache Spark, HDFS, Scala, Kafka, Apache NiFi, Hive, Pig, Sqoop, HBase, Hue, ZooKeeper, YARN and Cassandra.
- Excellent experience with the Cloudera and Hortonworks Hadoop distributions, and in maintaining and optimizing AWS infrastructure (EMR, EC2, S3, EBS).
- In-depth understanding of Spark architecture, including Spark Core, RDDs, DAGs, Spark Streaming, Spark SQL, DataFrames, GraphX and Spark MLlib.
- Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and in performing data transformations using Spark Core (a brief sketch follows this summary).
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode and MapReduce programming.
- Experience in working with MapReduce programs using Apache Hadoop to analyze large datasets efficiently.
- Worked on developing ETL processes to load data from multiple data sources to HDFS using Sqoop, Pig and Oozie.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregate Functions (UDAFs).
- Experience in using Spark SQL with various data sources such as JSON, Parquet and Hive.
- Expertise in using Kafka as a messaging system to implement real-time streaming solutions.
- Good knowledge of Cloudera distributions and of AWS services including Amazon S3, EC2 and EMR.
- Good understanding and knowledge of NoSQL databases like Cassandra and HBase.
- Experience in handling various file formats such as Avro, SequenceFile and Parquet.
- Writing complex SQL queries using analytical functions and developing high-performance PL/SQL code: Oracle stored procedures, functions, packages, triggers and various database objects.
- Developed ETL mappings and workflows to extract and load data into staging using Informatica.
- Well-versed in the Informatica ETL performance tuning process, including bottleneck identification, thread-statistics analysis, component optimization and parallel partitioning.
- Experienced in writing UNIX shell scripts for job scheduling and automation.
- Proficient in Core Java and JEE technologies such as JDBC, Servlets, JSP and web services (SOAP and RESTful).
- 6 years of experience in agile Scrum methodology: sprint planning, creating backlogs/stories, sprint reviews, delivery and retrospectives.
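For illustration, a minimal Scala/Spark sketch of the kind of transformation work described above. It is a hedged example only: the Transaction case class, its field names and the file paths are hypothetical placeholders, not taken from any actual project.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type; field names are illustrative only.
case class Transaction(accountId: String, amount: Double, category: String)

object TransformSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("transform-sketch").getOrCreate()
    import spark.implicits._

    // Read a JSON source into a typed Dataset via the case class.
    val txns = spark.read.json("/data/raw/transactions").as[Transaction]

    // Filter and aggregate using the Dataset/DataFrame APIs.
    val highValueByCategory = txns
      .filter(_.amount > 1000.0)
      .groupBy($"category")
      .count()

    // Persist the result as Parquet for downstream Hive/Spark SQL queries.
    highValueByCategory.write.mode("overwrite").parquet("/data/curated/high_value_by_category")

    spark.stop()
  }
}
```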
TECHNICAL SKILLS
Big Data/Hadoop Technologies: Spark 2.0, Scala, Cloudera Hadoop, Amazon AWS (S3, EC2, Redshift, EMR, Data Pipeline), Hive, Pig, HBase, Kafka, Storm, Sqoop, Flume, Apache NiFi, Java, Cassandra and ZooKeeper
Programming Languages: Scala, Python, Java/J2EE, SQL, PL/SQL (Oracle 11g/10g), C, C++, C#
ETL & Other Tools: IntelliJ IDEA, Eclipse, JIRA, Informatica, Datastage 8.1, Business Objects BI 4.0, Tableau, Kerberos, SBT
Scripting & Automation: Python, Shell, Windows batch scripts, Perl, DevOps tooling
Database Design Tools: Erwin 7.2, MS Visio
Version Control: Perforce, ClearCase, Git, GitHub
Operating Systems: Windows, Unix (Solaris 9/10), Linux, HP-UX
Databases: Oracle (including Oracle Streams), SQL Server, MySQL and Amazon Redshift; database administration, backup/restore and monitoring
PROFESSIONAL EXPERIENCE
Big Data Developer
Confidential - Phoenix, AZ
Responsibilities:
- Work with BAs and end users to define and document requirements.
- Design and develop ETL workflows for delivery of data from source systems into the ODS and downstream data marts and files.
- Handle large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and efficient joins and transformations.
- Design and develop Spark programs for data cleansing, filtering, preparation and transformation using the RDD and Dataset/DataFrame APIs.
- Import data into stage tables from various data sources using Sqoop.
- Performed a POC on Spark Streaming with Kafka to achieve real-time data analytics (illustrated in the sketch after this project).
- Used the Oozie scheduler to create workflows and schedule jobs on the Hadoop cluster.
- Wrote Hive UDFs to extract data from staging tables.
- Involved in creating Hive tables and views to load and transform the data.
- Involved in writing Pig scripts and tuning Spark jobs for performance optimization.
- Monitored MapReduce programs running on the cluster and provided production support.
- Worked on a data quality framework to generate reports for the business on the quality of data processed in Hadoop.
- Troubleshoot data issues and provide solutions.
- Participated in sprint review/retrospective meetings and daily Scrum meetings, and gave daily status reports.
Environment: Java 1.8, Scala 2.11, Spark 1.6, Hadoop, Pig 0.12, Hive 1.1, MapReduce, HDFS, MySQL, Sqoop 1.4.6, CDH 5.8.2, Oozie, Eclipse, Avro, Parquet, Toad, Shell Scripting, Teradata, Impala, J2EE.
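For illustration, a minimal Scala sketch of the Spark Streaming/Kafka POC pattern referenced above, using the Spark 1.6 direct-stream API listed in the environment; the broker address, topic name and batch interval are placeholders, not project values.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Minimal Spark 1.6 / Kafka direct-stream sketch; broker and topic are placeholders.
object KafkaStreamingPoc {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-streaming-poc")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics      = Set("transactions")

    // Each Kafka record arrives as a (key, value) pair of strings.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Simple near-real-time metric: count of non-empty events per micro-batch.
    stream.map(_._2)
      .filter(_.nonEmpty)
      .count()
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```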
Big Data Developer
Confidential - Woonsocket, RI
Responsibilities:
- Participated in requirements sessions to gather requirements from product owners.
- Wrote MapReduce code to process and parse data from various sources.
- Loaded data from the UNIX file system into HDFS and performed parallel transfers of data from the landing zone to HDFS using DistCp.
- Used Sqoop to import structured and semi-structured data into HDFS for further processing.
- Automated processes in the Cloudera environment and built Oozie workflows.
- Wrote a Hive UDF to sort struct fields and return a complex data type.
- Created Hive tables based on the business requirements.
- Used Hive queries to analyze large datasets.
- Handled data from different datasets, joining and preprocessing them using Pig join operations.
- Exported the analyzed data from HDFS to relational databases using Sqoop, enabling the BI team to visualize the analytics.
- Involved in creating Hive tables, loading data and running Hive queries on that data.
- Developed custom aggregate functions using Spark SQL and performed interactive querying (see the sketch after this project).
- Implemented partitioning, dynamic partitions and bucketing in Hive to improve performance and organize data logically.
- Worked with the Data Engineering Platform team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Involved in processing data in Hive tables using high-performance, low-latency HQL queries.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning and compression-related properties in Hive.
- Worked on QA support, test data creation and unit testing activities.
- Reviewed and managed Hadoop log files.
- Monitored Autosys jobs and resolved issues in case of failures.
- Participated in daily Scrum meetings and gave daily status reports.
Environment: Hadoop, HDFS, Java, MapReduce, Hive, Sqoop, Spark SQL, HQL, Oozie, Autosys, Oracle, PuTTY, Confluence.
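For illustration, a minimal Scala sketch of a custom Spark SQL aggregate function of the kind referenced above, written against the standard UserDefinedAggregateFunction API; the function, table and column names are hypothetical.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Hypothetical UDAF that computes an average over a Double column.
class AverageAmount extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  override def bufferSchema: StructType =
    StructType(StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
  override def dataType: DataType = DoubleType
  override def deterministic: Boolean = true

  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0.0
    buffer(1) = 0L
  }

  override def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getDouble(0) + input.getDouble(0)
      buffer(1) = buffer.getLong(1) + 1L
    }

  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }

  override def evaluate(buffer: Row): Any =
    if (buffer.getLong(1) == 0L) null else buffer.getDouble(0) / buffer.getLong(1)
}

object UdafSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("udaf-sketch").enableHiveSupport().getOrCreate()

    // Register the aggregate so it can be used from Spark SQL / interactive queries.
    spark.udf.register("avg_amount", new AverageAmount)

    // "sales" and its columns are hypothetical names used only for illustration.
    spark.sql("SELECT store_id, avg_amount(amount) AS avg_amount FROM sales GROUP BY store_id").show()
  }
}
```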
Big Data Engineer
Confidential, Texas
Responsibilities:
- Worked in the Big Data Competency Centre on design and requirements gathering for a DWH to house external as well as in-house data.
- Migrated 16 years of historical data into Hadoop.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HDFS.
- Used Sqoop to import ePOS and primary/secondary sales data from the Enterprise Data Warehouse (EDW) into HDFS, performed transformations and published the data in Parquet format.
- Responsible for building the ETL pipeline to process data.
- Designed and developed Hive UDFs to filter, evaluate, load and store the data (see the sketch after this project).
- Based on business use cases, responsible for regular ad-hoc execution of Pig and Hive queries.
- Responsible for loading data to and from HDFS.
- Developed data quality reports for business users, using shell and BTEQ scripts, to check the correctness of the processed data.
- Created Hive tables, loaded data into them and wrote Hive queries to perform data analysis.
- Designed and developed a social media (Twitter, Facebook, Google Analytics) digital data platform using Flume and Hive on Hadoop, on the free Cloudera Hadoop distribution.
- Used Oozie workflows to schedule interdependent Hadoop jobs and monitored them using Autosys.
- Involved in code migration, validation and support in the PROD environment.
Environment: Hadoop, HDFS, Python, Java, MapReduce, Hive, Sqoop, Oozie, Autosys, Oracle, PuTTY, Confluence.
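For illustration, a minimal sketch of a simple Hive UDF of the kind referenced above, written in Scala for consistency with the other examples; the class name and the normalization it performs are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF that cleans up a free-text sales-channel code before it is
// loaded into curated Hive tables. Hive resolves evaluate() by reflection.
class NormalizeChannelCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```

Packaged into a jar and added to the Hive session, such a class would be registered with CREATE TEMPORARY FUNCTION and used like a built-in function.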
Oracle Developer
Confidential, AZ
Responsibilities:
- Prepared necessary documentation like requirement documents, program specs, test specs and Release Notes adhering to predefined standards.
- Interpretation and understanding of business requirement documents.
- Involved in writing complex SQL queries.
- Managed database schema objects: tables, indexes, views, sequences and synonyms.
- Used Oracle 10g SQL and PL/SQL for mass updates and error checking in the database.
- Responsibilities included PL/SQL and Java builds, updating process documents and the project plan, code reviews, defect tracking, decision analysis, root cause analysis and status reporting.
- Designed a maintenance plan for the data warehouse, including purging of old data.
- Developed new modules and code enhancements for performance improvement.
- This included development and release of UNIX shell scripts and Oracle components such as PL/SQL procedures, packages and SQL scripts.
- High-volume data loading and data movement using Oracle 9i SQL and PL/SQL bulk collect functionality.
- Managed database schema objects and partitioned the fact tables.
- Developed ETL mappings and workflows to extract and load data into staging using Informatica.
- Replicated data between the prod server and the reporting server using Oracle Streams; created UNIX shell scripts for scheduling and automation.
- Designed, created and maintained Oracle Forms & Reports
- Troubleshooting and bug fixing.
- Migrated data from the prod server to dev and QA databases by writing stored procedures and using the EXPORT and IMPORT utilities.
- Software release management and maintenance of the overall backend applications and processes.
- Application support (third-line support) for generating navigation databases: carried out root cause analysis of incidents/defects and provided database fixes/hot-fixes in patches, ensuring incidents were resolved within defined SLAs.
- Involved in developing, customizing and maintaining database routines using UNIX shell scripts and PL/SQL to automate data manipulation.
- Involved in setting up Change Control Board (CCB) meetings, attending the meetings and recording the board's decisions.
- Involved in Dev/QA and production builds.
Environment: Linux, Sun Solaris, Windows, Java, WebLogic, SQL, PL/SQL, Oracle 10g, Oracle Forms and Reports, ERwin, Informatica, VBA 5.0/6.0, C, Shell Scripting, TOAD, CVS, CM21
Oracle Developer
Confidential
Responsibilities:
- Design, development, amendment and optimization of the database schema.
- Query optimization, performance tuning, and generating and analyzing AWR reports.
- Responsible for developing database stored procedures, functions, packages and triggers using Oracle SQL and PL/SQL; defining test plans and test data/parameters; and evaluating test results to ensure that system functionality/outputs meet specifications.
- Estimated the effort and time required to implement database changes and provided technical solutions for database service/change requests from application teams.
- Involved in implementing the scheduler used to schedule and view test case execution details.
- Involved in writing complex SQL queries and performing massive data loads using PL/SQL bulk collect techniques.
- Provide deployment support for the project releases.
- Acted as a gatekeeper for QA and UAT to ensure acceptable application performance.
- Wide-table implementation using new features of Oracle 11g.
- Involved in database design for the scheduler using the Enterprise Architect tool and forward-engineering the model.
- Cloned databases; worked with Oracle Streams and replication.
Environment: Sun Solaris, Windows, Java, J2EE, XML, Web Services, Oracle WebLogic, SQL, PL/SQL, Oracle 10g/11g, RAC, ERwin, Business Objects, Oracle GoldenGate, TOAD, VSTS, Clarity.