Spark/Sr. Hadoop Developer Resume
Round Rock, TX
SUMMARY
- 6+ years of experience in software development
- 4+ years of professional IT experience in ingestion, storage, querying, processing and analysis of Big Data using Big Data ecosystem technologies like Hadoop HDFS, MapReduce, Apache Pig, Hive, Sqoop, HBase, Flume, Oozie, Spark, Cassandra, Kafka and Zookeeper
- 2 years of experience as a Java and J2EE developer
- Hands-on experience in installing, configuring and troubleshooting Hadoop ecosystem components like MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Flume, Zookeeper, Kafka and Impala
- Hands-on experience working with Cloudera CDH3 and CDH4 platforms
- Good understanding of HDFS design and HDFS high availability (HA)
- Excellent understanding of Hadoop architecture and the different daemons of Hadoop clusters, including the ResourceManager, NodeManager, NameNode and DataNode
- Experience working with Hadoop in standalone, pseudo-distributed and fully distributed modes
- Scheduled multiple Hive and Pig jobs through the Oozie workflow engine, each triggered independently by time and data availability
- Experience in writing complex MapReduce programs that work with different file formats like Text, SequenceFile, XML, JSON, Parquet, ORC and Avro
- Experience in importing and exporting data between HDFS and relational database management systems using Sqoop
- Hands-on experience using YARN, tools like Pig and Hive for data analysis, and Zookeeper for coordinating cluster resources
- Expert in working with the Hive data warehouse, creating tables and distributing data by implementing partitioning and bucketing
- Expertise in implementing complex ad-hoc queries using HiveQL
- Extended Hive and Pig core functionality by developing custom User Defined Functions (UDFs)
- Expertise in implementing Spark and Scala applications using higher-order functions for both batch and interactive analysis requirements
- Good working experience using Spark SQL to manipulate DataFrames in Python
- Experience working with Spark features like RDD transformations, Spark MLlib and Spark SQL
- Experience executing Spark SQL queries against data in Hive from a Spark context and optimizing their performance (a brief sketch follows this list)
- Transformed, moved and synchronized data across heterogeneous sources and targets using Talend
- Experience with Amazon AWS services such as EMR, EC2, S3, CloudFormation and Redshift, which provide fast and efficient processing of Big Data
- Worked with Tableau to connect to the Hive data warehouse and represent the data in dashboards
- Hands-on experience with UNIX and shell scripting
- Extensive knowledge of Java, JEE and J2EE design patterns like Singleton, Factory, MVC and Front Controller
- Expertise in using IDEs like Eclipse, WebSphere Studio (WSAD), NetBeans, MyEclipse and WebLogic Workshop
- Experience in designing and developing web services (SOAP and RESTful)
- Experience in developing web interfaces using Servlets, JSP and custom tag libraries
- Experience in developing applications using Agile and Scrum methodologies
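The Spark SQL usage summarized above can be illustrated with a minimal, hedged PySpark sketch; the database, table, column names and aggregation here are hypothetical placeholders rather than details from any engagement listed below.

```python
# Hedged sketch: querying a Hive table with Spark SQL from Python and continuing
# with DataFrame transformations. Table and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("spark-sql-over-hive-sketch")
    .enableHiveSupport()      # lets Spark read tables registered in the Hive metastore
    .getOrCreate()
)

# Run HiveQL through the Spark context and keep working with the result as a DataFrame.
sales = spark.sql("SELECT region, amount, sale_date FROM sales_db.transactions")

monthly = (
    sales
    .withColumn("month", F.date_format("sale_date", "yyyy-MM"))
    .groupBy("region", "month")
    .agg(F.sum("amount").alias("total_amount"))
)

monthly.show(20, truncate=False)
```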
TECHNICAL SKILLS
Hadoop/Big Data: Apache Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, Flume, Zookeeper, Impala, Spark, Ambari, Kafka, YARN, HDFS, Talend, Ranger, Hortonworks and Cloudera distributions
NoSQL Databases: Apache HBase, Cassandra
RDBMS: Oracle, MySQL, SQL Server, Teradata, DB2
Languages: C, C++, Java, Scala, Python, PL/SQL, Transact-SQL
Scripting Languages: UNIX shell, Perl, JavaScript, Linux Bash shell scripting
Operating Systems: Windows 8/7/Vista, Red Hat, Ubuntu
Application Servers: WebLogic, WebSphere, Apache Tomcat
Other Tools: PuTTY, WinSCP, FileZilla, Toad, Maven, Autosys, JMeter, Jenkins, GitHub, Subversion
Methodologies: Agile, SCRUM, Waterfall, Lean, Kanban
Collaboration Tools: SharePoint, Wiki, Confluence, Team Foundation Server (TFS), JIRA
PROFESSIONAL EXPERIENCE
Spark/Sr. Hadoop Developer
Confidential, Round Rock, TX
Responsibilities:
- Involved in all phases of Software Development Life Cycle (SDLC) activities such as development, implementation and support for Hadoop
- Imported and exported data between relational database systems like MySQL and Oracle and HDFS/Hive using Sqoop
- Developed a data pipeline using Kafka, HBase, Spark and Hive to ingest, transform and analyse data
- Migrated complex MapReduce programs into Apache Spark RDD transformations
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the sketch below)
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data, and handled data skew in Spark SQL
- Used Talend for connecting, cleansing and sharing cloud and on-premises data
- Developed Spark, Pig and Hive jobs to summarize and transform data
- Designed and developed Pig and Hive UDFs for data enrichment
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data
- Worked on streaming data using Kafka and Spark Streaming for data preparation
- Used HBase to create snapshot tables for updating and deleting records
- Worked with AWS to migrate entire data centres to the cloud using VPC, EC2, S3, EMR, RDS, Splice Machine and DynamoDB services
- Developed entire Spark applications in Python (PySpark) on a distributed environment
- Migrated tables from SQL Server to HBase that are still in active use
- Worked on performance tuning of Hive and Spark jobs
- Worked on creating secondary indexes in HBase to join tables
Environment: Hadoop, HDFS, Pig, Hive, Sqoop, Kafka, Zookeeper, Spark, Python, Talend, HBase, Scala, Shell Scripting, Maven, MapReduce, Amazon EMR, EC2, S3
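As a hedged illustration of the Kafka-to-Spark ingestion described above, here is a minimal PySpark sketch using the Kafka source of Spark's Structured Streaming API (the project itself used Spark Streaming micro-batches); the broker address, topic, schema and output paths are placeholders.

```python
# Hedged sketch: reading a Kafka topic with Spark's structured streaming API and
# writing micro-batches out as Parquet. Broker, topic, schema and paths are
# placeholders, not details taken from the project above.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
    .option("subscribe", "clickstream")                   # placeholder topic
    .load()
    # Kafka delivers key/value as bytes; parse the JSON payload into typed columns.
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/clickstream")           # placeholder HDFS path
    .option("checkpointLocation", "/chk/clickstream")
    .trigger(processingTime="1 minute")            # micro-batch interval
    .start()
)
query.awaitTermination()
```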
Sr. Hadoop Developer
Confidential, Woonsocket, RI
Responsibilities:
- Imported data from MySQL into HDFS on a regular basis using Sqoop
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Kafka, and stored the data in HDFS for analysis
- Developed multiple Kafka producers and consumers from scratch to implement the organization's requirements
- Extensively worked with Talend for data discovery, visualization and enrichment
- Responsible for creating and modifying topics (Kafka queues) as required, with varying configurations for replication factors, partitions and TTL
- Wrote and tested complex MapReduce jobs for aggregating identified and validated data
- Created managed and external Hive tables with static and dynamic partitioning (see the sketch below)
- Wrote Hive queries for data analysis to meet business requirements
- Improved HiveQL performance by splitting larger queries into smaller ones and introducing temporary tables between them
- Extensively involved in performance tuning of HiveQL by applying bucketing on large Hive tables
- Used an open-source Python web-scraping framework to crawl and extract data from web pages
- Optimized Hive queries by setting different combinations of Hive parameters
- Developed UDFs (User Defined Functions) to extend the core functionality of Pig and Hive queries as per requirements
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data
- Used the Spark API over Hadoop to analyse data in Hive
- Worked on implementing a pipeline using Kafka and Spark Streaming for streaming data
- Implemented workflows using Oozie for running MapReduce jobs and Hive queries
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Talend, Apache Kafka, Zookeeper, Spark, HBase, Python, Shell Scripting, Oozie, Maven, Hortonworks
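A hedged sketch of the managed/external-table and dynamic-partitioning work above, expressed as HiveQL executed through PySpark's Hive support; the database, tables, columns, storage format and HDFS location are illustrative placeholders.

```python
# Hedged sketch: creating a partitioned external Hive table and loading it with
# dynamic partitioning through Spark's Hive support. All names and paths are
# placeholders; a staging table is assumed to already hold the raw rows.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS logs")

# External table keeps the data files under an explicit HDFS location.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS logs.web_events (
        user_id STRING,
        url     STRING,
        bytes   BIGINT
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
    LOCATION '/data/warehouse/web_events'
""")

# Allow every partition value in the insert to be resolved from the data itself.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    INSERT OVERWRITE TABLE logs.web_events PARTITION (event_date)
    SELECT user_id, url, bytes, event_date
    FROM logs.web_events_staging
""")
```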
Hadoop Developer
Confidential, New York City, NY
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop
- Worked comprehensively with Apache Sqoop and developed Sqoop scripts to move data from a MySQL database into the Hadoop Distributed File System (HDFS)
- Utilized the parallel processing of the Hadoop framework to ensure resource efficiency
- Created managed tables and external tables in Hive and loaded data from HDFS
- Worked on debugging and performance tuning of Hive and Pig jobs
- Used the Python subprocess module to run UNIX shell commands
- Extracted data from agent nodes into HDFS using Python scripts (see the sketch below)
- Implemented Hive generic UDFs to incorporate business logic into Hive queries
- Analysed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, the most visited pages on the website, etc.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts)
- Used Pig and Hive on HCatalog tables to analyse the data and create schemas for the HBase tables in Hive
- Coordinated with the BI team to visualize the transformed data in a dashboard using Tableau
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and executing Hive queries and Pig scripts
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Zookeeper, HBase, Python, Shell Scripting, Oozie, Maven, Cloudera, Tableau
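A hedged sketch of the Python-driven HDFS loading described above, using the subprocess module to invoke the hdfs CLI; the local staging directory, file pattern and HDFS target path are placeholders.

```python
# Hedged sketch: using Python's subprocess module to push extracted files into
# HDFS with the hdfs CLI, mirroring the shell-command usage described above.
import subprocess
from pathlib import Path

LOCAL_DIR = Path("/var/agent/exports")       # placeholder local staging directory
HDFS_TARGET = "/data/raw/agent_exports"      # placeholder HDFS directory

def put_to_hdfs(local_file: Path, hdfs_dir: str) -> None:
    """Copy a single local file into HDFS, failing loudly on errors."""
    subprocess.run(
        ["hdfs", "dfs", "-put", "-f", str(local_file), hdfs_dir],
        check=True,
    )

if __name__ == "__main__":
    for csv_file in sorted(LOCAL_DIR.glob("*.csv")):
        put_to_hdfs(csv_file, HDFS_TARGET)
        print(f"Loaded {csv_file.name} into {HDFS_TARGET}")
```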
Hadoop Developer
Confidential, Chicago, IL
Responsibilities:
- Responsible for the installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
- Developed simple and complex MapReduce programs in Java for data analysis on different data formats
- Developed workflows using Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig
- Implemented scripts to transmit data from Oracle to HBase and vice versa using Sqoop
- Worked on bucketing and partitioning Hive tables and running the scripts in parallel to reduce run time
- Developed and optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms
- Analysed data by performing Hive queries and running Pig scripts
- Developed Spark code in Python for faster processing of data (see the sketch below)
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required
- Exported the analysed data to relational databases using Sqoop for visualization and to generate reports for the BI team
- Implemented testing scripts to support test driven development and continuous integration
Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, Pig, Java, Python, Flume, Oozie, Maven, Eclipse.
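A hedged sketch of the Python Spark processing and compression tuning mentioned above: a small PySpark batch job that aggregates raw CSV data and writes Snappy-compressed Parquet; the paths, columns and codec choice are illustrative placeholders.

```python
# Hedged sketch: a PySpark batch job that aggregates raw CSV data and writes
# compressed columnar output. Input/output paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-aggregation-sketch").getOrCreate()

orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/data/raw/orders")               # placeholder input path
)

daily_totals = (
    orders
    .groupBy("order_date", "region")       # placeholder grouping columns
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count(F.lit(1)).alias("order_count"),
    )
)

(
    daily_totals.write
    .mode("overwrite")
    .option("compression", "snappy")       # compressed columnar output
    .parquet("/data/curated/daily_totals") # placeholder output path
)
```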
Java/J2EE Developer
Confidential, Atlanta, GA
Responsibilities:
- Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC)
- Developed and integrated REST web services to display data or search results
- Designed CSS-based page layouts that are cross-browser compatible and standards-compliant
- Responsible for the design and development of web pages from mock-ups
- Designed and developed creative, intuitive user interfaces that address business and end-user needs while considering the technical, physical and temporal constraints of the users
- Used the Bootstrap library to quickly build project UIs and the AngularJS framework to associate HTML elements with models
- Extensive experience using AngularJS directives, working with attribute-level, element-level and class-level directives
- Utilized a modular structure within the AngularJS application, dividing the different functionalities of the application into separate modules
- Developed a single-page, cross-device/cross-browser web application for real-time location sharing utilizing AngularJS and JavaScript APIs
- Designed dynamic and browser-compatible pages using HTML5, DHTML, CSS3, jQuery and JavaScript
- Developed code to call web services/APIs to fetch data and populate the UI using jQuery/AJAX
- Involved in Core Java coding using Java APIs such as Collections, Exception Handling, Generics, Enumeration and Java I/O to implement business logic
- Participated with the front-end UI team in the development of a responsive single-page application using the AngularJS framework, JavaScript and jQuery in conjunction with HTML5 and CSS3 standards
- Developed the front-end UI using HTML5, CSS3, jQuery, JavaScript (AngularJS) and AJAX, with Spring for back-end development
Environment: Java, Spring MVC, RESTful web services, HTML5, SVN, CSS3, jQuery, JavaScript, AngularJS, Oracle, Eclipse
Java Developer
Confidential
Responsibilities:
- Involved in various phases of Software Development Life Cycle, such as requirements gathering, modelling, analysis, design and development
- Ensured clear understanding of customer's requirements before developing the final proposal
- Generated Use case diagrams, Activity flow diagrams, Class diagrams and Object diagrams in the design phase
- Used Java design patterns like DAO, Singleton, etc.
- Wrote complex SQL queries for retrieving and updating data
- Involved in implementing a multithreaded environment to generate messages
- Used JDBC connections and the WebSphere connection pool for database access
- Used Struts tag libraries (such as html, logic, tab and bean) and JSTL tags in the JSP pages
- Involved in development using Struts components - struts-config.xml, Tiles, form beans and plug-ins in the Struts architecture
- Involved in the design and implementation of document-based web services
- Used prepared statements and callable statements to implement batch insertions and access stored procedures
- Involved in bug fixing and new enhancements
- Responsible for handling production issues and providing solutions
- Configured connection pooling using the WebLogic application server
- Developed and deployed the application on WebLogic using an Ant build.xml script
- Developed SQL queries and stored procedures to execute the backend processes using Oracle
- Deployed the application on WebLogic Application Server; development was done using Eclipse
Environment: Java 1.4, Servlets, JSP, JMS, Struts, Validation Framework, tag libraries, JSTL, JDBC, PL/SQL, HTML, JavaScript, Oracle 9i (SQL), UNIX, AJAX, Eclipse 3.0, Linux, CV