Spark/Sr. Hadoop Developer Resume
Round Rock, TX
SUMMARY
- 6+ years of experience in software development
- 4+ years of professional IT experience in ingestion, storage, querying, processing and analysis of Big Data using Big Data ecosystem technologies like Hadoop HDFS, MapReduce, Apache Pig, Hive, Sqoop, HBase, Flume, Oozie, Spark, Cassandra, Kafka and Zookeeper
- 2 years of experience as a Java and J2EE developer
- Hands-on experience in installing, configuring and troubleshooting Hadoop ecosystem components like MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Flume, Zookeeper, Kafka and Impala
- Hands-on experience working with Cloudera CDH3 and CDH4 platforms
- Good understanding of HDFS design and HDFS high availability (HA)
- Excellent understanding of Hadoop architecture and the different daemons of Hadoop clusters, including the ResourceManager, NodeManager, NameNode and DataNode
- Experience working with Hadoop in standalone, pseudo-distributed and fully distributed modes
- Scheduled multiple Hive and Pig jobs through the Oozie workflow engine, each triggered independently by time and data availability
- Experience in writing complex MapReduce programs that work with different file formats like Text, SequenceFile, XML, JSON, Parquet, ORC and Avro
- Experience in importing and exporting data between HDFS and relational database management systems using Sqoop
- Hands-on experience using YARN, tools like Pig and Hive for data analysis, and Zookeeper for coordinating cluster resources
- Expert in working with the Hive data warehouse, creating tables and distributing data by implementing partitioning and bucketing
- Expertise in implementing complex ad-hoc queries using HiveQL
- Extended Hive and Pig core functionality by developing custom User Defined Functions (UDFs)
- Expertise in implementing Spark and Scala applications using higher-order functions for both batch and interactive analysis requirements
- Good working experience using Spark SQL to manipulate DataFrames in Python
- Experience working with Spark features like RDD transformations, Spark MLlib and Spark SQL
- Experience executing Spark SQL queries against data in Hive from a Spark context and optimizing their performance (a brief sketch follows this list)
- Transformed, moved and synchronized data across heterogeneous sources and targets using Talend
- Experience with Amazon AWS services such as EMR, EC2, S3, CloudFormation and Redshift, which provide fast and efficient processing of Big Data
- Worked with Tableau to connect to the Hive data warehouse and represent the data in dashboards
- Hands-on experience with UNIX and shell scripting
- Extensive knowledge of Java, JEE and J2EE design patterns like Singleton, Factory, MVC and Front Controller
- Expertise in using IDEs like Eclipse, WebSphere Studio (WSAD), NetBeans, MyEclipse and WebLogic Workshop
- Experience in designing and developing web services (SOAP and RESTful)
- Experience in developing web interfaces using Servlets, JSP and custom tag libraries
- Experience in developing applications using Agile and Scrum methodologies
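The Spark SQL usage summarized above can be illustrated with a minimal, hedged PySpark sketch; the database, table, column names and aggregation here are hypothetical placeholders rather than details from any engagement listed below.

```python
# Hedged sketch: querying a Hive table with Spark SQL from Python and continuing
# with DataFrame transformations. Table and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("spark-sql-over-hive-sketch")
    .enableHiveSupport()      # lets Spark read tables registered in the Hive metastore
    .getOrCreate()
)

# Run HiveQL through the Spark context and keep working with the result as a DataFrame.
sales = spark.sql("SELECT region, amount, sale_date FROM sales_db.transactions")

monthly = (
    sales
    .withColumn("month", F.date_format("sale_date", "yyyy-MM"))
    .groupBy("region", "month")
    .agg(F.sum("amount").alias("total_amount"))
)

monthly.show(20, truncate=False)
```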
TECHNICAL SKILLS
Hadoop/Big Data: Apache Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, Flume, Zookeeper, Impala, Spark, Ambari, Kafka, YARN, HDFS, Talend, Ranger, Hortonworks and Cloudera distributions
NoSQL Databases: Apache HBase, Cassandra
RDBMS: Oracle, MySQL, SQL Server, Teradata, DB2
Languages: C, C++, Java, Scala, Python, PL/SQL, Transact-SQL
Scripting Languages: UNIX shell, Perl, JavaScript, Linux Bash shell scripting
Operating Systems: Windows 8/7/Vista, Red Hat, Ubuntu
Application Servers: WebLogic, WebSphere, Apache Tomcat
Other Tools: PuTTY, WinSCP, FileZilla, Toad, Maven, Autosys, JMeter, Jenkins, GitHub, Subversion
Methodologies: Agile, SCRUM, Waterfall, Lean, Kanban
Collaboration Tools: SharePoint, Wiki, Confluence, Team Foundation Server (TFS), JIRA
PROFESSIONAL EXPERIENCE
Spark/Sr. Hadoop Developer
Confidential, Round Rock, TX
Responsibilities:
- Involved in all phases of Software Development Life Cycle (SDLC) activities such as development, implementation and support for Hadoop
- Imported and exported data between relational database systems like MySQL and Oracle and HDFS/Hive using Sqoop
- Developed a data pipeline using Kafka, HBase, Spark and Hive to ingest, transform and analyse data
- Migrated complex MapReduce programs into Apache Spark RDD transformations
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the sketch below)
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data, and handled data skew in Spark SQL
- Used Talend for connecting, cleansing and sharing cloud and on-premises data
- Developed Spark, Pig and Hive jobs to summarize and transform data
- Designed and developed Pig and Hive UDFs for data enrichment
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data
- Worked on streaming data using Kafka and Spark Streaming for data preparation
- Used HBase to create snapshot tables for updating and deleting records
- Worked with AWS to migrate entire data centres to the cloud using VPC, EC2, S3, EMR, RDS, Splice Machine and DynamoDB services
- Developed entire Spark applications in Python (PySpark) on a distributed environment
- Migrated tables from SQL Server to HBase that are still in active use
- Worked on performance tuning of Hive and Spark jobs
- Worked on creating secondary indexes in HBase to join tables
Environment: Hadoop, HDFS, Pig, Hive, Sqoop, Kafka, Zookeeper, Spark, Python, Talend, HBase, Scala, Shell Scripting, Maven, MapReduce, Amazon EMR, EC2, S3
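As a hedged illustration of the Kafka-to-Spark ingestion described above, here is a minimal PySpark sketch using the Kafka source of Spark's Structured Streaming API (the project itself used Spark Streaming micro-batches); the broker address, topic, schema and output paths are placeholders.

```python
# Hedged sketch: reading a Kafka topic with Spark's structured streaming API and
# writing micro-batches out as Parquet. Broker, topic, schema and paths are
# placeholders, not details taken from the project above.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
    .option("subscribe", "clickstream")                   # placeholder topic
    .load()
    # Kafka delivers key/value as bytes; parse the JSON payload into typed columns.
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/clickstream")           # placeholder HDFS path
    .option("checkpointLocation", "/chk/clickstream")
    .trigger(processingTime="1 minute")            # micro-batch interval
    .start()
)
query.awaitTermination()
```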
Sr. Hadoop Developer
Confidential, Woonsocket, RI
Responsibilities:
- Imported data from MySQL into HDFS on a regular basis using Sqoop
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Kafka, and stored the data in HDFS for analysis
- Developed multiple Kafka producers and consumers from scratch to implement the organization's requirements
- Extensively worked with Talend for data discovery, visualization and enrichment
- Responsible for creating and modifying topics (Kafka queues) as required, with varying configurations for replication factors, partitions and TTL
- Wrote and tested complex MapReduce jobs for aggregating identified and validated data
- Created managed and external Hive tables with static and dynamic partitioning (see the sketch below)
- Wrote Hive queries for data analysis to meet business requirements
- Improved HiveQL performance by splitting larger queries into smaller ones and introducing temporary tables between them
- Extensively involved in performance tuning of HiveQL by applying bucketing on large Hive tables
- Used an open-source Python web-scraping framework to crawl and extract data from web pages
- Optimized Hive queries by setting different combinations of Hive parameters
- Developed UDFs (User Defined Functions) to extend the core functionality of Pig and Hive queries as per requirements
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data
- Used the Spark API over Hadoop to analyse data in Hive
- Worked on implementing a pipeline using Kafka and Spark Streaming for streaming data
- Implemented workflows using Oozie for running MapReduce jobs and Hive queries
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Talend, Apache Kafka, Zookeeper, Spark, HBase, Python, Shell Scripting, Oozie, Maven, Hortonworks
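A hedged sketch of the managed/external-table and dynamic-partitioning work above, expressed as HiveQL executed through PySpark's Hive support; the database, tables, columns, storage format and HDFS location are illustrative placeholders.

```python
# Hedged sketch: creating a partitioned external Hive table and loading it with
# dynamic partitioning through Spark's Hive support. All names and paths are
# placeholders; a staging table is assumed to already hold the raw rows.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS logs")

# External table keeps the data files under an explicit HDFS location.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS logs.web_events (
        user_id STRING,
        url     STRING,
        bytes   BIGINT
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
    LOCATION '/data/warehouse/web_events'
""")

# Allow every partition value in the insert to be resolved from the data itself.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    INSERT OVERWRITE TABLE logs.web_events PARTITION (event_date)
    SELECT user_id, url, bytes, event_date
    FROM logs.web_events_staging
""")
```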
Hadoop Developer
Confidential, New York City, NY
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop
- Worked comprehensively with Apache Sqoop and developed Sqoop scripts to move data from a MySQL database into the Hadoop Distributed File System (HDFS)
- Utilized the parallel processing of the Hadoop framework to ensure resource efficiency
- Created managed tables and external tables in Hive and loaded data from HDFS
- Worked on debugging and performance tuning of Hive and Pig jobs
- Used the Python subprocess module to run UNIX shell commands
- Extracted data from agent nodes into HDFS using Python scripts (see the sketch below)
- Implemented Hive generic UDFs to incorporate business logic into Hive queries
- Analysed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, the most visited pages on the website, etc.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts)
- Used Pig and Hive on HCatalog tables to analyse the data and create schemas for the HBase tables in Hive
- Coordinated with the BI team to visualize the transformed data in a dashboard using Tableau
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and executing Hive queries and Pig scripts
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Zookeeper, HBase, Python, Shell Scripting, Oozie, Maven, Cloudera, Tableau
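A hedged sketch of the Python-driven HDFS loading described above, using the subprocess module to invoke the hdfs CLI; the local staging directory, file pattern and HDFS target path are placeholders.

```python
# Hedged sketch: using Python's subprocess module to push extracted files into
# HDFS with the hdfs CLI, mirroring the shell-command usage described above.
import subprocess
from pathlib import Path

LOCAL_DIR = Path("/var/agent/exports")       # placeholder local staging directory
HDFS_TARGET = "/data/raw/agent_exports"      # placeholder HDFS directory

def put_to_hdfs(local_file: Path, hdfs_dir: str) -> None:
    """Copy a single local file into HDFS, failing loudly on errors."""
    subprocess.run(
        ["hdfs", "dfs", "-put", "-f", str(local_file), hdfs_dir],
        check=True,
    )

if __name__ == "__main__":
    for csv_file in sorted(LOCAL_DIR.glob("*.csv")):
        put_to_hdfs(csv_file, HDFS_TARGET)
        print(f"Loaded {csv_file.name} into {HDFS_TARGET}")
```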
Hadoop Developer
Confidential, Chicago, IL
Responsibilities:
- Responsible for the installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
- Developed simple and complex MapReduce programs in Java for data analysis on different data formats
- Developed workflows using Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig
- Implemented scripts to transmit data from Oracle to HBase and vice versa using Sqoop
- Worked on bucketing and partitioning Hive tables and running the scripts in parallel to reduce run time
- Developed and optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms
- Analysed data by performing Hive queries and running Pig scripts
- Developed Spark code in Python for faster processing of data (see the sketch below)
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required
- Exported the analysed data to relational databases using Sqoop for visualization and to generate reports for the BI team
- Implemented testing scripts to support test driven development and continuous integration
Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, Pig, Java, Python, Flume, Oozie, Maven, Eclipse.
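A hedged sketch of the Python Spark processing and compression tuning mentioned above: a small PySpark batch job that aggregates raw CSV data and writes Snappy-compressed Parquet; the paths, columns and codec choice are illustrative placeholders.

```python
# Hedged sketch: a PySpark batch job that aggregates raw CSV data and writes
# compressed columnar output. Input/output paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-aggregation-sketch").getOrCreate()

orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/data/raw/orders")               # placeholder input path
)

daily_totals = (
    orders
    .groupBy("order_date", "region")       # placeholder grouping columns
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count(F.lit(1)).alias("order_count"),
    )
)

(
    daily_totals.write
    .mode("overwrite")
    .option("compression", "snappy")       # compressed columnar output
    .parquet("/data/curated/daily_totals") # placeholder output path
)
```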
Java/J2EE Developer
Confidential, Atlanta, GA
Responsibilities:
- Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC)
- Developed and integrated REST web services to display data or search results
- Designed CSS-based page layouts that are cross-browser compatible and standards-compliant
- Responsible for the design and development of web pages from mock-ups
- Designed and developed creative, intuitive user interfaces that address business and end-user needs while considering the technical, physical and temporal constraints of the users
- Used the Bootstrap library to quickly build project UIs and the AngularJS framework to associate HTML elements with models
- Extensive experience using AngularJS directives, working with attribute-level, element-level and class-level directives
- Utilized a modular structure within the AngularJS application, dividing the different functionalities of the application into separate modules
- Developed a single-page, cross-device/cross-browser web application for real-time location sharing utilizing AngularJS and JavaScript APIs
- Designed dynamic and browser-compatible pages using HTML5, DHTML, CSS3, jQuery and JavaScript
- Developed code to call web services/APIs to fetch data and populate the UI using jQuery/AJAX
- Involved in Core Java coding using Java APIs such as Collections, Exception Handling, Generics, Enumeration and Java I/O to implement business logic
- Participated with the front-end UI team in the development of a responsive single-page application using the AngularJS framework, JavaScript and jQuery in conjunction with HTML5 and CSS3 standards
- Developed the front-end UI using HTML5, CSS3, jQuery, JavaScript (AngularJS) and AJAX, with Spring for back-end development
Environment: Java, Spring MVC, RESTful web services, HTML5, SVN, CSS3, jQuery, JavaScript, AngularJS, Oracle, Eclipse
Java Developer
Confidential
Responsibilities:
- Involved in various phases of Software Development Life Cycle, such as requirements gathering, modelling, analysis, design and development
- Ensured clear understanding of customer's requirements before developing the final proposal
- Generated Use case diagrams, Activity flow diagrams, Class diagrams and Object diagrams in the design phase
- Used Java design patterns like DAO, Singleton, etc.
- Wrote complex SQL queries for retrieving and updating data
- Involved in implementing a multithreaded environment to generate messages
- Used JDBC connections and the WebSphere connection pool for database access
- Used Struts tag libraries (such as html, logic, tab and bean) and JSTL tags in the JSP pages
- Involved in development using Struts components - struts-config.xml, Tiles, form beans and plug-ins in the Struts architecture
- Involved in the design and implementation of document-based web services
- Used prepared statements and callable statements to implement batch insertions and access stored procedures
- Involved in bug fixing and new enhancements
- Responsible for handling production issues and providing solutions
- Configured connection pooling using the WebLogic application server
- Developed and deployed the application on WebLogic using an Ant build.xml script
- Developed SQL queries and stored procedures to execute the backend processes using Oracle
- Deployed the application on WebLogic Application Server; development was done using Eclipse
Environment: Java 1.4, Servlets, JSP, JMS, Struts, Validation Framework, tag libraries, JSTL, JDBC, PL/SQL, HTML, JavaScript, Oracle 9i (SQL), UNIX, AJAX, Eclipse 3.0, Linux, CV