Hadoop Developer Resume
San Jose, CA
SUMMARY
- 8+ years of professional IT experience in analysis, development, integration, and maintenance of web-based and client/server applications using Java and Big Data technologies.
- 5+ years of experience in Hadoop development and analysis, working with technologies such as Hive, Pig, Java MapReduce, UNIX, and HDFS.
- Strong experience working with HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Yarn, Oozie and HBase.
- 2+ years of experience in development, Linux administration, implementation, and maintenance of web servers and distributed enterprise applications.
- Experience in all phases of software development life cycle (SDLC), which includes User Interaction, Business Analysis/Modelling, Design/Architecture, Development, Implementation, Integration, Documentation, Testing, and Deployment.
- Experience in analyzing business requirements and creating Hive or Pig scripts to process and aggregate data.
- Good understanding of real-time data processing using Spark.
- Involved in preparation of Test Plans, Test Cases & Test Scripts based on business requirements, rules, data mapping requirements and system specifications.
- Ingested data into HDFS using Sqoop from RDBMSs such as Oracle, MySQL, and Microsoft SQL Server.
- Experience in implementing open-source frameworks such as Spring, Hibernate, and Web Services.
- Troubleshot configuration issues of Hadoop environments in development and operations.
- Experience in Continuous Integration and Continuous Deployment using tools such as Jenkins.
- Experience in processing streaming data on clusters through Kafka and Spark Streaming.
- Experience with databases such as PostgreSQL and MySQL, including cluster setup and writing SQL queries, triggers, and stored procedures.
- Experience in collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Very good understanding and working knowledge of object-oriented programming (OOP), Python, and Scala.
- Experienced in using Spark to improve the performance and optimize existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this list).
- Proficient in working with NoSQL databases such as MongoDB, Cassandra, and HBase (column-family store).
- Good knowledge of Hadoop MRv1 and MRv2 (YARN) architecture.
- Communicated with diverse client communities offshore and onshore, with a dedication to client satisfaction and quality outcomes; extensive experience coordinating offshore development activities.
- Highly organized and dedicated, with good time management and organizational skills and the ability to handle multiple tasks with a positive attitude.
- A team player with good interpersonal, communication and leadership skills.
- Easily adaptable to changing work conditions, able to consistently deliver quality work, and capable of adopting new technologies and facing new challenges.
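The following is a minimal PySpark sketch of the DataFrame/Spark SQL style of optimization referenced above; the database, table, and column names are illustrative placeholders, not from any specific engagement.

```python
# Illustrative PySpark sketch: replacing a hand-built pair-RDD aggregation with the
# DataFrame API so the Catalyst optimizer can plan the job. Names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("daily-order-totals")
         .enableHiveSupport()
         .getOrCreate())

# Read an existing Hive table as a DataFrame
orders = spark.table("sales.orders")

# Filter, group, and aggregate through the DataFrame API; the same logic could be
# expressed as SQL and submitted with spark.sql(...)
daily_totals = (orders
                .filter(F.col("status") == "COMPLETED")
                .groupBy("order_date")
                .agg(F.sum("amount").alias("total_amount"),
                     F.count("*").alias("order_count")))

# Persist the result back to Hive for downstream reporting
daily_totals.write.mode("overwrite").saveAsTable("sales.daily_order_totals")
```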
TECHNICAL SKILLS
Big Data Ecosystems: Hadoop, Teradata, MapReduce, Spark, HDFS, HBase, Pig, Hive, Sqoop, Oozie, Storm, Scala, Kafka, and Flume.
Programming Languages: Java (J2SE, J2EE), C, C#, PL/SQL, Swift, SQL+, ASP.NET, JDBC, Python.
Web Development: JavaScript, jQuery, HTML 5.0, CSS 3.0, AJAX, JSON
Development Tools: NetBeans 8.0.2, Visual Studio 2013, Eclipse Neon, Android Studio, SQL Developer
Testing Tools: JUnit, HP Unified Functional Testing, HP Performance Center, Selenium, WinRunner, LoadRunner, QTP
UNIX Tools: Apache, Yum, RPM
Operating Systems: Windows, Linux, Ubuntu, Mac OS, Red Hat Linux
Protocols: TCP/IP, HTTP and HTTPS
Web Servers: Apache Tomcat
Cluster Management Tools: Cloudera Manager, Hortonworks, Ambari
Methodologies: Agile, V-model, Waterfall model
Databases: HBase, MongoDB, Cassandra, Oracle 10g, MySQL, Couch, MS SQL Server
PROFESSIONAL EXPERIENCE
Confidential, San Jose, CA
Hadoop Developer
Responsibilities:
- The GVS-CS project has multiple teams; I worked on the Data Engineering team.
- The team's focus is ingesting data from different vendors and processing that data using business rules.
- After processing, the data is delivered to the Eloqua tool.
- Involved in the Hadoop security architecture, adding different users to the same YARN queue in the development and production clusters.
- After adding the users, validated sample jobs to confirm whether the new users were allocated to the same YARN queue in the respective clusters.
- Also involved in the security architecture for Google Cloud Platform, which is being rolled out to the Google Cloud projects.
- The security architecture for the Google platform is essentially two-step verification for anyone accessing the cloud projects.
- Used Spark SQL and Hive to validate large data sets against the business rules (see the sketch after this list).
- Also involved in discussions on automating the Hadoop data pipelines on our clusters.
- For Hadoop data pipeline automation, we plan to use Jenkins to trigger automation on Git commits when we push.
- A number of offers go live every week and every month, based on client requirements.
- Involved in cleaning up databases as required, including Hive tables, Python scripts, etc.
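A minimal sketch of the kind of Spark SQL/Hive validation described above; the staging/curated table names, columns, and business rules are hypothetical.

```python
# Hypothetical validation sketch: check a curated Hive table against simple business
# rules before the data is handed off to Eloqua. Table and column names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("offer-data-validation")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.table("staging.vendor_offers")
curated = spark.table("curated.vendor_offers")

# Rule 1: no records should be lost between the staging and curated layers
raw_count, curated_count = raw.count(), curated.count()
if raw_count != curated_count:
    raise ValueError(f"Row count mismatch: staging={raw_count}, curated={curated_count}")

# Rule 2: mandatory fields must be populated in the curated layer
bad_rows = spark.sql("""
    SELECT COUNT(*) AS bad
    FROM curated.vendor_offers
    WHERE offer_id IS NULL OR customer_email IS NULL
""").first()["bad"]
if bad_rows > 0:
    raise ValueError(f"{bad_rows} curated rows violate the mandatory-field rule")

print("Validation passed")
```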
Environment: Hadoop, HDFS, Hive, Python, Spark, SQL, Jenkins, UNIX Shell Scripting, Big Data, MapReduce, Git, Eloqua.
Confidential, Plano, TX
Hadoop Developer
Responsibilities:
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Used Flume, Sqoop, Hadoop, Spark, and Oozie for building data pipelines.
- Provided cluster coordination services through ZooKeeper.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Automated all the jobs, for pulling data from FTP server to load data into Hive tables, using Oozie workflows.
- Experienced in managing and reviewing Hadoop log files.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Developed Oozie workflow for scheduling and orchestrating the ETL process. Designed & Implemented Java MapReduce programs to support distributed data processing.
- Worked with highly unstructured and semi-structured data of 30TB in size (90TB with replication factor of 3).
- Contributed towards developing a Data Pipeline to load data from different sources like Web, RDBMS, and NoSQL to Apache Kafka or Spark cluster.
- Migrated data from Spark RDDs into HDFS and NoSQL stores such as Cassandra and HBase.
- Implemented Pig Latin scripts to handle data preprocessing and normalization.
- Worked on reading multiple data formats on HDFS using PySpark.
- Developed Kafka producers and consumers, HBase clients, Spark jobs, and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Developed MapReduce programs using Java.
- Worked extensively on the Spark Core and Spark SQL modules.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list).
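A minimal sketch of partitioned and bucketed managed/external Hive tables as mentioned in the last point; database names, columns, locations, and bucket counts are illustrative, and on clusters where Spark cannot create bucketed Hive tables the same DDL can be run from the Hive shell or Beeline instead.

```python
# Illustrative Hive DDL for a partitioned external table and a partitioned, bucketed
# managed table, submitted through Spark SQL. All names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-table-design")
         .enableHiveSupport()
         .getOrCreate())

# External table: Hive owns only the metadata; data stays at the external location
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_db.click_events (
        user_id STRING,
        url     STRING,
        ts      TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION '/data/raw/click_events'
""")

# Managed table: partitioned by date for pruning, bucketed by user_id to help joins
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics_db.click_events (
        user_id STRING,
        url     STRING,
        ts      TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
""")
```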
Environment: Hadoop, HDFS, Hive, Python, Scala, Spark, SQL, Teradata, UNIX Shell Scripting, Big Data, MapReduce, Sqoop, Oozie, Pig, Flume, Linux, Java, Eclipse.
Confidential, South Portland, ME
Hadoop Developer
Responsibilities:
- Worked in Multi Clustered Hadoop Eco-System environment.
- Created MapReduce programs using the Java API that filter out unnecessary records and find unique records based on different criteria.
- Used the Python unittest library for testing many Python programs and blocks of code.
- Parsed JSON and XML data using Python (see the sketch after this list).
- Rewrote an existing Java application as a Python module to deliver data in a specific format.
- Loaded and transformed large sets of unstructured data from UNIX systems into HDFS.
- Used Apache Sqoop to load user data into HDFS on a weekly basis.
- Created production jobs using Oozie workflows that integrated actions such as MapReduce, Sqoop, and Hive.
- Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs for daily runs.
- Involved in developing Hive DDLs to create, alter and drop Hive tables.
- Experienced in transferring data from different data sources into HDFS using Kafka producers.
- Prepared ETL pipeline with the help of Sqoop for consumption.
- Wrote Pig scripts to analyze Hadoop logs.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Troubleshot and debugged Hadoop ecosystem runtime issues.
- Participated in all phases of the SDLC, including requirement gathering, analysis, estimation, design, coding, testing, and documentation.
- Developed SOAP web service as publisher/producer.
- Developed different GUI screens as JSPs using HTML, JavaScript, and CSS.
- Designed the user interface of the application using Angular JS, Bootstrap, HTML5, CSS3 and JavaScript.
- Designed and developed front-end graphical user interfaces with JSP, HTML5, CSS3, JavaScript, and jQuery.
- Developed entire frontend and backend modules using Python on the Django web framework.
- Developed tools using Python, shell scripting, XML, and Big Data technologies to automate routine tasks.
- Served as the single point of technical contact for different application teams as well as Dev, QA, and line managers.
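A small Python sketch of the JSON/XML parsing mentioned above; the record layouts and field names are made up for illustration.

```python
# Illustrative Python parsing sketch; the JSON/XML record layouts are hypothetical.
import json
import xml.etree.ElementTree as ET

def parse_json_record(line):
    """Parse one JSON record and return the fields downstream jobs expect."""
    record = json.loads(line)
    return record.get("user_id"), record.get("event_type"), record.get("timestamp")

def parse_xml_feed(path):
    """Yield (id, amount) tuples from a simple XML feed of <transaction> elements."""
    tree = ET.parse(path)
    for txn in tree.getroot().iter("transaction"):
        yield txn.get("id"), float(txn.findtext("amount", default="0"))

if __name__ == "__main__":
    sample = '{"user_id": "u1", "event_type": "click", "timestamp": "2016-01-01T00:00:00"}'
    print(parse_json_record(sample))
```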
Environment: Hadoop MapReduce, Hive, HDFS, Java, CSV files, Python, Django, AWS, XML, Shell Scripting, MySQL, HTML, XHTML, Jenkins, Linux.
Confidential, Boston, MA
Hadoop Data Analyst
Responsibilities:
- Used Hive queries and Pig scripts to analyze data.
- Used Hive partitioning and bucketing on data from different kinds of sources to improve performance.
- Followed Agile methodology (Scrum) during development of the project and tracked progress through daily stand-ups.
- Used Oozie to automate the flow of jobs and Zookeeper for coordination.
- Used Flume to distribute unstructured and semi-structured data.
- Used Sqoop to distribute structured data.
- Wrote shell scripts, run as cron jobs, to automate the data migration process from external servers and FTP sites.
- Prepared ETL pipeline with the help of Sqoop, PIG, and HIVE to be able to frequently bring in data from the source and make it available for consumption.
- Used Tableau for visualization and to generate reports for financial data consolidation, reconciliation, and segmentation.
- Involved in loading data from UNIX file system to HDFS.
- Created partitioned tables in Hive.
- Developed MapReduce programs by using Java.
- Developed various Hive UDFs to add functionality needed by Hive scripts.
- Implemented Kafka messaging services to stream large volumes of data and insert them into the database (see the producer sketch after this list).
- Analyzed large data sets by writing Pig scripts.
- Developed MapReduce programs over the files generated by Hive query processing to produce key-value pairs and load the data into the NoSQL database HBase.
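A minimal sketch of a Kafka producer of the kind mentioned above, using the kafka-python client; the broker address, topic, and record fields are placeholders.

```python
# Minimal kafka-python producer sketch; broker, topic, and record fields are hypothetical.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# In the real pipeline this send would sit inside the ingestion loop
producer.send("transactions", {"txn_id": "t-1001", "amount": 42.50})
producer.flush()
producer.close()
```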
Environment: HDFS, Hive, MapReduce, Java, NoSQL, Unix, Linux, Jenkins, shell scripting, MySQL, Spreadsheet.
Confidential, Fort Worth, TX
Data Analyst
Responsibilities:
- Communicated effectively, both verbally and in writing, with the client and the offshore team.
- Completed documentation on all assigned systems and databases, including business rules, logic, and processes.
- Created Test data and Test Cases documentation for regression and performance.
- Designed, built and implemented relational databases.
- Determined changes in physical database by studying project requirements.
- Developed intermediate business knowledge of the functional area and its processes to understand how data is applied to support the business function.
- Facilitated gathering moderately complex business requirements by defining the business problem.
- Facilitated the monthly Opportunities for Improvement (OFI) meeting.
- Identified Opportunities for Improvement (OFIs) and recommended and implemented, as applicable, process improvement plans in collaboration with the identified departments.
- Identified and addressed outliers in an efficient and professional manner following a predetermined protocol.
- Identified data requirements and isolated data elements.
- Leveraged a basic understanding of multiple data structures and sources.
- Maintained and assisted in the development of moderately complex business solutions, which included data, reporting, business intelligence/analytics.
- Maintained data dictionary by revising and entering definitions.
- Maintained direct, timely and appropriate communication with clients.
- Supported data governance, integrity, quality and audit functions.
- Supported the implementation of technical data solutions and standards.
- Utilized and prepared analysis reports summarizing Opportunities for Improvements (OFIs).
- Worked closely with other members of the database group.
Environment: Linux, Unix, Java, spreadsheet, QlikView, SQL, Excel, shell scripting, MySQL.
Confidential
Java Developer
Responsibilities:
- Used Eclipse as an IDE for development of the application.
- Developed Application in Jakarta Struts Framework using MVC architecture.
- Implemented J2EE design patterns Session Facade pattern, Singleton Pattern.
- Created Action Forms and Action classes for the modules.
- Customized all the JSP pages with the same look and feel using Tiles and CSS.
- Developed JSPs to validate information automatically using Ajax.
- Created struts-config.xml and tiles-def.xml files.
- Involved in coding for the presentation layer using Apache Struts, XML and JavaScript.
- Used XSLT for UI to display XML Data.
- Utilized JavaScript for client-side validation. Participated in designing the user interface for the application using HTML and connected them to database using JDBC.
- Created web pages based on the requirements and styled them using CSS.
- Involved in writing client-side scripts using JavaScript and server-side scripts using JavaBeans, and used servlets to handle the business logic.
- Developed the Form Beans and Data Access Layer classes.
- Involved in writing complex sub-queries and used Oracle for generating on-screen reports.
- Worked on database interaction layer for insertions, updating and retrieval operations on data.
- Involved in deploying the application in test environment using Apache Tomcat.
Environment: JSP, Core Java, Servlets, Struts, UML, AJAX, SQL, JUNIT, JavaScript, Eclipse, JIRA, HTML, CSS.