Hadoop Developer Resume
Mountain View, CA
SUMMARY
- Over 8 years of experience in the IT industry across Healthcare and Finance, including Java, Big Data technologies and web applications in multi-tiered environments using Java, Hadoop, Hive, HBase, Pig, Sqoop, J2EE (Spring, JSP, Servlets), JDBC, HTML and JavaScript (AngularJS).
- Working knowledge of various other Cloudera Hadoop technologies (Impala, Sqoop, HDFS, Spark, Scala, etc.).
- More than 4 years of experience in Java, J2EE, Web Services, SOAP, HTML and XML related technologies, demonstrating strong analytical and problem-solving skills, computer proficiency and the ability to follow projects through from inception to completion.
- Extensive experience in Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce concepts.
- Expertise in Apache Spark Development (Spark SQL, Spark Streaming, MLlib, GraphX, Zeppelin, HDFS, YARN, and NoSQL).
- Well versed in installation, configuration, support and management of Big Data and the underlying infrastructure of Hadoop clusters, including CDH3 and CDH4 clusters.
- Designed and implemented a Cassandra-based database and related web service for storing unstructured data.
- Good experience in planning, installing and configuring Hadoop clusters in Cloudera and Hortonworks distributions.
- Re-engineered many legacy mainframe applications into Hadoop using the MapReduce API to reduce mainframe MIPS and storage costs.
- Experience in managing and reviewing Hadoop log files.
- Experience on NoSQL databases including HBase, Cassandra.
- Used Pig as an ETL tool for transformations, event joins, filtering and pre-aggregation.
- Designed and implemented a Cassandra NoSQL database and associated RESTful web service that persists high-volume user profile data for vertical teams.
- Experience in building large-scale, highly available web applications; working knowledge of web services and other integration patterns.
- Involved in installation and configuration of Pig.
- Experience in using Pig, Hive, Sqoop and Cloudera Manager.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
- Expertise in all components of Hadoop Ecosystem - Hive, Pig, HBase, Impala, Sqoop, HUE, Flume, Zookeeper, Oozie and Apache Spark.
- In-depth understanding of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce, and experience writing MapReduce programs with Apache Hadoop to analyze large data sets efficiently.
- Hands-on experience on YARN (MapReduce 2.0) architecture and components such as Resource Manager, Node Manager, Container and Application Master and execution of a MapReduce job.
- Hands-on experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive (see the Spark-vs-Hive sketch at the end of this section).
- Hands-on experience with JAX-WS, JSP, Servlets, Struts, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, Linux, Unix, WSDL, XML, HTML, AWS, Scala and Vertica.
- Hands-on experience with RDBMS and Linux shell scripting.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Developed MapReduce jobs to automate transfer of data from HBase.
- Knowledge of job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Knowledge of data warehousing and ETL tools like Informatica and Pentaho.
- Experienced in Oracle Database Design and ETL with Informatica.
- Experienced with procedures, functions, packages, views, materialized views, function-based indexes, triggers, dynamic SQL and ad-hoc reporting using SQL.
- Knowledge of administrative tasks such as installing Hadoop and its ecosystem components (Flume, Oozie, Hive, Pig) and commissioning and decommissioning cluster nodes.
- Extensive experience using MVC architecture, Struts and Hibernate for developing web applications with Java, JSPs, JavaScript, HTML, jQuery, AJAX, XML and JSON.
- Excellent Java development skills using J2EE, Spring, J2SE, Servlets, JUnit, MRUnit, JSP and JDBC.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
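A minimal Scala sketch of the kind of Spark-vs-Hive comparison referenced above; the database, table and column names are hypothetical, and the identical aggregation would be timed separately in Hive for the comparison.

```scala
import org.apache.spark.sql.SparkSession

object SparkVsHive {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read the same warehouse tables that Hive queries.
    val spark = SparkSession.builder()
      .appName("spark-vs-hive-benchmark")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical claims table; run the same GROUP BY in Hive and compare elapsed times.
    val start = System.nanoTime()
    val byState = spark.sql(
      "SELECT state, COUNT(*) AS claim_count FROM healthcare.claims GROUP BY state")
    byState.show(20)
    println(s"Spark SQL elapsed: ${(System.nanoTime() - start) / 1e9} s")

    spark.stop()
  }
}
```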
TECHNICAL SKILLS
Programming Languages: Java, Python, Scala, Shell Scripting, SQL, PL/SQL
J2EE Technologies: Core Java, Spring, Servlets, SOAP/REST services, JSP, JDBC, XML, Hibernate.
Big Data Ecosystem: HDFS, HBase, Hortonworks, MapReduce, Hive, Pig, Sqoop, Impala, Cassandra, Oozie, Zookeeper, Flume, Ambari, Storm, Spark and Kafka.
Databases: NoSQL, Oracle 10g/11g/12c, SQL Server 2008/2008 R2/2012/2014/2016/2017, MySQL.
Database Tools: Oracle SQL Developer, MongoDB, TOAD and PL/SQL Developer
Modeling Tools: UML on Rational Rose 4.0/7.5/7.6/8.1
Web Technologies: HTML5, JavaScript, XML, JSON, jQuery, Ajax, CSS3
Application/Web Servers: WebLogic, WebSphere, Apache Tomcat, Apache Cassandra
IDEs: Eclipse, NetBeans, WinSCP.
Operating Systems: Windows, UNIX, Linux (Ubuntu, CentOS), Solaris, Windows Server 2003/2008/2012/2016.
Frameworks: MVC, Struts, Log4j, JUnit, Maven, Ant, Web Services.
PROFESSIONAL EXPERIENCE
Confidential, Mountain View, CA
Hadoop Developer
Responsibilities:
- Involved in implementing the design through the key phases of the software development life cycle (SDLC): development, testing, implementation and maintenance support.
- Installed and configured a multi-node, fully distributed Hadoop cluster.
- Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
- Analyzed the SQL scripts and designed the solution to implement using PySpark.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Involved in end-to-end data processing: ingestion, processing, quality checks and splitting.
- Imported data into HDFS from various SQL databases and files using Sqoop and from streaming systems using Storm into Big Data Lake.
- Involved in scripting (Python and shell) to provision and spin up virtualized Hadoop clusters.
- Worked with NoSQL databases like HBase to create tables and store data.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Wrote Pig scripts to store the data into HBase
- Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL (a partitioning sketch follows this list).
- Exported the analyzed data to Teradata using Sqoop for visualization and to generate reports for the BI team. Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Extracted files from RDBMS through Sqoop, placed them in HDFS and processed them.
- Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase); a streaming sketch follows this list.
- Involved in Installing Hadoop Ecosystem components.
- Responsible for managing data coming from different sources.
- Set up and administered the Hadoop cluster environment, including adding and removing cluster nodes, cluster capacity planning and performance tuning.
- Wrote complex MapReduce programs.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in HDFS maintenance and administered it through the Hadoop Java API.
- Configured the Fair Scheduler to provide service-level agreements for multiple users of a cluster.
- Loaded data into the cluster from dynamically generated files using Flume and from RDBMS using Sqoop.
- Sound knowledge of programming Spark using Scala.
- Involved in writing Java APIs for interacting with HBase.
- Involved in writing Flume and Hive scripts to extract, transform and load data into Database
- Used HBase as the data store.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Created an interface to convert mainframe data into ASCII.
- Experienced in installing, configuring and using Hadoop Ecosystem components.
- Experienced in Importing and exporting data into HDFS and Hive using Sqoop.
- Knowledge in performance troubleshooting and tuning Hadoop clusters.
- Participated in development/implementation of Cloudera Hadoop environment.
- Gained good experience with NoSQL databases such as HBase.
- Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for efficient data access.
- Experienced in running query using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.
- Experienced in working with various data sources such as Teradata and Oracle; successfully loaded files from Teradata into HDFS, and from HDFS into Hive and Impala.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Developed and delivered quality services on time and on budget. Solutions developed by the team use Java, XML, HTTP, SOAP, Hadoop, Pig and other web technologies.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Monitored and managed the Hadoop cluster using Apache Ambari.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Experience in administering, installing, configuring, troubleshooting, securing, backing up, performance monitoring and fine-tuning Red Hat Linux.
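A minimal sketch of the dynamic-partitioning pattern referenced in the Hive bullet above, issued through Spark's Hive support so it stays in Scala; the database, table and column names are hypothetical, and bucketing for sampling (CLUSTERED BY ... INTO n BUCKETS) would be declared in the Hive DDL itself.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Allow partition values to be derived from the data being inserted.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Hypothetical table partitioned by event date.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.events_part (
        |  user_id BIGINT, event_type STRING, payload STRING)
        |PARTITIONED BY (event_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert from a hypothetical staging table.
    spark.sql(
      """INSERT OVERWRITE TABLE analytics.events_part PARTITION (event_date)
        |SELECT user_id, event_type, payload, event_date FROM analytics.events_stage""".stripMargin)

    spark.stop()
  }
}
```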
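A condensed sketch of the Kafka-to-HBase Spark Streaming flow described above; the broker, topic, table and column-family names are hypothetical, and the spark-streaming-kafka-0-10 and HBase client dependencies are assumed to be on the classpath.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object LearnerModelStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("learner-model-stream"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "learner-model")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams))

    // Light transformation on each record, then persist each partition to HBase.
    stream.map(record => record.value.split(",", 2))
      .filter(_.length == 2)
      .foreachRDD { rdd =>
        rdd.foreachPartition { rows =>
          val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
          val table = conn.getTable(TableName.valueOf("learner_model"))
          rows.foreach { case Array(id, payload) =>
            val put = new Put(Bytes.toBytes(id))
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(payload))
            table.put(put)
          }
          table.close()
          conn.close()
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```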
Environment: Java, Hadoop, Hive, Pig, Sqoop, Flume, HBase, Hortonworks, Oracle 10g/11g/12c, Teradata, Cassandra, HDFS, Data Lake, Spark, MapReduce, Ambari, Cloudera, Tableau, Snappy, Zookeeper, NoSQL, Shell Scripting, Ubuntu, Solr.
Confidential, Englewood, CO
Hadoop Developer
Responsibilities:
- Involved in the high-level design of the Hadoop architecture for the existing data structure and problem statement; set up the 64-node cluster and configured the entire Hadoop platform.
- Implemented a data interface to get customer information using a REST API, pre-processed the data using MapReduce 2.0 and stored it in HDFS (Hortonworks).
- Developed efficient MapReduce programs for filtering out the unstructured data and developed multiple MapReduce jobs to perform data cleaning and preprocessing on Hortonworks.
- Used NoSQL databases (Cassandra, MongoDB) and assisted in troubleshooting and optimizing MapReduce jobs and Pig Latin scripts.
- Extensively worked with the Cloudera Distribution of Hadoop (CDH 5.x, CDH 4.x).
- Worked on a live 60-node Hadoop cluster running Cloudera CDH4.
- Created Airflow scheduling scripts in Python to automate Sqoop ingestion of a wide range of data sets.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
- Experienced in querying data using Spark SQL on top of the Spark engine for faster data set processing.
- Involved in file movement between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Created a data pipeline to ingest, aggregate and load consumer response data from an AWS S3 bucket into Hive external tables in HDFS, serving as a feed for Tableau dashboards (see the S3-to-Hive sketch after this list).
- Worked extensively on importing metadata into Hive using Python and migrated existing tables and applications to Hive and the AWS cloud (S3), making the data available in Athena and Snowflake.
- Imported data from sources such as AWS S3 and the local file system into Spark RDDs.
- Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
- Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop (see the write-back sketch after this list).
- Developed multiple MapReduce jobs for data cleaning and preprocessing.
- Created a complete processing engine based on Cloudera's distribution, tuned for performance.
- Implemented Partitioning, Dynamic partitions and Bucketing in Hive.
- Extensively used Pig for data cleansing and processing and performing transformations.
- Used Mahout to understand the machine learning algorithms for efficient data processing.
- Developed and configured Oozie workflow engine for scheduling and managing the Pig, Hive and Sqoop jobs.
- Used Zookeeper for various types of centralized configurations.
- Used Apache Tez to run batch and interactive data processing for Pig and Hive jobs.
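An illustrative sketch of the S3-to-Hive-external-table step of the pipeline above; the bucket, paths, columns and table location are hypothetical, and the hadoop-aws S3A connector is assumed to be available.

```scala
import org.apache.spark.sql.SparkSession

object S3ToHiveFeed {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-to-hive-feed")
      .getOrCreate()

    // Read raw consumer-response events landed in S3 (hypothetical bucket/prefix).
    val responses = spark.read.json("s3a://acme-consumer-data/responses/2019/06/")

    // Write curated data to the HDFS location backing a Hive external table
    // (CREATE EXTERNAL TABLE ... LOCATION '/data/curated/consumer_responses' is assumed
    // to exist), which Tableau dashboards then query through Hive.
    responses
      .filter("response_code = 'OK'")
      .select("customer_id", "campaign_id", "responded_at")
      .write.mode("overwrite")
      .parquet("hdfs:///data/curated/consumer_responses")

    spark.stop()
  }
}
```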
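A minimal sketch of the DataFrame aggregation with a write-back to the OLTP database; the connection details and table names are placeholders, and it uses Spark's JDBC writer for illustration, whereas the pipeline described above used Sqoop for that hop.

```scala
import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

object AggregateAndWriteBack {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("aggregate-write-back")
      .enableHiveSupport()
      .getOrCreate()

    // Aggregate over a hypothetical Hive table with the DataFrame API.
    val daily = spark.table("sales.orders")
      .groupBy("order_date", "region")
      .count()
      .withColumnRenamed("count", "order_count")

    // Write the summary back to the OLTP database over JDBC (placeholder connection).
    val props = new Properties()
    props.setProperty("user", "report_user")
    props.setProperty("password", sys.env.getOrElse("DB_PASSWORD", ""))
    daily.write.mode(SaveMode.Overwrite)
      .jdbc("jdbc:oracle:thin:@db-host:1521/ORCL", "DAILY_ORDER_SUMMARY", props)

    spark.stop()
  }
}
```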
Environment: Hadoop, AWS, CDH, Elastic MapReduce, Hive, Spark, Airflow, Zeppelin, SourceTree, Bitbucket, SQL Workbench, GenieLogs, Snowflake, Athena, Jenkins.
Confidential, Wellesley, MA
Hadoop Developer
Responsibilities:
- Involved in implementing the design through the key phases of the software development life cycle (SDLC): development, testing, implementation and maintenance support.
- Installation and Configuration of Hadoop Cluster.
- Created dashboards according to user specifications and prepared stories to provide an understandable vision.
- Performed extensive data validation using Hive and wrote Hive UDFs.
- Resolved user support requests.
- Administered and supported Hadoop clusters.
- Loaded data from RDBMS to Hadoop using Sqoop
- Provided guidance to ETL/data warehousing teams on where to store intermediate and final output files in the various layers in Hadoop.
- Set up cluster high availability (HA).
- Worked collaboratively to manage build outs of large data clusters.
- Loaded data from UNIX file system to HDFS and created Hive tables, loaded and analyzed data using Hive queries.
- Loaded data back into Teradata for BASEL reporting and for business users to analyze and visualize the data using Datameer.
- Developed UDFs in Java for Hive and Pig and worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
- Developed multiple POCs using Scala, deployed them on the YARN cluster and compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed the solution to implement using Scala
- Defined Pig UDFs for financial functions such as swaps, hedging, speculation and arbitrage. Created security and encryption systems for big data.
- Performed administration, troubleshooting and maintenance of ETL and ELT processes.
- Collaborated with multiple teams for design and implementation of big data clusters in cloud environments.
- Experience in administering, installing, configuring, troubleshooting, securing, backing up, performance monitoring and fine-tuning Red Hat Linux.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Developed industry-specific UDFs (user-defined functions).
- Created Hive tables and was involved in data loading and writing Hive UDFs.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Migrated ETL processes from RDBMS to Hive to evaluate ease of data manipulation.
- Developed Hive queries to process the data for visualizing.
- Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Developed a custom file system plugin for Hadoop to access files on data platform.
- This custom file system plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to access files directly.
- Experience in defining, designing and developing Java applications, especially using Hadoop MapReduce and leveraging frameworks such as Cascading and Hive.
- Extensive Teradata knowledge and experience.
- Extracted feeds from social media sites.
- Imported data using Sqoop to load data from Oracle to HDFS on a regular basis.
- Imported data using Sqoop to load data from MySQL to HDFS on regular basis.
- Developing scripts and batch jobs to schedule various Hadoop Programs.
- Wrote Hive queries for data analysis to meet business requirements.
- Created Hive tables and worked on them using HiveQL.
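An illustrative sketch of converting a Hive aggregation into Spark RDD transformations in Scala, as referenced above; the file layout, paths and columns are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HiveQueryAsRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-query-as-rdd"))

    // Equivalent of the Hive query (hypothetical data layout):
    //   SELECT account_id, SUM(amount) FROM transactions WHERE status = 'POSTED' GROUP BY account_id;
    // expressed as RDD transformations over delimited files in HDFS.
    val totals = sc.textFile("hdfs:///data/transactions/*.csv")
      .map(_.split(","))                                     // account_id, amount, status
      .filter(cols => cols.length >= 3 && cols(2) == "POSTED")
      .map(cols => (cols(0), cols(1).toDouble))
      .reduceByKey(_ + _)

    totals.saveAsTextFile("hdfs:///data/reports/posted_totals")
    sc.stop()
  }
}
```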
Environment: HDFS, Hive, ETL, Pig, UNIX, Linux, CDH 4 distribution, Tableau, Impala, Teradata, Sqoop, Flume, Oozie.
Confidential - Cambridge, MA
Hadoop Developer
Responsibilities:
- Oversaw the design phase to develop technical solutions from analysis documents.
- Exported data from DB2 to HDFS using Sqoop.
- Developed MapReduce jobs using Java API.
- Installed and configured Pig and also wrote Pig Latin scripts.
- Wrote MapReduce jobs using Pig Latin.
- Developed workflow using Oozie for running MapReduce jobs and Hive Queries.
- Worked on Cluster coordination services through Zookeeper.
- Worked on loading log data directly into HDFS using Flume.
- Involved in loading data from LINUX file system to HDFS.
- Responsible for managing data from multiple sources.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Implemented JMS for asynchronous auditing purposes.
- Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts
- Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters
- Experience in defining, designing and developing Java applications, especially using Hadoop MapReduce and leveraging frameworks such as Cascading and Hive.
- Experience developing monitoring and performance metrics for Hadoop clusters.
- Experience documenting designs and procedures for building and managing Hadoop clusters.
- Strong experience in troubleshooting the operating system, cluster issues and Java-related bugs.
- Experienced in importing/exporting data between HDFS/Hive and relational databases and Teradata using Sqoop.
- Involved in creating, upgrading and decommissioning Cassandra clusters.
- Worked on the Cassandra database to analyze how data is stored.
- Successfully loaded files into Hive and HDFS from MongoDB and Solr.
- Experience automating deployment, management and self-service troubleshooting of applications.
- Defined and evolved the existing architecture to scale with growth in data volume, users and usage.
- Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services (see the sketch after this list).
- Installed and configured Hive and wrote Hive UDFs.
- Experience in managing the CVS and migrating into Subversion.
- Experience in managing development time, bug tracking, project releases, development speed, release forecast, scheduling and many more.
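A minimal sketch of the kind of call the Commerce API exposes, using the DataStax Java driver 3.x from Scala; the contact point, keyspace, table and query are hypothetical.

```scala
import com.datastax.driver.core.Cluster
import scala.collection.JavaConverters._

object CommerceApiSketch {
  def main(args: Array[String]): Unit = {
    // Connect to a hypothetical Cassandra cluster and keyspace.
    val cluster = Cluster.builder()
      .addContactPoint("cassandra-host")
      .build()
    val session = cluster.connect("commerce")

    // Parameterized query, as it would be wrapped behind a Java/Scala service method.
    val rows = session.execute(
      "SELECT order_id, status FROM orders WHERE customer_id = ?", "cust-123")

    for (row <- rows.asScala) {
      println(s"${row.getString("order_id")} -> ${row.getString("status")}")
    }

    cluster.close()
  }
}
```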
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, PIG, Eclipse, MySQL and Ubuntu, Zookeeper, Java (JDK 1.6).
Confidential
Java Developer
Responsibilities:
- Gathered user requirements followed by analysis and design. Evaluated various technologies for the client.
- Developed HTML and JSP to present the client-side GUI.
- Involved in development of JavaScript code for client-side Validations.
- Designed and developed the HTML-based web pages for displaying the reports.
- Developed Java classes and JSP files.
- Extensively used JSF framework.
- Created Cascading Style Sheets that are consistent across all browsers and platforms.
- Extensively used XML documents with XSLT and CSS to translate the content into HTML for presentation in the GUI.
- Developed dynamic content of presentation layer using JSP.
- Developed user-defined tags using XML.
- Developed Cascading Style Sheets (CSS) for creating effects in Visualforce pages.
- Used JavaMail for automatic emailing and JNDI to interact with the knowledge server.
- Used Struts Framework to implement J2EE design patterns (MVC).
- Developed, Tested and Debugged the Java, JSP and EJB components using Eclipse.
- Developed Enterprise JavaBeans, including entity beans, session beans (both stateless and stateful) and message-driven beans.
Environment: Java, J2EE 6, EJB 2.1, JSP 2.0, Servlets 2.4, JNDI 1.2, JavaMail 1.2, JDBC 3.0, Struts, HTML, XML, CORBA, XSLT, JavaScript, Eclipse 3.2, Oracle 10g, WebLogic 8.1, Windows 2003.
Confidential
Java Developer
Responsibilities:
- Created the Database, User, Environment, Activity, and Class diagram for the project (UML).
- Implemented the database using the Oracle database engine.
- Designed and developed a fully functional generic n-tiered J2EE application platform; the environment was Oracle technology driven. The entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF-BC and Oracle ADF Rich Faces.
- Created an entity object (business rules and policy, validation logic, default value logic, security)
- Created View objects, View Links, Association Objects, Application modules with data validation rules (Exposing Linked Views in an Application Module), LOV, dropdown, value defaulting, transaction management features.
- Web application development using J2EE: JSP, Servlets, JDBC, Java Beans, Struts, Ajax, JSF, JSTL, Custom Tags, EJB, JNDI, Hibernate, ANT, JUnit and Apache Log4J, Web Services, Message Queue (MQ).
- Designed a GUI prototype using ADF 11g GUI components before finalizing it for development.
- Used Cascading Style Sheet (CSS) to attain uniformity through all the pages
- Created reusable components (ADF Library and ADF Task Flow).
- Experience using Version controls such as CVS, PVCS, and Rational Clear Case.
- Created modules using bounded and unbounded task flows.
- Generated WSDL (web services) and created workflows using BPEL.
- Handled AJAX functions (partial trigger, partial submit, auto submit).
- Created the Skin for the layout.
Environment: Core Java, Servlets, JSF, ADF Rich Client UI Framework, ADF-BC (BC4J) 11g, web services using Oracle SOA (BPEL), Oracle WebLogic.