Hadoop Developer Resume
NYC
PROFESSIONAL SUMMARY:
- Excellent understanding of Hadoop architecture and its components: the HDFS daemons (NameNode, DataNode), the YARN daemons (ResourceManager, NodeManager), the MRv1 daemons (JobTracker, TaskTracker), the MapReduce programming paradigm, and the Hadoop ecosystem (Hive, Impala, Sqoop, Flume, Oozie, Zookeeper, Kafka, Spark).
- 8+ years of software development experience, including development of Big Data applications on the Hadoop framework.
- Well versed in Installation, Configuration, Supporting and Managing of Big Data and Underlying infrastructure of Hadoop Cluster.
- Experience with Cloudera Manager administration, including installing and upgrading Hadoop and its related components in both single-node and multi-node cluster environments.
- Experience in database administration, performance tuning, backup and recovery, and troubleshooting in large-scale, customer-facing environments.
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing data using HiveQL, Impala and custom MapReduce programs in Java.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Extensive experience with Database administration, maintenance, and schema design for PostgreSQL and MySQL.
- Experience in data management and implementation of Big Data applications using Hadoop frameworks.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data Ingestion, Oozie for scheduling and HBase as a NoSQL data store.
- Experienced in deployment of Hadoop Cluster using Ambari, Cloudera Manager.
- Experience in Hadoop shell commands and writing MapReduce programs; verifying, managing, and reviewing Hadoop log files.
- Proficient in configuring Zookeeper, Flume & Sqoop to the existing Hadoop cluster.
- Good knowledge of Apache Flume, Sqoop, Hive, HCatalog, Impala, Zookeeper, and Oozie.
- Expertise in deploying Hadoop, YARN, Spark, and Storm, integrated with Cassandra, Ignite, RabbitMQ, and Kafka.
- Good knowledge of YARN (Hadoop 2.x) terminology and high-availability Hadoop clusters.
- Experience in analyzing the log files for Hadoop and ecosystem services and finding out the root cause.
- Performed thread dump analysis for stuck threads and heap dump analysis for memory leaks using a memory analyzer tool.
- Strong experience with high-volume transactional systems running on Unix/Linux and Windows.
- Hands-on experience with Chef, Confidential, and Ansible.
- Involved in all phases of Software Development Life Cycle (SDLC) in large-scale enterprise software using Object Oriented Analysis and Design.
- Provided 24/7 on-call Support for production.
- Coordinated work across multiple tight schedules; efficient in meeting deadlines.
- Self-starter, fast learner, and a team player with strong communication and interpersonal skills.
TECHNICAL SKILLS:
Hadoop Ecosystems: Hadoop, HDFS, MapReduce, Hive, YARN, Oozie, Zookeeper, Spark, Spark SQL, Spark Streaming, Impala, Hue, Kafka, RabbitMQ, Solr, Sqoop, NiFi, Knox, Ranger, and Kerberos.
Cloud Services: Amazon EMR (Elastic MapReduce), EC2, S3 (Simple Storage Service), Redshift, and Microsoft Azure.
Languages: Java, Scala, Python, PL/SQL, Unix Shell Scripting.
Java Technologies: Spring MVC, JDBC, JSP, JSON, Applets, Swing, JNDI, JSTL, RMI, JMS, Servlets, EJB, JSF.
UI Technologies: HTML5, JavaScript, CSS3, Angular, XML, JSP, JSON, AJAX.
Development Tools: Eclipse, IntelliJ, Maven, Insomnia, Postman, Scala IDE.
Frameworks/Web Servers: Spring, JSP, Hibernate, WebLogic, WebSphere, Tomcat.
SQL/NoSQL Databases: Teradata, PostgreSQL, Oracle, HBase, MongoDB, Cassandra, CouchDB, MySQL, and DB2.
Other tools: GitHub, BitBucket, SVN, JIRA, Source Tree, Maven
WORK EXPERIENCE:
Hadoop Developer
Confidential, NYC
Responsibilities:
- Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Performed Cloudera Hadoop upgrades and patches, installed ecosystem products through Cloudera Manager, and upgraded Cloudera Manager itself.
- Capacity planning, hardware recommendations, performance tuning and benchmarking.
- Cluster balancing and performance tuning of Hadoop components like HDFS, Hive, Impala, MRv2.
- Added and decommissioned Hadoop cluster nodes, including balancing HDFS block data.
- Implemented the Fair Scheduler on the ResourceManager to share cluster resources among users' MRv2 jobs (see the fair-scheduler sketch after this section).
- Configured LDAP for the Hadoop cluster.
- Worked with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Worked with data delivery teams to set up new Hadoop users, including creating Linux accounts and testing HDFS, Hive, Pig, and MRv2 access for the new users.
- Set up data ingestion tools such as Flume and Sqoop.
- Installed and set up HBase, Hive, and Impala.
- Set up quotas on HDFS and implemented rack topology scripts (see the quota and topology sketch after this section).
- Investigated and performed the migration from MRv1 to MRv2.
- Worked with Big Data Analysts, Designers and Scientists in troubleshooting MRv1/MRv2 job failures and issues with Hive, Pig, Flume, and Apache Spark.
- Utilized Apache Spark for Interactive Data Mining and Data Processing.
- Buffered incoming load with Apache Kafka, a fast, scalable, fault-tolerant messaging system, before the data was analyzed.
- Configured Sqoop to import and export data between HDFS and relational databases (see the Sqoop command sketch after this section).
- Handled the data exchange between HDFS, web applications, and databases using Flume and Sqoop.
- Created Hive tables and loaded data into them.
- Expertise in the Hadoop stack: MRv2, Sqoop, Flume, Pig, Hive, HBase, Kafka, and Spark.
- Extensively involved in querying using Hive, Pig.
- Wrote custom UDFs extending Pig core functionality (see the Java UDF sketch after this section).
- Wrote custom MapReduce jobs against the Java API (a minimal example follows this section).
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting (see the HiveQL sketch after this section).
- Setup automated processes to analyze the System and Hadoop log files for predefined errors and send alerts to appropriate groups.
- Hands-on experience with container technologies such as Docker; embedded containers in existing CI/CD pipelines.
- Set up an independent testing lifecycle for CI/CD scripts with Vagrant and VirtualBox.
- Involved in various automation activities.
Environment: Hadoop, MR, MRv2, Hive, Pig, HDFS, Sqoop, Oozie, CDH, Flume, Kafka, Spark, HBase, Zookeeper, Impala, LDAP, NoSQL, MySQL, Infobright, Linux, AWS, Ansible, Confidential, Chef.
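A minimal sketch of the Fair Scheduler setup mentioned above; the queue names, weights, and limits are illustrative placeholders, not the actual cluster's configuration:

```xml
<!-- yarn-site.xml: switch the ResourceManager to the Fair Scheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<!-- fair-scheduler.xml: hypothetical queue layout -->
<allocations>
  <queue name="etl">
    <weight>2.0</weight>
    <minResources>20000 mb, 10 vcores</minResources>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
    <maxRunningApps>20</maxRunningApps>
  </queue>
</allocations>
```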
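A sketch of the HDFS quota commands and a rack topology script; the paths, limits, and subnet-to-rack mapping are hypothetical:

```bash
# Cap the name count and raw space a project directory may use (paths and limits illustrative).
hdfs dfsadmin -setQuota 1000000 /user/etl
hdfs dfsadmin -setSpaceQuota 10t /user/etl
hdfs dfs -count -q /user/etl        # verify the quotas took effect
```

```bash
#!/bin/bash
# topology.sh - referenced by net.topology.script.file.name in core-site.xml.
# Maps each DataNode host/IP to a rack; the subnets here are made up.
for host in "$@"; do
  case "$host" in
    10.0.1.*) echo "/dc1/rack1" ;;
    10.0.2.*) echo "/dc1/rack2" ;;
    *)        echo "/default-rack" ;;
  esac
done
```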
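The Sqoop import/export jobs were of roughly this shape; the host, database, table, and path names below are made up for illustration:

```bash
# Import a relational table into HDFS (connection details and paths are placeholders).
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db.password \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Export computed results from HDFS back to the relational database.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db.password \
  --table order_metrics \
  --export-dir /data/out/order_metrics
```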
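A minimal example of the kind of custom Pig UDF referred to above; the class name and logic are illustrative, not the actual production UDFs:

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF: trims and lower-cases a chararray field.
public class NormalizeString extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return ((String) input.get(0)).trim().toLowerCase();
    }
}
```

In a Pig script it would be registered with `REGISTER myudfs.jar;` and invoked as `GENERATE NormalizeString(name);`.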
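A compact custom MRv2 job against the Java org.apache.hadoop.mapreduce API; word count stands in here for the actual production jobs, which are not shown:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal MRv2 job: counts word occurrences in text input.
public class WordCount {
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```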
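An illustrative HiveQL sketch of the partitioned-and-bucketed reporting pattern; the table schema, metric, and date are hypothetical:

```sql
-- Events table partitioned by day and bucketed by user (schema illustrative).
CREATE TABLE events (
  user_id    BIGINT,
  event_type STRING,
  payload    STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- Reporting metric over a single day; the dt predicate prunes to one partition.
SELECT event_type, COUNT(*) AS cnt
FROM events
WHERE dt = '2016-01-15'
GROUP BY event_type;
```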
Hadoop Developer
Confidential, Boston, MA
Responsibilities:
- Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Administered and managed a vanilla Apache Hadoop cluster without any GUI tools.
- Developed and enhanced LSB init scripts for vanilla Hadoop and its ecosystem services.
- Capacity planning, hardware recommendations, performance tuning and benchmarking.
- Cluster balancing and performance tuning of Hadoop components like HDFS, Hive, Impala, and MapReduce.
- Collaborated with software engineers to set up a user-experience service based on Flume and Hadoop.
- Took backups of NameNode metadata and tested backup consistency (see the fsimage backup sketch after this section).
- Set up HA for the Hadoop CDH4 cluster and the Apache Tomcat web server using open-source tools such as Corosync, Pacemaker, and DRBD.
- Configured an HA multi-region PostgreSQL cluster using built-in PostgreSQL capabilities such as log shipping and hot standby (see the replication sketch after this section).
- Researched multi-master replication in PostgreSQL based on Bucardo.
- Implemented load-balancing solutions for the PostgreSQL cluster, including Pgpool-II and PgBouncer (see the PgBouncer sketch after this section).
- Performed daily performance audits of the PostgreSQL RDBMS, including SQL query profiling and collection of log and database health metrics.
- Managed and administered the Citrix NetScaler appliance.
- Developed and open-sourced a NetScaler Manager app that used the NITRO API to communicate with Citrix NetScaler.
- Set up the Confidential master and agents to provision bare-metal and virtual infrastructure.
- Managed Hadoop/PostgreSQL cluster configuration with Confidential.
- Implemented an HA cluster of Redis instances using Corosync and Pacemaker.
- Added and decommissioned Hadoop cluster nodes, including balancing HDFS block data.
- Managed Hadoop in a multi-tenant environment, including proper scheduler configuration for MRv1.
- Configured LDAP for the Hadoop cluster.
- Managed an HA LDAP cluster on Linux.
- Configured replication for the HA LDAP cluster on Linux.
- Deployed new hardware and software environments required for PostgreSQL/Hadoop and expanded existing environments.
- Worked with data delivery teams to set up new Hadoop users, including creating Linux accounts and testing HDFS, Hive, Impala, and MapReduce access for the new users through the Hue web UI.
Environment: PostgreSQL, Hadoop, MapReduce, Hive, HDFS, CDH, Flume, HBase, Zookeeper, Impala, Splunk, LDAP, NoSQL, Linux.
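One stock way to take and sanity-check the NameNode metadata backups mentioned above; the backup path and the fsimage transaction ID are placeholders:

```bash
# Pull the most recent fsimage from the NameNode to local backup storage.
hdfs dfsadmin -fetchImage /backup/namenode/$(date +%F)

# Consistency check: parse the fetched image with the Offline Image Viewer.
hdfs oiv -p XML -i /backup/namenode/2016-01-15/fsimage_0000000000012345678 \
         -o /tmp/fsimage-check.xml
```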
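A minimal sketch of the log-shipping/hot-standby configuration, in PostgreSQL 9.x-era syntax (recovery.conf); hostnames, paths, and sizes are illustrative:

```ini
# primary: postgresql.conf
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 256
archive_mode = on
archive_command = 'rsync -a %p standby:/var/lib/pgsql/wal_archive/%f'

# standby: postgresql.conf
hot_standby = on

# standby: recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=primary port=5432 user=replicator'
restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
```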
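An illustrative pgbouncer.ini for transaction-level connection pooling in front of the cluster; addresses, limits, and the database name are placeholders:

```ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50
```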
MS SQL Developer/Administrator
Confidential, NYC
Client: OFI Global Asset Management
Responsibilities:
- Worked as developer and administrator on MS SQL Server 2014.
- Maintained client relationships by communicating daily and weekly project status.
- Developed complex T-SQL code.
- Created database objects: tables, indexes, views, user-defined functions, cursors, triggers, stored procedures, constraints, and roles (see the T-SQL sketch after this section).
- Used SQL Profiler to review index performance and largely eliminate table scans.
- Maintained table performance by following tuning practices such as normalization and creating indexes.
Environment: Oracle, PostgreSQL, Apache Tomcat, Apache HTTP, Nginx, Linux, UNIX, JDK1.6, Solaris 9/10, RHEL 4/5, SQL.
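A small T-SQL sketch of the kind of stored procedure described above; the procedure, table, and column names are hypothetical:

```sql
-- Daily status report procedure (SQL Server 2014; object names illustrative).
CREATE PROCEDURE dbo.usp_GetDailyOrderStatus
    @ReportDate DATE
AS
BEGIN
    SET NOCOUNT ON;

    SELECT o.OrderID,
           o.ClientID,
           o.Status,
           o.UpdatedAt
    FROM dbo.Orders AS o
    WHERE CAST(o.UpdatedAt AS DATE) = @ReportDate
    ORDER BY o.UpdatedAt;
END;
GO
```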