Hadoop Engineer / Big Data Admin Resume
Woodcliff Lake, NJ
SUMMARY
- Around 8 years of expertise in Hadoop, Big Data analytics, and Linux, including architecture, design, installation, configuration, and management of Apache Hadoop clusters on the MapR, Hortonworks, and Cloudera Hadoop distributions.
- Experience in configuring, installing, and managing MapR, Hortonworks, and Cloudera distributions.
- Hands-on experience in installing, configuring, monitoring, and using Hadoop components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Oozie, Apache Spark, and Impala.
- Working experience with large-scale Hadoop environment builds and support, including design, configuration, installation, performance tuning, and monitoring.
- Experience in designing and developing mappings with various Informatica transformations, including Source Qualifier, Expression, Unconnected and Connected Lookups, Router, Filter, Aggregator, Union, Joiner, Sorter, Normalizer, Sequence Generator, Rank, SQL, and Update Strategy.
- Working knowledge of monitoring tools and frameworks such as Splunk, InfluxDB, Prometheus, Sysdig, Datadog, AppDynamics, New Relic, and Nagios.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Standardized Splunk forwarder deployment, configuration, and maintenance across a variety of Linux platforms. Also worked on DevOps tools such as Puppet and Git.
- Hands on experience on configuring a Hadoop cluster in a professional environment and on Amazon Web Services (AWS) using an EC2 instance.
- Experience with complete Software Design Lifecycle including design, development, testing and implementation of moderate to advanced complex systems.
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using the Apache, Hortonworks, Cloudera, and MapR distributions.
- Extensive experience in installing, configuring and administrating Hadoop cluster for major Hadoop distributions like CDH5 and HDP.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Experience in configuring Ranger and Knox to provide security for Hadoop services (Hive, HBase, HDFS, etc.). Experience in administration of Kafka and Flume streaming using the Cloudera distribution.
- Developed automated scripts using Unix shell for performing RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT, and other database-related activities.
- Experienced with deployments, maintenance, and troubleshooting of applications on Microsoft Azure cloud infrastructure. Excellent knowledge of NoSQL databases like HBase and Cassandra.
- Experience with large-scale Hadoop clusters, handling all Hadoop environment builds, including design, cluster setup, and performance tuning.
- Involved in the release process from development to Informatica production.
- Experience in HBase replication and MapR-DB replication setup between two clusters.
- Implemented release processes such as DevOps and Continuous Delivery methodologies for existing builds and deployments. Experience with scripting languages such as Python, Perl, and shell.
- Modified reports and Talend ETL jobs based on feedback from QA testers and users in development and staging environments.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Deployed Grafana dashboards for monitoring cluster nodes, using Graphite as a data source and collectd as a metric sender.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced with the workflow scheduling and monitoring tools Rundeck and Control-M.
- Proficiency with application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
- Working experience designing and implementing complete end-to-end Hadoop infrastructure.
- Experienced in developing MapReduce programs using Apache Hadoop for working with big data.
- Responsible for designing highly scalable big data clusters to support data storage and computation across varied big data platforms: Hadoop, Cassandra, MongoDB, and Elasticsearch.
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Storm, ZooKeeper, Kafka, Impala, HCatalog, Apache Spark, Spark Streaming, Spark SQL, HBase, NiFi, Cassandra, AWS (EMR, EC2), Hortonworks, Cloudera
Languages: Java, SQL
Protocols: TCP/IP, HTTP, LAN, WAN
Network Services: SSH, DNS/BIND, NFS, NIS, Samba, DHCP, Telnet, FTP, IPtables, MS AD/LDS/ADC, and OpenLDAP
Other Tools: Tableau, SAS
Mail Servers and Clients: Microsoft Exchange, Lotus Domino, Sendmail, Postfix
Databases: Oracle 9i/10g, MySQL 4.x/5.x, HBase, NoSQL, PostgreSQL
Platforms: Red Hat Linux, CentOS, Solaris, and Windows
Methodologies: Agile (Scrum), Hybrid
PROFESSIONAL EXPERIENCE
Confidential - Woodcliff Lake, NJ
Hadoop Engineer / Big Data Admin
Responsibilities:
- Architected, designed, installed, configured, and managed Apache Hadoop clusters on the MapR, Hortonworks, and Cloudera Hadoop distributions.
- Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, and Sqoop.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Experienced in developing Spark scripts for data analysis in Python.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Working to implement MapR Streams to facilitate real-time data ingestion and meet business needs.
- Built on-premise data pipelines using Kafka and Spark for real-time data analysis.
- Performed streaming data ingestion with Kafka into the Spark distributed environment.
- Implemented complex Hive UDFs to execute business logic within Hive queries.
- Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters.
- Responsible for installing, configuring, supporting, and managing Cloudera Hadoop clusters.
- Installed a Kerberos-secured Kafka cluster (without encryption) for a POC and set up Kafka ACLs.
- Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, with Solr and HBase for real-time querying.
- Experience in designing and developing mappings with various Informatica transformations, including Source Qualifier, Expression, Unconnected and Connected Lookups, Router, Filter, Aggregator, Union, Joiner, Sorter, Normalizer, Sequence Generator, Rank, SQL, and Update Strategy.
- Built a prototype for real-time analysis using Spark Streaming and Kafka (see the streaming sketch after this list).
- Experienced in administering, installing, upgrading, and managing Hadoop clusters on MapR 5.1 across 200+ nodes in Development, Test, and Production (Operational & Analytics) environments.
- Created end-to-end Spark applications in Scala to perform data cleansing, validation, transformation, and summarization of user behavioral data (a simplified sketch follows this list).
- Troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Extensively worked on Elasticsearch querying and indexing to retrieve documents at high speed.
- Installed, configured, and maintained several Hadoop clusters, including HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, ZooKeeper, and NiFi, in Kerberized environments.
- Involved in deploying a Hadoop cluster using Hortonworks Ambari HDP 2.2 integrated with SiteScope for monitoring and alerting.
- Converted MapReduce programs into Spark transformations using Spark RDDs and Scala.
- Worked on NiFi data pipelines to process large data sets and configured lookups for data validation and integrity.
- Imported and exported data into HDFS and Hive using Sqoop.
- Worked extensively on building NiFi data pipelines in a Docker container environment during the development phase.
- Implemented Kerberos security in all environments. Defined file system layout and data set permissions.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
- Experience in managing Hadoop clusters with IBM BigInsights and the Hortonworks Data Platform.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Documented EDL (Enterprise Data Lake) best practices and standards, including data management.
- Performed regular maintenance, commissioning/decommissioning nodes as disk failures occurred, using the MapR file system.
- Worked on installing the cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration in the MapR Control System (MCS).
- Applied innovative and, where possible, automated approaches to system administration tasks.
- Experience with Ambari (Hortonworks) for management of the Hadoop ecosystem.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Worked on setting up high availability for a major production cluster and designed automatic failover control using ZooKeeper and Quorum Journal nodes.
- Implemented release processes such as DevOps and Continuous Delivery methodologies for existing builds and deployments. Experience with scripting languages such as Python, Perl, and shell.
- Designed, developed, and provided ongoing support for data warehouse environments.
- Involved in the release process from development to Informatica production.
- Worked on Oracle Big Data SQL to integrate big data analysis into existing applications.
- Used the Oracle Big Data Appliance for Hadoop and NoSQL processing, integrating data in Hadoop and NoSQL with data in Oracle Database.
- Worked with different relational database systems such as Oracle (PL/SQL). Used Unix shell scripting and Python; experience working with AWS EMR instances.
- Developed applications, which access the database with JDBC to execute queries, prepared statements, and procedures.
- Worked with the DevOps team to clusterize the NiFi pipeline on EC2 nodes, integrated with Spark, Kafka, and Postgres running on other instances using SSL handshakes in QA and Production environments.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and NiFi for data cleaning and preprocessing.
- Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
- Performed data blending of Cloudera Impala and Teradata ODBC data sources in Tableau.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing. Mentored the EQM team on creating Hive queries to test use cases.
- Performed Sqoop configuration of JDBC drivers for the respective relational databases, controlling parallelism, the distributed cache, and the import process, as well as compression codecs, importing data into Hive and HBase, incremental imports, configuring saved jobs and passwords, the free-form query option, and troubleshooting.
- Created MapR DB tables and involved in loading data into those tables.
- Collected and aggregated large amounts of streaming data into HDFS using Flume; configured multiple agents, Flume sources, sinks, channels, and interceptors, defined channel selectors to multiplex data into different sinks, and set log4j properties.
- Used NiFi processors to build and deploy end-to-end data processing pipelines and scheduled the workflows.
- Worked on setting up Apache NiFi and performing POC with NiFi in orchestrating a data pipeline.
- Extensively worked on the ETL mappings, analysis and documentation of OLAP reports
- Responsible for implementation and ongoing administration of MapR 4.0.1 infrastructure.
- Maintained operations, installation, and configuration of a 150+ node cluster with the MapR distribution.
- Monitoring the health of the cluster and setting up alert scripts for memory usage on the edge nodes.
- Experience in Linux systems administration on production and development servers (Red Hat Linux, CentOS, and other UNIX utilities). Worked on NoSQL databases like HBase and created Hive tables on top.
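A minimal illustrative sketch of the Kafka-to-Spark streaming ingestion referenced above (see the real-time Spark Streaming and Kafka bullet). All broker, topic, schema, and path names are hypothetical placeholders, and it assumes the spark-sql-kafka connector package is on the classpath; the actual pipeline details are not shown in this resume.

```python
# Hypothetical sketch: ingest JSON events from Kafka with Spark Structured Streaming
# and land them in HDFS as Parquet. Broker, topic, schema, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

# Assumed event schema for the incoming JSON messages.
schema = (StructType()
          .add("customer_id", StringType())
          .add("event_type", StringType())
          .add("amount", DoubleType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
       .option("subscribe", "customer-events")             # placeholder topic
       .load())

events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
             .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "/data/landing/customer_events")           # placeholder HDFS path
         .option("checkpointLocation", "/tmp/chk/customer_events")  # placeholder checkpoint
         .outputMode("append")
         .start())

query.awaitTermination()
```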
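The cleansing/validation/summarization applications above were written in Scala per the resume; the sketch below shows the same pattern in PySpark (Python Spark scripting is also listed above), with hypothetical table and column names.

```python
# Hypothetical sketch: cleanse, validate, and summarize behavioral data with PySpark,
# reading from and writing to Hive. Table and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("behavior-summary")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.table("staging.user_behavior")  # placeholder Hive table

cleaned = (raw
           .dropDuplicates(["user_id", "event_ts"])           # de-duplicate events
           .filter(F.col("user_id").isNotNull())              # basic validation
           .withColumn("event_date", F.to_date("event_ts")))  # derive a partition-friendly date

summary = (cleaned
           .groupBy("user_id", "event_date")
           .agg(F.count("*").alias("event_count"),
                F.countDistinct("session_id").alias("sessions")))

# Persist the summarized result to a curated Hive table (placeholder name).
summary.write.mode("overwrite").saveAsTable("curated.user_behavior_daily")
```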
Environment: HBase, Hadoop 2.2.4, Hive, Kerberos, Kafka, YARN, Spark, Impala, Solr, Java, Hadoop cluster, HDFS, Ambari, Ganglia, CentOS, RedHat, Windows, MapR, Sqoop, Cassandra.
Confidential - Mountain View, CA
Hadoop Engineer / Big Data Admin
Responsibilities:
- Architected, designed, installed, configured, and managed Apache Hadoop clusters on the MapR, Hortonworks, and Cloudera Hadoop distributions.
- Responsible for installing, configuring, supporting, and managing Cloudera Hadoop clusters.
- Installed a Kerberos-secured Kafka cluster (without encryption) for a POC and set up Kafka ACLs.
- Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, with Solr and HBase for real-time querying.
- Troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Extensively worked on Elasticsearch querying and indexing to retrieve documents at high speed.
- Involved in deploying a Hadoop cluster using Hortonworks Ambari HDP 2.2 integrated with SiteScope for monitoring and alerting.
- Experience in designing and developing mappings with various Informatica transformations, including Source Qualifier, Expression, Unconnected and Connected Lookups, Router, Filter, Aggregator, Union, Joiner, Sorter, Normalizer, Sequence Generator, Rank, SQL, and Update Strategy.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented Spark RDD transformations and actions to migrate a MapReduce algorithm using Scala and Spark SQL for faster testing and processing of data (a simplified sketch follows this list).
- Installed, configured, and maintained Hortonworks DataFlow tools such as NiFi.
- Installed the OS and administered the Hadoop stack with the CDH 5.9 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
- Experience in managing Hadoop clusters with IBM BigInsights and the Hortonworks Data Platform.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Started using Apache NiFi to copy data from the local file system to HDFS.
- Worked on installing Cloudera Manager and CDH, installed the JCE policy files, created a Kerberos principal for the Cloudera Manager Server, and enabled Kerberos using the wizard.
- Configured Apache NiFi on the existing Hadoop cluster.
- Performed regular maintenance, commissioning/decommissioning nodes as disk failures occurred, using the MapR file system.
- Worked on installing the cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration in the MapR Control System (MCS).
- Applied innovative and, where possible, automated approaches to system administration tasks.
- Experience with Ambari (Hortonworks) for management of the Hadoop ecosystem.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Worked on setting up high availability for a major production cluster and designed automatic failover control using ZooKeeper and Quorum Journal nodes.
- Worked on Oracle Big Data SQL to integrate big data analysis into existing applications.
- Used the Oracle Big Data Appliance for Hadoop and NoSQL processing, integrating data in Hadoop and NoSQL with data in Oracle Database.
- Completed end-to-end design and development of an Apache NiFi flow that acts as the agent between the middleware team and the EBI team and executes all of the actions mentioned above.
- Experience in job workflow scheduling and scheduling tools like NiFi.
- Worked on setting up the Hadoop ecosystem and a Kafka cluster on AWS EC2 instances.
- Worked with different relational database systems such as Oracle (PL/SQL). Used Unix shell scripting and Python; experience working with AWS EMR instances.
- Designed a workflow to help Cloud-transition management decide the correct queries to be run on Google BigQuery (a cost applies to every query executed in BigQuery).
- Designed and implemented MongoDB Cloud Manager for Google Cloud.
- Migrated data from Oracle and SQL Server databases to MongoDB by reverse engineering.
- Experience in automation of code deployment across multiple cloud providers such as Amazon Web Services, Microsoft Azure, Google Cloud, VMware, and OpenStack.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing. Mentored the EQM team on creating Hive queries to test use cases.
- Deployed Kubernetes in both AWS and Google Cloud; set up the cluster and replication and deployed multiple containers.
- Performed Sqoop configuration of JDBC drivers for the respective relational databases, controlling parallelism, the distributed cache, and the import process, as well as compression codecs, importing data into Hive and HBase, incremental imports, configuring saved jobs and passwords, the free-form query option, and troubleshooting.
- Created MapR DB tables and involved in loading data into those tables.
- Collected and aggregated large amounts of streaming data into HDFS using Flume; configured multiple agents, Flume sources, sinks, channels, and interceptors, defined channel selectors to multiplex data into different sinks, and set log4j properties.
- Extensively worked on the ETL mappings, analysis and documentation of OLAP reports
- Responsible for implementation and ongoing administration of MapR 4.0.1 infrastructure.
- Maintained operations, installation, and configuration of a 150+ node cluster with the MapR distribution.
- Monitoring the health of the cluster and setting up alert scripts for memory usage on the edge nodes.
- Experience in Linux systems administration on production and development servers (Red Hat Linux, CentOS, and other UNIX utilities). Worked on NoSQL databases like HBase and created Hive tables on top.
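A simplified sketch of the MapReduce-to-Spark migration pattern referenced above (RDD transformations plus Spark SQL). The resume states the migration was done in Scala; this PySpark version is illustrative only, with hypothetical paths and field positions.

```python
# Hypothetical sketch: a word-count-style MapReduce aggregation rewritten as Spark
# RDD transformations, then exposed through Spark SQL for quick verification.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mr-to-spark").getOrCreate()
sc = spark.sparkContext

# Map phase: emit (key, 1) per record; reduce phase: sum counts per key.
lines = sc.textFile("/data/raw/events.tsv")  # placeholder HDFS path
pairs = lines.map(lambda line: (line.split("\t")[0], 1))
counts = pairs.reduceByKey(lambda a, b: a + b)

# The same result queried through Spark SQL for easier testing.
df = counts.toDF(["event_key", "cnt"])
df.createOrReplaceTempView("event_counts")
spark.sql("SELECT event_key, cnt FROM event_counts ORDER BY cnt DESC LIMIT 20").show()
```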
Environment: HBase, Hadoop 2.2.4, Hive, Kerberos, Kafka, YARN, Spark, Impala, Solr, Java, Hadoop cluster, HDFS, Ambari, Ganglia, CentOS, RedHat, Windows, MapR, Sqoop, Cassandra.
Confidential - Menomonee Falls, WI
Big Data
Responsibilities:
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Monitoring and support through Nagios and Ganglia.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Created MapR DB tables and involved in loading data into those tables.
- Maintained operations, installation, and configuration of 100+ node clusters with the MapR distribution.
- Installed and configured Cloudera CDH 5.7.0 on RHEL 5.7 and 6.2 64-bit operating systems and was responsible for maintaining the cluster.
- Performed streaming data ingestion with Kafka into the Spark distributed environment.
- Implemented security on the MapR cluster using BoKS and by encrypting data on the fly.
- Continuously monitored and managed the Hadoop cluster through MapR Control System, Spyglass, and Geneos.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Cloudera Navigator installation and configuration using Cloudera Manager.
- Configured rack awareness and performed JDK upgrades using Cloudera Manager.
- Installed and configured Sentry for Hive authorization using Cloudera Manager.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Experience in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
- Actively involved in a proof of concept for a Hadoop cluster in AWS. Used EC2 instances, EBS volumes, and S3 for configuring the cluster.
- Involved in migrating on-premise data to AWS.
- Used Hive and created Hive tables, loaded data from Local file system to HDFS.
- Production experience in large environments using configuration management tools like Chef and Puppet supporting Chef Environment with 250+ servers and involved in developing manifests.
- Created EC2 instances and implemented large multi-node Hadoop clusters in the AWS cloud from scratch using automated scripts with Terraform.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (see the sketch after this list).
- Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
- Assist in Install and configuration of Hive, Pig, Sqoop, Flume, Oozie and HBase on the Hadoop cluster with latest patches.
- Worked with Spark to improve performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Responsible for copying 400 TB of HDFS snapshot from Production cluster to DR cluster.
- Responsible for copying a 210 TB HBase table from the Production to the DR cluster.
- Created Solr collections and replicas for data indexing.
- Administered 150+ Hadoop servers, handling Java version updates, the latest security patches, OS-related upgrades, and hardware-related outages.
- Implemented Cluster Security using Kerberos and HDFS ACLs.
- Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
- Experience in setting up Test, QA, and Prod environment. Written Pig Latin Scripts to analyze and process the data.
- Involved in loading data from the UNIX file system to HDFS. Led root cause analysis (RCA) efforts for high-severity incidents.
- Investigate the root cause of Critical and P1/P2 tickets.
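A minimal sketch of the S3-to-Spark work mentioned above (importing data from AWS S3 into Spark RDDs and applying transformations and actions). Bucket names and paths are hypothetical, and it assumes S3 access is already configured on the cluster (IAM role or core-site credentials) with the hadoop-aws libraries available.

```python
# Hypothetical sketch: load raw CSV objects from S3 into an RDD, filter malformed
# rows, and persist the cleaned output to HDFS. All locations are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-ingest").getOrCreate()
sc = spark.sparkContext

rdd = sc.textFile("s3a://example-bucket/raw/2020/*.csv")  # placeholder S3 location

# Transformation: keep only well-formed rows; action: count them.
parsed = (rdd.map(lambda line: line.split(","))
             .filter(lambda fields: len(fields) == 5))
print("valid rows:", parsed.count())

# Write the cleaned data back to HDFS for downstream Hive/Spark jobs.
parsed.map(lambda fields: ",".join(fields)).saveAsTextFile("/data/cleaned/2020")
```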
Environment: Cloudera, Apache Hadoop, HDFS, YARN, Cloudera Manager, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, AWS, Pig, Spark, Hive, Docker, HBase, Python, LDAP/AD, NoSQL, Golden Gate, EM Cloud Control, Exadata Machines X2/X3, Toad, MySQL, PostgreSQL, Teradata.
Confidential
Hadoop Administrator
Responsibilities:
- Working as Hadoop Administrator with Cloudera Distribution of Hadoop (CDH).
- Installed, configured, and maintained Apache Hadoop and Cloudera Hadoop clusters for application development and Hadoop tools like HDFS, Hive, HBase, ZooKeeper, and MapReduce.
- Managing and scheduling Jobs on Hadoop Clusters using Apache, Cloudera (CDH5.7.0, CDH5.10.0) distributions.
- Successfully upgraded the Cloudera Distribution of Hadoop (CDH) stack from 5.7.0 to 5.10.0.
- Installed and configured a Cloudera Distribution of Hadoop (CDH) manually through command line.
- Maintained operations, installation, and configuration of 150-node clusters with the CDH distribution.
- Monitored multiple Hadoop cluster environments, workload, job performance, and capacity planning using Cloudera Manager.
- Created instances in AWS and migrated data to AWS from the data center using Snowball and AWS Migration Service.
- Created graphs for each HBase table in Cloudera based on writes, reads, and file size in respective dashboards.
- Exported Cloudera logs and created dashboards in Grafana using the JMX exporter and Prometheus.
- Installed and configured Solr in Cloudera to query HBase data.
- Worked on setting up high availability for major Hadoop components such as the NameNode, ResourceManager, Hive, and Cloudera Manager.
- Created new users, principals, and keytabs in different Kerberized clusters.
- Participated in the 30-day patching cycle with the operations team on Hadoop clusters.
- Installed and configured Hadoop cluster across various environments through Cloudera Manager.
- Managed, monitored, and troubleshot the Hadoop cluster using Cloudera Manager (see the monitoring sketch after this list).
- Enabled TLS between Cloudera Manager and agents.
- Tuned the performance of HBase and HDFS to withstand heavy writes and reads by changing configurations.
- Installed and configured Phoenix to query HBase data in the Cloudera environment.
- Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
- Involved in start to end process of Hadoop cluster setup which includes Configuring and Monitoring the Hadoop Cluster.
- Installed the NameNode, Secondary NameNode, YARN (ResourceManager, NodeManager, ApplicationMaster), and DataNodes.
- Handling and generating tickets via the BMC Remedy ticketing tool.
- Commissioning and Decommissioning Hadoop Cluster Nodes Including Load Balancing HDFS block data.
- Monitoring performance and tuning configuration of services in Hadoop Cluster.
- Experienced in managing and reviewing Hadoop log files.
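A minimal sketch of the Cloudera Manager based monitoring described above, polling cluster and service health through the Cloudera Manager REST API from Python. Host, port, credentials, and the API version are placeholders; endpoint and field names should be verified against the CM API documentation for the version in use.

```python
# Hypothetical sketch: list clusters and per-service health via the Cloudera Manager
# REST API. Connection details and the API version are placeholders.
import requests

CM = "http://cm-host.example.com:7180/api/v19"  # placeholder CM host and API version
AUTH = ("admin", "admin")                        # placeholder credentials

def service_health(cluster_name):
    """Return a {service_name: health_summary} map for one cluster."""
    resp = requests.get(f"{CM}/clusters/{cluster_name}/services", auth=AUTH, timeout=30)
    resp.raise_for_status()
    return {svc["name"]: svc.get("healthSummary", "UNKNOWN")
            for svc in resp.json().get("items", [])}

if __name__ == "__main__":
    clusters = requests.get(f"{CM}/clusters", auth=AUTH, timeout=30).json().get("items", [])
    for cluster in clusters:
        for svc, health in service_health(cluster["name"]).items():
            print(f"{cluster['name']}\t{svc}\t{health}")
```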
Environment: Linux, Shell Scripting, Teradata, SQL Server, Cloudera 5.7/5.8/5.9, Hadoop, Flume, Sqoop, Pig, Hive, ZooKeeper, and HBase.