
Big Data Engineer/ Kafka Admin Resume


San Francisco, CA

SUMMARY:

  • 8+ years of professional IT experience, including 3+ years of proven experience in Hadoop administration on Cloudera (CDH), Hortonworks (HDP), vanilla Hadoop, and MapR distributions, and 3+ years of experience in AWS, Kafka, Elasticsearch, DevOps, and Linux administration.
  • Proficient with Shell, Python, Ruby, YAML, Groovy scripting languages & Terraform.
  • Configured Elastic Load Balancing (ELB) for routing traffic between availability zones, and used Route 53 with failover and latency-based routing for high availability and fault tolerance.
  • Site Reliability Engineering responsibilities for a Kafka platform that scales to 2 GB/sec and 20 million messages/sec.
  • Worked on analyzing data with Hive and Pig.
  • Combined views and reports into interactive dashboards in Tableau Desktop that were presented to Business Users, Program Managers, and End Users.
  • Used Bash and Python (including Boto3) to supplement automation provided by Ansible and Terraform for tasks such as encrypting the EBS volumes backing AMIs and scheduling Lambda functions for routine AWS tasks (see the AWS CLI sketch after this list).
  • Experienced in authoring POM.xml files, performing releases with Maven release plugin, modernization of Java projects, and managing Maven repositories.
  • Configured Elasticsearch for log collection and Prometheus and CloudWatch for metrics collection.
  • Performed branching, tagging, and release activities on version control tools: SVN and GitHub.
  • Implemented and managed DevOps infrastructure architecture with Terraform, Jenkins, Puppet, and Ansible; responsible for CI and CD infrastructure, processes, and deployment strategy.
  • Experience architecting, designing, and implementing large-scale distributed data processing applications built on distributed key-value stores over Hadoop, HBase, Hive, MapReduce, and YARN, and other Hadoop ecosystem components such as Hue, Oozie, Spark, Sqoop, Pig, and ZooKeeper.
  • Commissioned DataNodes as data grew and decommissioned them when the hardware degraded.
  • Experience implementing NameNode High Availability and Hadoop cluster capacity planning; experience in benchmarking and in backup and disaster recovery of NameNode metadata and important, sensitive data residing on the cluster.
  • Experience creating S3 buckets, managing policies for S3 buckets, and utilizing S3 and Glacier for storage, backup, and archival in AWS.
  • Experience in setup and maintenance of Auto Scaling AWS stacks.
  • Team player and self-starter with effective communication, motivation, and organizational skills combined with attention to detail and a focus on business process improvement; hard worker with the ability to meet deadlines on or ahead of schedule.
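
A minimal sketch of the AMI/EBS encryption step mentioned above, assuming the AWS CLI is installed and configured; the AMI ID, region, and KMS key alias below are hypothetical placeholders, not values from this resume:

    #!/usr/bin/env bash
    # Copy an AMI so that its backing EBS snapshots are encrypted under a KMS key.
    # All identifiers here are hypothetical placeholders.
    SOURCE_AMI="ami-0123456789abcdef0"
    REGION="us-west-2"
    KMS_KEY="alias/ebs-backup-key"

    aws ec2 copy-image \
      --source-image-id "$SOURCE_AMI" \
      --source-region "$REGION" \
      --region "$REGION" \
      --name "encrypted-copy-of-$SOURCE_AMI" \
      --encrypted \
      --kms-key-id "$KMS_KEY"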

TECHNICAL SKILLS:

Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.

Hadoop Distribution: Cloudera Distribution of Hadoop (CDH).

Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server

Servers: WebLogic, WebSphere, and JBoss.

Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.

Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit.

Database: MySQL, NoSQL, Couchbase, InfluxDB, Teradata, HBase, MongoDB, Cassandra, Oracle.

Processes: Incident Management, Release Management, Change Management.

PROFESSIONAL EXPERIENCE:

Big Data Engineer/ Kafka Admin

Confidential, San Francisco, CA

Responsibilities:

  • Used SnapLogic for child pipelines.
  • Split work into parent and child pipelines to distribute execution across multiple nodes.
  • Split and joined data flows and embedded pipelines, which is useful for performing web service calls in parallel; forked data either directly using a Snap or by embedding a pipeline.
  • Designed and implemented topic configurations in the new Kafka cluster in all environments.
  • Successfully secured the Kafka cluster with Kerberos
  • Implemented Kafka security features using SSL, without Kerberos; later, for more fine-grained security, set up Kerberos with users and groups to enable more advanced security features.
  • Successfully upgraded the Cloudera Hadoop cluster from CDH 5.4 to CDH 5.6.
  • Involved in enabling SSL for Hue on the on-premises CDH cluster.
  • Exported data to Teradata using Sqoop; data was stored in Vertica database tables, and Spark was used to load the data from the Vertica tables.
  • Worked with 50+ source systems and received batch files from heterogeneous systems such as Unix, Windows, Oracle, Teradata, mainframe, and DB2; migrated 1000+ tables from Teradata to HP Vertica.
  • Developed Shell and Python scripts to automate and provide control flow for Pig scripts; imported data from the Linux file system into HDFS.
  • Expertise in designing Python scripts to interact with middleware/back end services.
  • Good experience in writing Spark applications using Python and Scala.
  • Designed and developed automation test scripts using Python.
  • Responsible for ingesting data from various source systems (RDBMS, flat files, big data) into Azure Blob Storage using a framework model.
  • Primarily involved in the data migration process using SQL, SQL Azure, SQL Azure DW, Azure Storage, and Azure Data Factory for Azure subscribers and customers.
  • Implemented custom Azure Data Factory pipeline activities and SCOPE scripts.
  • Primarily responsible for creating new Azure subscriptions, data factories, virtual machines, SQL Azure instances, SQL Azure DW instances, and HDInsight clusters, and installing DMGs on VMs to connect to on-premises servers.
  • Created Hive tables to store the processed results in a tabular format. Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
  • Worked on ETL tool Informatica, Oracle Database and PL/SQL, Python and Shell Scripts.
  • Experience with ETL using Hive and MapReduce.
  • Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes. Strong experience in Data Warehousing and ETL using Datastage.
  • Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts, and visualized the streaming data in Tableau dashboards.
  • Performed analytics using MapReduce, Hive, and Pig on HDFS, sent the results back to MongoDB databases, and updated information in collections.
  • Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
  • Experience with Microsoft Azure big data services - HDInsight, Hadoop, Hive, Power BI, Azure SQL Data Warehouse. Knowledge of Azure Machine Learning (R language) and predictive analytics, Pig, HBase, MapReduce, MongoDB, Spotfire, Tableau.
  • Experience in machine learning using NLP text classification and churn prediction with Python.
  • Experience leading the MS BI Centre of Excellence (COE) and competency, which included delivering training on MS BI tools (DW, SSIS, SSAS, SSRS), providing architecture solutions, and supporting the MS BI practice.
  • Develop and implement innovative AI and machine learning tools that will be used in the Risk
  • Identify and assess available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
  • Provided technical solutions on MS Azure HDInsight, Hive, HBase, MongoDB, Telerik, Power BI, Spotfire, Tableau, Azure SQL Data Warehouse, data migration techniques using BCP and Azure Data Factory, and fraud prediction using Azure Machine Learning.
  • Used cutting-edge data mining and machine learning techniques to build advanced customer solutions.
  • Experience in managing multi-tenant Cassandra clusters on public cloud environments - Amazon Web Services (AWS) EC2 and Rackspace - and on private cloud infrastructure - the OpenStack cloud platform.
  • Performed data requirements analysis, data modeling (using Erwin), and established data architecture standards.
  • Performed dimensional data modeling using Erwin to support data warehouse design and ETL development activities.
  • Designed and worked with Cassandra Query Language (CQL); knowledgeable in Cassandra read and write paths and internal architecture.
  • Implemented multi-datacenter and multi-rack Cassandra clusters.
  • Experience using DSE Sqoop for importing data from RDBMS to Cassandra
  • As an ETL tester, responsible for understanding the business requirements, creating test data, and designing test cases.
  • Involved in migrating the MySQL database to Oracle and the PSQL database to Oracle.
  • Experience in writing SQL queries to process joins on Hive tables and NoSQL databases.
  • Worked with transform components such as Aggregate, Router, Sort, Filter by Expression, Join, Normalize, and Scan; created appropriate DMLs and automated load processes using Autosys.
  • Good understanding and extensive work experience on SQL and PL/SQL.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Migrated the existing data to Hadoop from RDBMS (SQL Server and Oracle) using Sqoop for processing the data.
  • DBMS development included building data migration scripts using Oracle SQL*Loader.
  • Extensively used the MapReduce component of Hadoop.
  • Responsible for importing and exporting data into HDFS and Hive.
  • Analyzed data using Hadoop components Hive and Pig.
  • Responsible for writing Pig scripts to process the data in the integration environment
  • Responsible for setting up HBASE and storing data into HBASE
  • Responsible for managing and reviewing Hadoop log files
  • Responsible for running Hadoop streaming jobs to process terabytes of XML data.
  • Load and transform large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
  • Helped the business understand reporting tools better by doing POCs on MicroStrategy, Tableau, and JasperSoft.
  • Worked extensively on dashboards/scorecards and grid reports using MicroStrategy and Tableau.
  • Worked closely with the application/business and database teams to understand the functionality better and develop performance-oriented reports.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration. Worked on YUM configuration and package installation through YUM.
  • Developed simple and complex MapReduce programs in Java for Data Analysis.
  • Designed and architected the build-out of a new Hadoop cluster.
  • Implemented AWS and Azure-Omni for the Couchbase load.
  • Developed and deployed custom Hadoop applications for data analysis, data storage, and processing in Amazon EMR.
  • Worked on integration of HiveServer2 with Tableau.
  • Involved in Installing, Configuring and Administration of Tableau Server.
  • Involved in loading data from LINUX filesystem to HDFS.
  • Involved in Troubleshooting, Performance tuning of reports and resolving issues within Tableau Server and Reports.
  • Deployed Spark Cluster and other services in AWS using console.
  • Installed a Kerberos-secured Kafka cluster with no encryption in Dev and Prod; also set up Kafka ACLs on it.
  • Set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener, and tested non-authenticated (anonymous) users in parallel with Kerberos users.
  • Integrated LDAP configuration; this included integrating LDAP to secure Ambari servers and to manage authorization and permissions for users and groups.
  • Implemented Knox, Ranger, Spark, and SmartSense in the Hadoop cluster.
  • Installed HDP 2.6 in all environments
  • Installed Ranger in all environments for a second level of security on the Kafka brokers.
  • Involved in Data Ingestion Process to Production cluster.
  • Worked on Oozie Job Scheduler
  • Worked on Spark transformation processes, RDD operations, and Data Frames; validated the Spark plug-in for the Avro data format (receiving gzip-compressed data and producing Avro data into HDFS files).
  • Installed Docker to run ELK, InfluxDB, and Kerberos.
  • Involved in defining test automation strategy and test scenarios, created automated test cases, test plans and executed tests using Selenium WebDriver and JAVA.
  • Architected Selenium framework which has integrations for API automation, database automation and mobile automation.
  • Executed and maintained Selenium test automation scripts.
  • Created a database in InfluxDB, worked on the interface created for Kafka, and checked the measurements in the databases.
  • Created a Bash script with awk-formatted text to send metrics to InfluxDB.
  • Enabled InfluxDB and configured an InfluxDB data source in the Grafana interface.
  • Deployed Elasticsearch 5.3.0 and InfluxDB 1.2 on the Prod machine in Docker containers.
  • Created a cron job that executes a program to start the ingestion process; the data is read in, converted to Avro, and written to HDFS files.
  • Designed the data flow ingestion chart process.
  • Set up a new Grafana dashboard with real-time consumer lag in the Dev and PP clusters, pulling only the consumer lag metric and sending it to InfluxDB via a script in crontab (see the script sketch after this list).
  • Worked on database schema DDL (Oracle schema) issues at the time of the Ambari upgrade.
  • Upgraded HDP 2.5 to 2.6 in all environments as part of software patches and upgrades.
  • Worked on the Kafka backup index, minimized logs with the Log4j appender, and pointed Ambari server logs to NAS storage.
  • Deployed a data lake cluster with Hortonworks Ambari on AWS using EC2 and S3.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
  • Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms.
  • Set up Hortonworks infrastructure, from configuring clusters down to the node level.
  • Installed Ambari Server in the cloud.
  • Set up security using Kerberos and AD on Hortonworks clusters and Cloudera CDH.
  • Assigned access to users for multi-user login.
  • Tested all services, such as Hadoop, ZooKeeper, Spark, HiveServer, and Hive Metastore.
  • Worked on SNMP Trap Issues in Production Cluster.
  • Worked on heap optimization and changed some of the configurations for hardware optimization.
  • Involved in working with Ambari Views in Production.
  • Implemented Rack Awareness in Production Environment.
  • Worked on disk space issues in the Production environment by monitoring how fast space was filling and reviewing what was being logged, and created a long-term fix for the issue (minimizing Info, Debug, Fatal, and Audit logs).
  • Worked on Nagios Monitoring tool.
  • Installed Kafka Manager for consumer lag and for monitoring Kafka metrics; it was also used for adding topics, partitions, etc.
  • Worked with the Hortonworks support team on Grafana consumer lag issues (currently no consumer lag is generated in the Grafana visualization within HDP).
  • Generated consumer group lag from Kafka using its API.
  • Installed and configured Ambari Log Search; under the hood it requires a Solr instance, and it collects and indexes all cluster-generated logs in real time and displays them in one interface.
  • Installed Ansible 2.3.0 in Production Environment
  • Worked on maintenance of the Elasticsearch cluster by adding more partitioned disks; this increases disk-write throughput and lets Elasticsearch write to multiple disks at the same time, while a segment of a given shard is written to the same disk.
  • Upgraded Elasticsearch from 5.3.0 to 5.3.2 following the rolling upgrade process, using Ansible to deploy the new packages in the Prod cluster.
  • Built visualizations in Kibana.
  • Deployed Kibana with Ansible and connected it to the Elasticsearch cluster; tested Kibana and ELK by creating a test index and injecting sample data into it.
  • Tested Kafka ACLs with anonymous users and with different hostnames.
  • Created HBase tables to store variable formats of data coming from different applications.
  • Worked on Production Support Issues
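
A minimal sketch of the crontab consumer-lag script referenced above, assuming an HDP-style Kafka install and an InfluxDB 1.x /write endpoint; the broker address, consumer group, database name, and paths are hypothetical placeholders:

    #!/usr/bin/env bash
    # Pull consumer lag with the Kafka CLI, reshape it into InfluxDB line
    # protocol with awk, and POST it to InfluxDB (run, e.g., every minute from cron).
    BROKERS="broker1:6667"
    GROUP="app-consumer-group"
    INFLUX_URL="http://influxhost:8086/write?db=kafka_metrics"

    /usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh \
        --bootstrap-server "$BROKERS" --describe --group "$GROUP" 2>/dev/null |
    awk -v group="$GROUP" '
        # Assumed columns: TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG ...
        # (column order differs between Kafka versions; adjust field numbers to match)
        $1 != "TOPIC" && $5 ~ /^[0-9]+$/ {
            printf "consumer_lag,group=%s,topic=%s,partition=%s lag=%s\n", group, $1, $2, $5
        }' |
    curl -s -XPOST "$INFLUX_URL" --data-binary @-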

Big Data Engineer - Hadoop Administrator

Confidential, Philadelphia, PA

Responsibilities:

  • Responsible for implementation and support of the Enterprise Hadoop environment.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Scala functional programming concepts to develop business logic.
  • Azure Cloud Infrastructure design and implementation utilizing ARM templates.
  • Orchestrated hundreds of Sqoop scripts, Python scripts, and Hive queries using Oozie workflows and sub-workflows (a sample Sqoop import is sketched after this list).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala; wrote Spark scripts using Scala shell commands as per the requirements.
  • Processed schema-oriented and non-schema-oriented data using Scala and Spark.
  • Designed and developed a system to collect data from multiple portals using Kafka and then process it using Spark.
  • Used Teradata Viewpoint for Query Performance monitoring.
  • Knowledge of installation and configuration of Cloudera Hadoop in single-node or cluster environments.
  • Hands-on with Teradata Queryman and Administrator to interface with Teradata.
  • Designing ETL jobs as per business requirements.
  • Developed ETL jobs following organization- and project-defined standards and processes.
  • Actively involved in SQL and Azure SQL DW code development using T-SQL
  • Troubleshooting Azure Data Factory and SQL issues and performance.
  • Component unit testing using Azure Emulator.
  • Analyze escalated incidences within the Azure SQL database. Implemented test scripts to support test driven development and continuous integration.
  • Worked on ETL tool Informatica, Oracle Database and PL/SQL, Python and Shell Scripts.
  • Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes. Strong experience in Data Warehousing and ETL using Datastage.
  • Worked on MicroStrategy report development, analysis, providing mentoring, guidance and troubleshooting to analysis team members in solving complex reporting and analytical problems.
  • Extensively used filters, facts, Consolidations, Transformations and Custom Groups to generate reports for Business analysis.
  • Worked on the design and development of MicroStrategy dashboards and interactive documents using MicroStrategy Web and Mobile.
  • Extracted data from SQL Server 2008 into data marts, views, and/or flat files for Tableau workbook consumption using T-SQL. Partitioned and queried the data in Hive for further analysis by the BI team.
  • Managed Tableau extracts on Tableau Server and administered Tableau Server.
  • Worked extensively on data extraction, transformation, and loading from Oracle to Teradata using BTEQ, FastLoad, and MultiLoad.
  • Extensively used the Teradata FastLoad and MultiLoad utilities to load data into tables.
  • Used Teradata SQL Assistant to build the SQL queries
  • Did data reconciliation in various source systems and in Teradata.
  • Involved in writing complex SQL queries using correlated sub queries, joins, and recursive queries.
  • Worked extensively on date manipulations in Teradata.
  • Extracted data from Oracle using SQL scripts, loaded it into Teradata using FastLoad/MultiLoad, and transformed it according to business transformation rules to insert/update the data in data marts.
  • Installation and configuration, Hadoop Cluster and Maintenance, Cluster Monitoring, Troubleshooting and certifying environments for production readiness.
  • Experience in Implementing Hadoop Cluster Capacity Planning
  • Involved in the installation of CDH5 and the upgrade from CDH4 to CDH5.
  • Upgraded Cloudera Manager from version 5.3 to 5.5.
  • Extensive experience in cluster planning, installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as Cloudera and Hortonworks.
  • Installing, Upgrading and Managing Hadoop Cluster on Hortonworks
  • Hands on experience using Cloudera and Hortonworks Hadoop Distributions.
  • Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms and NiFi.
  • Set up Hortonworks infrastructure, from configuring clusters down to the node level.
  • Worked with release management technologies such as Jenkins, GitHub, GitLab, and Ansible.
  • Worked in a DevOps model with Continuous Integration and Continuous Deployment (CI/CD); automated deployments using Jenkins and Ansible.
  • Completed end-to-end design and development of an Apache NiFi flow that acts as the agent between the middleware team and the EBI team and executes all the actions mentioned above.
  • Responsible for onboarding new users to the Hadoop cluster (adding a home directory for the user and providing access to the datasets).
  • Helped the users in production deployments throughout the process.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes. Communicate and escalate issues appropriately.
  • Continuous monitoring and managing the Hadoop cluster through Ganglia and Nagios.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs, which run independently with time and data availability.
  • Involved in Storm batch-mode processing over massive data sets, which is analogous to a Hadoop job that runs as a batch process over a fixed data set.
  • Developed data pipeline using Flume, Sqoop, Pig and Java Map Reduce to ingest customer behavioral data into HDFS for analysis.
  • Using Storm terminology, created a topology that runs continuously over a stream of incoming data.
  • Integrated Hadoop with Active Directory and enabled Kerberos for Authentication.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
  • Done stress and performance testing, benchmark for the cluster.
  • Commissioned and decommissioned the Data Nodes in the cluster in case of the problems.
  • Debug and solve the major issues with Cloudera manager by interacting with the Cloudera team.
  • Monitoring the System activity, Performance, Resource utilization.
  • Deep understanding of monitoring and troubleshooting mission critical Linux machines.
  • Kafka: used for building real-time data pipelines between clusters.
  • Ran log aggregation, website activity tracking, and the commit log for distributed systems using Apache Kafka.
  • Designed and implemented Amazon Web Services; as a passionate advocate of AWS within Gracenote, migrated from a physical data center environment to AWS.
  • Focused on high-availability, fault tolerance, and auto-scaling.
  • Managed critical bundles and patches on the production servers after successfully navigating through the testing phase in the test environments.
  • Managing Disk File Systems, Server Performance, Users Creation and Granting file access Permissions and RAID configurations.
  • Integrated Apache Kafka for data ingestion
  • Configured Domain Name System (DNS) for hostname to IP resolution.
  • Involved in data migration from Oracle database to MongoDB.
  • Queried and analyzed data from Cassandra for quick searching, sorting and grouping
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Preparation of operational testing scripts for Log check, Backup and recovery and Failover.
  • Troubleshooting and fixing the issues at User level, System level and Network level by using various tools and utilities.
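
A minimal sketch of one of the Oozie-orchestrated Sqoop imports referenced above; the JDBC connection string, credentials path, table name, and target directory are hypothetical placeholders:

    #!/usr/bin/env bash
    # Import one relational table into HDFS as Avro files; in practice a command
    # like this would be wrapped in a Sqoop action inside an Oozie workflow.
    sqoop import \
        --connect "jdbc:oracle:thin:@//dbhost:1521/ORCL" \
        --username etl_user \
        --password-file /user/etl/.oracle_pwd \
        --table CUSTOMER \
        --target-dir /data/raw/customer \
        --as-avrodatafile \
        --num-mappers 4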

Big Data Operations Engineer - Consultant

Confidential, Indianapolis, IN

Responsibilities:

  • Cluster administration, releases, and upgrades: managed multiple Hadoop clusters with the highest capacity of 7 PB (400+ nodes) with PAM enabled; worked on the Hortonworks distribution.
  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Using Hadoop cluster as a staging environment for the data from heterogeneous sources in data import process.
  • Worked hands-on with the ETL process; handled importing data from various data sources and performed transformations.
  • Extensive experience with Informatica (ETL Tool) for Data Extraction, Transformation and Loading.
  • Extensive experience in building Data Warehouses/Data Marts using ETL tools Informatica Power Center (9.0/8.x/7.x).
  • Experience in developing Unix Shell Scripts for automation of ETL process.
  • Configured High Availability on the name node for the Hadoop cluster - part of the disaster recovery roadmap.
  • Configured Ganglia and Nagios to monitor the cluster and on-call with EOC for support.
  • Involved in working on cloud architecture.
  • Performed both Major and Minor upgrades to the existing cluster and also rolling back to the previous version.
  • Implemented Commissioning and Decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers.
  • Implemented Fair scheduler on the job tracker to allocate the fair amount of resources to small jobs.
  • Maintained, audited, and built new clusters for testing purposes using Ambari on Hortonworks.
  • Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms and NiFi.
  • Set up Hortonworks infrastructure, from configuring clusters down to the node level.
  • Installed Ambari Server in the cloud.
  • Setup security using Kerberos and AD on Hortonworks clusters
  • Designed and allocated HDFS quotas for multiple groups (see the quota commands after this list).
  • Configured Flume for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to HDFS.
  • Manually upgraded from HDP 2.2 to HDP 2.3 as part of software patches and upgrades.
  • Scripting Hadoop package installation and configuration to support fully automated deployments.
  • Configuring Rack Awareness on HDP.
  • Adding new Nodes to an existing cluster, recovering from a Name Node failure.
  • Instrumental in building scalable distributed data solutions using Hadoop eco-system.
  • Adding new Data Nodes when needed and re-balancing the cluster.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Involved in database backup and recovery, database connectivity, and security.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Reviewed utilization based on the running statistics of Map and Reduce tasks.
  • Changed the configuration properties of the cluster based on the volume of data being processed and the performance of the cluster.
  • Provided inputs to development regarding the efficient utilization of resources such as memory and CPU.
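
A minimal sketch of the per-group HDFS quota allocation referenced above; the directory path and quota sizes are hypothetical examples:

    # Cap raw disk space and file/directory counts for one group's directory,
    # then verify the quota and current usage.
    hdfs dfsadmin -setSpaceQuota 10t /data/groups/analytics
    hdfs dfsadmin -setQuota 1000000 /data/groups/analytics
    hdfs dfs -count -q -h /data/groups/analytics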

Hadoop Admin/ Linux Administrator

Confidential, CHICAGO, IL

Responsibilities:

  • Installation and configuration of Linux for new build environment.
  • Day-to-day user access and permissions; installing and maintaining Linux servers.
  • Created volume groups, logical volumes, and partitions on the Linux servers, and mounted file systems.
  • Experienced in installation and configuration of Cloudera CDH4 in the testing environment.
  • Resolved tickets submitted by users and P1 issues; troubleshot and resolved errors.
  • Balanced HDFS manually to decrease network utilization and increase job performance (see the balancer commands after this list).
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Done major and minor upgrades to the Hadoop cluster.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Done stress and performance testing, benchmark for the cluster.
  • Commissioned and decommissioned the Data Nodes in the cluster in case of the problems.
  • Debug and solve the major issues with Cloudera manager by interacting with the Cloudera team.
  • Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers; remote installation of Linux using PXE boot.
  • Monitoring the System activity, Performance, Resource utilization.
  • Develop and optimize physical design of MySQL database systems.
  • Deep understanding of monitoring and troubleshooting mission critical Linux machines.
  • Responsible for maintenance of RAID groups and LUN assignments as per agreed design documents.
  • Extensive use of LVM, creating Volume Groups, Logical volumes.
  • Performed Red Hat Package Manager (RPM) and YUM package installations, patch and other server management.
  • Tested and performed enterprise-wide installation, configuration, and support for Hadoop using the MapR distribution.
  • Set up the cluster and installed all the ecosystem components through MapR and manually through the command line in the lab cluster.
  • Set up automated processes to archive/clean the unwanted data on the cluster, in particular on Name node and Secondary name node.
  • Involved in estimation and setting-up Hadoop Cluster in Linux.
  • Prepared Pig scripts to validate the time series rollup algorithm.
  • Responsible for support and troubleshooting of MapReduce jobs and Pig jobs, and for maintaining incremental loads on a daily, weekly, and monthly basis.
  • Implemented Oozie workflows for MapReduce, Hive, and Sqoop actions.
  • Channeled MapReduce outputs based on requirements using partitioners.
  • Performed scheduled backup and necessary restoration.
  • Build and maintain scalable data using the Hadoop ecosystem and other open source components like Hive and HBase.
  • Monitor the data streaming between web sources and HDFS.

Linux/ Unix Administrator

Confidential

Responsibilities:

  • Experience installing, upgrading, and configuring Red Hat Linux 4.x, 5.x, 6.x using Kickstart servers and interactive installation.
  • Responsible for creating and managing user accounts, security, rights, disk space, and process monitoring in Solaris, CentOS, and Red Hat Linux.
  • Performed administration and monitored job processes using associated commands
  • Managed routine system backups, scheduled jobs, and enabled cron jobs.
  • Maintained and troubleshot network connectivity.
  • Managed patch configuration, version control, and service packs, and reviewed connectivity issues related to security problems.
  • Configured DNS, NFS, FTP, remote access, security management, and server hardening.
  • Installed, upgraded, and managed packages via RPM and YUM package management.
  • Logical Volume Management maintenance
  • Experience administering, installing, configuring and maintaining Linux
  • Created Linux virtual machines using VMware vCenter; administered VMware Infrastructure Client 3.5 and vSphere 4.1.
  • Installed firmware upgrades and kernel patches, and performed system configuration and performance tuning on Unix/Linux systems.
  • Installing Red Hat Linux 5/6 using kickstart servers and interactive installation.
  • Supporting infrastructure environment comprising of RHEL and Solaris.
  • Installation, Configuration, and OS upgrades on RHEL 5.X/6.X/7.X, SUSE 11.X, 12.X.
  • Implemented and administered VMware ESX 4.x 5.x and 6 for running the Windows, Centos, SUSE and Red Hat Linux Servers on development and test servers.
  • Created, extended, reduced, and administered Logical Volume Manager (LVM) volumes in the RHEL environment (see the LVM commands after this list).
  • Responsible for large-scale Puppet implementation and maintenance. Puppet manifests creation, testing and implementation.
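
A minimal sketch of the LVM create/extend tasks referenced above; the device, volume group and logical volume names, sizes, and mount point are hypothetical placeholders:

    # Create a PV, a VG, and a 20 GB LV, put an ext4 filesystem on it, and mount it.
    pvcreate /dev/sdb1
    vgcreate vg_data /dev/sdb1
    lvcreate -L 20G -n lv_data vg_data
    mkfs.ext4 /dev/vg_data/lv_data
    mount /dev/vg_data/lv_data /data

    # Later, grow the LV by 5 GB and resize the ext4 filesystem to match.
    lvextend -L +5G /dev/vg_data/lv_data
    resize2fs /dev/vg_data/lv_data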
