Hadoop Admin/Big Data Project Manager Resume


PROFESSIONAL SUMMARY:

  • Highly skilled Hadoop Administrator with extensive knowledge of scripting and programming languages. Possesses strong abilities in administering large data clusters in big data environments and is highly analytical, with excellent problem-solving skills. Holds a Master’s Degree in Computer Science and has more than 9 years of IT experience, including 5+ years with the Big Data ecosystem and related technologies.
  • Hands-on experience in installation, configuration, supporting and managing Hadoop clusters using Hortonworks and Cloudera.
  • Experience in installing, patching, upgrading, and configuring Linux-based operating systems and maintaining UNIX infrastructure.
  • Sound knowledge of building and configuring Cloudera Hadoop clusters in AWS.
  • Experience in implementing new clusters from scratch and performing live data migration from the old cluster to the newly built one without affecting running production jobs.
  • Experience in building end-to-end data pipelines in Amazon Web Services (AWS) using AWS Batch, Amazon RDS, and AWS CloudWatch, and using databases hosted on AWS EC2 instances such as Teradata and MySQL.
  • Experience in building end-to-end pipelines in Azure using Event Hubs, Spark, Delta, and Azure Data Lake Storage (ADLS) Gen2.
  • Experience in cloud technologies: Azure and AWS.
  • Excellence in handling big data and cloud ecosystems such as Azure, Apache Hadoop, Spark, HDFS, Sqoop, and Hive.
  • Excellent understanding of Hadoop cluster security; implemented secure Hadoop clusters using Kerberos and PAM.
  • Experienced in providing project effort estimations, work breakdown structures, risk analysis, mitigation, and gap analysis.
  • Capable of handling multiple projects and delivering high-quality results.
  • Experienced in resource planning, organizing, prioritizing, and delegating assignments based on customer needs.
  • Experienced in project progress tracking, reporting, and deliverables tracking.
  • Experienced in waterfall and agile methodologies.
  • Experience in providing on-call for Production Support.

PROFESSIONAL EXPERIENCE:

Confidential

Hadoop Admin/Big Data Project Manager

Responsibilities:

  • Worked in a dual role as Project Manager and Tech Lead. Resolved technical issues raised by business users.
  • Proactively communicated project status, issues, and risks to internal stakeholders and top management.
  • Conducted regular project status/quality meetings and reports, keeping management and key stakeholders informed on a weekly basis of project progress, status, and concerns for each assignment.
  • Assessed business implications for each project phase and monitored progress to meet deadlines, standards and cost targets.
  • Responsible for setting up weekly status meetings with the team and reporting monthly status to the program manager.
  • Extracted log files from different sources and loaded them into HDFS for analysis using Flume.
  • Created Pig scripts implementing the required logic and computation.
  • Automated jobs that pull data from different sources and load it into HDFS tables using Oozie workflows.
  • Interfaced with SMEs, analytics team account managers, and domain architects to review to-be-developed solutions.
  • Converted high-level solutions into deliverables by generating a controllable and manageable activity list.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Articulated project goals and scope, translated business needs into technical terms, prepared detailed work breakdown structures (WBS) and instilled shared accountability for achieving project milestones.
  • Responsible for leading the complete technical life cycle of each project, from the 10-day estimate through testing and deployment to project closure.
  • Responsible for coordinating conflicting project requirements with other stakeholders while defining the project scope.

Confidential

Hadoop Admin

Responsibilities:

  • Worked on building a Hortonworks DataFlow (HDF) environment as a POC.
  • Built Non-production and Production HDF environment.
  • Involved in cluster design, capacity planning, cluster setup, monitoring, structure planning, and administration.
  • Set up SiteScope alerts for file system and CPU utilization on the servers.
  • Developed and documented best practices.
  • Involved in installing Alation (Data Catalog tool) as a POC.
  • Worked on deploying Alation to production with the required licenses.
  • Working on setting up a Dataiku POC from scratch and integrating it with Alation.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Built a pipeline to consume and transform real-time data from Azure Event Hubs and store it on the Databricks File System (DBFS) as Delta tables and in Azure Data Lake Storage (ADLS) Gen2, making it accessible to Power BI and other gateway services (see the streaming sketch after this list).
  • Worked on job management using the Capacity Scheduler and developed job-processing scripts using Oozie workflows.
  • Configured, deployed, and maintained multi-node Dev and Test clusters.
  • Pipelined auction transactional data in Amazon Web Services (AWS), transformed it, and made it available to team members to run several machine learning models. Also streamlined the output of these models, combined it with the appropriate groups, and collaborated with other teams to generate a global data table available across the yard units. Used AWS Batch, Amazon RDS, and AWS CloudWatch.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory configuration (see the tuning sketch after this list).
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Experienced in handling large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
  • Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment, with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Analyzed the SQL scripts and designed the solution to implement them using PySpark.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries (see the Hive sketch after this list).
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Used reporting tools such as Tableau to connect to Hive and generate daily data reports.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Performed HDFS cluster support and maintenance tasks such as commissioning and decommissioning servers without any impact on the existing data.
  • Provided 24x7 production support.
  • Performed system/cluster configuration and health checks on a daily basis.
  • Assisted the development team in identifying the root cause of slow-performing jobs/queries.
  • Worked with IT service management and other groups to ensure all events, incidents, and problems were resolved per the SLA.
  • Monitored running applications and provided guidance to developers on improving DB performance. Managed and reviewed Hadoop log files.
  • Performed root-cause analysis on failed components and implemented corrective measures.
  • Managed and monitored file systems through automated scripts.
  • Continuously monitored and managed the Hadoop cluster through Ambari.
  • Resolved tickets submitted by users, troubleshooting errors and resolving the underlying issues.
  • Worked with the data science development team and maintained the IBM Data Science tool.
  • Worked with Anaconda to install and provision new Python packages for users.
  • Project 3: Hadoop Administrator consultant, enhancing Hadoop security.
  • Designed and implemented a new Hadoop platform (IOP 4.2.0, IOP-Utils 1.2, and Hortonworks 2.6) on Red Hat 7 to ingest all the data from multiple sources into the data lake for future ETL.
  • Hands-on installation and configuration of Hortonworks Data Platform (HDP) 2.6.2.
  • Worked on installing the production cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Worked as a core Hadoop admin with responsibilities including software installation, configuration, software upgrades, backup and recovery, cluster setup, daily cluster performance monitoring, and keeping the cluster up and healthy.
  • Designed, developed, and implemented connectivity to products that allow efficient exchange of data between the core database engine and the Hadoop ecosystem.
  • Involved in defining job flows using Oozie to schedule and manage Apache Hadoop jobs.
  • Implemented NameNode high availability on the Hadoop cluster to overcome the single point of failure.
  • Worked on importing and exporting data between an Oracle database and HDFS/Hive using Sqoop.
  • Monitored and analyzed MapReduce job execution on the cluster at the task level.
  • Extensively involved in cluster capacity planning, hardware planning, and performance tuning of the Hadoop cluster.
  • Wrote automation scripts and set up crontab jobs to keep the cluster stable and healthy.
  • Installed Ambari on an already existing Hadoop cluster.
  • Implemented rack awareness for data locality optimization.
  • Optimized and tuned the Hadoop environments to meet performance requirements.
  • Hands on experience with building multiple non-production and production Hadoop clusters.
  • Documented existing processes and recommended improvements.
  • Shared knowledge and assisted other team members as needed.
  • Assisted with maintenance and troubleshooting of scheduled processes.
  • Participated in development of system test plans and acceptance criteria.
  • Collaborated with the offshore development team to monitor ETL jobs and troubleshoot failing steps.
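
The streaming work referenced above (Azure Event Hubs to Delta on DBFS/ADLS Gen2) can be illustrated with a minimal PySpark Structured Streaming sketch. It assumes the azure-eventhubs-spark connector and Delta Lake are available on the cluster; the connection string, payload schema, and abfss:// paths below are hypothetical placeholders, not the actual project values.

```python
# Minimal PySpark Structured Streaming sketch: Azure Event Hubs -> Delta on ADLS Gen2.
# Placeholders: the connection string, JSON schema, and abfss:// paths are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("eventhubs-to-delta").getOrCreate()

# Event Hubs connection (requires the azure-eventhubs-spark connector on the cluster).
# The connector expects the connection string to be passed through its encrypt() helper.
conn_str = "<EVENT_HUBS_CONNECTION_STRING>"
eh_conf = {
    "eventhubs.connectionString":
        spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn_str)
}

# Expected shape of the JSON payload carried in the Event Hubs message body (hypothetical).
schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = spark.readStream.format("eventhubs").options(**eh_conf).load()

# The message body arrives as binary; cast it to string and parse the JSON payload.
events = (raw
          .select(from_json(col("body").cast("string"), schema).alias("e"))
          .select("e.*"))

# Append to a Delta table on ADLS Gen2 so Power BI and other gateway services can read it.
(events.writeStream
 .format("delta")
 .outputMode("append")
 .option("checkpointLocation",
         "abfss://container@account.dfs.core.windows.net/checkpoints/events")
 .start("abfss://container@account.dfs.core.windows.net/delta/events"))
```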
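As a rough illustration of the Spark tuning and ingestion patterns listed above (explicit parallelism, executor memory, broadcast joins, partitioned writes), the following sketch uses hypothetical table names and configuration values; real settings would depend on cluster size and data volume.

```python
# Illustrative Spark tuning sketch: explicit parallelism and memory settings, a broadcast
# join for a small dimension table, and partitioned Parquet output during ingestion.
# Table names, paths, and configuration values are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("ingest-tuning-example")
         .config("spark.sql.shuffle.partitions", "400")  # sized to cluster cores, not the default 200
         .config("spark.executor.memory", "8g")
         .config("spark.executor.cores", "4")
         .enableHiveSupport()
         .getOrCreate())

transactions = spark.table("staging.transactions")   # large fact table
dim_products = spark.table("staging.dim_products")   # small dimension table

# Broadcasting the small table avoids a shuffle-heavy join while the data is being ingested.
enriched = transactions.join(broadcast(dim_products), "product_id")

# Repartition on the partition column before writing so output files line up with Hive partitions.
(enriched.repartition("load_date")
 .write.mode("overwrite")
 .partitionBy("load_date")
 .parquet("/data/curated/transactions_enriched"))
```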
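The Hive work described above (external tables over Parquet and queries that build data cubes for visualization) might look roughly like the following Spark SQL sketch; the database, table, column names, and location are illustrative only.

```python
# Hypothetical Hive DDL and cube-style aggregation run through Spark SQL with Hive support.
# Database, table, and column names, and the HDFS location, are illustrative.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-cube-example")
         .enableHiveSupport()
         .getOrCreate())

# External Hive table over Parquet files, partitioned by load date.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.sales (
        product_id STRING,
        region     STRING,
        amount     DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION '/data/curated/sales'
""")

# A simple cube-style rollup that a reporting tool such as Tableau could consume.
cube = spark.sql("""
    SELECT region, product_id, load_date, SUM(amount) AS total_amount
    FROM analytics.sales
    GROUP BY region, product_id, load_date WITH CUBE
""")
cube.write.mode("overwrite").saveAsTable("analytics.sales_cube")
```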

Confidential

Elasticsearch Developer

Responsibilities:

  • Associates within the company use this environment to see the new and updated products created by the company. Created a new environment and set up search-engine technology for associates' use, which brought a significant change for the company as well as the client in terms of business management.
  • Data generated when a new product is created, along with all updates made to previous products, is captured and sent to the Elasticsearch engine through a new pipeline (see the indexing sketch after this list).
  • Later, for security purposes, these data are uploaded to the cloud and handled by Hadoop and big data technologies.
  • Responsible for new product data.
  • Introduced high-end Elasticsearch technology into MSC Direct as a new search technology and information-retrieval process.
  • Performed database modeling and design; involved in developing and implementing the search using Java and Python.
  • Developed new applications to convert file formats and ingest the data into Elasticsearch.
  • Hands-on development of code focused on big data technologies such as Pig, Hive, Spark, MapReduce, and Sqoop.
  • Used RESTful APIs to gather sales-related data for products.
  • Performed database testing using SQL.
  • Used AngularJS to build web pages.
  • Used SVN as the source version repository and pushed code automatically onto production servers, maintaining compatibility throughout development.
  • Aided in establishing KT documents and process documents within the project and team, thus promoting and contributing to the overall knowledge base maintained in a general repository.
  • Handled project estimation, delegation, planning, and execution.
  • Worked with team members to develop the new search environment.
  • Designed and implemented high-performance, large-volume data integration, storage, and high-end services.
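
A minimal sketch of the product-indexing pipeline described above, assuming a recent elasticsearch-py client; the host, index name, and document fields are hypothetical placeholders rather than the actual project values.

```python
# Minimal sketch of bulk-indexing product documents into Elasticsearch.
# Assumes a recent elasticsearch-py client; host, index name, and fields are hypothetical.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")

# New or updated product records gathered from the upstream pipeline (illustrative data).
products = [
    {"sku": "A-1001", "name": "Cordless Drill", "category": "Power Tools"},
    {"sku": "B-2002", "name": "Safety Gloves", "category": "Safety"},
]

# Using the SKU as the document id means re-indexing an updated product overwrites
# the previous version, keeping associates' search results current.
actions = (
    {"_index": "products", "_id": p["sku"], "_source": p}
    for p in products
)
bulk(es, actions)

# Associates can then run full-text queries against the catalog, e.g. by product name.
hits = es.search(index="products", query={"match": {"name": "drill"}})
print(hits["hits"]["hits"])
```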

Confidential

Software Developer

Responsibilities:

  • Analyzed system requirements and led Java/J2EE project development, application enhancement, and maintenance.
  • Analyzed and designed reports based on Crystal Reports and Java.
  • Gathered requirements, analyzed the entire system, and provided estimates for development and testing efforts.
  • Loaded data from disparate data sets.
  • Performed analysis of vast data stores and uncovered insights.
  • Designed and maintained schemas in the analytical database and wrote efficient SQL for loading and querying analytics data.
  • Designed and developed code that consistently adheres to functional programming principles.
