Sr. Big Data Engineer Resume
Raleigh, NC
SUMMARY
- Hadoop developer and analyst with over 8 years of overall experience as a software developer in designing, developing, deploying, and supporting large-scale distributed systems.
- 5+ years of extensive experience as a Hadoop and Spark engineer and Big Data analyst.
- DataStax Cassandra and IBM Big Data University certified.
- Implemented various algorithms for analytics using Cassandra with Spark and Scala.
- Excellent understanding of Hadoop architecture and its underlying framework, including storage management.
- Experienced in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH4 and CDH5.
- Expertise in using Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume, Drill, and Spark for data storage and analysis.
- Experienced in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HiveQL, and in using UDFs from the Piggybank UDF repository (see the UDF sketch at the end of this summary).
- Experienced in running queries using Impala and in using BI tools to run ad-hoc queries directly on Hadoop.
- Good experience with the Oozie framework and with automating daily import jobs.
- Experienced in managing Hadoop clusters and services using Cloudera Manager.
- Experienced in troubleshooting errors in the HBase shell/API, Pig, Hive, and MapReduce.
- Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
- Experienced in creating Vizboards in Platfora for real-time dashboards on Hadoop.
- Collected log data from various sources and integrated it into HDFS using Flume.
- Assisted Deployment team in setting up Hadoop cluster and services.
- Good experience in generating statistics, extracts, and reports from Hadoop.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
- Designed and implemented a product search service using Apache Solr.
- Good knowledge of querying data from Cassandra for searching, grouping, and sorting.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Strong experience in core Java, Scala, SQL, PL/SQL, and RESTful web services.
- Good knowledge of benchmarking and performance tuning of clusters.
- Experienced in identifying improvement areas for system stability and providing end-to-end high-availability architectural solutions.
- Determined, committed and hardworking individual with strong communication, interpersonal and organizational skills.
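
A minimal sketch of the kind of custom Hive UDF described above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API; the class, function, and column names are hypothetical and not taken from any of the projects below.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: normalizes free-text codes before analysis in HiveQL.
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase.replaceAll("\\s+", "_"))
  }
}
```

Once the jar is added to the Hive session, such a function would be registered with CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode' and then used like any built-in function in HiveQL.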
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, HBase, Impala, ZooKeeper, Sqoop, Oozie, DataStax & Apache Cassandra, Drill, Flume, Spark, Solr, Avro
Cloud: AWS (Amazon EC2, S3)
Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX
RDBMS: Oracle 10g/11g, MySQL, MS SQL Server, Teradata, DB2, MS Access
NoSQL: HBase, Cassandra
Web/Application servers: Tomcat, LDAP
Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)
Programming Languages: Scala, Python, SQL, Java, PL/SQL, Linux shell scripting
Tools: Eclipse, PuTTY, Cygwin, MS Office
BI Tools: Platfora, Tableau, Pentaho
PROFESSIONAL EXPERIENCE
Confidential - Raleigh, NC
Sr. Big Data Engineer
Responsibilities:
- Implemented a generic, highly available Sqoop framework for bringing related data for DaaS into Hadoop from various sources, then processed the data with Spark and loaded it into Cassandra as denormalized tables (see the Spark-to-Cassandra sketch after this list).
- Implemented Informatica workflows for bringing data to Hadoop from various sources.
- Experienced in using Platfora, a data visualization tool specific to Hadoop; created various lenses and Vizboards for real-time visualization from Hive tables.
- Queried and analyzed data from Cassandra through CQL for quick searching, sorting, and grouping.
- Implemented various data modeling techniques for Cassandra.
- Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.
- Participated in various upgrade and troubleshooting activities across the enterprise.
- Knowledgeable in performance troubleshooting and tuning of Hadoop clusters.
- Applied advanced Spark techniques such as text analytics, using in-memory processing.
- Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop.
- Created an architecture stack blueprint for data access with the NoSQL database Cassandra.
- Experienced in using the Tidal Enterprise Scheduler and Oozie operational services for coordinating the cluster and scheduling workflows.
- Created multiple dashboards in Tableau for multiple business needs.
- Installed and configured Hive, wrote Hive UDFs, and used Piggybank, a repository of UDFs for Pig Latin.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access (see the Hive DDL sketch after this list).
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
- Implemented a Composite server for data virtualization needs and created multiple views for restricted data access using a REST API.
- Devised and led the implementation of a next-generation architecture for more efficient data ingestion and processing.
- Created and implemented various shell scripts for automating jobs.
- Implemented Apache Sentry to restrict access to Hive tables at the group level.
- Employed the Avro format for all data ingestion, for faster operation and lower space utilization.
- Experienced in managing and reviewing Hadoop log files.
- Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
- Worked with enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
- Implemented test scripts to support test-driven development and continuous integration.
- Used Spark for parallel data processing and better performance.
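
A minimal sketch of the Spark-to-Cassandra denormalization pattern described above, using the DataStax spark-cassandra-connector of that era; the keyspace, tables, columns, and host are hypothetical.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Joins two Cassandra source tables in Spark and writes the result back
// as a single denormalized, query-friendly table.
object DenormalizeOrders {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("daas-denormalize")
      .set("spark.cassandra.connection.host", "127.0.0.1") // hypothetical host
    val sc = new SparkContext(conf)

    // Key both source tables by customer_id so they can be joined.
    val customers = sc.cassandraTable[(String, String)]("daas", "customers")
      .select("customer_id", "name").keyBy(_._1)
    val orders = sc.cassandraTable[(String, String, Double)]("daas", "orders")
      .select("customer_id", "order_id", "amount").keyBy(_._1)

    customers.join(orders)
      .map { case (custId, ((_, name), (_, orderId, amount))) =>
        (custId, orderId, name, amount)
      }
      .saveToCassandra("daas", "customer_orders",
        SomeColumns("customer_id", "order_id", "name", "amount"))

    sc.stop()
  }
}
```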
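
And a sketch of the Hive partitioning and bucketing mentioned above, issued through a Spark 1.x HiveContext so the example stays in Scala; the table, columns, and bucket count are hypothetical, and in practice the bucketed INSERT would typically run in Hive itself with bucketing enforcement enabled.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object PartitionedEventsTable {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-ddl"))
    val hive = new HiveContext(sc)

    // Partition by ingest date; bucket by customer_id for efficient joins.
    hive.sql(
      """CREATE TABLE IF NOT EXISTS daas.events (
        |  customer_id STRING,
        |  event_type  STRING,
        |  payload     STRING)
        |PARTITIONED BY (ingest_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS AVRO""".stripMargin)

    // Dynamic partitioning lets one INSERT populate many date partitions.
    hive.sql("SET hive.exec.dynamic.partition=true")
    hive.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    hive.sql("SET hive.enforce.bucketing=true")
    hive.sql(
      """INSERT OVERWRITE TABLE daas.events PARTITION (ingest_date)
        |SELECT customer_id, event_type, payload, ingest_date
        |FROM daas.events_staging""".stripMargin)

    sc.stop()
  }
}
```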
Environment: MapR 5.0.1, MapReduce, HDFS, Hive, Pig, Impala, Cassandra 5.04, Spark, Scala, Solr, Java, SQL, Tableau, ZooKeeper, Sqoop, Teradata, CentOS, Pentaho.
Confidential - Chicago, IL
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.
- Experienced in installing, configuring and using Hadoop Ecosystem components.
- Experienced in importing and exporting data into HDFS and Hive using Sqoop.
- Participated in the development and implementation of the Cloudera Hadoop environment.
- Experienced in running queries using Impala and in using BI tools to run ad-hoc queries directly on Hadoop.
- Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities on the network.
- Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
- Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.
- Used DataStax Cassandra along with Pentaho for reporting.
- Queried and analyzed data from DataStax Cassandra for quick searching, sorting, and grouping (see the Spark sketch after this list).
- Experienced in working with various data sources such as Teradata and Oracle; successfully loaded files from Teradata into HDFS and from HDFS into Hive and Impala.
- Designed and implemented a product search service using Apache Solr/Lucene.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Used the YARN architecture and MapReduce 2.0 in the development cluster for a POC.
- Supported MapReduce programs running on the cluster and was involved in loading data from the UNIX file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
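
A minimal sketch of the kind of Cassandra grouping/sorting analysis described above, reading through the DataStax spark-cassandra-connector; the keyspace, table, columns, and host are hypothetical.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Counts lookups per network entity stored in Cassandra, sorted descending.
object TopEntities {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("metadata-top-entities")
      .set("spark.cassandra.connection.host", "127.0.0.1") // hypothetical host
    val sc = new SparkContext(conf)

    val top = sc.cassandraTable("metadata", "entity_lookups")
      .map(row => (row.getString("entity_id"), 1L))
      .reduceByKey(_ + _)
      .sortBy({ case (_, n) => n }, ascending = false)
      .take(20)

    top.foreach { case (entity, n) => println(s"$entity\t$n") }
    sc.stop()
  }
}
```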
Environment: CDH 5.0/5.1, MapReduce, HDFS, Hive, Pig, Impala, Cassandra, Spark, Solr, Java, SQL, Tableau, ZooKeeper, Sqoop, Teradata, CentOS, Pentaho.
Confidential - Plano, TX
Hadoop Developer
Responsibilities:
- Acted as a lead resource and built the entire Hadoop platform from scratch.
- Evaluated the suitability of Hadoop and its ecosystem for the project by implementing and validating various proof-of-concept (POC) applications before adopting them for the Big Data Hadoop initiative.
- Estimated the software and hardware requirements for the NameNode and DataNodes and planned the cluster.
- Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
- Took a lead role in NoSQL column-family design, client access software, and Cassandra tuning during the migration from Oracle-based data stores.
- Designed, implemented, and deployed a series of custom parallel algorithms for various customer-defined metrics and unsupervised learning models within a customer's existing Hadoop/Cassandra cluster.
- Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
- Wrote queries using DataStax Cassandra CQL to create, alter, insert, and delete elements (see the CQL sketch after this list).
- Wrote MapReduce programs and Hive UDFs in Java.
- Used JUnit for unit testing of MapReduce programs.
- Deployed an Apache Solr/Lucene search engine server to help speed up the search of financial documents.
- Developed Hive queries for the analysts.
- Created an e-mail notification service that, upon job completion, notifies the team that requested the data.
- Defined job workflows according to their dependencies in Oozie.
- Played a key role in productionizing the application after testing by BI analysts.
- Delivered a POC of Flume to handle real-time log processing for attribution reports.
- Maintained system integrity of all subcomponents related to Hadoop.
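
A minimal sketch of the CQL statements described above (create, alter, insert, delete), issued through the DataStax Java driver from Scala; the keyspace, table, values, and host are hypothetical.

```scala
import com.datastax.driver.core.Cluster

// Create/alter/insert/delete against Cassandra via the DataStax Java driver.
object CqlRoundTrip {
  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build() // hypothetical host
    val session = cluster.connect()

    session.execute(
      "CREATE KEYSPACE IF NOT EXISTS metrics WITH replication = " +
        "{'class': 'SimpleStrategy', 'replication_factor': 1}")
    session.execute(
      """CREATE TABLE IF NOT EXISTS metrics.daily_scores (
        |  customer_id text, day text, score double,
        |  PRIMARY KEY (customer_id, day))""".stripMargin)

    // ALTER adds a column without rewriting existing rows.
    session.execute("ALTER TABLE metrics.daily_scores ADD model_version text")

    session.execute(
      "INSERT INTO metrics.daily_scores (customer_id, day, score, model_version) " +
        "VALUES ('c42', '2014-06-01', 0.87, 'v2')")
    session.execute(
      "DELETE FROM metrics.daily_scores WHERE customer_id = 'c42' AND day = '2014-06-01'")

    cluster.close()
  }
}
```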
Environment: Apache Hadoop, HDFS, Spark, Solr, Hive, DataStax Cassandra, MapReduce, Pig, Java, Flume, Cloudera CDH4, Oozie, Oracle, MySQL, Amazon S3.
Confidential - Tampa, FL
Hadoop Engineer
Responsibilities:
- Installed and configured Apache Hadoop to test the maintenance of log files in the Hadoop cluster.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Developed Java MapReduce programs for the analysis of sample log files stored in the cluster (see the MapReduce sketch after this list).
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed MapReduce programs for data analysis and data cleaning.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Developed and was involved in industry-specific UDFs (user-defined functions).
- Used Hive, created Hive tables, and was involved in data loading and in writing Hive UDFs.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Migrated ETL processes from RDBMS to Hive to test ease of data manipulation.
- Developed Hive queries to process the data for visualization.
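
The log-analysis jobs above were written in Java; to keep the examples in one language, here is an equivalent minimal sketch in Scala against the same org.apache.hadoop.mapreduce API, counting log levels in sample log files. The log line format is an assumption.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Emits (log level, 1) per line; assumes lines like
// "2015-03-01 12:00:01 ERROR component - message".
class LevelMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val level = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split("\\s+")
    if (fields.length > 2) { level.set(fields(2)); ctx.write(level, one) }
  }
}

// Sums the counts per log level; also usable as a combiner.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it = values.iterator()
    while (it.hasNext) sum += it.next().get()
    ctx.write(key, new IntWritable(sum))
  }
}

object LogLevelCount {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "log-level-count")
    job.setJarByClass(classOf[LevelMapper])
    job.setMapperClass(classOf[LevelMapper])
    job.setCombinerClass(classOf[SumReducer])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```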
Environment: Apache Hadoop, HDFS, Cloudera Manager, CentOS, Java, MapReduce, Eclipse, Hive, Pig, Sqoop, Oozie, and SQL.
Confidential
Sr. Java Developer
Responsibilities:
- Involved in requirement analysis and played a key role in project planning.
- Successfully completed the architecture, detailed design, and development of modules; interacted with end users to gather, analyze, and implement requirements.
- Designed and developed web components and business modules through all tiers from presentation to persistence.
- Used Hibernate for mapping Java classes to database tables.
- Developed Action classes and ActionForm classes, created JSPs using Struts tag libraries, and configured them in the struts-config.xml and web.xml files.
- Developed UI layout using Dreamweaver.
- Developed JavaBeans to interact with the UI and the database.
- Created the end-user business interfaces.
- Interacted frequently with the client and delivered solutions for their business needs.
- Developed ANT scripts for building and packaging J2EE components.
- Wrote PL/SQL queries and stored procedures for data retrieval (see the JDBC sketch after this list).
- Created and modified DB2 schema objects such as tables and indexes.
- Created Test Plan, Test Cases & scripts for UI testing.
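
A minimal sketch of how the stored procedures above might be called over JDBC, written in Scala for consistency with the other examples; the procedure signature, connection URL, and credentials are hypothetical.

```scala
import java.sql.{CallableStatement, Connection, DriverManager, Types}

object CustomerLookup {
  def main(args: Array[String]): Unit = {
    // Hypothetical Oracle connection details; requires the Oracle JDBC driver.
    val conn: Connection =
      DriverManager.getConnection("jdbc:oracle:thin:@//db-host:1521/ORCL", "app", "secret")
    try {
      // Hypothetical PL/SQL procedure:
      //   get_customer_name(p_id IN NUMBER, p_name OUT VARCHAR2)
      val call: CallableStatement = conn.prepareCall("{ call get_customer_name(?, ?) }")
      call.setLong(1, 42L)
      call.registerOutParameter(2, Types.VARCHAR)
      call.execute()
      println(s"customer 42 -> ${call.getString(2)}")
      call.close()
    } finally conn.close()
  }
}
```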
Environment: Java, JSP, Servlets, JDBC, JavaBeans, Oracle, HTML/DHTML, Microsoft FrontPage, JavaScript 1.3, PL/SQL, Tomcat 4.0, Windows NT.