Big Data Lead Resume
Nashville, TN
SUMMARY:
- IT professional with 8 years of experience in analysis, design, development, integration, testing and maintenance of various applications using Java/J2EE technologies, along with 4+ years of Big Data/Hadoop experience.
- Experienced in building highly scalable Big Data solutions using Hadoop and multiple distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase, Cassandra, DynamoDB).
- Expertise in big data architecture with the Hadoop Distributed File System and its ecosystem tools: MapReduce, HBase, Hive, Pig, ZooKeeper, Oozie, Flume, Avro, Impala and Apache Spark.
- Hands-on experience performing data quality checks on petabytes of data.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architectures.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of big data.
- Developed, deployed and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
- Experience in writing MapReduce programs using the Apache Hadoop API to analyze data.
- Strong experience in developing, debugging and tuning MapReduce jobs in a Hadoop environment.
- Expertise in developing Pig and Hive scripts for data analysis.
- Hands-on experience in the data mining process: implementing complex business logic, optimizing queries using HiveQL, and controlling data distribution with partitioning and bucketing techniques to enhance performance.
- Expertise in using Apache HCatalog with different big data processing tools.
- Experience working with Hive data and extending the Hive library with custom UDFs to query data in non-standard formats (a hedged UDF sketch appears at the end of this summary).
- Experience in performance tuning of MapReduce and Pig jobs and Hive queries.
- Involved in the ingestion of data from various databases such as Teradata (sales data warehouse), AS/400, DB2 and SQL Server using Sqoop.
- Experience working with Flume to handle large volumes of streaming data.
- Good working knowledge of Hue in the Hadoop ecosystem.
- Extensive experience in migrating ETL operations into HDFS using Pig scripts.
- Good knowledge of evaluating big data analytics libraries (MLlib) and using Spark SQL for data exploration.
- Experienced in using Spark Streaming for handling streaming data.
- Expert in implementing advanced procedures such as text analytics and processing with the in-memory computing capabilities of Apache Spark, written in Scala.
- Experience in implementing a distributed messaging queue integrated with Cassandra using Apache Kafka and ZooKeeper.
- Expert in creating and designing data ingest pipelines using technologies such as Spring Integration and Apache Storm with Kafka.
- Experience with the Oozie workflow engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
- Worked with different file formats such as TextFile, Avro, ORC and JSON for Hive querying and processing.
- Experienced in working with DynamoDB.
- Used compression techniques (Snappy) with file formats to make better use of storage in HDFS.
- Working knowledge of Hadoop HDFS admin shell commands.
- Developed core modules in large cross-platform applications using Java, J2EE, Python, JDBC, JavaScript, XML and HTML.
- Experienced with the build tools Maven and Ant and continuous integration tools such as Jenkins.
- Working knowledge of configuring monitoring tools such as Ganglia and Nagios.
- Hands-on experience in using relational databases like Oracle, MySQL, PostgreSQL and MS-SQL Server.
- Extensive experience in developing and deploying applications using WebLogic, Apache Tomcat and JBoss.
- Developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks.
- Experienced with version control systems such as SVN and ClearCase.
- Experience using IDEs such as Eclipse, NetBeans and PyCharm (for Python).
- Hands-on development experience with RDBMSs, including writing SQL queries, PL/SQL, views, stored procedures and triggers.
- Participated in all Business Intelligence activities related to data warehousing, ETL and report development methodology.
- Expertise in Waterfall and Agile software development models and project planning using Microsoft Project Planner and JIRA.
- Highly motivated, dynamic self-starter with a keen interest in emerging technologies.
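The custom Hive UDF work mentioned above can be illustrated with a minimal sketch in Java, using the classic org.apache.hadoop.hive.ql.exec.UDF API. The package, class name and the "key=value;key=value" field layout are illustrative assumptions, not details from any project listed here.

    package com.example.hive.udf; // hypothetical package

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Extracts the value for a given key from a non-standard "k1=v1;k2=v2" encoded column.
    public final class ExtractField extends UDF {
        public Text evaluate(Text encoded, Text key) {
            if (encoded == null || key == null) {
                return null;
            }
            for (String pair : encoded.toString().split(";")) {
                String[] kv = pair.split("=", 2);
                if (kv.length == 2 && kv[0].trim().equals(key.toString())) {
                    return new Text(kv[1].trim());
                }
            }
            return null; // key absent or malformed pair
        }
    }

In practice such a UDF is packaged into a JAR, registered in Hive with ADD JAR, and exposed through CREATE TEMPORARY FUNCTION before being called from HiveQL.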
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, Hive, HCatalog, Pig, Sqoop, Flume, Oozie, Avro, Hadoop Streaming, ZooKeeper, Kafka, Impala, Apache Spark, Hue, EMR, CloudWatch, Lambda, Kinesis, EC2, SNS, Elasticsearch
Hadoop Distributions: Cloudera (CDH4/CDH5), AWS EMR
Languages: Java, C, SQL, Python, PL/SQL, Pig Latin, HQL
IDE Tools: Eclipse, NetBeans, PyCharm
Frameworks: Hibernate, Spring, Struts, JUnit
Web Technologies: HTML5, JavaScript, jQuery, Servlets, JSP, JSON, XML, AngularJS
Web Services: SOAP, REST, WSDL
Operating Systems: Windows (XP, 7, 8), UNIX, Linux, Ubuntu, CentOS
Application Servers: JBoss, Tomcat, WebLogic, WebSphere
Reporting/ETL Tools: Power View for Microsoft Excel, Informatica, RStudio
Databases: Oracle, MySQL, DB2, Derby, PostgreSQL, NoSQL databases (HBase, Cassandra, DynamoDB), Presto
PROFESSIONAL EXPERIENCE:
Big Data Lead
Confidential, Nashville, TN
Responsibilities:
- Involved in creating Hive tables and loading and analyzing Mixpanel and telemetry data using Hive queries.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Wrote scripts to provision clusters and applications in AWS depending on the needs of various customers.
- Wrote autoscaling scripts to add and remove nodes on the clusters depending on usage.
- Developed simple to complex Hive queries and Pig scripts for processing enrollment, Mixpanel and telemetry data.
- Developed APIs for format-preserving encryption (FPE) using HP Voltage to encrypt personal information in the data.
- Responsible for managing data from multiple sources.
- Implemented Pig scripts to transform data from dev buckets to prod buckets with key rotation.
- Assisted in exporting analyzed data to relational databases using SSIS.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive, Pig, Spark, Zeppelin, Presto, Ganglia and RStudio on AWS EMR clusters.
- Experience in managing and reviewing Hadoop log files.
- Worked on creating Log4j appenders to ship logs generated by custom APIs to AWS CloudWatch.
- Analyzed large amounts of data to determine the optimal way to aggregate and report on it.
- Supported setting up the QA environment and updating configurations for implementing Pig and Hive scripts.
- Established connections to ingest data into and out of HDFS.
- Fetched data from an on-premises MySQL database and imported it into S3.
- Optimized Hive queries using various file formats such as JSON, Avro, ORC and Parquet, along with other big data tuning techniques.
- Wrote Spark jobs to convert JSON files to Parquet files (a hedged sketch appears after this list).
- Involved in developing Spark SQL scripts to query battery and Mixpanel data in Hive tables.
- Assisted the data science team by sharing Spark knowledge and resolving technical issues in Spark.
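A minimal sketch of the JSON-to-Parquet conversion mentioned above, written with the Spark 2.x Java DataFrame API; the actual jobs may have used a different Spark version or API, and the application name and argument paths are placeholders.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class JsonToParquet {
        public static void main(String[] args) {
            String in = args[0];   // e.g. an S3 prefix with raw JSON files (hypothetical)
            String out = args[1];  // e.g. an S3 prefix for curated Parquet output (hypothetical)

            SparkSession spark = SparkSession.builder()
                    .appName("json-to-parquet")
                    .getOrCreate();

            // Let Spark infer the schema from the JSON, then rewrite the data as Parquet.
            Dataset<Row> df = spark.read().json(in);
            df.write().mode(SaveMode.Overwrite).parquet(out);

            spark.stop();
        }
    }

Storing the same data as Parquet typically reduces storage and speeds up the Hive and Spark SQL queries described above, since only the referenced columns are scanned.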
Environment: AWS, EMR, AWS CloudWatch, AWS Lambda, Hadoop, HDFS, MapReduce, Hive, Pig, Java (JDK 1.8), Eclipse, MySQL, Spark, Redshift, DynamoDB, EC2, Maven, SVN.
Senior Hadoop Developer
Confidential, O’Fallon, MO
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote multiple MapReduce programs in Java for data analysis.
- Wrote MapReduce jobs using Pig Latin and the Java API.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Developed Pig scripts for analyzing large data sets in HDFS.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Designed and presented a plan for a POC on Impala.
- Experienced in migrating HiveQL to Impala to minimize query response time.
- Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Worked on sequence files, RC files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
- Performed extensive data mining using Hive.
- Implemented daily cron jobs that automate parallel data-loading tasks into HDFS using AutoSys and Oozie coordinator jobs.
- Performed streaming of data using Spark Streaming by setting up cache for efficient data analysis.
- Responsible for performing extensive data validation using Hive
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
- Utilized Storm for processing large volumes of data.
- Used Kafka to load data into HDFS and move data into NoSQL databases (Cassandra).
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Involved in submitting and tracking MapReduce jobs using the JobTracker.
- Involved in creating Oozie workflow and coordinator jobs to kick off jobs on time as data becomes available.
- Used Pig as an ETL tool to do transformations, event joins, filtering and some pre-aggregations.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources (a hedged UDF sketch appears after this list).
- Implemented Hive GenericUDFs to apply business logic.
- Coordinated with end users for designing and implementation of analytics solutions for User Based Recommendations using R as per project proposals.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
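A minimal sketch of the Pig UDF pattern referenced above, using Pig's EvalFunc Java API; the class name and the normalization it performs are illustrative only, not the actual business logic.

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Example business-logic UDF: trims and upper-cases a single input field.
    public class NormalizeCode extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null; // propagate nulls instead of failing the job
            }
            return input.get(0).toString().trim().toUpperCase();
        }
    }

In Pig Latin the packaged JAR would be made available with REGISTER and the function then invoked like any built-in, for example inside a FOREACH ... GENERATE.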
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Maven, Teradata, ZooKeeper, SVN, AutoSys, HBase, Cassandra, Spark Streaming.
Hadoop Developer
Confidential, Cleveland, Ohio
Responsibilities:
- Worked on writing transformer/mapping MapReduce pipelines using Java (a hedged sketch of a cleaning mapper appears after this list).
- Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Involved in loading data into HBase using the HBase shell, the HBase client API, Pig and Sqoop.
- Designed and implemented incremental imports into Hive tables.
- Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
- Deployed an Apache Solr search engine server to help speed up searches of government cultural assets.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data.
- Experienced in managing and reviewing the Hadoop log files.
- Migrated ETL jobs to Pig scripts to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Implemented the workflows using Apache Oozie framework to automate tasks
- Worked with Avro Data Serialization system to work with JSON data formats.
- Worked on different file formats such as sequence files, XML files and map files using MapReduce programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Developed scripts and automated data management end to end, including sync-up between all the clusters.
- Involved in the setup and benchmarking of Hadoop/HBase clusters for internal use.
- Set up a Hadoop cluster on Amazon EC2 using Apache Whirr for a POC.
- Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.
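The transformer/mapping pipelines mentioned in the first bullet above can be illustrated with a minimal, hedged sketch of a map-only cleaning step; the tab-delimited layout and expected field count are assumptions for illustration, not the actual record format.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-only cleaning step: keeps well-formed, tab-delimited lines and counts the rest.
    public class LogCleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 5; // hypothetical record width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            if (line.isEmpty()) {
                return; // skip blank lines
            }
            if (line.split("\t").length != EXPECTED_FIELDS) {
                context.getCounter("cleaning", "malformed").increment(1);
                return; // drop malformed records, but keep a counter for review
            }
            context.write(NullWritable.get(), value);
        }
    }

The driver would configure this class as the mapper with zero reducers, so cleaned records land directly in HDFS for the Hive tables described above.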
Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, Hadoop distribution of Cloudera, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX shell scripting, PuTTY.
Hadoop Developer
Confidential (Walmart), Bentonville, AR
Responsibilities:
- Gathered functional requirements; analyzed large amounts of data to determine the optimal way to aggregate and report on it.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Installed and configured Pig and wrote Pig Latin scripts.
- Wrote MapReduce jobs.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet the business requirements (a hedged example of running such a query over JDBC appears after this list).
- Created Hive tables and worked on them using HiveQL.
- Imported and exported data into HDFS and Hive using Sqoop.
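A minimal sketch of running one of the Hive analysis queries mentioned above from Java over the HiveServer2 JDBC driver; the connection URL, credentials, and the store_id/orders_raw names are placeholders rather than actual project values.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryExample {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:hive2://hive-host:10000/default"; // placeholder host/port

            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(url, "user", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT store_id, COUNT(*) AS orders "   // illustrative table and
                       + "FROM orders_raw GROUP BY store_id")) {  // column names only
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }

In practice such queries were often run from the Hive shell; the JDBC form is shown here only to keep the examples in Java.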
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Hadoop distribution of Cloudera, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, Oracle 10g, DB2, COBOL, PL/SQL, SQL*Plus, Toad, QMF.
Hadoop Developer
Confidential
Responsibilities:
- Worked on writing transformer/mapping MapReduce pipelines using Java.
- Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Involved in loading data into HDFS using Pig and Sqoop.
- Designed and implemented incremental imports into Hive tables.
- Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
- Deployed an Apache Solr search engine server to help speed up searches of government cultural assets.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse credit card transactions and structure them in a tabular format to facilitate effective querying of the data.
- Experienced in managing and reviewing the Hadoop log files.
- Migrated ETL jobs to Pig scripts to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Worked with the Avro data serialization system to handle JSON data formats (a hedged Avro example appears after this list).
- Worked on different file formats such as sequence files, XML files and map files using MapReduce programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Developed scripts and automated data management end to end, including sync-up between all the clusters.
- Involved in the setup and benchmarking of Hadoop for internal use.
- Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.
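A minimal sketch of the Avro serialization mentioned above, using the Avro Java API to write a record to an Avro container file; the Txn schema, field names and local output file are illustrative assumptions only.

    import java.io.File;

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class AvroWriteExample {
        public static void main(String[] args) throws Exception {
            // A made-up two-field schema standing in for the real record layout.
            String schemaJson = "{\"type\":\"record\",\"name\":\"Txn\",\"fields\":["
                    + "{\"name\":\"id\",\"type\":\"string\"},"
                    + "{\"name\":\"amount\",\"type\":\"double\"}]}";
            Schema schema = new Schema.Parser().parse(schemaJson);

            GenericRecord rec = new GenericData.Record(schema);
            rec.put("id", "txn-001");
            rec.put("amount", 42.50);

            try (DataFileWriter<GenericRecord> writer =
                         new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
                writer.create(schema, new File("txn.avro")); // local file for illustration
                writer.append(rec);
            }
        }
    }

Because the schema travels with the file, downstream Hive and MapReduce jobs can read these records without separately parsing the original JSON.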
Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, Hadoop distribution of Cloudera, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX shell scripting, PuTTY.
SQL and Jr. Java Developer
Confidential
Responsibilities:
- Collected and collated appropriate data for use in databases and conducted related research.
- Designed and developed Java classes and interfaces.
- Worked in core Java on client-side validations.
- Designed and developed JDBC connection objects for data retrieval and updates (a hedged sketch appears after this list).
- Loaded data into the DB2 environment.
- Worked on Java and mainframes.
- Developed database applications using SQL and PL/SQL.
- Scheduled jobs using CA7.
- Responsible for analyzing and developing data.
- Coordinated with cross-functional teams in different locations to obtain quality data and analysis.
- Designed and developed software applications following the SDLC.
- Created production and analysis reports, and handled production issues and releases.
- Monitored and maintained the quality of database systems and secured their access and use.
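A minimal sketch of the JDBC retrieval and update objects referenced above, written in the modern try-with-resources style; the DB2 URL, credentials and CUSTOMER table are placeholders, not actual project details.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class CustomerDao {
        private static final String URL = "jdbc:db2://db2-host:50000/SAMPLE"; // placeholder

        // Retrieval: look up a customer name by id.
        public String findName(int id) throws Exception {
            try (Connection conn = DriverManager.getConnection(URL, "user", "password");
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT NAME FROM CUSTOMER WHERE ID = ?")) {
                ps.setInt(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString("NAME") : null;
                }
            }
        }

        // Update: rename a customer, returning the number of affected rows.
        public int updateName(int id, String name) throws Exception {
            try (Connection conn = DriverManager.getConnection(URL, "user", "password");
                 PreparedStatement ps = conn.prepareStatement(
                         "UPDATE CUSTOMER SET NAME = ? WHERE ID = ?")) {
                ps.setString(1, name);
                ps.setInt(2, id);
                return ps.executeUpdate();
            }
        }
    }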