
Big Data/NiFi Developer Resume


Irving, TX

SUMMARY

  • Over 3 years of IT experience in software analysis, design, development, testing and implementation of Big Data, Hadoop, SQL, and NoSQL technologies.
  • Experience working with developer toolkits such as the Eclipse IDE and Maven.
  • A very good understanding of job workflow scheduling and monitoring tools like Oozie and Control-M.
  • Experience in the Software Development Life Cycle (SDLC) phases which include Analysis, Design, Implementation, Testing and Maintenance.
  • Strong technical, administration and mentoring knowledge in Linux and Big data/Hadoop technologies.
  • Good working experience in using Spark SQL to manipulate Data Frames in Python.
  • Good knowledge in NoSQL databases including Cassandra and MongoDB.
  • Very good experience in developing and deploying applications using WebLogic, Apache Tomcat, and JBoss.
  • Extensive knowledge of Teradata utilities (BTEQ, FastLoad, FastExport, MultiLoad Update/Insert/Delete/Upsert).
  • Good experience in Hive partitioning, bucketing and performing different types of joins on Hive tables, and implementing Hive SerDes such as JSON and Avro (a short sketch follows this summary).
  • Experience with build tools such as Ant and Maven, and continuous integration tools like Jenkins.
  • Working experience in Development, Production and QA Environments.
  • Experience in NoSQL Column-Oriented Databases like HBase and its Integration with Hadoop cluster.
  • Knowledge of implementing Big Data workloads on Amazon Elastic MapReduce (Amazon EMR) for processing and managing the Hadoop framework on dynamically scalable Amazon EC2 instances.
  • In depth knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node.
  • Experience with Amazon Web Services, AWS command line interface, and AWS data pipeline.
  • Experience in writing SQL and PL/SQL queries and stored procedures for accessing and managing databases such as Oracle, MySQL, and IBM DB2.
  • Executed faster MapReduce-style functions using Spark RDDs for parallel processing, referencing datasets in HDFS, HBase and other data sources.
  • Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
  • Expertise in developing Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Experience with the Apache Spark ecosystem using Spark-SQL, DataFrames and RDDs, and knowledge of Spark MLlib.
  • Hands-on experience with Big Data ecosystem tools including Hadoop, MapReduce, Pig, Hive, Impala, Sqoop, Flume, NiFi, Oozie, MongoDB, Zookeeper, Kafka, Maven, Spark, Scala, HBase, Cassandra.
  • Good Knowledge in using NiFi to automate the data movement between different Hadoop systems.
  • Designed and implemented custom NiFi processors that react to and process data in the data pipeline.
  • Experience in installation, configuration and deployment of Big Data solutions.
  • Good usage of Apache Hadoop along with the enterprise distributions from Cloudera and Hortonworks.
  • Good knowledge of the MapR distribution and Amazon EMR.
  • Extensive experience in developing stored procedures, functions, views, triggers and complex queries using SQL Server, T-SQL and Oracle PL/SQL.
  • Experience with Data Cleansing, Data Profiling and Data analysis. UNIX Shell Scripting, SQL and PL/SQL coding.
  • Database/ETL performance tuning: broad experience in database development, including effective use of database objects, SQL Trace, Explain Plan, different types of optimizers, hints, indexes, table partitions, sub-partitions, materialized views, global temporary tables, autonomous transactions, bulk binds and Oracle built-in functions, as well as performance tuning of Informatica mappings and workflows.
  • Superior communication, decision-making and organizational skills, along with strong analytical and problem-solving skills for taking on challenging work. Able to work well independently as well as in a team, helping to troubleshoot technology and business-related problems.
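
The following is a minimal sketch of the Hive partitioning, bucketing and join pattern mentioned above, driven from Spark with Hive support enabled. The table and column names (sales, staging_sales, staging_customers, customers_bucketed, customer_id, region, segment) are hypothetical placeholders, and Parquet is used only to keep the sketch simple; a different storage clause such as STORED AS AVRO could be substituted.

    import org.apache.spark.sql.SparkSession

    object HivePartitioningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-partitioning-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioned Hive table (hypothetical schema).
        spark.sql("""
          CREATE TABLE IF NOT EXISTS sales (customer_id BIGINT, amount DOUBLE)
          PARTITIONED BY (region STRING)
          STORED AS PARQUET
        """)

        // Load one static partition from a hypothetical staging table.
        spark.sql("""
          INSERT OVERWRITE TABLE sales PARTITION (region = 'US')
          SELECT customer_id, amount FROM staging_sales WHERE region = 'US'
        """)

        // Bucket the customer table on the join key so repeated joins can avoid a full shuffle.
        spark.table("staging_customers")
          .write
          .mode("overwrite")
          .bucketBy(32, "customer_id")
          .sortBy("customer_id")
          .saveAsTable("customers_bucketed")

        // Join one partition of sales against the bucketed customer table.
        spark.sql("""
          SELECT s.customer_id, s.amount, c.segment
          FROM sales s
          JOIN customers_bucketed c ON s.customer_id = c.customer_id
          WHERE s.region = 'US'
        """).show(20)
      }
    }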

TECHNICAL SKILLS

Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig, Hive 2.3, Sqoop 1.4, Apache Impala 2.1, Oozie 4.3, YARN, NiFi, Apache Flume 1.8, Kafka 1.1, Zookeeper

Cloud Platform: Amazon AWS, EC2, S3, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake, Data Factory

Hadoop Distributions: Cloudera, Hortonworks, MapR

Programming Language: Java, Scala, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0

Databases: Oracle 12c/11g, SQL

Operating Systems: Linux, Unix, Windows 10/8/7

IDE and Tools: Eclipse 4.7, NetBeans 8.2, IntelliJ IDEA, Maven

NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB, Accumulo

Web/Application Server: Apache Tomcat 9.0.7, JBoss, WebLogic, WebSphere

SDLC Methodologies: Agile, Waterfall

Version Control: GIT, SVN, CVS

Other Tools: Visual Studio 2010, Business Intelligence Studio 2008, SQL Server Integration Services (SSIS) 2005/2008, SQL Server Reporting Services (SSRS) 2008, SQL Server 2008 R2

PROFESSIONAL EXPERIENCE

Confidential - Irving, TX

Big Data/NiFi Developer

Responsibilities:

  • Implemented quality logical and physical ETL designs, optimized to meet operational performance requirements across multiple solutions and products, while following strict architecture, design and development standards.
  • Worked on Azure Data Platform components - Azure Data Lake, Data Factory, Data Management Gateway, Azure Storage Options, DocumentDB, Data Lake Analytics, Stream Analytics, EventHubs, Azure SQL.
  • Created DataFrames from RDDs using both reflection-based and programmatic inference of the schema over the RDD (see the first sketch after this list).
  • Good knowledge of handling medical data and building systems that conform to the compliance requirements set by HIPAA.
  • Responsible for handling large and real-time datasets using Partitions, Spark’s in-memory capabilities, Broadcast variables, Joins, and other Transformations at the time of data Ingestion.
  • Developed and designed data integration and migration solutions in Azure.
  • Designed solution for various system components using Microsoft Azure.
  • Designed and Implemented Spark Jobs to be deployed and run on existing Active clusters.
  • Developed multiple Scala user-defined functions (UDFs) to fit specific analytical requirements (see the second sketch after this list).
  • Designed and developed custom data flows using Apache NiFi to fully automate the ETL process, taking various worst-case scenarios into account.
  • Developed a NiFi workflow to pick up multiple retail files from an FTP location and move them to HDFS on a daily basis.
  • Used NiFi data pipelines to process large sets of data and configured lookups for data validation and integrity.
  • Created NiFi flows to trigger Spark jobs and used PutEmail processors to send notifications on any failures.
  • Worked with different file formats like JSON, Avro and Parquet, and compression techniques like Snappy.
  • Worked on a Spark job to convert fixed-width and CSV file data to Avro, Parquet and XML formats by applying transformations based on business requirements.
  • Involved in writing code in Scala, leveraging its support for functional programming.
  • Developed Impala scripts for end-user/analyst requirements for ad hoc analysis.
  • Responsible for writing bash scripts to automate various small but time-consuming tasks such as user creation, importing multiple tables from RDBMS to HDFS using Sqoop, and setting up development environments for new users.
  • Responsible for converting complex SQL queries from Teradata BTEQ scripts to Spark SQL to leverage the distributed nature of the underlying Spark RDDs.
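
A minimal sketch of the two standard Spark approaches referenced above for creating DataFrames from RDDs, reflection-based and programmatic schema inference; the input path and fields (members.csv, id, name) are hypothetical placeholders rather than project artifacts.

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

    // Hypothetical record type for the reflection-based approach.
    case class Member(id: Long, name: String)

    object RddToDataFrameSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("rdd-to-df-sketch").getOrCreate()
        import spark.implicits._

        // Hypothetical input: CSV lines of "id,name".
        val rdd = spark.sparkContext.textFile("hdfs:///tmp/members.csv").map(_.split(","))

        // 1) Reflection: the Member case class lets Spark infer the schema.
        val dfReflect = rdd.map(a => Member(a(0).toLong, a(1))).toDF()

        // 2) Programmatic: build a StructType by hand and apply it to an RDD[Row],
        //    useful when the schema is only known at runtime.
        val schema = StructType(Seq(
          StructField("id", LongType, nullable = false),
          StructField("name", StringType, nullable = true)
        ))
        val dfProg = spark.createDataFrame(rdd.map(a => Row(a(0).toLong, a(1))), schema)

        dfReflect.printSchema()
        dfProg.show(5)
      }
    }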
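
A minimal sketch of a Scala user-defined function of the kind described above; the column name, dataset path and normalization rule (state_cd, a hypothetical claims dataset, upper-casing state codes) are illustrative assumptions only.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    object ScalaUdfSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("scala-udf-sketch").getOrCreate()

        // Hypothetical analytical requirement: normalize free-text state codes.
        val normalizeState = udf { (s: String) =>
          Option(s).map(_.trim.toUpperCase).getOrElse("UNKNOWN")
        }

        // Register the same logic so it is also available from Spark SQL.
        spark.udf.register("normalize_state",
          (s: String) => Option(s).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

        val claims = spark.read.parquet("hdfs:///data/retail/claims")   // hypothetical path
        claims.withColumn("state_cd", normalizeState(claims("state_cd")))
          .createOrReplaceTempView("claims_clean")

        spark.sql("SELECT state_cd, COUNT(*) AS n FROM claims_clean GROUP BY state_cd").show()
      }
    }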

Environment: Hadoop 3.0, Scala 2.12, Spark, SQL, Hive 2.3, Cassandra 3.11, Oozie, Apache NiFi, Azure, Oracle 12c, RDBMS, HDFS, XML, JSON, Parquet, Avro, Kafka, Sqoop

Confidential

Big Data/Hadoop Developer

Responsibilities:

  • Implemented Cassandra and managed other processing tools running on YARN.
  • Implemented Kafka high-level consumers to pull data from Kafka partitions and move it into HDFS (see the first sketch after this list).
  • Worked with Kafka streaming to load data into HDFS and exported it into a MongoDB database.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Developed mappings to extract data from Oracle, Teradata, Flat files, XML files, Excel and load.
  • Analyzed the Hadoop cluster and different Big Data stores, including HBase and Cassandra.
  • Created RDDs and applied data filters in Spark, and created Cassandra tables and Hive tables for user access.
  • Implemented MapReduce programs to handle semi-structured/unstructured data such as XML, JSON and Avro data files, and sequence files for log files.
  • Used Elasticsearch as a distributed RESTful web service with MVC for parsing and processing XML data.
  • Created partitions and buckets based on state to enable further processing using bucket-based Hive joins.
  • Collected the log data from web servers and integrated into HDFS using Flume.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Worked with cloud services like Amazon Web Services (AWS) and involved in ETL, Data Integration and Migration.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
  • Created data partitions on large data sets in S3 and DDL on partitioned data.
  • Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
  • Monitored and troubleshot Hadoop jobs using the YARN Resource Manager, and EMR job logs using Genie and Kibana.
  • Involved in designing the data model for the system.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python (see the second sketch after this list).
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats.
  • Installed and Configured Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Implemented multiple MapReduce Jobs in java for data cleansing and pre-processing.
  • Wrote complex Hive queries and UDFs in Java and Python.
  • Responsible for data extraction and data ingestion from different data sources into the Hadoop Data Lake by creating ETL pipelines using Pig and Hive.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters and Experience in converting MapReduce applications to Spark.
  • Responsible for building and configuring distributed data solutions using MapR distribution of Hadoop.
  • Installed Oozie workflow engine to run multiple MapReduce, Hive HQL and Pig jobs.
  • Loaded huge amounts of data into HDFS using Apache Kafka.
  • Implemented business logic using Pig scripts and UDFs.
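
A minimal sketch of the Kafka-to-HDFS ingestion described above. It is written against the newer KafkaConsumer client rather than the legacy high-level consumer API, and the broker address, topic, consumer group and HDFS path (broker1:9092, web-logs, log-ingest, /data/raw/web-logs) are hypothetical placeholders.

    import java.time.Duration
    import java.util.{Collections, Properties}
    import scala.collection.JavaConverters._
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.kafka.clients.consumer.KafkaConsumer

    object KafkaToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")          // hypothetical brokers
        props.put("group.id", "log-ingest")                     // hypothetical consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        props.put("enable.auto.commit", "false")

        val consumer = new KafkaConsumer[String, String](props)
        consumer.subscribe(Collections.singletonList("web-logs"))   // hypothetical topic

        // Append records to a single HDFS file; a real pipeline would roll files by size/time.
        val fs  = FileSystem.get(new Configuration())
        val out = fs.create(new Path("/data/raw/web-logs/part-00000"), true)

        try {
          while (true) {
            val records = consumer.poll(Duration.ofSeconds(1)).asScala
            records.foreach(r => out.write((r.value() + "\n").getBytes("UTF-8")))
            if (records.nonEmpty) {
              out.hflush()          // make the batch visible to HDFS readers
              consumer.commitSync() // commit offsets only after the write succeeds
            }
          }
        } finally {
          out.close()
          consumer.close()
        }
      }
    }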
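
A minimal sketch of rewriting a Hive aggregation as Spark RDD transformations in Scala, as mentioned above; the table layout (a CSV export of a hypothetical customers table with the state in the third column) is an assumption made only for illustration.

    import org.apache.spark.sql.SparkSession

    object HiveToRddSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("hive-to-rdd-sketch").getOrCreate()

        // Hive original (hypothetical table/columns):
        //   SELECT state, COUNT(*) FROM customers GROUP BY state;
        //
        // Equivalent expressed as RDD transformations over the raw files.
        val counts = spark.sparkContext
          .textFile("hdfs:///data/customers")        // hypothetical CSV location
          .map(_.split(","))
          .filter(_.length > 2)                      // drop malformed rows
          .map(fields => (fields(2), 1L))            // assume column 3 holds the state
          .reduceByKey(_ + _)

        counts.take(20).foreach { case (state, n) => println(s"$state\t$n") }
      }
    }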

Environment: Hadoop 3.0, Pig 0.17, Hive 2.3, HBase, Sqoop, MapR, HDFS, Agile, Spark SQL, Apache Kafka 2.0.0, Scala, Spark, NiFi, NoSQL, Cassandra 3.11, Java, MongoDB, Flume, XML, JSON, Elasticsearch, AWS

Confidential

ETL/SQL Developer

Responsibilities:

  • Developed Logical and Physical data models that capture current state/future state data elements and data flows using Erwin 4.5.
  • Responsible for design and build data mart as per the requirements.
  • Extracted Data from various sources like Data Files, different customized tools like Meridian and Oracle.
  • Extensively worked on Views, Stored Procedures, Triggers and SQL queries and for loading the data (staging) to enhance and maintain the existing functionality.
  • Performed analysis of sources, requirements and the existing OLTP system, and identified the required dimensions and facts from the database.
  • Created Data acquisition and Interface System Design Document.
  • Designed the dimensional model of the data warehouse and confirmed source data layouts and needs.
  • Extensively used Oracle ETL process for address data cleansing.
  • Developed and tuned all the Affiliations received from data sources using Oracle and Informatica and tested with high volume of data.
  • Responsible for developing, support and maintenance for the ETL (Extract, Transform and Load) processes using Oracle and Informatica PowerCenter.
  • Created common reusable objects for the ETL team and oversaw coding standards.
  • Reviewed high-level design specification, ETL coding and mapping standards.
  • Designed new database tables to meet business information needs. Designed Mapping document, which is a guideline to ETL Coding.
  • Used ETL to extract files for the external vendors and coordinated that effort.
  • Migrated mappings from Development to Testing and from Testing to Production.
  • Performed Unit Testing and tuned for better performance.
  • Created various Documents such as Source-to-Target Data mapping Document, and Unit Test Cases Document.

Environment: Informatica Power Center 8.1/7.1.2, Erwin 4.5, Oracle 10g/9i, Teradata V2R5, XML, PL/SQL, SQL Server 2005/2000 (Enterprise Manager, Query Analyzer), Sybase, SQL* Loader, SQL * Plus, Autosys, OLAP, Windows XP/NT/2000, Sun Solaris UNIX, MS Office 2003, Visio Project, Shell scripts.
