
Big Data Engineer Resume


Austin, TX

SUMMARY

  • 7+ years of extensive IT experience, including 4+ years as a Big Data Engineer/Hadoop Developer, covering business procedures, design strategies, data analytics solution development, and workflow implementations.
  • Experienced with Hadoop ecosystem components for ingestion, data modeling, querying, processing, storage, analysis, and data integration, and in implementing enterprise-level Big Data systems.
  • Hands-on experience with Hadoop and Spark Big Data technologies, including storage, querying, processing, and analysis of data.
  • Experienced in using various Hadoop tools such as MapReduce, Hive, Sqoop, Impala, Avro & HDFS.
  • Technologies extensively worked on during my career include Python, Java, and various databases such as MySQL, Oracle, PostgreSQL, and Microsoft SQL Server.
  • Hands on experience working with various Hadoop cluster managers and tools like Cloudera Manager, Apache Ambari, Hue, etc.
  • Developed end-to-end data lake solutions in Hadoop and Snowflake.
  • Experienced in developing programs by using SQL, Python & shell scripts to schedule the processes running on a regular basis.
  • Proficient in working on the Git version control system for code sharing and updating.
  • Experienced in creating ad-hoc reports, summary reports using Advanced Excel, SQL and Tableau.
  • Experienced in collecting logs data from various sources and integration into HDFS using Flume.
  • Experienced in testing data in HDFS and Hive for each transaction of data.
  • Experienced in importing & exporting data using Sqoop from HDFS to Relational Database Systems & vice-versa.
  • Migrated Hadoop solutions to the Snowflake platform.
  • Built use cases in Snowflake by bringing in various data sources using Attunity.
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs); see the sketch after this list.
  • Experienced in using Flume to transfer log data files to Hadoop Distributed File System (HDFS).
  • Good experience in Shell programming.
  • Knowledge in managing Cloudera's Hadoop platform along with CDH clusters.
  • Proficient in writing complex SQL queries, working with Databases like Oracle, SQL Server, PostgreSQL and MySQL
  • Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
  • Experience working with operational data sources and migration of data from traditional databases to Hadoop System.
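
The RDD programming noted above can be illustrated with a minimal PySpark sketch such as the one below; the HDFS path, log layout, and field positions are assumptions for illustration only, not details from an actual project.

```python
# Minimal PySpark RDD sketch (hypothetical path and log layout).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

# Count HTTP status codes in common-log-format lines stored on HDFS.
logs = sc.textFile("hdfs:///data/logs/access.log")             # assumed path
status_counts = (
    logs.map(lambda line: line.split(" "))
        .filter(lambda parts: len(parts) > 8)                  # skip malformed lines
        .map(lambda parts: (parts[8], 1))                      # field 8 assumed to hold the status code
        .reduceByKey(lambda a, b: a + b)
)

for status, count in status_counts.collect():
    print(status, count)

spark.stop()
```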

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, Spark, Yarn, Kafka, PIG, HIVE, Sqoop, Snowflake, Flume, Oozie, Impala, HBase, Hue, Zookeeper.

Programming Languages: C, Java, PL/SQL, Pig Latin, Python, HiveQL, and Scala

Java/J2EE & Web Technologies: J2EE, EJB, JSF, Servlets, JSP, JSTL, CSS, HTML, XHTML, XML, AngularJS, AJAX, JavaScript, jQuery.

Development Tools: Eclipse, NetBeans, SVN, Git, Ant, Maven, SOAP UI, JMX Explorer, XML Spy, QC, QTP, Jira, SQL Developer, QTOAD.

Methodologies: Agile/Scrum, UML, Rational Unified Process and Waterfall.

NoSQL Technologies: Cassandra, MongoDB, HBase.

Frameworks: Struts, Hibernate, And Spring MVC.

Scripting Languages: Unix Shell Scripting, Perl.

Distributed platforms: Hortonworks, Cloudera, MapR

Databases: Oracle 11g/12C, MySQL, MS-SQL Server, Teradata, IBM DB2

Operating Systems: Windows XP/Vista/7/8/10, UNIX, Linux

Software Package: MS Office 2007/2010/2016.

Web/Application Servers: WebLogic, WebSphere Application Server, Apache Tomcat

Visualization: Tableau, QlikView, MicroStrategy, and MS Excel.

Version control: CVS, SVN, GIT, TFS.

Web Technologies: HTML, XML, CSS, JavaScript, and jQuery, AJAX, AngularJS, SOAP, REST and WSDL.

PROFESSIONAL EXPERIENCE

Confidential, Austin TX

Big Data Engineer

Responsibilities:

  • Developed analytical solutions, data strategies, tools, and technologies for the marketing platform using Big Data technologies.
  • Implemented solutions for ingesting data from various sources utilizing Big Data technologies such as Hadoop, MapReduce frameworks, Sqoop, and Hive.
  • Worked as a Hadoop consultant on technologies like Map Reduce, Pig, Hive, and Sqoop.
  • Worked with the PySpark API.
  • Involved in ingesting large volumes of credit data from multiple provider data sources to AWS S3. Created modular and independent components for AWS S3 connections, data reads.
  • Implemented Data warehouse solutions in AWS Redshift by migrating the data to Redshift from S3.
  • Automated jobs and data pipelines using AWS Step Functions and AWS Lambda, and configured various performance metrics using AWS CloudWatch.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig, and MapReduce.
  • Worked on big data and real-time/near-real-time analytics on big data platforms such as Hadoop and Spark using Python.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Wrote Hadoop jobs to analyze data in text, sequence file, and Parquet formats using Hive and Pig.
  • Worked on analyzing Hadoop cluster and different Big Data components including Pig, Hive, Spark, Impala, and Sqoop.
  • Developed Spark code using Python and Spark-SQL for faster testing and data processing.
  • Monitored metrics, created backend reports and dashboard on Tableau.
  • Good working knowledge of implementing solutions using AWS services such as EC2, S3, and Redshift.
  • Developed predictive analytics using PySpark APIs.
  • Performed big data analysis using Pig and Hive.
  • Created Hive External tables and loaded the data into tables and query data using HQL.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
  • Used Spark SQL to process large volumes of structured data (see the sketch after this list).
  • Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
  • Worked on tools like Flume, Sqoop, Hive and PySpark.
  • Expert in writing business analytics scripts using HiveQL.
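
The sketch below outlines the S3 ingestion and Spark SQL processing flow referenced in the bullets above; the bucket name, prefixes, and column names (provider_id, event_ts) are hypothetical placeholders, and the curated output is assumed to feed the downstream Redshift load.

```python
# Hedged sketch of an S3 -> Spark SQL -> S3 flow; bucket, prefixes, and columns are assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("credit-ingest").getOrCreate()

# Read provider files landed in S3 as CSV with a header row.
credit = (
    spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv("s3a://example-credit-bucket/raw/provider_a/")   # hypothetical bucket/prefix
)

# Run the transformation layer in Spark SQL.
credit.createOrReplaceTempView("credit_raw")
daily_summary = spark.sql("""
    SELECT provider_id,
           to_date(event_ts) AS event_date,
           COUNT(*)          AS records
    FROM credit_raw
    GROUP BY provider_id, to_date(event_ts)
""")

# Write curated output back to S3 for the downstream Redshift load.
daily_summary.write.mode("overwrite").parquet(
    "s3a://example-credit-bucket/curated/daily_summary/"       # hypothetical output prefix
)

spark.stop()
```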

Environment: Big Data, Spark, YARN, Hive, Flume, Pig, Python, Hadoop, AWS, Databases, Redshift.

Confidential - New York City, NY

Big Data Engineer

Responsibilities:

  • Worked as a Big Data Developer on the team handling the firm's proprietary platform issues, providing data analysis for the team as well as developing enhancements.
  • Worked with large sets of big data, dealing with various security logs.
  • Loaded data from relational databases into HDFS using Sqoop and handled data received as flat files from different vendors, including text and XML data.
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back flows.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks (see the sketch after this list).
  • Developed MapReduce jobs for data cleaning and manipulation.
  • Involved in migration of data from existing RDBMS (MySQL and SQL Server) to Hadoop using Sqoop for processing and analyzing the data. Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and the Azure cloud.
  • Configured Azure Container Registry for building and publishing Docker container images and deployed them into Azure Kubernetes Service (AKS).
  • Performed file system management and monitoring on Hadoop log files.
  • Implemented OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
  • Experienced in working with Cloudera (CDH4 & CDH5), Hortonworks, Amazon EMR, and Azure HDInsight on multi-node clusters.
  • Wrote Pig and Hive jobs to extract files from MongoDB through Sqoop and placed in HDFS.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Worked on Microsoft Azure toolsets including Azure Data Factory pipelines, Azure Databricks, and Azure Data Lake Storage.
  • Involved in developing data frames using Spark SQL as needed.
  • Wrote Hive join queries to fetch information from multiple tables and Map Reduce jobs to collect data from Hive.
  • Used Hive to analyze the partitioned & bucketed data and compute various metrics for reporting on the dashboard.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Involved in configuring and maintaining cluster, and managing & reviewing Hadoop log files.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Analyzed large amounts of data to determine the optimal way to aggregate them and reported the findings.
  • Explored the Spark framework and methods for improving the performance and optimization of existing Hadoop jobs using Spark Context, Spark SQL, DataFrames, and YARN.
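
A hedged, Databricks-style sketch of the ADLS-to-Hive flow referenced above follows; the storage account, container, table names, and columns (device_id, event_date) are assumptions, not details from the actual engagement.

```python
# Hedged Azure Databricks-style sketch; storage account, container, and table names are assumed.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("security-logs")
         .enableHiveSupport()
         .getOrCreate())

# Read vendor security logs landed in Azure Data Lake Storage Gen2 (hypothetical account/container).
logs = spark.read.json("abfss://raw@examplestorage.dfs.core.windows.net/security_logs/")

# Enrich against a reference table assumed to exist in the Hive metastore.
logs.createOrReplaceTempView("security_logs")
enriched = spark.sql("""
    SELECT l.*, d.device_owner
    FROM security_logs l
    LEFT JOIN ref_devices d ON l.device_id = d.device_id
""")

# Persist as a partitioned Hive table for dashboard reporting
# (event_date is assumed to be a column in the source logs).
(enriched.write
         .mode("overwrite")
         .partitionBy("event_date")
         .saveAsTable("secops.enriched_security_logs"))
```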

Environment: MySQL, SQL Server, Python, Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, Impala, Flume, PySpark, Spark SQL.

Confidential

Java Developer

Responsibilities:

  • Developed the web interface using Struts, JavaScript, HTML, and CSS.
  • Extensively used the Struts controller component classes for developing the applications.
  • Involved in developing the business tier using stateless session beans (acting as a Session Facade) and message-driven beans.
  • Developed application screens using HTML5, CSS3 and JavaScript.
  • Used JDBC and Hibernate to connect to the Oracle database.
  • Data sources were configured in the app server and accessed from the DAOs through Hibernate.
  • Design patterns of Business Delegates, Service Locator and DTO are used for designing the web module of the application.
  • Developed SQL stored procedures and prepared statements for updating and accessing data from database.
  • Involved in developing database specific data access objects (DAO) for Oracle.
  • Used CVS for source code control and JUnit for unit testing.
  • Used Eclipse to develop entity and session beans.
  • The entire application is deployed in WebSphere Application Server.
  • Followed coding and documentation standards.

Confidential

Junior Developer

Responsibilities:

  • Involved in the life-cycle of the project, i.e., requirements gathering, design, development, testing and maintenance of the database.
  • Created Database Objects like Tables, Stored Procedures, Views, Clustered and Non-Clustered indexes, Triggers, Rules, Defaults, User defined data types and functions.
  • Performed and fine-tuned stored procedures, SQL queries, and user-defined functions using Execution Plan for better performance.
  • Created and scheduled SQL jobs to run SSIS packages daily using MS SQL Server Integration Services (SSIS).
  • Performed query optimization and tuning, debugging and maintenance of stored procedures.
  • Performed database creation, assigned database security, and applied standard data modeling techniques.
  • Performed troubleshooting operations on the production servers.
  • Monitored, tuned, and analyzed database performance and allocated server resources to achieve optimum performance.
  • Worked on running integrated testing using JUNIT and XML for building the data structures required for the Web Service.
  • Developed user interfaces using JSP, HTML, CSS, and JavaScript.
  • Created staging databases and import tables in MS SQL Server.
  • Loaded data into the systems using loader scripts, cursors, and stored procedures.
  • Tested data in the test environment, performed client validation, and resolved issues.
  • Developed reports using SSRS on SQL Server 2012.
  • Performed analysis, design, and development of applications based on J2EE and design patterns.

Environment: Java, J2EE, JDK, JavaScript, XML, Struts, JSP, Servlets, JDBC, EJB, Hibernate, Web services, JMS, JSF, JUnit, CVS, IBM WebSphere, Eclipse, Oracle 9i, Linux.
