
Big Data/Spark Engineer Resume


Austin, TX

SUMMARY:

  • Big Data Engineer with 7.5 years of IT experience, including 4 years in the Big Data and analytics field, developing end-to-end data pipelines to perform batch and real-time/stream analytics on structured and unstructured data.
  • Expertise in designing scalable Big Data solutions and data warehouse models on large-scale distributed data, and in performing a wide range of analytics to measure service performance.
  • Experienced in software design, object-oriented programming, scripting, SDLC, and machine learning models.
  • Extensive experience across the Software Development Life Cycle (SDLC): requirement analysis, system design, programming, testing, implementation, and application maintenance.
  • Experience in Big Data analytics and design in the Hadoop ecosystem using MapReduce programming, Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, and Kafka.
  • Solid understanding of data processing patterns and distributed computing, and of building applications for real-time and batch analytics.
  • Strong programming skills in design and implementation using Java/J2EE, C#/.NET, SQL, and other scripting languages.
  • Data ingress and egress between HDFS and relational database systems using Sqoop and Azure Data Factory.
  • Developed solutions using Spark SQL, Spark Streaming, and Kafka to process web feeds and server logs.
  • Hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, Hive, Sqoop, Pig, HDFS, HBase, ZooKeeper, Oozie, and Flume.
  • Good knowledge of NoSQL databases such as HBase, Cassandra, and Redis, and of using Spark Streaming for real-time stream processing of data into the cluster.
  • Experience with multiple Hadoop file formats such as Avro, Parquet, ORC, and JSON.
  • Developed scalable applications using SOAP, RESTful Web Services, HTTP and JMS.

TECHNICAL SKILLS:

Programming Languages: Scala, Java, C#/.NET, J2EE, Python, PowerShell & UNIX scripting, PL/SQL

Big Data Ecosystem: Azure Data Lake, Hadoop, MapReduce, Pig, Hive, Sqoop 1.4.4, ZooKeeper 3.4.5, YARN, Spark, Storm, Impala, Kafka, Apache Drill, HBase, Cassandra, Solr.

Distributions: Microsoft HDInsight, CDH4&5, MapR V7, Hortonworks HDP 2.0

Databases: Oracle 10g/9i/8i, MS SQL Server 7.x/2000/2003, DB2, Netezza, Sybase, MySQL, Vertica.

Middleware: Web services, Spring MVC, JavaBeans, EJB, Servlets, JSP, ESB

Web services: SOAP, RESTful, XML-RPC and WSDL

Version Control & Tracking: TFS, Git, IBM ClearCase, SVN, JIRA

PROFESSIONAL EXPERIENCE:

Confidential, Austin, TX

Big Data/Spark Engineer

Responsibilities:

  • Designed and documented the new architecture and development process to convert database and data warehouse models into Hadoop-based systems.
  • Developed large-scale data processing pipelines to handle petabytes of transaction data and egress it to analytical sources.
  • Worked on a Hortonworks-based Hadoop platform deployed on a 120-node cluster to build the data lake, using Spark, Hive, and NoSQL stores for data processing.
  • Worked on Apache Spark 2.0, using the Spark SQL and Streaming components to support intraday and real-time data processing.
  • Used Scala 2.12 and Java 7 to build UDFs and supporting utilities for standardizing the data pipelines (a minimal sketch follows this list).
  • Developed batch and streaming workflows with the built-in Stonebranch scheduler and Bash scripts to automate the data lake systems.
  • Implemented Spark best practices to process data efficiently and meet ETAs, using partitioning, resource tuning, memory management, and checkpointing.
  • Resolved many open Spark and YARN issues, including resource management problems, OOM errors, shuffle exceptions, heap space errors, NullPointerExceptions, and schema incompatibilities in Spark.
  • Worked with multiple data formats such as ORC, Parquet, Avro, JSON, and XML.
  • Converted multiple SQL Server and Oracle stored procedures into Hadoop jobs using Spark SQL, Hive, Scala, and Java.
  • Extensively used AWS services: S3 for storing data and EMR for resource-intensive jobs.
  • Built and managed on-demand AWS clusters using Qubole to process the daily web feeds.
  • Built a hybrid data model combining AWS S3 storage and HDFS storage on the Hortonworks cluster.
  • Used GitHub as the code repository and version control system.
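
The sketch below illustrates the kind of Spark 2.0 job this work involved: registering a UDF with Spark SQL and writing the standardized output back to the data lake as partitioned ORC. It is a minimal Java example; the database, table, column names, and output path are illustrative assumptions, not taken from the project.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.api.java.UDF1;
    import org.apache.spark.sql.types.DataTypes;

    public class TransactionStandardizer {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("TransactionStandardizer")
                .enableHiveSupport()   // read Hive tables registered in the data lake
                .getOrCreate();

            // Example UDF: normalize free-form currency codes before downstream joins.
            spark.udf().register("normalize_ccy",
                (UDF1<String, String>) ccy -> ccy == null ? "UNK" : ccy.trim().toUpperCase(),
                DataTypes.StringType);

            // Hypothetical staging table and columns.
            Dataset<Row> txns = spark.sql(
                "SELECT txn_id, normalize_ccy(currency) AS currency, amount, txn_ts "
                + "FROM staging.transactions WHERE load_date = '2017-01-01'");

            // Partitioned ORC output for the curated zone of the data lake.
            txns.write()
                .mode("overwrite")
                .partitionBy("currency")
                .orc("/data/lake/curated/transactions");

            spark.stop();
        }
    }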

Environment: Apache Spark 2.0, Hadoop stack, Scala SDK, Java, Spark SQL, Hive, SQL Server, Data Warehouse, Tableau, AWS S3, EMR, Qubole, IntelliJ IDEA, REST APIs, Jupyter Notebook.

Confidential, Cambridge, MA

Software Engineer, Big Data

Responsibilities:

  • Extracted, parsed, cleaned, and ingested incoming web feed data and server logs into HDInsight and Azure Data Lake Store, handling both structured and unstructured data.
  • Collected and processed petabytes of logs generated in the cloud by web and application servers, storing them in a usable format by applying indexing and partitioning.
  • Performed analytics for the Azure ML product by generating metrics and KPIs covering trend analysis, user churn rate, service SLAs, etc.
  • Programmed in C#, U-SQL, and Hive, writing Scope scripts against Data Lake and HDFS to structure the petabytes of unstructured data stored in the Azure Data Lake (Cosmos) big data system.
  • Provisioned Hadoop and Spark clusters on Azure HDInsight to build an on-demand data warehouse, processing petabytes of data and providing datasets to data scientists.
  • Programmed in Hive, Spark SQL, Java, C#, and Python to streamline incoming data and build data pipelines that surface useful insights, and orchestrated the pipelines using Azure Data Factory (a minimal Spark sketch follows this list).
  • Worked with Windows PowerShell and the Azure Scheduler to automate data ingestion and transformation jobs on daily and monthly schedules.
  • Collected requirements from product managers and developed efficient schemas for the data warehouse to ease downstream analytics.
  • Worked with technologies including YARN, Hive, HBase, Pig, MapReduce, REST, .NET, Power BI, Excel, Power Pivot, SQL Server, and Visual Studio.
  • Developed reporting solutions in Tableau and Power BI by connecting to big data systems through ODBC drivers and pushing the data to SQL Server.
  • Used Jupyter and IPython notebooks to execute Python modules that generate fact data from the database and other storage systems.
  • Used ORC and Parquet file formats on HDInsight, Azure Blob storage, and Azure Tables to store raw data.
  • Worked with SSAS cubes, Excel, and other UI frameworks to implement visual analytics.
  • Developed and automated reporting models for live-site issues, alerts, and business metrics, visualized as charts in Power BI.
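
As an illustration of the KPI pipelines above, the following is a minimal Spark SQL sketch in Java that reads Parquet logs from Azure Blob storage on HDInsight and computes daily metrics. The storage account, container paths, column names, and the specific KPIs are hypothetical placeholders, not figures from the project.

    import static org.apache.spark.sql.functions.avg;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.countDistinct;
    import static org.apache.spark.sql.functions.to_date;
    import static org.apache.spark.sql.functions.when;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ServiceKpiJob {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("ServiceKpiJob").getOrCreate();

            // Hypothetical container/path; wasbs:// is the Azure Blob scheme used by HDInsight.
            Dataset<Row> logs = spark.read().parquet(
                "wasbs://logs@examplestorage.blob.core.windows.net/weblogs/2017/06/");

            // Example KPIs only: daily active users and HTTP 5xx error rate per service.
            Dataset<Row> kpis = logs
                .withColumn("day", to_date(col("event_time")))
                .groupBy(col("day"), col("service"))
                .agg(countDistinct(col("user_id")).alias("daily_active_users"),
                     avg(when(col("status").geq(500), 1).otherwise(0)).alias("error_rate"));

            // Land the aggregates where the reporting layer (SQL Server / Power BI) picks them up.
            kpis.coalesce(1)
                .write()
                .mode("overwrite")
                .option("header", "true")
                .csv("wasbs://reports@examplestorage.blob.core.windows.net/kpis/daily/");

            spark.stop();
        }
    }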

Environment: C#, Java, Distributed computing, Hadoop stack, Spark SQL, Hive, Pig, Azure Data Lake Analytics, Data Warehouse, SQL Server, Power BI, Tableau, HDInsight, Azure Data Factory, Azure PowerShell, Azure ML Studio, etc.

Confidential, Charlotte, NC

Big Data Engineer

Responsibilities:

  • Responsible for the system and workflow design and the overall architecture of the stream-processing platform.
  • Worked on the MapR V7 distribution with a 46-node cluster handling 500 TB of data each day.
  • Used Sqoop and the SQL*Plus client to import data from Oracle and MySQL databases into Hadoop.
  • Created Hive staging tables to load the RDBMS source data in Parquet format.
  • Used Pig and Hive scripts to perform transformations and joins and to load the optimized tables into the Hive data mart; also wrote a Pig UDF for generating the trade sequence number.
  • Loaded delta and historical files into Hive in batch mode and automated the process using cron jobs.
  • Involved in connecting Tableau to Hive and HBase and loading the results.
  • Built an efficient real-time data processing pipeline using Kafka, Spark Streaming, and HBase to process incoming trades instantly.
  • Used Kafka to load log files and trade XMLs into Hadoop and implemented a Lambda architecture.
  • Wrote a Kafka producer in Java to consume messages from JMS queues, using Avro serialization to send the stream to the Kafka brokers for partitioning and distribution across the cluster (a minimal sketch follows this list).
  • Wrote the Kafka-Spark Streaming module that acts as the Kafka consumer and executes the business logic on trades using Spark DStreams and RDD operations.
  • Used the HBase Java API to map the schema of the streaming data from Spark, create the HBase table, and load the values.
  • Troubleshot single points of failure (SPOF) in Hadoop daemons and established recovery procedures.
  • Wrote Hive generic UDFs to perform business-logic operations at the record level.
  • Extensively worked with combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig, and used ZooKeeper to coordinate the cluster.
  • Used Apache Drill to connect HBase with Tableau and generate real-time reports.
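
Below is a minimal sketch of the Java/Avro Kafka producer described above. The trade schema, topic name, and broker address are illustrative assumptions, and the JMS consumption side is omitted for brevity; keying each record by trade ID is what drives partitioning across the brokers.

    import java.io.ByteArrayOutputStream;
    import java.util.Properties;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TradeProducer {

        // Hypothetical trade schema; the real one would mirror the trade XML fields.
        private static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Trade\",\"fields\":["
            + "{\"name\":\"tradeId\",\"type\":\"string\"},"
            + "{\"name\":\"symbol\",\"type\":\"string\"},"
            + "{\"name\":\"quantity\",\"type\":\"long\"}]}");

        private final Producer<String, byte[]> producer;

        public TradeProducer(String brokers) {
            Properties props = new Properties();
            props.put("bootstrap.servers", brokers);
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            this.producer = new KafkaProducer<>(props);
        }

        /** Called for each message pulled off the JMS queue (JMS wiring omitted here). */
        public void send(String tradeId, String symbol, long quantity) throws Exception {
            GenericRecord trade = new GenericData.Record(SCHEMA);
            trade.put("tradeId", tradeId);
            trade.put("symbol", symbol);
            trade.put("quantity", quantity);

            // Avro binary encoding of the record payload.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(SCHEMA).write(trade, encoder);
            encoder.flush();

            producer.send(new ProducerRecord<>("trades", tradeId, out.toByteArray()));
        }

        public void close() {
            producer.close();
        }
    }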

Environment: MapR V7, Hadoop, Spark Streaming & SQL, UNIX Shell Scripting, Kafka, Flume, Zookeeper, Hive, Pig UDF, HBase, Java SE 7, Oozie, Apache Drill, Tableau, Parquet files, Avro Serialization.

Confidential, Santa Fe

Java/J2EE Developer

Responsibilities:

  • Worked on the EDBC and correspondence teams, responsible for implementing business policies, delivering messages to clients, and fixing bugs in Java and Corticon using MyEclipse debugging.
  • Responsible for implementing business policies using the Corticon Business Rules Management System (BRMS) and coding them in Java.
  • Developed web services using the JAX-RS and JAX-WS models that provide operations to add, update, and inquire on information for any individual in ASPEN (a minimal JAX-RS sketch follows this list).
  • Configured and deployed the entire application on IBM WebSphere Application Server across different environments.
  • Responsible for coding Java batch processes and RESTful web services, and developed DAOs for data access from the database.
  • Extensively used JSPs and JavaScript to make the necessary changes to application functionality when handling new CRs.
  • Extensively wrote SQL queries in SQL Developer to retrieve details from the database and resolve functional issues in the application.
  • Extensively used the Fast4j MVC framework and tools developed by Deloitte throughout the development process.
  • Used IBM ClearCase for the code repository and version control, and IBM ClearQuest for defect tracking, bug fixing, and assigning work requests.
  • Developed stored procedures and triggers to create batch jobs and fire events per the business requirements.
  • Interacted with clients and QA to discuss specific functionality and guided them in testing the application with test scripts before moving fixes to production.
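
A minimal JAX-RS sketch of the add/update/inquire operations mentioned above. The resource path, the Individual POJO, and the in-memory store are hypothetical stand-ins used only to keep the example self-contained; the actual services were backed by DAOs against the ASPEN database.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import javax.ws.rs.Consumes;
    import javax.ws.rs.GET;
    import javax.ws.rs.POST;
    import javax.ws.rs.PUT;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    @Path("/individuals")
    @Produces(MediaType.APPLICATION_JSON)
    @Consumes(MediaType.APPLICATION_JSON)
    public class IndividualResource {

        /** Simple POJO stand-in for an individual record. */
        public static class Individual {
            public String id;
            public String name;
            public String caseNumber;
        }

        // In-memory store used only to keep the sketch runnable without a database.
        private static final Map<String, Individual> STORE = new ConcurrentHashMap<>();

        @POST
        public Response add(Individual individual) {
            STORE.put(individual.id, individual);
            return Response.status(Response.Status.CREATED).build();
        }

        @PUT
        @Path("/{id}")
        public Response update(@PathParam("id") String id, Individual individual) {
            STORE.put(id, individual);
            return Response.ok(individual).build();
        }

        @GET
        @Path("/{id}")
        public Response inquire(@PathParam("id") String id) {
            Individual found = STORE.get(id);
            return found == null
                ? Response.status(Response.Status.NOT_FOUND).build()
                : Response.ok(found).build();
        }
    }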

Environment: Java/J2EE, EJB, Fast4j MVC Framework, Corticon (BRMS), Oracle 11g, SQL Developer, Eclipse IDE, JSP, JavaScript, IBM ClearCase, IBM ClearQuest, SOAP Web services.

Confidential

Java Developer

Responsibilities:

  • Involved in the design and development of the application architecture using the MVC Model 2 design pattern with JSP and servlets.
  • Designed and developed interactive static HTML screens as screen-level prototypes.
  • Developed JavaScript for client-side validation and developed Cascading Style Sheets (CSS).
  • Involved in the design and development of the JSP-based presentation layer for web-based account inquiry using Struts custom tags, DHTML, HTML, and JavaScript.
  • Used Servlets 2.3 for processing business rules.
  • Developed the server-side application that handles database manipulation against the back-end Oracle DB using JDBC (a minimal servlet sketch follows this list).
  • Deployed the application components to the Apache Tomcat web server.
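
A minimal servlet-plus-JDBC sketch of the account-inquiry flow described above. The JDBC URL, credentials, table, and column names are placeholders, and try-with-resources from later Java versions is used for brevity even though the original stack was JDK 1.4.

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class AccountInquiryServlet extends HttpServlet {

        // Placeholder connection details; the Oracle JDBC driver must be on the classpath.
        private static final String JDBC_URL = "jdbc:oracle:thin:@dbhost:1521:ORCL";

        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            String accountId = request.getParameter("accountId");
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();

            try (Connection conn = DriverManager.getConnection(JDBC_URL, "appuser", "secret");
                 PreparedStatement stmt = conn.prepareStatement(
                     "SELECT account_name, balance FROM accounts WHERE account_id = ?")) {
                stmt.setString(1, accountId);
                try (ResultSet rs = stmt.executeQuery()) {
                    if (rs.next()) {
                        out.println("<p>" + rs.getString("account_name")
                            + " : " + rs.getBigDecimal("balance") + "</p>");
                    } else {
                        out.println("<p>No account found for " + accountId + "</p>");
                    }
                }
            } catch (SQLException e) {
                throw new ServletException("Account lookup failed", e);
            }
        }
    }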

Environment: JDK 1.4, Servlets 2.3, JSP 1.2, JDBC, JavaScript, CSS, HTML, DHTML, Ant, Log4j, JUnit, Apache Tomcat web server, Oracle 8i.
