
Sr. Big Data/Spark Developer Resume


Addison, TX

SUMMARY:

  • Big Data developer with over 8 years of professional IT experience, which includes 4 years’ experience in the field of Big Data.
  • Extensive experience working with various Hadoop distributions, including enterprise versions of Cloudera and Hortonworks, with good knowledge of the MapR distribution and Amazon EMR.
  • In-depth experience using Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Spark, Storm, Kafka, Oozie, Elasticsearch, HBase, and Zookeeper.
  • Extensive knowledge of Hadoop architecture and its components.
  • Good knowledge of installing, configuring, monitoring, and troubleshooting a Hadoop cluster and its ecosystem components.
  • Exposure to Data Lake Implementation using Apache Spark.
  • Developed data pipelines and applied business logic using Spark.
  • Well-versed in Spark components like Spark SQL, MLlib, Spark Streaming, and GraphX.
  • Extensively worked with Spark Streaming and Apache Kafka to ingest live streaming data.
  • Used Scala and Python to perform RDD transformations in Apache Spark (a brief sketch follows this summary).
  • Experience in integrating Hive queries into Spark environment using Spark SQL.
  • Expertise in performing real-time analytics on big data using HBase and Cassandra.
  • Handled importing data from RDBMS into HDFS using Sqoop and vice-versa.
  • Extensive experience in importing and exporting streaming data into HDFS using stream processing platforms like Flume and Kafka.
  • Experience in developing data pipelines using Pig, Sqoop, and Flume to extract data from weblogs and store it in HDFS.
  • Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
  • Hands-on experience in tools like Oozie and Airflow to orchestrate jobs.
  • Proficient in NoSQL databases including HBase, Cassandra, and MongoDB, and their integration with a Hadoop cluster.
  • Expertise in Cluster management and configuring Cassandra Database.
  • Great familiarity with creating Hive tables, Hive joins, and HQL queries against those tables, up to and including complex Hive UDFs.
  • Accomplished in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Worked with different compression codecs (LZO, Snappy, GZIP) and file formats (ORC, Avro, text file, Parquet).
  • Experience in practical implementation of AWS technologies including IAM, Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, and EBS.
  • Skilled in configuring Amazon Relational Database Service (RDS).
  • Built AWS secured solutions by creating VPC with public and private subnets.
  • Worked on data warehousing and ETL tools like Informatica, Talend, and Pentaho.
  • Developed web page interfaces using JSP, Java Swing, and HTML.
  • Expertise working with Java/J2EE, JSP, Eclipse, JavaBeans, EJB, and Servlets.
  • Experience working with the Spring and Hibernate frameworks for Java.
  • Worked with various programming languages using IDEs like Eclipse, NetBeans, and IntelliJ.
  • Excelled in using version control tools like PVCS, SVN, VSS, and Git.
  • Performed web-based UI development using JavaScript, jQuery, jQuery UI, CSS, HTML, HTML5, and XHTML.
  • Developed stored procedures and queries using PL/SQL.
  • Development experience with RDBMSs such as Oracle, MS SQL Server, Teradata, and MySQL.
  • Experience with best practices of web services development and integration (both REST and SOAP).
  • Experienced in using build tools like Ant, Gradle, SBT, Maven to build and deploy applications into the server.
  • Knowledge of Unified Modeling Language (UML) and expertise in Object-Oriented Analysis and Design (OOAD).
  • Experience with the complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
  • Knowledge of creating dashboards and data visualizations using Tableau to provide business insights.
  • Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
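
For illustration, a minimal Scala sketch of the kind of Spark RDD transformation work referenced in this summary; the HDFS paths, field positions, and object names are hypothetical and not taken from any project described here.

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch: count hits per URL in web logs stored on HDFS.
    object RddTransformationsSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("rdd-transformations"))

        val logs = sc.textFile("hdfs:///data/weblogs/*.log")    // hypothetical input path
        val hitsPerUrl = logs
          .filter(line => !line.startsWith("#"))                // drop comment/header lines
          .map(line => (line.split(" ")(6), 1))                 // URL field position is an assumption
          .reduceByKey(_ + _)                                   // aggregate hit counts per URL

        hitsPerUrl.saveAsTextFile("hdfs:///data/weblogs/hits_per_url")
        sc.stop()
      }
    }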

TECHNICAL SKILLS:

Big Data Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Storm, Flume, Spark, Apache Kafka, Zookeeper, Solr, Ambari, Oozie

NoSQL Databases: HBase, Cassandra, MongoDB, Redshift

Languages: C, C++, Java, Scala, Python, HTML, SQL, PL/SQL, Pig Latin, HiveQL, JavaScript, Unix Shell Scripting

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB

Application Servers: WebSphere, WebLogic, JBoss, Tomcat

Cloud Computing Tools: Amazon AWS (S3, EMR, EC2, Lambda, VPC, Route 53, CloudWatch), Google Cloud

Databases: Oracle 10g/11g, Microsoft SQL Server, MySQL, DB2

Build Tools: Jenkins, SBT, Maven, ANT

Business Intelligence Tools: Tableau, MicroStrategy

Development Tools: Eclipse, IntelliJ, Microsoft SQL Studio, NetBeans

Development Methodologies: Agile Scrum, Waterfall

PROFESSIONAL EXPERIENCE:

Sr. Big Data/Spark Developer

Confidential, Addison, TX

Responsibilities:

  • Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
  • Used Sqoop to import data from Relational Databases like MySQL, Oracle.
  • Involved in importing structured and unstructured data into HDFS.
  • Implemented Hive UDFs to solve business problems and improve performance.
  • Migrated existing Hive processes to Spark, dramatically improving performance.
  • Responsible for fetching real time data using Kafka and processing using Spark Streaming.
  • Worked on Building and implementing real-time streaming ETL pipeline using Kafka Streams API.
  • Configured Spark Streaming to receive ongoing data from Kafka and store the streamed data in HDFS.
  • Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers.
  • Worked on Kafka to import real-time weblogs and ingested the data into Spark Streaming.
  • Experienced with SparkContext, Spark SQL, and Spark on YARN.
  • Implemented Spark SQL with various data sources like JSON, Parquet, ORC, and Hive.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster processing of data.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Loaded Avro/Parquet/text files into Spark using Java and Scala, created Spark DataFrames and RDDs to process the data, and saved the results in Parquet format in HDFS to be loaded into fact tables using the ORC reader.
  • Good knowledge in setting up batch intervals, split intervals and window intervals in Spark Streaming.
  • Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
  • Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
  • Involved in data querying and summarization using Hive and Pig and created UDFs, UDAFs, and UDTFs.
  • Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
  • Extensively used Zookeeper as a backup server and job scheduler for Spark jobs.
  • Knowledge of the MLlib (Machine Learning Library) framework for auto suggestions.
  • Developed traits and case classes in Scala.
  • Developed Spark scripts using Scala shell commands as per the business requirement.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the Scala sketch after this list).
  • Worked on Cloudera distribution and deployed on AWS EC2 Instances.
  • Experienced in loading the real-time data to NoSQL database like Cassandra.
  • Experienced in using the DataStax Spark Cassandra Connector to store data in the Cassandra database from Spark.
  • Involved in NoSQL (DataStax Cassandra) database design, integration, and implementation; wrote scripts and invoked them using cqlsh.
  • Well-versed in data manipulation, compaction, and tombstones in Cassandra.
  • Experience in retrieving the data present in Cassandra cluster by running queries in CQL (Cassandra Query Language).
  • Worked on connecting the Cassandra database to the Amazon EMR File System (EMRFS) for storing data in S3.
  • Created Spark RDDs on data stored in Amazon S3 to perform transformations and actions.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud ( EC2 ) and Amazon Simple Storage Service (S3).
  • Deployed the project on Amazon EMR with S3 connectivity for backup storage.
  • Experience in working on production servers on Amazon Cloud (S3, EBS, EC2, Lambda, and Route 53).
  • Well-versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.
  • Loaded the data into Simple Storage Service (S3) in the AWS Cloud.
  • Wrote Java code to format XML documents and upload them to Solr server for indexing.
  • Experienced with faceted search and full-text search using Solr.
  • Configured workflows that involve Hadoop actions using Oozie.
  • Used Python for pattern matching in build logs to format warnings and errors.
  • Coordinated with SCRUM team in delivering agreed user stories on time for every sprint.
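
As an illustration of the Kafka Direct Stream and DataStax connector work described above, the following is a hedged Scala sketch, not the project's actual code: the topic, keyspace, table, column names, and hosts are hypothetical, and it assumes the spark-streaming-kafka-0-10 and spark-cassandra-connector libraries are on the classpath.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import com.datastax.spark.connector._

    object WeblogStreamSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("weblog-stream")
          .set("spark.cassandra.connection.host", "127.0.0.1")  // hypothetical Cassandra host
        val ssc = new StreamingContext(conf, Seconds(10))       // 10-second batch interval

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092",              // hypothetical broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "weblog-consumers",
          "auto.offset.reset" -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

        // Parse each record, drop malformed lines, and persist each micro-batch
        // to Cassandra (via the DataStax connector) and to HDFS.
        stream.map(_.value)
          .flatMap(line => parseLog(line))
          .foreachRDD { rdd =>
            rdd.saveToCassandra("analytics", "weblogs", SomeColumns("host", "url", "ts"))
            rdd.saveAsTextFile(s"hdfs:///data/weblogs/batch_${System.currentTimeMillis}")
          }

        ssc.start()
        ssc.awaitTermination()
      }

      // Deliberately simplified parser: expects "host,url,epoch_ts" per line.
      def parseLog(line: String): Option[(String, String, Long)] = line.split(",") match {
        case Array(host, url, ts) => Some((host, url, ts.toLong))
        case _                    => None
      }
    }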

Environment: Hadoop YARN, Spark-Core, Spark-Streaming, AWS S3, AWS EMR, Spark-SQL, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Solr, Impala, Cassandra, Cloudera, Oracle 10g, Linux.

Hadoop Developer

Confidential, Carlsbad, CA

Responsibilities:

  • Involved in review of functional and non-functional requirements (NFRs).
  • Responsible for collecting and aggregating large amounts of data from various sources and ingesting it into the Hadoop file system (HDFS) using Sqoop and Flume; the data was then transformed for business use cases using Pig and Hive.
  • Developed and maintained data integration programs in RDBMS and Hadoop environments, with both RDBMS and NoSQL data stores for data access and analysis.
  • Responsible for coding multiple MapReduce jobs in Java for data cleaning and processing.
  • Responsible for testing and debugging the MapReduce programs.
  • Experienced in implementing MapReduce programs to handle semi-structured and unstructured data like JSON, XML, and log files.
  • Worked on importing data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Used the Spark SQL HiveContext to load data into Hive tables and wrote queries to fetch data from these tables (see the sketch after this list).
  • Developed Pig scripts and UDFs as per the business logic.
  • Used Pig to import semi-structured data from Avro files to make serialization faster.
  • Used Oozie workflows and Java schedulers to manage and schedule jobs on a Hadoop cluster.
  • Created the Hive external tables using Accumulo connector.
  • Indexed documents using Elasticsearch.
  • Developed Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, performance analysis and permission checks.
  • Worked with Kerberos authentication and integrated it to the Hadoop cluster to establish a more secure network communication on the cluster.
  • Expertise in implementing Spark Applications using Scala, Python (Pyspark) and Spark SQL for faster processing of data.
  • Developed Multi-hop flume agents by using Avro Sink to process web server logs and loaded them into MongoDB for further analysis.
  • Collected and aggregated large amounts of weblogs and unstructured data from different sources such as web servers, network devices using Apache Flume and stored the data into HDFS for analysis.
  • Experienced in connecting Avro sink ports directly to Spark Streaming for analysis of weblogs.
  • Implemented transformations and data quality checks using Flume Interceptor.
  • Responsible for using Flume sink to remove the data from Flume channel and to deposit in No-SQL database MongoDB.
  • Well-versed in using MongoDB CRUD (Create, Read, Update and Delete) operations.
  • Worked with Oozie and Zookeeper to schedule workflows and orchestrate Hive, Pig, and MapReduce jobs.
  • Collaborated with Database, Network, application and BI teams to ensure data quality and availability.
  • Expertise in extracting, transforming, and loading data from Oracle, DB2, SQL Server, flat files, and XML using Talend.
  • Good experience in using Python scripts to handle data manipulation.
  • Generated reports on claims data using Python and Tableau.
  • Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Configured Hadoop clusters and coordinated with Bigdata Administrators for cluster maintenance.
  • Experienced in using agile approaches including Test-Driven Development, Extreme Programming, and Agile Scrum.
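
A minimal Scala sketch of the HiveContext usage mentioned above; the database, table, and column names are hypothetical and shown only to illustrate the pattern.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object ClaimsLoadSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("claims-load"))
        val hiveContext = new HiveContext(sc)   // Spark 1.x-style Hive integration

        // Read from a staging table already registered in the Hive metastore,
        // keep only valid claims, and write the result into a managed Hive table.
        val claims = hiveContext.sql(
          "SELECT claim_id, member_id, amount, service_date FROM staging.raw_claims WHERE amount > 0")
        claims.write.mode("overwrite").saveAsTable("analytics.claims_clean")

        // Query the data back for a quick summary.
        hiveContext.sql(
          "SELECT service_date, SUM(amount) AS total FROM analytics.claims_clean GROUP BY service_date")
          .show(20)
      }
    }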

Environment: Hortonworks HDP, Hadoop, Spark, Flume, Elastic Search, AWS, EC2, S3, Pig, Hive, Python, Talend, MapReduce, HDFS.

Hadoop Developer

Confidential, Peoria, IL

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, HBase, and MapReduce.
  • Extracted data on everyday customer transactions from DB2, exported it to Hive, and set up online analytical processing.
  • Installed and configured Hadoop, MapReduce, and HDFS clusters.
  • Created Hive tables, loaded the data, and performed data manipulations using Hive queries in MapReduce execution mode.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into the Hive schema for analysis.
  • Loaded the structured data resulting from MapReduce jobs into Hive tables.
  • Analyzed user request patterns and implemented various performance optimizations, including partitions and buckets in HiveQL (see the sketch after this list).
  • Identified issues on behavioral patterns and analyzed them using Hive queries.
  • Analyzed and transformed stored data by writing MapReduce and Pig jobs based on business requirements.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and import to HDFS
  • Using Oozie, developed workflows to automate the tasks of loading data into HDFS and pre-processing it with Pig scripts.
  • Integrated MapReduce with HBase to import bulk data using MR programs.
  • Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster.
  • Worked on developing Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS.
  • Developed data pipeline using Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Used Pig as an ETL tool to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Used SQL queries, stored procedures, user-defined functions (UDFs), and database triggers, with tools like SQL Profiler and Database Tuning Advisor (DTA).
  • Installed a cluster, commissioned and decommissioned data nodes, and performed NameNode recovery, capacity planning, and slot configuration adhering to business requirements.
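
The partitioning and bucketing described above can be sketched with HiveQL like the following; the table and column names are hypothetical, and the statements are submitted through a HiveContext only to keep these examples in one language (they run unchanged in the Hive CLI, where bucketed inserts are normally executed so that bucketing is enforced).

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object TransactionsLayoutSketch {
      def main(args: Array[String]): Unit = {
        val hc = new HiveContext(new SparkContext(new SparkConf().setAppName("transactions-ddl")))

        // Partition by date and bucket by customer so common query patterns prune data.
        hc.sql("""CREATE TABLE IF NOT EXISTS analytics.transactions (
                  |  txn_id STRING, customer_id STRING, amount DOUBLE)
                  |PARTITIONED BY (txn_date STRING)
                  |CLUSTERED BY (customer_id) INTO 32 BUCKETS
                  |STORED AS ORC""".stripMargin)

        // Dynamic partitioning lets one INSERT fan out across every date in the staging table.
        hc.sql("SET hive.exec.dynamic.partition=true")
        hc.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        hc.sql("""INSERT OVERWRITE TABLE analytics.transactions PARTITION (txn_date)
                  |SELECT txn_id, customer_id, amount, txn_date FROM staging.transactions_raw""".stripMargin)
      }
    }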

Environment: HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, Flume, HBase, Talend, HiveQL, Java, Maven, Avro, Eclipse and Shell Scripting.

Java Developer

Confidential

Responsibilities:

  • Developed rules based on different state policies using Spring MVC, iBatis ORM, Spring Web Flow, JSP, JSTL, Oracle, MSSQL, SOA, XML, XSD, JSON, AJAX, and Log4j
  • Gathered requirements, developed, implemented, tested and deployed enterprise integration patterns (EIP) based applications using Apache Camel, JBoss Fuse
  • Developed service classes, domain/DAOs, and controllers using JAVA/J2EE technologies
  • Designed and developed using the web service framework Apache CXF
  • Worked on the ActiveMQ messaging service for integration
  • Worked with SQL queries to store and retrieve the data in MS SQL server
  • Performed unit testing using JUnit
  • Worked on continuous integration using Jenkins/Hudson
  • Participated in all phases of development life cycle including analysis, design, development, testing, code reviews and documentations as needed.
  • Involved in configuring Struts, Tiles and developing the configuration files
  • Used Eclipse as the IDE, Maven for build management, JIRA for issue tracking, Confluence for documentation, Git for version control, ARC (Advanced REST Client) for endpoint testing, Crucible for code review, and SQL Developer as the DB client.

Environment: Spring Framework, Spring MVC, Spring Web Flow, JSP, JSTL, SOAP UI, rating engine, IBM Rational Team, Oracle 11g, XML, JSON, Ajax, HTML, CSS, IBM WebSphere Application Server, RAD with sub-eclipse, Jenkins, Maven, SOA, SonarQube, Log4j, Java, JUnit

Java Developer

Confidential

Responsibilities:

  • Involved in gathering business requirements, analyzing the project and created UML diagrams such as Use Cases, Class Diagrams, Sequence Diagrams and flowcharts for the optimization Module using Microsoft Visio
  • Configured faces-config.xml for the page navigation rules and created managed and backing beans for the Optimization module.
  • Developed an enterprise application using Spring MVC, JSP, and MySQL
  • Worked on developing client-side web services components using JAX-WS technologies
  • Extensively worked on JUnit for testing the application code for server-client data transfer
  • Used SVN as a repository for managing/deploying application code
  • Developed and enhanced products in design and in alignment with business objectives
  • Involved in the system integration and user acceptance tests successfully
  • Developed the front end using JSTL, JSP, HTML, and JavaScript
  • Used XML to maintain the Queries, JSP page mapping, Bean Mapping etc.
  • Used Oracle 10g as the backend database and wrote PL/SQL scripts.
  • Maintained and modified the system based on user feedback using OO concepts
  • Implemented database transactions using Spring AOP & Java EE CDI capability
  • Enhanced the organization's reputation by fulfilling requests and exploring opportunities
  • Performed business analysis and reporting services, and integrated with Sage Accpac (ERP)
  • Developed new and maintained existing functionality using Spring MVC and Hibernate
  • Developed test cases for integration testing using JUnit
  • Created new and maintained existing web pages built with JSP and Servlets.

Environment: Java, Spring MVC, Hibernate, MSSQL, JSP, Servlets, JDBC, ODBC, JSF, NetBeans, GlassFish, Spring, Oracle, MySQL, Sybase, Eclipse, Tomcat, WebLogic Server
