
Sr. Spark/AWS Developer Resume


Freeport, ME

SUMMARY:

  • Professional software developer with 8+ years of technical expertise across all phases of the software development life cycle (SDLC) in industry sectors such as Banking, Financial Services, Auto Insurance and Health Care, specializing in Big Data analytics frameworks and Java/J2EE technologies.
  • 4+ years of industry experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Spark integration with Cassandra, Solr and ZooKeeper.
  • Extensive experience in developing applications that perform data processing tasks using Teradata, Oracle, SQL Server and MySQL databases.
  • Hands-on expertise in designing row keys and schemas for NoSQL databases such as MongoDB 3.0.1, HBase, Cassandra and DynamoDB (AWS).
  • Extensively worked on Spark with Scala on clusters for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
  • Excellent programming skills at a high level of abstraction using Scala, Java and Python.
  • Experience using DStreams, accumulators, broadcast variables and RDD caching for Spark Streaming (a minimal broadcast/caching sketch appears after this list).
  • Hands-on experience in developing Spark applications using Spark tools such as RDD transformations, Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
  • Strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka and Flume.
  • Working knowledge of Amazon Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) as the storage mechanism.
  • Ran Apache Hadoop, CDH and MapR distributions on EC2 via Elastic MapReduce (EMR).
  • Expertise in developing Pig Latin scripts and using Hive Query Language.
  • Developed customized UDFs and UDAFs in Java to extend Hive and Pig core functionality.
  • Created Hive tables to store structured data into HDFS and processed it using HiveQL.
  • Worked on GUI Based Hive Interaction tools like Hue, Karmasphere for querying the data.
  • Experience in validating and cleansing the data using Pig statements and hands-on experience in developing Pig MACROS.
  • Working knowledge in installing and maintaining Cassandra by configuring the cassandra.yaml file as per the business requirement and performed reads/writes using Java JDBC connectivity.
  • Experience in writing Complex SQL queries, PL/SQL, Views, Stored procedure, triggers, etc.
  • Experience in OLTP and OLAP design, development, testing and support of enterprise Data warehouses.
  • Wrote multiple MapReduce jobs using the Java API, Pig and Hive for data extraction, transformation and aggregation from multiple file formats, including Parquet, Avro, XML, JSON, CSV and ORC, and compression codecs such as Gzip, Snappy and LZO.
  • Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners to deliver the best results for large datasets.
  • Good knowledge of build and logging tools such as Maven, Ant and Log4j.
  • Experienced in migrating data from different sources using the pub-sub model in Redis and Kafka producers/consumers, and in preprocessing data using Storm topologies.
  • Competent with Chef, Puppet and Ansible configuration and automation tools; configured and administered CI tools such as Jenkins, Hudson and Bamboo for automated builds.
  • Hands-on experience with various Hadoop distributions: Cloudera (CDH 4/CDH 5), Hortonworks, MapR, IBM BigInsights, Apache and Amazon EMR.
  • Knowledge of installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS).
  • Experienced in writing Ad Hoc queries using Cloudera Impala, also used Impala analytical functions.
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce Programming Paradigm, High Availability and YARN architecture.
  • Proficient in developing, deploying and managing Solr from development to production.
  • Experience in importing data using Sqoop and SFTP from various sources such as RDBMS, Teradata, Mainframes, Oracle and Netezza to HDFS, and performing transformations on it using Hive, Pig and Spark.
  • Used project management services such as JIRA for tracking issues and code-related bugs, GitHub for code reviews, and version control tools such as CVS, Git and SVN.
  • Experienced in working with monitoring tools to check status of cluster using Cloudera manager, Ambari, Ganglia and Nagios.
  • Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multithreading, and serialization/deserialization of streaming applications.
  • Experience in software design, development and implementation of client/server web-based applications using JSTL, jQuery, JavaScript, JavaBeans, JDBC, Struts, PL/SQL, SQL, HTML, CSS, PHP, XML and AJAX, and a bird's-eye view of the React JavaScript library.
  • Experience in maintaining Apache Tomcat, MySQL, LDAP, LAMP and web service environments.
  • Ability to work with Onsite and Offshore Teams. 
  • Designed ETL workflows with Tableau and deployed data from various sources to HDFS.
  • Generated various knowledge reports using Power BI and Qlik based on business specifications.
  • Performed clustering, regression and classification using the machine learning libraries Mahout and Spark MLlib.
  • Good experience with use-case development and software methodologies such as Agile and Waterfall.
  • Good understanding of all aspects of testing, including unit, regression, agile, and white- and black-box testing.
  • Proven ability to manage all stages of project development; strong problem-solving and analytical skills and the ability to make balanced, independent decisions.
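
The broadcast-variable and RDD-caching bullet above refers to a common enrichment pattern. The following is a minimal, hypothetical sketch using Spark's Java RDD API, written against a Java 8 / Spark 2-era API for brevity; the HDFS paths, reference codes and assumed CSV column layout are illustrative only, not taken from any actual project:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.storage.StorageLevel;

import scala.Tuple2;

public class BroadcastLookupSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("broadcast-lookup-sketch");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Small reference data, shipped once per executor as a broadcast variable.
            Map<String, String> sectorByCode = new HashMap<>();
            sectorByCode.put("AUTO", "Auto Insurance");
            sectorByCode.put("BANK", "Banking");
            Broadcast<Map<String, String>> lookup = sc.broadcast(sectorByCode);

            // Large event data, cached because it is reused by the two actions below.
            JavaRDD<String> events = sc.textFile("hdfs:///data/events/*.csv")   // hypothetical path
                    .persist(StorageLevel.MEMORY_AND_DISK());

            // Enrich each record from the broadcast map instead of joining a tiny RDD.
            JavaPairRDD<String, Long> countsBySector = events
                    .mapToPair(line -> {
                        String code = line.split(",")[1];                        // assumed column layout
                        String sector = lookup.value().getOrDefault(code, "UNKNOWN");
                        return new Tuple2<>(sector, 1L);
                    })
                    .reduceByKey(Long::sum);

            System.out.println("total events: " + events.count());              // first action
            countsBySector.saveAsTextFile("hdfs:///out/sector_counts");          // second action reuses the cache
        }
    }
}
```

Broadcasting the small lookup map avoids shuffling it in a join, and caching the event RDD pays off only because it feeds more than one action.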

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Spark, YARN, Kafka, Flume, Sqoop, Solr, Impala, Oozie, ZooKeeper, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy.

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Languages: Java, Python, Scala, SQL, JavaScript and C/C++

NoSQL Databases: Cassandra, MongoDB, HBase and Amazon DynamoDB.

Java Technologies: JSE, Servlets, JavaBeans, JSP, JDBC, JNDI, AJAX, EJB and Struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS and JSON

Development / Build Tools: Eclipse, Jenkins, Git, Ant, Maven, IntelliJ, JUnit and Log4j.

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle 10g, 11i, MS SQL Server, MySQL and DB2

Operating Systems: UNIX, Red Hat Linux, macOS and Windows variants

Testing: Hadoop MRUnit testing, Hive testing, Quality Center (QC)

ETL Tools: Talend, Informatica, Pentaho, Ab Initio

PROFESSIONAL EXPERIENCE:

Confidential, Freeport, ME

Sr. Spark/AWS Developer 

Responsibilities:

  • Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Worked with Spark to improve performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that consumes data from Kafka in near real time and persists it to Cassandra (see the sketch after this list).
  • Developed Kafka consumers in Scala for consuming data from Kafka topics.
  • Consumed XML messages using Kafka and processed the XML files using Spark Streaming to capture UI updates.
  • Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
  • Worked with, and learned a great deal from, AWS cloud services such as EC2, S3, EBS, RDS and VPC.
  • Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small-data-set processing and storage, and maintained the Hadoop cluster on AWS EMR.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Implemented Elasticsearch on the Hive data warehouse platform.
  • Worked with Elastic MapReduce and set up the Hadoop environment on AWS EC2 instances.
  • Good understanding of Cassandra architecture, replication strategy, gossip, snitch etc.
  • Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra per the business requirements.
  • Used the DataStax Spark Cassandra Connector to load data to and from Cassandra.
  • Created data models for the client's transactional logs and analyzed the data in Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL).
  • Tested cluster performance using the cassandra-stress tool to measure and improve read/write throughput.
  • Used HiveQL to analyze partitioned and bucketed data and executed Hive queries on Parquet tables stored in Hive to perform data analysis that met the business specification logic.
  • Used Kafka capabilities such as distribution, partitioning and the replicated commit log service for messaging systems by maintaining feeds.
  • Used Apache Kafka to aggregate web log data from multiple servers and make it available to downstream systems for data analysis and engineering teams.
  • Experience in using Avro, Parquet, RCFile and JSON file formats, developed UDFs in Hive and Pig
  • Worked with Log4j framework for logging debug, info & error data.
  • Performed transformations such as event joins, bot-traffic filtering and pre-aggregations using Pig.
  • Developed custom Pig UDFs in Java and used UDFs from Piggybank for sorting and preparing the data.
  • Developed custom loaders and storage classes in Pig to work with data formats such as JSON, XML and CSV, generating bags for further Pig processing.
  • Used Amazon DynamoDB to gather and track event-based metrics.
  • Developed Sqoop and Kafka jobs to load data from RDBMS and external systems into HDFS and Hive.
  • Developed Oozie coordinators to schedule Pig and Hive scripts to create data pipelines.
  • Wrote several MapReduce jobs using the Java API and used Jenkins for continuous integration.
  • Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster and tested HDFS, Hive, Pig and MapReduce access for new users.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Modified Ant scripts to build the JAR, class, WAR and EAR files.
  • Generated various reports using Power BI and Tableau based on client specifications.
  • Used JIRA for bug tracking and Bitbucket to check in and check out code changes.
  • Worked with Network, Database, Application, QA and BI teams to ensure data quality and availability.
  • Responsible for generating actionable insights from complex data to drive real business results for various application teams; worked extensively on Agile methodology projects.
  • Worked with the Scrum team to deliver agreed user stories on time for every sprint.
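
As referenced in the Spark Streaming bullet above, the sketch below outlines the Kafka-to-Cassandra path in broad strokes. It is a hypothetical illustration rather than the project's actual code: it assumes the spark-streaming-kafka-0-10 integration and the DataStax Spark Cassandra Connector's Java API, uses Java 8 lambdas for brevity, and the broker address, topic name, keyspace/table and CSV message layout are all made up:

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

public class LearnerEventStreamSketch {

    /** Hypothetical learner event; bean properties map to Cassandra columns (learner_id, course). */
    public static class LearnerEvent implements Serializable {
        private String learnerId;
        private String course;
        public LearnerEvent() { }
        public LearnerEvent(String learnerId, String course) { this.learnerId = learnerId; this.course = course; }
        public String getLearnerId() { return learnerId; }
        public void setLearnerId(String learnerId) { this.learnerId = learnerId; }
        public String getCourse() { return course; }
        public void setCourse(String course) { this.course = course; }
    }

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("learner-event-stream-sketch")
                .set("spark.cassandra.connection.host", "127.0.0.1");            // hypothetical Cassandra host
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");                  // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "learner-model");
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", false);

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("learner-events"), kafkaParams));

        // Parse each CSV-style message and persist every micro-batch to Cassandra.
        stream.map(record -> {
            String[] fields = record.value().split(",");
            return new LearnerEvent(fields[0], fields[1]);
        }).foreachRDD(rdd ->
                javaFunctions(rdd)
                        .writerBuilder("learning", "learner_events", mapToRow(LearnerEvent.class))
                        .saveToCassandra());

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Writing inside foreachRDD lets the connector batch each micro-batch's rows to Cassandra from the executors rather than funneling them through the driver.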

Environment: Spark, Spark Streaming, Spark SQL, AWS EMR, MapR, HDFS, Hive, Pig, Apache Kafka, Sqoop, Java (JDK SE 6, 7), Scala, Shell scripting, Linux, MySQL, Oracle Enterprise DB, SOLR, Jenkins, Eclipse, Oracle, Git, Oozie, Tableau, SOAP, NiFi, Cassandra and Agile methodologies.

Confidential, Naperville, Illinois

Hadoop/Spark Developer

Responsibilities:

  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially prototyped in Python (PySpark).
  • Developed Spark jobs using Scala on top of YARN/MRv2 for interactive and batch analysis.
  • Experienced in querying data using Spark SQL on top of the Spark engine for faster data set processing.
  • Worked on implementing the Spark Framework, a Java-based web framework.
  • Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
  • Wrote Java code to format XML documents and uploaded them to the Solr server for indexing (see the sketch after this list).
  • Experienced with Apache Solr for indexing and load-balanced querying to search for specific data in larger datasets; implemented a near-real-time Solr index on HBase and HDFS.
  • Worked on ad hoc queries, indexing, replication, load balancing and aggregation in MongoDB.
  • Processed web server logs by developing multi-hop Flume agents using the Avro sink and loaded the data into MongoDB for further analysis; also extracted files from MongoDB through Flume and processed them.
  • Expert knowledge of MongoDB NoSQL data modeling, tuning and disaster recovery backups; used it for distributed storage and processing with CRUD operations.
  • Extracted and restructured data into MongoDB using the import and export command-line utilities.
  • Experience in setting up fan-out workflows in Flume to design a V-shaped architecture that takes data from many sources and ingests it into a single sink.
  • Implemented custom serializers and interceptors in Flume to mask confidential data and filter unwanted records from the event payload.
  • Experience in creating, dropping and altering tables at run time without blocking updates and queries, using HBase and Hive.
  • Experience in working with different join patterns and implemented both map-side and reduce-side joins.
  • Wrote Flume configuration files for importing streaming log data into HBase.
  • Imported several transactional logs from web servers with Flume to ingest the data into HDFS; used Flume with the spooling directory source to load data from the local file system (LFS) into HDFS.
  • Installed and configured Pig and wrote Pig Latin scripts to convert data from text files to Avro format.
  • Created Partitioned Hive tables and worked on them using HiveQL. 
  • Loaded data into HBase using both bulk and non-bulk loads.
  • Installed and configured Talend ETL in single- and multi-server environments.
  • Experience in monitoring the Hadoop cluster using Cloudera Manager, interacting with Cloudera support, logging issues in the Cloudera portal and fixing them per the recommendations.
  • Experience with Cloudera Hadoop upgrades and patches and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
  • Worked on the continuous integration tool Jenkins and automated JAR builds at the end of each day.
  • Worked with Tableau, integrated Hive with Tableau Desktop reports and published them to Tableau Server.
  • Developed a data pipeline using Pig and Java MapReduce to ingest customer behavioral and financial data into HDFS for analysis.
  • Developed MapReduce programs in Java to parse the raw data and populate staging tables.
  • Developed Unix shell scripts to load a large number of files into HDFS from the Linux file system.
  • Experience in setting up the whole application stack; set up and debugged Logstash to send Apache logs to AWS Elasticsearch.
  • Used Impala connectivity from the user interface (UI) and queried the results using Impala QL.
  • Wrote and implemented Teradata FastLoad, MultiLoad and BTEQ scripts, DML and DDL.
  • Used Zookeeper to coordinate the servers in clusters and to maintain the data consistency. 
  • Experienced in designing RESTful services using Java-based APIs such as Jersey.
  • Worked in an Agile development environment using the Kanban methodology; actively involved in daily Scrum and other design-related meetings.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Supported setting up the QA environment and updating configurations to implement scripts with Pig, Hive and Sqoop.
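
The Solr indexing bullet above is sketched below with SolrJ. This is a hypothetical outline, assuming a SolrJ 6.x-style HttpSolrClient and Java 7+ syntax for brevity; the core URL, the XML element names and the Solr field names are illustrative and would need to match the real schema:

```java
import java.io.File;

import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.w3c.dom.Document;

public class SolrXmlIndexerSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical Solr core URL; adjust to the actual collection.
        try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/logs").build()) {
            Document xml = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new File(args[0]));                     // XML document to index

            // Map a few XML elements onto Solr fields (names are illustrative).
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", xml.getElementsByTagName("id").item(0).getTextContent());
            doc.addField("title", xml.getElementsByTagName("title").item(0).getTextContent());
            doc.addField("body", xml.getElementsByTagName("body").item(0).getTextContent());

            solr.add(doc);      // send the document to the update handler
            solr.commit();      // make it searchable
        }
    }
}
```

In a near-real-time setup the explicit hard commit would typically be replaced by soft-commit settings on the server side.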

Environment: Hadoop, HDFS, Hive, MapReduce, AWS EC2, SOLR, Impala, MySQL, Oracle, Sqoop, Kafka, Spark, SQL, Talend, Python, PySpark, YARN, Pig, Oozie, Linux (Ubuntu), Scala, Ab Initio, Tableau, Maven, Jenkins, Java (JDK 1.6), Cloudera, JUnit, Agile methodologies

Confidential, Orrville, Ohio

Big Data Hadoop Consultant

Responsibilities:

  • Experienced in migrating and transforming large sets of structured, semi-structured and unstructured raw data from HBase through Sqoop and placing it in HDFS for further processing.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed file formats.
  • Wrote a Java program to retrieve data from HDFS and provide it to REST services.
  • Implemented business logic by writing UDFs in Java and used various UDFs from other sources (a minimal UDF sketch appears after this list).
  • Implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Involved in using HCatalog to access Hive table metadata from MapReduce and Pig code.
  • Created HBase tables, used HBase sinks and loaded data into them to perform analytics using Tableau.
  • Installed, configured and maintained Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster. 
  • Created multiple Hive tables and ran Hive queries on that data; implemented partitioning, dynamic partitioning and buckets in Hive for efficient data access.
  • Experienced in running batch processes using Pig Latin scripts and developed Pig UDFs for data manipulation according to business requirements.
  • Hands-on experience in developing optimal strategies for distributing web log data over the cluster and importing and exporting stored web log data into HDFS and Hive using Sqoop.
  • Developed several REST web services that produce both XML and JSON to perform tasks, leveraged by both web and mobile applications.
  • Developed unit test cases for Hadoop MapReduce jobs and driver classes with the MRUnit testing library.
  • Continuously monitored and managed the Hadoop cluster using Cloudera manager and Web UI.
  • Designed the logical and physical data models, generated DDL scripts, and wrote DML scripts for the Oracle 10g database.
  • Managed and scheduled several jobs to run over time on the Hadoop cluster using Oozie.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts). 
  • Used Maven to build JAR files of MapReduce programs and deployed them to the cluster.
  • Involved in final data reporting using Tableau for testing, connecting to the corresponding Hive tables using the Hive ODBC connector.
  • Monitored workload, job performance and capacity planning using Cloudera Manager
  • Performed cluster tasks such as adding and removing nodes without affecting running jobs.
  • Installed Qlik Sense Desktop 2.x, developed applications for users and created reports using QlikView.
  • Configured different Qlik Sense roles and attribute-based access control.
  • Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
  • Helped in design of Scalable Big Data Clusters and solutions and involved in defect meetings.
  • Followed Agile methodology for the entire project and supported testing teams.
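
The UDF bullet above can be illustrated with a small Hive UDF in the classic org.apache.hadoop.hive.ql.exec.UDF style that matches this stack's vintage; the function name and the region-code mapping are purely hypothetical:

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Hypothetical business-logic UDF: normalizes raw region codes to canonical names. */
@Description(name = "normalize_region",
             value = "_FUNC_(code) - maps a raw region code to a canonical region name")
public class NormalizeRegionUDF extends UDF {
    public Text evaluate(Text code) {
        if (code == null) {
            return null;                          // pass NULLs through, as Hive expects
        }
        String c = code.toString().trim().toUpperCase();
        if (c.equals("OH") || c.equals("OHIO")) {
            return new Text("Ohio");
        }
        if (c.equals("IL") || c.equals("ILLINOIS")) {
            return new Text("Illinois");
        }
        return new Text("UNKNOWN");
    }
}
```

After packaging the class into a JAR, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.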

Environment: Apache Hadoop, MapReduce, HDFS, HBase, CentOS 6.4, Unix, REST web services, Ant 1.6, Elasticsearch, Hive, Pig, Oozie, Java (JDK 1.5), JSON, Eclipse, QlikView, Qlik Sense, Oracle Database, Jenkins, Maven, Sqoop.

Confidential

Java Developer 

Responsibilities:

  • Involved in requirements analysis and the design of an object-oriented domain model.
  • Involved in detailed documentation and writing functional specifications for the module.
  • Involved in development of Application with Java and J2EE technologies.
  • Developed and maintained an elaborate services-based architecture utilizing open source technologies such as Hibernate ORM and the Spring Framework.
  • Designed and documented REST/HTTP APIs, including JSON data formats and an API versioning strategy (see the sketch after this list).
  • Developed server-side services using Java multithreading, Struts MVC, EJB, Spring and Web Services (SOAP, WSDL, Axis).
  • Used microservices as the communication medium between different APIs, processing a large number of small processes.
  • Involved in creating and configuring build files using Ant.
  • Developed the controller servlet, a framework component for the presentation layer.
  • Investigated MVC framework technologies, including JSF-based ones (ICEfaces, RichFaces), to implement the MVC architecture of the product.
  • Developed an application using JSF, MyFaces, Spring and JDO technologies that communicated with mainframe software.
  • Designed, developed and implemented JSPs in the presentation layer for the submission, application and reference implementations.
  • Developed JSP pages using JDeveloper 9.0.5 with HTML, bean tags, logic tags and template tags.
  • Practiced and evangelized agile development approaches; wrote Ant scripts and assisted with build and configuration management processes.
  • Developed JavaScript for client-side data entry validation and front-end validation.
  • Deployed Web, presentation and business components on Apache Tomcat Application Server.
  • Generated schema difference reports for the database using TOAD.
  • Developed PL/SQL procedures for different use-case scenarios.
  • Built the report module based on Crystal Reports.
  • Involved in post-production support and testing; used JUnit for unit testing of the module.
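
As a sketch of the REST/HTTP API work noted above (the original stack lists SOAP/Axis alongside REST, so this is an illustration rather than the project's actual code), a minimal JAX-RS resource with a versioned path might look like the following; it assumes a JSON provider such as Jackson is registered, and the entity and paths are hypothetical:

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

/** Hypothetical versioned resource returning JSON; entity fields and paths are illustrative. */
@Path("/v1/policies")
public class PolicyResource {

    public static class Policy {
        public String id;
        public String holder;
        public Policy() { }
        public Policy(String id, String holder) { this.id = id; this.holder = holder; }
    }

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getPolicy(@PathParam("id") String id) {
        // In the real service this would delegate to the business/DAO layer.
        Policy policy = new Policy(id, "Sample Holder");
        return Response.ok(policy).build();
    }
}
```

Keeping the version in the path (/v1/...) lets a later /v2 resource coexist with clients still bound to the old JSON contract.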

Environment: Java/J2EE, JSP, XML, Spring Framework, Hibernate, Eclipse (IDE), Microservices, JavaScript, Struts, Tiles, Ant, SQL, PL/SQL, Oracle, Windows, UNIX, SOAP, Jasper Reports.

Confidential

Junior Java Developer

Responsibilities:

  • Actively involved from the start of the project, from requirements gathering through quality assurance testing.
  • Coded and developed a multi-tier architecture in Java, J2EE and Servlets.
  • Conducted analysis, requirements study and design according to various design patterns and developed features to the use cases, taking ownership of them.
  • Used various design patterns such as Command, Abstract Factory, Factory and Singleton to improve system performance; analyzed critical coding defects and developed solutions.
  • Developed a configurable front end using Struts; also involved in component-based development of features that were reusable across modules.
  • Designed, developed and maintained the data layer using the Hibernate ORM framework (see the sketch after this list).
  • Used the Hibernate framework for the persistence layer; involved in writing stored procedures for data retrieval, storage and updates in the Oracle database using Hibernate.
  • Developed batch jobs that run at specified times to implement business logic on the Java platform.
  • Developed and deployed archive files (EAR, WAR, JAR) using the Ant build tool.
  • Used software development best practices for object-oriented design and methodologies throughout the object-oriented development cycle.
  • Responsible for developing SQL Queries required for the JDBC.
  • Designed the database, worked on DB2, and executed DDL and DML statements.
  • Active participation in architecture framework design and coding and test plan development.
  • Strictly followed the Waterfall development methodology for implementing projects.
  • Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
  • Involved in developing training presentations for developers (offshore support), QA and production support.
  • Presented the process logical and physical flow to various teams using PowerPoint and Visio diagrams.
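
The Hibernate persistence-layer bullet above can be illustrated with a small, hypothetical DAO sketch; the entity, table name and configuration are illustrative, and it assumes a hibernate.cfg.xml on the classpath and an annotation-capable Hibernate version:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

public class CustomerDao {

    /** Hypothetical entity; table and column names are illustrative. */
    @Entity
    @Table(name = "CUSTOMER")
    public static class Customer {
        @Id
        private Long id;
        private String name;
        public Customer() { }
        public Customer(Long id, String name) { this.id = id; this.name = name; }
        public Long getId() { return id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    private final SessionFactory sessionFactory =
            new Configuration().configure()                 // reads hibernate.cfg.xml from the classpath
                               .addAnnotatedClass(Customer.class)
                               .buildSessionFactory();

    /** Persists a customer inside a transaction, rolling back on failure. */
    public void save(Customer customer) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(customer);
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}
```

Stored-procedure calls for reads and updates would sit behind the same session boundary, for example through native queries, rather than inside the entity itself.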

Environment: Java JDK 1.5, Java/J2EE, Informatica, Oracle 11g (TOAD and SQL Developer), Servlets, JBoss Application Server, Waterfall, JSPs, EJBs, DB2, RAD, XML, Web Server, JUnit, Hibernate, MS
