
Sr. Spark Developer Resume


PA

PROFESSIONAL SUMMARY:

  • 8+ years of technical expertise in all phases of the SDLC (Software Development Life Cycle), including professional IT experience in analyzing, designing and building highly distributed products and working with Big Data/Hadoop, NoSQL and Java/J2EE software practices.
  • Worked on diverse enterprise applications in the Financial, Healthcare and Banking sectors as a Big Data Engineer, with a good understanding of Hadoop frameworks and various data analysis tools.
  • 4+ years of experience working with Big Data and the Hadoop ecosystem, with expertise in HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Flume, Oozie, ZooKeeper, Avro, Solr, Spark, Kafka, Storm, Cassandra, Impala, Greenplum and MongoDB.
  • Experience in importing streaming logs and aggregating the data to HDFS through Flume.
  • Experience in handling various tools for Big Data analysis using Pig, Hive, Sqoop and Spark.
  • Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality (a minimal sketch follows this summary).
  • Developed Apache Spark jobs using Scala in a test environment for faster data processing, and used Spark SQL for querying.
  • Experience in storing and retrieving documents in Apache Solr.
  • Used the Oozie scheduler to automate pipeline workflows and orchestrate Hive, Pig and MapReduce jobs that extract data in a timely manner.
  • Good Experience in writing Spark applications using Python and Scala.
  • Experience building data processing pipeline using Kafka and Storm to ingest data into HDFS.
  • Experience with Testing MapReduce programs using MRUnit and EasyMock.
  • Combined Pig with Hive to create processing pipelines that scale easily, in place of writing low-level MapReduce jobs.
  • Experience working with different file formats such as flat files, ORC, Avro and JSON.
  • Experience in deploying NiFi data flows in production and integrating data from multiple sources such as Cassandra and MongoDB.
  • Developed Spark streaming programs in Scala to transform and store the data into HDFS on the fly.
  • Hands-on knowledge of creating Amazon EC2 instances and S3 buckets and working with Amazon EMR.
  • Experienced in Hadoop data testing, data validation and data quality checks.
  • Used Pig to write complex data transformations for extracting, cleaning and processing large data sets and storing them in HDFS.
  • Worked on streaming data processing frameworks such as Spark Streaming and Storm.
  • Widely used Spark transformations to normalize data coming from real time data sources.
  • Configured Kafka producers and created consumer groups to publish and subscribe to streams of records in a distributed, fault-tolerant way.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala.
  • Hands-on experience with SequenceFiles, combiners, counters, dynamic partitions and bucketing for best practices and performance improvement.
  • Migrated traditional MapReduce jobs to Spark jobs to improve data processing speed.
  • Working knowledge of Talend, Informatica, Maven, Git Enterprise, Jenkins, Control-M, Cron, Autosys, PuTTY and WinSCP.
  • Worked with Core Java and J2EE technologies such as Servlets, JSP, Collections, multithreading, exception handling, EJB, JDBC and web services.
  • Extensive experience working with SQL and NoSQL databases such as MySQL, DB2, MongoDB and Cassandra.
  • Set up Solr schemas and data import handlers to synchronize data with SQL databases, plus query suggesters and spell checking for approximate searches.
  • Expertise with cloud technologies such as NiFi (transformations) and AWS S3 buckets.
  • Developed a data pipeline using Kafka and Spark Streaming to store data in HDFS and performed real-time analytics on the incoming data.
  • Experience in configuring, deploying and managing different Hadoop distributions such as Cloudera, EMR and Hortonworks (HDP), and good knowledge of MapR.
  • Expert in developing applications using Servlets, Hibernate, Spring Frameworks.
  • Explored various Spark modules and worked with DataFrames, RDDs and SparkContext.
  • Expertise in using version control systems such as GitHub and SVN.
  • Actively Collaborated with Team members on Daily Scrum meetings to ensure smooth progress in development and on-time completion of sprints.
  • Experience in implementation of the SDLC process with different project management methodologies including Agile.
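Illustrative only: a minimal sketch of the kind of custom Hive UDF in Java referenced above; the class name, masking logic and column semantics are hypothetical, not taken from the actual projects.

    // Hypothetical Hive UDF that masks an account number, keeping only the last
    // four characters. Requires hive-exec and hadoop-common on the classpath.
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class MaskAccountUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            String value = input.toString();
            if (value.length() <= 4) {
                return new Text(value);
            }
            return new Text("****" + value.substring(value.length() - 4));
        }
    }

Such a class would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.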

TECHNICAL SKILLS:

Big Data: Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, HBase, Flume, ZooKeeper, Oozie, Impala, Cassandra, MongoDB, Kafka, Spark, NiFi

Languages: SQL, PL/SQL, HTML, C, XML, Java, Scala, Pig Latin, HiveQL, Python, Unix Shell Scripting

Databases: SQL, NoSQL, MySQL, Teradata, MS SQL, Oracle, HBase, Cassandra, MongoDB, Neo4j

IDE & ETL Tools: Eclipse, NetBeans, IntelliJ, Maven, Informatica, IBM DataStage, Talend, Jenkins

AWS Services: Redshift, EMR, EC2, ELB, RDS, S3, CloudWatch, SNS, SQS, EBS

Other Tools: PuTTY, WinSCP, Stream Weaver, Amazon AWS, Hortonworks, Cloudera, Azure

Version Control: GitHub, SVN, CVS

Methodologies: Agile, Scrum, Waterfall

Operating Systems: UNIX, Windows, iOS, Linux

PROFESSIONAL EXPERIENCE:

Confidential, PA

Sr. Spark Developer

Responsibilities:

  • Understood business needs, analyzed functional specifications and mapped them to design and development.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and RDDs.
  • Used Amazon Simple Storage Service (S3), Amazon Elastic MapReduce (EMR) and Amazon EC2.
  • Involved in data ingestion into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources such as web servers, RDBMS and data APIs.
  • Developed the Flume source, channel and sink configuration files for creating pipelines from various data sources into HDFS.
  • Consumed real-time and near-real-time data from various sources through Kafka data pipelines and applied transformations to normalize the data, which was then stored in the HDFS data lake (a minimal consumer sketch follows this list).
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Used Sqoop to import data from different RDBMS systems such as Oracle and DB2 and loaded it into HDFS.
  • Developed Oozie workflows and scheduled them to run on a monthly basis.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Customized the Fusion index and query pipelines and wrote custom stages to manipulate Solr queries.
  • Involved in creating Hive ORC tables, loading data into them and writing Hive queries to analyze the data.
  • Extensively worked with Spark DataFrames for ingesting data from flat files into RDDs to transform structured and unstructured data.
  • Created the Spark SQL context to load data from Hive tables into RDDs for performing complex queries and analytics on data present in the data lake.
  • Used Spark transformations for data wrangling and ingesting the real-time data of various file formats.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Monitored the Hadoop cluster continuously using Cloudera Manager and wrote shell scripts to automate email alerts to the business team.
  • Expertise in creating TWS jobs and job streams and automating them per schedule.
  • Involved in data transfer from Hive tables into the Cassandra file system for real-time exploration.
  • Involved in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to find which one better suits the current requirements.
  • Performance tuning using Partitioning, bucketing of Hive tables.
  • Exported data from Impala to Tableau reporting tool, created dashboards on live connection.
  • Configured the Hive metastore and catalogd so that the Impala daemons could pull data using Hive metadata.
  • Good understanding of the DAG for the entire Spark application flow in the Spark web UI.
  • Analyzed and performed data integration using Talend open integration suite.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, which is enabled in Hive with the help of the ZooKeeper implementation in the cluster.
  • Ran many performance tests using the Cassandra-Stress tool in order to measure and improve the read and write performance of the cluster.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark, RDD, Python and Scala.
  • Involved in all phases of the Software Development Life Cycle (SDLC) and created UML diagrams such as use case, class and sequence diagrams to represent the detailed design phases.
  • Followed Agile methodology and Scrum meetings to track, optimize and tailor features to customer needs.
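Illustrative only: a minimal Java sketch of the kind of Kafka consumer referenced above; the broker address, consumer group and topic name are hypothetical.

    // Hypothetical Kafka consumer that reads records from a topic before they are
    // normalized and landed in the HDFS data lake. Requires the kafka-clients library.
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class EventConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
            props.put("group.id", "hdfs-ingest-group");        // hypothetical consumer group
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("web-events"));   // hypothetical topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Normalization and the write to HDFS would happen here.
                        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                    }
                }
            }
        }
    }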

Environment: Java/J2EE, Hadoop, AWS, Spark, Scala, Cloudera, Cassandra, HDFS, Flume, Hive, Kafka, Impala, Oozie, ZooKeeper, MapReduce, Sqoop, Linux, MapR, Big Data, UNIX Shell Scripting, Storm, Agile.

Confidential, MN

Sr. Hadoop Developer

Responsibilities:

  • Deployed a Hadoop cluster (HDInsight) and data pipelines using Big Data analytic tools.
  • Worked closely with the data source team to understand the scale and format of the data to be ingested on a daily basis.
  • Used Spark over Hortonworks Hadoop YARN for performing transformations and analytics on Hive tables.
  • Designed complex ETL systems using SQL Server and NoSQL in Python, and migrated data from various databases to Azure Blob storage.
  • Wrote Lambda functions in Python for Azure that invoke Python scripts to perform various transformations and analytics on large data sets in EMR clusters.
  • Imported and exported data from RDBMS to the HDFS data lake and from HDFS to Teradata using Sqoop import, Sqoop incremental import and Sqoop export, and scheduled the jobs on a daily basis with shell scripting.
  • Used Sqoop import to load historical data from relational database systems into the Hadoop Distributed File System (HDFS).
  • Extensively used Solr indexing to enable searching on non-primary-key columns in the Cassandra keyspaces.
  • Analyzed the SQL scripts and designed solutions to implement them using PySpark.
  • Efficiently joined raw data with the reference data using Pig scripting.
  • Used various file formats like Parquet, Avro, ORC and compression techniques like Snappy, LZO and GZip for efficient management of cluster resources.
  • Wrote Hadoop MapReduce jobs using the Java API for processing data present on HDFS (a minimal sketch follows this list).
  • Imported the historical data present in MongoDB using Sqoop import and stored it in HDFS using compression techniques.
  • Expert knowledge in MongoDB NoSQL data modelling, tuning, disaster recovery and Backup.
  • Processed unstructured files such as XML and JSON using a custom-built Java API and pushed them into MongoDB.
  • Developed processes to integrate event data from NiFi transformations and load it into AWS S3 buckets.
  • Worked on migrating the old Java stack to a type-safe stack using Scala for backend programming.
  • Used Slick to query and store data in the database in a Scala fashion using the powerful Scala collections framework.
  • Worked on MongoDB NoSQL data modeling, tuning, disaster recovery and backup, and used it for distributed storage and processing with CRUD operations.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Fetched and generated monthly reports and visualized them using Tableau.
  • Developed a Flume ETL job for handling data from an HTTP source with HDFS as the sink.
  • Used Avro and Parquet file formats for data serialization.
  • Followed Agile methodology and Scrum meetings to track, optimize and tailor features to customer needs.
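Illustrative only: a minimal sketch of a Hadoop MapReduce mapper of the kind referenced above; the class name, input format and field position are hypothetical.

    // Hypothetical mapper that counts HTTP status codes in access-log lines stored on HDFS.
    // Requires hadoop-client on the classpath; a summing reducer and Job driver would complete the job.
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class StatusCodeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text statusCode = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumes a whitespace-delimited log line with the status code in the ninth column;
            // the real field layout would depend on the actual source format.
            String[] fields = value.toString().split("\\s+");
            if (fields.length > 8) {
                statusCode.set(fields[8]);
                context.write(statusCode, ONE);
            }
        }
    }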

Environment: Apache Hadoop, Pig, Hive, Sqoop, Spark, Spark Streaming, Spark SQL, Kafka, MapReduce, HDFS, Linux, Oozie, MongoDB, Solr, AWS, Tableau, NiFi, RabbitMQ, Agile.

Confidential, MI

Hadoop Developer

Responsibilities:

  • Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports, which were used to improve campaign targeting and efficiency.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Oozie workflows and enabled email alerts for any failure cases.
  • Developed simple to complex MapReduce jobs implemented using Hive and Pig.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior, and used UDFs to implement business logic in Hadoop.
  • Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
  • Managed and reviewed Hadoop log files, and deployed and maintained the Hadoop cluster.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX and NoSQL sources.
  • Supported HBase architecture design with the Hadoop architect group to build up a database design in HDFS.
  • Experience in creating, dropping and altering HBase tables at run time without blocking updates and queries.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.
  • Experience in implementing solutions using one or more Azure PaaS services such as Web Sites, SQL Azure Database, Storage and Cloud Services.
  • Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
  • Loaded data into HBase using Pig, Hive and the Java API (a minimal sketch follows this list).
  • Incoming messages were handled using the Play MVC framework.
  • Managed and reviewed Hadoop log files to identify issues when jobs fail.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Implemented Frameworks using Java and Python to automate the ingestion flow.
  • Worked on tuning the performance on Pig queries.
  • Mentored the analyst and test teams in writing Hive queries.
  • Performed troubleshooting, managed and reviewed data backups, and managed and reviewed Hadoop log files.
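Illustrative only: a minimal sketch of loading a row into HBase through the Java client API, as referenced above; the table name, row key, column family and qualifier are hypothetical.

    // Hypothetical HBase put using the Java client API (hbase-client library).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseLoader {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("customer_events"))) {  // hypothetical table
                Put put = new Put(Bytes.toBytes("row-0001"));                                 // hypothetical row key
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("event_type"), Bytes.toBytes("click"));
                table.put(put);
            }
        }
    }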

Environment: Java/J2EE, Hadoop, AWS, Cloudera, Cassandra, HDFS, Flume, Hive, Kafka, Impala, Oozie, MapReduce, Sqoop, Linux, HBase, Scala, Spark, MapR, Big Data, UNIX Shell Scripting, Storm, Agile.

Confidential, IL

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing the results back into the RDBMS through Sqoop (a minimal sketch follows this list).
  • Designed and implemented Hive and Pig UDFs using Python for evaluating, filtering, loading and storing data.
  • Created new custom columns, depending on the use case, while ingesting data into the Hadoop data lake using PySpark.
  • Experience in building a CI/CD methodology in Azure using technologies such as Jenkins.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Involved in creating Hive tables, then loading and analyzing the data using Hive queries.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Worked with application teams to install Hadoop updates, patches and version upgrades as required.
  • Implemented test scripts to support test driven development and continuous integration.
  • Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Java, J2EE, SOAP, XML, JMS and JBoss.
  • Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
  • Implemented the Oozie workflow engine to run multiple Hive and Python jobs.
  • Used Sqoop, Pig, Hive as ETL tools for pulling and transforming data.
  • Managed and reviewed Hadoop log files. Used Scala to integrate Spark into Hadoop.
  • Migrated data existing in the Hadoop cluster into Spark and used Spark SQL and Scala to perform actions on the data.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Performed troubleshooting, managed and reviewed data backups, and managed and reviewed Hadoop log files.
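Illustrative only: a minimal sketch of the aggregation-and-export pattern described above, expressed here with Spark's Java API rather than the Scala used on the project; the database, table, column and JDBC details are hypothetical, and in practice the export could equally go through Sqoop.

    // Hypothetical Spark SQL aggregation over a Hive table, written back to an RDBMS via JDBC.
    // Requires spark-sql and a suitable JDBC driver on the classpath.
    import java.util.Properties;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.*;

    public class DailyAggregation {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("DailyAggregation")
                    .enableHiveSupport()
                    .getOrCreate();

            // Hypothetical Hive table and columns.
            Dataset<Row> events = spark.table("datalake.events");
            Dataset<Row> daily = events
                    .groupBy(col("event_date"), col("event_type"))
                    .agg(count("*").alias("event_count"));

            // Hypothetical JDBC target.
            Properties props = new Properties();
            props.setProperty("user", "etl_user");
            props.setProperty("password", "changeme");
            daily.write().mode("overwrite")
                    .jdbc("jdbc:oracle:thin:@dbhost:1521:ORCL", "DAILY_EVENT_COUNTS", props);

            spark.stop();
        }
    }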

Environment: Java/J2EE, Hadoop, Spark, AWS, Cloudera, Cassandra, HDFS, Flume, Hive, Kafka, Impala, Oozie, MapReduce, Scala, Sqoop, Linux, MapR, Big Data, PySpark, UNIX Shell Scripting, Storm, Agile.

Confidential

Java/J2EE Developer

Responsibilities:

  • Designed and developed rich front-end screens using JSF, JSP, Docker, CSS, HTML, AngularJS and jQuery.
  • Developed Managed beans and defined Navigation rules for the application using JSF.
  • The application was developed on a microservices architecture.
  • Worked on generating the web service classes using SOA, WSDL, UDDI and SOAP.
  • Responsible for developing Use case diagrams, Class diagrams, Sequence diagrams and process flow diagrams for the modules using UML and Rational Rose.
  • Configured the Hibernate mapping files for mapping the domain objects to the database tables and their corresponding properties to the table columns.
  • Queries for accessing data were built using the Hibernate API (a minimal sketch follows this list).
  • Used Java Message Service (JMS) for reliable and asynchronous exchange of essential information, such as payment status reports, to the MQ server using MQSeries.
  • Used RAD as IDE for development, build, deployment and testing the application.
  • Experience with Java microservices in Spring.
  • Used the Log4j framework for application logging.
  • Used Maven for build and deployment.
  • Used SVN as a version control tool and used WebSphere server.
  • Performed unit testing on the application and the web services before release to QA.
  • Documented and communicated test results to the team lead on a daily basis.
  • Tested the whole module using SOAPUI.
  • Involved in writing database connection classes for interacting with the Oracle database, and incorporated the Singleton pattern in the database access classes.
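Illustrative only: a minimal sketch of building a query through the Hibernate Session API, as mentioned above; the entity name, property and HQL are hypothetical.

    // Hypothetical DAO method that loads payments by status using the classic
    // Hibernate 3/4 Session and Query API. Assumes a mapped Payment entity exists.
    import java.util.List;
    import org.hibernate.Query;
    import org.hibernate.Session;
    import org.hibernate.SessionFactory;

    public class PaymentDao {
        private final SessionFactory sessionFactory;

        public PaymentDao(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        public List<?> findByStatus(String status) {
            Session session = sessionFactory.openSession();
            try {
                Query query = session.createQuery(
                        "from Payment p where p.status = :status");   // HQL against the mapped entity
                query.setParameter("status", status);
                return query.list();
            } finally {
                session.close();
            }
        }
    }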

Environment: JSF 1.2/2.0, Spring 3.0, Hibernate 4.0, JMS, Web Services (RESTful), Maven, TDD (Test-Driven Development), Singleton design pattern, SVN, JSP, HTML5, CSS3, JavaScript, jQuery, SQL, WebSphere, AWS, JUnit, Log4j, Agile.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in development of Staffing sub-modules like Staffing Override, Interview Override, Resume Upload.
  • Performed analysis and development of stateless session beans, data access objects and application components for the Screening and Shortlisting module.
  • Configured the JBoss application server and deployed the web components to it.
  • Involved in debugging, testing and integration of the system.
  • Worked on fixing bugs raised by the users.
  • Worked with Spring RESTful web services to interact with the JPA objects created using ORM tools (a minimal sketch follows this list).
  • Documented all the low-level design of the Application.
  • Developed JSP and action servlet classes.
  • Designed and developed user interfaces using JSP, JavaScript and HTML.
  • Developed Hibernate XML object-to-database mapping documents.
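Illustrative only: a minimal sketch of a Spring RESTful endpoint reading a JPA entity, in the spirit of the bullets above; the controller, entity and URL path are hypothetical.

    // Hypothetical Spring MVC controller exposing a JPA entity over REST.
    // Assumes Spring MVC and a JPA provider (e.g. Hibernate) are configured elsewhere.
    import javax.persistence.EntityManager;
    import javax.persistence.PersistenceContext;
    import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.ResponseBody;

    @Controller
    public class CandidateController {

        @PersistenceContext
        private EntityManager entityManager;   // injected JPA entity manager

        // Hypothetical endpoint: returns a candidate record by its primary key.
        @RequestMapping(value = "/candidates/{id}", method = RequestMethod.GET)
        @ResponseBody
        public Object getCandidate(@PathVariable("id") long id) {
            // "Candidate" is a hypothetical mapped JPA entity referenced by name in the query.
            return entityManager.createQuery("select c from Candidate c where c.id = :id")
                    .setParameter("id", id)
                    .getSingleResult();
        }
    }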

Environment: Core Java, J2EE, EJB, JSP, HTML, JavaScript, Hibernate, RESTful web services, Eclipse, UNIX, Spring, Agile.
