Sr. Hadoop And Spark Developer / Data Scientist Resume
Franklin, TN
PROFESSIONAL SUMMARY:
- To contribute to an organization's growth with my skills while gaining the exposure and expertise needed to build a strong and successful career.
- Experienced Java and Hadoop developer with a strong background in distributed file systems in the big data space. Understands the complex processing needs of big data and has experience developing code and modules to address those needs. Brings certification training as a Hadoop and Spark Developer (Cloudera), Moving Data into Hadoop (IBM), Hadoop Data Access (IBM), and Data Pipelines with Apache Kafka (IBM).
- 9 years of total IT experience across all phases of Hadoop and Java development, along with experience in application development, data modeling, and data mining.
- Good experience with Big Data ecosystems and ETL.
- Expertise in Java, Python, R and Scala.
- Expertise in data analytics tools: R, Tableau, and SQL.
- Experience in data architecture, including data ingestion pipeline design, data analysis and analytics, and advanced data processing; experience optimizing ETL workflows.
- Experience with Hadoop distributions (Cloudera, Hortonworks, MapR, IBM BigInsights): architecture, deployment, and development.
- Experience extracting source data from sequential files, XML files, and Excel files, then transforming and loading it into the target data warehouse.
- Expertise in Java/J2EE technologies such as Core Java, Struts, Hibernate, JDBC, JSP, JSTL, HTML, JavaScript, JSON
- Experience with SQL and NoSQL databases (MongoDB, Cassandra).
- Hands on experience with Hadoop Core Components (HDFS, MapReduce) and Hadoop Ecosystem (Sqoop, Flume, Hive, Pig, Impala, Oozie, HBase).
- Experience ingesting real-time/near-real-time data using Flume, Kafka, and Storm.
- Experience importing and exporting data between relational databases and HDFS using Sqoop.
- Hands-on experience with Linux systems.
- Experience using SequenceFile, Avro, and Parquet file formats; managing and reviewing Hadoop log files.
- Good knowledge of writing Spark applications in Python, Scala, and Java.
- Experience in writing MapReduce jobs.
- Efficient in analyzing data using HiveQL and Pig Latin, partitioning existing data sets with static and dynamic partitions, and tuning data for optimal query performance (see the Spark sketch at the end of this summary).
- Good experience with transformation and storage: HDFS, MapReduce, and Spark.
- Good understanding of HDFS architecture.
- Experienced in Database development, ETL, OLAP, OLTP
- Knowledge of extracting an Avro schema using avro-tools and evolving an Avro schema by editing its JSON definition.
- Experience loading large sets of structured, semi-structured, and unstructured data coming from UNIX systems, NoSQL stores, and a variety of portfolios into HBase tables.
- Experience in UNIX Shell scripting.
- Experience developing and maintaining applications on the AWS platform.
- Experience developing and maintaining applications built on Amazon Simple Storage Service (S3), Amazon DynamoDB, Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS), Amazon Simple Workflow Service (SWF), AWS Elastic Beanstalk, and AWS CloudFormation.
- Experience picking the right AWS services for the application.
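The summary above mentions writing Spark applications in Scala and partitioning data sets for query performance. The sketch below is a minimal, hedged illustration of that combination; the HDFS paths and column names (events, event_ts, event_date) are hypothetical and not taken from any project listed here.

```scala
// Minimal Spark (Scala) sketch: read raw delimited data from HDFS and rewrite it
// as Parquet, partitioned by a date column for faster downstream queries.
// The paths and column names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

object PartitionedIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-ingest")
      .enableHiveSupport()                      // allows reading/writing Hive-managed tables
      .getOrCreate()

    val raw = spark.read
      .option("header", "true")
      .csv("hdfs:///data/raw/events")           // hypothetical source path

    raw.withColumn("event_date", to_date(col("event_ts")))
      .write
      .mode("overwrite")
      .partitionBy("event_date")                // one partition directory per day
      .parquet("hdfs:///data/curated/events")   // hypothetical curated path

    spark.stop()
  }
}
```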
WORK EXPERIENCE:
Confidential, Franklin, TN
Sr. Hadoop and Spark Developer / Data Scientist
Responsibilities:
- Experienced in development on the Cloudera distribution of Hadoop.
- As a Hadoop developer, responsible for managing the data pipelines and the data lake.
- Performed Hadoop ETL using Hive on data at different stages of the pipeline.
- Worked in an Agile environment using Scrum.
- Ingested data from different source systems with Sqoop and automated the imports with Oozie workflows.
- Generated business reports from the data lake using Hadoop SQL (Impala) as per business needs.
- Automated delivery of business reports to business owners using Bash scripts on the Unix data lake environment.
- Developed Spark Scala code to cleanse and perform ETL on data at different stages of the pipeline.
- Worked in different environments such as DEV, QA, the data lake, and the analytics cluster as part of Hadoop development.
- Snapshotted the cleansed data to the analytics cluster for business reporting.
- Developed Pig scripts and Python streaming jobs, and created Hive tables on top of the results.
- Developed multiple POCs using Scala and PySpark, deployed them on the YARN cluster, and compared the performance of Spark and SQL.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch at the end of this section).
- Developed Oozie workflows to run multiple Hive, Pig, Sqoop, and Spark jobs.
- Handled importing of data from various data sources, performed transformations using Hive and Spark, and loaded the data into HDFS.
- Developed Pig, Hive, Sqoop, Hadoop Streaming, and Spark actions in Oozie for workflow management.
- Supported MapReduce programs running on the cluster.
- Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume.
- Good understanding of the workflow management process and its implementation.
- Knowledge of HL7 protocols and of parsing data out of HL7 messages.
- Involved in developing frameworks used in data pipelines and coordinated with a Cloudera consultant.
Environment: Hadoop, AWS, Java, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Oozie, SQL scripting, Linux shell scripting, Eclipse, and Cloudera
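As referenced in the Hive-to-Spark bullet above, the sketch below shows one way a HiveQL aggregation can be re-expressed as Spark (Scala) DataFrame and RDD transformations. The table, columns, and output path (claims, member_id, paid_amount) are hypothetical, used only to make the example self-contained.

```scala
// Hedged sketch: the same aggregation written as HiveQL (via spark.sql), as
// DataFrame transformations, and as RDD operations. Names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object HiveToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark")
      .enableHiveSupport()
      .getOrCreate()

    // 1. Original HiveQL form, executed through Spark's Hive support.
    val viaSql = spark.sql(
      "SELECT member_id, SUM(paid_amount) AS total_paid FROM claims GROUP BY member_id")

    // 2. The same logic as DataFrame transformations.
    val viaDf = spark.table("claims")
      .groupBy(col("member_id"))
      .agg(sum(col("paid_amount")).as("total_paid"))

    // 3. The same logic as RDD transformations (the "Spark RDDs and Scala" route).
    val viaRdd = spark.table("claims").rdd
      .map(row => (row.getAs[String]("member_id"), row.getAs[Double]("paid_amount")))
      .reduceByKey(_ + _)

    viaDf.write.mode("overwrite").parquet("hdfs:///data/curated/claim_totals")
    spark.stop()
  }
}
```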
Confidential, Dallas, TX
Sr. Hadoop Developer
Responsibilities:
- Helped the team to increase cluster size from 55 nodes to 145+ nodes. The configuration for additional data nodes was managed using Puppet.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs.
- Integrated Apache Spark with Hadoop components.
- Used Java for data cleaning and preprocessing.
- Extensive experience in writing HDFS and Pig Latin commands.
- Developed complex queries using HIVE and IMPALA.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
- Worked on moving data between HDFS and a MySQL database in both directions using Sqoop.
- Implemented MapReduce jobs through Hive by querying the available data.
- Configured Hive metastore with MySQL, which stores the metadata for Hive tables.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Wrote Hive and Pig scripts as per requirements.
- Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
- Developed Spark applications using Scala.
- Implemented Apache Spark data processing project to handle data from RDBMS and streaming sources.
- Designed batch processing jobs using Apache Spark that ran roughly ten times faster than the equivalent MR jobs.
- Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (see the sketch at the end of this section).
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively.
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Designed the ETL process and created the high-level design document, covering logical data flows, source data extraction, database staging and extract creation, source archival, job scheduling, and error handling.
- Worked on the Talend ETL tool, using features such as context variables and database components including tOracleInput, tOracleOutput, tFileCompare, tFileCopy, and tOracleClose.
- Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into target database.
- Developed ETL mappings using mapplets and reusable transformations, and various transformations such as source qualifier, expression, connected and unconnected lookup, router, aggregator, filter, sequence generator, update strategy, normalizer, joiner, and rank in PowerCenter Designer.
- Created, altered, and deleted topics (Kafka queues) as required.
- Performance tuning using partitioning and bucketing of Impala tables.
- Experience with NoSQL databases such as HBase and MongoDB; involved in cluster maintenance and monitoring.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Involved in loading data from UNIX file system to HDFS.
- Created an e-mail notification service that alerts the requesting team upon job completion.
- Worked on NoSQL databases, which differ from classic relational databases.
- Conducted requirements gathering sessions with various stakeholders
- Involved in knowledge transition activities to the team members.
- Successful in creating and implementing complex code changes.
Environment: Hadoop, AWS, Java, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Oozie, SQL scripting, Linux shell scripting, Eclipse, and Cloudera.
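As noted in the Kafka and Spark Streaming bullets above, the sketch below shows the general shape of that integration in Scala: a small producer pushes clickstream lines to a topic, and a direct DStream consumes them in micro-batches. The broker address, topic name, group id, and record layout are placeholders, not details from the project.

```scala
// Hedged sketch of the Kafka + Spark Streaming pattern: produce clickstream
// lines to a topic, consume them with a direct DStream, and count events per page.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ClickStreamPipeline {

  // Producer side: send one clickstream record per call.
  def sendClick(line: String): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")   // placeholder broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String]("clickstream", line))
    producer.close()
  }

  // Consumer side: micro-batch the topic every 10 seconds and count events per page.
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("clickstream-streaming")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clickstream-consumers",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    stream.map(record => (record.value.split(",")(0), 1L))   // page id assumed in the first field
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```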
Confidential, Wilmington, DE
Hadoop Developer
Responsibilities:
- Involved in ingesting data received from various relational database providers into HDFS for analysis and other big data operations.
- Wrote MapReduce jobs to perform operations such as copying data on HDFS, defining job flows on EC2 servers, and loading and transforming large sets of structured, semi-structured, and unstructured data.
- Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Developed Spark scripts using Java and Python shell commands as per requirements.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop (see the sketch at the end of this section).
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory settings.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Built predictive analytics to monitor inventory levels and ensure product availability.
- Analyzed customers' purchasing behaviors.
- Provided value-added services based on clients' profiles and purchasing habits.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
- Provided pivot graphs to show trends.
- Maintained data import scripts using Hive and MapReduce jobs.
- Developed and maintained several batch jobs to run automatically per business requirements.
- Performed unit testing and deployment for internal usage, monitoring the performance of the solution.
Environment: Apache Hadoop, Hive, Pig, HDFS, Java MapReduce, Core Java, Scala, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Cloudera Distribution, Oracle, and Teradata
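The Spark 1.6 bullet above references Scala UDFs and DataFrame aggregation with results handed back to the OLTP system via Sqoop. The sketch below is a hedged illustration of that pattern using the Spark 1.6 HiveContext API; the table, columns, banding rule, and output path are hypothetical, and the final Sqoop export is assumed to run as a separate step against the delimited output.

```scala
// Spark 1.6-style sketch: register a UDF, aggregate with DataFrames, and write
// delimited text to HDFS for a separate Sqoop export into the OLTP system.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.{col, sum, udf}

object LearnerAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("learner-aggregation"))
    val hiveContext = new HiveContext(sc)

    // Hypothetical UDF: bucket a raw score into a band.
    val scoreBand = udf((score: Double) =>
      if (score >= 80) "high" else if (score >= 50) "medium" else "low")

    val banded = hiveContext.table("learner_events")        // hypothetical Hive table
      .withColumn("band", scoreBand(col("score")))

    val totals = banded
      .groupBy(col("learner_id"), col("band"))
      .agg(sum(col("score")).as("total_score"))

    // Tab-delimited text on HDFS; a scheduled Sqoop export would pick this up.
    totals.rdd
      .map(row => Seq(row.get(0), row.get(1), row.get(2)).mkString("\t"))
      .saveAsTextFile("hdfs:///data/exports/learner_totals")

    sc.stop()
  }
}
```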
Confidential, Carlsbad, CA
DATA ENGINEER
Responsibilities:
- Exported data from HDFS to MySQL using Sqoop and an NFS-mount approach.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Developed MapReduce programs to apply business rules to the data (see the sketch at the end of this section).
- Developed and executed Hive queries for denormalizing the data.
- Worked on ETL workflows, analyzed big data, and loaded it into the Hadoop cluster.
- Installed and configured Hadoop Cluster for development and testing environment.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Automated the workflow using shell scripts.
- Performance-tuned Hive queries written by other developers.
- Mastered the major Hadoop distributions (HDP and CDH) and numerous open source projects.
- Prototyped various applications that utilize modern big data tools.
Environment: Linux, Java, MapReduce, HDFS, DB2, Cassandra, Hive, Pig, Sqoop, FTP
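The bullet above about MapReduce programs that apply business rules is illustrated below with a map-only filtering job. It is written in Scala against the Hadoop Java API to stay consistent with the other sketches in this resume, although the environment line lists Java; the comma-delimited record layout and the "positive amount" rule are assumed purely for illustration.

```scala
// Hedged sketch of a map-only MapReduce job applying a simple business rule:
// keep only records whose fourth field parses as a positive amount.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

import scala.util.Try

class BusinessRuleMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val fields = value.toString.split(",")
    val amountOk = fields.length > 3 && Try(fields(3).toDouble).toOption.exists(_ > 0)
    if (amountOk) context.write(NullWritable.get(), value)   // emit only records passing the rule
  }
}

object BusinessRuleJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "business-rule-filter")
    job.setJarByClass(classOf[BusinessRuleMapper])
    job.setMapperClass(classOf[BusinessRuleMapper])
    job.setNumReduceTasks(0)                       // map-only: no reduce phase needed
    job.setOutputKeyClass(classOf[NullWritable])
    job.setOutputValueClass(classOf[Text])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```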
Confidential, Dallas, TX
Java Developer
Responsibilities:
- Developed UI screens for the data entry application in Java Swing.
- Worked on backend services in Spring MVC and OpenEJB for interaction with Oracle and the mainframe using DAO and model objects.
- Introduced Spring IoC to increase application flexibility and replace the need for hard-coded, class-based application functions.
- Used Spring IoC for dependency injection to autowire different beans and the data source into the application.
- Used Spring JDBC templates for database interactions and used declarative Spring AOP transaction management.
- Used mainframe screen scraping to add forms to the mainframe through the claims data entry application.
- Worked on Jasper reports (iReport 4.1.1) to generate reports for various people (executive secretary and commissioners) based on their authorization.
- Generated electronic letters for attorneys and insurance carriers using iReport.
- Worked on application deployment on various Tomcat server instances using PuTTY.
- Worked in TOAD with PL/SQL on the Oracle database to write queries, functions, stored procedures, and triggers.
- Worked on JSP, Servlets, HTML, CSS, JavaScript, JSON, jQuery, and AJAX for the Vault web-based project and the Confidential application.
- Used Spring MVC architecture with DispatcherServlet and a view resolver for the web applications.
- Worked on web service integration for the Confidential project to integrate a third-party payment processing system with the Confidential application.