Sr. Hadoop/Apache Spark Developer Resume
Auburn, Michigan
SUMMARY
- Overall 7 years of experience in the field of Java and Data Engineering using Hadoop, HDFS, MR2, YARN, Apache Kafka, Apache Pig, Hive, Apache Sqoop, HBase, Cloudera Manager, ZooKeeper, Oozie, CDH5, AWS, Apache Spark, Scala, Java development, the Software Development Life Cycle (SDLC), and Python with Apache Spark implementations.
- Strong working knowledge of Agile methodologies, Scrum stories, and sprints in a Python environment, along with data analytics and Excel data extracts.
- Experience with Hortonworks and Cloudera platforms.
- Sound knowledge of Big Data, Hadoop, NoSQL, and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce2/YARN programming paradigm.
- Experienced in working with job/workflow scheduling and monitoring tools such as Oozie.
- Extensive real-time experience with Scala and its associated components.
- Implemented HashMap, HashSet, and LinkedHashMap collections using Scala (see the sketch at the end of this summary).
- Hands-on experience with reporting tools such as Tableau.
- Hands-on experience with Apache Kafka and its core architectural components.
- Knowledge of distributed systems, HDFS architecture, and the anatomy of the MapReduce and Apache Spark processing frameworks.
- Worked on debugging and performance tuning of Hive jobs.
- Implemented Sqoop queries to import data into Hadoop from MySQL.
- Working knowledge of NoSQL databases such as HBase and Cassandra.
- Working knowledge of Scala programming.
- Proficient in applying performance-tuning concepts to SQL queries, Informatica mappings, session and workflow properties, and databases.
- Implemented Java tools in business, web, and client-server environments, including the Java Platform, J2EE, EJB, JSP, Servlets, Struts, Spring, and JDBC.
- Experience in data cleansing, extraction, pre-processing, transformation, and data mining.
- Around 3 years of experience in advanced statistical techniques, including predictive statistical models, segmentation analysis, customer profiling, survey design and analysis, and data mining techniques such as supervised and unsupervised learning models.
- Dynamic personality with strong problem-solving, analytical, communication, and interpersonal skills.
- Expertise in Software Development Life Cycle (SDLC) and Software Testing Life Cycle (STLC).
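Below is a minimal, hypothetical Scala sketch of the collection usage mentioned above (HashMap, HashSet, LinkedHashMap); the names and data are illustrative placeholders, not taken from any specific project.

```scala
import scala.collection.mutable

object CollectionsSketch {
  def main(args: Array[String]): Unit = {
    // HashMap: keyed lookups, e.g. account balances (hypothetical data)
    val balances = mutable.HashMap("ACC-1" -> 1250.0, "ACC-2" -> 310.5)
    balances("ACC-3") = 875.0

    // HashSet: de-duplicating identifiers
    val seen = mutable.HashSet.empty[String]
    Seq("ACC-1", "ACC-2", "ACC-1").foreach(seen += _)

    // LinkedHashMap: preserves insertion order, useful for ordered output
    val ordered = mutable.LinkedHashMap("first" -> 1, "second" -> 2)

    println(balances)
    println(seen)
    println(ordered)
  }
}
```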
TECHNICAL SKILLS
Programming Languages: C, C++, Java, Python, Scala, SQL
Web Technologies: HTML, XML, JSP, JSF
Hadoop Ecosystem: YARN, MR2, Sqoop, Hive, Pig, Flume, Oozie, Apache Spark
Hadoop Distributions: Hortonworks, Cloudera; Containerization: Docker
Databases: MySQL, Teradata, and other RDBMS
NoSQL Databases: MongoDB, Cassandra, HBase
Reporting Tools: Tableau, Power BI
Frameworks: MVC, Impala, Apache Kafka, Apache Spark, PySpark, Hortonworks, Cloudera
Operating Systems: Unix, Linux, Windows
AWS Cloud Services: EC2, S3, EBS, RDS, and VPC
PROFESSIONAL EXPERIENCE
Confidential, Auburn, Michigan
Sr. Hadoop/Apache Spark Developer
Responsibilities:
- Worked on the Hadoop ecosystem with tools such as HBase and Sqoop.
- Responsible for building applications utilizing Hadoop.
- Involved in loading data from the Linux file system into HDFS.
- Worked on data recovery and capacity estimation.
- Created HBase tables to store variable-format data originating from various portfolios.
- Created Databricks notebooks implemented with Scala stacks and lists.
- Implemented test scripts to support test-driven development and continuous integration.
- Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
- Created data pipelines for the different ingestion and aggregation events and loaded consumer response data into an AWS S3 bucket.
- Played a key role in the configuration of various Hadoop ecosystem tools such as Apache Kafka, Pig, and HBase.
- Implementation knowledge of the Apache Spark framework with RDDs.
- Worked with the Apache Kafka implementation.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Ran data formatting scripts in Python and created CSV files to be consumed by Hadoop MapReduce jobs.
- Created Hive tables to store data in HDFS, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Worked on Scala programming across the implementation layers.
- Worked with the advanced analytics team to design fraud-detection algorithms, then developed MapReduce programs to run the algorithms efficiently on very large datasets.
- Implemented hash maps and lists in Spark using Scala.
- Worked on analyzing the Hadoop cluster using different big data processing tools, including Hive.
- Implemented partitioning, dynamic partitions, and buckets in Hive, as illustrated in the sketch below.
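As a concrete illustration of the Hive partitioning and bucketing work above, here is a minimal Spark/Scala sketch; the table and column names (portfolio_events, staging_events, event_date, account_id) are hypothetical placeholders, not the actual project schema.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()   // use the Hive metastore
      .getOrCreate()

    // Hive settings that allow fully dynamic partition values on insert
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Dynamic-partition insert: the partition value is taken from the trailing
    // event_date column of the SELECT (target table assumed to already exist)
    spark.sql(
      """INSERT INTO TABLE portfolio_events PARTITION (event_date)
        |SELECT account_id, amount, event_date
        |FROM staging_events""".stripMargin)

    // Bucketed, partitioned table written with the DataFrameWriter API
    spark.table("staging_events")
      .write
      .partitionBy("event_date")
      .bucketBy(8, "account_id")
      .sortBy("account_id")
      .saveAsTable("portfolio_events_bucketed")

    spark.stop()
  }
}
```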
Technologies Used: Hadoop, MapReduce, HDFS, Hive, Java, Sqoop, AWS, HBase, SQL, Cloudera
Confidential, Charlotte, North Carolina
Sr. Hadoop Developer
Responsibilities:
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data with the help of MapReduce programs.
- Used the Parquet file format for published tables and created views on the tables.
- In charge of managing data coming from different sources.
- Supported running MapReduce programs on the cluster.
- Supported the Scala implementation.
- Provided cluster coordination services through ZooKeeper.
- Involved in loading data from the UNIX file system into the Hadoop Distributed File System (HDFS).
- Installed and configured Hive, and also wrote Hive UDFs.
- Automated all of the jobs that pull data from the FTP server and load it into Hive tables, using Oozie workflows.
- Wrote data to both non-partitioned and partitioned Parquet tables, adding data dynamically to partitioned tables using Apache Spark (see the sketch after this list).
- Wrote user-defined functions (UDFs) to provide special functionality in Apache Spark.
- Used Sqoop export functionality and scheduled the jobs on a daily basis with shell scripting in Oozie.
- Worked with Sqoop jobs to import data from RDBMS sources and applied various optimization techniques to Hive and Sqoop.
- Used Sqoop import functionality to load historical data from a relational database system into the Hadoop Distributed File System (HDFS).
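A minimal Spark/Scala sketch of the partitioned and non-partitioned Parquet writes and the Spark UDF mentioned above; the paths, column names, and normalization logic are hypothetical, illustrative assumptions.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object ParquetWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-write-sketch")
      .getOrCreate()

    // Only overwrite the partitions present in the incoming data
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    // Simple UDF: normalize a free-text region code (hypothetical column)
    val normalizeRegion = udf((s: String) => if (s == null) "UNKNOWN" else s.trim.toUpperCase)

    val input = spark.read.parquet("/data/staging/events")
      .withColumn("region", normalizeRegion(col("region")))

    // Non-partitioned Parquet output
    input.write.mode(SaveMode.Overwrite).parquet("/data/published/events_flat")

    // Partitioned Parquet output; new load_date partitions are added dynamically
    input.write
      .mode(SaveMode.Overwrite)
      .partitionBy("load_date")
      .parquet("/data/published/events_by_date")

    spark.stop()
  }
}
```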
Technologies Used: Hadoop, MapReduce, HDFS, Hive, Java, R, Sqoop, Apache Spark, Hortonworks
Confidential, Woodlands, TX
Big Data Developer/Hadoop Developer
Responsibilities:
- Collected raw files from the FTP server and ingested them using a proprietary ETL framework.
- Built new ETL packages using Microsoft SSIS. New packages included detailed workflow of data imports from client FTP server.
- Troubleshot ETL failures and performed manual loads using SQL stored procedures.
- Engineered the client's platform by incorporating new dimensions into the client's site using SQL Server Integration Services.
- Engineered new OLAP cubes that aggregated the health provider's patient visit data.
Technologies Used: SQL, ETL, SSIS, Cloudera
Confidential
Java Developer
Responsibilities:
- Designed, implemented, and maintained Java application phases.
- Took part in software and architectural development activities.
- Conducted software analysis, programming, testing, and debugging.
- Worked on all phases of the application software lifecycle: development, testing, implementation, and maintenance.
- Recommended changes to improve established Java application processes.
- Developed technical designs for application development.
- Developed application code for Java programs.
- Designed forms using JavaScript and HTML for form validation.
- Developed servlet-based applications.
- Maintained the existing modules and applications.
- Developed server-side and client-side code for internal and external web applications.
Technologies Used: Java-based web services, relational databases, SQL and ORM, J2EE framework, Object-Oriented Analysis and Design, JSP, EJB (Enterprise JavaBeans), XML, Test-Driven Development, HTML, CSS, Ubuntu