Sr. Hadoop/Spark Developer Resume
Waukegan, Illinois
PROFESSIONAL SUMMARY:
- 9 years of experience in IT, including analysis, design, coding, testing, implementation, and training in Java and Big Data technologies, working with Apache Hadoop ecosystem components.
- Extensive participation in analysis activities, assisting in troubleshooting architectural problems and providing technical solutions to meet business requirements.
- Extensive experience with major components of the Hadoop ecosystem, such as Hadoop MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Storm, YARN, Spark, and Scala.
- Experienced in real-time streaming applications using Kafka, Flume, Storm, and Spark Streaming.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs.
- Good knowledge of creating event-processing data pipelines using Flume, Kafka, and Storm.
- Good understanding of and experience with Hadoop distributions such as Cloudera and Hortonworks.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that receives data in near real time.
- Hands-on working experience in Linux environments with Apache Tomcat. Used UML to design class diagrams for object-oriented analysis and design.
- Expertise in data transformation and analysis using Spark, Pig, and Hive.
- Good understanding and knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Experience with ETL and big data query tools such as Pig Latin and HiveQL.
- Expertise in writing Hadoop jobs for analyzing data using Spark, Hive, Pig, and MapReduce.
- Good understanding of HDFS design, daemons, and HDFS High Availability (HA).
- Hands-on experience with Avro and Parquet file formats, dynamic partitions, and bucketing for best practices and performance improvement (see the sketch after this summary).
- Experience in database design using stored procedures, functions, and triggers, and strong experience in writing complex queries for DB2 and SQL Server.
- Developed Spark SQL programs for handling different data sets for better performance.
- Good knowledge of creating event-processing pipelines using Spark Streaming.
- Experience in building web services using both SOAP and RESTful services in Java.
- Experience in configuration, deployment, and management of enterprise applications on application servers like WebSphere and JBoss and web servers like Apache Tomcat.
- Experience in performing unit testing using JUnit and TestNG.
- Extensive experience in documenting requirements, functional specifications, technical specifications.
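The following is a minimal Scala (Spark) sketch of the dynamic-partitioning and Parquet approach referenced above; the SparkSession usage, database, table, and column names are assumptions chosen only for illustration, not details from any specific project.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: write a Hive table as Parquet, dynamically partitioned on a date column.
// Database, table, and column names below are placeholders.
object PartitionedParquetWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedParquetWrite")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

    // Hypothetical source table assumed to exist in the Hive metastore
    val events = spark.table("staging.events")

    // Partitioning by event_date lets Hive/Spark prune partitions at query time;
    // Parquet's columnar layout keeps analytical scans cheap.
    events.write
      .mode("overwrite")
      .partitionBy("event_date")
      .format("parquet")
      .saveAsTable("analytics.events_parquet")

    spark.stop()
  }
}
```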
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Spark Core, Spark Streaming, Spark SQL, Hive, Tez, Pig, Sqoop, Flume, Kafka, Oozie, NiFi, ZooKeeper, Docker
AWS Components: EC2, S3
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: C, C++, Java, Scala, J2EE, Python, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts
Operating Systems: Sun Solaris, HP-UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, NetBeans, Toad, Maven, SBT, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
PROFESSIONAL EXPERIENCE:
Confidential, Waukegan, Illinois
Sr. Hadoop/Spark Developer
Responsibilities:
- Worked on importing and exporting data into HDFS and Hive using Sqoop, and built analytics on Hive tables using HiveContext (see the sketch after this list).
- Client-interfacing skills: interacted frequently with the Managing Director and high-level business stakeholders to resolve show-stopper and critical incidents.
- Worked on incidents across different Confidential applications to provide insights, perform ad-hoc analysis, and implement fixes in the production environment.
- Performed data analysis on Hive tables using Hive queries (i.e., HiveQL).
- Involved in estimation and release planning for project release activities.
- Patched data for production issues via tactical fixes whenever required and provided suggestions to improve the quality of data consumed by downstream systems.
- Experience in importing and exporting data from RDBMS to HDFS, Hive tables and HBase by using Sqoop.
- Loaded and transformed large sets of semi-structured and unstructured data in HBase and Hive.
- Developed Spark scripts and Python functions that perform transformations and actions on data sets.
- Monitored Oozie workflows that automate loading data into HDFS; re-processing is done manually when an issue occurs.
- Developed UNIX shell scripts for creating reports from Hive data.
- Worked with different file formats like JSON, XML, Avro data files and text files.
- Excellent understanding and knowledge of NoSQL databases like HBase and Cassandra.
- Hands-on experience in creating Apache Spark RDD transformations on data sets in the Hadoop data lake.
- Closely collaborated with both the onsite and offshore team.
- Closely worked with App support team to deploy the developed jobs into production.
- Hands-on experience in database design using PL/SQL to write stored procedures, functions, and triggers, and strong experience in writing complex queries using Oracle, DB2, SQL Server, and MySQL.
- Experience in working with Spark SQL for processing data in the Hive tables.
- Developed and tested data ingestion, preparation, and dispatch jobs.
- Worked on HBase table setup and shell script to automate ingestion process.
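Illustrative Scala sketch of the pattern described above: analytics over a Sqoop-loaded Hive table using HiveContext. The database, table, and column names are hypothetical, and the aggregation is only a representative query, not the actual production job.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveTableAnalytics {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveTableAnalytics"))
    val hiveContext = new HiveContext(sc)

    // The source table is assumed to have been populated by a Sqoop import job beforehand
    val summary = hiveContext.sql(
      """SELECT customer_id, COUNT(*) AS order_count, SUM(amount) AS total_amount
        |FROM sales.orders
        |GROUP BY customer_id""".stripMargin)

    // Persist the aggregate back to Hive for downstream reporting
    summary.write.mode("overwrite").saveAsTable("sales.customer_order_summary")

    sc.stop()
  }
}
```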
Confidential, Richmond, VA
Sr. Hadoop/Spark Developer
Responsibilities:
- Responsible for loading customer data from SAS to MS SQL Server 2016, performing data massaging, mining, and cleansing, then exporting to HDFS and Hive using Sqoop.
- Wrote Pig scripts to process credit card and debit card transactions for active customers by joining data from HDFS and Hive using HCatalog for various merchants.
- Responsible for writing a Lucene search program for high-performance, full-featured text search of merchants.
- Wrote Python UDFs that apply regular expressions and return valid merchant codes and names using streaming.
- Wrote Java UDFs to convert card names to upper case and to process dates into a suitable format in Pig and Hive.
- Responsible for creating a data pipeline using Kafka, Flume, and Spark Streaming over a Twitter source to collect sentiment tweets of target customers about reviews.
- Implemented Kerberos security.
- Took complete ownership of Hive and Spark tuning, including partitioning/bucketing of ORC tables and executor/driver memory settings.
- Wrote Hive UDFs to extract data from staging tables.
- Analyzed web log data using HiveQL and processed it through Flume.
- Executed queries using Hive and developed Map-Reduce jobs to analyze data.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
- Experienced in performance tuning of Spark Applications for setting right Batch Interval time, correct level of Parallelism and memory tuning.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, and transformations during the ingestion process itself.
- Created Hive tables, loaded the data using Sqoop, and worked on them using HiveQL.
- Responsible for developing custom UDFs, UDAFs and UDTFs in Pig and Hive.
- Optimized Hive queries using various file formats such as JSON, Avro, ORC, and Parquet.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building the common word2vec data model, which receives data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
- Operated the cluster on AWS using EC2, EMR, S3, and Elasticsearch.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Migrated existing MapReduce programs to Spark using Scala and Python
- Developed Scala scripts and UDFs using DataFrames in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
- Implemented Spark SQL to connect to Hive, read the data, and run distributed processing for high scalability.
- Analyzed tweet JSON data using the Hive SerDe API to deserialize it and convert it into a readable format.
- Processed application web logs using Flume and loaded them into Hive for data analysis.
- Implemented RESTful Web Services to interact with Oracle/Cassandra to store/retrieve the data.
- Generated detailed design documentation for the source-to-target transformations.
- Involved in planning process of iterations under the Agile Scrum methodology.
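Illustrative Scala sketch of a Kafka to Spark Streaming to Cassandra pipeline of the kind described above, using Spark 1.6-era APIs (Kafka direct stream plus the DataStax Spark-Cassandra connector). The broker address, topic, keyspace, table, and tab-separated message layout are assumptions, not details from the original project.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._

object TweetStreamToCassandra {
  // Hypothetical record layout; a matching social.tweets table is assumed to exist in Cassandra
  case class Tweet(id: String, text: String)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("TweetStreamToCassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder Cassandra host
    val ssc = new StreamingContext(conf, Seconds(10))

    // Direct (receiver-less) Kafka stream, Kafka 0.8-style API
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092") // placeholder broker
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("tweets"))

    // Each message value is assumed to be "id<TAB>text"; real code would parse tweet JSON
    messages
      .map { case (_, value) => value.split("\t", 2) }
      .filter(_.length == 2)
      .map(fields => Tweet(fields(0), fields(1)))
      .saveToCassandra("social", "tweets", SomeColumns("id", "text"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```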
Environment: Hadoop, HDFS, Kerberos, Apache Sentry, MapReduce, Hive, Pig, HBase, Sqoop, Spark, Oozie, Zookeeper, AWS, RDBMS/DB, MySQL, CSV, AVRO data files.
Confidential, Carrolton, TX
Sr. Hadoop Developer
Responsibilities:
- Analyzed web log data using HiveQL and processed it through Flume and Spark Streaming.
- Replaced the default Derby metastore for Hive with MySQL.
- Executed queries using Hive and developed Map-Reduce jobs to analyze data.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
- Developed Pig UDFs to preprocess the data for analysis.
- Involved in loading data from LINUX and UNIX file system to HDFS.
- Wrote Hive UDFs and UDAFs to extract data from staging tables.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig.
- Implemented proofs of concept on the Hadoop stack and different big data analytic tools, including migration from different databases.
- Optimized Hive queries using various file formats such as JSON, Avro, ORC, and Parquet.
- Configured, deployed, and maintained multi-node Dev and UAT Kafka clusters.
- Analyzed tweet JSON data using the Hive SerDe API to deserialize it and convert it into a readable format (see the sketch after this list).
- Processed application web logs using Flume and loaded them into Hive for data analysis.
- Implemented RESTful Web Services to interact with Oracle/Cassandra to store/retrieve the data.
- Generated detailed design documentation for the source-to-target transformations.
- Wrote UNIX scripts to monitor data load/transformation.
- Involved in planning process of iterations under the Agile Scrum methodology.
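Illustrative Scala sketch of querying tweet JSON through a Hive SerDe from Spark's HiveContext, as described above. The SerDe class shown requires the hive-hcatalog-core jar on the classpath; the database name, table name, columns, and HDFS path are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object TweetJsonAnalysis {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TweetJsonAnalysis"))
    val hiveContext = new HiveContext(sc)

    hiveContext.sql("CREATE DATABASE IF NOT EXISTS social")

    // External table over raw tweet JSON; the SerDe turns each JSON line into columns
    hiveContext.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS social.raw_tweets (
        |  id STRING,
        |  text STRING,
        |  lang STRING
        |)
        |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
        |LOCATION '/data/raw/tweets'""".stripMargin)

    // Representative query over the deserialized rows: count tweets per language
    hiveContext.sql(
      "SELECT lang, COUNT(*) AS tweet_count FROM social.raw_tweets GROUP BY lang").show()

    sc.stop()
  }
}
```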
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Spark , Oozie, Zookeeper, AWS, RDBMS/DB, MySQL, CSV, AVRO data files.
Confidential, Dallas, TX
Sr. Hadoop Developer
Responsibilities:
- Completely involved in the requirement analysis phase.
- Involved in gathering the requirements, designing, development and testing
- Developed the UNIX shell scripts for creating the reports from Hive data.
- Analyzed the requirements for setting up a cluster.
- Created the script files for processing data and loading it to HDFS.
- Created Apache Pig scripts to process the HDFS data.
- Created Hive tables to store the processed results in a tabular format.
- Developed Sqoop scripts to enable interaction between Pig and the Cassandra database.
- Created CLI commands using HDFS.
- Created two different users (hduser for performing HDFS operations and mapred user for performing MapReduce operations only).
- Involved in generating crawl-data flat files from various retailers into HDFS for further processing.
- Set up Hive with MySQL as a remote metastore.
- Moved log/text files generated by various products into an HDFS location.
- Created MapReduce code that takes log files as input, parses the logs, and structures them in a tabular format to facilitate effective querying of the log data (see the sketch after this list).
- Created an external Hive table on top of the parsed data.
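A small Scala sketch of the log-parsing idea behind the MapReduce job described above. It only illustrates the line-to-columns step (the original job was written as MapReduce), and the common-log-format layout and field names are assumptions made for illustration.

```scala
// Turns one raw access-log line into tab-separated fields so the output
// can be queried as a table (e.g., via an external Hive table over the results).
object LogLineParser {
  // Common-log-format style pattern; capture groups: ip, timestamp, request, status, bytes
  private val LogPattern =
    """^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+).*$""".r

  def parse(line: String): Option[String] = line match {
    case LogPattern(ip, ts, request, status, bytes) =>
      Some(Seq(ip, ts, request, status, bytes).mkString("\t"))
    case _ => None // malformed lines are dropped
  }

  def main(args: Array[String]): Unit = {
    val sample = """127.0.0.1 - - [10/Oct/2016:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326"""
    parse(sample).foreach(println)
  }
}
```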
Environment: Cloudera Distribution, Hadoop MapReduce, HDFS, Python, Hive, HBase, HiveQL, Sqoop, Java, UNIX, Maven.
Confidential
Sr. Java Developer
Responsibilities:
- Generated domain-layer classes using DAOs from the database schema.
- Defined set of classes for the Helper Layer which validates the Data Models from the Service Layer and prepares them to display in JSP Views.
- Designed and developed the interface to interact with web services for card payments.
- Performed enhancements to existing SOAP web services for online card payments
- Performed enhancements to existing payment screens by developing servlets and JSP Pages
- Involved in the end-to-end batch loading process using Informatica ETL.
- Transformed the Use Cases into Class Diagrams, Sequence Diagrams and State diagrams
- Developed Validation Layer providing Validator classes for input validation, pattern validation and access control
- Used AJAX calls to dynamically assemble the data in JSP page, on receiving user input.
- Used Log4j to print logging, debugging, warning, and info messages on the server console.
- Involved in creation of Test Cases for JUnit Testing and carried out Unit testing.
- Used ClearCase as configuration management tool for code versioning and release deployment on Oracle WebLogic Server 10.3.
- Used MAVEN tool for deployment of the web application on the Weblogic Server.
- Interacted with business team to transform requirements into technical solutions.
- Involved in the functional tests of the application and also resolved production issues.
- Designed and Developed application using EJB and Spring framework.
- Developed POJO’s for Data Model to map the Java Objects with Relational database tables.
- Designed and developed Service layer using spring framework.
Environment: Java, J2EE, Spring, SOAP, Web Services, Maven, Solaris, WebLogic 7.0, Oracle 8i, Informatica 8.5, Mainframe, OSS/BSS, Log4j, Servlets, JSP, JSTL, JDBC, HTML, JavaScript, CSS, Rational Rose, UML
Confidential
Java Developer
Responsibilities:
- Designed and developed the presentation layer using JSP pages for the payment module.
- Used patterns including Singleton, Factory, MVC, DAO, DTO, Front Controller, Service Locator, and Business Delegate.
- Designed and developed REST services for the payment gateway.
- Developed controllers and JavaBeans encapsulating the business logic.
- Developed classes to interface with underlying web services layer.
- Worked on Service Layer which provided business logic implementation.
- Involved in building PL/SQL queries, triggers, and stored procedures for database operations.
- Involved in specification analysis and identifying the requirements.
- Participated in design discussions for the methodology of requirement implementation.
- Involved in preparation of the Code Review Document & Technical Design Document.
- Used JasperReports to provide print preview of Financial Reports and Monthly Statements.
- Implemented change requests (CRs). Carried out integration testing and acceptance testing.
- Used JMeter to carry out performance tests on external web service calls, database connections and other dynamic resources.
- Participated in the team meetings and discussed enhancements, issues and proposed feasible solutions.
Environment: Java 1.5, J2EE 1.5, JDBC, JAXB, XML, ANT, Apache Tomcat 5.0, Oracle 8i, JAX-RS, Jersey, JUnit, PL/SQL, UML, Eclipse