Hadoop/Spark Developer Resume
PA
SUMMARY:
- 8+ years of experience in software development, deployment and maintenance of various web-based applications using Java and Big Data ecosystems in Windows and Linux environments.
- 5+ years of experience with major components of the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Storm, YARN, Spark, Spark Streaming, Spark SQL, Mahout, NiFi, Kafka and Impala.
- Experience working with Cloudera distributions (CDH 4/CDH 5) and Hortonworks distributions; knowledge of the Amazon EMR Hadoop distribution.
- Extensive experience with real-time streaming applications and batch-style, large-scale distributed computing applications; worked on integrating Kafka with NiFi and Spark.
- Developed reusable and configurable components to meet project requirements in Java, Scala and Python.
- Good knowledge of Scala's functional-style programming techniques such as Anonymous Functions (Closures), Currying, Higher-Order Functions and Pattern Matching.
- Strong knowledge of advanced Java 8 features such as Lambda Expressions and Streams.
- Good experience working with cloud environments such as Amazon Web Services (AWS) EC2 instances and S3, configuring servers for Auto Scaling and Elastic Load Balancing.
- Developed a component to upload to and download from AWS S3 based on project-specific configurations.
- Developed machine learning POCs using R programming and Python modules for data analytics.
- Developed an end-to-end POC covering data ingestion, data transformation, data quality and data lineage for a Big Data platform.
- Strong knowledge of machine learning algorithms such as Linear Regression, Logistic Regression, Decision Trees, SVM and K-Means.
- Experience in applying the latest development approaches, including Spark applications in Scala used to compare the performance of Spark with Hive and SQL/Oracle.
- Expertise in writing Spark RDD transformations and actions on input data, Spark SQL queries and DataFrame operations to import data from data sources, perform data transformations and read/write operations using Spark Core, and save the results to an output directory in HDFS (illustrated in the sketch following this summary).
- Hands-on experience coding MapReduce/YARN programs in Java and Scala for analyzing Big Data.
- Good knowledge of the Spark framework for both batch and real-time data processing.
- Hands-on experience with Spark MLlib for predictive intelligence and customer segmentation, and for smooth maintenance in Spark Streaming.
- Expertise in Storm for adding reliable real-time data processing capabilities to enterprise Hadoop.
- Hands-on experience in scripting for automation and monitoring using Shell, Python and Perl scripts.
- Thorough knowledge of ETL, data integration and migration; extensively used ETL methodology for supporting data extraction, transformation and loading using Informatica.
- Hands-on experience with data extraction, transformation and loading in Hive, Pig and HBase.
- Worked on importing data into HBase using the HBase shell and the HBase client API.
- Designed and developed custom processors and data flow pipelines between systems using flow-based programming in Apache NiFi; extensive experience using NiFi's web-based UI.
- Experience developing data pipelines using Kafka, Spark and Hive to ingest, transform and analyze data.
- Hands-on experience writing Pig Latin scripts, working with the Grunt shell and scheduling jobs with Oozie.
- Experience transferring data between the Hadoop ecosystem and structured data stores in an RDBMS such as MySQL, Oracle, Teradata and DB2 using Sqoop, according to client requirements.
- Experience in designing and developing POCs using Scala, deploying them on the YARN cluster, and comparing the performance of Spark with Hive and SQL/Teradata.
- Good understanding of MPP (Massively Parallel Processing) databases such as HP Vertica and Impala.
- Extensive hands-on experience in ETL, Oracle PL/SQL, data warehousing and star schemas.
- Involved in developing Impala scripts for extraction, transformation and loading of data into the data warehouse.
- Extended Hive and Pig core functionality using custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF) and User Defined Aggregating Functions (UDAF).
- Experienced in collecting metrics for Hadoop clusters using Ambari and Cloudera Manager.
- Experience in performance tuning and monitoring of the Hadoop cluster, gathering and analyzing the existing infrastructure using Cloudera Manager.
- Experience configuring ZooKeeper to coordinate the servers in a cluster and maintain data consistency.
- Good understanding of and hands-on experience in writing applications on NoSQL databases such as HBase, MongoDB and Cassandra, covering both functionality and implementation.
- Good understanding of installing and maintaining Cassandra by configuring the cassandra.yaml file as required, and performed reads and writes using Java JDBC connectivity.
- Experience in extraction, transformation and loading (ETL) of data in different file formats such as CSV, text files, sequence files, Avro, Parquet, JSON and ORC, using compression codecs such as gzip, LZ4 and Snappy.
- Experience using version control tools such as CVS, Git and SVN, and build tools such as Ant and Maven.
- Good knowledge of web/application servers such as Apache Tomcat, IBM WebSphere and Oracle WebLogic.
- Design and programming experience developing Internet applications using JSP, MVC, Servlets, Struts, Hibernate, JDBC, JSF, EJB, AJAX, web-based development tools and web services using XML, HTML and SOAP.
- Worked with BI (Business Intelligence) teams on generating reports and designing ETL workflows in Tableau; deployed data from various sources into HDFS and built reports using Tableau.
- Participated in entire Software Development Life Cycle including Requirement Analysis, Design, Development, Testing, Implementation, Documentation and Support of software applications.
- Good experience working in an Agile development environment, including the Scrum methodology.
- Strong analytical skills and ability to understand existing business processes.
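The following is a minimal, hypothetical Scala sketch of the Spark usage summarized above (RDD transformations and actions plus DataFrame/Spark SQL reads and writes to HDFS). The paths, column names and local master setting are illustrative assumptions rather than details from any specific engagement.

```scala
import org.apache.spark.sql.SparkSession

object SummarySketch {
  def main(args: Array[String]): Unit = {
    // Assumed local session for illustration; a real job would run on YARN.
    val spark = SparkSession.builder()
      .appName("rdd-dataframe-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // RDD transformations and an action on hypothetical input lines.
    val lines = spark.sparkContext.textFile("hdfs:///data/input/events.txt") // assumed path
    val wordCounts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                                // transformation
    println(s"distinct words: ${wordCounts.count()}")    // action

    // DataFrame / Spark SQL over the same data, written back to HDFS.
    val df = wordCounts.toDF("word", "cnt")
    df.createOrReplaceTempView("word_counts")
    val top = spark.sql("SELECT word, cnt FROM word_counts ORDER BY cnt DESC LIMIT 10")
    top.write.mode("overwrite").parquet("hdfs:///data/output/top_words") // assumed output directory

    spark.stop()
  }
}
```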
TECHNICAL SKILLS:
Big Data Eco Systems: HDFS, MapReduce, Hive, Yarn, HBase, Pig, Sqoop, Kafka, Storm, Flume, Oozie, ZooKeeper, Apache Spark, Apache Tez, Impala, Nifi, Apache Solr
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: C, C++, Java, Scala, Python, SQL, PL/SQL, HiveQL, Pig Latin
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JSP, Servlets, EJB
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: Sun Solaris, HP-UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Version control: SVN, CVS, GIT
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Business Intelligence Tools: Talend, Informatica, Tableau
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, JDeveloper
Cloud Technologies: Amazon Web Services (AWS), Mahout, Microsoft Azure HDInsight, Amazon Redshift, S3
WORK EXPERIENCE:
Confidential, PA
Hadoop/Spark Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (see the sketch at the end of this section).
- Experience loading data into Spark RDDs and performing advanced procedures such as text analytics and processing, using Spark's in-memory computation capabilities in Scala to generate the output response.
- Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcasts, effective and efficient joins, transformations and other operations during the ingestion process itself.
- Developed Scala scripts using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications by setting the right batch interval, the correct level of parallelism and appropriate memory settings.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and Pair RDDs.
- Used DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
- Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems.
- Experience writing Sqoop scripts for importing and exporting data between RDBMS and HDFS.
- Ingested data from the RDBMS, performed data transformations, and then exported the transformed data to Cassandra for data access and analysis.
- Created Hive tables for loading and analyzing data; implemented partitions and buckets, and developed Hive queries to process the data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file Formats in Hive.
- Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
- Used Spark API over Cloudera Hadoop Yarn to perform analytics on data in Hive.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with the aim of adopting the former in the project.
- Worked on migrating Map Reduce programs into Spark transformations using Spark and Scala.
- Experience in job management using the Fair Scheduler; developed job processing scripts using Oozie workflows.
- Good experience with Talend Open Studio for designing ETL jobs for data processing.
- Experience in processing large volumes of data, with skills in parallel execution of processes using Talend functionality.
- Experience with NoSQL column-oriented databases such as Cassandra and its integration with the Hadoop cluster.
- Worked with BI team to create various kinds of reports using Tableau based on the client's needs.
- Experience querying Parquet files by loading them into Spark DataFrames using a Zeppelin notebook.
- Experience troubleshooting problems that arise during batch data processing jobs.
- Extracted data from Teradata into HDFS and dashboards using Spark Streaming.
- Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small data set processing and storage, and maintained the Hadoop cluster on AWS EMR.
Environment: Hadoop Yarn, Spark-Core, Spark-Streaming, Spark-SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elastic Search, Impala, Cassandra, Tableau, Talend, Cloudera, MySQL, Linux.
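A minimal sketch of the Kafka-to-Cassandra streaming flow described in this role, assuming the spark-streaming-kafka-0-10 integration and the DataStax Spark-Cassandra connector; the topic, keyspace, table, record layout and connection settings are illustrative assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector._ // adds saveToCassandra to RDDs

// Hypothetical record shape for the learner events; field names are assumptions.
case class LearnerEvent(learnerId: String, course: String, score: Double)

object KafkaToCassandraSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("learner-stream-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed Cassandra host
    val ssc = new StreamingContext(conf, Seconds(10))       // assumed batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",             // assumed broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "learner-consumer",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams))

    // Parse CSV-style messages, drop malformed ones, and persist each micro-batch to Cassandra.
    stream.map(_.value.split(","))
      .filter(_.length == 3)
      .map(f => LearnerEvent(f(0), f(1), f(2).toDouble))
      .foreachRDD(_.saveToCassandra("learner_ks", "learner_events")) // assumed keyspace/table

    ssc.start()
    ssc.awaitTermination()
  }
}
```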
Confidential, CA
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop .
- Designed the projects using MVC architecture providing multiple views using the same model and thereby providing efficient modularity and scalability.
- This project downloads the data generated by sensors tracking patients' body activities; the data is collected into HDFS from online aggregators via Kafka.
- A Kafka consumer gets the data from the patients' different learning systems.
- Spark Streaming collects this data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in HBase (a sketch of this flow follows at the end of this section).
- Used Hadoop's Pig, Hive and MapReduce to analyze the health insurance data, extracting data sets for meaningful information such as medicines, diseases, symptoms, opinions and geographic region details.
- Developed a workflow in Oozie to orchestrate a series of Pig scripts that cleanse data, such as removing personal information or merging many small files into a handful of large, compressed files, using Pig pipelines in the data preparation stage.
- Used Pig in three distinct workloads: pipelines, iterative processing and research.
- Used Pig UDFs written in Python and Java, and used sampling of large data sets.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume, and processed the files using PiggyBank.
- Extensively used Pig to communicate with Hive using HCatalog and with HBase using storage handlers.
- Created Hive tables to store the processed results in a tabular format.
- Good experience in Pig Latin scripting and Sqoop scripting.
- Involved in moving data from legacy tables into HDFS and HBase tables using Sqoop.
- Implemented exception tracking logic using Pig scripts.
- Implemented test scripts to support test driven development and continuous integration.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Good understanding of ETL tools and how they can be applied in a Big Data environment.
Environment: Hadoop, Map Reduce, Spark, Kafka, HDFS, Hive, Pig, Oozie, Core Java, Python, Eclipse, Hbase, Flume, Cloudera, Oracle, UNIX Shell Scripting.
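A minimal sketch of the sensor-data flow described in this role (Kafka consumed by Spark Streaming and persisted to HBase); the topic, table name, column family and message layout are illustrative assumptions, not details from the actual project, and HBase configuration is assumed to be on the classpath.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object SensorToHBaseSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("sensor-to-hbase-sketch"), Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",            // assumed broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "sensor-consumer")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("patient-sensors"), kafkaParams)) // assumed topic

    // Write each partition of each micro-batch into an HBase table (assumed name and column family).
    stream.map(_.value).foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("patient_events"))
        records.foreach { line =>
          // Assumed message layout: "patientId,metric,value"
          line.split(",", 3) match {
            case Array(patientId, metric, value) =>
              val put = new Put(Bytes.toBytes(s"$patientId-${System.currentTimeMillis()}"))
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes(metric), Bytes.toBytes(value))
              table.put(put)
            case _ => () // skip malformed records
          }
        }
        table.close(); conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```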
Confidential, Perry, Iowa
Hadoop Developer
Responsibilities:
- Worked on Spark SQL to handle structured data in Hive (see the sketch at the end of this section).
- Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
- Involved in migrating tables from the RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
- Worked on complex MapReduce programs to analyze data residing on the cluster.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Wrote Hive UDFs to sort struct fields and return complex data types.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Involved in creating shell scripts to simplify the execution of all other scripts and move data into and out of HDFS.
- Created files and tuned SQL queries in Hive using HUE (Hadoop User Experience).
- Involved in collecting and aggregating large amounts of log data using Storm and staging data in HDFS for further analysis.
- Created the Hive external tables using Accumulo connector.
- Knowledge of developing NiFi flow prototypes for data ingestion into HDFS.
- Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
- Developed Spark scripts using the Python (PySpark) shell.
- Stored the processed results in the data warehouse and maintained the data using Hive.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Created Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
- Worked with and learned a great deal from Amazon Web Services (AWS) cloud services such as EC2, S3 and EMR.
Environment: Cloudera, HDFS, MapReduce, Storm, Hive, Pig, Sqoop, Apache Spark, Python, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, UNIX Shell scripts, HUE, Nifi, Solr, Git, Maven.
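A minimal sketch of the Spark SQL work described in this role, querying a Hive-registered table and a CSV file from HDFS; the database, table, column and path names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlOnHiveSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read tables registered in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("spark-sql-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Query an existing Hive table (database, table and column names are assumptions).
    val dailyCounts = spark.sql(
      """SELECT event_date, COUNT(*) AS events
        |FROM analytics.web_events
        |GROUP BY event_date""".stripMargin)
    dailyCounts.show(20)

    // Read a CSV file from HDFS into a DataFrame and register it for SQL queries.
    val csv = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/events.csv") // assumed path
    csv.createOrReplaceTempView("raw_events")
    spark.sql("SELECT COUNT(*) FROM raw_events").show()

    spark.stop()
  }
}
```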
Confidential
Java Developer
Responsibilities:
- Interacted and coordinated with team members to develop detailed software requirements that drove the design, implementation, and testing of the Consolidated Software application.
- Implemented the object-oriented programming concepts for validating the columns of the import file.
- Integrated Spring Dependency Injection (IOC) among different layers of an application.
- Designed the database and wrote triggers and stored procedures.
- Developed a PL/SQL view function in the Oracle 9i database for the get-available-date module.
- Used Quartz schedulers to run jobs sequentially within the given time window.
- Used JSP and JSTL Tag Libraries for developing User Interface components.
- Used Core Java concepts such as multi-threading, collections and garbage collection, along with other JEE technologies, during the development phase, and applied different design patterns.
- Created continuous integration builds using Maven and SVN control.
- Written deployment scripts to deploy application at client site.
- Involved in design, analysis, and architectural meetings.
- Created stored procedures in the Oracle database and accessed them through Java JDBC (see the sketch at the end of this section).
- Configured log4j to log the warning and error messages.
- Implemented the reports module using JasperReports for business intelligence.
- Supported Testing Teams and involved in defect meetings.
- Deployed web, presentation, and business components on the Apache Tomcat application server.
Environment: Oracle Database, Ajax, Servlets, JSP, XML, Maven, SVN, Tomcat Server, Soap, Jasper Reports, JDBC
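A minimal sketch of calling an Oracle stored procedure over JDBC, as described in this role; it is written in Scala for consistency with the rest of this document (the original work was in Java), and the procedure name, parameters and connection URL are hypothetical. An Oracle JDBC driver is assumed to be on the classpath.

```scala
import java.sql.{Connection, DriverManager, Types}

object StoredProcSketch {
  def main(args: Array[String]): Unit = {
    // Assumed connection details; the real project used an Oracle database.
    val conn: Connection = DriverManager.getConnection(
      "jdbc:oracle:thin:@//localhost:1521/ORCL", "app_user", "secret")
    try {
      // Hypothetical procedure: GET_AVAILABLE_DATE(IN module VARCHAR2, OUT available_date DATE)
      val call = conn.prepareCall("{call GET_AVAILABLE_DATE(?, ?)}")
      call.setString(1, "billing")
      call.registerOutParameter(2, Types.DATE)
      call.execute()
      println(s"available date: ${call.getDate(2)}")
      call.close()
    } finally {
      conn.close()
    }
  }
}
```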
Confidential
Java Developer
Responsibilities:
- Involved in design and development phases of Software Development Life Cycle (SDLC).
- Implemented Model View Controller (MVC) architecture using Jakarta Struts frameworks at presentation tier.
- Developed a Dojo based front end including forms and controls and programmed event handling.
- Developed various Enterprise Java Bean components to fulfill the business functionality.
- Created Action Classes which route submittals to appropriate EJB components and render retrieved information.
- Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer.
- Used Spring Framework for Dependency injection and integrated it with the Struts Framework.
- Used JDBC to connect to backend databases, Oracle and SQL Server 2005.
- Proficient in writing SQL queries and stored procedures for multiple databases, Oracle and SQL Server 2005.
- Wrote stored procedures using PL/SQL and performed query optimization to achieve faster indexing and make the system more scalable.
- Deployed the application on Windows using IBM WebSphere Application Server.
- Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information such as payment status report.
- Used web services (WSDL and REST) to get data from different instruments, and used SAX and DOM XML parsers for data retrieval (see the sketch at the end of this section).
- Implemented an SOA architecture with web services using JAX-WS.
- Used ANT scripts to build the application and deployed it on WebSphere Application Server.
Environment: Core Java, J2EE, Oracle, SQL Server, JMS, EJB, Struts, Spring, JDK, JavaScript, HTML, CSS, AJAX, JUnit, Log4j, Web Services, Windows.
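A minimal sketch of DOM-based XML parsing of a payment-status style payload, as described in this role; it is written in Scala for consistency with the rest of this document (the original work was in Java), and the element names are illustrative assumptions.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory

object DomParserSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative payload; element names are assumptions, not from the original project.
    val xml =
      """<payments>
        |  <payment id="1001"><status>SETTLED</status></payment>
        |  <payment id="1002"><status>PENDING</status></payment>
        |</payments>""".stripMargin

    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))

    // Walk the DOM and print each payment id with its status.
    val nodes = doc.getElementsByTagName("payment")
    for (i <- 0 until nodes.getLength) {
      val el = nodes.item(i).asInstanceOf[org.w3c.dom.Element]
      val id = el.getAttribute("id")
      val status = el.getElementsByTagName("status").item(0).getTextContent
      println(s"payment $id -> $status")
    }
  }
}
```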