
Sr. Hadoop/Spark Developer Resume


Dallas, TX

SUMMARY:

  • 8+ years of experience in the IT industry designing, developing and maintaining web-based applications using Big Data technologies such as the Hadoop and Spark ecosystems and Java/J2EE technologies.
  • Excellent understanding of Hadoop architecture, its daemons (NameNode, DataNode, JobTracker, TaskTracker) and HDFS and MapReduce concepts.
  • Hands-on experience installing, configuring and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Impala, Solr, Elasticsearch, Oozie, ZooKeeper, Kafka, Spark and Cassandra with the Cloudera and Hortonworks distributions.
  • Hands on experience in various big data application phases like data ingestion, data analytics and data visualization.
  • Experienced in writing MapReduce programs in Java to process large data sets using Map and Reduce Tasks.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
  • Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and in performing data transformations using Spark Core.
  • Expertise in developing Real-Time Streaming Solutions using Spark Streaming.
  • Expertise in using Spark-SQL with various data sources like JSON, Parquet and Hive.
  • Hands-on experience using Spark MLlib for predictive intelligence and customer segmentation in Spark Streaming applications.
  • Hands on experience in scripting for automation, and monitoring using Shell, PHP, Python & Perl scripts.
  • Experience in collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
  • Extended Hive and Pig core functionality by writing custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregate Functions (UDAFs).
  • Hands on Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
  • Worked on version control tools such as CVS, Git and SVN.
  • Experience in Web Services using XML, HTML and SOAP.
  • Experience in developing web pages using Java, JSP, Servlets, JavaScript, jQuery, AngularJS, Node, JBoss 4.2.3, XML, WebLogic, SQL, PL/SQL, JUnit, Apache Tomcat and WebSphere.
  • Hands on Experience in working with Spark MLlib.
  • Experienced in Developing Spark programs using Scala and Java API’s.
  • Expertise in using Kafka as a messaging system to implement real-time Streaming solutions.
  • Implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
  • Expertise in using Flume in Collecting, aggregating and loading log data from multiple sources into HDFS.
  • Scheduled various ETL process and Hive scripts by developing Oozie workflows.
  • Experienced in working with structured data using Hive QL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
  • Experience in handling various file formats like AVRO, Sequential, Parquet etc.
  • Proficient in Various NoSQL Databases like Cassandra, MongoDB, Hbase etc.
  • Good understanding of MPP databases such as Impala and Greenplum; created tables and wrote queries in both.
  • Experienced in using Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
  • Good knowledge of Cloudera distributions and of AWS services including Amazon S3, EC2 and EMR.
  • Worked on HBase to perform real time analytics and experienced in CQL to extract data from Cassandra tables.
  • Experienced with Kerberos authentication to provide more security to the cluster.
  • Experienced with Cloudera Manager to monitor health and performance of the Hadoop cluster.
  • Experienced in writing test cases and performing unit testing using frameworks such as JUnit, EasyMock and Mockito.
  • Strong Knowledge in Informatica ETL Tool, Data warehousing and Business intelligence.
  • Good level of experience in Core Java, JEE technologies as JDBC, Servlets, and JSP.
  • Expert in developing web applications using the Struts, Hibernate and Spring frameworks.
  • Hands on Experience in writing SQL and PL/SQL queries.
  • Good understanding of and experience with software development methodologies such as Agile and Waterfall; performed unit, regression, white-box and black-box testing.
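As a minimal illustration of the Map and Reduce tasks mentioned above, the classic word count can be sketched in plain Python (the function names and sample input here are illustrative, not from any specific project):

```python
from collections import defaultdict

def map_phase(lines):
    """Map task: emit a (word, 1) pair for every word in the input split."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce task: after the shuffle, sum the counts for each key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["Hadoop and Spark", "Spark streaming"]
counts = reduce_phase(map_phase(lines))  # {"hadoop": 1, "and": 1, "spark": 2, "streaming": 1}
```

In a real MapReduce job the framework runs many mappers and reducers in parallel and handles the shuffle between them; the per-record logic is the same shape.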

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++

No SQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Methodology: Agile, waterfall

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS, JSON and NodeJS

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j

Frameworks: Struts, Spring and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: SQL and PL/SQL (MySQL, PostgreSQL and Oracle)

RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2

Operating systems: UNIX, LINUX, Mac OS and Windows Variants

ETL Tools: Talend, Informatica, Pentaho

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Sr. Hadoop/Spark Developer

Responsibilities:

  • Experienced in design and deployment of Hadoop clusters and of Big Data analytic tools including Pig, Hive, Cassandra, Oozie, Sqoop, Kafka, Spark and Impala with the Cloudera distribution.
  • Developed Pig scripts to help perform analytics on JSON and XML data.
  • Created Hive tables (external, internal) with static and dynamic partitions and performed bucketing on the tables to provide efficiency.
  • Used Hive QL to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Performed data transformations by writing MapReduce and Pig scripts as per business requirements.
  • Configured Spark Streaming to consume data from Kafka and store it in HDFS.
  • Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing the data and storing it in Cassandra.
  • Good understanding of Cassandra architecture: replication strategies, gossip, snitches, etc.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra, and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
  • Experience in NoSQL Column-Oriented Databases like Cassandra and its Integration with Hadoop cluster.
  • Using Spark-Streaming APIs to perform transformations and actions on the fly for building the common learner data model which gets the data from Kafka in near real time and Persists into Cassandra.
  • Used Spark API over Hadoop YARN as execution engine for data analytics using Hive.
  • Work experience with cloud infrastructure like Amazon Web Services (AWS).
  • Design and document REST/HTTP, SOAP APIs, including JSON data formats and API versioning strategy.
  • Worked with SCRUM team in delivering agreed user stories on time for every sprint.
  • Performance analysis of Spark streaming and batch jobs by using Spark tuning parameters.
  • Log4j framework has been used for logging debug, info & error data.
  • Developed Spark applications using Scala and Spark-SQL for faster processing and testing.
  • Developed customized UDF's in java to extend Hive and Pig functionality.
  • Imported data from RDBMS systems like MySQL into HDFS using Sqoop.
  • Developed Sqoop jobs to perform incremental imports into Hive tables.
  • Implemented map-reduce counters to gather metrics of good records and bad records.
  • Involved in loading and transforming of large sets of structured and semi structured data.
  • Worked on different file formats (ORCFILE, Parquet, Avro) and different Compression Codecs (GZIP, SNAPPY, LZO).
  • Created Data Pipelines as per the business requirements and scheduled it using Oozie Coordinators.
  • Expertise in Extraction, Transformation, loading data from Oracle, DB2, SQL Server, MS Access, Excel, Flat Files and XML using Talend.
  • Involved in analyzing log data to predict the errors by using Apache Spark.
  • Experience in using ORC, Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
  • Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
  • Worked and learned a great deal from Amazon Web Services (AWS) Cloud services like EC2, S3, EMR,EBS, RDS and VPC.
  • Integrated MapReduce with HBase to import bulk amount of data into HBase using MapReduce programs.
  • Used Impala and wrote queries to fetch data from Hive tables.
  • Developed several MapReduce jobs using the Java API.
  • Extracted data from Teradata into HDFS, databases and dashboards using Spark Streaming.
  • Well versed in database and data warehouse concepts such as OLTP, OLAP, and Star and Snowflake schemas.
  • Worked with Apache SOLR to implement indexing and wrote Custom SOLR query segments to optimize the search.
  • Built near-real-time Solr indexes on HBase and HDFS.
  • Developed Pig and Hive UDF's to implement business logic for processing the data as per requirements.
  • Developed Oozie Bundles to schedule pig, Sqoop and hive jobs to create data pipelines.
  • Implemented the project by using Agile Methodology and Attended Scrum Meetings daily.
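The Kafka-to-Cassandra flow described above follows Spark Streaming's micro-batch (DStream) model. A simplified, dependency-free Python sketch of that pattern, batching an event stream and aggregating each batch, might look like the following (the event keys are invented for illustration):

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Split an event stream into fixed-size micro-batches (the DStream idea)."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def process_batch(batch):
    """Transformation + action on one batch: count events per key.
    In the real pipeline this result would be persisted to Cassandra."""
    counts = {}
    for key, _value in batch:
        counts[key] = counts.get(key, 0) + 1
    return counts

events = [("user1", "click"), ("user2", "view"), ("user1", "view"), ("user3", "click")]
results = [process_batch(b) for b in micro_batches(events, 2)]
```

Spark Streaming applies the same idea with time-based batch intervals rather than fixed counts, and distributes each batch across the cluster.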

Environment: Hadoop, Hive, HDFS, Pig, Sqoop, Oozie, Spark, Spark Streaming, Kafka, Apache Solr, Cassandra, Cloudera Distribution, Java, Impala, web servers, Maven, MySQL, AWS, Agile-Scrum.

Confidential, New York City, NY

Big Data Engineer

Responsibilities:

  • Processed web server logs by developing multi-hop Flume agents using the Avro sink, and loaded the results into MongoDB for further analysis.
  • Implemented custom Flume interceptors to mask confidential data and filter unwanted records from the event payload.
  • Implemented custom serializers to perform encryption using the DES algorithm.
  • Developed Collections in Mongo DB and performed aggregations on the collections.
  • Used Spark-SQL to Load JSON data and create SchemaRDD and loaded it into Hive Tables and handled Structured data using Spark SQL.
  • Used Spark-SQL to Load data into Hive tables and written queries to fetch data from these tables.
  • Developed Spark Programs using Scala and Java API's and performed transformations and actions on RDD's.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Experienced in writing Spark Applications in Scala and Python (Pyspark).
  • Created HBase tables, loaded data into them via HBase sinks, and performed analytics on them using Tableau.
  • Created HBase tables and column families to store the user event data
  • Imported data from AWS S3 and into spark RDD and performed transformations and actions on RDD's.
  • Configured, monitored, and optimized Flume agent to capture web logs from the VPN server to be put into Hadoop Data Lake.
  • Experienced on loading and transforming of large sets of structured, semi and unstructured data.
  • Experience in working with Hadoop clusters using Hortonworks distributions.
  • Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and written Pig/Hive UDFs.
  • Wrote, tested and implemented Teradata FastLoad, MultiLoad and BTEQ scripts, DML and DDL.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks and performance analysis.
  • Developed ETL processes using Spark, Scala, Hive and HBase.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading process.
  • Used the JSON and XML SerDe's for serialization and de-serialization to load JSON and XML data into HIVE tables.
  • Developed PIG Latin scripts for the analysis of semi structured data and conducted data Analysis by running Hive queries and Pig Scripts.
  • Used codecs such as Snappy and LZO when storing data in HDFS to improve performance.
  • Expert knowledge on MongoDB NoSQL data modeling, tuning, disaster recovery and backup.
  • Created HBase tables to store variable data formats of data coming from different Legacy systems.
  • Used HIVE to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Developed Sqoop Jobs to load data from RDBMS into HDFS and HIVE.
  • Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
  • Involved in loading data from UNIX file systems and FTP into HDFS.
  • Imported the data from different sources like AWS S3, LFS into Spark RDD.
  • Worked on Kerberos authentication to establish a more secure network communication on the cluster.
  • Performed troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Worked with Network, database, application and BI teams to ensure data quality and availability.
  • Implemented Elastic Search on Hive data warehouse platform.
  • Worked with Elastic MapReduce and set up the environment on AWS EC2 instances.
  • Experience in maintaining the cluster on AWS EMR.
  • Experienced in NOSQL databases like HBase, MongoDB and experienced with Hortonworks distribution of Hadoop.
  • Developed ETL jobs to integrate data from various sources and load it into the warehouse using Informatica 9.1.
  • Experienced in Creating ETL Mappings in Informatica.
  • Experienced in working with various Transformations like Filter, Router, Expression, update strategy etc. in Informatica.
  • Scheduled the ETL jobs using ESP scheduler.
  • Worked in Agile methodology and actively participated in daily Scrum meetings.
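The custom interceptor for masking confidential data mentioned above would, in Flume, be a Java class implementing the Interceptor interface; the mask-and-filter logic itself can be sketched in Python. The SSN-like pattern and the record formats below are assumptions for illustration:

```python
import re

# Assumed format of the confidential field to be masked.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def intercept(event_body):
    """Mask SSN-like tokens and drop unwanted records.
    Returning None mimics a Flume interceptor returning null to filter an event."""
    if not event_body or event_body.startswith("#"):
        return None
    return SSN_PATTERN.sub("XXX-XX-XXXX", event_body)

events = ["user=jdoe ssn=123-45-6789", "# heartbeat", "user=asmith action=login"]
masked = [e for e in (intercept(ev) for ev in events) if e is not None]
```

The same two responsibilities (transform the payload, or drop the event entirely) are exactly what Flume's `intercept()` contract allows.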

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Hbase, MongoDB, Flume, Apache Spark, Accumulo, Oozie, Kerberos, AWS, Tableau, Java, Informatica, Elastic Search, Git, Maven.

Confidential, Fort Lauderdale, FL

Hadoop Developer

Responsibilities:

  • Handled large amount of data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
  • Worked on Data importing and exporting into HDFS and Hive Using Sqoop.
  • Developed Map Reduce jobs in Java to perform data cleansing and pre-processing.
  • Migrated large amount of data from various Databases like Oracle, Netezza, MySQL to Hadoop.
  • Responsible for creating Hive tables, loading data into them and writing Hive queries.
  • Performed data transformations in Hive and wrote Hive queries for data analysis per the business requirements.
  • Created partitions and buckets on hive tables to improve performance while running Hive queries.
  • Optimizing and performance tuning of Hive Queries.
  • Implementing Complex transformations by writing UDF's in PIG and HIVE.
  • Loading and Transforming all kinds of data like Structured, semi-structured, and Unstructured data.
  • Ingesting Log data from various web servers into HDFS using Apache Flume.
  • Implemented Flume Agents for loading Streaming data into HDFS.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Wrote several MapReduce jobs using the Java API.
  • Scheduled jobs using Oozie workflow Engine.
  • Good experience with Talend open studio for designing ETL Jobs for Processing of data.
  • Experience in processing large volumes of data and in parallel execution of processes using Talend functionality.
  • Experienced in data analytics, web scraping and data extraction in Python.
  • Designed & Implemented database Cloning using Python and Built backend support for Applications using Shell scripts
  • Worked on various compression techniques like GZIP and LZO.
  • Design and Implementation of Batch jobs using Sqoop, MR2, PIG, Hive.
  • Implemented HBase on top of HDFS to perform real time analytics.
  • Handled Avro Data files using Avro Tools and Map Reduce.
  • Developed Data pipelines by using Chained Mappers.
  • Developed custom loaders and storage classes in Pig to work with various data formats such as JSON, XML and CSV.
  • Active involvement in SDLC phases (Design, Development, Testing), Code review etc.
  • Active involvement in Scrum meetings and Followed Agile Methodology for implementation.
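The Pig Latin scripts above extract fields from web server output files; the per-record logic of such an extractor (or of a UDF doing the same job) can be sketched as a plain function. The three-field log layout here is an assumption for illustration:

```python
def parse_log_line(line):
    """Extract (ip, status, bytes) from a simplified web-server log line.
    The field layout is assumed; real access logs carry more fields."""
    parts = line.split()
    if len(parts) < 3:
        return None  # malformed record; a MapReduce job would bump a bad-records counter
    ip, status, size = parts[0], parts[1], parts[2]
    return ip, int(status), int(size)

row = parse_log_line("10.0.0.1 200 5120")
bad = parse_log_line("garbage")
```

In a Pig UDF or a Hive TRANSFORM script the same function would run once per input tuple, with malformed rows routed aside rather than failing the job.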

Environment: HDFS, MapReduce, Hive, Flume, Pig, Sqoop, Oozie, HBase, RDBMS/DB, flat files, MySQL, CSV, Avro data files.

Confidential

Java Developer

Responsibilities:

  • Actively involved from fresh start of the project, requirement gathering to quality assurance testing.
  • Coded and Developed Multi-tier architecture in Java, J2EE, Servlets.
  • Conducted analysis, requirements study and design according to various design patterns and developed rendering to the use cases, taking ownership of the features.
  • Used various design patterns such as Command, Abstract Factory, Factory, and Singleton to improve the system performance. Analyzing the critical coding defects and developing solutions.
  • Developed configurable front end using Struts technology. Also involved in component based development of certain features which were reusable across modules.
  • Designed, developed and maintained the data layer using the ORM framework called Hibernate.
  • Used Hibernate framework for Persistence layer, involved in writing Stored Procedures for data retrieval and data storage and updates in Oracle database using Hibernate.
  • Developed batch jobs which will run on specified time to implement certain logic in java platform.
  • Developed and deployed archive files (EAR, WAR, JAR) using the Ant build tool.
  • Used software development best practices for object-oriented design and methodologies throughout the object-oriented development cycle.
  • Responsible for developing SQL Queries required for the JDBC.
  • Designed the database, worked on DB2 and executed DDLs and DMLs.
  • Active participation in architecture framework design and coding and test plan development.
  • Strictly followed the Waterfall development methodology for implementing projects.
  • Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
  • Involved in developing training presentations for developers (off shore support), QA, Production support.
  • Presented the process logical and physical flow to various teams using PowerPoint and Visio diagrams.
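Of the design patterns listed above, Singleton is the simplest to sketch. The Java project would have used a private constructor and a static accessor; the Python rendering below shows the same idea (the class name is invented for illustration):

```python
class ConnectionManager:
    """Singleton: every caller shares one instance, e.g. one pooled DB handle."""
    _instance = None

    def __new__(cls):
        # Create the instance only on first use; return the cached one after.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

a = ConnectionManager()
b = ConnectionManager()
assert a is b  # both names refer to the single shared instance
```

The pattern's value here is that expensive shared resources (connections, caches) are created once and reused, rather than per caller.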

Environment: Java JDK 1.5, Java J2EE, Informatica, Oracle 11g (TOAD and SQL Developer), Servlets, JBoss Application Server, Waterfall, JSPs, EJBs, DB2, RAD, XML, Web Server, JUnit, Hibernate, MS Access, Microsoft Excel.

Confidential

Java/J2EE Developer

Responsibilities:

  • Implemented several design patterns like Observer pattern, factory pattern, singleton pattern, facade pattern etc.
  • Utilized Agile Methodologies to manage full life-cycle development of the project.
  • Interacted with the Business Analyst and host teams to understand the requirements, using Agile methodologies and SCRUM meetings to track and optimize the end client's needs.
  • Created Use Case Diagrams, Class Diagrams, Activity Diagrams during the design phase.
  • Developed the project using MVC Design pattern.
  • Designed and Developed Server-side Components (DAO, Session Beans) Using J2EE.
  • Worked with Core Java concepts like Collections Framework, multithreading, memory management.
  • Used JDBC connectivity and JDBC statements, Prepared Statements, Callable Statements for querying, inserting, updating, deleting data from Oracle databases.
  • Developed Front-end Screens using HTML, CSS, and JavaScript.
  • Developed Date Time Picker using Object Oriented JavaScript extensively.
  • Performed code reviews and refactoring during development, strictly adhering to the development checklist.
  • Used Jenkins for continuous integration and Subversion as the version control system for the application.
  • Used Log4j for logging purposes and Tracing the code.
  • Client side Validations are done using JavaScript.
  • Optimized XML parsers like SAX and DOM for the production data.
  • Have good understanding of Teradata MPP architecture such as Partitioning and Primary Indexes.
  • Good knowledge in Teradata Unity, Teradata Data Mover, OS PDE Kernel internals, Backup and Recovery.
  • Implemented the JMS Topic to receive the input in the form of XML and parsed them through a common XSD.
  • Used JDBC Connections and WebSphere Connection pool for database access.
  • Developed and modified several Database Procedures, Triggers and views to implement the business logic for the application.
  • Used TOAD to monitor query turnaround times and to test all the connections.
  • Prepared the test plans and executed test cases for unit, integration and system testing.
  • Developed multiple unit and integration tests using Mockito and EasyMock.
  • Used JIRA for reporting bugs in the application.

Environment: Java, J2EE, Servlets, JSP, Struts, Spring, Hibernate, JDBC, JMS, JavaScript, XSLT, HTML,CSS, SAX, DOM, XML, UML, TOAD, Mockito, Oracle, Eclipse RCP, JIRA, WebSphere, Unix/Windows.

Confidential

Junior Java Developer

Responsibilities:

  • Extensive Involvement in Requirement Analysis and system implementation.
  • Actively involved in SDLC phases like Analysis, Design and Development.
  • Responsible for Developing modules and assist in deployment as per the client’s requirements.
  • Implemented the application using JSP, with servlets implementing the business logic.
  • Developed utility and helper classes and server-side functionality using servlets.
  • Created DAO classes and wrote various SQL queries to perform DML operations on the data as per the requirements.
  • Created custom exceptions and implemented exception handling using try, catch and finally blocks.
  • Developed user interface using JSP, JavaScript and CSS Technologies.
  • Implemented User Session tracking in JSP.
  • Involved in Designing DB Schema for the application.
  • Implemented Complex SQL Queries, Reusable Triggers, Functions, Stored procedures using PL/SQL.
  • Worked in pair programming, Code reviewing and Debugging.
  • Involved in Tool development, Testing and Bug Fixing.
  • Performed unit testing for various modules.
  • Involved in UAT and production deployments and support activities.
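The custom-exception work above follows a common shape: a domain-specific exception type, a catch block that recovers, and a finally block that always runs. A Python sketch of that shape (the class and field names are invented for illustration):

```python
class RecordValidationError(Exception):
    """Custom exception for invalid input records (name is illustrative)."""

def process_record(record, audit_log):
    try:
        if "id" not in record:
            raise RecordValidationError("missing id")
        return record["id"]
    except RecordValidationError:
        return None  # recover: skip the bad record instead of failing the batch
    finally:
        audit_log.append(record)  # always runs, like a Java finally block

log = []
good = process_record({"id": 7}, log)
bad = process_record({}, log)
```

The finally clause guarantees the audit entry is written whether the record was valid or not, which is the main reason to prefer it over logging inside each branch.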

Environment: Java, J2EE, Servlets, JSP, SQL, PL/SQL, HTML, JavaScript, CSS, Eclipse, Oracle, MySQL, IBM WebSphere, JIRA.
