
Big Data Hadoop/Data Developer Resume


Arlington, VA

SUMMARY

  • Around 8 years of IT experience across a variety of industries, including hands-on experience in Big Data Hadoop and Java development.
  • Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and ZooKeeper.
  • Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Strong experience in writing Python applications with libraries such as Pandas, NumPy, SciPy, and Matplotlib.
  • Good knowledge of machine learning in Python, including data preprocessing, regression, classification, and appropriate model selection techniques (a sketch follows this list).
  • Good exposure to the Agile software development process.
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
  • Good understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase, Cassandra and MongoDB.
  • Experienced in writing complex MapReduce programs that work with different file formats such as Text, SequenceFile, XML, Parquet, and Avro.
  • Experience using the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experience in migrating data with Sqoop from HDFS to relational database systems and vice versa.
  • Extensive experience importing and exporting data using streaming platforms such as Flume and Kafka.
  • Good understanding of Teradata, Zeppelin, and Solr.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
  • Strong experience with data warehousing ETL concepts using Informatica PowerCenter, OLAP, OLTP, and AutoSys.
  • Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Oracle.
  • Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage mechanism.
  • Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
  • Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
  • Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
  • Worked in large and small teams on systems requirements, design, and development.
  • Key participant in all phases of the software development life cycle: analysis, design, development, integration, implementation, debugging, and testing of software applications in client-server environments.
  • Experienced with IDEs such as Eclipse and IntelliJ and with SVN and Git repositories.
  • Experience using build tools such as Ant and Maven.
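
A minimal sketch of the Python preprocessing and model-selection workflow referenced above; the CSV file, column names, and model are illustrative assumptions, not details from any engagement listed here.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("customers.csv").dropna()            # basic cleaning
    X, y = df[["age", "balance"]], df["churned"]          # assumed columns
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # A pipeline keeps scaling inside the cross-validation folds, avoiding leakage
    pipe = Pipeline([("scale", StandardScaler()),
                     ("clf", LogisticRegression())])
    search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
    search.fit(X_train, y_train)
    print(search.best_params_, search.score(X_test, y_test))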

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Spark, Storm, Kafka, HCatalog, Impala, Datameer.

Distributed Platforms: Cloudera, Hortonworks, MapR and Apache

Languages: C, C++, Java, Scala, SQL, PL/SQL, Linux shell scripts, HL7.

NoSQL Databases: MongoDB, Cassandra, HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB, and Struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Methodology: Agile/Scrum, Rational Unified Process and Waterfall

Monitoring tools: Ganglia, Nagios.

Hadoop/Big Data Technologies: HDFS, MapReduce, Spark SQL, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Cloudera Manager; NoSQL databases: MongoDB, HBase

Version Control: GitHub, Bitbucket, CVS, SVN, ClearCase, Visual SourceSafe

Build & Deployment Tools: Maven, Ant, Hudson, Jenkins

Database: Oracle, MS SQL Server 2005, MySQL, Teradata

PROFESSIONAL EXPERIENCE

Confidential, Arlington, VA

Big Data Hadoop/Data Developer

Responsibilities:

  • Developing and maintaining a Data Lake containing regulatory data for federal reporting, using big data technologies such as the Hadoop Distributed File System (HDFS), Apache Impala, Apache Hive, and the Cloudera distribution.
  • Developing ETL jobs to extract data from source systems such as Oracle and Microsoft SQL Server, transform the extracted data using Hive Query Language (HQL), and load it into HDFS.
  • Fixing data related issues within the Data Lake.
  • Implementing new functionality in the Data Lake using big data technologies such as Hadoop Distributed File System (HDFS), Apache Impala and Apache Hive based on the requirements provided by the client.
  • Communicating regularly with the business teams along with the project manager to ensure that any gaps between the client’s requirements and project’s technical requirements are resolved.
  • Developing Python scripts that use the HDFS APIs to generate curl commands to migrate data and to prepare the different environments within the project (a sketch follows this list).
  • Monitoring production jobs daily using Control-M.
  • Coordinating production releases with the change management team using the Remedy tool.
  • Communicating effectively with team members and conducting code reviews.
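
A minimal sketch of the curl-generation approach described above, assuming the WebHDFS REST API on a hypothetical NameNode host; the host and paths are illustrative placeholders, not the project's actual values.

    # List files under an HDFS directory via WebHDFS and emit one curl
    # command per file to migrate it; host and paths are hypothetical.
    import json
    import urllib.request

    NAMENODE = "http://namenode.example.com:50070"   # assumed WebHDFS endpoint
    SRC_DIR = "/data/regulatory/raw"                 # assumed source directory

    def list_status(path):
        url = "{0}/webhdfs/v1{1}?op=LISTSTATUS".format(NAMENODE, path)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["FileStatuses"]["FileStatus"]

    for status in list_status(SRC_DIR):
        if status["type"] == "FILE":
            name = status["pathSuffix"]
            # -L follows the NameNode's redirect to the serving DataNode
            print("curl -L -o {0} '{1}/webhdfs/v1{2}/{0}?op=OPEN'".format(
                name, NAMENODE, SRC_DIR))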

Environment: Hadoop, Data Lake, Python, Hive, Cassandra, Informatica (ETL), Cloudera, Oracle 10g, Microsoft SQL Server, Control-M, Linux

Confidential, San Antonio, TX

Big Data Hadoop/Spark Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in importing data from different sources into HDFS using Sqoop, applying transformations using Hive and Spark, and then loading the data into Hive tables.
  • Migrated Pig scripts and MapReduce jobs to the Spark DataFrames API and Spark SQL to improve performance.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
  • Tuned the performance of Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
  • Developed DataFrames and case classes for the required input data and performed the data transformations using Spark Core.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Expertise in deploying Hadoop YARN, Spark, and Storm integrated with Cassandra, Ignite, Kafka, etc.
  • Strong working experience running queries against Cassandra clusters.
  • Developed a POC in Scala, deployed it on the YARN cluster, and compared the performance of Spark with Hive and SQL.
  • Deployed and maintained multi-node Dev and Test Kafka Clusters.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Experience in using Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures such as text analytics and processing, leveraging Spark's in-memory computing capabilities with Scala.
  • Developed equivalent Spark Scala code for existing SAS code to extract summary insights from the Hive tables.
  • Responsible for importing data from different sources such as MySQL databases into HDFS, storing it in Avro and JSON file formats.
  • Experience in importing data from S3 to Hive using Sqoop and Kafka.
  • Good Experience working with Amazon AWS for accessing Hadoop cluster components.
  • Involved in creating partitioned Hive tables and loading and analyzing data using Hive queries.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, to decide whether to adopt the former in the project.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Good experience with Talend Open Studio for designing ETL jobs for data processing.
  • Implemented static partitioning, dynamic partitions, and buckets in Hive.
  • Configured Hadoop clusters and coordinated with Big Data admins for cluster maintenance.
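
A minimal sketch of the Kafka-to-Cassandra streaming flow described in this list, written in PySpark against the Spark 1.6-era DStream API; the topic, broker, keyspace, table, and field names are hypothetical placeholders, and the job assumes the spark-streaming-kafka package and the DataStax cassandra-driver are available.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # needs spark-streaming-kafka
    import json

    sc = SparkContext(appName="LearnerEventStream")
    ssc = StreamingContext(sc, batchDuration=10)  # batch interval is a tuning knob

    # Direct (receiver-less) stream from Kafka; values arrive as JSON strings
    stream = KafkaUtils.createDirectStream(
        ssc, ["learner-events"], {"metadata.broker.list": "broker1:9092"})

    def save_partition(rows):
        # One Cassandra session per partition avoids per-record connection cost
        from cassandra.cluster import Cluster
        session = Cluster(["cassandra-host"]).connect("learner")
        for row in rows:
            session.execute(
                "INSERT INTO events (user_id, event_type, ts) VALUES (%s, %s, %s)",
                (row["user_id"], row["event_type"], row["ts"]))

    (stream.map(lambda kv: json.loads(kv[1]))        # kv = (key, value)
           .filter(lambda e: "user_id" in e)
           .foreachRDD(lambda rdd: rdd.foreachPartition(save_partition)))

    ssc.start()
    ssc.awaitTermination()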

Environment: Hadoop, YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elasticsearch, Impala, Cassandra, Tableau, Informatica, Cloudera, Oracle 10g, Linux.

Confidential, Riverwoods, IL

Hadoop Developer

Responsibilities:

  • Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
  • Responsible for installing, configuring, supporting, and managing Hadoop clusters.
  • Imported and exported data between HDFS and an Oracle 10.2 database using Sqoop.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Designed and implemented HIVE queries and functions for evaluation, filtering, loading and storing of data.
  • Created HBase tables and column families to store the user event data.
  • Wrote automated HBase test cases for data quality checks using HBase command-line tools.
  • Developed a data pipeline using HBase and Hive to ingest, transform, and analyze customer behavioral data.
  • Experience in collecting log data from different sources (web servers and social media) using Flume and storing it on HDFS to run MapReduce jobs.
  • Handled importing of data from machine logs using Flume.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Configured, monitored, and optimized Flume agent to capture web logs from the VPN server to be put into Hadoop Data Lake.
  • Responsible for loading data from UNIX file systems to HDFS; installed and configured Hive and wrote Pig/Hive UDFs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the sketch after this list).
  • Exported the analyzed data to the relational databases using Sqoop to further visualize and generate reports for the BI team.
  • Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
  • Wrote Java code to format XML documents; upload them to Solr server for indexing.
  • Used NoSQL technology (Amazon DynamoDB) to gather and track event-based metrics.
  • Maintained all the services in the Hadoop ecosystem using ZooKeeper.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Expertise in extracting, transforming, and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files, and XML using Talend.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Helped design scalable big data clusters and solutions.
  • Followed Agile methodology for the entire project.
  • Involved in review of functional and non-functional requirements.
  • Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
  • Converted the existing relational database model to the Hadoop ecosystem.
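
A minimal sketch of rewriting a Hive query as Spark transformations, as mentioned in this list, using the Spark 1.x HiveContext API; the table name, columns, and query are illustrative assumptions.

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="HiveToSpark")
    hc = HiveContext(sc)  # reads table metadata from the Hive metastore

    # Original Hive query (illustrative):
    #   SELECT user_id, COUNT(*) AS hits FROM web_logs
    #   WHERE status = 200 GROUP BY user_id
    logs = hc.table("web_logs")

    # The same logic as DataFrame transformations
    hits = (logs.filter(logs.status == 200)
                .groupBy("user_id")
                .count()
                .withColumnRenamed("count", "hits"))

    # Or as plain RDD operations on the underlying rows
    hits_rdd = (logs.rdd.filter(lambda r: r.status == 200)
                        .map(lambda r: (r.user_id, 1))
                        .reduceByKey(lambda a, b: a + b))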

Environment: Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Oozie, Python, Shell Scripting, SQL, Talend, Spark, HBase, Elasticsearch, Linux (Ubuntu), Kafka.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Developed and deployed a Hadoop cluster using Pig, Hive, HBase, Oozie, Sqoop, Spark, Impala, and Kafka.
  • Worked with Sqoop to import and export data into HDFS, Hive, and HBase.
  • Implemented MapReduce jobs in Java for data processing.
  • Used Sqoop imports to load and transform large sets of data into HDFS from relational databases.
  • Experience in writing MapReduce programs for data analysis using Java.
  • Involved in integrating Apache Kafka and Apache Storm.
  • Performed data transformation and a few pre-aggregations before storing the data onto HDFS by using Pig.
  • Created structured data from a pool of unstructured data using Spark (see the sketch after this list).
  • Used Spark's in-memory computing capabilities to perform procedures such as text analysis and processing with Scala.
  • Worked with Spark Streaming, dividing data into different branches for batch processing through the Spark engine.
  • Worked on parallel processing using MapReduce and Spark.
  • Created Hive UDFs using Java.
  • Loading data from disparate data sets using Flume and Sqoop.
  • Worked on job scheduling using Oozie workflow engine.
  • Worked on installation, support, and monitoring of Hadoop clusters using Cloudera Manager.
  • Worked with the Cloudera distribution.
  • Worked with Apache Kafka to collect, aggregate and move large amounts of data from application servers.
  • Worked on integrating Cassandra with Elastic Search and Hadoop.
  • Involved in creating HBase tables to store data from UNIX and NoSQL sources.
  • Troubleshot and debugged runtime issues in the Hadoop ecosystem.
  • Involved in integrating algorithms into production system by working with the engineering team.
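
A minimal sketch of turning unstructured text into queryable structured data with Spark, as referenced in the list above; the HDFS path and log-line format are hypothetical, and since the original work used Scala, this PySpark version is purely illustrative.

    import re
    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row

    sc = SparkContext(appName="StructureRawLogs")
    sqlc = SQLContext(sc)

    # Assumed line format: "2017-03-01 12:00:05 WARN payment timeout user=42"
    LINE = re.compile(r"^(\S+ \S+) (\w+) (.*)$")

    def parse(line):
        m = LINE.match(line)
        return Row(ts=m.group(1), level=m.group(2), message=m.group(3)) if m else None

    structured = (sc.textFile("hdfs:///raw/app-logs/*")   # hypothetical path
                    .map(parse)
                    .filter(lambda r: r is not None))

    # Register as a temp table so downstream jobs can query it with SQL
    sqlc.createDataFrame(structured).registerTempTable("app_logs")
    sqlc.sql("SELECT level, COUNT(*) AS n FROM app_logs GROUP BY level").show()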

Environment: Java, Hadoop, MapReduce, Pig, Hive, Oozie, Sqoop, Spark, Cloudera, Kafka, Cassandra, HBase.

Confidential, Cambridge, MA

Java/Hadoop Developer

Responsibilities:

  • Developed JSPs, JSF pages, and Servlets to dynamically generate HTML and display data on the client side.
  • Used the Hibernate framework for persistence to an Oracle database.
  • Wrote and debugged the Ant scripts for building the entire web application.
  • Developed web services in Java with SOAP and WSDL, using WSDL to publish the services to other applications.
  • Implemented messaging using the Java Message Service (JMS) API.
  • Involved in managing and reviewing Hadoop log files.
  • Installed and configured Hadoop, YARN, MapReduce, Flume, and HDFS, and developed multiple MapReduce jobs in Java for data cleaning (a Python-based sketch follows this list).
  • Coded Hadoop MapReduce jobs for energy generation and PS.
  • Coded Servlets, SOAP clients, and Apache CXF REST APIs to deliver data from our application to internal and external consumers.
  • Worked on Cloudera distribution system for running Hadoop jobs on it.
  • Expertise in writing Hadoop jobs to analyze data using MapReduce, Hive, Pig, Solr, and Splunk.
  • Created a SOAP web service using JAX-WS to enable clients to consume a SOAP web service.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice-versa.
  • Experienced in designing and developing multi-tier scalable applications using Java and J2EE Design Patterns.
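
The data-cleaning jobs above were written in Java MapReduce; purely as a language-consistent illustration, here is the same kind of cleaning step expressed as a Hadoop Streaming mapper in Python. The field count and delimiter are assumptions.

    #!/usr/bin/env python
    # Hadoop Streaming mapper: drop malformed rows and normalize empty fields.
    # Run via: hadoop jar hadoop-streaming.jar -mapper clean.py ... (illustrative)
    import sys

    EXPECTED_FIELDS = 5  # assumed record width

    for line in sys.stdin:
        fields = [f.strip() for f in line.rstrip("\n").split(",")]
        # Keep only complete records with a non-empty key in the first column
        if len(fields) == EXPECTED_FIELDS and fields[0]:
            print("\t".join(f if f else "\\N" for f in fields))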

Environment: Java, HTML, JavaScript, SQL Server, PL/SQL, JSP, Spring, Hibernate, Web Services, SOAP, SOA, JSF, JMS, JUnit, Oracle, Eclipse, SVN, XML, CSS, Log4j, Ant, Apache Tomcat.

Confidential

Java Developer

Responsibilities:

  • Designed and developed the application using agile methodology.
  • Implemented new modules and change requests, and fixed defects identified in pre-production and production environments.
  • Wrote technical design document with class, sequence, and activity diagrams in each use case.
  • Developed various reusable helper and utility classes which were used across all modules of the application.
  • Involved in developing XML compilers using XQuery.
  • Developed the Application using Spring MVC Framework by implementing Controller, Service classes.
  • Involved in writing the Spring configuration XML file containing bean declarations and the declarations of dependent objects.
  • Used Hibernate as the persistence framework, creating DAOs and ORM mappings.
  • Wrote Java classes to test the UI and web services through JUnit.
  • Performed functional and integration testing; extensively involved in critical release/deployment activities.
  • Responsible for designing rich user interface applications using JSP, JSP tag libraries, Spring tag libraries, JavaScript, CSS, and HTML.
  • Used SVN for version control; Log4j was used to log both user-interface and domain-level messages.
  • Used SoapUI for testing the web services.
  • Used Maven for dependency management and project structure.
  • Created deployment documents for various environments such as Test, QC, and UAT.
  • Involved in system wide enhancements supporting the entire system and fixing reported bugs.
  • Explored Spring MVC, Spring IoC, Spring AOP, and Hibernate in creating the POC.
  • Performed front-end data manipulation using JavaScript and JSON.

Environment: Java, J2EE, JSP, Spring, Hibernate, CSS, JavaScript, Oracle, JBoss, Maven, Eclipse, JUnit, Log4J, AJAX, Web services, JNDI, JMS, HTML, XML, XSD, XML Schema, SVN, Git.
