
Hadoop/Spark Developer Resume


St. Louis, MO

SUMMARY:

  • Hadoop Developer: Extensively worked on Hadoop tools including Pig, Hive, Oozie, Sqoop, Spark, DataFrames, HBase and MapReduce programming. Implemented partitioning and bucketing in Hive and designed both managed and external Hive tables to optimize performance.
  • Developed Spark applications using Scala for easy Hadoop transitions. Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive (see the sketch after this summary). Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Hadoop Distributions: Worked with Apache Hadoop along with the enterprise distributions from Cloudera and Hortonworks. Good knowledge of the MapR distribution.
  • Data Ingestion into Hadoop (HDFS): Ingested data into Hadoop from various data sources such as Oracle and MySQL using Sqoop. Created Sqoop jobs with incremental load to populate Hive external tables. Imported real-time data into Hadoop using Kafka and also worked with Flume. Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • File Formats: Involved in running Hadoop streaming jobs to process terabytes of text data. Worked with different file formats such as Text, SequenceFile, Avro, ORC and Parquet.
  • Scripting and Reporting: Created scripts for performing data analysis with Pig, Hive and Impala. Used ANT scripts for creating and deploying .jar, .ear and .war files. Generated reports, extracts and statistics on the distributed data on the Hadoop cluster. Developed Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Java Experience: Created applications in core Java and built client-server applications requiring database access and constant connectivity using JDBC, JSP, Spring and Hibernate. Implemented web services for network-related applications in Java.
  • Interface Design: Created front-end user interfaces using HTML, CSS and JavaScript along with validation techniques. Implemented the Ajax toolkit for GUI validation.
  • Methodologies: Hands-on experience working with different software methodologies such as Waterfall and Agile.
  • NoSQL Databases: Worked with NoSQL databases such as HBase, MongoDB and Cassandra.
  • AWS: Planned, deployed and maintained an Amazon AWS cloud infrastructure consisting of multiple nodes, and was involved in deploying applications to AWS.
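
A minimal sketch, in Scala, of the Spark-over-Hive analytics pattern referenced in the summary above, assuming a Spark 1.x HiveContext submitted to YARN; the sales table, its columns and the output path are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveAnalytics {
      def main(args: Array[String]): Unit = {
        // Submitted with spark-submit --master yarn so the job runs on the YARN cluster.
        val sc = new SparkContext(new SparkConf().setAppName("HiveAnalytics"))
        val hiveContext = new HiveContext(sc)

        // Hypothetical partitioned Hive table; table, column and partition names are illustrative.
        val dailyTotals = hiveContext.sql(
          """SELECT region, SUM(amount) AS total
            |FROM sales
            |WHERE ds = '2016-01-01'
            |GROUP BY region""".stripMargin)

        // Persist the aggregate back to HDFS as Parquet for downstream reporting.
        dailyTotals.write.mode("overwrite").parquet("hdfs:///analytics/sales_daily")
      }
    }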

TECHNICAL SKILLS:

Languages/Tools: Java, XML, XSLT, HTML/XHTML, HDML, DHTML, Python, Scala, R, Git.

Big Data Technologies: Apache Hadoop, HDFS, Spark, Hive, Pig, Talend, HBase, Sqoop, Oozie, ZooKeeper, Mahout, Solr, Kafka, Storm, Cassandra, Impala, HUE, Tez, MongoDB, Scala.

Java Technologies: JSE: Java architecture, OOP concepts; JEE: JDBC, JNDI, JSF (Java Server Faces), Spring, Hibernate, SOAP/REST web services

Web Technologies: HTML, XML, JavaScript, WSDL, SOAP, JSON

Databases/NoSQL: MS SQL Server, MySQL, HBase, Oracle, MS Access, Teradata

PROFESSIONAL EXPERIENCE

Confidential, St. Louis, MO

Hadoop/Spark Developer

Responsibilities:

  • Analyzed and defined the researchers' strategy and determined the system architecture and requirements to achieve project goals.
  • Developed multiple Kafka producers and consumers as per the software requirement specifications.
  • Used Kafka for log aggregation, gathering physical log files from servers and placing them in a central location such as HDFS for processing.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
  • Used various Spark transformations and actions for cleansing the input data.
  • Developed shell scripts to generate Hive CREATE TABLE statements from the data and load the data into the tables.
  • Wrote MapReduce jobs using the Java API and Pig Latin; optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
  • Involved in writing custom MapReduce programs using the Java API for data processing.
  • Integrated Maven build and designed workflows to automate the build and deploy process.
  • Involved in developing a linear regression model to predict continuous measurements and improve observations on wind turbine data, developed using Spark with the Scala API.
  • Worked extensively with Spark MLlib to develop a logistic regression model on operational data (see the MLlib sketch after this list).
  • Created Hive tables as internal or external per requirements, defined with appropriate static or dynamic partitions and bucketing for efficiency.
  • Loaded and transformed large sets of structured and semi-structured data using Hive.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames and saved the data in Parquet format in HDFS (a sketch of this flow follows this list).
  • Used Spark and Spark SQL to read the Parquet data and create tables in Hive using the Scala API.
  • Worked on an extensible framework for building high-performance batch and interactive data processing applications on Pig and Hive jobs, and on a Cassandra implementation using the DataStax Java API.
  • Very good understanding of Cassandra cluster mechanisms, including replication strategies, snitches, gossip, consistent hashing and consistency levels.
  • Imported data from various resources to the Cassandra cluster using Java APIs.
  • Configured performance tuning and monitoring for Cassandra read and write processes for fast I/O operations and low latency. Used the Java API and Sqoop to export data from RDBMS into the DataStax Cassandra cluster.
  • Strong working experience with Cassandra, retrieving data from Cassandra clusters to run queries.
  • Experience in Data modelling using Cassandra.
  • Experienced in using the Spark application master to monitor Spark jobs and capture their logs.
  • Implemented Spark applications using Scala, utilizing DataFrames and the Spark SQL API for faster testing and processing of data.
  • Involved in making code changes to a turbine simulation module for processing across the cluster using spark-submit.
  • Involved in performing analytics and visualization on the log data to estimate error rates and study the probability of future errors using regression models.
  • Used the WebHDFS REST API to make HTTP GET, PUT, POST and DELETE requests from the web server to perform analytics on the data lake.
  • Worked on high-performance computing (HPC) to run simulation tools required for the genomics pipeline.
  • Used Kafka to build a customer activity tracking pipeline as a set of real-time publish-subscribe feeds.
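
A minimal sketch, in Scala, of the Kafka-to-Parquet streaming flow described in the bullets above, assuming the Spark 1.x direct-stream Kafka API; the broker, topic, record layout and HDFS paths are hypothetical:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaToParquet {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToParquet"), Seconds(60))

        // Hypothetical broker list and topic name.
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("turbine-events"))

        stream.map(_._2).foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            // A shared SQLContext would normally be reused; one per batch keeps the sketch short.
            val sqlContext = new SQLContext(rdd.sparkContext)
            import sqlContext.implicits._
            // Assume each record is "turbineId,eventTime,metric" (illustrative schema).
            val df = rdd.map(_.split(","))
              .map(f => (f(0), f(1), f(2).toDouble))
              .toDF("turbine_id", "event_time", "metric")
            // Append each micro-batch to HDFS as Parquet.
            df.write.mode("append").parquet("hdfs:///data/turbine/events")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }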

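A minimal sketch, in Scala, of a Spark MLlib logistic regression of the kind mentioned above, assuming the RDD-based MLlib API and a hypothetical CSV layout of a label followed by numeric features:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    object OperationalLogit {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("OperationalLogit"))

        // Hypothetical input: "label,feature1,feature2,..." per line in HDFS.
        val data = sc.textFile("hdfs:///data/operational/training.csv").map { line =>
          val parts = line.split(",").map(_.toDouble)
          LabeledPoint(parts.head, Vectors.dense(parts.tail))
        }

        val Array(train, test) = data.randomSplit(Array(0.7, 0.3), seed = 42L)

        val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(train)
        // Return raw scores instead of 0/1 labels so the ROC curve is meaningful.
        model.clearThreshold()

        // Evaluate with area under the ROC curve on the held-out split.
        val scoreAndLabel = test.map(p => (model.predict(p.features), p.label))
        val auc = new BinaryClassificationMetrics(scoreAndLabel).areaUnderROC()
        println(s"Test AUC = $auc")
      }
    }
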
Environment: HDP 2.3.4, Hadoop, Hive, HDFS, HPC, WebHDFS, WebHCat, Spark, Spark SQL, Kafka, Java, Scala, web servers, Maven and SBT builds.

Confidential, Cincinnati, OH

Hadoop Developer

Responsibilities:

  • Used the Cloudera distribution for the Hadoop ecosystem.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Collected and aggregated large amounts of web log data from different sources such as webservers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
  • Installed and configured Hadoop MapReduce, HDFS, developed multiple Map Reduce jobs in java for data cleaning and processing.
  • Also used Spark SQL to handle structured data in Hive.
  • Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Experience creating scripts for data modeling and data import/export. Extensive experience in deploying, managing and developing MongoDB clusters.
  • Created partitions and buckets based on State for further processing using bucket-based Hive joins.
  • Defined the Accumulo tables and loaded data into tables for near real-time data reports.
  • Created the Hive external tables using Accumulo connector.
  • Wrote Hive UDFs to sort structure fields and return complex data types.
  • Used different data formats (Text and ORC) while loading the data into HDFS.
  • Worked in AWS environment for development and deployment of custom Hadoop applications.
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
  • Able to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS (a sketch follows this list).
  • Imported data from different sources such as AWS S3 and the local file system into Spark RDDs.
  • Used HCatalog to access Hive table metadata from MapReduce and Pig code.
  • Involved in creating Shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and move the data inside and outside of HDFS.
  • Created files and tuned SQL queries in Hive using HUE.
  • Experience working with Apache SOLR for indexing and querying.
  • Created custom Solr query components to optimize search matching.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files (see the Spark SQL sketch after this list).
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
  • Worked with Kerberos and integrated it into the Hadoop cluster to make it stronger and more secure against unauthorized access.
  • Responsible for ingesting data into HBase using the HBase shell as well as the HBase client API.
  • Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging, job scheduling and error handling.
  • Designed and developed ETL jobs using Talend Integration Suite (Talend 5.2.2).
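
A minimal sketch, in Scala, of the near-real-time S3-to-HDFS flow described in the bullets above, assuming Spark Streaming's file-based source; the bucket, record layout and output prefix are hypothetical:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object S3StreamToHdfs {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("S3StreamToHdfs"), Seconds(120))

        // textFileStream picks up new files landing under the monitored S3 prefix
        // (bucket and paths are hypothetical; S3 credentials come from the Hadoop config).
        val lines = ssc.textFileStream("s3n://example-bucket/incoming/")

        // Assume "userId,eventType,value" records; aggregate value per event type per batch.
        val totals = lines.map(_.split(","))
          .map(f => (f(1), f(2).toDouble))
          .reduceByKey(_ + _)

        // Persist each micro-batch to HDFS for downstream modeling.
        totals.saveAsTextFiles("hdfs:///data/clickstream/agg/batch")

        ssc.start()
        ssc.awaitTermination()
      }
    }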

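A minimal sketch, in Scala, of querying delimited text with Spark SQL as referenced above; the case class, file path and column names are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Illustrative case class for a CSV of store transactions.
    case class Txn(store: String, state: String, amount: Double)

    object CsvWithSparkSql {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("CsvWithSparkSql"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Parse a headerless CSV by hand to avoid extra packages; the path is hypothetical.
        val txns = sc.textFile("hdfs:///data/retail/txns.csv")
          .map(_.split(","))
          .map(f => Txn(f(0), f(1), f(2).toDouble))
          .toDF()

        // Register the DataFrame so it can be queried with plain SQL.
        txns.registerTempTable("txns")
        val byState = sqlContext.sql(
          "SELECT state, SUM(amount) AS total FROM txns GROUP BY state")
        byState.show()
      }
    }
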
Environment: Hadoop, Cloudera, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, HBase, Apache Spark, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, Talend, HUE, HCatalog, Flume, Solr, Git, Maven.

Confidential, Lincolnshire, IL

Hadoop/ETL Developer

Responsibilities:

  • Extracted data from flat files and other RDBMS databases into the staging area and populated the data warehouse.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Responsible for coding batch pipelines, RESTful services, MapReduce programs and Hive queries, as well as testing, debugging, peer code review, troubleshooting and maintaining status reports.
  • Implemented MapReduce programs to classify data into different categories based on record type.
  • Implemented complex MapReduce programs to perform map-side joins using the distributed cache in Java.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.
  • Performed masking on customer sensitive data using Flume interceptors.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop, and later generated visualizations using Tableau.
  • Developed MapReduce programs and added the external JARs required by them.
  • Involved in loading data from UNIX file system to HDFS .
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Worked on JVM performance tuning to improve MapReduce job performance.

Environment: Hadoop, MapReduce, HDFS, Hive, DynamoDB, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Teradata, Tomcat 6, Tableau.

Confidential

JAVA/ETL Developer

Responsibilities:

  • Developed Maven scripts to build and deploy the application.
  • Developed Spring MVC controllers for all the modules.
  • Implemented JQuery validator components.
  • Extracted data from Oracle as one of the source databases.
  • Used the DataStage ETL tool to copy data from Teradata to Netezza.
  • Created ETL data mapping spreadsheets describing column-level transformation details to load data from Teradata landing-zone tables to tables in the Party and Policy subject areas of the EDW, based on the SAS Insurance model.
  • Used JSON and XML documents with the MarkLogic NoSQL database extensively. REST API calls were made using Node.js and the Java API.
  • Built data transformation with SSIS including importing data from files.
  • Loaded the flat files data using Informatica to the staging area.
  • Created SHELL SCRIPTS for generic use.

Environment: Java, Spring, MPP, Windows XP/NT, Informatica PowerCenter 9.1/8.6, UNIX, Teradata, Oracle Designer, Autosys, Shell, Quality Center 10.

Confidential

Java Developer

Responsibilities:

  • Involved in the analysis, design, implementation, and testing of the project.
  • Implemented the presentation layer with HTML, XHTML and JavaScript.
  • Developed web components using JSP, Servlets and JDBC.
  • Implemented database using SQL Server.
  • Implemented the Spring IoC framework.
  • Developed Spring REST services for all the modules.
  • Developed custom SAML and SOAP integrations for healthcare.
  • Validated the fields of user registration screen and login screen by writing JavaScript validations.
  • Used DAO and JDBC for database access.
  • Built responsive Web pages using Kendo UI mobile.
  • Designed dynamic and multi-browser-compatible pages using HTML, CSS, jQuery, JavaScript, RequireJS and Kendo UI.

Environment: Oracle 11g, Java 1.5, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6, Java, JSP, JDBC, JavaScript, MySQL, Eclipse IDE, REST.

Confidential

Jr. Java Developer

Responsibilities:

  • Analyzed requirements and prepared the Requirement Analysis Document.
  • Deployed the application to the JBoss application server.
  • Implemented web services using the SOAP protocol with Apache Axis.
  • Gathered requirements from the various parties involved in the project.
  • Used J2EE and EJB to handle the business flow and functionality.
  • Involved in the complete SDLC of the development with full system dependencies.
  • Actively coordinated with the deployment manager for the application production launch.
  • Monitored test cases to verify actual results against expected results.
  • Carried out regression testing to track reported problems.

Environment: Java, J2EE, EJB, UNIX, XML, Workflow, JMS, JIRA, Oracle, JBoss, SOAP.
