Hadoop Spark Developer Resume

Dallas, TX

SUMMARY:

  • Overall 9+ years of professional IT experience, including 4+ years as a Big Data consultant working with Hadoop components for ingestion, data modeling, querying, processing, storage, analysis and data integration, and implementing enterprise-level Big Data systems.
  • A skilled developer with strong problem-solving, debugging and analytical capabilities who actively engages in understanding customer requirements. Results-oriented and hands-on, skillfully balancing resource and time constraints while doing it right.
  • Hands-on development and implementation experience on a Big Data Management Platform (BMP) using Hadoop 2.x, HDFS, MapReduce, YARN, Spark, Hive, Pig, Oozie, Sqoop and other Hadoop ecosystem components as data storage and retrieval systems.
  • Experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, DataFrames, Kafka, Spark Streaming and Apache Storm.
  • Strong experience in building data pipelines using Big Data technologies.
  • Excellent knowledge of Hadoop architecture and the daemons of Hadoop clusters, including the NameNode, DataNode, ResourceManager, NodeManager and JobHistory Server.
  • Hands-on experience in developing MapReduce programs and user-defined functions (UDFs) for Hive and Pig.
  • Experience working with NoSQL technologies like HBase, Cassandra and MongoDB.
  • Proficient in big data ingestion and streaming tools like Apache Flume, Sqoop, Kafka, Storm and Spark.
  • Experience importing and exporting data between HDFS and relational databases such as MySQL, Teradata, Oracle and DB2 using Sqoop.
  • Proficient at using Spark APIs to explore, cleanse, aggregate, transform and store machine sensor data (a sketch follows this summary).
  • Exposure to Apache Kafka for building data pipelines that carry logs as streams of messages between producers and consumers.
  • Design and strong programming experience as a Java developer in internet applications and client/server technologies using Java, J2EE, JDBC and web-based development tools.
  • Excellent programming skills in C, C++, HTML, CSS, WordPress, JavaScript, SQL, PL/SQL and XML Technologies.
  • Hands-on experience coding MapReduce/YARN programs in Java and Scala for analyzing Big Data.
  • Excellent at scripting for monitoring and automation using Shell and Perl scripts.
  • Extensive development experience in different IDEs such as Eclipse, NetBeans and Forte.
  • Developed Hive scripts for end user / analyst requirements for ad-hoc analysis.
  • Involved in designing a data model in Hive for migrating ETL process into Hadoop.
  • Experienced in migrating HiveQL into Impala to minimize query response time.
  • Excellent at writing Hive UDFs, including generic UDFs, to incorporate complex business logic into Hive queries for high-level data analysis.
  • Built real-time Big data solutions using HBase with billions of records.
  • Experience working on various Cloudera distributions (CDH 4/CDH 5); knowledge of working with the Hortonworks and Amazon EMR Hadoop distributions.
  • Worked on a live 55-node Hadoop cluster running CDH 4.4 and handled highly unstructured and semi-structured data of 40 TB (replicated size of 120 TB).
  • Expertise in Oozie for configuring job workflows with both time-driven and data-driven triggers.
  • Excellent shell scripting skills on Linux and UNIX.
  • Hands on experience with Agile and Scrum methodologies.
  • Exposure to web application servers such as Apache Tomcat, along with related technologies including J2EE, JDBC and ODBC.
  • Experience leading a sizeable web-based analytics project using Tableau.
  • Work successfully in fast-paced environments, both independently and in collaborative teams.
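
A minimal Java sketch of the kind of Spark API usage described in the machine-sensor-data bullet above: reading raw readings, cleansing rows with missing fields, and aggregating per device. The input path, column names (device_id, temperature) and application name are illustrative assumptions, not details from an actual engagement.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.avg;
    import static org.apache.spark.sql.functions.col;

    public class SensorAggregation {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("SensorAggregation")
                    .getOrCreate();

            // Hypothetical CSV input with device_id and temperature columns.
            Dataset<Row> raw = spark.read()
                    .option("header", "true")
                    .option("inferSchema", "true")
                    .csv("hdfs:///data/sensors/readings.csv");

            // Cleanse rows missing key fields, then aggregate per device.
            Dataset<Row> perDevice = raw
                    .na().drop(new String[]{"device_id", "temperature"})
                    .groupBy(col("device_id"))
                    .agg(avg(col("temperature")).alias("avg_temp"));

            // Store results as Parquet for downstream Hive/Impala queries.
            perDevice.write().mode("overwrite").parquet("hdfs:///data/sensors/agg");

            spark.stop();
        }
    }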

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Kafka, ZooKeeper, Oozie, Impala.

Hadoop Technologies: Apache Hadoop 1.x, Apache Hadoop 2.x, Cloudera CDH4/CDH5, Hortonworks.

Programming Languages: Java, Scala, C/C++, MATLAB, Python, Shell scripting, Pig Latin, HiveQL.

Java/J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts

Databases (RDBMS): MySQL, Teradata, DB2, Oracle.

NoSQL Databases: HBase, MongoDB, Cassandra.

Web Development: HTML, CSS, JavaScript, AJAX, WordPress, J2EE, Servlets.

Application Servers: Apache Tomcat (with J2EE, JDBC and ODBC).

Development Tools: Microsoft SQL Server Management Studio, Toad, Eclipse, NetBeans, Forte.

Development Methodologies: Agile/Scrum, Waterfall.

Cloud Computing Tools: Amazon AWS.

Business Intelligence Tools: Tableau, Talend.

Operating Systems: Windows (XP/7/8/10), macOS, UNIX, Linux (Ubuntu, CentOS).

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Hadoop Spark developer

Responsibilities:

  • Extracted and loaded data into HDFS using the Sqoop import and export command-line utilities.
  • Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Used HCatalog to access Hive table metadata from MapReduce and Pig code.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Involved in developing Hive UDFs for needed functionality (a sketch follows this list).
  • Involved in creating Hive tables, loading with data and writing Hive queries.
  • Managed work on the Apache Solr search engine, including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists, custom sorting and regionalization.
  • Integrated Apache Storm with Kafka to perform web analytics; uploaded streaming data from Kafka to HDFS, HBase and Hive through the Storm integration.
  • Performed real time analysis on the incoming data.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Used Pig for transformations, event joins, filtering bot traffic and some pre-aggregations before storing the data in HDFS.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
  • Enhanced and optimized production Spark code to aggregate, group and run data mining tasks using the Spark framework.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Experience in managing and reviewing Hadoop log files.
  • Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in exporting processed data from Hadoop to relational databases and external file systems using Sqoop.
  • Orchestrated hundreds of Sqoop scripts, Pig scripts and Hive queries using Oozie workflows and sub-workflows.
  • Loaded cache data into HBase using Sqoop.
  • Experience building custom Talend jobs to ingest, enrich and distribute data in the Cloudera Hadoop ecosystem.
  • Created numerous external Hive tables pointing to HBase tables.
  • Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
  • Worked with cache data stored in Cassandra.
  • Ingested data from external and internal flow organizations.
  • Used the external tables in Impala for data analysis.
  • Maintained MapReduce programs running on the cluster.
  • Participated in Apache Spark POCs for analyzing sales data based on several business factors; took part in daily Scrum meetings and iterative development.
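
A hedged illustration of the Hive UDF work noted above: a minimal Java UDF. The class name and the normalization rule are hypothetical placeholders, not the project's actual business logic.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical Hive UDF: trims and upper-cases a raw event code.
    // Registered in Hive with, for example:
    //   ADD JAR my-udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
    public final class NormalizeCode extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null; // pass NULLs through unchanged
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }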

Environment: Hadoop 2.x, MapReduce, HDFS, Pig, Hive, HBase, Impala, Sqoop, Flume, Oozie, Apache Spark, Java, Kafka, Storm, Linux, SQL Server, ZooKeeper, Autosys, Tableau, Cassandra.

Confidential, Long beach, CA

Big Data Hadoop Developer

Responsibilities:

  • Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
  • Loaded datasets from two different sources, Oracle and MySQL, into HDFS and Hive respectively on a daily basis.
  • Installed and configured Hive on the Hadoop cluster.
  • Worked with the HBase Java API to populate an operational HBase table with key-value data.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a sketch follows this list).
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per user needs.
  • Scheduled and managed jobs on the Hadoop cluster using Oozie workflows.
  • Experience developing multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats, including XML, JSON and CSV.
  • Imported data with Sqoop to load data from MySQL into HDFS on a regular basis.
  • Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive through the Storm integration.
  • Designed and developed Pig Latin Scripts to process data in a batch to perform trend analysis.
  • Developed HIVE scripts for analyst requirements for analysis.
  • Developed Java code to generate, compare and merge Avro schema files.
  • Developed complex MapReduce streaming jobs in Java that were implemented using Hive and Pig.
  • Optimized Map Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
  • Automated and scheduled Sqoop jobs in a timely manner using UNIX shell scripts.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig Latin scripts to study customer behavior.
  • Developed data-cleansing techniques/UDFs using Pig scripts, HiveQL and MapReduce.
  • Worked on NoSQL databases including MongoDB and HBase.
  • Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
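
A minimal sketch of a data-cleaning MapReduce job of the kind described in the data cleaning and preprocessing bullet above, written as a map-only Java job. The delimiter and expected field count are illustrative assumptions.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CleanRecordsJob {

        // Map-only cleaning step: keep rows that have the expected column count.
        public static class CleanMapper
                extends Mapper<Object, Text, Text, NullWritable> {
            private static final int EXPECTED_FIELDS = 5; // hypothetical schema width

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",", -1);
                if (fields.length == EXPECTED_FIELDS) {
                    context.write(value, NullWritable.get());
                }
                // Malformed rows are silently dropped; counters could track them.
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "clean-records");
            job.setJarByClass(CleanRecordsJob.class);
            job.setMapperClass(CleanMapper.class);
            job.setNumReduceTasks(0); // map-only job
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }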

Environment: Hadoop, HDFS, Pig, Pig Latin, Eclipse, Hive, MapReduce/YARN, Java, Avro, HBase, Sqoop, Storm, Linux, Big Data, MySQL, NoSQL, MongoDB, JSON, XML, CSV.

Confidential, Philadelphia, PA

Hadoop Developer

Responsibilities:

  • Involved in the high-level design (HLD) of the cluster, cluster setup and designing the application flow.
  • Involved in complete Big Data flow of the application starting from data ingestion from upstream to HDFS, processing the data in HDFS and analyzing the data.
  • Experience handling VSAM files on the mainframe and moving them to Hadoop using SFTP.
  • Used Flume to handle streaming data and load it into the Hadoop cluster.
  • Created shell scripts to ingest files from the edge node into HDFS.
  • Worked on creating Map Reduce scripts for processing the data.
  • Worked extensively on Hive, Sqoop, MapReduce, Shell, Pig and Python.
  • Used Sqoop to move structured data from MySQL to HDFS, Hive, Pig and HBase.
  • Experience writing Hive JOIN queries.
  • Used Pig predefined functions to convert fixed-width files to delimited files.
  • Worked with different Big Data file formats such as text, SequenceFile, Avro and Parquet, along with Snappy compression.
  • Used Java to read Avro files (a sketch follows this list).
  • Developed HiveQL scripts to perform incremental loads.
  • Used Hive JOIN queries to join multiple tables of a source system and load them into Elasticsearch tables.
  • Imported and exported Big Data in CDH to and from other data analytics ecosystems.
  • Involved in data migration from one cluster to another.
  • Analyzed the HBase database and compared it with other open-source NoSQL databases to determine which best suits the current requirements.
  • Created Hive tables and partitioned tables, using Hive indexes and buckets to ease data analytics.
  • Experience in HBase database manipulation with structured, unstructured and semi-structured data.
  • Used Oozie to schedule workflows that perform shell actions and Hive actions.
  • Experience writing the business logic for defining DAT and CSV files for MapReduce.
  • Experience managing Hadoop jobs and the logs of all scripts; experience writing aggregation logic in different combinations to perform complex data analytics for business needs.
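
A minimal Java sketch of reading an Avro data file, as the bullet above describes. The file name and the dump-to-console behavior are assumptions for illustration.

    import java.io.File;
    import java.io.IOException;

    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class AvroFileDump {
        public static void main(String[] args) throws IOException {
            File avroFile = new File("events.avro"); // hypothetical input file

            // GenericDatumReader infers the schema from the file header.
            try (DataFileReader<GenericRecord> reader =
                     new DataFileReader<>(avroFile, new GenericDatumReader<>())) {
                System.out.println("Schema: " + reader.getSchema());
                for (GenericRecord record : reader) {
                    System.out.println(record); // each record prints as JSON-like text
                }
            }
        }
    }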

Environment: Hadoop, HDFS, Hive, Pig, Sqoop, MapReduce, Cloudera, NoSQL, HBase, Python, Shell Scripting, Linux.

Confidential, NJ

Java/J2EE Developer

Responsibilities:

  • Developed complete Business tier with Session beans.
  • Designed and developed the UI using Struts 1.1 view components, JSP, HTML, CSS and JavaScript.
  • Used Web services (SOAP) for transmission of large blocks of XML data over HTTP.
  • Chart Controller web services: developed a hierarchy of controllers with generic interfaces and default implementations, exposed as RESTful web services, with chart-specific parameters passed as query strings.
  • Extensive experience working with the Spring Framework, the Struts Framework and the Hibernate O/R mapping framework.
  • Implemented database connectivity using JDBC with an Oracle 9i database as the backend (a sketch follows this list).
  • Developed the UI using HTML, CSS, XML and JavaScript, including JavaScript validations.
  • Defined and implemented web service security.
  • Modeled and automated the end-to-end Continuous Integration/Deployment/Delivery pipeline, which included building a Continuous Integration server using tools such as Jenkins, Ivy, Nexus, Maven, Jira, Subversion, Git, Ant, Selenium and Sonar.
  • The application was designed and developed using the Spring Framework; used Spring Dependency Injection to inject required services.
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Performed deployment of the application on WebLogic 6.0.
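
A hedged sketch of the JDBC connectivity described above. The connection URL, credentials, table and column names are placeholders; it assumes the Oracle JDBC driver is on the classpath (modern drivers self-register with DriverManager).

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class OrderDao {
        // Hypothetical connection details; a real application would obtain
        // these from a configured DataSource, not hard-coded literals.
        private static final String URL  = "jdbc:oracle:thin:@dbhost:1521:ORCL";
        private static final String USER = "app_user";
        private static final String PASS = "app_pass";

        public static String findCustomerName(long orderId) throws SQLException {
            String sql = "SELECT customer_name FROM orders WHERE order_id = ?";
            try (Connection conn = DriverManager.getConnection(URL, USER, PASS);
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, orderId); // bind parameter, avoiding SQL injection
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString("customer_name") : null;
                }
            }
        }
    }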

Environment: EJB 2.0, Struts 1.1, JSP 2.0, CSS, Servlets, XML, Agile, XSLT, SOAP, JDBC, JavaScript, CVS, Log4j, JUnit, JBoss 2.4.4, Eclipse 2.1.3, WebLogic 6.0, Oracle 9i.

Confidential, Boston, MA

Java Developer

Responsibilities:

  • Developed code as per user requirements.
  • Involved in working with SOAP web services.
  • Worked on the JMS publisher/subscriber model, publishing messages from CSU to public agencies and handling subscriptions (a sketch follows this list).
  • Good knowledge of the Java Sonar plugin.
  • Working knowledge of JPA integrated with Hibernate.
  • Involved in working with Spring portlets.
  • Implemented Spring Batch jobs.
  • Analyzed software development process and suggested alternative technologies.
  • Implemented new Java features in the existing modules.
  • Involved in working with JavaScript frameworks like AngularJS.
  • Performed system testing and regression testing.
  • Performed database validation using Oracle SQL Profiles.
  • Used JUnit as the testing framework and Maven for project builds.
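
A minimal JMS publisher sketch in the spirit of the publisher/subscriber bullet above. The JNDI names and topic are illustrative assumptions; the actual names depend on the JMS provider's configuration.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;
    import javax.naming.InitialContext;

    public class AgencyPublisher {
        public static void main(String[] args) throws Exception {
            // JNDI lookups; the bound names here are hypothetical.
            InitialContext ctx = new InitialContext();
            ConnectionFactory factory =
                    (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
            Topic topic = (Topic) ctx.lookup("jms/AgencyTopic");

            Connection connection = factory.createConnection();
            try {
                Session session =
                        connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(topic);

                // Publish one message to all subscribed agencies.
                TextMessage message = session.createTextMessage("status update");
                producer.send(message);
            } finally {
                connection.close(); // closes the session and producer as well
            }
        }
    }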

Environment: Java, Servlets, JSP, J2EE, Struts, XML, XSLT, JavaScript, HTML, CSS, Spring 3.2, SQL, PL/SQL, MS Visio, Eclipse, JDBC, Windows XP.
