
Big Data/Hadoop Developer Resume


Minneapolis, MN

SUMMARY

  • Over 7 years of professional IT experience, including hands-on work with Big Data ecosystem technologies.
  • 3+ years of experience in Big Data technologies.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
  • Experience using the Hortonworks and Cloudera Hadoop distributions and their components, including MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Hue, ZooKeeper and Flume.
  • Experience in reviewing Hadoop log files to detect node failures.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase and custom MapReduce programs in Java.
  • Worked with multiple input formats such as TextInputFormat, KeyValueTextInputFormat, SequenceFileInputFormat and NLineInputFormat.
  • Experience working with multiple file formats, including JSON, XML, SequenceFile and RCFile.
  • Experience importing and exporting data with Sqoop from HDFS to relational database systems and vice versa.
  • Experience using Talend Integration Suite (5.0/5.5/6.1)/Talend Open Studio (5.0/5.5/6.1).
  • Extending Hive and Pig core functionality by writing custom UDFs (a minimal sketch appears at the end of this summary).
  • Experience in scheduling recurring Hadoop jobs using Apache Oozie workflows.
  • Very good understanding of NoSQL databases such as MongoDB, HBase and Cassandra.
  • Worked on real-time, in-memory processing engines such as Spark and Impala, and on their integration with BI tools such as Tableau.
  • Good knowledge of creating event-processing data pipelines using Kafka and Spark Streaming.
  • Experienced in loading log data into HDFS by collecting and aggregating the data from various sources using Flume.
  • Experience in data management and implementation of Big Data applications using Hadoop frameworks.
  • Knowledge of the design and implementation of the Data Warehouse life cycle.
  • Knowledge of Data Warehouse/Data Mart design concepts.
  • Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing. Experience optimizing ETL workflows.
  • Experience in designing, developing and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
  • Experience in various programming languages like C, C++, Java/J2EE, Python, Scala, PL/SQL.
  • Expertise in RDBMS like Oracle, MS SQL Server, MySQL, Greenplum and DB2.
  • Experience in UNIX and shell scripting.
  • Experience in developing and applying machine learning algorithms to Big Data.
  • Experience in Agile Engineering practices.
  • Good knowledge of GitHub and Jenkins for automated deployments.
  • Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, preparing estimates, designing custom solutions, development, leading developers, producing documentation, and production support.
  • Excellent interpersonal and communication skills; creative, research-minded, technically competent and results-oriented, with strong problem-solving and leadership skills.
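
A minimal sketch of the kind of custom Hive UDF mentioned above, written in Java against the org.apache.hadoop.hive.ql.exec.UDF API. The class name and cleansing rule are illustrative assumptions, not code from any of the projects below.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Illustrative Hive UDF: normalizes a free-text code so records can be
    // joined and grouped consistently in HiveQL.
    public final class NormalizeCode extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                       // let Hive pass NULLs through unchanged
            }
            String cleaned = input.toString().trim().toUpperCase().replaceAll("[^A-Z0-9]", "");
            return new Text(cleaned);
        }
    }

Such a UDF would typically be packaged into a JAR, registered with ADD JAR and CREATE TEMPORARY FUNCTION, and then called like any built-in function from HiveQL.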

TECHNICAL SKILLS

Languages: C, C++, Java/J2EE, Python, Scala, PL/SQL, Bash.

Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, YARN, Spark.

Data Stacks: Apache Spark, Apache Hadoop, Oracle, MySQL, MS SQL Server, Greenplum.

NoSQL Databases: HBase, MongoDB, Cassandra.

Java/J2EE & Web Technologies: JavaScript, JSF, Ajax, JSP, Servlets, JavaBeans, JDBC, EJB, JMS, HTML, XML, CSS.

OS: MS-Windows XP/7, Linux, Unix, Mac OS X.

IDEs & Tools: Eclipse, Sublime Text, Notepad++, Visual Studio, PuTTY.

PROFESSIONAL EXPERIENCE

Confidential, Minneapolis, MN

Big Data/Hadoop Developer

Responsibilities:

  • Developed Hive queries on clickstream data to analyze Confidential user behavior across various online modules.
  • Implemented partitioning and bucketing in Hive and optimized the Hive queries.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Extensively used Pig for data cleansing.
  • Developed MapReduce programs for refined queries on big data.
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
  • Worked with various HDFS file formats like Avro, SequenceFile, text file and various compression formats like Snappy, bzip2.
  • Loaded data into HDFS and extracted the data from Teradata into HDFS using Sqoop.
  • Developed Sqoop scripts to import and export the data from relational sources and handled incremental loading on the customer data by date.
  • Developed streaming Map/Reduce-style jobs using Spark.
  • Used Spark for logistic regression, linear regression and various other machine learning algorithms.
  • Developed Spark SQL scripts to perform analysis on data from third-party vendors (see the sketch following this list).
  • Experience in the field of Enterprise Data Warehousing (EDW) and Data Integration.
  • Developed a GraphX solution using Spark to inter-relate users based on their behavior and different IDs.
  • Developed a data pipeline using Kafka to store data into HDFS.
  • Exported data from Kafka topics to HDFS files in a variety of formats and integrated with Hive using the HDFS connector, making data immediately available for querying with HiveQL.
  • Used Oozie to automate/schedule business workflows which invoke HiveQL, Sqoop, MapReduce and Pig jobs as per the requirements.
  • Experienced in building Talend jobs outside of Talend Studio as well as on the TAC server.
  • Developed simple to complex MapReduce jobs using SQL.
  • Mentored the analyst and test teams in writing Hive queries.
  • Experience in reviewing Hadoop log files to detect failures.
  • Loaded data into the cluster from dynamically generated files using Flume.
  • Developed reports on various Hive tables by connecting Tableau Server to Hadoop for data analytics.
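
A minimal sketch of the kind of Spark SQL analysis over third-party vendor data described above, using the Spark Java API. The HDFS path, view name, columns and output table are hypothetical placeholders.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // Sketch of a Spark SQL job that summarizes a third-party vendor feed.
    public class VendorFeedAnalysis {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("VendorFeedAnalysis")
                    .enableHiveSupport()                       // allows saving results as Hive tables
                    .getOrCreate();

            // Hypothetical JSON feed landed on HDFS by the Kafka-to-HDFS pipeline.
            Dataset<Row> feed = spark.read().json("hdfs:///data/vendor/feed/");
            feed.createOrReplaceTempView("vendor_feed");

            // Daily event counts per vendor, later consumed from Tableau via Hive.
            Dataset<Row> daily = spark.sql(
                    "SELECT vendor_id, to_date(event_ts) AS dt, COUNT(*) AS events "
                  + "FROM vendor_feed GROUP BY vendor_id, to_date(event_ts)");

            daily.write().mode("overwrite").saveAsTable("analytics.vendor_daily_counts");
            spark.stop();
        }
    }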

Environment: Hortonworks Hadoop, MapReduce, HDFS, Hive, Java, Pig, Linux, HBase, ZooKeeper, Sqoop, Flume, Oozie, Kafka, Talend, Tableau, Spark, Scala, PL/SQL.

Confidential, Durham, NC

Hadoop Developer

Responsibilities:

  • Handled importing of data from various data sources and performed data transformations using HAWQ and MapReduce.
  • Involved in creating Hive internal and external tables, loading data and writing Hive queries, which run internally as MapReduce jobs.
  • Implemented complex MapReduce programs in Java to perform map-side joins using the Distributed Cache (a minimal sketch follows this list).
  • Designed and implemented custom keys, values, Partitioners, Combiners, InputFormats and RecordReaders in Java.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Implemented UDFs, UDAFs and UDTFs in Java for Hive to process data in ways that cannot be handled with Hive's built-in functions.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Worked with complex data types (array, map and struct) in Hive.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
  • Analyzed JSON and XML files using Hive built-in functions and SerDes.
  • Transformed the log files into structured data using Hive SerDes and Pig loaders.
  • Parsed JSON and XML files in Pig using Pig loader functions and extracted meaningful information from Pig relations by applying regexes with Pig's built-in functions.
  • Extensively used Pig for data cleansing.
  • Exported the analyzed data to the relational databases using Sqoop and generated reports for the BI team.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
  • Deployed and configured Flume agents to stream log events into HDFS for analysis.
  • Familiar with using the NoSQL database HBase on top of HDFS.
  • Loaded and transformed large sets of structured and semi-structured data using Hive and Impala.
  • Connected Hive and Impala to Tableau reporting tool and generated graphical reports.
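
A minimal sketch of the map-side join with the Distributed Cache referenced above, written with the org.apache.hadoop.mapreduce API. The file layout (a small comma-delimited customer reference file joined to large order records) is an illustrative assumption.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-side join sketch: a small reference table distributed via the cache
    // is loaded into memory in setup() and joined against each input record.
    public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> customerById = new HashMap<String, String>();

        @Override
        protected void setup(Context context) throws IOException {
            // Files registered with job.addCacheFile(...) are localized (and by
            // default symlinked) into the task's working directory.
            URI[] cacheFiles = context.getCacheFiles();
            String localName = new Path(cacheFiles[0].getPath()).getName();
            BufferedReader reader = new BufferedReader(new FileReader(localName));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",", 2);         // customer_id,customer_name
                    customerById.put(parts[0], parts[1]);
                }
            } finally {
                reader.close();
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", 2);    // customer_id,order details
            String name = customerById.get(fields[0]);
            if (name != null) {                                   // inner join: skip unmatched orders
                context.write(new Text(fields[0]), new Text(name + "," + fields[1]));
            }
        }
    }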

Environment: Pivotal HD, MapReduce, EDW, HDFS, Hive, Java, Pig, Linux, XML, JSON, HBase, ZooKeeper, Sqoop, Flume, Oozie, Impala, Tableau, MySQL, PuTTY.

Confidential, Patskala, OH

Hadoop Developer

Responsibilities:

  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing (a minimal sketch follows this list).
  • Developed efficient MapReduce programs for filtering out the unstructured data.
  • Experience loading and transforming large sets of structured, semi-structured and unstructured data.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Developed unit test cases for mapper, reducer and driver classes.
  • Developed Hive queries for data sampling and analysis for the analysts.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Involved in developing Pig scripts.
  • Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data on HDFS.
  • Experience in migrating the data warehouse from Oracle to Teradata.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Involved in moving all log files generated from various sources into Hadoop HDFS using Flume for further processing.
  • Good knowledge of analyzing data in HBase using Hive and Pig. Experienced in defining job flows using Oozie.
  • Used Agile/Scrum method for requirements gathering.
  • Developed Java MapReduce programs using Mahout to apply to different datasets.
  • Used Maven extensively to build JAR files of MapReduce programs and deploy them to the cluster.
  • Identified several PL/SQL batch applications in General Ledger processing and conducted performance comparison to demonstrate the benefits of migrating to Hadoop.
  • Experienced in managing and reviewing Hadoop log files.
  • Configured Sentry to secure access to purchase information stored in Hadoop.
  • Involved in several POCs for different LOBs to benchmark the performance of data-mining using Hadoop.
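
A minimal sketch of the kind of data-cleansing MapReduce job described above. The delimiter, expected field count and counter names are illustrative assumptions rather than details from the actual project.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Data-cleansing mapper sketch: passes well-formed, pipe-delimited records
    // through unchanged and counts everything it rejects.
    public class CleanRecordsMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 8;           // assumed schema width

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String record = line.toString().trim();
            if (record.isEmpty()) {
                return;                                           // drop blank lines
            }
            String[] fields = record.split("\\|", -1);
            if (fields.length == EXPECTED_FIELDS) {
                context.write(NullWritable.get(), line);          // keep the clean record
            } else {
                context.getCounter("cleansing", "malformed").increment(1);
            }
        }
    }

A mapper like this is straightforward to unit test by asserting on its output records and counters, which matches the mapper/reducer/driver test cases mentioned above.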

Environment: Cloudera Hadoop, MS SQL Server, Oracle, Hadoop CDH 3/4/5, Pig, Hive, ZooKeeper, Mahout, HDFS, HBase, Sqoop, Java, Oozie, Hue, Tez, UNIX Shell Scripting, PL/SQL, Maven, Ant.

Confidential, Raleigh, NC

Application Developer J2EE

Responsibilities:

  • Developed JavaScript behavior code for user interaction.
  • Created database programs in SQL Server to manipulate data accumulated from internet transactions.
  • Wrote servlet classes to generate dynamic HTML pages (a minimal sketch follows this list).
  • Developed servlets and back-end Java classes using the WebSphere application server.
  • Developed an API to write XML documents from a database.
  • Performed usability testing for the application using JUnit Test.
  • Maintained a Java GUI application using JFC/Swing.
  • Created complex SQL and used JDBC connectivity to access the database.
  • Involved in the design and coding of the data capture templates, presentation and component templates.
  • Part of the team that designed, customized and implemented metadata search and database synchronization.
  • Used Oracle as the database and Toad for query execution; wrote SQL scripts and PL/SQL code for procedures and functions.
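
A minimal sketch of a servlet rendering a dynamic HTML page from a JDBC query, as described above. The JDBC URL, credentials and ORDERS table are hypothetical placeholders, and error handling is simplified.

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Servlet sketch: queries recent orders over JDBC and writes them as HTML.
    public class OrderListServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws IOException {
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();
            out.println("<html><body><h1>Recent Orders</h1><ul>");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:oracle:thin:@dbhost:1521:APPDB", "appuser", "secret");  // placeholder
                 PreparedStatement stmt = conn.prepareStatement(
                         "SELECT order_id, status FROM orders WHERE ROWNUM <= 20");
                 ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    out.println("<li>" + rs.getInt("order_id") + " - " + rs.getString("status") + "</li>");
                }
            } catch (SQLException e) {
                out.println("<li>Unable to load orders</li>");    // simplified error handling
            }
            out.println("</ul></body></html>");
        }
    }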

Environment: Java, WebSphere 3.5, EJB, Servlets, JavaScript, JDBC, SQL, JUnit, Eclipse IDE, Apache Tomcat 6.

Confidential

JAVA Developer

Responsibilities:

  • Actively involved in the analysis, design, implementation and deployment phases of the full Software Development Life Cycle (SDLC) of the project.
  • Designed and developed user interface using JSP, HTML and JavaScript.
  • Developed Struts action classes, action forms and performed action mapping using Struts framework and performed data validation in form beans and action classes.
  • Extensively used Struts framework as the controller to handle subsequent client requests and invoke the model based upon user requests.
  • Defined search criteria and pulled customer records from the database, made the required changes and saved the updated records back to the database.
  • Validated the fields of user registration screen and login screen by writing JavaScript validations.
  • Developed build and deployment scripts using Apache ANT to customize WAR and EAR files.
  • Used the DAO pattern and JDBC for database access (a minimal sketch follows this list).
  • Developed stored procedures and triggers using PL/SQL in order to calculate and update the tables to implement business logic.
  • Designed and developed XML processing components for dynamic menus in the application.
  • Involved in post-production support and maintenance of the application.
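
A minimal sketch of the DAO-over-JDBC pattern and stored-procedure usage described above. The DataSource wiring, procedure name and parameter are hypothetical placeholders.

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.SQLException;

    import javax.sql.DataSource;

    // DAO sketch: hides JDBC plumbing behind a simple method and delegates the
    // business calculation to an assumed PL/SQL stored procedure.
    public class CustomerDao {
        private final DataSource dataSource;

        public CustomerDao(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        // Recalculates a customer's balance via a hypothetical PL/SQL procedure.
        public void recalculateBalance(long customerId) throws SQLException {
            Connection conn = dataSource.getConnection();
            try {
                CallableStatement call = conn.prepareCall("{ call recalc_customer_balance(?) }");
                try {
                    call.setLong(1, customerId);
                    call.execute();
                } finally {
                    call.close();
                }
            } finally {
                conn.close();
            }
        }
    }

Action classes in the Struts layer would obtain a DAO like this and call it from execute(), keeping SQL and transaction details out of the web tier.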

Environment: Oracle 11g, Java 1.5, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6.
