Big Data/Hadoop Developer Resume
Minneapolis, MN
SUMMARY
- Over 7 years of professional IT experience, including work with Big Data ecosystem technologies.
- More than 3 years of experience in Big Data technologies.
- In-depth understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
- Experience using the Hortonworks and Cloudera Hadoop distributions and their components, including MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Hue, ZooKeeper and Flume.
- Experience in reviewing Hadoop log files to detect node failures.
- Experience in analyzing data using HiveQL, Pig Latin, HBase and custom MapReduce programs in Java.
- Worked with multiple input formats such as TextInputFormat, KeyValueTextInputFormat, SequenceFileInputFormat and NLineInputFormat.
- Experience working with multiple file formats, including JSON, XML, SequenceFile and RCFile.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience using Talend Integration Suite (5.0/5.5/6.1)/Talend Open Studio (5.0/5.5/6.1).
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experience in scheduling recurring Hadoop jobs using Apache Oozie workflows.
- Very good understanding of NoSQL databases such as MongoDB, HBase and Cassandra.
- Worked on real-time, in-memory processing engines such as Spark and Impala, and on their integration with BI tools such as Tableau.
- Good knowledge of creating event-processing data pipelines using Kafka and Spark Streaming.
- Experienced in loading log data into HDFS by collecting and aggregating the data from various sources using Flume.
- Experience in data management and implementation of Big Data applications using Hadoop frameworks.
- Knowledge of the design and implementation of the Data Warehouse life cycle.
- Knowledge of Data Warehouse/Data Mart design concepts.
- Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing. Experience optimizing ETL workflows.
- Experience in designing, developing and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
- Experience in various programming languages like C, C++, Java/J2EE, Python, Scala, PL/SQL.
- Expertise in RDBMSs such as Oracle, MS SQL Server, MySQL, Greenplum and DB2.
- Experience in UNIX and shell scripting.
- Experience in developing and applying machine learning algorithms to Big Data.
- Experience in Agile Engineering practices.
- Good knowledge of GitHub and Jenkins for automated deployments.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimates, designing custom solutions, development, leading developers, producing documentation, and production support.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
TECHNICAL SKILLS
Languages: C, C++, Java/J2EE, Python, Scala, PL/SQL, Bash.
Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, YARN, Spark.
Data Stacks: Apache Spark, Apache Hadoop, Oracle, MySQL, MS SQL Server, Greenplum.
NoSQL Databases: HBase, MongoDB, Cassandra.
Java/J2EE & Web Technologies: JavaScript, JSF, Ajax, JSP, Servlets, Java Beans, JDBC, EJB, JMS, HTML, XML, CSS.
OS: MS-Windows XP/7, Linux, Unix, Mac OS X.
IDEs & Tools: Eclipse, Sublime Text, Notepad++, Visual Studio, PuTTY.
PROFESSIONAL EXPERIENCE
Confidential, Minneapolis, MN
Big Data/Hadoop Developer
Responsibilities:
- Developed Hive queries on clickstream data to analyze Confidential user behavior across various online modules.
- Implemented partitioning and bucketing in Hive and optimized Hive queries.
- Developed Pig UDFs to pre-process the data for analysis.
- Extensively used Pig for data cleansing.
- Developed MapReduce programs for refined queries on big data.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Worked with HDFS file formats such as Avro, SequenceFile and text files, and compression codecs such as Snappy and bzip2.
- Loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loads of the customer data by date.
- Developed Hadoop streaming MapReduce jobs using Spark.
- Used Spark for logistic regression, linear regression and other machine learning algorithms (a minimal Java sketch follows this section).
- Developed Spark SQL scripts to perform analysis on the data from third party vendors.
- Experience in the field of Enterprise Data Warehousing (EDW) and Data Integration.
- Developed a GraphX solution using Spark to inter-relate several users based on their behavior and different IDs.
- Developed a data pipeline using Kafka to store data in HDFS (a producer sketch follows this section).
- Exported data from Kafka topics to HDFS files in a variety of formats and integrated with Hive, using the HDFS connector, to make data immediately available for querying with HiveQL.
- Used Oozie to automate/schedule business workflows which invoke HiveQL, Sqoop, MapReduce and Pig jobs as per the requirements.
- Experienced in building Talend jobs outside of Talend Studio as well as on the TAC server.
- Developed simple to complex MapReduce jobs using SQL.
- Mentored analyst and test teams in writing Hive queries.
- Experience in reviewing Hadoop log files to detect failures.
- Loaded data into the cluster from dynamically generated files using Flume.
- Developed reports on various Hive tables by connecting Tableau Server to Hadoop for data analytics.
Environment: Hortonworks Hadoop, MapReduce, HDFS, Hive, Java, Pig, Linux, HBase, ZooKeeper, Sqoop, Flume, Oozie, Kafka, Talend, Tableau, Spark, Scala, PL/SQL.
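Illustrative Java sketch of the Spark logistic regression work above — a minimal example assuming pre-processed features in LIBSVM format on HDFS; the path and parameters are hypothetical, not the production job.

```java
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LogisticRegressionSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("UserBehaviorLogisticRegression")
                .getOrCreate();

        // Hypothetical path: pre-processed features in LIBSVM format on HDFS.
        Dataset<Row> training = spark.read().format("libsvm")
                .load("hdfs:///data/clickstream/features.libsvm");

        // Train a regularized logistic regression model.
        LogisticRegression lr = new LogisticRegression()
                .setMaxIter(10)
                .setRegParam(0.01);
        LogisticRegressionModel model = lr.fit(training);

        System.out.println("Coefficients: " + model.coefficients()
                + " Intercept: " + model.intercept());
        spark.stop();
    }
}
```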
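Illustrative sketch of the producer side of the Kafka-to-HDFS pipeline above; the broker address, topic name and payload are assumptions, and persistence to HDFS is handled downstream by the HDFS connector.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickstreamProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker address.
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each record is a JSON clickstream event; the HDFS sink connector
            // subscribed to this topic writes the events to HDFS for Hive queries.
            producer.send(new ProducerRecord<>("clickstream-events",
                    "user-123", "{\"page\":\"home\",\"action\":\"view\"}"));
        }
    }
}
```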
Confidential, Durham, NC
Hadoop Developer
Responsibilities:
- Handled importing data from various data sources and performed data transformations using HAWQ and MapReduce.
- Involved in creating Hive internal and external tables, loading data and writing Hive queries that run internally as MapReduce jobs.
- Implemented complex MapReduce programs in Java to perform map-side joins using the distributed cache.
- Designed and implemented custom keys, values, partitioners, combiners, InputFormats and RecordReaders in Java.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Implemented UDFs, UDAFs and UDTFs in Java for Hive to process data in ways not possible with Hive's built-in functions (a minimal UDF sketch follows this section).
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Worked on the complex Hive data types array, map and struct.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Analyzed JSON and XML files using Hive built-in functions and SerDes.
- Transformed the log files into structured data using Hive SerDes and Pig loaders.
- Parsed JSON and XML files in Pig using Pig Loader functions and extracted meaningful information from Pig relations by providing a regex to Pig's built-in functions.
- Extensively used Pig for data cleansing.
- Exported the analyzed data to the relational databases using Sqoop and generated reports for the BI team.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Deployed and configured Flume agents to stream log events into HDFS for analysis.
- Familiarity with using the NoSQL database HBase on top of HDFS.
- Loaded and transformed large sets of structured and semi-structured data using Hive and Impala.
- Connected Hive and Impala to Tableau reporting tool and generated graphical reports.
Environment: Pivotal HD, MapReduce, EDW, HDFS, Hive, Java, Pig, Linux, XML, JSON, HBase, ZooKeeper, Sqoop, Flume, Oozie, Impala, Tableau, MySQL, PuTTY.
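Illustrative sketch of a Hive UDF of the kind described above; the class name and normalization logic are hypothetical, not a specific production UDF.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Registered in Hive with: ADD JAR ...; CREATE TEMPORARY FUNCTION normalize_id AS '...';
public class NormalizeIdUDF extends UDF {
    private final Text result = new Text();

    // Trims whitespace and lower-cases an identifier; returns null for null input.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        result.set(input.toString().trim().toLowerCase());
        return result;
    }
}
```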
Confidential, Patskala, OH
Hadoop Developer
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing (a cleaning-mapper sketch follows this section).
- Developed efficient MapReduce programs for filtering out the unstructured data.
- Experience loading and transforming large sets of structured, semi-structured and unstructured data.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Developed unit test cases for mapper, reducer and driver classes.
- Developed Hive queries for data sampling and analysis for the analysts.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Involved in developing Pig scripts.
- Used Pig as an ETL tool to perform transformations, joins and some pre-aggregations before storing the data in HDFS.
- Experience in migrating the data warehouse from Oracle to Teradata.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Involved in moving all log files generated from various sources into Hadoop HDFS using Flume for further processing.
- Good knowledge of analyzing data in HBase using Hive and Pig. Experienced in defining job flows using Oozie.
- Used Agile/Scrum method for requirements gathering.
- Developed Java MapReduce programs using Mahout to apply to different datasets.
- Extensive use of Maven to build JAR files for MapReduce programs and deploy them to the cluster.
- Identified several PL/SQL batch applications in General Ledger processing and conducted performance comparison to demonstrate the benefits of migrating to Hadoop.
- Experienced in managing and reviewing Hadoop log files.
- Configured Sentry to secure access to purchase information stored in Hadoop.
- Involved in several POCs for different LOBs to benchmark the performance of data-mining using Hadoop.
Environment: Cloudera Hadoop, MS SQL Server, Oracle, Hadoop CDH 3/4/5, Pig, Hive, ZooKeeper, Mahout, HDFS, HBase, Sqoop, Java, Oozie, Hue, Tez, UNIX Shell Scripting, PL/SQL, Maven, Ant.
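Illustrative sketch of the data-cleaning MapReduce pattern above — a map-only step that drops malformed records; the delimiter and expected field count are assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleaning step: drops malformed records and emits the rest unchanged.
public class CleanRecordsMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    private static final int EXPECTED_FIELDS = 12; // assumed record width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", -1);
        // Keep only records with the expected number of fields and a non-empty key field.
        if (fields.length == EXPECTED_FIELDS && !fields[0].isEmpty()) {
            context.write(value, NullWritable.get());
        } else {
            context.getCounter("CLEANING", "MALFORMED_RECORDS").increment(1);
        }
    }
}
```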
Confidential, Raleigh, NC
Application Developer J2EE
Responsibilities:
- Developed JavaScript behavior code for user interaction.
- Created a database program in SQL Server to manipulate data accumulated by internet transactions.
- Wrote servlet classes to generate dynamic HTML pages (a minimal servlet sketch follows this section).
- Developed servlets and back-end Java classes using WebSphere Application Server.
- Developed an API to write XML documents from a database.
- Performed usability testing for the application using JUnit Test.
- Maintained a Java GUI application using JFC/Swing.
- Created complex SQL and used JDBC connectivity to access the database.
- Involved in the design and coding of the data capture templates, presentation and component templates.
- Part of the team that designed, customized and implemented metadata search and database synchronization.
- Used Oracle as the database and Toad for query execution; involved in writing SQL scripts and PL/SQL code for procedures and functions.
Environment: Java, WebSphere 3.5, EJB, Servlets, JavaScript, JDBC, SQL, JUnit, Eclipse IDE, Apache Tomcat 6.
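Illustrative sketch of a servlet generating a dynamic HTML page, as referenced above; the servlet name and request parameter are hypothetical.

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal servlet of the kind used to render dynamic HTML pages.
public class GreetingServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String user = request.getParameter("user"); // hypothetical query parameter
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>");
        out.println("<h1>Welcome, " + (user != null ? user : "guest") + "</h1>");
        out.println("</body></html>");
    }
}
```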
Confidential
JAVA Developer
Responsibilities:
- Responsible and active in the analysis, design, implementation and deployment phases across the full Software Development Lifecycle (SDLC) of the project.
- Designed and developed user interface using JSP, HTML and JavaScript.
- Developed Struts action classes, action forms and performed action mapping using Struts framework and performed data validation in form beans and action classes.
- Extensively used Struts framework as the controller to handle subsequent client requests and invoke the model based upon user requests.
- Defined the search criteria and pulled the customer's record from the database, made the required changes and saved the updated record back to the database.
- Validated the fields of user registration screen and login screen by writing JavaScript validations.
- Developed build and deployment scripts using Apache ANT to customize WAR and EAR files.
- Used the DAO pattern and JDBC for database access (a minimal DAO sketch follows this section).
- Developed stored procedures and triggers using PL/SQL in order to calculate and update the tables to implement business logic.
- Designed and developed XML processing components for dynamic menus in the application.
- Involved in postproduction support and maintenance of the application.
Environment: Oracle 11g, Java 1.5, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6.
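Illustrative sketch of the DAO/JDBC access pattern referenced above, written in modern Java syntax for brevity; the table, column and connection details are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Simple DAO that looks up a customer name by id over plain JDBC.
public class CustomerDao {
    private final String url;      // e.g. a JDBC URL for the Oracle database (assumed)
    private final String user;
    private final String password;

    public CustomerDao(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    public String findCustomerName(long customerId) throws SQLException {
        String sql = "SELECT name FROM customers WHERE id = ?"; // hypothetical table
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setLong(1, customerId);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}
```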