Sr. Hadoop Developer Resume
Milwaukee, WI
PROFESSIONAL SUMMARY:
- Around 8 years of IT experience in all phases of the software development life cycle (SDLC) using HDFS, HBase, Spark, Scala, Hive, YARN, and Cloudera.
- Improved performance of an Actor-based message processing system implemented with Akka, Scala, Java, and MongoDB.
- Mentored off-shore staff on Akka, Scala, and MongoDB.
- Expertise in Java/J2EE and Oracle technology products such as Oracle FMW (SOA 11g, SOA 12c) and Oracle ERP solutions.
- Exposed to the Agile (Scrum) method of software development.
- In-depth knowledge of Hadoop ecosystem components such as Pig, YARN, Hive, Sqoop, HBase, Oozie, ZooKeeper, Hue, Cloudera Manager, and Flume.
- Hands-on experience using Sqoop to import and export data between HDFS and relational database systems.
- Hands-on knowledge of the Hadoop ecosystem and its components such as MapReduce and HDFS.
- Extensive experience in building composite applications involving BPEL, Mediator, Human Workflow, and Business Rules components.
- Experience working with Impala.
- Experience working with the Cloudera distribution.
- Knowledge of Spark and Scala.
- ETL/data warehouse experience with SQL Server/T-SQL.
- Designed ETL specification documents to gather existing workflow information from different ETL teams and shared them with the integration and production maintenance teams.
- Designed the ETL run performance tracking sheet for different phases of the project and shared it with the production team.
- Strong background in object-oriented development, including Perl, C++, Java, Scala, and shell scripting.
- Generated Scala and Java classes from the respective APIs so that they could be incorporated into the overall application.
- Good understanding of the NoSQL databases HBase and MongoDB.
- Experience in configuring Flume agents to transfer data from external systems into HDFS.
- Experience in extending Hive and Pig core functionality by writing custom UDFs (see the UDF sketch at the end of this summary).
- Hands-on experience analyzing data using Hive Query Language, Pig, and MapReduce.
- Knowledge of workflow scheduling and job monitoring using Oozie.
- Extensive experience in designing SOAP and REST web services
- Extensive experience in Web Service Orchestration using Oracle BPEL.
- Experience in UNIX shell scripting
- Experience in Oracle Application Modules.
- Development experience with Oracle 10g and 11g databases.
- Able to work effectively at all organizational levels and to manage rapid change.
- Very good understanding and working knowledge of object-oriented programming (OOP).
- Knowledge of various storage technologies: RAID, cloud computing, flash storage, E-Series systems, flash arrays, and NAS/SAN storage systems.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Quick to adapt to new software applications and products; a self-starter with excellent communication skills and a good understanding of business workflow.
- Working knowledge of SQL, stored procedures, functions, packages, database triggers, and indexes.
- Hands on Experience in using IDE tools like Eclipse.
- Experience installing and configuring Windows Active Directory.
- Well versed in designing and implementing MapReduce jobs in Java using Eclipse to solve real-world scaling problems.
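A minimal sketch, in Scala, of the kind of custom Hive UDF referenced in this summary; the class name, function name, and behavior are hypothetical and assume the standard hive-exec library on the classpath:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF that normalizes a string column to upper case.
// After packaging into a JAR and running ADD JAR in Hive, it could be
// registered with:
//   CREATE TEMPORARY FUNCTION to_upper AS 'com.example.udf.ToUpperUDF';
class ToUpperUDF extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null else new Text(input.toString.toUpperCase)
}
```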
TECHNICAL SKILLS:
Languages: C, C++, Java, Python, Shell Scripting
Big Data Technologies: Cloudera, MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Spark, Kafka, Storm, Scala, Impala
Databases: Oracle, MySQL, SQL Server (including Microsoft SQL Server 2000), MS Access
DB Languages: SQL
Operating Systems: Linux, Windows XP, Server 2003, Server 2008
Development Tools: Eclipse 3.3
Other Programming Languages: HTML5, CSS3, JavaScript, AJAX, jQuery, .NET, Visual Studio 2010
Network protocols: TCP/IP, UDP, HTTP, DNS, DHCP, OSPF, RIP
Frameworks: MVC
PROFESSIONAL EXPERIENCE:
Confidential, Milwaukee, WI
Sr. Hadoop Developer
Responsibilities:
- Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Wrote MapReduce jobs using Scala.
- Responsible for building scalable distributed data solutions using Hadoop; used Kafka to ingest data into the Spark engine.
- Developed analytical components using Scala, Spark, Storm, and Spark Streaming.
- Developed Spark code using Scala and Spark SQL for faster testing and processing of data (see the Spark SQL sketch after this list).
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Involved in loading data from the edge node to HDFS using shell scripting.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
- Created HBase tables to store data in variable formats coming from different portfolios.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Set up Jenkins build jobs to provide continuous automated builds based on polling the Subversion source control system during the day and on periodic schedules.
- Configured build scripts for multi-module projects with Maven and Jenkins CI.
- Deployed an Apache Solr search engine server to speed up search.
- Implemented authentication using Apache Sentry.
- Used Sqoop jobs to import data from the RDBMS using incremental import.
- Customized Avro tools used in MapReduce, Pig and Hive for deserialization and to work with Avro ingestion framework.
- Worked with different compression techniques such as LZO and Snappy to save storage and optimize data transfer over the network.
- Analyzed large, critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, ZooKeeper, and Spark.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked with the NoSQL database HBase to create tables and store data.
- Developed custom aggregate functions using Spark SQL and performed interactive querying (see the aggregate-function sketch after this list).
- Used Pig to store the data into HBase.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
- Used Pig to parse the data and store it in Avro format.
- Performed ETL and data warehouse development using SQL Server/T-SQL.
- Stored the data in tabular formats using Hive tables and Hive SerDes.
- Designed ETL specification documents to gather existing workflow information from different ETL teams and shared them with the integration and production maintenance teams.
- Designed the ETL run performance tracking sheet for different phases of the project and shared it with the production team.
- Prepared validation report queries, executed them after every ETL run, and shared the results with business users in different phases of the project.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Involved in writing shell scripts to export log files to the Hadoop cluster through an automated process.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, Avro data files, and sequence files for log files.
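A minimal sketch, in Scala, of the Spark SQL pattern described above for analyzing Hive data on YARN (shown with the Spark 2.x SparkSession API); the table and column names are hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession

object HiveAnalytics {
  def main(args: Array[String]): Unit = {
    // Assumes the job is submitted with --master yarn and that the Hive
    // configuration (hive-site.xml) is available on the classpath.
    val spark = SparkSession.builder()
      .appName("HiveAnalytics")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table and columns, used purely for illustration.
    val totals = spark.sql(
      """SELECT region, SUM(amount) AS total_amount
        |FROM sales
        |GROUP BY region""".stripMargin)

    totals.show()
    spark.stop()
  }
}
```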
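A sketch of a custom aggregate function of the kind mentioned above, using Spark's UserDefinedAggregateFunction API (Spark 2.x); it computes a simple average over a hypothetical column and would be registered against the same SparkSession as the previous sketch:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Hypothetical custom aggregate: average of a Double column.
class AverageUDAF extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  override def bufferSchema: StructType =
    StructType(StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
  override def dataType: DataType = DoubleType
  override def deterministic: Boolean = true

  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0.0
    buffer(1) = 0L
  }

  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getDouble(0) + input.getDouble(0)
      buffer(1) = buffer.getLong(1) + 1L
    }
  }

  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }

  override def evaluate(buffer: Row): Any =
    if (buffer.getLong(1) == 0L) 0.0 else buffer.getDouble(0) / buffer.getLong(1)
}

// Usage (hypothetical function and table names):
//   spark.udf.register("my_avg", new AverageUDAF)
//   spark.sql("SELECT region, my_avg(amount) FROM sales GROUP BY region").show()
```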
Environment: Hadoop, HDFS, Pig, Sqoop, Spark, MapReduce, Scala, Cloudera, Solr, Apache Sentry, Snappy, ZooKeeper, NoSQL, HBase, Akka, Shell Scripting, Ubuntu Linux.
Confidential, Richfield, MN
Sr. Hadoop Developer
Responsibilities:
- Moved data from Oracle to HDFS and vice versa using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked extensively on the Spark SQL module of Spark.
- Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming (see the streaming sketch after this list).
- Worked with different file formats and compression techniques to determine standards
- Developed Hive queries to analyze/transform the data in HDFS.
- Designed and implemented multi-level partitioning and bucketing in Hive.
- Analyzed and transformed data with Hive and Pig.
- Set up Cassandra with Apache Solr.
- Implemented authentication using Apache Sentry.
- Coordinated effectively with the offshore team and managed project deliverables on time.
- Worked on QA support activities, test data creation and Unit testing activities.
- Responsible for creating Hive tables, loading the structured data resulted from MapReduce jobs into the tables and writing hive queries to further analyze the logs to identify issues and behavioural patterns.
- Worked with Avro Data Serialization system to work with JSON data formats.
- Extensively involved in ETL development using Informatica to meet requirements for extraction, cleansing, transformation, and loading of data from source to target data structures.
- Worked on various issues in existing Informatica mappings to produce correct output and used the ETL debugger extensively to identify performance bottlenecks within the mappings.
- Specialized in creating ETL mappings, sessions, and workflows using various transformations such as Update, Lookup, Filter, Router, and XML.
- Involved in running MapReduce jobs for processing millions of records.
- Scheduled all ETL workflows for parallel-run comparison.
- Extensively involved in ETL testing; created unit and integration test plans to test the mappings.
- Loaded the aggregated data onto Oracle from Hadoop environment using Sqoop for reporting on the dashboard.
- Experience in installing, configuring, and using Hadoop components like MapReduce, YARN, HDFS, HBase, Hive, Pig, Flume, Sqoop, and Cassandra.
- Experience in installing, configuring, and maintaining the Hadoop cluster, including YARN configuration, using Cloudera and Hortonworks.
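A minimal Scala sketch of a streaming analytical component like the one mentioned above; the source host, port, and log format are hypothetical, and in practice the master (for example Mesos) would be set by spark-submit:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object LogLevelCounts {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LogLevelCounts")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical text source; the real pipeline staged Flume-fed data in HDFS.
    val lines = ssc.socketTextStream("stream-host", 9999)

    // Count log lines per level (e.g. ERROR/WARN/INFO) in each 10-second batch,
    // assuming the level is the first whitespace-separated token of the line.
    val counts = lines
      .map(line => (line.split(" ").headOption.getOrElse("UNKNOWN"), 1))
      .reduceByKey(_ + _)

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```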
Environment: Spark, Scala, Hive, Pig, MapReduce, Flume, Oracle, Sqoop, Cassandra, YARN, Hadoop, HBase
Confidential, Woodland Hills, CA
Hadoop Developer
Responsibilities:
- Responsible for business logic using Java and JavaScript, and JDBC for querying the database.
- Involved with the application teams to install Hadoop updates, patches and version upgrades as required.
- Development experience on UNIX, Linux, Windows (Vista, XP, NT, 2000, 95), and cloud-based virtual platforms.
- Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Created HBase tables to store data in variable formats coming from different portfolios (see the HBase client sketch after this list).
- Involved in configuring core-site.xml and mapred-site.xml for the multi-node cluster environment.
- Experience with Hadoop ecosystem components HDFS, MapReduce, Hive, Pig, Sqoop, and HBase.
- Involved in peer & lead level design & code reviews.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile devices, and network devices, and pushed it to HDFS.
- Experience in developing user interfaces using HTML, CSS, AJAX, and JavaScript.
- Experience installing and configuring Windows Active Directory.
- Strong knowledge of HDFS, MapReduce, and NoSQL databases like HBase.
- Experience in client-side technologies such as HTML, CSS, JavaScript, and jQuery.
- Responsible for writing Hive queries to analyze terabytes of customer data from HBase and write the results to output files.
- Designed Business classes and used Design Patterns like Data Access Object, MVC etc.
- Responsible for the overall layout design and color scheme of the website using HTML, Bootstrap, and CSS3.
- Created the server side of a project management application using Node.js and MongoDB.
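A minimal Scala sketch of writing to an HBase table with the HBase client API, as referenced above; the table name, column family, and values are hypothetical and hbase-site.xml is assumed to be on the classpath:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseWriter {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    try {
      // Hypothetical table holding variable-format portfolio records.
      val table = connection.getTable(TableName.valueOf("portfolio_data"))
      val put = new Put(Bytes.toBytes("row-001"))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("format"), Bytes.toBytes("json"))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes("{\"k\":\"v\"}"))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```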
Environment: Java 6, JDBC, MongoDB, Apache Web Server, HTML, NoSQL, Meteor.js, Eclipse, UNIX, CSS, XML, JavaScript, jQuery, Oracle, SQL.
Confidential, Tampa, FL
Hadoop Developer
Responsibilities:
- Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Used Pig to perform transformations, event joins, bot-traffic filtering, and pre-aggregations before storing the data in HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Loaded the aggregated data onto DB2 for reporting on the dashboard.
- Monitoring and Debugging Hadoop jobs/Applications running in production.
- Worked on installing a 20-node Hadoop cluster.
- Building, packaging and deploying the code to the Hadoop servers.
- Moved data from Oracle to HDFS and vice versa using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked with different file formats and compression techniques to determine standards
- Developed Hive scripts for implementing control tables logic in HDFS.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Created HBase tables to store data in various formats coming from different portfolios.
- Processed data using Spark (see the Spark sketch after this list).
- Provided cluster coordination services through ZooKeeper.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Hands on writing MapReduce code to convert unstructured data into structured data and to insert data into HBase from HDFS.
- Extracted data from MongoDB through Sqoop, placed it in HDFS, and processed it.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
- Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
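A minimal Scala sketch of the Spark processing step mentioned above; the HDFS path and the tab-separated (userId, action, timestamp) record layout are hypothetical, and the master is left to spark-submit:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClickstreamCounts {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ClickstreamCounts")
    val sc = new SparkContext(conf)

    // Hypothetical HDFS location of the ingested behavioral logs.
    val logs = sc.textFile("hdfs:///data/clickstream/*.log")

    // Count actions per user as a simple illustration of the processing step.
    val actionsPerUser = logs
      .map(_.split("\t"))
      .filter(_.length >= 2)
      .map(fields => (fields(0), 1))
      .reduceByKey(_ + _)

    actionsPerUser.take(20).foreach(println)
    sc.stop()
  }
}
```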
Environment: JDK, Ubuntu Linux, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, ZooKeeper, HBase
Confidential, River Woods, IL
Java Developer
Responsibilities:
- Involved in the analysis, design, and development phase of the application.
- Created Class, Activity, and Sequence Diagrams using IBM Rational Rose modeling tool.
- Developed the presentation layer using JSP and Servlets with a RAD tool.
- Designed web applications & web contents utilizing various SEARCH ENGINE OPTIMIZATION (SEO) techniques.
- Highly motivated, reliable analytical problem solver and troubleshooter with strong attention to detail.
- Demonstrated ability to complete projects in deadline oriented environments.
- Quick learner and proficient in solving the technical issues in the project.
- Excellent analytical and communication skills with capability to handle new technologies
- Used JavaScript for client side validations.
- Used Spring Core for middle tier development to achieve inversion of control.
- Developed Message Driven Beans to send asynchronous notification messages.
- Designed and developed numerous Session Beans and deployed them on WebSphere Application Server.
- Wrote stored procedures and complex PL/SQL queries to extract, delete, and reload data on an Oracle 9i database using the Toad tool.
- Wrote Test Cases for Unit Testing using JUnit.
- Involved in testing the complete flow of the modules.
- Used CVS for version control.
- Implemented Log4J for Logging Errors, debugging and tracking.
Environment: JSP, Servlets, Spring Core, EJB, JMS, Spring, AJAX, Oracle, XML, XSLT, HTML, CSS, WebSphere, UML, RAD, TOAD, PL/SQL, JUnit, Apache Ant, CVS, Log4j.
Confidential
Java Developer
Responsibilities:
- Analyzed the system and gathered requirements.
- Created design documents and reviewed them with the team, in addition to assisting the business analyst/project manager with explanations to the line of business.
- Developed the web tier using JSP to show account details and summary.
- Designed and developed the UI using JSP, HTML and JavaScript.
- Utilized JPA for Object/Relational Mapping purposes for transparent persistence onto the SQL Server database.
- Used the Tomcat web server for development purposes.
- Used Oracle as the database and Toad for query execution; also involved in writing SQL scripts and PL/SQL code for procedures and functions.
- Developed application using Eclipse.
- Used Log4J to print the logging, debugging, warning, info on the server console.
- Interacted with Business Analyst for requirements gathering.
Environment: Java, J2EE, JUnit, XML, JavaScript, Log4j, CVS, Eclipse, Apache Tomcat, and Oracle.