Sr. Big Data Hadoop Consultant Resume
MN
PROFESSIONAL SUMMARY:
- 7+ years of professional experience in Requirements Analysis, Design, Development and Implementation of Java, J2EE and Big Data technologies.
- 4+ years of exclusive experience in Big Data technologies and Hadoop ecosystem components like Spark, MapReduce, Hive, Pig, YARN, HDFS, Sqoop, Flume, Kafka and NoSQL systems like HBase, Cassandra.
- Strong knowledge of distributed systems architecture and parallel processing, with an in-depth understanding of the MapReduce and Spark execution frameworks.
- Expertise in writing end-to-end data processing jobs to analyze data using MapReduce, Spark and Hive.
- Extensive experience working with structured data using HiveQL and join operations, writing custom UDFs, and optimizing Hive queries.
- Experience using various Hadoop Distributions (Cloudera, Hortonworks, Amazon AWS) to fully implement and leverage new Hadoop features.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Extensive experience in importing/exporting data between RDBMS systems and the Hadoop ecosystem using Apache Sqoop.
- Worked with the Java HBase API to ingest processed data into HBase tables.
- Strong experience in working with UNIX/LINUX environments, writing shell scripts.
- Good knowledge of and hands-on experience with the real-time streaming technologies Spark and Kafka.
- Experience optimizing MapReduce jobs using Combiners and Partitioners to deliver the best results.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list).
- Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Sound knowledge of J2EE architecture, design patterns and object modeling using various J2EE technologies and frameworks.
- Adept at creating Unified Modeling Language (UML) diagrams such as Use Case diagrams, Activity diagrams, Class diagrams and Sequence diagrams using Rational Rose and Microsoft Visio.
- Extensive experience in developing applications using Java, JSP, Servlets, JavaBeans, JSTL, JSP Custom Tag Libraries, JDBC, JNDI, SQL, AJAX, JavaScript and XML.
- Experienced in using Agile methodologies including extreme programming, SCRUM and Test Driven Development (TDD).
- Proficient in integrating and configuring the object-relational mapping tool Hibernate in J2EE applications, along with other open-source frameworks such as Struts and Spring.
- Experience in building and deploying web applications on multiple application servers and middleware platforms, including WebLogic, WebSphere, Apache Tomcat and JBoss.
- Experience in writing test cases in Java Environment using JUnit.
- Hands-on experience in the development of logging standards and mechanisms based on Log4j.
- Experience in building, deploying and integrating applications with ANT, Maven.
- Good knowledge of web services (SOAP, WSDL), XML parsers such as SAX and DOM, and front-end technologies including AngularJS and responsive design with Bootstrap.
- Demonstrated technical expertise, organization and client service skills in various projects undertaken.
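A minimal sketch of the partitioned, bucketed external Hive table pattern mentioned above, issued through a Spark 1.x HiveContext; the table, column and HDFS path names are illustrative assumptions, not taken from any specific engagement.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Sketch only: all table, column and path names below are illustrative.
object HiveTableSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveTableSketch"))
    val hiveContext = new HiveContext(sc)

    // External table: Hive owns only the metadata; the files stay in HDFS
    // even if the table is dropped. Partitioning by load_date enables
    // partition pruning, bucketing by customer_id helps joins and sampling.
    hiveContext.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS txn_external (
        |  customer_id STRING,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (load_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC
        |LOCATION '/data/warehouse/txn_external'""".stripMargin)

    // A query that filters on the partition column reads only the matching
    // partition directories instead of the whole table.
    hiveContext.sql(
      "SELECT customer_id, SUM(amount) AS total FROM txn_external " +
      "WHERE load_date = '2017-01-31' GROUP BY customer_id").show()

    sc.stop()
  }
}
```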
TECHNICAL SKILLS:
Operating Systems: UNIX, Linux, RHEL, Ubuntu, Windows XP/7
Programming Languages: Java, C, C++, Python, R
Visualization: MS Excel, R, SAS and Tableau.
RDBMS: Oracle, SQL, PL/SQL
Big Data ecosystems: Hadoop 1.x/2.x, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper
NoSQL: HBase 0.94, Cassandra 2.0
Hadoop Distributions: Cloudera, Hortonworks, MapR
Java Technologies: Servlets, JSPs, JavaBeans, JDBC
Scripting: Unix shell
Databases: Teradata, Oracle 10g/11i, MySQL
Other Tools: MSTFS, HPSD, Git for version control.
Protocols: TCP/IP, HTTP, FTP, SFTP
Machine Learning Algorithms: Linear Regression, Logistic Regression, Decision Trees, SVM, Random Forests, Boosting, Bagging, Factor Analysis, Neural Networks, Deep Learning, Probabilistic Graphical Models.
Configuration Management: Puppet, Chef
WORK EXPERIENCE:
Confidential, MN
Sr. Big Data Hadoop Consultant
Responsibilities:
- Responsible for building and driving alignment to an Enterprise Reference Architecture.
- Lead discussions with client leadership explaining architecture options and recommendations.
- Responsible for managing risk within the organization through a reusable framework and process.
- Responsible for creating design patterns and best practices for data and for the integration of data across the enterprise.
- Responsible for setting standards and providing governance through a process or framework, such as an Architecture Review Board.
- Design sustainable and optimal database models to support the client's business requirements.
- Advise clients on database management, integration and BI tools. Assist in performance tuning of databases and code.
- Mentor development team members in design and development of complex ETL and BI implementations.
- Perform architectural assessments of the client's Enterprise Data Warehouse (EDW).
- Work with the sales team to create and deliver client proposals and demonstrations.
- Understood the existing risk-analysis rules and developed an ETL strategy to reduce false positives.
- Prepared estimates, deliverable timelines and the project execution plan.
- Analyzed the data sources.
- Determined the ETL job design flow (extract, transform, load, reconciliation process, etc.) and data quality reports; parsed XML files, making extensive use of Java API connectors.
- Adhered to cleanup standards using Data Clean.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for the implementation and ongoing administration of Big Data platforms on Hortonworks.
- Worked with the Hortonworks support for resolving the issues.
- Installed and configured Flume, Hive, Pig, Sqoop, HBase on the Hadoop cluster.
- Implemented workflows using Apache Oozie framework to automate tasks.
- Responsible for transferring data between relational databases and the Hadoop ecosystem using Apache Sqoop.
- Managing and scheduling Jobs on a Hadoop cluster.
- Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slots configuration.
- Set up a Hadoop cluster on Amazon EC2 using Whirr for a POC.
- Handled resource management of the Hadoop cluster, including adding/removing cluster nodes for maintenance and capacity needs.
- Created tables in HBase to store variable data formats of PII data coming from different portfolios (see the sketch after this list).
- Implemented best income logic using Pig scripts.
- Implemented test scripts to support test driven development and continuous integration.
- Responsible for managing data coming from different sources.
- Installed and configured Hive, and wrote Hive UDFs.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Experience in managing and reviewing Hadoop log files.
- Implemented and migrated Windows PowerShell scripting to IBM WebSphere DataStage jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Used agile methodologies (Scrum) for software development. Involved in daily status meetings and team code reviews.
- Involved in writing user stories for the scope items.
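A minimal sketch of loading processed records into HBase through the Java client API (shown here from Scala, HBase 1.x client); the table name, column family and portfolio-prefixed row key are illustrative assumptions rather than the actual schema.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

// Sketch only: table, column family and row-key layout are illustrative.
// Real PII values would be masked or encrypted upstream of this step.
object HBaseIngestSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create() // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    try {
      val table = connection.getTable(TableName.valueOf("customer_profile"))
      // Prefixing the row key with the portfolio keeps rows from different
      // portfolios in one table without key collisions.
      val put = new Put(Bytes.toBytes("portfolio1|cust-0001"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("segment"), Bytes.toBytes("retail"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("balance"), Bytes.toBytes("1250.75"))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```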
Environment: Oracle 10g, Perl, UNIX shell scripting, Python, PL/SQL, Hadoop (Hortonworks distribution), HDFS, Hive, Pig, HBase, Flume, Sqoop, Java (JDK 1.6), Eclipse, DataStage 8.1/9.1, DB2, MySQL, Teradata, Mainframe, BTEQ, SQL Server, Netezza, XML, WinSCP, Command Centre, StarTeam, SQL Developer, SQuirreL, RDC, Bridger XG, ChoicePoint, Hummingbird, PuTTY, IPMS, IQMS, NORKOM, Control-M
Confidential, Charlotte, NC
Sr. Hadoop/ Spark Developer
Responsibilities:
- Collaborated with the internal/client BAs to understand the requirements and architect a data flow system.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Optimized Hive scripts to use HDFS efficiently by using various compression mechanisms.
- Developed Apache Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Responsible for batch processing of data sources using Apache Spark.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDD/MapReduce in Spark for data aggregation, queries and writing data back into the RDBMS through Sqoop (see the sketch after this list).
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
- Migrated complex MapReduce programs and Hive scripts into Spark RDD transformations and actions.
- Wrote UDFs/MapReduce jobs depending on the specific requirement.
- Created Java algorithms to find the mortgage risk factor and credit risk factors.
- Created algorithms for the complex map and reduce functionality of all MapReduce programs.
- Tested all month-end changes in the DEV, SIT and UAT environments and obtained business approvals to perform the same in production.
- Successfully migrated a Hadoop cluster of 120 edge nodes to another shared cluster (HaaS - Hadoop as a Service) and set up the environments (DEV, SIT and UAT) from scratch.
- Wrote shell scripts to schedule Hadoop processes in Autosys by creating JIL files.
- Wrote Spark SQL scripts to optimize query performance.
- Converted all the VAP processing from Netezza and reimplemented it using Spark DataFrames and RDDs.
- Worked extensively on code reviews and code remediation to meet coding standards.
- Wrote Sqoop scripts to import and export data across various RDBMS systems.
- Wrote Apache Pig scripts to process unstructured data and make it available for processing in Apache Hive.
- Created Hive schemas using performance techniques like partitioning and bucketing.
- Used SFTP to transfer and receive the files from various upstream and downstream systems.
- Configured UNIX service IDs and AD groups in all environments (DEV, SIT, UAT and PROD) to control access to resources based on AD group membership.
- Developed Oozie workflow jobs to execute Hive, Pig, Sqoop and MapReduce actions.
- Developed Autosys JIL scripts for defining, scheduling and monitoring jobs (UNIX shell scripts).
- Involved in complete end to end code deployment process in Production.
- Prepared automated script to deploy every month end code changes in all the environments.
- Worked on all the CDH upgrades and did the regression testing.
- Worked in exporting data from Hive tables into Netezza database.
- Implemented all VAP processing in Hive tables.
- Worked with Hadoop administration team for configuring servers at the time of cluster migration.
- Responsible for communicating monthly job schedules and change requirements to the business and clients to validate the data.
- Responsible for meeting all SLA times to make sure the Hadoop jobs run on time.
- Coordinated with the offshore team to explain business requirements and prepare code changes for every month-end release.
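A minimal Spark 1.6 sketch of the DataFrame/Spark SQL aggregation pattern described above; the database, table, column and connection details are illustrative assumptions, and the hand-off to the RDBMS is shown with the DataFrame JDBC writer for brevity, although the same result could be staged to HDFS and exported with Sqoop.

```scala
import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Sketch only: names, URLs and credentials below are placeholders.
object MonthEndAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MonthEndAggregation"))
    val hiveContext = new HiveContext(sc)

    // Aggregate one month-end snapshot from a Hive table.
    val summary = hiveContext.sql(
      """SELECT account_id, SUM(amount) AS total_amount, COUNT(*) AS txn_count
        |FROM finance.transactions
        |WHERE month_end = '2017-01-31'
        |GROUP BY account_id""".stripMargin)

    // Write the aggregated result back to a relational target over JDBC.
    val props = new Properties()
    props.setProperty("user", "etl_user")
    props.setProperty("password", sys.env.getOrElse("DB_PASSWORD", ""))
    summary.write.mode("overwrite")
      .jdbc("jdbc:netezza://nzhost:5480/ANALYTICS", "month_end_summary", props)

    sc.stop()
  }
}
```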
Environment: CDH 5.8.3, HDFS, Spark, Pig, Hive, Beeline, Sqoop, MapReduce, Oozie, PuTTY, HaaS (Hadoop as a Service), Java 6/7, Netezza, SQL Server 2012, Subversion, Toad, Teradata, Oracle 10g, YARN, UNIX shell scripting, Autosys, Agile methodology, JIRA, VersionOne.
Confidential, Dallas, TX
Hadoop Developer
Responsibilities:
- Worked on Sqooping tables from various databases such as Teradata (Customer Data Warehouse), DB2 and SQL Server into the Hadoop file system.
- Developed shell scripts to automate the ingestion and deploy the tables for both snapshots and deltas.
- Implemented a custom delta processing strategy for ingesting deltas (incremental data) on composite keys, which Apache Sqoop does not support.
- Developed automated Teradata TPT scripts to extract data from the Teradata database and store it as text files.
- Ingested 100 TB of frozen data into Hadoop.
- Developed Spark scripts by using Scala programming language as per the requirement.
- Good understanding of Spark transformations and actions.
- Migrated MapReduce programs into Spark transformations using Spark and Scala.
- Worked on design, development of business rule engine using Hortonworks Hadoop platform (Hive, HCatalog, HDFS) and Java.
- Loaded Cassandra data into Spark RDDs, performed transformations and loaded the data into Hadoop using the Python programming language.
- Developed Shell Script to perform Data Profiling on the ingested data with the help of hive bucketing.
- Worked with different File Formats like TEXTFILE, CSV and AVRO for HIVE querying and processing.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Python scripts and UDFs using both DataFrames/Spark SQL and RDD/MapReduce in Spark for data aggregation, queries and writing data back into the RDBMS.
- Developed wrapper scripts for Sqoop ingestion and Hadoop copy/merge.
- Worked on Hive joins to produce the input data set for the QlikView model.
- Developed HiveQL jobs in Spark SQL using Python.
- Involved in creating a dashboard showing the count of files ingested and the number of failures on a daily basis, which is very helpful for the run team.
- Installed and configured Tableau Desktop to connect to the Hortonworks Hive framework, which contains the bandwidth data from the locomotive, through the Hortonworks ODBC connector for further analytics of the data.
- Worked on Integrating Tableau with HiveServer2 using ODBC Driver with LDAP Security.
- Involved in loading data from the Linux file system, servers and Java web services using Kafka producers and partitions.
- Implemented Kafka-Spark Streaming integration (see the sketch after this list).
- Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
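A minimal Kafka-to-HDFS sketch using the Spark 1.6 direct (receiver-less) Kafka integration rather than the high-level consumer API; broker addresses, the topic name and the landing path are illustrative assumptions.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Sketch only: brokers, topic and output path are placeholders.
object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(60))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics = Set("locomotive-events")

    // Direct stream: one RDD partition per Kafka partition, offsets tracked by Spark.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Keep only the message payload and land each non-empty micro-batch in HDFS.
    stream.map(_._2).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"/data/raw/locomotive-events/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```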
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hadoop distribution of Hortonworks, Scala, AWS, SQL, Pig, ZooKeeper, Spark 1.6.2, Tableau, Sqoop, Flume, Teradata, Servlets, JDBC, JSP, JavaScript, Eclipse, CVS, CSS, XML, JSON
Confidential
Java Developer
Responsibilities:
- Involved in design and requirements gathering for project for further improvements and enhancements in the current site.
- Developed the application using J2EE, Hibernate, Web Services (SOAP & REST), Oracle and Maven technologies.
- Developed using Java 1.7 and language features such as annotations, generics, the enhanced for loop and enums.
- Used Spring and Hibernate to implement IoC, AOP and ORM for the back-end tiers.
- Involved in implementation of enterprise integration with Web Services and Legacy Systems using SOAP, and REST.
- Created and injected Spring services, Spring controllers and DAOs to achieve dependency injection and to wire objects of business classes.
- Involved in using JPA (Java Persistence API) frameworks and APIs such as JDO (Java Data Objects) and Hibernate.
- Modified the Spring controller and service classes to support the introduction of the Spring framework.
- Created the GUI using JSP, JavaScript, AngularJS and jQuery.
- Developed custom Angular directives for showing outage locations, maintenance locations, etc. on an ArcGIS map.
- Used an Agile process to streamline development with iterative delivery, including daily scrums with the team.
- Implemented the persistence layer using Hibernate and configured Hibernate with Spring to interact with the database from the DAO.
- Front-end development using JSF, JSP, HTML and wrote custom tags.
- Reviewed the XML logical data model and developed an XML schema (XSD) to validate the model; used JAXB for XML-Java mapping and XML-XSLT conversion.
- Developed server tier using EJB, JMS, Web Services and Spring modules.
- Responsible for the Log4j configuration for the application.
- Designed and developed web based UI using JSP, Struts Taglibs and developed action classes to handle the user request.
- Wrote POJO classes and .hbm files for Hibernate object-relational mapping.
- Database development required creation of new tables, PL/SQL stored procedures, functions, views, indexes and constraints, triggers and required SQL tuning to reduce the response time in the application.
- Focused on Test Driven Development; thereby creating detailed JUnit tests for every single piece of functionality before actually writing the functionality.
- Worked on Spring Quartz functionality for scheduling tasks such as generating monthly reports for customers and sending those mails about different policies.
Environment: Java 7, Core Java, Eclipse 3.3, JSF, HTML5, Oracle 10g, Spring, Hibernate, Ajax, XML, JBoss 6.0, REST API, JSP 2.1, WSDL, SOAP, Log4j 1.3, d3.js, JUnit, NoSQL/MongoDB, UML, JMS, ActiveMQ, RabbitMQ, EJB, JavaScript, Struts, Design Patterns, Servlets, ANT, IBM RAD, EasyMock framework (JUnit), PL/SQL, Drools, Triggers, Stored Procedures, AWS, SVN, Web Services, WebSphere, JAX-WS, JAX-RS, CSS3.