Hadoop/Spark Developer Resume
Carmel, IN
SUMMARY
- Over 6 years of professional experience in Information Technology, including 3+ years in Big Data and Hadoop ecosystem technologies.
- Experience working with BI teams to transform big data requirements into Hadoop-centric solutions.
- Excellent understanding/knowledge of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and the Hadoop MapReduce programming paradigm.
- Extensive experience in both MapReduce MRv1 and MRv2 (YARN).
- Expertise in Hadoop, MapReduce, Spark (Scala), YARN, Spark Streaming, Hive, Pig, HBase, Kafka, Cassandra and Oracle.
- Hands-on experience with the Hadoop ecosystem: HDFS, MapReduce, Hive, Pig, Oozie, Flume and Zookeeper.
- Wrote ETL procedures to move large data sets from legacy systems into the new system schema using Oracle SQL Developer.
- Created ad-hoc reports using SQL written in SQL Developer.
- Hands-on experience with major components of the Hadoop ecosystem including Flume, Kafka, Oozie, Zookeeper, the MapReduce framework and Cassandra.
- Expertise in designing and developing distributed processing systems that feed a data warehousing platform for reporting.
- Performed importing and exporting data into HDFS and Hive using Sqoop.
- Experience in loading streaming log data from various web servers into HDFS using Flume.
- Performed data analytics using Pig and Hive for data architects and data scientists within the team.
- Expertise in writing MapReduce programs and UDFs for both Hive and Pig in Java (a minimal Hive UDF sketch follows this list).
- Expertise in job workflow scheduling and monitoring tools like Oozie, and external schedulers such as Autosys and cron jobs.
- Worked on NoSQL databases such as HBase and Cassandra.
- Experience in transferring data between HDFS and relational databases with Sqoop.
- Experience in writing numerous test cases using the JUnit framework.
- Strong knowledge of the full software development life cycle: software analysis, design, architecture, development and maintenance.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
- Excellent Analytical, problem solving and communication skills with the ability to work as a part of the team as well as independently.
- Good understanding of distributed systems and parallel processing architectures.
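Below is a minimal sketch of the kind of Hive UDF written in Java mentioned above; the class name, function name and column semantics are illustrative assumptions rather than code from an actual project.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that trims and lower-cases a string column before analysis.
public class TrimLowerUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}

Such a UDF would typically be packaged into a JAR, registered in Hive with ADD JAR, and exposed with CREATE TEMPORARY FUNCTION trim_lower AS 'TrimLowerUDF' before being used in HiveQL queries.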
TECHNICAL SKILLS
Programming Languages: Java, C, SQL, HQL, Scala, Pig Latin.
Big Data Technologies: HDFS, Hive, Impala, MapReduce, Pig, Sqoop, Oozie, Kafka, Zookeeper, YARN, Avro, Spark.
Front End Technologies: HTML, XHTML, CSS, XML, JavaScript.
Java Frameworks: MVC, Apache Struts 2.0, Spring and Hibernate.
Build and Deployment (CI/CD): Apache Maven, Jenkins, GitHub, SVN, Nexus, Puppet.
Databases: Oracle 11g, MySQL, MS SQL Server, Teradata.
NoSQL Databases: HBase, Cassandra, MongoDB.
IDE: Eclipse, NetBeans, JBuilder.
RDBMS: MS Access, MS SQL Server, MySQL, IBM DB2, PL/SQL.
Operating Systems: Linux, Windows, Mac
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP.
PROFESSIONAL EXPERIENCE
Confidential, Carmel, IN
Hadoop/Spark Developer
Responsibilities:
- Involved in the high-level design of the Hadoop 2.6.3 architecture for the existing data structure and problem statement; set up a new cluster and configured the entire Hadoop platform.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented a data interface to fetch client data using REST APIs, pre-processed the data using MapReduce 2.0 and stored it in HDFS (Hortonworks).
- Extracted files from MySQL, Oracle and Teradata through Sqoop 1.4.6, placed them in HDFS (Cloudera distribution) and processed them.
- Worked with different HDFS file formats such as Avro 1.7.6, SequenceFile and JSON, and different compression formats such as Snappy and bzip2.
- Developed efficient MapReduce programs for filtering unstructured data and built numerous MapReduce jobs to perform data cleansing and preprocessing on Cloudera.
- Involved in converting MapReduce jobs into Spark transformations on Spark RDDs using Python.
- Performed continuous monitoring and management of the Hadoop cluster using Ambari.
- Used Pig to perform data validation on the data ingested using Sqoop and Flume, and pushed the cleansed data set into Hive.
- Collected and aggregated large amounts of log data using Apache Spark and staged the data in HDFS for further analysis.
- Designed and built the reporting application, which uses Spark SQL to fetch and generate reports on HBase table data.
- Developed custom Unix shell scripts to run pre- and post-validations on master and slave nodes while configuring the name node and data nodes respectively.
- Developed a data pipeline using Kafka to ingest behavioral data and used Spark Streaming for data processing (a minimal sketch follows this list).
- Configured a 7-node Kafka platform with 2 web servers, 3 Kafka brokers and 2 Kafka consumers running Spark Streaming (DataFrames), backed by 2 Zookeeper nodes, where the Kafka brokers were able to sustain 1 million writes (messages) per second.
- Developed Pentaho Kettle graphs to cleanse and transform the raw data into useful information, loaded it into Kafka queues, and further loaded it into HDFS and a Neo4j database for the UI team to display through the web application.
- Automated the process of extracting data from warehouses and weblogs into Hive tables by developing workflows and coordinator jobs in Oozie.
- Developed small distributed applications in our projects using Zookeeper 3.4.7 and scheduled workflows using Oozie 4.2.0.
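Below is a minimal sketch of the Kafka-to-Spark Streaming ingestion referenced in this list, assuming the spark-streaming-kafka-0-10 integration; the broker addresses, topic name, consumer group and HDFS output path are illustrative placeholders.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class BehaviorIngest {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("BehaviorIngest");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092"); // illustrative brokers
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "behavior-ingest");

        // Direct stream from the (assumed) behavioral-events topic.
        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("behavioral-events"), kafkaParams));

        // Keep only the message payloads and land each micro-batch in HDFS for downstream analysis.
        stream.map(ConsumerRecord::value)
              .foreachRDD((rdd, time) ->
                      rdd.saveAsTextFile("hdfs:///data/behavioral/events-" + time.milliseconds()));

        jssc.start();
        jssc.awaitTermination();
    }
}

A job like this would normally be packaged with Maven and submitted to the YARN cluster with spark-submit.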
Environment: Hadoop, HDP, Hive, Oozie, Hortonworks Sandbox, Java, Eclipse LUNA, Zookeeper, JSON file format, Spark.
Confidential, Dallas, TX
Hadoop / Admin Developer
Responsibilities:
- Imported data using Sqoop to load data from MySQL into HDFS on a regular basis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
- Pushed data from Amazon S3 storage to Redshift using key-value pairs as required by the BI team.
- Processed data using Athena on S3; worked on gateway nodes and connectors (JAR files) interfacing data sources with the AWS cloud.
- Developed efficient MapReduce programs for filtering unstructured data and built various MapReduce jobs to perform data cleansing and preprocessing on EMR.
- Collected and aggregated large amounts of web log data from various sources such as web servers, mobile and network devices using Apache Kafka, and stored the data in HDFS for analysis.
- Developed various Kafka producers and consumers from scratch, implementing them per the organization's requirements (a minimal producer sketch follows this list).
- Set up Flume for various sources to deliver log messages from outside systems into Hadoop HDFS.
- Responsible for creating and modifying topics (Kafka queues) as and when required, with varying configurations including replication factors, partitions and TTL.
- Wrote and tested complex MapReduce jobs for aggregating identified and validated data.
- Created Managed and External Hive tables with static/dynamic partitioning.
- Wrote Hive queries for data analysis to meet business requirements.
- Improved the performance of HiveQL by splitting larger queries into smaller ones and by introducing temporary tables between them.
- Extensively involved in performance tuning of HiveQL by applying bucketing on large Hive tables.
- Used an open-source web scraping framework for Python to crawl and extract data from web pages.
- Optimized Hive queries by setting different combinations of Hive parameters.
- Developed UDFs (User Defined Functions) to extend the core functionality of Pig and Hive queries as required.
- Extensive involvement in writing Pig scripts to transform raw data from several data sources into standardized data.
- Implemented workflows using Oozie for running MapReduce jobs and Hive queries.
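Below is a minimal sketch of the kind of Kafka producer described in this list, using the standard Kafka Java client; the broker address, topic name and sample record are illustrative placeholders.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class WebLogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // illustrative broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // wait for acknowledgment from all in-sync replicas

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by source host keeps all log lines from one host in the same partition.
            producer.send(new ProducerRecord<>("weblogs", "webserver-01", "GET /index.html 200"));
        }
    }
}

A matching consumer would subscribe to the same topic and either write the records to HDFS or hand them to a streaming job.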
Environment: Hadoop, MapReduce, Yarn, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g, Core Java, Cloudera HDFS, Eclipse.
Confidential, Houston, TX
Big Data Developer
Responsibilities:
- Developed Java web services as part of the functional requirements.
- Installed and configured Hadoop; responsible for maintaining the cluster and for managing and reviewing Hadoop log files.
- Supported setting up the QA environment and updating configurations for executing scripts with Pig and Sqoop.
- Developed MapReduce programs in Java for data analysis and loaded data from various data sources into HDFS (a minimal sketch follows this list).
- Worked extensively on Cloudera to analyze data stored in HDFS using Hive and Pig.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Worked on large sets of structured, semi-structured and unstructured data.
- Used Sqoop to import and export data from Oracle RDBMS to HDFS and vice versa.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Coordinated with business customers to gather business requirements.
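Below is a minimal sketch of the kind of Java MapReduce program referenced in this list, assuming a hypothetical count of log levels over plain-text input; all class names and the input layout are illustrative.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogLevelCount {

    // Emits (logLevel, 1) for every line, assuming the level is the first whitespace-separated token.
    public static class LevelMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context ctx) throws IOException, InterruptedException {
            String[] parts = value.toString().split("\\s+");
            if (parts.length > 0 && !parts[0].isEmpty()) {
                ctx.write(new Text(parts[0]), ONE);
            }
        }
    }

    // Sums the counts per log level.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log-level-count");
        job.setJarByClass(LogLevelCount.class);
        job.setMapperClass(LevelMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The driver would be packaged into a JAR and launched with hadoop jar, passing the HDFS input and output paths as the two arguments.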
Environment: Hadoop, Hive, MapReduce, Pig, Sqoop, MySQL, HBase, Flume, Spark, Scala, Hortonworks Sandbox.
Confidential
Data Engineer
Responsibilities:
- Worked with a team to gather and analyze client requirements.
- Analyzed large data sets distributed across a cluster of commodity hardware.
- Connected to the Hadoop cluster and Cassandra ring and executed test programs on the servers.
- Worked on Hadoop and Cassandra as part of the next-generation platform implementation.
- Developed several advanced MapReduce programs to process incoming data files.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from Oracle into HDFS using Sqoop.
- Loaded the OLTP models and performed ETL to load dimension data for a star schema.
- Built a request builder, developed in Scala, to facilitate running scenarios using JSON configuration files.
- Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Developed Pig UDFs to pre-process the data for analysis (a minimal sketch follows this list) and migrated ETL operations into the Hadoop framework using Pig Latin scripts and Python 3.6.2 scripts.
- Used Pig as an ETL tool to perform transformations, event joins, filtering and some pre-aggregations before storing the data in HDFS.
- Worked on creating MapReduce programs to parse the data for claims report generation and on running the JARs in Hadoop; coordinated with the Java team in creating MapReduce programs.
- Implemented the project using the Spring Web MVC module.
- Responsible for managing and reviewing Hadoop log files; designed and developed a data management system using MySQL.
- Worked with the application owners to understand the business requirements and mapped into technical requirements.
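Below is a minimal sketch of the kind of Pig UDF mentioned in this list; the class name and field semantics are illustrative assumptions.

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical EvalFunc that trims and upper-cases a chararray field before analysis.
public class NormalizeField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}

In a Pig script the JAR would be added with REGISTER and the function invoked inside a FOREACH ... GENERATE statement.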
Environment: T-SQL, MS SQL Server 2014/2012, Visual Studio 2012, BIDS, SSIS, SSRS, Autosys, Team Foundation Server (TFS), VersionOne, SharePoint.
Confidential
Java Developer
Responsibilities:
- Developed technical design documents and created a prototype of the critical business application using Java/J2EE; initiated the use of HttpUnit and Selenium IDE for testing.
- Designed, developed and implemented rich user interfaces for complex web-based systems using frameworks such as JSF.
- Worked with the Simple Network Management Protocol (SNMP), an Internet-standard protocol for managing devices on IP networks; devices that typically support SNMP include routers, switches, servers, workstations, printers and modem racks.
- Also worked on network management stations (NMS).
- Analyzed and identified components for the Presentation, Business, Integration, Resource and Service Layers.
- Developed an administration portal using HTML5, Node.js, jQuery and JavaScript frameworks such as Backbone.js and Require.js.
- Worked with process owners and business stakeholders to translate business requirements into functional requirements within ServiceNow.
- Generated server side PL/SQL scripts for data manipulation and validation and materialized views for remote instances.
- Worked with GUI tools such as Kibana to view generated logs, and with other tools such as Logstash and Elasticsearch for log management.
- Worked on Distribution Engine components for the Comcast agent application portal with Elasticsearch as the data store.
- Developed technical specifications for various back-end modules from business requirements; specifications were written according to standard specification formats.
- Developed BPEL Process to Transfer Vendors, Customers, Items to make the data in sync between Oracle EBS and WM using hierarchical queries.
- Designed and developed the DAO layer to Hibernate 3.0 standards, accessing data from the IBM DB2 database through a JPA (Java Persistence API) layer, creating object-relational mappings and writing PL/SQL procedures and functions (a minimal DAO sketch follows this list).
- Integrated Spring injection for DAOs to achieve Inversion of Control, updating Spring configurations for managing Java objects using callbacks.
- Designed and coded presentation (GUI) JSPs with Struts tag libraries for creating Product Service Components (health care codes) using RAD.
- Coded action classes, Java beans, service layers and business delegates to implement business logic with the latest features of JDK 1.5 such as annotations and generics.
- Created test environments with WAS for local testing using a test profile, and interacted with the Software Quality Assurance (SQA) team to report and fix defects using Rational ClearQuest.
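Below is a minimal sketch of the kind of JPA-based DAO described in this list; the Product entity, its fields and the query are illustrative placeholders rather than the actual project model.

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.PersistenceContext;

// Hypothetical entity mapped to a DB2 table through an object-relational mapping.
@Entity
class Product {
    @Id
    private Long id;
    private String code;
    // getters and setters omitted for brevity
}

// DAO that reads and writes Product rows through the JPA layer.
public class ProductDao {

    @PersistenceContext
    private EntityManager em; // injected by the container or by Spring

    public Product findById(Long id) {
        return em.find(Product.class, id);
    }

    public List<Product> findByCode(String code) {
        return em.createQuery("select p from Product p where p.code = :code", Product.class)
                 .setParameter("code", code)
                 .getResultList();
    }

    public void save(Product product) {
        em.persist(product);
    }
}

With Spring, the DAO would be declared as a bean and the EntityManager wired in through the JPA configuration, matching the dependency-injection approach noted above.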
Environment: Java, J2EE, JSP, Hibernate, Struts, XML Schema, SOAP, JavaScript, PL/SQL, JUnit, AJAX, HQL, HTML, JDBC, Maven, Eclipse.