Sr. Hadoop Developer Resume
Collierville, TN
SUMMARY:
- 8+ years of experience in the IT industry, with extensive work in Java, J2EE, and Big Data technologies.
- 4+ years of exclusive experience with Big Data technologies and the Hadoop stack.
- Strong experience working with HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Oozie, and HBase.
- Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
- More than one year of hands-on experience using the Spark framework with Scala.
- Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
- Worked with various file formats, including delimited text files, clickstream logs, Apache logs, Avro, JSON, and XML.
- Good understanding of compression codecs used in Hadoop processing, such as Gzip, Snappy, and LZO.
- Expertise in importing and exporting data from/to traditional RDBMS systems using Apache Sqoop.
- Tuned Pig and Hive scripts by analyzing their join, group, and aggregation operations.
- Extensive work with HiveQL and join operations, writing custom UDFs, and optimizing Hive queries.
- Worked on and implemented solutions across multiple Hadoop distributions (Cloudera, Hortonworks, Amazon AWS).
- Proficient with columnar file formats such as RCFile, ORC, and Parquet (a brief Spark illustration follows this summary).
- Experience collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Hands-on experience installing, configuring, and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Good experience optimizing MapReduce jobs using combiners and custom partitioners.
- Hands-on experience with NoSQL databases such as HBase and MongoDB.
- Experience includes application development in Java (client/server), JSP, Servlet programming, Enterprise JavaBeans, Struts, JSF, JDBC, Spring, Spring Integration, and Hibernate.
- Very good understanding of the Agile Scrum process.
- Experience using version control tools such as Bitbucket and SVN.
- Good knowledge of Oracle 8i, 9i, and 10g databases and excellent SQL query-writing skills.
- Performed performance tuning and productivity improvement activities.
- Extensive use of use case diagrams, use case models, and sequence diagrams in Rational Rose.
- Proactive in time management and problem solving; self-motivated, with strong analytical skills.
- Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Message Service (JMS), and Java Database Connectivity (JDBC).
- Analytical and organizational skills with the ability to multitask and meet deadlines.
- Excellent interpersonal skills in areas such as teamwork, communication and presentation to business users or management teams.
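The file-format and compression items above can be illustrated with a short Spark/Scala sketch. This is a minimal, hypothetical example, not code from any project listed below; the paths, column assumptions, and object name are placeholders.

```scala
// Hypothetical sketch: reading delimited and JSON input with Spark and writing
// Snappy-compressed columnar output (Parquet and ORC). Paths are placeholders.
import org.apache.spark.sql.SparkSession

object FileFormatSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("file-format-sketch")
      .getOrCreate()

    // Delimited text input (pipe-separated, header row assumed).
    val events = spark.read
      .option("header", "true")
      .option("delimiter", "|")
      .csv("hdfs:///data/raw/events")                         // placeholder path

    // JSON input, e.g. clickstream logs.
    val clicks = spark.read.json("hdfs:///data/raw/clickstream") // placeholder path

    // Write columnar output with Snappy compression.
    events.write
      .option("compression", "snappy")
      .parquet("hdfs:///data/curated/events_parquet")

    clicks.write
      .option("compression", "snappy")
      .orc("hdfs:///data/curated/clicks_orc")

    spark.stop()
  }
}
```

Columnar output with Snappy keeps analytic scans fast while reducing storage, which is the trade-off the bullets above refer to.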
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop, Teradata, MapReduce, Spark, HDFS, HBase, Pig, Hive, Sqoop, Oozie, Storm, Kafka and Flume.
Spark Streaming Technologies: Spark Streaming, Storm
Scripting Languages: Python, Bash, JavaScript, HTML5, CSS3
Programming Languages: Java, Scala, SQL, PL/SQL
Databases: RDBMS, NoSQL, Oracle.
Java/J2EE Technologies: Servlets, JSP (EL, JSTL, Custom Tags), JSF, Apache Struts, JUnit, Hibernate 3.x, Log4J, Java Beans, EJB 2.0/3.0, JDBC, RMI, JMS, JNDI.
Tools: Eclipse, Maven, Ant, MS Visual Studio, NetBeans
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Collierville, TN
Sr. Hadoop Developer
Responsibilities:
- Created end-to-end Spark applications in Scala to perform data cleansing, validation, transformation, and summarization on user behavioral data.
- Developed custom FTP adapters to pull clickstream data from FTP servers directly into HDFS using the HDFS FileSystem API.
- Used Spark SQL and the DataFrame API extensively to build Spark applications.
- Used Spark SQL for data analysis and handed the results to the data scientists for further analysis.
- Performed streaming data ingestion from Kafka into the Spark environment.
- Built a prototype for real-time analysis using Spark Streaming and Kafka.
- Worked closely with the data science team, building Spark MLlib applications for various predictive models.
- Partitioned and bucketed Hive tables, performed joins on them, and used Hive SerDes such as Regex, JSON, and Avro.
- Developed and integrated Java programs to move flat files from Linux systems into the Hadoop ecosystem, with file validation before loading into Hive tables.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL (a small Spark SQL sketch of this follows the Environment line below).
- Optimized Hive analytic SQL queries, created tables and views, wrote custom UDFs, and implemented Hive-based exception processing.
- Involved in moving data from relational databases and legacy tables to HDFS and HBase tables using Sqoop, and vice versa.
- Wrote Sqoop scripts to import and export data to and from HDFS, validating the data before loading to check for duplicates.
- Created HBase tables to store data arriving in varied formats from different portfolios.
- Used Sqoop jobs to import data from RDBMS using incremental imports, and exported the analyzed data back to relational databases with Sqoop for visualization and BI reporting.
- Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
- Worked with different compression codecs, such as LZO and Snappy, to save storage and optimize data transfer over the network.
Environment: Hadoop, HDFS, Hive, Sqoop, Spark, Scala, MapReduce, Cloudera, Kafka, ZooKeeper, HBase, AWS, UNIX Shell Scripting.
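As referenced in the Hive bullets above, the partitioned-table work can be sketched with Spark SQL running against the Hive metastore. This is a minimal, hypothetical example; the database, table, and column names are invented and are not taken from this engagement.

```scala
// Hypothetical sketch: creating a partitioned Hive table and loading it with
// dynamic partitions from Spark SQL (Hive support enabled). Names are invented.
import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned ORC table; partitions are created dynamically on insert.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.user_events (
        |  user_id    STRING,
        |  event_type STRING,
        |  payload    STRING
        |)
        |PARTITIONED BY (event_date STRING)
        |STORED AS ORC""".stripMargin)

    // Allow dynamic partitioning, then load from a (hypothetical) staging table.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE analytics.user_events PARTITION (event_date)
        |SELECT user_id, event_type, payload, event_date
        |FROM analytics.staging_events""".stripMargin)

    spark.stop()
  }
}
```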
Confidential, Bridgewater, NJ
Sr. Hadoop/Spark Developer
Responsibilities:
- Created end-to-end Spark applications in Scala to perform data cleansing, validation, transformation, and summarization on user behavioral data.
- Developed custom FTP adapters to pull clickstream data from FTP servers directly into HDFS using the HDFS FileSystem API.
- Used Spark SQL and the DataFrame API extensively to build Spark applications.
- Used Spark SQL for data analysis and handed the results to the data scientists for further analysis.
- Performed streaming data ingestion from Kafka into the Spark environment.
- Built a prototype for real-time analysis using Spark Streaming and Kafka (a brief Scala sketch of such a prototype follows the Environment line below).
- Worked closely with the data science team, building Spark MLlib applications for various predictive models.
- Partitioned and bucketed Hive tables, performed joins on them, and used Hive SerDes such as Regex, JSON, and Avro.
- Developed and integrated Java programs to move flat files from Linux systems into the Hadoop ecosystem, with file validation before loading into Hive tables.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
- Optimized Hive analytic SQL queries, created tables and views, wrote custom UDFs, and implemented Hive-based exception processing.
- Involved in moving data from relational databases and legacy tables to HDFS and HBase tables using Sqoop, and vice versa.
- Wrote Sqoop scripts to import and export data to and from HDFS, validating the data before loading to check for duplicates.
- Created HBase tables to store data arriving in varied formats from different portfolios.
- Used Sqoop jobs to import data from RDBMS using incremental imports, and exported the analyzed data back to relational databases with Sqoop for visualization and BI reporting.
- Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
- Worked with different compression codecs, such as LZO and Snappy, to save storage and optimize data transfer over the network.
Environment: Hadoop, HDFS, Hive, Sqoop, Spark, Scala, MapReduce, Cloudera, Kafka, ZooKeeper, HBase, AWS, UNIX Shell Scripting.
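As referenced in the Spark Streaming bullet above, a Kafka-to-Spark prototype of this kind could look like the following. This is a hypothetical sketch using the Kafka 0.10 direct stream API; the broker addresses, topic name, and consumer group are placeholders, not values from this project.

```scala
// Hypothetical prototype sketch: consuming a Kafka topic with Spark Streaming
// and counting events per 10-second batch as a stand-in for the real analysis.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object ClickstreamStreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("clickstream-streaming-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",  // placeholder brokers
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-prototype",      // placeholder group id
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream from a placeholder "clickstream" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("clickstream"), kafkaParams)
    )

    // Per-batch event count; in practice the mapped values would feed the
    // Spark SQL and MLlib work described above.
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```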
Confidential, Denver, CO
Java/Hadoop Developer
Responsibilities:
- Worked with a 50+ node CDH4 Hadoop cluster on Linux.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Hands-on experience writing MapReduce code to turn unstructured data into structured data and to insert data into HBase from HDFS.
- Wrote Pig scripts and Sqoop jobs to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Installed and configured Pig, developed Pig Latin scripts, and wrote MapReduce jobs using Pig Latin.
- Developed data pipelines using Pig and Hive from Teradata and DB2 data sources; these pipelines used customized UDFs to extend the ETL functionality.
- Used Tableau to create dashboards and BI reports for management.
- Developed Hive queries for data analysis to meet the business requirements per the functional specifications.
- Involved in defining job flows and managing and reviewing Hadoop log files.
- Implemented various requirements using Pig Latin scripts.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for implementing MongoDB to store and analyze unstructured data.
- Supported MapReduce programs and business-specific JAR files run on the cluster.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in end-to-end data transformation and implementation of ETL logic.
- Migrated ETL processes from Oracle to Hive to evaluate easier data manipulation.
- Used CHC techniques to transform the data layers from sources to targets.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs, with fine tuning.
- Worked on the NoSQL databases HBase and Cassandra for internal data storage and test validations.
- Implemented and managed CDH3 Hadoop clusters.
- Created specific HBase tables to store data arriving in varied formats from different portfolios.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Implemented best-income logic using Pig scripts.
- Managed cluster coordination services with ZooKeeper.
- Transported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Created jobs to automate application data events using Linux shell scripting.
- Designed the logical and physical data models, generated DDL scripts, and wrote DML scripts for the Oracle 9i database.
- Used the Hibernate ORM framework with Spring for data persistence and transaction management, and used the Struts validation framework for form-level validation.
- Involved in managing templates and screens in XML and JavaScript.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Pig, ZooKeeper, MongoDB, Sqoop, CentOS, Solr, Struts 1.3, JSP, Servlets 2.5, WebSphere 6.1, XML, JavaScript.
Confidential, Concord, California
Hadoop Developer
Responsibilities:
- Led a team of three developers that built a scalable distributed data solution using Hadoop on a 30-node AWS cluster to run analysis on 25+ terabytes of customer usage data.
- Developed several new MapReduce programs to analyze and transform the data and uncover insights into customer usage patterns.
- Used MapReduce to index the large volume of data for easy access to specific records.
- Performed ETL using Pig, Hive, and MapReduce to transform transactional data into denormalized form.
- Configured periodic incremental imports of data from DB2 into HDFS using Sqoop.
- Exported data using Sqoop from HDFS to Teradata on a regular basis.
- Developed ETL scripts for data acquisition and transformation using Informatica and Talend.
- Installed and configured Flume, Hive, Pig, Sqoop, and HBase on the Hadoop cluster.
- Exported analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Supported setting up the QA environment and updating configurations for implementing Pig and Sqoop scripts.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
- Wrote Pig and Hive UDFs to analyze complex data and find specific user behavior.
- Used the Oozie workflow engine to schedule multiple recurring and ad hoc Hive and Pig jobs.
- Created HBase tables to store various data formats coming from different portfolios.
- Created Python scripts to automate the workflows.
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
- Designed and implemented Hive and Pig UDFs in Python for evaluating, filtering, loading, and storing data.
- Developed simple to complex MapReduce streaming jobs in Python, invoked through Hive and Pig.
- Tibco JasperSoft was used for embedding BI reports.
- Experience writing Python scripts for automated jobs.
- Assisted the team responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
- Converted Teradata and RDBMS data into Hadoop backlog files.
- Worked actively with various teams to understand and accumulate data from different sources based on the business requirements.
- Worked with the testing teams to fix bugs and ensure smooth and error-free code.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, ZooKeeper, PL/SQL, MySQL, DB2, Teradata.
Confidential
Java/ J2EE Developer
Responsibilities:
- Involved in developing the application on the Java/J2EE platform. Implemented the Model-View-Controller (MVC) structure using Struts.
- Responsible for enhancing the portal UI using HTML, JavaScript, XML, JSP, Java, and CSS per the requirements, providing client-side JavaScript validations and server-side validation with the Bean Validation Framework (JSR 303).
- Used Spring Core Annotations for Dependency Injection.
- Used Hibernate as the persistence framework, mapping the ORM objects to tables using Hibernate annotations.
- Responsible for writing the various service classes and utility APIs used across the framework.
- Used Axis to implement Web Services for integration of different systems.
- Developed Web services component using XML, WSDL, and SOAP with DOM parser to transfer and transform data between applications.
- Exposed various capabilities as Web Services using SOAP/WSDL.
- Used SoapUI to test the web services by sending SOAP requests.
- Used AJAX framework for server communication and seamless user experience.
- Created a test framework on Selenium and executed web testing in Chrome, IE, and Mozilla Firefox through WebDriver.
- Used client-side JavaScript and jQuery to design tabs and dialog boxes.
- Created UNIX shell scripts to automate the build process, to perform regular jobs like file transfers between different hosts.
- Used Log4j for logging output to files.
- Used JUnit/Eclipse for the unit testing of various modules.
- Involved in production support, monitoring server and error logs, foreseeing potential issues, and escalating them to higher levels.
Environment: Java, J2EE, JSP, Servlets, Spring, Custom Tags, Java Beans, JMS, Hibernate, IBM MQ Series, Ajax, JUnit, Log4j, JNDI, Oracle, XML, SAX, Rational Rose, UML
Confidential
Java Application Developer
Responsibilities:
- Assisted in designing and programming the system, including development of the process flow diagram, entity relationship diagram, data flow diagram, and database design.
- Involved in the transactions, login, and reporting modules and in customized report generation using controllers; tested and debugged the whole project for proper functionality and documented the modules developed.
- Designed front end components using JSF.
- Involved in developing Java APIs, which communicates with the Java Beans.
- Implemented MVC architecture using Java, Custom and JSTL tag libraries.
- Involved in the development of POJO classes and writing Hibernate Query Language (HQL) queries.
- Implemented MVC architecture and the DAO design pattern for maximum abstraction of the application and code reusability.
- Created Stored Procedures using SQL/PL-SQL for data modification.
- Used XML, XSL for Data presentation, Report generation and customer feedback documents.
- Used Java Beans to automate the generation of Dynamic Reports and for customer transactions.
- Developed JUnit test cases for regression testing and integrated with ANT build.
- Implemented Logging framework using Log4J.
- Involved in code review and documentation review of technical artifacts.
Environment: J2EE/Java, JSP, Servlets, JSF, Hibernate, Spring, JavaBeans, XML, XSL, HTML, DHTML, JavaScript, CVS, JDBC, Log4J, Oracle 9i, IBM WebSphere Application Server