Sr. Hadoop Developer Resume
Waltham, MA
SUMMARY
- 8+ years of overall IT experience across a variety of industries, including 3+ years of hands-on experience in Big Data technologies and in designing and implementing MapReduce jobs
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm
- Expertise with the tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Storm, Kafka, YARN, Oozie, and ZooKeeper.
- Experience in migrating data using Sqoop from HDFS to relational database systems and vice versa according to client requirements.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java (a minimal sketch follows this summary).
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase, Cassandra, and MongoDB.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Extensive experience importing and exporting data using Flume and Kafka.
- Very good experience with the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
- Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
- Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
- Experience in XML, XSLT, XSD, XQuery.
- Strong experience with data warehousing ETL concepts using Informatica PowerCenter, Informatica PowerMart, OLAP, OLTP, and AutoSys.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
- Worked in large and small teams for systems requirement, design & development.
- Key participant in all phases of software development life cycle with Analysis, Design, Development, Integration, Implementation, Debugging, and Testing of Software Applications in client server environment, Object Oriented Technology and Web based applications.
- Experience using IDEs such as Eclipse and MyEclipse, and version control repositories such as SVN and CVS.
- Experience using build tools such as Ant and Maven.
- Prepared standard coding guidelines and analysis and testing documentation.
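The custom MapReduce work noted above follows the standard mapper/reducer pattern. Below is a minimal, illustrative word-count-style job in Java; the class names and input/output paths are assumptions for illustration only, not code from any engagement listed here.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input split.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word-count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // assumed HDFS input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // assumed HDFS output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```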
TECHNICAL SKILLS
Big Data/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, ZooKeeper, and Oozie.
NoSQL Databases: HBase, Cassandra.
Languages: C, Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, JavaScript, UNIX shell scripting.
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB.
Application Servers: WebLogic, WebSphere, JBoss, Tomcat.
Cloud Computing Tools: Amazon AWS.
Databases: Microsoft SQL Server, MySQL, Oracle, DB2
Operating Systems: UNIX, Windows, LINUX.
Build Tools: Jenkins, Maven, ANT.
Business Intelligence Tools: Tableau, Splunk
Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans.
Development Methodologies: Agile/Scrum, Waterfall.
PROFESSIONAL EXPERIENCE
Confidential - Waltham, MA
Sr. Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark 1.3 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Performance-tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Handled large datasets during the ingestion process itself using partitions, Spark's in-memory capabilities, broadcast variables, and effective and efficient joins and transformations.
- Migrated MapReduce programs into Spark transformations using Spark and Scala.
- Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications before adopting it in the project.
- Worked on a cluster of 130 nodes.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Managed and reviewed Hadoop log files using Elasticsearch.
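A minimal sketch of the Kafka-to-Cassandra streaming flow described in this role, shown in Java for consistency with the other examples here (the project code itself was written in Scala). The broker address, topic name, class name, and batch interval are assumptions; the Cassandra write is indicated only as a comment, since that step used the spark-cassandra-connector.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class LearnerEventStream {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("learner-event-stream");
        // 10-second micro-batches (assumed batch interval).
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "broker1:9092");              // assumed broker
        Set<String> topics = new HashSet<>(Arrays.asList("learner-events"));  // assumed topic

        // Receiver-less direct stream of (key, value) pairs from Kafka (Spark 1.3+).
        JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
                jssc, String.class, String.class,
                StringDecoder.class, StringDecoder.class, kafkaParams, topics);

        // Keep only the message payload and drop empty records.
        JavaDStream<String> events = messages
                .map(tuple -> tuple._2())
                .filter(line -> line != null && !line.isEmpty());

        // In the project, each micro-batch was persisted to Cassandra from a
        // foreachRDD(...) block via the spark-cassandra-connector; here we only
        // print a sample of each batch.
        events.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```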
Environment: Hadoop, Spark Core, Spark Streaming, Spark SQL, Scala, Python, Kafka, Hive, Sqoop, Impala, Oozie, Elasticsearch, Cassandra, Tableau, Cloudera, Oracle 10g, Linux.
Confidential - Newark, NJ
Sr. Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
- Managed the fully distributed Hadoop cluster as an additional responsibility; was trained to take over Hadoop administrator duties, including managing the cluster, upgrading the Hadoop ecosystem, and installing tools that use the Hadoop ecosystem.
- Installed and configured ZooKeeper to coordinate and monitor cluster resources.
- Used Apache Storm for running near-real-time data processing applications.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Consumed data from Kafka topics using Storm (see the sketch after this list).
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Storm cluster.
- Configured different topologies for the Storm cluster and deployed them on a regular basis.
- Created HBase tables to store various data formats of incoming data from different portfolios.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Involved in loading data from the Linux file system into HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Processed unstructured data using Pig and Hive.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Used reporting tools such as Tableau to generate daily reports on the processed data.
- Ran Hadoop streaming jobs to process terabytes of XML-format data.
- Supported MapReduce programs written in Python running on the cluster.
- Gained experience in managing and reviewing Hadoop log files.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Exported result sets from Hive to MySQL using shell scripts.
- Actively involved in code review and bug fixing for improving the performance.
- Involved in development, building, testing, and deployment to the Hadoop cluster in distributed mode.
- Created Linux scripts to automate the daily ingestion of IVR data.
- Processed the raw data using Hive jobs and scheduled them via crontab.
- Helped the analytics team with Aster queries using HCatalog.
- Automated the History and Purge Process.
- Developed the verification and control process for daily load.
- Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
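A minimal sketch of the Storm-based Kafka consumption described above, using Storm 1.x package names. The ZooKeeper address, topic, consumer id, and parallelism values are assumptions, and the bolt only parses and logs each message; the production bolts enriched records and wrote them to HBase.

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class PortfolioTopology {

    // Bolt that parses each raw Kafka message; real bolts would enrich the
    // record and write it to HBase.
    public static class ParseBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String line = input.getString(0);
            // parse/validate the record here
            System.out.println("Received: " + line);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: no downstream fields declared
        }
    }

    public static void main(String[] args) throws Exception {
        ZkHosts zkHosts = new ZkHosts("zk1:2181");                     // assumed ZooKeeper quorum
        SpoutConfig spoutConfig = new SpoutConfig(
                zkHosts, "portfolio-events", "/portfolio-events", "storm-consumer"); // assumed topic/id
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 2);
        builder.setBolt("parse-bolt", new ParseBolt(), 4).shuffleGrouping("kafka-spout");

        Config config = new Config();
        config.setNumWorkers(2);
        StormSubmitter.submitTopology("portfolio-topology", config, builder.createTopology());
    }
}
```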
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Storm, ZooKeeper, Kafka, Tableau, HBase, Linux, Hortonworks, MySQL.
Confidential - NYC, NY
Hadoop/Big Data Developer
Responsibilities:
- Worked on the proof of concept for initiating the Apache Hadoop 1.20.2 framework.
- Installed and configured Hadoop clusters and ecosystem components.
- Developed automated scripts to install Hadoop clusters
- Involved in all phases of the Big Data implementation, including requirement analysis, design, development, building, testing, and deployment of the Hadoop cluster in fully distributed mode.
- Mapped DB2 V9.7 and V10.x data types to Hive data types and validated the mappings.
- Loaded and retrieved unstructured data (CLOB, BLOB, etc.).
- Developed Hive jobs to transfer 8 years of bulk data from DB2 to the HDFS layer.
- Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts.
- Built a job automation framework to support and operationalize data loads.
- Automated the DDL creation process in Hive by mapping the DB2 data types (see the sketch after this list).
- Monitored Hadoop cluster job performance and capacity planning
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked with the Hadoop framework, HDFS, and the MapReduce processing implementation.
- Tuned Hadoop performance for high availability and was involved in the recovery of Hadoop clusters.
- Responsible for coding Java batch programs, RESTful services, MapReduce programs, and Hive queries, as well as testing, debugging, peer code review, troubleshooting, and maintaining status reports.
- Used Avro and Parquet file formats for data serialization.
- Developed several test cases using MRUnit for testing MapReduce applications.
- Responsible for troubleshooting and resolving the performance issues of Hadoop cluster.
- Used bzip2 compression to compress files before loading them into Hive.
- Supported and troubleshot Hive programs running on the cluster and was involved in fixing issues arising out of duration testing.
- Prepared daily and weekly project status reports and shared them with the client.
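A minimal sketch of the DB2-to-Hive DDL automation described above: DB2 column types are mapped to Hive types and a CREATE TABLE statement is emitted. The type mapping is simplified and the table and column names are assumptions; the real process read column metadata from the DB2 catalog.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HiveDdlGenerator {

    // Conservative DB2 -> Hive type mapping (simplified).
    static String hiveType(String db2Type) {
        String t = db2Type.toUpperCase();
        if (t.startsWith("VARCHAR") || t.startsWith("CHAR") || t.equals("CLOB")) return "STRING";
        if (t.equals("SMALLINT")) return "SMALLINT";
        if (t.equals("INTEGER")) return "INT";
        if (t.equals("BIGINT")) return "BIGINT";
        if (t.startsWith("DECIMAL")) return t;              // DECIMAL(p,s) carries over as-is
        if (t.equals("DATE") || t.equals("TIMESTAMP")) return "TIMESTAMP";
        if (t.equals("BLOB")) return "BINARY";
        return "STRING";                                     // safe default
    }

    // Build a CREATE TABLE statement from (column name -> DB2 type) pairs.
    static String createTableDdl(String table, Map<String, String> columns) {
        StringBuilder ddl = new StringBuilder("CREATE EXTERNAL TABLE IF NOT EXISTS " + table + " (\n");
        int i = 0;
        for (Map.Entry<String, String> col : columns.entrySet()) {
            ddl.append("  ").append(col.getKey()).append(" ").append(hiveType(col.getValue()));
            ddl.append(++i < columns.size() ? ",\n" : "\n");
        }
        return ddl.append(") STORED AS PARQUET").toString();
    }

    public static void main(String[] args) {
        // Hypothetical columns; in practice these came from the DB2 catalog.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("acct_id", "BIGINT");
        columns.put("acct_name", "VARCHAR(100)");
        columns.put("open_dt", "DATE");
        System.out.println(createTableDdl("stg.accounts", columns));
    }
}
```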
Environment: Hadoop, MapReduce, Flume, Sqoop, Hive, Pig, RESTful services, MRUnit, MS SQL Server, DB2, HBase
Confidential
Sr. Java/J2EE Developer
Responsibilities:
- Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.
- Involved in the implementation of design using vital phases of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.
- Applied OOAD principle for the analysis and design of the system.
- Implemented XML Schema as part of XQuery query language
- Applied J2EE design patterns like Singleton, Business Delegate, Service Locator, Data Transfer Object (DTO), Data Access Objects (DAO) and Adapter during the development of components.
- Used RAD for the Development, Testing and Debugging of the application.
- Used WebSphere Application Server to deploy the builds.
- Developed front-end screens using Struts, JSP, HTML, AJAX, jQuery, JavaScript, JSON, and CSS.
- Used J2EE for the development of business layer services.
- Developed Struts Action Forms, Action classes and performed action mapping using Struts.
- Performed data validation in Struts Form beans and Action Classes.
- Developed a POJO-based programming model using the Spring framework.
- Used IOC (Inversion of Control) Pattern and Dependency Injection of Spring framework for wiring and managing business objects.
- Used Web Services to connect to mainframe for the validation of the data.
- Used SOAP as the protocol to send requests and responses in the form of XML messages.
- Used the JDBC framework to connect the application to the database.
- Used Eclipse for the Development, Testing and Debugging of the application.
- Used the Log4j framework for logging debug, info, and error data.
- Used the Hibernate framework for object-relational mapping (a DAO sketch follows this list).
- Used Oracle 10g database for data persistence.
- SQL Developer was used as a database client.
- Extensively worked on Windows and UNIX operating systems.
- Used SecureCRT to transfer files from the local system to the UNIX system.
- Performed Test Driven Development (TDD) using JUnit.
- Used Ant script for build automation.
- PVCS version control system has been used to check-in and checkout the developed artifacts. The version control system has been integrated with Eclipse IDE.
- Used Rational ClearQuest for defect logging and issue tracking.
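A minimal sketch of the DAO pattern with Hibernate and Spring constructor injection described in this role. The RiskItem entity, its fields, and the query are hypothetical; the Hibernate mapping and the Spring bean wiring are not shown.

```java
import java.util.List;

import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class RiskItemDao {

    private final SessionFactory sessionFactory;

    // Constructor injection: Spring supplies the SessionFactory bean.
    public RiskItemDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // Persist a new risk item in its own transaction.
    public void save(RiskItem item) {
        Session session = sessionFactory.openSession();
        try {
            session.beginTransaction();
            session.save(item);
            session.getTransaction().commit();
        } finally {
            session.close();
        }
    }

    // Fetch all risk items with the given status via HQL.
    @SuppressWarnings("unchecked")
    public List<RiskItem> findByStatus(String status) {
        Session session = sessionFactory.openSession();
        try {
            return session.createQuery("from RiskItem where status = :status")
                          .setParameter("status", status)
                          .list();
        } finally {
            session.close();
        }
    }
}

// Hypothetical entity; its Hibernate mapping (annotations or hbm.xml) is not shown.
class RiskItem {
    private Long id;
    private String status;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}
```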
Environment: Windows XP, UNIX, RAD 7.0, Core Java, J2EE, Struts, Spring, Hibernate, Web Services, Design Patterns, WebSphere, Ant, Servlets, JSP, HTML, AJAX, JavaScript, CSS, jQuery, JSON, SOAP, WSDL, XML, Eclipse, Oracle 10g, WinSCP, Log4j, JUnit.
Confidential
Java/J2EE Developer
Responsibilities:
- Designed and developed the application using agile methodology.
- Implemented new module development and new change requirements, and fixed defects identified in pre-production and production environments.
- Wrote technical design document with class, sequence, and activity diagrams in each use case.
- Created Wiki pages using Confluence Documentation.
- Developed various reusable helper and utility classes which were used across all modules of the application.
- Involved in developing XML compilers using XQuery.
- Developed the application using the Spring MVC framework by implementing controller and service classes (see the sketch after this list).
- Involved in writing the Spring configuration XML file containing bean declarations and declarations of their dependent objects.
- Used Hibernate as the persistence framework; involved in creating DAOs and used Hibernate for ORM mapping.
- Wrote Java classes to test the UI and web services through JUnit.
- Designed the mappings to extract, transform, and load data from source to target systems using Informatica PowerCenter.
- Performed functional and integration testing and was extensively involved in release/deployment-related critical activities. Responsible for designing rich user interface applications using JSP, JSP tag libraries, Spring tag libraries, JavaScript, CSS, and HTML.
- Used SVN for version control. Log4J was used to log both User Interface and Domain Level Messages.
- Used Soap UI for testing the Web Services.
- Used Maven for dependency management and project structure.
- Created deployment documents for various environments such as Test, QC, and UAT.
- Involved in system wide enhancements supporting the entire system and fixing reported bugs.
- Explored Spring MVC, Spring IOC, Spring AOP, and Hibernate in creating the POC.
- Performed front-end data manipulation using JavaScript and JSON.
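A minimal sketch of the Spring MVC controller/service layering described above. The AccountController, AccountService, URL, and view name are hypothetical; the view resolver and the service implementation are assumed to be configured elsewhere in the Spring context.

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

@Controller
public class AccountController {

    private final AccountService accountService;

    // Constructor injection of the service layer via Spring.
    @Autowired
    public AccountController(AccountService accountService) {
        this.accountService = accountService;
    }

    // Renders the account detail page for the requested account id.
    @RequestMapping(value = "/accounts/{id}", method = RequestMethod.GET)
    public String viewAccount(@PathVariable("id") long id, Model model) {
        model.addAttribute("account", accountService.findById(id));
        return "accountDetail"; // resolved to a JSP by the configured view resolver
    }
}

// Hypothetical service interface, implemented elsewhere and wired in by Spring.
interface AccountService {
    Object findById(long id);
}
```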
Environment: Java, J2EE, JSP, Spring, Hibernate, CSS, JavaScript, Oracle, JBoss, Maven, Eclipse, Informatica PowerCenter 7.1, JUnit, Log4j, AJAX, Web Services, JNDI, JMS, HTML, XML, XSD, XML Schema