Hadoop/Spark/Scala Developer Resume
St. Louis, Missouri
SUMMARY:
- 8 years of IT experience in Design, Development, Deployment, Maintenance and Support of Java/J2EE applications. Focused on quality and efficiency.
- 3 years of experience with the Hadoop Distributed File System (HDFS), Impala, Hive, HBase, Spark, Hue, the MapReduce framework and Sqoop.
- Experienced Hadoop developer with expertise in providing end-to-end solutions for real-time big data problems by implementing distributed processing concepts such as MapReduce on HDFS and other Hadoop ecosystem components.
- Experienced with NoSQL databases like HBase, MongoDB and Cassandra.
- Experience working on large-scale big data implementations in production environments.
- Hands-on experience with data migration from relational databases to the Hadoop platform using Sqoop.
- Extensively used Apache Flume to collect logs and error messages across the cluster.
- Experienced in using Pig scripts to perform transformations, event joins, filters and some pre-aggregations before storing the data onto HDFS.
- Around one year of experience with Spark and Scala.
- Developed analytical components using Scala, Spark, Apache Mesos and Spark Streaming.
- Experience in Complete Software Development Life Cycle (SDLC) which includes Requirement Analysis, Design, Coding, Testing and Implementation using Agile (Scrum), TDD and other development methodologies.
- Expertise in developing both Front End and Back End applications using Java, Servlets, JSP, Web Services, JavaScript, HTML, Spring, Hibernate, JDBC, XML, JSON.
- Worked on WebLogic and Tomcat web servers for development and deployment of Java/J2EE applications.
- Good experience with Spring and Hibernate, and expertise in developing JavaBeans.
- Working knowledge of WebLogic server clustering.
- Proficient in various web based technologies like HTML, XML, XSLT, and JavaScript.
- Expertise in unit testing using JUnit.
- Experience in error logging and debugging using Log4J.
- Strong knowledge in creating/reviewing data models in RDBMSs such as Oracle 10g and MySQL.
- Responsible for the formation and direction of Business Intelligence, Data Governance, Enterprise Data Warehouse (EDW) and Enterprise Data Management (EDM) initiatives (Oracle Appliance (11g), Informatica 9, Business Objects XI 3.1, Erwin).
- Worked with operating systems like Linux, UNIX, Solaris, and Windows 2000/XP/Vista/7.
- Experience working with versioning tools like Git, CVS and ClearCase.
- Goal-oriented, organized team player with good interpersonal skills; thrives in group environments as well as individually.
- Strong business and application analysis skills with excellent communication and professional abilities.
TECHNICAL SKILLS:
Languages: Java, PL/SQL, Scala
Big Data: Apache Hadoop, Hive, HDFS, Spark, MapReduce, Sqoop
RDBMS: Oracle, SQL Server, Teradata
Scripting Languages: UNIX shell script, JavaScript, Python
Web Servers: Tomcat 7.x.
Tools and Utilities: MS Team Foundation Server, SVN, Maven, Gradle
Development Tools: Eclipse, IntelliJ IDEA
Operating systems: Windows NT/2000/XP, UNIX, Linux
Methodology: Waterfall, Agile Methodologies.
PROFESSIONAL EXPERIENCE:
Confidential, St. Louis, Missouri
Hadoop/Spark/Scala Developer
Responsibilities:
- Developed scripts to perform business transformations on the data using Hive and Pig.
- Developed UDFs in Java for Hive and Pig.
- Worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a minimal sketch follows this list).
- Developed multiple POCs using Scala and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed the solution for implementation in Scala.
- Performed data analysis with Pig, MapReduce and Hive.
- Designed and developed the data ingestion component.
- Provided cluster coordination services through ZooKeeper.
- Imported data from Oracle to HDFS using Sqoop.
- Imported and exported data between HDFS and the Teradata relational database using Sqoop.
- Developed a POC on Apache Spark and Kafka.
- Implemented Flume, Spark and Spark Streaming frameworks for real-time data processing.
- Hands-on experience installing, configuring and using ecosystem components like Hadoop MapReduce, HDFS, HBase, Pig, Flume, Hive and Sqoop.
- Developed analytical components using Scala, Spark and Spark Streaming.
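A minimal sketch of the kind of Hive-to-Spark conversion referenced above, written against the Spark RDD API in Scala. The HDFS paths, record layout and the aggregation shown (a SUM per customer) are illustrative assumptions, not the actual project code.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example: the Hive query
//   SELECT customer_id, SUM(amount) FROM transactions GROUP BY customer_id
// expressed as Spark RDD transformations.
object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-to-spark-sketch"))

    // Assume a comma-delimited export on HDFS: customer_id,amount,...
    val lines = sc.textFile("hdfs:///data/transactions/*.csv")

    val totalsByCustomer = lines
      .map(_.split(","))                         // tokenize each record
      .filter(_.length >= 2)                     // drop malformed rows (acts like a WHERE clause)
      .map(cols => (cols(0), cols(1).toDouble))  // key by customer_id, keep amount
      .reduceByKey(_ + _)                        // GROUP BY customer_id, SUM(amount)

    totalsByCustomer.saveAsTextFile("hdfs:///output/customer_totals")
    sc.stop()
  }
}
```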
Environment: Java, Scala, Python, J2EE, Hadoop, Spark, Cassandra, HBase, Hive, Pig, Sqoop, MySQL, Teradata, GitHub
Confidential, Charlotte, NC
Hadoop/Spark Developer
Responsibilities:
- Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Developed and executed shell scripts to automate the jobs
- Wrote complex Hive queries and UDFs.
- Worked on reading multiple data formats on HDFS using PySpark
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Developed multiple POCs using PySpark and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
- Involved in loading data from UNIX file system to HDFS
- Extracted the data from Teradata into HDFS using Sqoop
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, Spark and loaded data into HDFS.
- Managed and reviewed Hadoop log files.
- Involved in analysis, design, testing phases and responsible for documenting technical specifications
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (a minimal sketch of the Kafka-to-Spark piece follows this list).
- As Scrum Master for the Shared Risk Platform team within the Operational Risk technology area, established the Scrum process, coached the team on Agile principles, values and practices, introduced JIRA as the Agile tool, and led the team as a servant leader to deliver multiple increments.
- Worked extensively on the core and Spark SQL modules of Spark.
- Experienced in running Hadoop streaming jobs to process terabytes of data.
- Involved in importing real-time data into Hadoop using Kafka and implemented the Oozie job for daily imports.
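A minimal sketch of the Kafka-to-Spark-Streaming ingestion described above, using the receiver-based KafkaUtils API from the Kafka 0.8-era spark-streaming-kafka module. The ZooKeeper quorum, topic name, consumer group and landing path are assumptions for illustration only.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    // Receiver-based Kafka stream (topic and connection details are assumed)
    val messages = KafkaUtils.createStream(
      ssc,
      "zk-host:2181",              // ZooKeeper quorum
      "risk-consumer-group",       // consumer group id
      Map("risk-events" -> 1))     // topic -> number of receiver threads

    // Keep only the message payload and land each batch on HDFS
    // for downstream Hive/Oozie processing
    messages.map(_._2)
            .saveAsTextFiles("hdfs:///landing/risk-events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```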
Environment: Hadoop, HDFS, Hive, Python, Scala, Spark, SQL, Teradata, UNIX Shell Scripting
Confidential
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop on a cluster.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Extended Hive and Pig core functionality by writing custom UDFs (a minimal sketch follows this list).
- Analyzed large data sets by running Hive queries and Pig scripts
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Acted as Scrum Master for Product teams with a focus on guiding the teams towards improving the way they work.
- Experienced in defining job flows using Oozie
- Experienced in managing and reviewing Hadoop log files
- Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources and applications.
- Working knowledge of NoSQL databases like HBase and Cassandra.
- Good Knowledge of analyzing data in HBase using Hive and Pig.
- Involved in Unit level and Integration level testing.
- Prepared design documents and functional documents.
- Added extra nodes to the cluster, based on requirements, to keep it scalable.
- Involved in running Hadoop jobs for processing millions of records of text data
- Involved in loading data from local file system (LINUX) to HDFS
- Experienced in running Hadoop streaming jobs to process terabytes of XML data.
- Assisted in exporting analyzed data to relational databases using Sqoop
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing
- Submitted a detailed report about daily activities on a weekly basis.
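A minimal sketch of a custom Hive UDF of the kind mentioned above. The class name and normalization logic are hypothetical; the project's actual UDFs are not shown in the resume.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Normalizes free-text codes to trimmed, upper-cased values
// so that joins and GROUP BYs line up across sources.
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```

Once packaged into a jar, such a UDF would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.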
Environment: Hadoop (HDFS), Pig, Sqoop, HBase, Hive, Flume, MapReduce, Cassandra, Oozie and MySQL
Confidential, Madison WI
Hadoop Developer
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Cassandra, Zookeeper and Sqoop.
- Involved with Business Analysts in gathering requirements.
- Involved in designing Logical/Physical Data Models.
- Deployed Hadoop Cluster in Pseudo-distributed and Fully Distributed modes.
- Involved in running ad-hoc queries through Pig Latin, Hive or Java MapReduce.
- Created complex mappings using different transformations like Filter, Router, Connected & Unconnected lookups, Stored Procedure, Joiner, Update Strategy, Union, Expression and Aggregator transformations to pipeline data to DataMart. Also, made use of variables and parameters.
- Developed PowerCenter mappings to extract data from various databases and flat files and load it into the DataMart using Informatica 8.6.1.
- Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
- Involved in Big data analysis using Pig and User defined functions (UDF).
- Managed and scheduled Jobs on a Hadoop cluster.
- Involved in log file management: logs older than 7 days were removed from the log folder, loaded into HDFS and retained for 3 months (a minimal sketch follows this list).
- Implemented NameNode metadata backup using NFS for high availability.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Set up standards and processes for Hadoop based application design and implementation.
- Created Hive External tables and loaded the data into tables and query data using HQL.
- Implemented various Performance Tuning techniques on Sources, Targets, Mappings, and Workflows.
- Wrote UNIX shell scripts to execute the workflow in a loop to process 'n' files, and FTP scripts to pull files from the FTP server to the Linux server.
- Developed and followed the Agile project management plan (Agile ceremonies); facilitated building the requirements log (product backlog) with cost estimates and priorities.
- Conducted Scrum daily standup, backlog refinement, sprint planning, sprint review and sprint retrospective meetings.
- Determined team capacity (velocity) from historical data; created the work breakdown structure (user stories) and corresponding activities (tasks).
- Worked on Hadoop backup, recovery and upgrades.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the users' MapReduce jobs.
- Worked with the reporting team to generate reports from the Data Mart using Cognos.
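A minimal sketch of the log-retention housekeeping described above, using the Hadoop FileSystem API from Scala. The local log directory, archive path and 7-day cutoff are assumptions used only to illustrate the approach; purging the HDFS copies older than 3 months would be a separate job.

```scala
import java.io.File
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LogArchiverSketch {
  private val SevenDaysMs = 7L * 24 * 60 * 60 * 1000

  def main(args: Array[String]): Unit = {
    val localLogDir = new File("/var/log/webapp")          // assumed local log folder
    val hdfs        = FileSystem.get(new Configuration())  // picks up core-site.xml/hdfs-site.xml
    val archiveDir  = new Path("/archive/weblogs")          // assumed HDFS archive location
    val cutoff      = System.currentTimeMillis() - SevenDaysMs

    hdfs.mkdirs(archiveDir)

    // Copy logs older than 7 days into HDFS, then delete the local copy.
    Option(localLogDir.listFiles()).getOrElse(Array.empty)
      .filter(f => f.isFile && f.lastModified() < cutoff)
      .foreach { f =>
        hdfs.copyFromLocalFile(true /* delete source */, new Path(f.getAbsolutePath), archiveDir)
      }

    hdfs.close()
  }
}
```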
Environment: Apache Hadoop, EDW, EDM, Informatica PowerCenter 8.6/8.1, SQL Server 2005, TOAD, Rapid SQL, Oracle 10g (RAC), HDFS, MapReduce, MongoDB, Java, VMware, Hive, Eclipse, Pig, HBase, Sqoop, Flume, Linux, UNIX, DB2.
Confidential
Software Engineer
Responsibilities:
- Involved in different phases of Software Development Lifecycle (SDLC) like Requirements gathering, Analysis, Design and Development of the application.
- Wrote several Action Classes and Action Forms to capture user input and created different web pages using JSTL, JSP, HTML, Custom Tags and Struts Tags.
- Designed and developed Message Flows, Message Sets and other service components to expose Mainframe applications to enterprise J2EE applications.
- Used standard data access technologies like JDBC and ORM tools like Hibernate.
- Worked on various client websites that used Struts 1 framework and Hibernate
- Wrote test cases using JUnit testing framework and configured applications on WebLogic Server
- Involved in writing stored procedures, views, user-defined functions and triggers in SQL Server database for Reports module.
Environment: Java, JSP, JUnit, Eclipse, JIRA, JDBC, Struts 1, Hibernate, Visual Source Safe (VSS), WebLogic, Oracle 9i.
Confidential
Java Developer
Responsibilities:
- Developed Web interface using JSP, Standard Tag Libraries (JSTL), and Struts Framework.
- Used Struts as MVC framework for designing the complete Web tier.
- Developed different GUI screens (JSPs) using HTML, DHTML and CSS to design the pages according to Client Experience Workbench standards.
- Validated the user input using Struts Validation Framework.
- Client side validations were implemented using JavaScript.
- Implemented the mechanism of logging and debugging with Log4j.
- Version control of the code and configuration files was maintained with CVS.
- Developed PL/SQL packages and triggers.
- Developed test cases for Unit testing and performed integration and system testing.
Environment: J2EE, Weblogic, Eclipse, Struts 1.0, JDBC, JavaScript, CSS, XML, ANT, Log4J, VSS, PL/SQL and Oracle 8i.