Sr. Apache Spark Developer Resume
San Jose, CA
SUMMARY:
- 8+ years of strong experience in software development using Big Data, Hadoop, Apache Spark, Java/J2EE, Scala, and Python technologies.
- Solid foundation in mathematics, probability, and statistics, with broad practical statistical and data mining skills cultivated through industry work and academic programs.
- Involved in all Software Development Life Cycle (SDLC) phases, including Analysis, Design, Implementation, Testing, and Maintenance.
- Strong technical, administration, and mentoring knowledge in Linux and Big Data/Hadoop technologies.
- Hands-on experience with major components of the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume, and Avro.
- Experienced in deploying Hadoop clusters using Puppet.
- Work experience with cloud infrastructure such as Amazon Web Services (AWS).
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes.
- Installing, configuring, and managing Hadoop clusters and Data Science tools.
- Managing Hadoop distributions with Cloudera Manager, Cloudera Navigator, and Hue.
- Setting up High Availability for Hadoop cluster components and edge nodes.
- Experience in developing shell scripts and Python scripts for system management.
- Well versed in software development methodologies such as Rapid Application Development (RAD), Agile, and Scrum.
- Experience with Object Oriented Analysis and Design (OOAD) methodologies.
- Experience in software installation, writing test cases, debugging, and testing of batch and online systems.
- Experience in Production, Quality Assurance (QA), System Integration Testing (SIT), and User Acceptance Testing (UAT).
- Expertise in J2EE technologies such as JSP, Servlets, EJB 2.0, JDBC, JNDI, and AJAX.
- Extensively worked on implementing SOA (Service Oriented Architecture) using XML Web services (SOAP, WSDL, UDDI, and XML parsers).
- Worked with XML parsers such as JAXP (SAX and DOM) and JAXB.
- Expertise in applying Java Messaging Service (JMS) for reliable information exchange across Java applications.
- Proficient with Core Java and AWT, as well as markup and scripting technologies such as HTML 5, XHTML, DHTML, CSS, XML 1.1, XSL, XSLT, XPath, XQuery, Angular.js, and Node.js.
- Worked with version control systems such as Subversion, Perforce, and Git, providing a common platform for all developers.
- Articulate in written and verbal communication along with strong interpersonal, analytical, and organizational skills
- Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies
- Creatively communicate and present models to business customers and executives, utilizing a variety of formats and visualization methodologies
TECHNICAL SKILLS:
- Java, Scala
- J2EE, Python
- RabbitMQ, R
- Big Data, Data Mining
- Hadoop, Apache Spark
- MapReduce, AWS
- HDFS, Apache Cassandra
- Apache Kafka, Apache Storm
- MongoDB, SQL
PROFESSIONAL EXPERIENCE:
Sr. Apache Spark Developer
Confidential, San Jose, CA
Environment: Apache Spark, Apache Kafka, Spark MLlib, Scala, Akka, Python, Cassandra, Hive, Storm, Pig, Big Data, R
Responsibilities:
- Collaborated on insights with other Data Scientists, Business Analysts, and Partners.
- Evaluated, refined, and continuously improved the efficiency and accuracy of existing predictive models.
- Utilized various data analysis and data visualization tools to accomplish data analysis, report design and report delivery.
- Developed Scala and SQL code to extract data from various databases
- Championed innovative ideas around Data Science and Advanced Analytics practices.
- Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
- Uploaded data to Hadoop Hive and combined new tables with existing databases.
- Developed statistical models to forecast inventory and procurement cycles.
- Developed Python code to perform data analysis and generate complex data reports.
- Deployed the Cassandra cluster in cloud (Amazon AWS) environment with scalable nodes as per the business requirement.
- Implemented the data backup strategies for the data in the Cassandra cluster.
- Generated data cubes using Hive, Pig, and Java MapReduce on a provisioned Hadoop cluster in AWS.
- Implemented the ETL design to load the MapReduce data cubes into the Cassandra cluster.
- Imported the data from relational databases into HDFS using Sqoop.
- Implemented a POC using Apache Impala for data processing on top of Hive.
- Utilized Python pandas DataFrames for data analysis.
- Utilized Python regular expression operations (NLP) to analyze customer reviews.
- Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, tuple stores, NoSQL, Hadoop, Pig, MySQL, and Oracle.
- Used Spark MLlib libraries for designing recommendation engines (see the sketch after this list).
- Performed predictive analysis using statistical methods in R.
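A minimal sketch of an MLlib-based recommendation engine like the one described above, assuming ratings land in HDFS as userId,productId,rating CSV lines; the path, ALS hyperparameters, and sample user ID are illustrative assumptions, not the original project's values:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    object RecommendationEngine {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("RecommendationEngine"))

        // Hypothetical input: CSV lines of userId,productId,rating
        val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
          val Array(user, product, rating) = line.split(',')
          Rating(user.toInt, product.toInt, rating.toDouble)
        }

        // Train a collaborative-filtering model (rank = 10, 10 iterations, lambda = 0.01 are illustrative)
        val model = ALS.train(ratings, 10, 10, 0.01)

        // Recommend the top 5 products for a sample user
        model.recommendProducts(42, 5).foreach(println)

        sc.stop()
      }
    }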
Sr. Big Data Engineer
Confidential, San Francisco, CA
Environment: Hortonworks (HDP 2.2), HDFS, MapReduce, Apache Cassandra, Apache Kafka, YARN, Spark, Hive, Pig, Flume, Sqoop, Puppet, Oozie, ZooKeeper, Ambari, Oracle Database, MySQL, HBase, Spark SQL, Avro, Parquet, RCFile, JSON, UDFs, Java (JDK 1.7), CentOS
Responsibilities:
- Involved in the architecture design, development, and implementation of Hadoop deployment, backup, and recovery systems.
- Experience in working on multi-petabyte clusters, covering both administration and development.
- Developed Chef modules to automate the installation, configuration, and deployment of ecosystem tools, OSes, and network infrastructure at the cluster level.
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala (see the sketch after this list).
- Performed cluster co-ordination and assisted with data capacity planning and node forecasting using ZooKeeper.
- Implemented Hadoop framework to capture user navigation across the application to validate the user interface and provide analytic feedback/result to the UI team.
- Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
- Extracted data from Oracle and MySQL databases into HDFS using Sqoop.
- Optimized MapReduce jobs to use HDFS efficiently by applying Gzip, LZO, and Snappy compression techniques.
- Experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Created Hive tables to store the processed results in a tabular format and wrote Hive scripts to transform and aggregate the disparate data.
- Experience in using Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
- Responsible for cluster maintenance, rebalancing blocks, commissioning and decommissioning of nodes, monitoring and troubleshooting, manage and review data backups and log files.
- Drove the application from the development phase to the production phase using a Continuous Integration and Continuous Deployment (CI/CD) model with Chef, Maven, and Jenkins.
- Developed Pentaho Kettle graphs to cleanse and transform the raw data into useful information and load it to a Kafka queue (subsequently loaded to HDFS) and a Neo4j database for the UI team to display via the web application.
- Automated the extraction of data from warehouses and weblogs into Hive tables by developing workflows and coordinator jobs in Oozie.
- Scheduled volume snapshots for backup, performed root cause analysis of failures, and documented bugs and fixes for cluster downtime and maintenance.
- Tuned and modified SQL for batch and online processes.
- Commissioned and decommissioned nodes.
- Managed the cluster through performance tuning and enhancement.
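A minimal sketch of the kind of in-memory Spark/Scala text processing mentioned above, assuming Flume lands raw log lines in HDFS; the input path and the term-frequency rollup are illustrative stand-ins for the actual analytics:

    import org.apache.spark.{SparkConf, SparkContext}

    object TextAnalytics {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("TextAnalytics"))

        // Hypothetical input: raw log lines landed in HDFS by Flume
        val lines = sc.textFile("hdfs:///flume/weblogs/*")

        // Cache in memory since the RDD feeds several downstream aggregations
        val tokens = lines.flatMap(_.toLowerCase.split("\\W+")).filter(_.nonEmpty).cache()

        // Simple term-frequency rollup as a stand-in for richer text analytics
        val termCounts = tokens.map((_, 1L)).reduceByKey(_ + _)
        termCounts.sortBy(_._2, ascending = false).take(20).foreach(println)

        sc.stop()
      }
    }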
Big Data Engineer
Confidential, Chicago, IL
Environment: Hadoop 0.20.2, Pig, Hive, Apache Cassandra, Cloudera Manager
Responsibilities:
- Installed and configured Hadoop on multiple nodes on the Cloudera platform.
- Set up and optimized Standalone, Pseudo-Distributed, and Fully Distributed clusters.
- Developed simple to complex MapReduce streaming jobs, analyzing data with Hive, Pig, and Hadoop Streaming.
- Built, tuned, and maintained HiveQL and Pig scripts for reporting purposes.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Extracted data from MySQL into HDFS using Sqoop, and analyzed it with Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
- Stored the data in an Apache Cassandra cluster.
- Used Impala to query Hadoop data stored in HDFS.
- Managed and reviewed Hadoop log files.
- Supported and troubleshot MapReduce programs running on the cluster.
- Loaded data from the Linux file system into HDFS.
- Installed and configured Hive and wrote Hive UDFs (see the sketch after this list).
- Created tables, loaded data, and wrote queries in Hive.
- Developed scripts to automate routine DBA tasks using Linux shell scripts and Python.
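A minimal sketch of what a Hive UDF of this kind can look like, written in Scala for consistency with the other examples here; the class name, normalization logic, and registration commands are illustrative assumptions:

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Minimal Hive UDF that normalizes free-text fields before aggregation.
    // Register in Hive with (names illustrative):
    //   ADD JAR hive-udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
    class NormalizeText extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase.replaceAll("\\s+", " "))
      }
    }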
Big Data Engineer
Confidential, Saint Louis, MO
Environment: Hortonworks Hadoop 2.0, EMP, cloud infrastructure (Amazon AWS), Java, Python, HBase, Hadoop ecosystem, Linux, Scala, Play
Responsibilities:
- Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
- Designed and built the reporting application that uses Spark SQL to fetch and generate reports on HBase table data (see the sketch after this list).
- Extracted feeds from social media sites such as Facebook, Twitter using Python scripts.
- Implemented helper classes that access HBase directly from Java using the HBase Java API.
- Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
- Experienced in converting ETL operations to the Hadoop system using Pig Latin operations, transformations, and functions.
- Extracted the needed data from servers into HDFS and bulk-loaded the cleaned data into HBase.
- Handled time-series data in HBase, storing and analyzing it by time to improve query retrieval times.
- Worked with admins on installing and configuring MapReduce, Hive, and HDFS.
- Implemented CDH3 Hadoop cluster on CentOS, assisted with performance tuning and monitoring.
- Used Hive to analyze data ingested into HBase and compute various metrics for reporting on the dashboard.
- Managed and reviewed Hadoop log files.
- Worked on Scala Play framework for application development.
- Involved in review of functional and non-functional requirements.
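A minimal sketch of a Spark SQL report over HBase data, assuming the HBase table is exposed to Spark through a Hive external table mapped with the HBase storage handler; the table name, columns, and query are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HBaseReport {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HBaseReport"))
        val hiveContext = new HiveContext(sc)

        // Assumes `events` is a Hive external table mapped onto the HBase table
        // via the HBase storage handler; table and column names are illustrative.
        val report = hiveContext.sql(
          """SELECT user_id, COUNT(*) AS event_count
            |FROM events
            |GROUP BY user_id
            |ORDER BY event_count DESC
            |LIMIT 100""".stripMargin)

        report.show()
        sc.stop()
      }
    }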
Java/J2EE Developer
Confidential, Buffalo, NY
Environment: Java, J2EE, JSP 1.2, Spring 1.2, Hibernate 2.0, JSF 1.2, EJB 1.2, IBM WebSphere, Servlets, JDBC, XML, XSLT, DOM, CSS, HTML, DHTML, SQL, JavaScript, Log4j, Ant 1.6, WSAD 6.0, Oracle 9i, SOA, Web Services, WSDL
Responsibilities:
- Actively participated in requirements gathering, analysis, design, and testing phases.
- Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase using Rational Rose.
- Developed the entire application implementing MVC Architecture, integrating JSF with the Hibernate and Spring frameworks.
- Designed the user interface using Java Server Faces (JSF), Cascading Style Sheets (CSS), and XML.
- Used JNDI to perform lookup services for the various components of the system.
- Developed the Enterprise Java Beans (Stateless Session beans) to handle different transactions such as online funds transfer, bill payments to the service providers.
- Developed deployment descriptors for the EJBs to deploy on WebSphere Application Server.
- Implemented Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services (see the sketch after this list).
- Developed Web Services for data transfer from client to server and vice versa using Apache Axis, SOAP, WSDL, and UDDI.
- Developed XML documents and generated XSL files for the Payment Transaction and Reserve Transaction systems.
- Implemented various J2EE design patterns such as Singleton, Service Locator, Business Delegate, DAO, Transfer Object, and SOA.
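A minimal sketch of the JMS send path in an SOA setup like the one described above; the original work was Java/J2EE, but this sketch uses the same javax.jms API from Scala for consistency with the other examples here, and the JNDI names and message payload are illustrative:

    import javax.jms.{ConnectionFactory, Queue, Session}
    import javax.naming.InitialContext

    object PaymentSender {
      def main(args: Array[String]): Unit = {
        // Look up the JMS resources from JNDI; the names are illustrative
        val ctx = new InitialContext()
        val factory = ctx.lookup("jms/ConnectionFactory").asInstanceOf[ConnectionFactory]
        val queue = ctx.lookup("jms/PaymentQueue").asInstanceOf[Queue]

        val connection = factory.createConnection()
        try {
          // Non-transacted session with automatic acknowledgement
          val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
          val producer = session.createProducer(queue)

          // Send a simple text message; real payloads would carry the transaction details
          producer.send(session.createTextMessage("payment-id:12345"))
        } finally {
          connection.close()
        }
      }
    }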
Java/J2EE Developer
Confidential
Environment: Java, J2EE, JSP 1.2, Spring 1.2, Hibernate 2.0, JSF 1.2, EJB 1.2, IBM WebSphere 6.0, Servlets, JDBC, XML, XSLT, DOM, CSS, HTML, DHTML, SQL, JavaScript, Log4j, Ant 1.6, WSAD 6.0, Oracle 9i, Windows 2000
Responsibilities:
- Developed Servlets and Java Server Pages (JSP).
- Developed PL/SQL queries and wrote stored procedures and JDBC routines to generate reports based on client requirements.
- Enhanced the system according to customer requirements.
- Involved in the customization of the available functionalities of the software for an NBFC (Non-Banking Financial Company).
- Involved in putting proper review processes and documentation for functionality development.
- Providing support and guidance for Production and Implementation Issues.
- Used JavaScript validation in JSP.
- Used Hibernate framework to access the data from back-end SQL Server database.
- Used AJAX (Asynchronous JavaScript and XML) to implement user friendly and efficient client interface.
- Used MDBs (message-driven beans) to consume messages from JMS queues/topics.
- Designed and developed Web Application using Struts Framework.
- Used Ant to compile and generate EAR, WAR, and JAR files.
- Created test case scenarios for Functional Testing and wrote Unit test cases with JUnit.
- Responsible for integration, unit testing, system testing, and stress testing across all phases of the project.