Hadoop/Spark Developer Resume
San Francisco, CA
PROFESSIONAL SUMMARY:
- Over 9 years of IT experience in the design, development, deployment, maintenance, and support of Java applications, including close to 4 years of experience across the Big Data ecosystem with Spark and Hadoop.
- Extensive experience with Spark SQL and Spark Streaming, including performance tuning of Spark applications.
- Strong experience with AWS EMR, Spark installation, HDFS, and MapReduce architecture, along with good knowledge of Spark, Scala, and Hadoop distributions such as Apache Hadoop and Cloudera.
- Strong experience across the Hadoop and Spark ecosystems, including Hive, Pig, Sqoop, Flume, Kafka, Cassandra, Spark SQL, Spark Streaming, and Flink.
- Extensively used Apache Flume to collect logs and error messages across the cluster.
- Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
- Excellent skills in identifying and using the appropriate Big Data tools for a given task.
- Expertise in the design and implementation of Big Data solutions in the banking, insurance, and healthcare domains.
- Experience in data processing: collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Hands-on experience with data migration from relational databases to the Hadoop platform using Sqoop.
- Assisted in cluster maintenance, monitoring, and troubleshooting, and in managing and reviewing data backups and log files.
- Good working experience in client-side development with HTML, XHTML, CSS, JavaScript, JQuery, JSON and AJAX.
TECHNICAL SKILLS:
Hadoop Framework: HDFS, Hive, Pig, Flume, Spark, Oozie, Zookeeper, HBase and Sqoop
NoSQL Databases: HBase
Programming/Scripting: C, Scala, SQL, PIG LATIN, UNIX shell scripting
Microsoft: MS Office, MS Project, MS Visio, MS Visual Studio 2008
Databases: MySQL, Oracle, Redshift
Operating Systems: Linux, CentOS, Windows
Cluster Management Tools: Cloudera Manager, Hue.
IDE: NetBeans, Eclipse, Visual Studio, Microsoft SQL Server, MS Office
PROFESSIONAL EXPERIENCE
Hadoop /Spark Developer
Confidential, San Francisco, CA
Technical Scope: Cloudera Manager, HDFS, YARN, Hive, Pig, Zookeeper, Oozie, Sqoop, Flume, Spark, Scala, Hue, AWS, MySQL.
Responsibilities:
- Worked on Sqoop jobs for ingesting data from MySQL to Amazon S3.
- Created Hive external tables for querying the data.
- Used Spark DataFrame APIs to ingest Oracle data into S3 and store it in Redshift.
- Wrote scripts to load RDBMS data into Redshift.
- Processed complex/nested JSON and CSV data using the DataFrame API (see the sketch after this list).
- Automatically scaled EMR instances up based on data volume.
- Applied transformation rules on top of DataFrames.
- Ran and scheduled Spark scripts in EMR pipelines.
- Processed Hive, CSV, JSON, and Oracle data together in a single job (POC).
- Validated the source data and the final output data.
- Tested the data using the Dataset API instead of RDDs.
- Debugged and tested the process to verify it met the client's expectations.
- Tuned query execution to improve processing time.
- Applied different optimization and transformation rules based on new Spark versions.
- Debugged scripts to minimize data shuffling.
- Analyzed and reported on the data using Splunk.
- Created dashboards in Splunk.
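A minimal Scala sketch of the DataFrame processing described in this role, assuming hypothetical S3 paths, column names, and schemas; the actual transformation rules and the downstream Redshift load were project-specific.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object NestedJsonCsvJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("nested-json-csv-poc").getOrCreate()

    // Nested JSON and CSV landed on S3 (bucket names and paths are placeholders)
    val ordersJson = spark.read.option("multiLine", "true").json("s3://landing-bucket/orders/")
    val customersCsv = spark.read.option("header", "true").option("inferSchema", "true")
      .csv("s3://landing-bucket/customers/")

    // Flatten a nested array-of-structs column and apply simple transformation rules
    val orders = ordersJson
      .withColumn("item", explode(col("lineItems")))
      .select(col("orderId"), col("customerId"),
        col("item.sku").as("sku"), col("item.amount").as("amount"))
      .filter(col("amount") > 0)

    val enriched = orders.join(customersCsv, Seq("customerId"), "left")

    // Curated output written back to S3 as Parquet; a downstream step loads it into Redshift
    enriched.write.mode("overwrite").parquet("s3://curated-bucket/orders_enriched/")

    spark.stop()
  }
}
```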
Hadoop/Spark Developer
Confidential, Seattle, WA
Technical Scope: Java (JDK 1.7), Linux, Shell Scripting, Amazon Redshift, SQL Server, Cloudera Hadoop, Flume, Sqoop, Pig, Hive, Zookeeper, HBase, Business Objects and Tableau.
Responsibilities:
- Installed Spark and integrated it with other Big Data ecosystem components such as Hive and HBase.
- Integrated Kafka with Spark and ingested social media data through the Twitter API.
- Collaborated with other analysis teams (R, Python, and Tableau) to analyze the data.
- Integrated Hive, JSON, and CSV data and ran Spark SQL on top of the different datasets (see the sketch after this list).
- Processed JSON, CSV, and XML datasets, wrote Scala scripts, and implemented projects in Zeppelin.
- Used Tachyon to optimize Spark performance and to process large volumes of data.
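A short Scala sketch of running Spark SQL across a Hive table and file-based JSON/CSV datasets, as in the integration work above; the Hive table default.users, the file paths, and the column names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object CrossSourceSqlJob {
  def main(args: Array[String]): Unit = {
    // Hive support lets one session query warehouse tables and raw files together
    val spark = SparkSession.builder()
      .appName("hive-json-csv-sparksql")
      .enableHiveSupport()
      .getOrCreate()

    // Register file-based datasets as temporary views (paths and schemas are placeholders)
    spark.read.json("hdfs:///data/raw/tweets/").createOrReplaceTempView("tweets")
    spark.read.option("header", "true").csv("hdfs:///data/raw/campaigns.csv")
      .createOrReplaceTempView("campaigns")

    // Join the views with an existing Hive table in a single Spark SQL query
    val report = spark.sql(
      """SELECT c.campaign_id, u.region, COUNT(*) AS tweet_count
        |FROM tweets t
        |JOIN campaigns c ON t.hashtag = c.hashtag
        |JOIN default.users u ON t.user_id = u.user_id
        |GROUP BY c.campaign_id, u.region""".stripMargin)

    report.show(20, truncate = false)
    spark.stop()
  }
}
```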
Confidential
Responsibilities:
- Installed and configured SQL Workbench and SQL Developer, and configured the required drivers.
- Pulled Oracle data through Spark and applied transformation rules.
- Imported and exported Redshift data using Spark (see the sketch after this list).
- Cleaned the data (unsupported files) in Redshift.
- Saved the data to Redshift and S3 using Spark.
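A minimal Scala sketch of this Oracle-to-Redshift flow using Spark's generic JDBC source; the connection URLs, table names, and credentials are placeholders, and the Oracle and Redshift JDBC drivers are assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object OracleToRedshiftJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("oracle-to-redshift").getOrCreate()

    // Import an Oracle table over JDBC (URL, table, and credentials are placeholders)
    val claims = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")
      .option("dbtable", "CLAIMS.MONTHLY_CLAIMS")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("ORACLE_PASSWORD", ""))
      .load()

    // Apply transformation rules and drop rows that Redshift cannot load
    val cleaned = claims.na.drop()

    // Save the cleaned data to S3 and export it to Redshift over JDBC
    cleaned.write.mode("overwrite").parquet("s3://curated-bucket/claims_clean/")
    cleaned.write.format("jdbc")
      .option("url", "jdbc:redshift://redshift-host:5439/dev")
      .option("dbtable", "public.claims_clean")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("REDSHIFT_PASSWORD", ""))
      .mode("append")
      .save()

    spark.stop()
  }
}
```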
Confidential
Responsibilities:
- Created topics in Kafka and generated logs to be processed in Spark.
- Provided high availability for Kafka brokers using Zookeeper, processed the logs in Spark, and finally stored the logs in Cassandra (see the sketch after this list).
- Ran cqlsh commands in Cassandra.
- Integrated Spark, Cassandra, and Kafka.
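A compact Scala sketch of the Kafka-to-Spark-to-Cassandra flow, assuming the spark-streaming-kafka-0-10 and spark-cassandra-connector libraries; the broker addresses, topic name, log format, and the logs.app_events table are all placeholders.

```scala
import com.datastax.spark.connector._
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Matches a hypothetical Cassandra table: logs.app_events(host, level, message)
case class AppEvent(host: String, level: String, message: String)

object LogsToCassandraJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-logs-to-cassandra")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder

    val ssc = new StreamingContext(conf, Seconds(5))

    // Kafka consumer settings (placeholder broker list, group id, and topic)
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "log-pipeline",
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("app-logs"), kafkaParams))

    // Parse pipe-delimited log lines and persist each micro-batch to Cassandra
    stream.map(_.value.split("\\|"))
      .flatMap {
        case Array(host, level, message) => Seq(AppEvent(host, level, message))
        case _                           => Nil
      }
      .foreachRDD(rdd => rdd.saveToCassandra("logs", "app_events"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```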
Hadoop Engineer
Confidential, Sunnyvale, CA
Technical Scope: Cloudera Manager, HDFS, YARN/MRv2, Hive, Pig, Zookeeper, Oozie, Sqoop, Flume, Hue, Teradata, MySQL and Oracle
Responsibilities:
- Installed Hadoop across all clustered environments.
- Installed Cloudera Manager on CDH3 clusters
- Configured cluster properties to achieve high cluster performance, taking the cluster hardware configuration as the key criterion.
- Implemented Hadoop NameNode HA services to make the Hadoop services highly available.
- Collected web logs from different sources using Flume and loaded them into HDFS.
- Implemented Oozie workflows for ETL processes.
- Developed Hive scripts and temporary functions for complex business analytics.
- Exported data from RDBMS to Hive and HDFS, and from Hive and HDFS back to RDBMS, using Sqoop.
- Implemented and automated shell scripts for day-to-day log-rolling processes.
- Coordinated Flume and HBase nodes and masters using Zookeeper.
- Enabled Kerberos for AD Authentication.
- Commissioned and decommissioned nodes as needed.
- Streamlined cluster scaling and configuration
- Developed a cron job to store the NameNode metadata on an NFS-mounted directory.
- Worked on file system management, monitoring, and capacity planning.
- Executed system and disaster recovery processes.
- Worked with project and application development teams to implement new business initiatives as they relate to Hadoop.
- Installed and configured operating system packages.
Hadoop Engineer
Confidential, Fort Wright, KY
Technical Scope: Apache Hadoop (Cloudera), HBase, Hive, Pig, Map Reduce, Sqoop, Oozie, Eclipse, Java
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence purposes.
- Used the default MapReduce input and output formats.
- Developed HQL queries to implement select, insert, and update operations on the database by creating HQL named queries.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed simple to complex Map/Reduce jobs using Java, and scripts using Hive and Pig.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion and egress.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources (a sketch of the UDF pattern follows this list).
- Monitored the Hadoop cluster using tools like Nagios, Ganglia, and Cloudera Manager.
- Experienced in loading and transforming large sets of structured and semi-structured data.
- Managed and reviewed Hadoop log files, and deployed and maintained the Hadoop cluster.
- Exported filtered data into HBase for fast querying.
- Involved in creating Hive tables, loading with data and writing Hive queries.
- Created data-models for customer data using the Cassandra Query Language.
- Ran many performance tests using the Cassandra-stress tool to measure and improve the read and write performance of the cluster.
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive and MapReduce) and move the data files within and outside of HDFS.
- Queried and analyzed data from Datastax Cassandra for quick searching, sorting and grouping.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
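A minimal sketch of the simple Hive UDF pattern referenced above; the original UDFs were written in Java, but the same org.apache.hadoop.hive.ql.exec.UDF contract works from any JVM language, so Scala is used here for consistency with the other sketches. The class name and masking rule are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF that masks all but the last four characters of an account number.
// After packaging into a JAR, register it in Hive with:
//   ADD JAR hdfs:///udfs/mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccountUDF';
class MaskAccountUDF extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val masked = if (s.length <= 4) s else "*" * (s.length - 4) + s.takeRight(4)
    new Text(masked)
  }
}
```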
Java Developer
Confidential, Raleigh, NC
Technical Scope: Java, Spring, JSP, RESTful Web Services, HTML, CSS, AJAX, JavaScript, MySQL
Responsibilities:
- Responsible for DAO layer development using Hibernate.
- Created stateless session beans for providing transaction support for updates and help with application scalability.
- Created Value Objects for populating and transferring Data between layers.
- Responsible for developing Struts Action classes for performing search, select and save operations on form data.
- Developed JSP pages with extensive use of HTML, CSS, and JavaScript.
- Actively involved in developing utility classes which are commonly shared among all modules in the application.
- Used extensive SQL joins to avoid orphan data.
Jr. Java Developer
Confidential
Technical Scope: Java, JavaScript, CSS, AJAX, JSP, HTML, XML, JDBC, Eclipse, MYSQL, Apache Tomcat, STAR-UML.
Responsibilities:
- Involved in the Software Development Life Cycle of the project development.
- Gathered the business requirements and converted them to technical specifications and use cases.
- Used STAR-UML to create the use cases and activity diagrams.
- Developed the client-side view using J2EE, JavaScript, jQuery, CSS, JSP and AJAX.
- Performed client-side validations using JavaScript.
- Worked on application development using Java.
- Developed JDBC commands to add and retrieve the patient records from the database.
- Responsible for writing SQL queries for storing and retrieving the patient record.
- Used Eclipse for development and debugging the application.
- Log4j was used for application logging and debugging.