Hadoop/Big Data Developer with Google Cloud Resume
San Jose, CA
SUMMARY:
- Around 8 years of IT experience, including designing, implementing, and configuring the Hadoop ecosystem, with expertise in delivering network optimization solutions.
- 6 years of experience as a Hadoop developer with extensive knowledge of Hive, Pig, Sqoop, Flume, Spark, PySpark, Scala, HBase, Oozie, ZooKeeper, Impala, and Cassandra.
- Experience in developing custom MapReduce and YARN programs in Java and Python to process large data sets as required.
- Extensive knowledge of importing and exporting data between relational database systems (RDBMS) and HDFS using Sqoop.
- Worked on Google Cloud Platform services such as the Vision API and compute instances.
- Worked with various file formats including JSON, CSV, Avro, SequenceFile, text, and XML.
- Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Experience in administering clusters using Ambari and Cloudera Manager.
- Experience in Operational Intelligence using Splunk.
- Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, Auto Scaling groups, and the AWS CLI.
- Experience in analyzing data using Pig Latin scripts and Hive Query Language; experience with Apache NiFi, including integrating NiFi with Apache Kafka.
- Worked on backend ETL and data migration projects, and developed shell scripts to automate DBA tasks.
- Responsible for designing logical and physical data models for various data sources on Amazon Redshift; worked with AWS Data Pipeline to configure data loads from S3 into Redshift.
- Strong conceptual, business, and analytical skills, with the ability to present capacity solutions and deliver results-oriented analysis.
TECHNICAL SKILLS:
Hadoop Technologies: MapReduce, HDFS, YARN, Hive, Pig, Sqoop, Flume, Zookeeper, HBase, Spark, Kafka, Impala, Oozie
Programming Languages: Java, Python, SQL, PL/SQL, C, UNIX Shell Scripting, HTML
Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x
Database Systems: Oracle, MySQL, PostgreSQL, Teradata, HBase, Cassandra, MongoDB
Web Technologies: WebLogic, WebSphere, HTML5, CSS, JavaScript, jQuery, AJAX, Servlets, JSP, JSON, XML, XHTML, SOAP and REST web services
IDE Tools: Eclipse, NetBeans, RAD
Visualization Tools: Tableau
Operating Systems: Windows XP, 7, 10, Linux, Unix
PROFESSIONAL EXPERIENCE:
Confidential, San Jose, CA
Hadoop/Big Data Developer with Google Cloud
Responsibilities:
- Performed joins, group by, and other operations in Hive.
- Wrote and executed Pig scripts using the Grunt shell.
- Wrote PySpark scripts for processing large data sets.
- Worked extensively with Spark DataFrames for large-scale data manipulation (see the PySpark sketch after this list).
- Used the HBase REST API to access HBase data for analytics (see the REST sketch after this list).
- Migrated ETL jobs to Pig scripts to perform transformations, joins, and pre-aggregations before storing the data in HDFS.
- Worked with the Google Vision API to extract information from Confidential's internal data (images, vCards, etc.); see the Vision API sketch after this list.
- Designed and created ETL jobs in Talend to load large volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
- Analyzed Cassandra and compared it with other open-source NoSQL databases to determine which best suits the streaming requirements.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked on log parsing and created well-structured search queries to minimize performance issues.
- Worked on sending data to Splunk Enterprise using the HTTP Event Collector (HEC); see the HEC sketch after this list.
- Worked with the Oozie workflow engine for job scheduling.
- Performed joins, group by, and other operations in MapReduce using Java and Pig.
- Participated effectively in team big data tasks, delivered projects on time, and learned optimal ways to approach new kinds of tasks.
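A minimal sketch of the kind of Hive-backed DataFrame processing described above; database, table, and column names are hypothetical placeholders, not the project's actual schema:

    from pyspark.sql import SparkSession

    # Illustrative only; table and column names are hypothetical.
    spark = (SparkSession.builder
             .appName("hive-join-example")
             .enableHiveSupport()
             .getOrCreate())

    orders = spark.table("raw_db.orders")        # hypothetical Hive table
    customers = spark.table("raw_db.customers")  # hypothetical Hive table

    # Join and aggregate, mirroring the Hive join/group-by work above.
    totals = (orders.join(customers, "customer_id")
              .groupBy("region")
              .agg({"amount": "sum"}))
    totals.write.mode("overwrite").saveAsTable("analytics_db.region_totals")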
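A minimal sketch of reading a row through the HBase REST gateway, as referenced above; host, port, table, and row key are hypothetical placeholders:

    import base64
    import requests

    # Hypothetical endpoint: http://<rest-host>:8080/<table>/<row-key>
    resp = requests.get(
        "http://hbase-rest-host:8080/user_events/row-123",
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()

    # Column names and values come back base64-encoded in the REST JSON.
    for row in resp.json()["Row"]:
        for cell in row["Cell"]:
            column = base64.b64decode(cell["column"]).decode()
            value = base64.b64decode(cell["$"]).decode()
            print(column, value)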
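A minimal sketch of calling the Cloud Vision API for text detection, in the spirit of the image/vCard extraction above; the file path is a placeholder and credentials are assumed to come from GOOGLE_APPLICATION_CREDENTIALS (recent google-cloud-vision client):

    from google.cloud import vision

    # Hypothetical input file; real inputs were Confidential's internal images.
    client = vision.ImageAnnotatorClient()
    with open("business_card.jpg", "rb") as f:
        image = vision.Image(content=f.read())

    response = client.text_detection(image=image)
    for annotation in response.text_annotations:
        print(annotation.description)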
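A minimal sketch of posting an event to Splunk's HTTP Event Collector, as referenced above; host, token, and index are hypothetical placeholders:

    import requests

    # HEC listens on port 8088 by default; the token below is a placeholder.
    resp = requests.post(
        "https://splunk-host:8088/services/collector/event",
        headers={"Authorization": "Splunk 00000000-0000-0000-0000-000000000000"},
        json={
            "event": {"job": "daily_load", "status": "success"},
            "sourcetype": "_json",
            "index": "main",
        },
        verify=False,  # only for self-signed dev certificates
    )
    resp.raise_for_status()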
Environment: Apache Hadoop 2.2.0, Cloudera, Hue, MapReduce, Hive, HBase, HDFS, Cassandra, Pig, Sqoop, Oozie, PySpark, UNIX, Splunk 6.6.4, Google Vision API.
Confidential, Kent, WA
Hadoop/Spark Developer
Responsibilities:
- Performed joins, group by, and other operations in MapReduce using Java and Pig.
- Wrote and executed Pig scripts using the Grunt shell.
- Developed Spark scripts using Scala shell commands and PySpark as per the requirements.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets; used big data tooling to load large volumes of source files from S3 into Redshift (see the S3 sketch after this list).
- Used the HBase REST API to access HBase data for analytics.
- Designed and implemented ETL processes in Hadoop.
- Migrated ETL jobs to Pig scripts to perform transformations, joins, and pre-aggregations before storing the data in HDFS.
- Created UDFs to encode client-sensitive data, stored the results in HDFS, and performed evaluation using Pig (see the UDF sketch after this list).
- Involved in developing Talend jobs and preparing design documents and technical specification documents.
- Created Talend jobs to copy files from one server to another using Talend FTP components.
- Designed the web-based structure for business analytics and data visualization in the Hadoop ecosystem; integrated Tableau with the Hadoop framework to visualize and analyze data.
- Analyzed Cassandra and compared it with other open-source NoSQL databases to determine which best suits the streaming requirements.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Created, modified, and executed DDL and ETL scripts for de-normalized tables to load data into Hive and AWS Redshift tables.
- Worked with the Oozie workflow engine for job scheduling.
- Participated effectively in team big data tasks, delivered projects on time, and learned optimal ways to approach new kinds of tasks.
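A minimal sketch of the HDFS-edge-to-S3 file movement mentioned above, using boto3; bucket, key, and local paths are hypothetical placeholders:

    import boto3

    # Hypothetical bucket and paths; credentials come from the standard
    # AWS credential chain (environment, instance profile, etc.).
    s3 = boto3.client("s3")

    # Push an extract staged on an edge node up to S3...
    s3.upload_file("/data/exports/part-00000.csv",
                   "example-bucket", "staging/part-00000.csv")

    # ...and pull a file back down for local processing.
    s3.download_file("example-bucket", "staging/part-00000.csv",
                     "/tmp/part-00000.csv")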
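A minimal sketch of a Pig (Jython) UDF that masks a sensitive field, in the spirit of the encoding UDFs above; the hashing scheme is illustrative, not the client's actual encoding rule:

    from pig_util import outputSchema
    import hashlib

    # Illustrative only: a real implementation would apply the client's
    # encoding rules rather than a plain one-way hash.
    @outputSchema("masked:chararray")
    def mask_value(value):
        if value is None:
            return None
        return hashlib.sha256(value.encode("utf-8")).hexdigest()

    # In Pig:
    #   REGISTER 'mask_udf.py' USING jython AS udfs;
    #   B = FOREACH A GENERATE udfs.mask_value(ssn);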
Environment: Apache Hadoop 2.2.0, HDP 2.2, Ambari, MapReduce, Hive, HBase, HDFS, Cassandra, AWS, Pig, Sqoop, Oozie, Java 1.7, UNIX, Shell Scripting, XML.
Confidential, Houston, TX
Hadoop Developer
Responsibilities:
- Provided technical designs and architecture; supported automation, installation, and configuration tasks; and planned system upgrades of the Hadoop cluster.
- Developed a data pipeline using Flume and Sqoop to ingest customer behavioral data and financial histories into HDFS for analysis.
- Performed a comparative analysis of Hive vs. Impala.
- Maintained Hadoop clusters for dev/staging/production; trained the development, administration, testing, and analysis teams on the Hadoop framework and ecosystem.
- Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig, and Hive programs.
- Integrated Hive tables with MongoDB collections and developed a web service that queries a MongoDB collection and returns the required data to the web UI (see the service sketch after this list).
- Collected and aggregated large amounts of web log data from sources such as web servers, mobile, and network devices using Apache Flume and stored the data in HDFS for analysis.
- Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
- Developed UNIX shell scripts for creating reports from Hive data.
- Computed metrics that define user experience, revenue, etc. using Java MapReduce (see the MapReduce-style sketch after this list).
- Involved in developing Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
- Defined business and technical requirements, designed a proof of concept for evaluating AFMS agencies' data against evaluation criteria and scoring, and helped select data integration and information management tooling.
- Integrated big data technologies and analysis tools into the overall architecture.
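A minimal sketch of a web service that reads a MongoDB collection, similar in spirit to the Hive/MongoDB service above; Flask is used here for brevity, and host, database, and collection names are hypothetical placeholders:

    from flask import Flask, jsonify
    from pymongo import MongoClient

    app = Flask(__name__)
    # Hypothetical connection string and collection.
    client = MongoClient("mongodb://mongo-host:27017")
    profiles = client["analytics"]["customer_profiles"]

    @app.route("/profiles/<customer_id>")
    def get_profile(customer_id):
        # Drop Mongo's _id so the document serializes cleanly to JSON.
        doc = profiles.find_one({"customer_id": customer_id}, {"_id": 0})
        return jsonify(doc or {})

    if __name__ == "__main__":
        app.run(port=8080)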
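A minimal sketch of the per-key metric computation above, shown here as Hadoop Streaming mapper and reducer scripts in Python for brevity (the project itself used Java MapReduce); field positions are hypothetical:

    #!/usr/bin/env python
    # mapper.py -- emit (page, revenue) pairs from tab-delimited log lines.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 5:
            page, revenue = fields[2], fields[5]  # hypothetical positions
            print("%s\t%s" % (page, revenue))

    #!/usr/bin/env python
    # reducer.py -- sum revenue per page (input arrives sorted by key).
    import sys

    current_key, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%.2f" % (current_key, total))
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        print("%s\t%.2f" % (current_key, total))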
Environment: Hadoop, Cassandra, HBase, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Java, JSP, RMI, JNDI, JDBC, Tomcat, Apache, Shell Scripting.
Confidential
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop; worked hands-on with ETL processes using Pig.
- Worked on data analysis in HDFS using MapReduce, Hive, and Pig jobs.
- Worked on MapReduce programming and HBase.
- Involved in creating external tables and in partitioning and bucketing tables (see the DDL sketch after this list).
- Ensured adherence to guidelines and standards in the project process.
- Facilitated testing across different dimensions.
- Used crontab to automate scripts.
- Wrote and modified stored procedures to load and modify data according to business rule changes.
- Worked in a production support environment.
- Extracted data from Teradata into HDFS using Sqoop (see the Sqoop sketch after this list).
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Implemented Kerberos security to safeguard the cluster.
- Worked on standalone as well as distributed Hadoop applications.
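A minimal sketch of the partitioned, bucketed external-table DDL described above, issued through PyHive for illustration; server, paths, and schema are hypothetical placeholders:

    from pyhive import hive

    conn = hive.Connection(host="hive-server", port=10000)
    cur = conn.cursor()
    # Hypothetical schema and HDFS location.
    cur.execute("""
        CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
            user_id STRING,
            url     STRING,
            ts      STRING
        )
        PARTITIONED BY (dt STRING)
        CLUSTERED BY (user_id) INTO 16 BUCKETS
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
        LOCATION '/data/raw/web_logs'
    """)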
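A minimal sketch of wrapping the Teradata-to-HDFS Sqoop import in a Python automation script; connection details, table, and target directory are hypothetical placeholders:

    import subprocess

    # Hypothetical connection details; the password is read from a
    # restricted HDFS file rather than passed on the command line.
    subprocess.run([
        "sqoop", "import",
        "--connect", "jdbc:teradata://td-host/DATABASE=sales",
        "--driver", "com.teradata.jdbc.TeraDriver",
        "--username", "etl_user",
        "--password-file", "/user/etl/.td_password",
        "--table", "TRANSACTIONS",
        "--target-dir", "/data/raw/transactions",
        "--num-mappers", "4",
    ], check=True)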
Environment: Apache Hadoop, Cloudera, Pig, Hive, Sqoop, Flume, Java/J2EE, Oracle 11g, crontab, JBoss 5.1.0 Application Server, Linux OS, Windows OS, AWS.
Confidential
Software Engineer
Responsibilities:
- Analyzed the specifications provided by the clients.
- Prepared and updated documents (technical and UI).
- Designed and developed business components and the front end using JSP and Servlets.
- Implemented the Struts framework in the presentation tier for the essential control flow, business-level validations, and communication with the business layer.
- Coded HTML pages, Struts components, and JSPs.
- Implemented front-end validation using JavaScript.
- Developed required PL/SQL scripts.
- Involved in testing, debugging, bug fixing, and documentation of the system.
- Performed unit testing and performance testing.
- Used CVS for configuration management and version control.
Environment: HTML, CSS, Apache Tomcat, Java, JSP, Servlets, JavaScript, JDBC, TOAD, Eclipse, ANT, CVS, UNIX.