Hadoop / Spark Developer Resume
PA
SUMMARY
- Around 6 years of professional experience in IT in Analysis, Design, Development, Testing, Documentation, Deployment, Integration, and Maintenance of Client/Server applications using Java and Big Data technologies.
- Experience in working with Cloudera (CDH3, CDH4, and CDH5) and Hortonworks Hadoop distributions.
- Experienced in working with Amazon Web Services (AWS), using EC2 for computing and S3 as the storage mechanism.
- 3 years of hands-on experience in Big Data / Hadoop ecosystem technologies, including HDFS, MapReduce, Pig, Hive, HBase, Spark, Sqoop, Flume, Oozie, and Zookeeper.
- Excellent knowledge of Hadoop ecosystem architecture and components such as the Hadoop Distributed File System (HDFS), MRv1, MRv2, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager, and MapReduce programming.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming.
- Developed Hadoop Map Reduce program using Java.
- Developed Spark jobs using Java and Scala.
- Worked on Spark RDDs, DataFrames, and Datasets.
- Worked with real-time data processing and streaming techniques using Spark Streaming and Kafka.
- Deep understanding of performance tuning and partitioning for optimizing Spark applications.
- Experienced in writing complex Spark jobs that work with different file formats such as Text, XML, JSON, and Avro.
- Experience in retrieving data from databases such as MySQL, Teradata, DB2, and Oracle using Sqoop and ingesting it into HDFS, HBase, and Cassandra.
- Experience developing Kafka producers and consumers for streaming millions of events per second (a minimal producer sketch appears at the end of this summary).
- Experience in job workflow scheduling and monitoring tools like Oozie.
- Developed Bash scripts to launch Spark jobs and automated them with Control-M.
- Expertise in developing Pig Latin scripts and using Hive Query Language.
- Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as JSON and ORC.
- Experience in importing and exporting terabytes of data between HDFS and relational database systems using Sqoop and Apache Nifi.
- Working knowledge on NoSQL databases like HBase and Cassandra.
- Worked on importing data into HBase using HBase Shell and HBase Client API.
- Strong experience in RDBMS technologies like SQL, Stored Procedures, Triggers, Functions.
- Experienced in Object Oriented Analysis and Object-Oriented Design using UML.
- Implemented Design patterns such as MVC, View Dispatcher, Data Access Objects, Singleton, Observer, Factory, and Session Facade.
- Expertise working in J2EE, JSP, JDBC, Servlets environments.
- Working experience with version control tools such as SVN and Git, and the continuous integration tool Jenkins.
- Experience with Agile Methodology, Scrum Methodology and release management.
- Involved in all phases of Software Development Life Cycle (SDLC) in large scale enterprise software using Object Oriented Analysis and Design.
- Experience in unit testing using JUnit, ScalaTest, Spock, EasyMock, and Mockito.
- Knowledge in UNIX Shell Scripting and Perl Scripting.
- Experience in handling defect tracking tools JIRA/ALM.
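A minimal sketch of a Kafka producer of the kind referenced in the summary above. The broker address, topic name ("events"), class name, and payload are placeholders, and the example assumes the standard kafka-clients Java API:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        // Producer configuration; the broker address is a placeholder.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "1");

        // Send a batch of string events to a hypothetical "events" topic.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<>("events", Integer.toString(i), "event-payload-" + i));
            }
        }
    }
}
```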
TECHNICAL SKILLS
Big Data Ecosystems: Hadoop, MapReduce, HDFS, Zookeeper, Hive, Pig, Sqoop, Flume, YARN, Spark, Spark SQL, Apache Nifi, Kafka, Spark Streaming
DB Languages: SQL, PL/SQL
Programming Languages: Scala, Java, Python
Scripting Languages: Unix Shell, Python
Web Services: SOAP, RESTful
Databases: Oracle, SQL Server, HBase, Cassandra
Tools: Eclipse, IntelliJ, ERwin, Visio, PuTTY, WinSCP
Platforms: Windows, Linux, Unix
ETL & Visualization: Informatica, Tableau
Application Servers: Apache Tomcat, WebSphere, WebLogic
PROFESSIONAL EXPERIENCE
Confidential, PA
Hadoop / Spark Developer
Responsibilities:
- Developed Spark code in Java and Scala for faster testing and data processing.
- Performed batch processing of data sources using Apache Spark.
- Performed real-time processing of data sources using Spark Streaming.
- Used Spark Streaming APIs to perform on-the-fly transformations and actions for building the common learner data model, which consumes data from Kafka in near real time and persists it into HBase (see the sketch following this project entry).
- Worked on a product team using Agile Scrum methodology to design, develop, deploy, and support solutions that leverage the client's big data platform.
- Experienced in handling large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Involved in designing and creating data ingestion pipelines using technologies such as Apache Spark and Kafka.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Created External Hive Table on top of parsed data.
- Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
- Automated and scheduled Sqoop, MapReduce, and Spark jobs using Unix shell scripts and Control-M.
- Created unit test documents and performed unit testing.
- Used Jira for bug tracking and QuickBuild for continuous integration.
- Worked on data serialization formats for converting complex objects into byte sequences using Avro, Parquet, JSON, and CSV.
- Loaded data into HBase using both bulk and non-bulk loads.
- Involved in transforming data from mainframe tables to HDFS and HBase tables using Sqoop.
- Wrote HiveQL scripts to create, load, and query tables in Hive.
- Wrote MapReduce code to process and parse data from various sources, storing the parsed data in HBase and Hive using HBase-Hive integration.
Environment: Spark, Spark Streaming, Java 8, Scala, Linux, Maven, Subversion, Jira, ETL, MapReduce, QuickBuild, IntelliJ, UNIX Shell Scripting, Control-M, Cloudera, HBase, Git, Kafka, Hive.
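A simplified sketch of the Kafka-to-HBase streaming pipeline described in this project, assuming the spark-streaming-kafka-0-10 integration and the HBase 1.x client API. The broker address, consumer group, topic ("learner-events"), table name ("learner"), and column family ("d") are hypothetical placeholders, and HBase configuration is assumed to be available on the classpath:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class LearnerStreamJob {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("learner-stream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Kafka consumer settings; broker and group id are placeholders.
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "learner-group");

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("learner-events"), kafkaParams));

        // For each micro-batch, open one HBase connection per partition and
        // write one Put per record (the Kafka key is assumed to carry the row key).
        stream.foreachRDD(rdd -> rdd.foreachPartition(records -> {
            Configuration hbaseConf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(hbaseConf);
                 Table table = connection.getTable(TableName.valueOf("learner"))) {
                while (records.hasNext()) {
                    ConsumerRecord<String, String> record = records.next();
                    Put put = new Put(Bytes.toBytes(record.key()));
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(record.value()));
                    table.put(put);
                }
            }
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}
```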
Confidential, NE
Hadoop Developer
Responsibilities:
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Deployed the Hadoop cluster in cloud (AWS) environment with scalable nodes as per the business requirement.
- Wrote Spark programs to load, parse, refine, and store sensor data in Hadoop, and to process, analyze, and aggregate data for visualizations.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using HQL.
- Used Apache Nifi for loading data from RDBMS to HDFS and HBase.
- Used Apache Solr 5.4 as the Search Engine.
- Developed the Apache Storm, Kafka, and HDFS integration project to do real-time data analysis.
- Worked on the publish component to read the source data, extract metadata, apply transformations to build Solr documents, and index them using SolrJ (see the sketch following this project entry).
- Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
- Created HBase tables to store variable data formats.
- Prepared developer (unit) test cases and executed developer testing.
- Set up continuous integration using Jenkins and ensured code coverage.
- Involved in Kafka implementation POC.
- Used the Apache Phoenix tool on top of HBase to perform CRUD operations.
- Used Apache Nifi for loading PDF documents from Microsoft SharePoint to HDFS.
- Developed Bash scripts to launch Spark and Hive jobs.
- Designed and automated workflows in Control-M.
- Created a complete roadmap for a data lake that streams in data from multiple sources and enables analytics on the data lake using standard BI tools.
Environment: AWS, AWS S3, Hive, Spark, Java, Linux, Maven, SQL Server, Git, Jira, ETL, Apache Solr, Apache Tika, Toad 9.6, UNIX Shell Scripting, Scala, Apache Nifi.
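A minimal sketch of indexing a document with SolrJ, as in the publish component described above, assuming SolrJ 5.x (where HttpSolrClient is constructed from the core URL directly). The Solr URL, core name, and field names are hypothetical:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class PublishToSolr {
    public static void main(String[] args) throws Exception {
        // Core URL is a placeholder for the target Solr core.
        try (SolrClient solr = new HttpSolrClient("http://solr-host:8983/solr/documents")) {
            // Build a Solr document; the field names are hypothetical schema fields.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-001");
            doc.addField("title", "Sample document title");
            doc.addField("content", "Extracted body text goes here");

            solr.add(doc);   // send the document to the index
            solr.commit();   // make it searchable
        }
    }
}
```

In the actual pipeline, the field values would come from the metadata and text extracted by Apache Tika rather than hard-coded strings.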
Confidential, TN
Hadoop Developer
Responsibilities:
- Involved in the Complete Software development life cycle (SDLC) to develop the application.
- Worked with different data feeds such as JSON, CSV, and XML, and implemented the data lake concept.
- Defined UDFs using Pig and Hive to capture customer behavior (a sample Pig UDF is sketched after this project entry).
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Pig.
- Created Hive external tables on the MapReduce output before applying partitioning and bucketing.
- Maintained data import scripts using Hive and MapReduce jobs.
- Worked on Hive data warehouse modeling to interface with BI tools such as Tableau.
- Extracted data from Teradata into HDFS using Sqoop.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Used Pig as an ETL tool to perform transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.
- Efficiently put and fetched data to/from HBase by writing MapReduce jobs.
- Involved in writing Flume and Hive scripts to extract, transform and load the data into Database.
- Very good experience in UNIX shell scripting, Python, and WLST.
- Good experience in developing ETL scripts for data cleansing and transformation.
- Expertise in designing Python scripts to interact with middleware/back-end services.
- Worked on Python scripts to analyze customer data.
- Used Jira for bug tracking.
- Used Git to check-in and checkout code changes.
Environment: Hadoop, Hortonworks, Linux, Python, HDFS, Pig, Hive, Sqoop, Zookeeper, MapReduce, Restful Service, Teradata, Tableau, Jenkins.
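A small sketch of the kind of Pig UDF referenced in this project; the class name, normalization logic, and field layout are hypothetical, but it follows the standard EvalFunc contract:

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF: normalizes a raw customer event name so that behavior
// events can be grouped consistently in later Pig/Hive steps.
public class NormalizeEvent extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Guard against empty or null input tuples.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

In a Pig Latin script, a UDF like this would typically be registered with REGISTER and applied inside a FOREACH ... GENERATE statement.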
Confidential, FL
Hadoop Developer
Responsibilities:
- Involved in writing Java MapReduce jobs.
- Developed several test cases using MRUnit for testing MapReduce applications.
- Involved in creating Hive internal and external tables, loading data, and writing Hive queries, which run internally as MapReduce jobs.
- Converted delimited data and XML data to a common format (JSON) using Java MapReduce (see the sketch after this project entry).
- Stored data using compact serialization formats such as Apache Avro.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Developed Pig UDFs for manipulating data according to business requirements and worked on developing custom Pig loaders.
- Worked with completely structured data at terabyte scale.
- Created Avro schemas for this data.
- Created partitions for this data, which helps return quick results from large Hive tables.
- Created tables and views for different Customers according to their permissions.
- Performed partitioning and bucketing of Hive tables to store data on Hadoop.
- Involved in loading data from the UNIX file system to HDFS.
- Integrated HBase with MapReduce to move bulk amounts of data into HBase.
- Created external tables using Hive and provided them for downstream data consumption.
- Used Zookeeper operational services for coordinating the cluster and scheduling workflows.
- Created ETL transforms and jobs to move data from files to the operational database and from the operational database to the data warehouse.
- Exported the results of aggregations and computations on transaction and sales data to an RDBMS using Sqoop.
Environment: Java, MapReduce, MySQL, Linux/UNIX, Ubuntu, Hadoop 2.0.3, Oozie, Pig, Hive, Sqoop, Zookeeper, HBase, Flume.
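A map-only sketch of the delimited-to-JSON conversion mentioned in this project; the pipe delimiter, three-column layout, and field names are assumptions for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DelimitedToJson {

    // Map-only job: turns pipe-delimited "id|name|amount" lines into simple JSON strings.
    public static class JsonMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private final Text out = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");   // assumed pipe delimiter
            if (fields.length < 3) {
                return;                                         // skip malformed rows
            }
            out.set(String.format("{\"id\":\"%s\",\"name\":\"%s\",\"amount\":\"%s\"}",
                    fields[0], fields[1], fields[2]));
            context.write(NullWritable.get(), out);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "delimited-to-json");
        job.setJarByClass(DelimitedToJson.class);
        job.setMapperClass(JsonMapper.class);
        job.setNumReduceTasks(0);                               // map-only conversion
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```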