
Software Developer/Big Data Developer Resume


SUMMARY

  • Willing to update my knowledge and learn new skills according to business requirements.
  • Committed to performing my duties sincerely and consistently in the interest of the organization.
  • Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
  • Passionate about working on the most cutting-edge Big Data technologies.
  • Overall 6 years of experience in the IT industry, including 3+ years as a Hadoop/Spark Developer working with Big Data technologies (the Hadoop and Spark ecosystems) and 2+ years of Core Java and SQL.
  • Hands-on experience with Hadoop ecosystem components such as HDFS, Hive, Pig, YARN, Sqoop, Flume, HBase, Impala, Oozie, ZooKeeper, Kafka and Spark. Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames and Spark Streaming.
  • Experience in the analysis, design, coding and testing phases of the software development life cycle (SDLC). Experience in processing large volumes of data with Spark and transforming them into the data sets required by the business. Experience using accumulators and broadcast variables in Spark (see the sketch after this summary).
  • Experience with the main big data application phases: data ingestion, data massaging and data transformation (ETL).
  • Expertise in using Spark SQL with sources such as JSON, Parquet and Hive. Experience with Hadoop distributions including Cloudera 5.3, Hortonworks and Amazon AWS. Experience transferring data between RDBMS and HDFS/Hive tables using Sqoop. Experience creating tables, partitioning, bucketing, loading and aggregating data in Hive. Migrated Hive queries to Spark SQL using Scala.
  • Experience analyzing Hive data using HiveQL. Uploaded structured and unstructured data in various formats into HDFS and S3. Knowledge of core Java concepts such as exceptions, collections and data structures.
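
To illustrate the accumulator/broadcast bullet above, here is a minimal Spark Scala sketch; the lookup map, the accumulator name and the sample values are purely illustrative and not taken from any project described in this resume.

    import org.apache.spark.sql.SparkSession

    object BroadcastAccumulatorSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("broadcast-accumulator-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Broadcast a small lookup table so each executor holds one read-only copy
        val countryNames = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))

        // Accumulator that counts records missing from the lookup (illustrative name)
        val unknownCodes = sc.longAccumulator("unknownCodes")

        val codes = sc.parallelize(Seq("US", "IN", "BR"))
        val resolved = codes.map { code =>
          countryNames.value.getOrElse(code, { unknownCodes.add(1); "UNKNOWN" })
        }

        resolved.collect().foreach(println)   // the action triggers the accumulator updates
        println(s"Codes not found in lookup: ${unknownCodes.value}")
        spark.stop()
      }
    }

Broadcasting keeps a single read-only copy of the lookup on each executor instead of shipping it with every task, and the accumulator value is only meaningful on the driver after an action such as collect has run.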

Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Flume, Oozie, ZooKeeper, Ambari, Hue, Apache Spark, Storm, Kafka, YARN, NiFi, Tez

Operating System: Windows, Unix, Linux

Languages: Java, Scala, Python, SQL, PL/SQL, HiveQL, Shell Script

Testing tools: JUnit

Front-End: HTML, JavaScript, CSS, XML, XSL, XSLT

SQL Databases: MySQL, Oracle 11g/10g/9i, SQL Server, DB2, Postgres

NoSQL Databases: HBase, Cassandra, MongoDB, Neo4j

File System: HDFS

Reporting Tools: Tableau, QlikView

IDE Tools: Eclipse, NetBeans, Spring Tool Suite, IntelliJ

Application Server: IBM WebSphere, WebLogic, JBoss

Version Control: SVN, Git, CVS, Bitbucket

Build Tools: Maven

Integration and Deployment tools: Bamboo

AWS Services: EC2, EMR, CloudFormation, S3

PROFESSIONAL EXPERIENCE

Confidential

Software Developer/Big Data Developer

Responsibilities:

  • Ingested files in various formats (JSON, CSV) into the enterprise data zone and made changes to the existing framework.
  • Stored data into S3 from the raw zone after applying the required transformations using Spark with Scala (see the sketch after this list).
  • Wrote a Spark application to trigger a shell script that loads data into Redshift based on a configuration file in an S3 location.
  • Decoupled some of the applications on an as-needed basis in order to reduce processing resources.
  • Involved in tuning Redshift table DDLs to improve read performance by adjusting sort and distribution keys.
  • Performed analysis of data inconsistency issues identified while comparing against the legacy system.
  • Worked extensively with the Hortonworks distribution of Hadoop.
  • Worked in an Agile development environment with two-week sprint cycles, dividing and organizing tasks that included production-support defects and the development of new data pipelines.
  • Involved in collecting requirements for various datasets and finding effective ways to deliver them to consumers.
  • Produced various datasets by writing Spark ETL code against source Hive tables.
  • Ingested JSON files from different sources/vendors into Hive external tables and moved them to Redshift tables.
  • Performed ad-hoc queries on structured data using HiveQL and applied partitioning and bucketing techniques at storage time for faster data access.
  • Implemented Hive UDFs to apply business logic and performed extensive data validation using Hive.
  • Pulled data from AWS S3 buckets into the data lake, built Hive tables on top of it and created DataFrames in Spark for further analysis.
  • Performed Spark performance tuning to speed up jobs and reduce cluster resource utilization.
  • Used coalesce and repartition on DataFrames while optimizing Spark jobs.
  • Provided technical support during acceptance testing, resolving code issues related to application functionality and implementing emergency fixes.
  • Used data warehouse and data mart concepts to build dimension/fact tables.
  • Migrated various applications that load data into Hive from Spark 1.6 to Spark 2.1.
  • Participated in retrospective meetings after every sprint to discuss the overall rating of the previous sprint, its drawbacks and the scope for improvement.
  • Worked with Production Support to resolve ongoing issues by examining the logs of different jobs.
  • Wrote shell scripts to load data into Redshift tables from S3 buckets.
  • Developed Scala code implementing various business logic to massage data and populate Hive or Redshift tables.
  • Used various Git (version control) commands to store the project and keep track of changes to files.
  • Performed research and analysis of business problems and developed technical solutions.
  • Performed various joins, aggregations and transformations on Spark Scala DataFrames and loaded the data into Redshift for reporting purposes.
  • Processed data in Hive and provided various structured data sets in Redshift using the Spark-Redshift connector.
  • Used JIRA to create user stories and created branches in the Bitbucket repositories for each story; used ServiceNow for defects and change management.
  • Implemented compression techniques while storing processed data into S3 in order to reduce storage space on the cluster.
  • Knowledge of visualizing various dashboards, metrics and filters.
  • Used Solr to maintain logs and set log levels.
  • Worked on creating HBase tables and Hive integration to keep snapshots of the tables.
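
A minimal Spark Scala sketch of the raw-zone-to-S3 flow described above; the paths, the record_id/load_dt columns and the partition count are hypothetical stand-ins, while the real application read such settings from a configuration file in S3.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object RawZoneToS3Sketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("raw-zone-to-s3-sketch").getOrCreate()

        // Hypothetical paths; in practice these came from a configuration file in S3
        val rawPath = "s3a://raw-zone/vendor-feed/"
        val curatedPath = "s3a://curated-zone/vendor-feed/"

        // Ingest JSON files landed in the raw zone
        val raw = spark.read.json(rawPath)

        // Example "massaging": drop records without an id and stamp the load date
        val curated = raw
          .filter(col("record_id").isNotNull)   // record_id is an illustrative column
          .withColumn("load_dt", current_date())

        // Limit the number of output files before writing to S3
        curated.coalesce(16)
          .write
          .mode("overwrite")
          .partitionBy("load_dt")
          .parquet(curatedPath)

        spark.stop()
      }
    }

coalesce avoids a full shuffle when only reducing the partition count, while repartition redistributes the data evenly and is the better choice when more parallelism is needed, which is the trade-off behind the coalesce/repartition bullet above.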

Environment: Hortonworks, HDFS, Scala, Hive, Spark, Oozie, Linux, Maven, Putty, HBase, S3 Buckets, Jira, GIT, ServiceNow, Redshift, Bitbucket.

Confidential

Software Developer

Responsibilities:

  • Developed Spark programs using the Scala API to compare the performance of Spark with Hive and SQL.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and buckets.
  • Used Impala for querying HDFS data to achieve better performance.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Implemented Spark code to load various JSON and CSV files into Hive tables.
  • Created various Hive external and internal tables to store data in Hive.
  • Used ORC format and Snappy compression to store data in Hive external tables (see the sketch after this list).
  • Comparing and analyzing existing CSAT data with the data generated using
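
A sketch of how the ORC/Snappy external tables and dynamic partitioning mentioned above might be created and loaded through Spark SQL; the database, table, columns, staging table and S3 location are illustrative placeholders, not actual project objects.

    import org.apache.spark.sql.SparkSession

    object HiveOrcExternalTableSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-orc-external-table-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Allow dynamic partition inserts
        spark.sql("SET hive.exec.dynamic.partition = true")
        spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

        // External table stored as ORC with Snappy compression (placeholder names/location)
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS analytics.survey_responses (
            |  response_id STRING,
            |  score       INT
            |)
            |PARTITIONED BY (response_dt STRING)
            |STORED AS ORC
            |LOCATION 's3a://data-lake/analytics/survey_responses'
            |TBLPROPERTIES ('orc.compress' = 'SNAPPY')""".stripMargin)

        // Load from a staging table; the dynamic partition column comes last in the SELECT
        spark.sql(
          """INSERT OVERWRITE TABLE analytics.survey_responses PARTITION (response_dt)
            |SELECT response_id, score, response_dt
            |FROM analytics.survey_responses_stg""".stripMargin)

        spark.stop()
      }
    }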

Confidential

Big Data Developer/Spark Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Imported data from various data sources into HDFS using Sqoop, applied various transformations using Hive and Apache Spark, and loaded the data into Hive tables or AWS S3 buckets.
  • Moved data from various DB2 tables to AWS S3 buckets using Sqoop scripts.
  • Configured Splunk alerts to capture log files during execution and store them in an S3 bucket location while the cluster is running.
  • Wrote Hive/SQL queries and performed Spark transformations using Spark RDDs in Python (PySpark).
  • Wrote Oozie scripts to schedule and automate jobs on the EMR cluster.
  • Used Bitbucket as the repository for storing code and integrated it with Bamboo for continuous integration.
  • Brought up EMR clusters and deployed code, stored in S3 buckets, onto the cluster.
  • Migrated the existing on-prem code to the AWS EMR cluster.
  • Used NoMachine and PuTTY to SSH into the EMR cluster and run spark-submit.
  • Developed Apache Spark applications using Scala and Python, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Developed a Spark Streaming application using Kafka with a batch interval of 10 seconds (see the sketch after this list).
  • After pulling data from the Kafka topic into a DataFrame, filtered out bad records (non-JSON, empty records) before calling the REST API to reduce redundant API calls.
  • Developed Spark code using PySpark to apply various transformations and actions for faster data processing.
  • Knowledge of Kafka producers and consumers for collecting various transaction log files.
  • Knowledge of ZooKeeper for storing the offsets of messages consumed for a specific topic and partition by a specific consumer group in Kafka.
  • Knowledge of Kafka topics and partitions within the Kafka broker, which are used for distribution.
  • Used various HBase commands, generated different datasets per requirements and provided access to the data when required using GRANT and REVOKE.
  • Developed various Spark applications using PySpark and NumPy.
  • Created Sqoop jobs and Pig and Hive scripts to ingest data from relational databases for comparison with historical data.
  • Worked with Elastic MapReduce (EMR) and set up environments on AWS EC2 instances.
  • Migrated HiveQL queries to Impala to minimize query response time.
  • Executed Hadoop/Spark jobs on AWS EMR using programs stored in S3 buckets.
  • Worked with different file formats like text, Avro and Parquet for Hive querying and processing based on business logic.
  • Worked with sequence files, RC files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
  • Followed Test-Driven Development, writing unit and integration test cases for the code.
  • Implemented daily cron jobs that automate parallel data loads into HDFS using Oozie coordinator jobs.
  • Loaded structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
  • Developed code that generated various DataFrames based on business requirements and created temporary tables in Hive.
  • Wrote build scripts using Maven and performed continuous integration with Bamboo.
  • Knowledge of Sonar for validating code and following coding standards.
  • Knowledge of reporting tools Tableau and QlikView for visualizing data in various charts.
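
A minimal Scala sketch of the Kafka/Spark Streaming flow described above (10-second batches, bad-record filtering). The broker address, topic and consumer-group names are hypothetical, and the lightweight JSON check stands in for whatever parsing the real job performed before its REST calls.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object KafkaStreamingSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-streaming-sketch")
        // 10-second batch interval, matching the batch frequency mentioned above
        val ssc = new StreamingContext(conf, Seconds(10))

        // Hypothetical broker, topic and consumer-group names
        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "txn-consumer-group",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("transactions"), kafkaParams))

        // Drop empty and obviously non-JSON records before any downstream (REST) calls;
        // a real job would parse each record with a JSON library rather than this check
        val goodRecords = stream
          .map(_.value())
          .filter(v => v != null && v.trim.nonEmpty)
          .filter(v => v.trim.startsWith("{") && v.trim.endsWith("}"))

        goodRecords.foreachRDD { rdd =>
          // Placeholder for the per-partition processing / REST call
          rdd.foreachPartition(_.foreach(record => println(record.take(80))))
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }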

Environment: Cloudera, Map Reduce, HDFS, Scala, Hive, Sqoop, Spark, Oozie, Linux, Maven, control-M, Splunk, NoMachine, Putty, HBase, Python, AWS EMR Cluster, EC2 instances, S3 Buckets, STS, Bamboo, Bitbucket.

Confidential

Software Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed Spark code using PySpark, applying various transformations and actions for faster data processing.
  • Imported data from various data sources into HDFS using Sqoop, applied various transformations using Hive and Apache Spark, and loaded the data into Hive tables and AWS S3 buckets.
  • Moved data from various DB2 tables to AWS S3 buckets using Sqoop scripts.
  • Wrote Hive/SQL queries and performed Spark transformations using Spark RDDs and Python (PySpark).
  • Used PuTTY (SSH client) to connect to the servers.
  • Delivered and maintained technical support services for system applications, fixing defects/errors and compiling documentation.
  • Understood complex data structures of different types (structured, semi-structured) and de-normalized them for storage in Hadoop (see the sketch after this list).
  • Wrote Oozie scripts to schedule and automate jobs on the EMR cluster.
  • Used Bitbucket as the repository for storing code and integrated it with Bamboo for continuous integration.
  • Brought up EMR clusters and deployed code, stored in S3 buckets, onto the cluster.
  • Developed Apache Spark applications using Scala and Python, and implemented an Apache Spark data processing project to handle data from various RDBMS.
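
To illustrate the de-normalization bullet above, a short Spark Scala sketch; the input path, output table and nested fields (line_items, sku, qty) are hypothetical examples of semi-structured input, not actual project data.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object FlattenSemiStructuredSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("flatten-semi-structured-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Semi-structured input: orders with a nested array of line items (hypothetical path)
        val orders = spark.read.json("hdfs:///data/raw/orders/")

        // De-normalize: one output row per line item, carrying the parent order fields
        val flattened = orders
          .withColumn("item", explode(col("line_items")))
          .select(
            col("order_id"),
            col("customer_id"),
            col("item.sku").as("sku"),
            col("item.qty").as("qty"))

        // Store the flattened result as a Hive table for downstream queries
        flattened.write.mode("overwrite").saveAsTable("staging.order_line_items")
        spark.stop()
      }
    }

explode produces one row per array element, which is the usual way to flatten nested JSON into a tabular layout before storing it in Hive.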

Confidential

Software Developer

Responsibilities:

  • Involved in the Sqoop process, which imports or exports data between relational databases and HDFS.
  • Created Hive tables, stored data in them and processed the data using HQL queries.
  • Managed and scheduled jobs to run in a particular sequence on the Hadoop cluster using Oozie workflows.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Wrote multiple MapReduce programs for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed formats (see the sketch after this list).
  • Knowledge and experience working on IT projects based on Java, DBMS and HTML.
  • Implemented a Java-based e-commerce web application across all three tiers: business, persistence and presentation.
  • Designed and supported RESTful API-based web services for data distribution to downstream applications.
  • Used HTML, CSS and JavaScript to create web pages.
  • Involved in database design and in developing SQL queries and stored procedures on MySQL.
  • Developed and coded the interfaces and classes required for the application and created appropriate relationships between the system classes and the interfaces provided.
  • Good knowledge of various software development processes such as Agile and the waterfall model.
  • Implemented the project according to the Software Development Life Cycle (SDLC).
  • Analyzed requirements and prepared the requirements analysis document.
  • Involved in requirement gathering, requirement analysis, defining scope, and design.
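
A minimal sketch for the MapReduce bullet above. The original programs were written in Java, but the same structure is shown here in Scala against the Hadoop API to stay consistent with the other sketches in this resume; the class names, CSV layout (category in column 2, amount in column 3) and paths are hypothetical.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Mapper: parse each CSV line and emit (category, amount)
    class CsvMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
        val fields = value.toString.split(",")
        if (fields.length >= 3 && fields(2).trim.matches("\\d+")) {
          context.write(new Text(fields(1).trim), new IntWritable(fields(2).trim.toInt))
        }
      }
    }

    // Reducer: sum the amounts for each category
    class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
      override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                          context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
        var total = 0
        val it = values.iterator()
        while (it.hasNext) total += it.next().get()
        context.write(key, new IntWritable(total))
      }
    }

    object CsvAggregationJob {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "csv-aggregation")
        job.setJarByClass(CsvAggregationJob.getClass)
        job.setMapperClass(classOf[CsvMapper])
        job.setReducerClass(classOf[SumReducer])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))     // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args(1)))   // output directory in HDFS
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }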

Environment: Hadoop, Map Reduce, Hive, Sqoop, Java, Oozie, Linux
