
Big Data/Spark Developer Resume


Bentonville, Arkansas

PROFESSIONAL SUMMARY:

  • 5 years of IT work experience, including 4+ years in the development and implementation of Hadoop and data warehousing solutions.
  • Experience with Druid ingestion and segment management.
  • Experience with GCP services such as Dataproc.
  • Experience working with the Hadoop ecosystem, using components such as HDFS, MapReduce, Hive, HBase, Sqoop, and Impala on Hortonworks.
  • Hands-on experience writing Sqoop scripts to import data from multiple RDBMS sources into HDFS.
  • Experience with Hive partitioning and bucketing, performing joins on Hive tables, and implementing Hive SerDes (a minimal sketch follows this list).
  • Good knowledge of writing Spark applications in PySpark and Scala using DataFrames.
  • Hands-on experience designing and developing Spark applications in Scala and PySpark and comparing Spark performance against Hive and SQL/Oracle.
  • Experience developing Kafka producers and consumers for streaming millions of events per second.
  • Experience in manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Working experience building RESTful web services and APIs.
  • Strong understanding of real-time streaming technologies such as Spark and Kafka.
  • Strong understanding of logical and physical database models and entity-relationship modeling.
  • Replaced existing MapReduce jobs and Hive scripts with Spark SQL and Spark data transformations for more efficient data processing.
  • Experience in writing complex SQL queries, creating reports and dashboards.
  • Excellent analytical, communication, and interpersonal skills, along with a positive attitude.
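
A minimal Scala sketch of the Hive partitioning, bucketing, and DataFrame join work described above. It is illustrative only: the table, column, and HDFS path names (orders, customers, analytics.orders_part) are hypothetical, not taken from any actual engagement.

    import org.apache.spark.sql.SparkSession

    object PartitionedHiveSketch {
      def main(args: Array[String]): Unit = {
        // Spark session with Hive support, as on a typical Hortonworks cluster
        val spark = SparkSession.builder()
          .appName("partitioned-hive-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical source data; in practice this would arrive via Sqoop or file ingestion
        val orders    = spark.read.parquet("hdfs:///data/raw/orders")
        val customers = spark.read.parquet("hdfs:///data/raw/customers")

        // Persist as a Hive table partitioned by date and bucketed by customer id;
        // bucketBy is only supported together with saveAsTable in Spark 2.x
        orders.write
          .partitionBy("order_dt")
          .bucketBy(16, "customer_id")
          .sortBy("customer_id")
          .format("orc")
          .saveAsTable("analytics.orders_part")

        // DataFrame join followed by a Spark SQL aggregation
        orders.join(customers, Seq("customer_id"), "left")
          .createOrReplaceTempView("orders_enriched")
        spark.sql("SELECT order_dt, COUNT(*) AS order_cnt FROM orders_enriched GROUP BY order_dt").show()

        spark.stop()
      }
    }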

TECHNICAL SKILLS:

Programming/Scripting Languages: Scala, PySpark, Python, SQL

Big Data: Hadoop, MapReduce, HDFS, Hive, Sqoop, Spark, Kafka, NiFi

Other tools: VMware, Git

Databases: Oracle, MySQL, and NoSQL stores (Apache Cassandra, HBase)

Big Data Ecosystem: HDFS, Oozie, ZooKeeper, Spark, SQL, Spark Streaming, Hue, Ambari, Impala.

File Formats: Text, XML, JSON, Avro, Parquet, ORC

Cloud Computing: Google Cloud Platform (GCP), AWS

Visualization and Reporting Tools: Tableau

PROFESSIONAL EXPERIENCE:

Confidential - Bentonville, Arkansas

Big Data/Spark Developer

Responsibilities:

  • Working as a Big Data Engineer on the Hortonworks distribution; responsible for data ingestion, data cleansing, data standardization, and data transformation.
  • Working with Hadoop 2.x and Spark 2.x (Python and Scala).
  • Involved in extracting data from various data sources into Hadoop HDFS, including data from SFTP servers and GCS.
  • Worked on creating Hive managed and external tables based on the requirement.
  • Implemented Partitioning and Bucketing on Hive tables for better performance.
  • Used Spark SQL to process data on the Spark engine.
  • Worked on Spark to improve the performance and optimization of existing Hadoop algorithms using Spark SQL and Scala.
  • Worked on Google Cloud BigQuery to execute queries and analyze data quickly.
  • Worked on various file formats such as Parquet, JSON, and ORC.
  • Developed an end-to-end ETL pipeline using Spark SQL and Scala on the Spark engine (see the sketch after this list).
  • Worked with external vendors and partners to onboard external data into target GCS buckets.
  • Worked on Oozie to develop workflows that automate the ETL data pipeline.
  • Worked with visualization tools such as Google Data Studio and internal tools such as Domo.
  • Used Spark for interactive queries, processing of streaming data, and integration with NoSQL databases for large volumes of data.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
  • Developed custom ETL solutions, batch processing, and real-time data ingestion pipelines to move data in and out of Hadoop using PySpark and shell scripting.
  • Explored improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, and DataFrames.
  • Worked with Sqoop import and export functionalities to handle large data set transfer between Oracle databases and HDFS.
  • Developed Spark jobs to clean data obtained from various feeds to make it suitable for ingestion into Hive tables for analysis.
  • Imported data from various sources into Spark RDD for analysis.
  • Configured Oozie workflows to run multiple Hive jobs independently, triggered by time and data availability.
  • Utilized Hive tables and HQL queries for daily and weekly reports. Worked on complex data types in Hive like Structs and Maps.
  • Moved data from Hive to GCS buckets and later ingested it into Druid.
  • Created Cassandra tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Supported code/design analysis, strategy development and project planning.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Assisted with data capacity planning and node forecasting.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Designed ETL processes using Informatica to load data from flat files, Oracle, and Excel files into the target Oracle data warehouse.
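
A minimal Scala sketch of the kind of Spark SQL ETL pipeline described above, moving a vendor feed from a GCS landing bucket into a partitioned, ORC-backed Hive table. The bucket, column, and table names are assumptions for illustration, and reading gs:// paths presumes the GCS connector is configured on the cluster.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object GcsToHiveEtlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("gcs-to-hive-etl-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Vendor feed landed in a GCS bucket (bucket and path are placeholders)
        val raw = spark.read.json("gs://example-vendor-landing/feeds/dt=2020-01-01/")

        // Cleansing and standardization: drop bad records, normalize timestamps, de-duplicate
        val cleaned = raw
          .filter(col("event_id").isNotNull)
          .withColumn("event_ts", to_timestamp(col("event_time"), "yyyy-MM-dd HH:mm:ss"))
          .withColumn("event_dt", to_date(col("event_ts")))
          .dropDuplicates("event_id")

        // Load into a partitioned Hive table for downstream reporting
        cleaned.write
          .mode("overwrite")
          .partitionBy("event_dt")
          .format("orc")
          .saveAsTable("edw.events_curated")

        spark.stop()
      }
    }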

Environment: Spark, Spark SQL, Hive, Oozie, Sqoop, Flume, Java, Scala, PySpark, Shell scripting, Tableau, GCP, Dataproc, Druid.

Confidential - Atlanta, Georgia

Hadoop Developer

Responsibilities:

  • Developed Hive scripts for source data validation and transformation. Automated data loading into HDFS and Hive for pre-processing the data using One Automation.
  • Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata.
  • Generated reports using Tableau.
  • Utilized Sqoop, ETL, and Hadoop File System APIs to implement data ingestion pipelines.
  • Worked on batch data of different granularities, ranging from hourly and daily to weekly and monthly.
  • Hands-on experience in Hadoop administration and support activities, including installing and configuring Apache big data tools and Hadoop clusters using Cloudera Manager.
  • Handled Hadoop cluster installations in various environments such as Unix, Linux, and Windows.
  • Assisted in upgrading, configuring, and maintaining Hadoop infrastructure components such as Ambari and Hive.
  • Optimized Hive queries by parallelizing work with partitioning and bucketing.
  • Worked on various data formats such as Avro, SequenceFile, JSON, MapFile, Parquet, and ORC (see the sketch after this list).
  • Worked extensively with Teradata, Hadoop/Hive, Spark, and SQL.
  • Designed and published visually rich and intuitive Tableau dashboards and Crystal Reports for executive decision making.
  • Experienced in working with SQL, T-SQL, and PL/SQL scripts, views, indexes, stored procedures, and other components of database applications.
  • Experienced in working with Hadoop on the Hortonworks Data Platform and running services through Cloudera Manager.
  • Used the Agile Scrum methodology (Scrum Alliance) for development.
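
A short Scala sketch showing how several of the file formats listed above can be read with Spark and queried through HiveQL-style SQL. Paths and table names are illustrative, and the Avro read assumes the spark-avro package is on the classpath.

    import org.apache.spark.sql.SparkSession

    object MultiFormatReadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("multi-format-read-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical HDFS paths for feeds stored in different formats
        val jsonDf    = spark.read.json("hdfs:///data/feeds/clickstream_json/")
        val orcDf     = spark.read.orc("hdfs:///data/feeds/inventory_orc/")
        val avroDf    = spark.read.format("avro").load("hdfs:///data/feeds/events_avro/")
        val parquetDf = spark.read.parquet("hdfs:///data/feeds/orders_parquet/")

        // Quick structural check of each feed
        Seq(jsonDf, orcDf, avroDf, parquetDf).foreach(_.printSchema())

        // Register one of the data sets and run a partition-friendly daily aggregate,
        // the kind of query that would feed a Tableau report
        parquetDf.createOrReplaceTempView("orders")
        spark.sql(
          """SELECT order_dt, COUNT(*) AS order_cnt
            |FROM orders
            |WHERE order_dt BETWEEN '2019-01-01' AND '2019-01-07'
            |GROUP BY order_dt""".stripMargin).show()

        spark.stop()
      }
    }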

Environment: Hadoop, HDFS, AWS, Cloudera, Scala, Kafka, MapReduce, YARN, Drill, Spark, Hive, Java, NiFi, HBase, MySQL, Kerberos, Maven

Confidential

Hadoop Developer

Responsibilities:

  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System.
  • Performed ETL operations using Hive to transform transactional data into de-normalized form.
  • Created ad hoc reports by gathering requirements from different teams.
  • Utilized Hive user-defined functions to analyze complex data and find specific user behavior.
  • Analyzed data using HiveQL to derive metrics such as game duration, daily active users (DAU), and weekly active users (WAU); see the sketch after this list.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Worked along with the admin team to assist them in adding/ removing cluster nodes, cluster monitoring and trouble shooting.
  • Exported data to relational databases using Sqoop for visualization and to generate reports.
  • Created machine learning and statistical models (SVM, CRF, HMM) to assess gamer performance.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
  • Shared responsibility for administration of Hadoop and Hive.
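
The DAU/WAU metrics above were derived in HiveQL; the sketch below expresses equivalent queries through Spark SQL in Scala, to stay consistent with the other examples in this document. The edw.game_events table and its columns are hypothetical.

    import org.apache.spark.sql.SparkSession

    object ActiveUserMetricsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("active-user-metrics-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Daily active users: distinct players per day
        spark.sql(
          """SELECT event_dt, COUNT(DISTINCT user_id) AS dau
            |FROM edw.game_events
            |GROUP BY event_dt""".stripMargin).show()

        // Weekly active users: distinct players per week of the year
        spark.sql(
          """SELECT weekofyear(event_dt) AS week_no, COUNT(DISTINCT user_id) AS wau
            |FROM edw.game_events
            |GROUP BY weekofyear(event_dt)""".stripMargin).show()

        spark.stop()
      }
    }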

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, PL/SQL, Sqoop, Linux, XML, MySQL.

Confidential

Java/J2EE Developer

Responsibilities:

  • Analyzed and reviewed client requirements and design.
  • Worked on testing, debugging and troubleshooting all types of technical issues.
  • Good knowledge of OOP concepts.
  • Used JDBC for database connectivity and manipulation (see the sketch after this list).
  • Used Eclipse for the development, testing, and debugging of the application.
  • Worked as a Java/J2EE backend developer creating a Maven web application project.
  • Built the application using Maven and deployed it on WebSphere Application Server.
  • Gathered and collected information from various programs, analyzed time requirements and prepared documentation to change existing programs.
  • Developed Custom Tags to simplify the JSP code.
  • Designed UI screens using JSP and HTML.
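
A small sketch of the JDBC usage mentioned above. The original work was done in Java; this example is written in Scala against the standard java.sql API to stay consistent with the other sketches here, and the connection URL, credentials, and table are placeholders (an Oracle JDBC driver would need to be on the classpath).

    import java.sql.DriverManager

    object JdbcLookupSketch {
      def main(args: Array[String]): Unit = {
        // Placeholder connection details, not real credentials
        val url  = "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1"
        val conn = DriverManager.getConnection(url, "app_user", "app_password")
        try {
          // Parameterized query via PreparedStatement
          val stmt = conn.prepareStatement(
            "SELECT customer_id, customer_name FROM customers WHERE region = ?")
          stmt.setString(1, "EAST")
          val rs = stmt.executeQuery()
          while (rs.next()) {
            println(s"${rs.getLong("customer_id")}: ${rs.getString("customer_name")}")
          }
          rs.close()
          stmt.close()
        } finally {
          conn.close()
        }
      }
    }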

Environment: Java, HTML, Servlets, Oracle DB, SQL, Jasper Reports, Maven, Jenkins.
