Big Data/Spark Developer Resume
Bentonville, Arkansas
PROFESSIONAL SUMMARY:
- 5 years of work experience in IT, including 4+ years in the development and implementation of Hadoop and data warehousing solutions.
- Experience with Druid ingestion and segments.
- Experience with GCP services such as Dataproc.
- Experience working with the Hadoop ecosystem on Hortonworks, using components such as HDFS, MapReduce, Hive, HBase, Sqoop, and Impala.
- Hands-on experience writing Sqoop scripts to import data from multiple RDBMS sources into HDFS.
- Experience with Hive partitioning and bucketing, performing joins on Hive tables, and implementing Hive SerDes.
- Good knowledge of writing Spark applications in PySpark and Scala using DataFrames.
- Hands-on experience designing and developing Spark applications in Scala and PySpark, and comparing Spark performance with Hive and SQL/Oracle.
- Experience developing Kafka producers and consumers to stream millions of events per second.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Working experience building RESTful web services and APIs.
- Strong understanding of real-time streaming technologies, Spark and Kafka (see the sketch after this list).
- Strong understanding of logical and physical database models and entity-relationship modeling.
- Replaced existing MapReduce jobs and Hive scripts with Spark SQL and Spark data transformations for more efficient data processing.
- Experience in writing complex SQL queries, creating reports and dashboards.
- Excellent analytical, communication, and interpersonal skills, along with a positive attitude.
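A minimal Scala sketch of the Spark and Kafka streaming pattern referenced in the summary, consuming a hypothetical `events` topic from a local broker with Spark Structured Streaming; the topic name, broker address, and `event_type` JSON field are illustrative assumptions only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-stream-sketch")
      .getOrCreate()

    // Read a stream of events from a hypothetical Kafka topic.
    // Requires the spark-sql-kafka connector on the classpath.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
      .option("subscribe", "events")                        // assumed topic name
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    // Count events per type as a simple streaming aggregation.
    val counts = events
      .select(get_json_object(col("json"), "$.event_type").alias("event_type"))
      .groupBy("event_type")
      .count()

    // Write running counts to the console; a real job would write to HDFS, Hive, etc.
    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```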
TECHNICAL SKILLS:
Programming/Scripting Languages: Scala, PySpark, Python, SQL
Big Data: Hadoop, MapReduce, HDFS, Hive, Sqoop, Spark, Kafka, NiFi
Other Tools: VMware, Git
Databases: Oracle, MySQL; NoSQL: Apache Cassandra, HBase
Big Data Ecosystem: HDFS, Oozie, ZooKeeper, Spark, Spark SQL, Spark Streaming, Hue, Ambari, Impala
File Formats: Text, XML, JSON, Avro, Parquet, ORC
Cloud Computing: Google Cloud Platform (GCP), AWS
Visualization and Reporting Tools: Tableau
PROFESSIONAL EXPERIENCE:
Confidential - Bentonville, Arkansas
Big Data/Spark Developer
Responsibilities:
- Working as a Big Data Engineer on the Hortonworks distribution, responsible for data ingestion, cleansing, standardization, and transformation.
- Working with Hadoop 2.x and Spark 2.x (Python and Scala).
- Extracted data from various sources, including SFTP servers and GCS, into Hadoop HDFS.
- Worked on creating Hive managed and external tables based on the requirement.
- Implemented Partitioning and Bucketing on Hive tables for better performance.
- Used Spark SQL to process data on the Spark engine.
- Improved performance and optimized existing Hadoop algorithms by rewriting them with Spark Context, Spark SQL, and DataFrames in Scala.
- Used Google Cloud BigQuery to run queries and analyze data quickly.
- Worked with various file formats such as Parquet, JSON, and ORC.
- Developed an end-to-end ETL pipeline using Spark SQL and Scala on the Spark engine (see the sketch after this list).
- Worked with external vendors and partners to onboard external data into target GCS buckets.
- Developed Oozie workflows to automate the ETL data pipeline.
- Worked with visualization tools such as Google visual studio and internal tools such as Domo.
- Used Spark for interactive queries, streaming data processing, and integration with NoSQL databases for high data volumes.
- Handled large datasets during ingestion itself using partitioning, Spark in-memory capabilities, broadcast variables, and efficient joins and transformations.
- Developed custom ETL solutions, batch processing, and real-time data ingestion pipelines to move data in and out of Hadoop using PySpark and shell scripting.
- Used Sqoop import and export to transfer large datasets between Oracle databases and HDFS.
- Developed Spark jobs to clean data obtained from various feeds to make it suitable for ingestion into Hive tables for analysis.
- Imported data from various sources into Spark RDD for analysis.
- Configured Oozie workflows to run multiple Hive jobs independently, triggered by time and data availability.
- Utilized Hive tables and HQL queries for daily and weekly reports. Worked on complex data types in Hive like Structs and Maps.
- Moved data from Hive into GCS buckets and later ingested it into Druid.
- Created Cassandra tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX systems, NoSQL sources, and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Designed ETL processes using Informatica to load data from flat files, Oracle, and Excel files into the target Oracle data warehouse.
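A minimal Scala sketch of the kind of Spark SQL ETL step described in this role: raw JSON landed in a GCS staging path is cleansed and loaded into a date-partitioned Hive table stored as ORC. The bucket path, the analytics.orders table, and the column names are hypothetical, and an existing ORC table partitioned by order_date is assumed.

```scala
import org.apache.spark.sql.SparkSession

object DailyIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-ingest-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Raw JSON landed by the ingestion job in a hypothetical GCS staging path.
    val raw = spark.read.json("gs://example-bucket/landing/orders/")

    // Basic cleansing and standardization before the data reaches Hive.
    val cleaned = raw
      .filter("order_id IS NOT NULL")
      .withColumnRenamed("orderDate", "order_date")
    cleaned.createOrReplaceTempView("orders_stg")

    // Allow fully dynamic partitions, then load the date-partitioned ORC table.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE analytics.orders PARTITION (order_date)
        |SELECT order_id, customer_id, CAST(amount AS DOUBLE) AS amount, order_date
        |FROM orders_stg""".stripMargin)

    spark.stop()
  }
}
```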
Environment: Spark, Spark SQL, Hive, Oozie, Sqoop, Flume, Java, Scala, PySpark, Shell scripting, Tableau, GCP, Dataproc, Druid.
Confidential - Atlanta, Georgia
Hadoop Developer
Responsibilities:
- Developed Hive scripts for source data validation and transformation. Automated data loading into HDFS and Hive for pre-processing the data using One Automation.
- Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata.
- Generated reports using Tableau.
- Utilized Sqoop, ETL tools, and the Hadoop FileSystem API to implement data ingestion pipelines.
- Worked on batch data of varying granularity, ranging from hourly and daily to weekly and monthly.
- Hands-on experience in Hadoop administration and support, installing and configuring Apache big data tools and Hadoop clusters using Cloudera Manager.
- Handled Hadoop cluster installations in various environments such as UNIX, Linux, and Windows.
- Assisted in upgrading, configuring, and maintaining Hadoop infrastructure components such as Ambari and Hive.
- Optimized Hive queries by parallelizing work through partitioning and bucketing (see the sketch after this list).
- Worked with various data formats such as Avro, SequenceFile, JSON, MapFile, Parquet, and ORC.
- Worked extensively with Teradata, Hadoop/Hive, Spark, and SQL.
- Designed and published visually rich, intuitive Tableau dashboards and Crystal Reports for executive decision making.
- Experienced in working with SQL, T-SQL, PL/SQL scripts, views, indexes, stored procedures, and other components of database applications
- Experienced working with the Hortonworks Data Platform and running services through Cloudera Manager.
- Used the Agile Scrum methodology (Scrum Alliance) for development.
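A minimal Scala sketch of the partition-and-bucket layout behind the Hive query optimization above, expressed with Spark's DataFrameWriter; the equivalent Hive DDL uses PARTITIONED BY and CLUSTERED BY ... INTO n BUCKETS. The staging path, the edw.transactions table, and the column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object BucketedTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bucketed-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical hourly batch already staged as Parquet.
    val txns = spark.read.parquet("/data/staging/transactions/")

    // Partition by load date and bucket by customer_id so queries prune by date
    // and joins on customer_id avoid a full shuffle.
    txns.write
      .partitionBy("load_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("edw.transactions")

    // Partition pruning: only the 2018-06-01 partition is scanned.
    spark.table("edw.transactions")
      .filter("load_date = '2018-06-01'")
      .groupBy("customer_id")
      .count()
      .show()

    spark.stop()
  }
}
```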
Environment: Hadoop, HDFS, AWS, Cloudera, Scala, Kafka, MapReduce, YARN, Drill, Spark, Hive, Java, NiFi, HBase, MySQL, Kerberos, Maven
Confidential
Hadoop Developer
Responsibilities:
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System.
- Performed ETL operations using Hive to transform transactional data into de-normalized form.
- Created ad hoc reports by gathering requirements from different teams.
- Utilized Hive user defined functions to analyze the complex data to find specific user behavior.
- Analyzed data using HiveQL to derive metrics such as game duration, daily active users (DAU), and weekly active users (WAU) (see the sketch after this list).
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Worked with the admin team on adding/removing cluster nodes, cluster monitoring, and troubleshooting.
- Exported data to relational databases using Sqoop for visualization and to generate reports.
- Created machine learning and statistical models (SVM, CRF, HMM) to assess gamer performance.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop and Hive.
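A minimal Scala sketch of the UDF-plus-aggregation pattern behind the DAU/WAU metrics above, shown through Spark SQL rather than the Hive generic UDFs and HiveQL used on the project; the game.events table, its columns, and the session-bucket thresholds are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

object DauMetricSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dau-metric-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Small UDF standing in for the business logic a Hive generic UDF would carry,
    // here bucketing session lengths into labels (thresholds are assumed).
    spark.udf.register("session_bucket", (seconds: Int) =>
      if (seconds < 300) "short" else if (seconds < 1800) "medium" else "long")

    // Daily active users (DAU) per session bucket from a hypothetical events table.
    spark.sql(
      """SELECT to_date(event_time)            AS event_date,
        |       session_bucket(session_length) AS bucket,
        |       COUNT(DISTINCT user_id)        AS dau
        |FROM game.events
        |GROUP BY to_date(event_time), session_bucket(session_length)""".stripMargin)
      .show()

    spark.stop()
  }
}
```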
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, PL/SQL, MySQL, Sqoop, Linux, XML.
Confidential
Java/J2EE Developer
Responsibilities:
- Analyzed and reviewed client requirements and design
- Worked on testing, debugging and troubleshooting all types of technical issues.
- Good knowledge of OOP concepts.
- Used JDBC for database connectivity and manipulation
- Used Eclipse for the Development, Testing and Debugging of the application.
- Worked as a Java/J2EE backend developer, creating the Maven web application project.
- Built the application using Maven and deployed it on WebSphere Application Server.
- Gathered and collected information from various programs, analyzed time requirements and prepared documentation to change existing programs.
- Developed Custom Tags to simplify the JSP code.
- Designed UI screens using JSP and HTML.
Environment: Java, HTML, Servlets, Oracle DB, SQL, Jasper Reports, Maven, Jenkins.