Big Data/Spark Developer Resume
Bentonville, Arkansas
PROFESSIONAL SUMMARY:
- 5 years of work experience in IT, including 4+ years in the development and implementation of Hadoop and data warehousing solutions.
- Experience with Druid ingestion and segments.
- Experience with GCP services such as Dataproc.
- Experience working with the Hadoop ecosystem on Hortonworks, using components such as HDFS, MapReduce, Hive, HBase, Sqoop, and Impala.
- Hands-on experience writing Sqoop scripts to import data from multiple RDBMS sources into HDFS.
- Experience with Hive partitioning and bucketing, performing joins on Hive tables, and implementing Hive SerDes.
- Good knowledge of writing Spark applications in PySpark and Scala using DataFrames.
- Hands-on experience designing and developing Spark applications in Scala and PySpark, and comparing Spark performance with Hive and SQL/Oracle.
- Experience developing Kafka producers and consumers to stream millions of events per second.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Working experience building RESTful web services and APIs.
- Strong understanding of real-time streaming technologies, Spark and Kafka (see the sketch after this list).
- Strong understanding of logical and physical database models and entity-relationship modeling.
- Replaced existing MapReduce jobs and Hive scripts with Spark SQL and Spark data transformations for more efficient data processing.
- Experience in writing complex SQL queries, creating reports and dashboards.
- Excellent analytical, communication, and interpersonal skills, along with a positive attitude.
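A minimal Scala sketch of the Spark and Kafka streaming pattern referenced in the summary, consuming a hypothetical `events` topic from a local broker with Spark Structured Streaming; the topic name, broker address, and `event_type` JSON field are illustrative assumptions only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-stream-sketch")
      .getOrCreate()

    // Read a stream of events from a hypothetical Kafka topic.
    // Requires the spark-sql-kafka connector on the classpath.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
      .option("subscribe", "events")                        // assumed topic name
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    // Count events per type as a simple streaming aggregation.
    val counts = events
      .select(get_json_object(col("json"), "$.event_type").alias("event_type"))
      .groupBy("event_type")
      .count()

    // Write running counts to the console; a real job would write to HDFS, Hive, etc.
    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```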
TECHNICAL SKILLS:
Programming/Scripting Languages: Scala, PySpark, Python, SQL
Big Data: Hadoop, MapReduce, HDFS, Hive, Sqoop, Spark, Kafka, NiFi
Other Tools: VMware, Git
Databases: Oracle, MySQL; NoSQL: Apache Cassandra, HBase
Big Data Ecosystem: HDFS, Oozie, ZooKeeper, Spark, Spark SQL, Spark Streaming, Hue, Ambari, Impala
File Formats: Text, XML, JSON, Avro, Parquet, ORC
Cloud Computing: Google Cloud Platform (GCP), AWS
Visualization and Reporting Tools: Tableau
PROFESSIONAL EXPERIENCE:
Confidential - Bentonville, Arkansas
Big Data/Spark Developer
Responsibilities:
- Working as a Big Data Engineer on the Hortonworks distribution, responsible for data ingestion, cleansing, standardization, and transformation.
- Working with Hadoop 2.x and Spark 2.x (Python and Scala).
- Extracted data from various sources, including SFTP servers and GCS, into Hadoop HDFS.
- Worked on creating Hive managed and external tables based on the requirement.
- Implemented Partitioning and Bucketing on Hive tables for better performance.
- Used Spark SQL to process data on the Spark engine.
- Improved performance and optimized existing Hadoop algorithms by rewriting them with Spark Context, Spark SQL, and DataFrames in Scala.
- Used Google Cloud BigQuery to run queries and analyze data quickly.
- Worked with various file formats such as Parquet, JSON, and ORC.
- Developed an end-to-end ETL pipeline using Spark SQL and Scala on the Spark engine (see the sketch after this list).
- Worked with external vendors and partners to onboard external data into target GCS buckets.
- Developed Oozie workflows to automate the ETL data pipeline.
- Worked with visualization tools such as Google visual studio and internal tools such as Domo.
- Used Spark for interactive queries, streaming data processing, and integration with NoSQL databases for high data volumes.
- Handled large datasets during ingestion itself using partitioning, Spark in-memory capabilities, broadcast variables, and efficient joins and transformations.
- Developed custom ETL solutions, batch processing, and real-time data ingestion pipelines to move data in and out of Hadoop using PySpark and shell scripting.
- Used Sqoop import and export to transfer large datasets between Oracle databases and HDFS.
- Developed Spark jobs to clean data obtained from various feeds to make it suitable for ingestion into Hive tables for analysis.
- Imported data from various sources into Spark RDD for analysis.
- Configured Oozie workflows to run multiple Hive jobs independently, triggered by time and data availability.
- Utilized Hive tables and HQL queries for daily and weekly reports. Worked on complex data types in Hive like Structs and Maps.
- Moved data from Hive into GCS buckets and later ingested it into Druid.
- Created Cassandra tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX systems, NoSQL sources, and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Designed ETL processes using Informatica to load data from flat files, Oracle, and Excel files into the target Oracle data warehouse.
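A minimal Scala sketch of the kind of Spark SQL ETL step described in this role: raw JSON landed in a GCS staging path is cleansed and loaded into a date-partitioned Hive table stored as ORC. The bucket path, the analytics.orders table, and the column names are hypothetical, and an existing ORC table partitioned by order_date is assumed.

```scala
import org.apache.spark.sql.SparkSession

object DailyIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-ingest-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Raw JSON landed by the ingestion job in a hypothetical GCS staging path.
    val raw = spark.read.json("gs://example-bucket/landing/orders/")

    // Basic cleansing and standardization before the data reaches Hive.
    val cleaned = raw
      .filter("order_id IS NOT NULL")
      .withColumnRenamed("orderDate", "order_date")
    cleaned.createOrReplaceTempView("orders_stg")

    // Allow fully dynamic partitions, then load the date-partitioned ORC table.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE analytics.orders PARTITION (order_date)
        |SELECT order_id, customer_id, CAST(amount AS DOUBLE) AS amount, order_date
        |FROM orders_stg""".stripMargin)

    spark.stop()
  }
}
```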
Environment: Spark, Spark SQL, Hive, Oozie, Sqoop, Flume, Java, Scala, PySpark, Shell scripting, Tableau, GCP, Dataproc, Druid.
Confidential - Atlanta, Georgia
Hadoop Developer
Responsibilities:
- Developed Hive scripts for source data validation and transformation. Automated data loading into HDFS and Hive for pre-processing the data using One Automation.
- Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata.
- Generated reports using Tableau.
- Utilized Sqoop, ETL tools, and the Hadoop FileSystem API to implement data ingestion pipelines.
- Worked on batch data of varying granularity, ranging from hourly and daily to weekly and monthly.
- Hands-on experience in Hadoop administration and support, installing and configuring Apache big data tools and Hadoop clusters using Cloudera Manager.
- Handled Hadoop cluster installations in various environments such as UNIX, Linux, and Windows.
- Assisted in upgrading, configuring, and maintaining Hadoop infrastructure components such as Ambari and Hive.
- Optimized Hive queries by parallelizing work through partitioning and bucketing (see the sketch after this list).
- Worked with various data formats such as Avro, SequenceFile, JSON, MapFile, Parquet, and ORC.
- Worked extensively with Teradata, Hadoop/Hive, Spark, and SQL.
- Designed and published visually rich, intuitive Tableau dashboards and Crystal Reports for executive decision making.
- Experienced in working with SQL, T-SQL, PL/SQL scripts, views, indexes, stored procedures, and other components of database applications
- Experienced working with the Hortonworks Data Platform and running services through Cloudera Manager.
- Used the Agile Scrum methodology (Scrum Alliance) for development.
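A minimal Scala sketch of the partition-and-bucket layout behind the Hive query optimization above, expressed with Spark's DataFrameWriter; the equivalent Hive DDL uses PARTITIONED BY and CLUSTERED BY ... INTO n BUCKETS. The staging path, the edw.transactions table, and the column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object BucketedTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bucketed-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical hourly batch already staged as Parquet.
    val txns = spark.read.parquet("/data/staging/transactions/")

    // Partition by load date and bucket by customer_id so queries prune by date
    // and joins on customer_id avoid a full shuffle.
    txns.write
      .partitionBy("load_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("edw.transactions")

    // Partition pruning: only the 2018-06-01 partition is scanned.
    spark.table("edw.transactions")
      .filter("load_date = '2018-06-01'")
      .groupBy("customer_id")
      .count()
      .show()

    spark.stop()
  }
}
```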
Environment: Hadoop, HDFS, AWS, Cloudera, Scala, Kafka, MapReduce, YARN, Drill, Spark, Hive, Java, NiFi, HBase, MySQL, Kerberos, Maven
Confidential
Hadoop Developer
Responsibilities:
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System.
- Performed ETL operations using Hive to transform transactional data into de-normalized form.
- Created ad hoc reports by gathering requirements from different teams.
- Utilized Hive user defined functions to analyze the complex data to find specific user behavior.
- Analyzed data using HiveQL to derive metrics such as game duration, daily active users (DAU), and weekly active users (WAU) (see the sketch after this list).
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Worked with the admin team on adding/removing cluster nodes, cluster monitoring, and troubleshooting.
- Exported data to relational databases using Sqoop for visualization and to generate reports.
- Created machine learning and statistical models (SVM, CRF, HMM) to assess gamer performance.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop and Hive.
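A minimal Scala sketch of the UDF-plus-aggregation pattern behind the DAU/WAU metrics above, shown through Spark SQL rather than the Hive generic UDFs and HiveQL used on the project; the game.events table, its columns, and the session-bucket thresholds are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

object DauMetricSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dau-metric-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Small UDF standing in for the business logic a Hive generic UDF would carry,
    // here bucketing session lengths into labels (thresholds are assumed).
    spark.udf.register("session_bucket", (seconds: Int) =>
      if (seconds < 300) "short" else if (seconds < 1800) "medium" else "long")

    // Daily active users (DAU) per session bucket from a hypothetical events table.
    spark.sql(
      """SELECT to_date(event_time)            AS event_date,
        |       session_bucket(session_length) AS bucket,
        |       COUNT(DISTINCT user_id)        AS dau
        |FROM game.events
        |GROUP BY to_date(event_time), session_bucket(session_length)""".stripMargin)
      .show()

    spark.stop()
  }
}
```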
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, PL/SQL, MySQL, Sqoop, Linux, XML.
Confidential
Java/J2EE Developer
Responsibilities:
- Analyzed and reviewed client requirements and design
- Worked on testing, debugging and troubleshooting all types of technical issues.
- Good knowledge of OOP concepts.
- Used JDBC for database connectivity and manipulation
- Used Eclipse for the Development, Testing and Debugging of the application.
- Worked as a Java/J2EE backend developer, creating the Maven web application project.
- Built the application using Maven and deployed it on WebSphere Application Server.
- Gathered and collected information from various programs, analyzed time requirements and prepared documentation to change existing programs.
- Developed Custom Tags to simplify the JSP code.
- Designed UI screens using JSP and HTML.
Environment: Java, HTML, Servlets, Oracle DB, SQL, Jasper Reports, Maven, Jenkins.