Big Data Engineer Resume
Florida
SUMMARY
- 6+ years of experience in Big Data ecosystem technologies.
- Hands-on experience with major components of the Big Data ecosystem: Hadoop, HDFS, MapReduce, Spark, Spark Streaming, Kafka, Hive, Pig, HBase and Impala.
- Very good knowledge of SQL.
- Experience working with terabytes of data.
- Experience in data management and implementation of Big Data applications using the Spark and Hadoop frameworks.
- Good understanding of Spark DataFrames and RDDs.
- Experience in writing MapReduce programs to handle complex business logic.
- Experience in analyzing data using Spark SQL, HiveQL and Pig Latin.
- Experience in extending Hive functionality by writing custom UDFs in Java and Python.
- Good understanding of NoSQL databases like HBase and Cassandra.
- Experienced in Design, Development, Testing and Maintenance of various Data Warehousing and Business Intelligence (BI) applications in complex business environments.
- Well versed in Conceptual, Logical/Physical, Relational, and Multi-dimensional modeling, Data analysis for Decision Support Systems (DSS), Data Transformation (ETL) and Reporting.
- Proficient in developing Entity-Relationship diagrams and Star/Snowflake Schema designs; expert in modeling Transactional Databases and Data Warehouses.
- Efficient in all phases of the development lifecycle, including Data Cleansing, Data Conversion, Data Profiling, Data Mapping, Performance Tuning, and System Testing.
- Proficient in Normalization/De-normalization techniques in relational and dimensional database environments, with normalization up to 3NF.
- Efficient in Dimensional Data Modeling for Data Mart design, identifying Facts and Dimensions, and creating cubes.
- Excellent technical, logical, code debugging and problem-solving capabilities, with the ability to closely track the evolving environment and the likely activities of competitors and customers.
- Efficient in generating reports using Tableau, Excel and Crystal Reports.
- Good team player with strong analytical and communication skills.
- Involved in the design and development phases of the Software Development Life Cycle (SDLC) using Scrum methodology.
- Prepared High-Level Logical Data Models, BRDs (Business Requirement Documents) and supporting documents.
PROFESSIONAL EXPERIENCE
Confidential, Florida
Big Data Engineer
Responsibilities:
- Analyzed data to identify patterns in how users focus their attention across various actions.
- Participated in several client meetings to understand the problem and gather requirements.
- Installed, configured and maintained Apache Hadoop clusters for application development and Hadoop tools like MapReduce, Hive, Pig, HBase, Flume and Sqoop.
- Developed MapReduce Programs for data parsing and data cleaning.
- Used Pig as an ETL tool to perform transformations, event joins and pre-aggregations before storing the data onto HDFS.
- Responsible for developing a data pipeline using Flume to extract data from weblogs and store it in HDFS.
- Imported and exported data between relational data sources (RDBMS) and HDFS using Sqoop and connectors.
- Created Hive/HBase table schemas and handled data loading and report generation.
- Delivered reports and gathered customer feedback to enhance the analysis.
- Implemented partitioning and bucketing in Hive for efficient data access.
- Developed custom UDFs in Java where the required functionality was too complex for built-in functions.
- Integrated the Hive warehouse with HBase.
- Worked on MapReduce jobs to standardize the data and clean it to calculate aggregations.
- Developed Pig Latin scripts to sort, group, join and filter enterprise data.
- Used Sqoop for incremental imports and exports of data between RDBMS and HDFS.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Generated reports using Tableau, Excel and Crystal Reports.
Confidential, Ohio
Data Analyst
Responsibilities:
- Involved in the design and development of the Spark environment.
- Worked on a 14-node cluster running HDFS, Spark, Hive, Impala, Sqoop and Oozie.
- Involved in loading millions of structured and unstructured documents onto HDFS.
- Involved in writing Spark RDDs, DataFrames and Datasets in Scala to extract, transform and store data from structured and unstructured files (see the sketch after this list).
- Involved in writing Spark SQL, Hive and Impala queries for ad-hoc analysis of structured and semi-structured data.
- Involved in creating Sqoop jobs to move data between HDFS and an Oracle database.
- Created and scheduled Oozie workflows.
- Identified/documented data sources and transformation rules required to populate and maintain data warehouse content.
- Created a mapping document to map the data from source to target.
- Assisted in designing the overall ETL strategy.
- Used a metadata tool for importing metadata from the repository, creating new job categories and creating new data elements.
- Used DataStage Designer to design and develop jobs for extracting, cleansing, transforming, integrating and loading data into different Data Marts.
- Generated reports using Tableau, Excel and Crystal Reports.
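A minimal Scala sketch of the Spark DataFrame ETL work described above. The input path, column names and output location are hypothetical placeholders, and Hive support is enabled only to show how the same data can be queried ad hoc with Spark SQL; this illustrates the pattern rather than the actual project code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DocumentEtlSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets DataFrames sit alongside Hive/Impala tables for ad-hoc queries.
    val spark = SparkSession.builder()
      .appName("document-etl-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical input: semi-structured JSON documents landed on HDFS.
    val raw = spark.read.json("hdfs:///data/raw/documents/")

    // Example transformation: keep valid records and normalize a timestamp column.
    val cleaned = raw
      .filter(col("doc_id").isNotNull)
      .withColumn("event_date", to_date(col("event_ts")))

    // Register a temporary view so Spark SQL queries can run over the cleaned data.
    cleaned.createOrReplaceTempView("documents")
    val daily = spark.sql(
      "SELECT event_date, COUNT(*) AS doc_count FROM documents GROUP BY event_date")

    // Write the transformed data back to HDFS, partitioned for efficient reads.
    daily.write.mode("overwrite")
      .partitionBy("event_date")
      .parquet("hdfs:///data/curated/documents_daily/")

    spark.stop()
  }
}
```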
Confidential
Big Data Engineer
Responsibilities:
- Worked extensively on building a data collection platform with a wide range of technologies (Kafka, Spark Streaming, Solace, Elasticsearch, HBase).
- Developed Kafka-Spark Streaming jobs to read real-time messages from Kafka topics, produce them to Solace topics and write the data onto HDFS with zero data loss (a sketch of such a job follows this list).
- Developed Solace-Spark Streaming jobs to consume real-time messages from Solace queues with zero data loss and write them onto HDFS, Elasticsearch and HBase/M7.
- Contributed to developing a Solace utility to produce and consume messages to/from Solace topics and queues.
- Helped build a development Hadoop cluster on the MapR distribution, starting at 40 nodes.
- Worked on developing REST APIs to collect clickstream/service/event log data in real time from various endpoints.
- Worked on and supported an API that collects mobile clickstream data from the Mixpanel endpoint; developed and supported Python scripts to validate the data, check data quality and flatten the JSON messages into delimited files, and used Hadoop Streaming with the internal data ingestion framework to ingest the flattened data into HDFS.
- Worked extensively on ingesting data from various sources into the enterprise Big Data warehouse.
- Developed Python scripts and used Hadoop Streaming to load data into Elasticsearch, feeding drill-down dashboard interfaces used extensively by business users.
- As part of POCs, worked extensively with HiveQL and Spark SQL to analyze various categories of data to determine customer spending behavior, recommend card upgrades to customers and find insights from CSP (Customer Service Profession) data.
- Developed Hive UDFs in Java and Python to implement business logic (a UDF sketch in the same spirit follows this list).
- Worked on various business requirements, analyzing data using Hive and loading the output into Elasticsearch for users to make business decisions.
- Delivered reports using Tableau for visualization and Crystal Reports.
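A minimal sketch of the Kafka-to-HDFS leg of the streaming jobs described above, expressed with Spark Structured Streaming for brevity (the original jobs used Spark Streaming and also bridged to Solace, whose client API is not shown here). The broker address, topic name and paths are placeholder assumptions; checkpointing is what underpins the "zero data loss" behavior.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs-sketch")
      .getOrCreate()

    // Placeholder broker list and topic name.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "clickstream-events")
      .option("startingOffsets", "latest")
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // Checkpointing records Kafka offsets so the job can recover without losing data.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streaming/clickstream/")
      .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
      .start()

    query.awaitTermination()
  }
}
```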
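A sketch of the simple Hive UDF pattern mentioned above, written in Scala to stay consistent with the other sketches (the original UDFs were in Java and Python); the masking logic and class name are purely illustrative, not the actual business logic.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical example: mask all but the last four characters of a card number.
class MaskCardNumber extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) {
      null
    } else {
      val s = input.toString
      val masked =
        if (s.length <= 4) s
        else "*" * (s.length - 4) + s.takeRight(4)
      new Text(masked)
    }
  }
}
```

Packaged into a jar, such a class would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION mask_card AS 'MaskCardNumber' before being used in HiveQL queries.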