
Sr. Data Engineer Resume


Plano, TX

PROFESSIONAL SUMMARY:

  • Proactive IT developer with 9 years of experience in Java/J2EE technologies and in the design and development of scalable systems using Hadoop technologies across a variety of environments.
  • Experience in installing, configuring, supporting, and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
  • Strong understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, and the HDFS framework.
  • Extensive experience in analyzing data using Hadoop ecosystem components including HDFS, Hive, Pig, Sqoop, Flume, MapReduce, Spark, Kafka, HBase, Oozie, Solr, and ZooKeeper.
  • Extensive knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Configured ZooKeeper, Cassandra, and Flume on the existing Hadoop cluster.
  • Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
  • Experience in converting Hive queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
  • Hands-on experience in troubleshooting errors in the HBase shell, Pig, Hive, and MapReduce.
  • Hands-on experience in provisioning and managing multi-tenant Cassandra clusters on public cloud environments such as Amazon Web Services (AWS) EC2 and OpenStack.
  • Experience in NoSQL column-oriented databases such as HBase and Cassandra and their integration with Hadoop clusters.
  • Experience in maintaining big data platforms using open-source technologies such as Spark and Elasticsearch.
  • Designed and built solutions for real-time data ingestion using Kafka, Storm, Spark Streaming, and various NoSQL databases.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
  • Good hands-on experience in creating RDDs and DataFrames for the required input data and performing data transformations using Spark and Scala.
  • Knowledge of developing NiFi flow prototypes for data ingestion into HDFS.
  • Extensive experience working with Oracle, DB2, SQL Server, PL/SQL, and MySQL databases, and with core Java concepts such as OOP, multithreading, collections, and I/O.
  • Experience in Service-Oriented Architecture using SOAP and RESTful web services.
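
The sketch below illustrates the kind of Hive-to-Spark conversion and Scala UDF work described in the bullets above. It is a minimal, illustrative example; the table name (orders), the column names (region, amount), and the UDF itself are assumptions for demonstration only, not details taken from any actual engagement.

```scala
// Minimal sketch: a HiveQL aggregation re-expressed as Spark DataFrame
// transformations with a small Scala UDF. Table and column names are
// illustrative placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Equivalent of: SELECT region, SUM(amount) FROM orders GROUP BY region
    val orders = spark.table("orders")

    // Example UDF: normalize region codes before aggregating
    val normalizeRegion = udf((r: String) => if (r == null) "UNKNOWN" else r.trim.toUpperCase)

    val totals = orders
      .withColumn("region", normalizeRegion($"region"))
      .groupBy($"region")
      .agg(sum($"amount").as("total_amount"))

    // Persist the aggregate back as a Hive table; an RDBMS export via
    // Sqoop or JDBC would be a separate step
    totals.write.mode("overwrite").saveAsTable("orders_totals_by_region")
    spark.stop()
  }
}
```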

TECHNICAL SKILLS:

Big Data Ecosystems: HDFS, MapReduce, Hive, YARN, Pig, Sqoop, Kafka, Storm, Flume, Oozie, ZooKeeper, Apache Spark, Apache Tez, Impala, NiFi, Apache Solr, RabbitMQ, Scala

NoSQL Databases: HBase, Cassandra, MongoDB

Programming Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, Scala, Python

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery, AngularJS

Frameworks: MVC, Struts, Spring, Hibernate

Version control: SVN, CVS

Business Intelligence Tools: Tableau, QlikView, Pentaho, IBM Cognos

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DBVisualizer, IntelliJ

Cloud Technologies: Amazon Web Services (AWS), CDH3, CDH4, CDH5, Hortonworks, Mahout, Microsoft Azure HDInsight, Amazon Redshift

PROFESSIONAL EXPERIENCE:

Confidential, Plano, TX

Sr. Data Engineer

Responsibilities:

  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, effective and efficient joins, transformations, and other operations during the ingestion process itself.
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark against Hive and SQL/Teradata.
  • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
  • Used Spark Streaming APIs to perform the required transformations and actions on the learner data model, which receives data from Kafka in near real time.
  • Worked on migrating MapReduce programs into Spark transformations using Spark, and used File Broker to schedule workflows that run Spark jobs to transform data on a persistent schedule.
  • Experience developing and deploying shell scripts for automation, notification, and monitoring.
  • Extensively used Apache Kafka, Apache Spark, HDFS, and Apache Impala to build near-real-time data pipelines that ingest, transform, store, and analyze clickstream data to provide a better personalized user experience (see the sketch after this list).
  • Worked on performance tuning of Spark applications.
  • Worked with Apache Spark SQL and DataFrame functions to perform data transformations and aggregations on complex semi-structured data.
  • Hands-on experience in creating RDDs, transformations, and actions while implementing Spark applications.

Environment: Hadoop, HDFS, Hive, Spark, AWS EC2, S3, Kafka, YARN, Shell Scripting, Scala, Agile methods, Linux, MySQL, Teradata

Confidential, Bellevue, WA

Sr. Big Data Developer

Responsibilities:

  • Developed various Spark applications using Scala to perform enrichment of clickstream data merged with user profile data.
  • Utilized Spark SQL for event enrichment and to prepare various levels of user behavior summaries.
  • Worked on an SQS queue receiver using the Spark Streaming context to consume data from the extended queue and integrated it with ETL functions.
  • Performed real-time streaming of data using Spark with SQS; responsible for handling streaming data from web server console logs.
  • Optimized Hive tables using techniques such as partitioning and bucketing to provide better performance for HiveQL queries (see the sketch after this list).
  • Worked on migrating data from traditional RDBMS to HDFS.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Wrote programs in Spark using Scala for data quality checks.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Implemented Amazon EMR for big data processing on a Hadoop cluster of virtual servers using Amazon EC2 and S3.
  • Optimized HiveQL/Pig scripts by using execution engines such as Tez and Spark.
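
The sketch below shows one way the partitioning and bucketing optimization mentioned above can be expressed, here through Spark's DataFrameWriter rather than raw HiveQL DDL. The source path, partition column (event_date), bucket column (user_id), and target table name are illustrative assumptions.

```scala
// Minimal sketch: write a dataset as a partitioned, bucketed Hive table so
// that date filters prune partitions and joins on user_id avoid shuffles.
import org.apache.spark.sql.SparkSession

object HiveLayoutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-layout-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Illustrative staging location; any DataFrame with these columns would do
    val events = spark.read.parquet("hdfs:///staging/user_events")

    events.write
      .partitionBy("event_date")   // partition pruning on date predicates
      .bucketBy(32, "user_id")     // cluster rows for faster joins on user_id
      .sortBy("user_id")
      .mode("overwrite")
      .saveAsTable("analytics.user_events") // hypothetical target database/table
  }
}
```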

Environment: Hadoop, HDFS, Hive, Spark, AWS EC2, S3, Kafka, YARN, Shell Scripting, Scala, Pig, Oozie, Java, Agile methods, Linux, MySQL, Elasticsearch, Kibana, Teradata

Confidential, Austin, TX

Hadoop Developer

Responsibilities:

  • Developed Spark applications using Spark and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Handled importing of data from various data sources, performed data control checks using Spark, and loaded the data into HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables, and handled structured data using Spark SQL (see the sketch after this list).
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Used Spark and Spark SQL to read Parquet data and create Hive tables using the Scala API.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Processed schema-oriented and non-schema-oriented data using Scala and Spark.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
  • Worked on a streaming pipeline that uses Spark to read data from Kafka, transform it, and write it to HDFS.
  • Analyzed weblog data using HiveQL, integrated Oozie with the rest of the Hadoop stack, and utilized cluster coordination services through ZooKeeper.
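
A minimal sketch of loading JSON with Spark SQL and persisting it as a Hive table, along the lines of the bullet above (on recent Spark versions the SchemaRDD role is played by DataFrames). The S3 path, the event_type filter, and the database/table names are illustrative assumptions.

```scala
// Minimal sketch: infer a schema from JSON documents, apply a Spark SQL
// filter, and persist the result as a Hive table. Paths and names are
// placeholders.
import org.apache.spark.sql.SparkSession

object JsonToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Spark infers the schema from the JSON documents
    val df = spark.read.json("s3a://example-bucket/raw/events/")

    df.createOrReplaceTempView("events_stage")

    // Structured handling through Spark SQL before persisting to Hive
    val cleaned = spark.sql(
      "SELECT * FROM events_stage WHERE event_type IS NOT NULL")

    cleaned.write.mode("overwrite").saveAsTable("raw_db.events")
  }
}
```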

Environment: Scala, Spark, Spark SQL, Spark Streaming, Azkaban, Presto, Hive, Apache Crunch, Elasticsearch, Git repository, Amazon S3, Amazon EC2/EMR, Spark cluster, Hadoop Framework, Sqoop, DB2

Confidential, Glendale, CA

Data Engineer

Responsibilities:

  • Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Designed and developed an ELT data pipeline using a Spark application to fetch data from a legacy system, third-party APIs, and social media sites.
  • Developed custom mappers in Python scripts, as well as Hive UDFs and UDAFs, based on the given requirements.
  • Designed and developed the DMA (Disney Movies Anywhere) dashboard for the BI analyst team.
  • Performed data analytics and loaded data to Amazon S3 / data lake / Spark cluster (see the sketch after this list).
  • Involved in querying data using Spark SQL on top of the Spark engine.
  • Developed Spark scripts using Python shell commands as per the requirements.
  • Wrote Pig and Hive scripts with UDFs in MapReduce and Python to perform ETL on AWS cloud services.
  • Worked with the text, Avro, Parquet, and sequence file formats.
  • Involved in migrating HiveQL to Impala to minimize query response time.
  • Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HQL.
  • Defined job flows using the Azkaban scheduler to automate Hadoop jobs, and installed ZooKeeper for automatic node failover.
  • Performed Tableau type conversion functions when connected to relational data sources.
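
A minimal sketch of the Spark SQL analytics over data landed on Amazon S3 described above, producing the kind of daily summary a BI dashboard could consume. The bucket, prefixes, view name, and columns (view_date, title) are illustrative assumptions.

```scala
// Minimal sketch: query Parquet data on S3 with Spark SQL and write back a
// daily summary. Bucket, prefixes, and column names are placeholders.
import org.apache.spark.sql.SparkSession

object S3AnalyticsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("s3-analytics-sketch").getOrCreate()

    val views = spark.read.parquet("s3a://example-datalake/movie_views/")
    views.createOrReplaceTempView("movie_views")

    // Daily view counts per title, the kind of summary a BI dashboard consumes
    val daily = spark.sql(
      """SELECT view_date, title, COUNT(*) AS views
        |FROM movie_views
        |GROUP BY view_date, title""".stripMargin)

    daily.write.mode("overwrite").parquet("s3a://example-datalake/summaries/daily_views/")
  }
}
```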

Environment: Java (JDK 1.6 and higher), Azkaban, Spark SQL, Presto, Hive, Apache Crunch, Elasticsearch, Spring Boot, Eclipse, Git repository, Amazon S3, Amazon EC2/EMR, Spark cluster, Hadoop Framework, Sqoop

Confidential, San Francisco, CA

Hadoop Developer

Responsibilities:

  • Involved in managing nodes on the Hadoop cluster and monitoring Hadoop cluster job performance using Cloudera Manager.
  • Involved in loading data from the edge node to HDFS using shell scripting.
  • Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for log files.
  • Developed Spark scripts using Python shell commands as per the requirements.
  • Integrated Elasticsearch and implemented dynamic faceted search.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
  • Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations (see the sketch after this list).
  • Used Maven to build and deploy the JARs for MapReduce, Pig, and Hive UDFs.
  • Reviewed basic SQL queries and edited inner, left, and right joins in Tableau Desktop by connecting to live/dynamic and static datasets.
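
A minimal sketch of the RDD-style transformations and actions referenced above, applied to semi-structured log lines. The input path, the space-delimited log layout, and the assumption that the ninth field is an HTTP status code are all illustrative.

```scala
// Minimal sketch: parse space-delimited log lines with RDD transformations
// and count events per status code with an action. Paths and field layout
// are placeholders.
import org.apache.spark.{SparkConf, SparkContext}

object LogRddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("log-rdd-sketch"))

    val lines = sc.textFile("hdfs:///logs/access/*.log")

    // Transformation chain: split, drop malformed rows, key by status code
    val statusCounts = lines
      .map(_.split(" "))
      .filter(_.length > 8)
      .map(fields => (fields(8), 1L)) // assume the 9th field holds the HTTP status
      .reduceByKey(_ + _)

    // Action: bring the small aggregate back to the driver and print it
    statusCounts.collect().foreach { case (status, count) =>
      println(s"$status -> $count")
    }
    sc.stop()
  }
}
```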

Environment: Hadoop, Scala, MapReduce, HDFS, Spark, Kafka, AWS, Apache Solr, Hive, Cassandra, Maven, Jenkins, Pig, UNIX, Python, MRUnit, Git

Confidential, Mountain View, CA

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop .
  • Worked on joining raw data with existing data using Pig scripting.
  • Implemented DataStax Enterprise Search with Apache Solr.
  • Created Java operators to process data using DAG streams and load data into HDFS.
  • Designed, configured, implemented, and monitored the Kafka cluster and its connectors.
  • Developed ETL jobs using Spark and Scala to migrate data from Oracle to new Hive tables (see the sketch after this list).
  • Developed and deployed applications using Apache Spark and Scala.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Helped in troubleshooting Scala problems while working with MicroStrategy to produce illustrative reports and dashboards along with ad hoc analysis.
  • Developed Hive queries for the analysts and wrote scripts using Scala.
  • Created and ran Sqoop jobs with incremental load to populate Hive external tables.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Worked in Continuous Integration environments using Scrum and Agile methodologies.
  • Extracted data from Teradata into HDFS using Sqoop.
  • Managed real-time data processing and real-time data ingestion into HBase and Hive using Storm.
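
A minimal sketch of an Oracle-to-Hive migration job of the kind described above: Spark reads the source table over JDBC and persists it as a Hive table. The connection URL, table names, and environment-variable credentials are illustrative assumptions, and the Oracle JDBC driver would need to be on the job's classpath.

```scala
// Minimal sketch: pull a table from Oracle over JDBC and persist it as a
// Hive table. Connection details and table names are placeholders.
import org.apache.spark.sql.SparkSession

object OracleToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("oracle-to-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val source = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCLPDB") // placeholder URL
      .option("dbtable", "LEGACY.CUSTOMERS")                     // hypothetical source table
      .option("user", sys.env("ORACLE_USER"))                    // credentials from the environment
      .option("password", sys.env("ORACLE_PASSWORD"))
      .option("fetchsize", "10000")
      .load()

    source.write.mode("overwrite").saveAsTable("migrated.customers")
  }
}
```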

Environment: Hadoop, HDFS, Pig, Hive, Oozie, HBase, Kafka, Apache Solr, MapReduce, Sqoop, Storm, Spark, Scala, Linux, Cloudera, Maven, Jenkins, Java, SQL

Confidential, Tampa, Florida

Java/Hadoop Developer

Responsibilities:

  • Exported data from DB2 to HDFS using Sqoop and developed MapReduce jobs using the Java API.
  • Used Spring AOP to implement distributed declarative transactions throughout the application.
  • Designed and developed Java batch programs in Spring Batch.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts.
  • Developed workflows using Oozie for running MapReduce jobs and Hive queries.
  • Involved in loading data from UNIX file system to HDFS.
  • Created Java operators to process data using DAG streams and load data into HDFS.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Involved in developing monitoring and performance metrics for Hadoop clusters.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.

Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, Spark, MySQL, Ubuntu, ZooKeeper, Maven, Jenkins, Java (JDK 1.6), Oracle 10g
