
Big Data/Hadoop Engineer Resume


Dallas, TX

SUMMARY:

  • 8+ years of professional experience involving project development, implementation, deployment and maintenance using Java/J2EE, Big Data and Spark related technologies.
  • 4 years of experience in Hadoop and Big Data related technologies as a developer and administrator.
  • Experience working with OpenStack (Icehouse, Liberty), Ansible, Kafka, Elasticsearch, Hadoop, StreamSets, MySQL, Cloudera, MongoDB, UNIX shell scripting, Pig scripting, Hive, Flume, ZooKeeper, Sqoop, Oozie, Python, Spark, Git and a variety of RDBMS in UNIX and Windows environments, following Agile methodology.
  • Hands on experience with Cloudera and Apache Hadoop distributions.
  • Followed test-driven development within Agile methodology to produce high-quality software.
  • Worked with Data ingestion, querying, processing and analysis of big data.
  • Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
  • Experienced in Architecture and Capacity planning for MongoDB clusters.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, Sharding, replication, schema design.
  • Experienced using Ansible scripts to deploy Cloudera CDH 5.4.1 and set up a Hadoop cluster.
  • Experience in loading data from the Linux file system into HDFS using Sqoop.
  • Experienced in working with OpenStack Platform and with all its components such as Compute, Orchestration and Swift.
  • Worked with different services provided by OpenStack such as Neutron, Nova and Cinder.
  • Ability to work effectively in cross - functional team environments.
  • Ability to learn new technologies and to deliver outputs in short deadlines.
  • Research-oriented, motivated, proactive, self-starter with strong technical, analytical and interpersonal skills.
  • Hands-on experience with Spark architecture and its integrations such as Spark SQL and DataFrames.
  • Replaced existing MR jobs with Spark streaming & Spark data transformations for efficient data processing.
  • Hands-on experience with real-time streaming using Kafka and Spark into HDFS (a minimal sketch follows this list).
  • Ability to spin up different OpenStack instances using cloud formation templates.
  • Experience with Cloud Infrastructure like Confidential Cloud Services.
  • Hands on experience with NoSQL Databases like HBase for performing analytical operations.
  • Hands-on experience writing queries, stored procedures, functions and triggers.
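
The real-time ingestion bullet above refers to streaming Kafka data into HDFS with Spark. The following is a minimal, illustrative Scala sketch using Spark Structured Streaming; the broker address, topic name and HDFS paths are placeholder assumptions, and it assumes the spark-sql-kafka connector is on the classpath (the projects below used earlier Spark releases, so the original code may have relied on the DStream API instead).

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: stream a Kafka topic into HDFS as text files.
    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("KafkaToHdfs").getOrCreate()

        // Read the topic as a streaming DataFrame (broker and topic are placeholders).
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load()

        // Kafka delivers binary key/value pairs; keep the value as a single text column.
        val lines = raw.selectExpr("CAST(value AS STRING) AS line")

        // Append the stream to HDFS, checkpointing offsets so the job can restart safely.
        lines.writeStream
          .format("text")
          .option("path", "hdfs:///data/raw/events")
          .option("checkpointLocation", "hdfs:///checkpoints/events")
          .start()
          .awaitTermination()
      }
    }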

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Big Data/Hadoop Engineer

Responsibilities:

  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, Hive.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Experienced developing and maintaining ETL jobs.
  • Performed data profiling and transformation on the raw data using Pig, Python and Oracle.
  • Experienced with batch processing of data sources using Apache Spark.
  • Developed predictive analytics using the Apache Spark Scala API.
  • Created Hive External tables and loaded the data into tables and query data using HQL.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark and stored the data in HDFS in CSV format.
  • Developed Spark streaming application to pull data from cloud to Hive table.
  • Used Spark SQL to process the huge amount of structured data.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop (see the sketch after this list).
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
  • Created automated python scripts to convert the data from different sources and to generate the ETL pipelines.
  • Configured Streamsets to store the converted data to SQL SERVER using JDBC drivers.
  • Converted existing Hive scripts to Spark applications using RDDs for transforming data and loading it into HDFS.
  • Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like snappy, Gzip and zlib.
  • Extensively used Hive optimization techniques like partitioning, bucketing, MapJoin and parallel execution.
  • Worked with different tools to verify the Quality of the data transformations.
  • Created, configured and monitored shard sets; analyzed the data to be sharded and chose a shard key to distribute data evenly.
  • Worked with Spark-SQL context to create data frames to filter input data for model execution.
  • Configured the setup of the development and production environments.
  • Worked Extensively with Linux platform to setup the server.
  • Extensively Worked on Amazon S3 for data storage and retrieval purposes.
  • Experienced in running query using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
  • Worked with Alteryx, a data analytics tool, to develop workflows for the ETL jobs.
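
To make the Sqoop-import-then-aggregate pattern above concrete, here is a minimal Scala sketch; the HDFS path, column names and JDBC connection details are illustrative assumptions, and it writes the aggregates back to the database with Spark's JDBC writer rather than a Sqoop export.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{count, sum}

    // Minimal sketch: aggregate Sqoop-imported CSV data and write results back to an RDBMS.
    object OrderAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("OrderAggregation").getOrCreate()

        // CSV files landed in HDFS by a Sqoop import (path and schema are placeholders).
        val orders = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/sqoop/orders")

        // Aggregate order totals per customer with the DataFrame API.
        val totals = orders
          .groupBy("customer_id")
          .agg(sum("amount").as("total_amount"), count("*").as("order_count"))

        // Write the aggregates to a reporting table over JDBC (connection details are placeholders).
        totals.write
          .format("jdbc")
          .option("url", "jdbc:mysql://dbhost:3306/reporting")
          .option("dbtable", "customer_totals")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .mode("overwrite")
          .save()

        spark.stop()
      }
    }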

Environment: Java 1.8, Scala 2.10.5, Apache Spark 1.6.2, R, CDH 5.8.2, Spring 3.0.4, Gradle 2.13, Hive, HDFS, Sqoop 1.4.3, Kafka, MongoDB, UNIX Shell Scripting, Python 2.6, Solr, Greenplum.

Confidential, Raleigh, NC

Big Data/Cloud Engineer

Responsibilities:

  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
  • Created end-to-end Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirements.
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
  • Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
  • Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Configured replication with replica set factors, arbiters, voting, priority, server distribution, slave delays in MongoDB.
  • Installed MongoDB RPMs and tar files and prepared YAML config files.
  • Performed data migration between multiple environments using the mongodump and mongorestore commands.
  • Evaluated indexing strategies to support queries and sort documents using index keys.
  • Experience with OpenStack Cloud Platform.
  • Good Understanding of Spark.
  • Experience in converting Hive/SQL queries into Spark transformations using Scala.
  • Optimized overall cluster performance by caching or persisting and unpersisting data (see the sketch after this list).
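
As an illustration of converting a Hive/SQL query into Spark transformations and of the caching and unpersisting mentioned above, here is a minimal Scala sketch; the table and column names are placeholder assumptions.

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: rewrite a HiveQL aggregation as DataFrame transformations with caching.
    object HiveToSpark {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSpark")
          .enableHiveSupport() // read existing Hive tables through the metastore
          .getOrCreate()
        import spark.implicits._

        // Original HiveQL:
        //   SELECT region, COUNT(*) FROM clickstream WHERE status = 'OK' GROUP BY region
        val clicks = spark.table("clickstream") // placeholder table name

        // Cache the filtered set because it is reused by two actions below.
        val ok = clicks.filter($"status" === "OK").cache()

        ok.groupBy("region").count().show()

        // The second action reuses the cached data, then the memory is released.
        println(s"Total OK events: ${ok.count()}")
        ok.unpersist()

        spark.stop()
      }
    }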

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, Apache Cassandra, Oozie, Impala, Flume, ZooKeeper, Solr, Java, MySQL, Spark, Kafka, Scala.

Confidential, Dallas, TX

Big Data/Hadoop Engineer

Responsibilities:

  • Used partitioning, bucketing, map-side joins and parallel execution to optimize Hive queries and decrease execution time.
  • Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
  • Extracted data from an Oracle database, transformed it and loaded it into a Greenplum database according to the business specifications.
  • Created mappings to move data from Oracle and SQL Server to the new data warehouse in Greenplum.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
  • Good experience with continuous integration of the application using Jenkins.
  • Used AWS services like EC2 and S3 for small data sets.
  • Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS (a minimal sketch follows this list).
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
  • Used Oozie for automating end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
  • Implemented daily workflow for extraction, processing and analysis of data with Oozie.
  • Designed and implemented importing data to HDFS using Sqoop from different RDBMS servers.
  • Participated in requirements gathering for the project and documented the business requirements.
  • Experienced using Ansible scripts to deploy Cloudera CDH 5.4.1 to setup Hadoop Cluster.
  • Experienced in working with Cloud Computing Services, Networking between different Tenants.
  • Installed and Worked with Hive, Pig, Sqoop on the Hadoop cluster.
  • Developed Hive queries to analyze the data imported to HDFS.
  • Worked with Sqoop commands to import data from different databases.
  • Experience building the Kafka cluster setup required for the environment.
  • Dry-ran Ansible playbooks to provision the OpenStack cluster and deploy CDH parcels.
  • Experience using benchmarking tools such as TeraSort, TestDFSIO and HiBench.
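
A minimal Scala sketch of the weblog pipeline bullet above; the S3 bucket, prefixes and HDFS path are placeholder assumptions, and it presumes the hadoop-aws connector and credentials are already configured on the cluster.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, current_date, length, trim}

    // Minimal sketch: pull raw weblogs from S3, clean them lightly, and land them in HDFS.
    object WeblogPipeline {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("WeblogPipeline").getOrCreate()

        // Raw web logs staged in S3 (bucket and prefix are placeholders).
        val logs = spark.read.text("s3a://example-weblog-bucket/raw/*/*")

        // Drop empty lines and tag each record with its load date.
        val cleaned = logs
          .filter(length(trim(col("value"))) > 0)
          .withColumn("load_date", current_date())

        // Store in HDFS as Parquet, partitioned by load date, for downstream Hive/Impala queries.
        cleaned.write
          .mode("append")
          .partitionBy("load_date")
          .parquet("hdfs:///data/weblogs")

        spark.stop()
      }
    }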

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, Apache Cassandra, Oozie, Impala, Flume, ZooKeeper, Java, MySQL.

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:

  • Involved in Installing, Configuring Hadoop components using CDH 5.2 Distribution.
  • Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
  • Analyzed user request patterns and implemented various performance optimization measures including implementing partitions and buckets in HiveQL.
  • Created and maintained Technical documentation for launching HADOOP Clusters and for executing Hive queries and Pig Scripts.
  • Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL (see the sketch after this list).
  • Involved in writing shell scripts for exporting log files to the Hadoop cluster through an automated process.
  • Involved in efficiently collecting and aggregating large amounts of streaming log data into the Hadoop cluster using Apache Flume.
  • Able to integrate state-of-the-art Big Data technologies into the overall architecture and lead a team of developers through the construction, testing and implementation phases. Involved in gathering requirements, design, development and testing.
  • Good understanding of writing Linux scripts.
  • Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Wrote MapReduce jobs to generate reports on the number of activities created per day from data dumped from multiple sources, with the output written back to HDFS. Reviewed HDFS usage and system design for future scalability and fault tolerance.
  • Developed Shell Script to perform data profiling on the ingested data with the help of HIVE Bucketing.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed workflow in Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig and Hive.
  • Used Pig as ETL tool to do transformations, joins and some pre-aggregations before storing the data into HDFS.
  • Imported all the customer-specific personal data into Hadoop using the Sqoop component from various relational databases such as Netezza and Oracle.
  • Developed test scripts in Python, prepared test procedures, analyzed test result data and suggested improvements to the system and software.
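
A minimal sketch of the dynamic-partition Hive table work referenced above, issued here through Spark's Hive support so the example stays in Scala like the others; the table, column and staging-table names are placeholder assumptions, and the bucketed-sampling piece mentioned in the list was done in Hive itself and is omitted.

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: create a partitioned Hive table and load it with a dynamic-partition insert.
    object HiveDynamicPartitions {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveDynamicPartitions")
          .enableHiveSupport()
          .getOrCreate()

        // Allow Hive to create partitions on the fly from the data itself.
        spark.sql("SET hive.exec.dynamic.partition = true")
        spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

        spark.sql("""
          CREATE TABLE IF NOT EXISTS web_events (
            user_id STRING,
            url     STRING,
            status  INT
          )
          PARTITIONED BY (event_date STRING)
          STORED AS ORC
        """)

        // Dynamic-partition insert: each row is routed to its event_date partition.
        spark.sql("""
          INSERT INTO TABLE web_events PARTITION (event_date)
          SELECT user_id, url, status, event_date
          FROM staging_web_events
        """)

        spark.stop()
      }
    }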

Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Shell Scripting, Oozie, Pig, Hive, Impala, HBase, Spark, Linux, Java.

Confidential, Atlanta, GA

Java / Hadoop Developer

Responsibilities:

  • Loaded all data from our relational databases into Hive using Sqoop. We received four flat files from different vendors, each in a different format, e.g. text, CSV and XML.
  • Involved in migrating data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages. Provided reports to management on cluster usage metrics and charged back customers based on their usage.
  • Involved in implementing security on the Cloudera Hadoop cluster, working along with the operations team to move from a non-secured cluster to a secured cluster.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Formulated procedures for installation of Hadoop patches, updates and version upgrades.
  • Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Designed and developed Map Reduce jobs to process data coming in different file formats like XML, CSV, JSON.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Hive to analyze data ingested into HBase via the Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Developed Python, Shell/Perl and PowerShell scripts for automation purposes.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive (see the sketch after this list).
  • Involved in developing the MapReduce framework, writing queries and scheduling MapReduce jobs.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Performed Filesystem management and monitoring on Hadoop log files.
  • Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Implemented partitioning, dynamic partitions and buckets in HIVE.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Involved in configuring core-site.xml and mapred-site.xml for the multi-node cluster environment.
  • Used Apache Maven 3.x to build and deploy the application to various environments.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
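
As an illustration of the Hive join queries used to build dashboard metrics above, here is a minimal sketch; it is expressed through Spark's Hive support so the example stays in Scala like the others, and the table and column names are placeholder assumptions.

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: join two Hive tables and compute per-region metrics for a dashboard.
    object DashboardMetrics {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("DashboardMetrics")
          .enableHiveSupport()
          .getOrCreate()

        val metrics = spark.sql("""
          SELECT c.region,
                 COUNT(DISTINCT o.order_id) AS orders,
                 SUM(o.amount)              AS revenue
          FROM orders o
          JOIN customers c
            ON o.customer_id = c.customer_id
          WHERE o.order_date >= '2015-01-01'
          GROUP BY c.region
        """)

        // Persist the metrics as a Hive table that the reporting layer can read.
        metrics.write.mode("overwrite").saveAsTable("dashboard_region_metrics")

        spark.stop()
      }
    }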

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, MongoDB, Oozie, Impala, Flume, ZooKeeper, Java, MySQL, Eclipse, SQL.

Confidential, TX

Java/J2EE Developer

Responsibilities:

  • Involved in analyzing requirements and preparing technical design specification documents.
  • Involved in studying the User Requirement Specification and communicated with business analysts to resolve ambiguities in the requirements document. Handled performance issues and worked on background jobs that process huge volumes of records.
  • Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
  • Sent email alerts to the support team using BMC msend.
  • Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture. UML diagrams such as use cases, class diagrams, interaction diagrams (sequence and collaboration) and activity diagrams were used.
  • Developed nightly batch jobs which involved interfacing with external third-party state agencies.
  • Involved in configuration of Spring MVC and Integration with Hibernate.
  • Normalized Oracle database, conforming to design concepts and best practices.
  • Used JDBC to connect to backend databases, Oracle and SQL Server 2005.
  • Proficient in writing SQL queries, stored procedures for multiple databases, Oracle and SQL Server 2005.
  • Used Core java and object-oriented concepts.
  • Developed JavaScript behavior code for user interaction.
  • Created database program in SQL server to manipulate data accumulated by internet transactions.
  • Wrote servlet classes to generate dynamic HTML pages.
  • Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
  • Developed XML schemas and web services for data maintenance and structures. Wrote JUnit test cases for unit testing of classes.
  • Used the DOM and DOM functions with Firefox and the IE Developer Toolbar for IE.
  • Debugged the application using Firebug to traverse the documents.
  • Involved in developing web pages using HTML and JSP.
  • Provided technical support for production environments, resolving issues, analyzing defects, and providing and implementing solutions to defects.
  • Involved in writing SQL queries and stored procedures and used JDBC for database connectivity with MySQL Server (a minimal sketch follows this list).
  • Developed the presentation layer using CSS and HTML taken from Bootstrap to support multiple browsers.
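
A minimal sketch of the JDBC stored-procedure work noted above. The project itself was Java; this sketch uses the same java.sql CallableStatement API from Scala only to keep a single language across the examples in this document, and the connection URL, credentials and procedure name are placeholder assumptions.

    import java.sql.{Connection, DriverManager, Types}

    // Minimal sketch: call a stored procedure over JDBC and read an OUT parameter.
    object StoredProcCall {
      def main(args: Array[String]): Unit = {
        // Connection details and procedure name are placeholders.
        val url = "jdbc:mysql://dbhost:3306/appdb"
        var conn: Connection = null
        try {
          conn = DriverManager.getConnection(url, "app_user", sys.env.getOrElse("DB_PASSWORD", ""))

          // Hypothetical procedure taking an account id and returning its balance.
          val stmt = conn.prepareCall("{ call get_account_balance(?, ?) }")
          stmt.setLong(1, 12345L)
          stmt.registerOutParameter(2, Types.DECIMAL)
          stmt.execute()

          println(s"Balance: ${stmt.getBigDecimal(2)}")
          stmt.close()
        } finally {
          if (conn != null) conn.close()
        }
      }
    }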

Environment: Java, XML, HTML, JavaScript, JDBC, CSS, SQL, PL/SQL, Spring Web MVC, Eclipse, Ajax, jQuery, Spring with Hibernate, ActiveMQ, JasperReports, Ant as build tool, MySQL and Apache Tomcat.
