Big Data Developer Resume
SUMMARY
- 7+ years of overall IT experience across a variety of industries, including hands-on experience with Big Data technologies.
- 4+ years of comprehensive experience as a Hadoop Developer.
- Data analysis, data modeling, and implementation of enterprise-class systems spanning Big Data, data integration, and object-oriented programming.
- Well versed in installing, configuring, supporting, and managing Hadoop clusters and the underlying Big Data infrastructure.
- Expertise in writing Hadoop Jobs for analyzing data using Hive and Pig.
- Experience in working with Tez and MapReduce programs on Apache Hadoop.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Good understanding of Data Mining and Machine Learning techniques.
- Good experience in writing Spark applications using Python and Scala.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs written in Python.
- Extensive experience with SQL, PL/SQL and database concepts.
- Knowledge of NoSQL databases such as HBase and MongoDB.
- Knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Experience with databases such as DB2, Oracle 8i, MySQL, SQL Server, and MS Access.
- Strong programming skills in Core Java, and J2EE technologies.
- Strong experience in Object-Oriented Analysis and Design, iterative Agile programming methodologies, Test-Driven Development (TDD), and maintenance.
- Experienced in Web Services approach for Service Oriented Architecture (SOA).
- Extensive use of open-source software such as the Apache Tomcat 6.0 web/application server and the Eclipse 3.x IDE.
- Experience in communicating with team members to discuss designs and solutions to problems.
TECHNICAL SKILLS
Programming Languages: Python, Scala, and Java.
Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Oozie, Impala.
Database Systems: Oracle, SQL, MS-SQL Server, MS-Access
Operating Systems: Linux, Unix, Windows 7/8
Programming Tools: Eclipse 2.1/3.7, Visual Studio.
PROFESSIONAL EXPERIENCE
Big Data Developer
Confidential
Responsibilities:
- Analyzed large data sets by running Hive queries and Pig scripts.
- Migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets; maintained the Hadoop cluster on AWS EMR.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the sketch after this section).
- Developed the strategy and implementation for integrating Impala with the existing RDBMS ecosystem using Apache Spark.
- Used AWS Glue for extraction, transformation, and loading (ETL) of data from heterogeneous source systems.
- Created Hive tables, loaded and analyzed data using Hive and Spark queries, and added new data to the files under Impala's control.
- Developed simple to complex MapReduce jobs in Scala using Hive and Spark.
- Worked with application teams to install the operating system, Hadoop updates, patches, and version upgrades as required.
- Developed multiple MapReduce jobs in Scala for data processing and cleaning.
- Loaded data from the Linux file system into HDFS and managed data from multiple sources.
- Installed and configured Hive and wrote Hive UDFs.
- Extracted data from Oracle through Sqoop and processed it in Spark.
- Ran Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, and assisted in exporting analyzed data to relational databases using Sqoop.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Spark, Impala, Kudu, Linux, and AWS Cloud.
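Illustrative sketch of the S3-to-Spark flow described above (a minimal example, assuming a Spark 2.x session on EMR; the bucket, paths, and column names are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object S3UsageAnalysis {
  def main(args: Array[String]): Unit = {
    // Spark session; on EMR the s3:// connector is available out of the box
    val spark = SparkSession.builder().appName("S3UsageAnalysis").getOrCreate()

    // Hypothetical input path: usage events landed in S3 as JSON
    val events = spark.read.json("s3://example-bucket/usage/events/")

    // Transformations: drop bad records and aggregate usage per customer
    val usageByCustomer = events
      .filter(col("customerId").isNotNull)
      .groupBy(col("customerId"))
      .agg(count(lit(1)).as("eventCount"), sum("durationSec").as("totalDurationSec"))

    // Action: write the aggregated result back to S3 as Parquet
    usageByCustomer.write.mode("overwrite").parquet("s3://example-bucket/usage/aggregates/")

    spark.stop()
  }
}
```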
Big Data Developer
Confidential
Responsibilities:
- Worked on Spark SQL queries and DataFrames: imported data from various data sources, performed transformations and read/write operations, and saved the results to output directories in HDFS (see the sketch after this section).
- Built S3 buckets, managed policies for them, and used S3 and Glacier for storage and backup on AWS.
- Involved in ETL file movements between HDFS and AWS S3; worked extensively with S3 buckets in AWS.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and PySpark.
- Analyzed system failures, identified root causes, and recommended courses of action.
- Converted all Hadoop jobs to run in AWS EMR by configuring the cluster according to the data size.
- Designed a data processing warehouse using Impala.
- Developed Hive Scripts to extract the data from the web server output files to load into HDFS.
- Implemented custom UDFs for Kudu, developed Hive UDFs to pre-process the data for analysis, and built Spark jobs for the analysts.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Spark.
- Coordinated AWS cluster services through ZooKeeper.
- Collected log data from web servers and integrated it into HDFS using Impala.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' Scala MapReduce jobs.
- Managed and reviewed Hadoop log files and used Spark to analyze point-of-sale data and coupon usage.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation.
- Worked with highly engaged Informatics, Scientific Information Management, and enterprise IT teams.
Environment: Hadoop, HBase, HDFS, Hive, Spark, Spark SQL, Pig, ZooKeeper, Oozie, Impala, AWS.
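A minimal sketch of the Spark SQL / DataFrame work referenced above, assuming Hive support is enabled on the cluster; the database, table, column names, and output path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HdfsEtlJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HdfsEtlJob")
      .enableHiveSupport() // lets Spark SQL see tables registered in the Hive metastore
      .getOrCreate()

    // Source table (hypothetical) read as a DataFrame
    val orders = spark.table("staging.orders")

    // The kind of Hive/SQL aggregation rewritten as DataFrame transformations
    val dailyRevenue = orders
      .where(col("status") === "COMPLETED")
      .groupBy(col("order_date"))
      .agg(sum("amount").as("revenue"))

    // Save the result to an output directory in HDFS
    dailyRevenue.write.mode("overwrite").parquet("hdfs:///data/curated/daily_revenue")

    spark.stop()
  }
}
```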
Big Data Developer
Confidential
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Used the Spark API over Hadoop YARN to perform analytics on data and monitor scheduling.
- Performed performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Ingested data from relational databases into HDFS using Sqoop; performed data processing and enrichment using HiveQL.
- Built exception files for all non-compliant data using Hive; created Hive external tables for semantic data, loaded the data into them, and queried the data using HQL.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; stored the generated output and used it to create Spark DataFrames for further analysis (see the sketch after this section).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data into HDFS using Hive and ZooKeeper.
- Responsible for creating Hive tables, loading data and writing hive queries.
- Handled importing data from various data sources, performed transformations using Hive, Scala MapReduce, and Apache Spark, and loaded the data into HDFS.
- Ran MapReduce programs on log data to transform it into a structured form and derive user location, age group, and time spent.
- Worked with and learned a great deal from AWS Cloud services such as EC2, S3, EBS, RDS, and VPC.
- Migrated an existing on-premises application to AWS using Glue ETL. Used AWS services such as EC2 and S3 for processing and storing small data sets; maintained the Hadoop cluster on AWS EMR.
- Extracted data from Teradata into HDFS using QueryGrid and Sqoop.
- Installed the Oozie workflow engine to run multiple Hive and Apache Spark jobs that run independently based on time and data availability.
- Managed and reviewed Hadoop log files.
Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, Linux, MapReduce, HBase, Impala, UNIX Shell Scripting.
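A minimal sketch of running Spark analytics over Hive on YARN as described above (submitted with spark-submit --master yarn); the table and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HiveAnalytics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveAnalytics")
      .enableHiveSupport()
      .getOrCreate()

    // HiveQL executed through Spark SQL; the result comes back as a DataFrame
    val enriched = spark.sql(
      """SELECT c.customer_id, c.segment, t.txn_amount
        |FROM warehouse.transactions t
        |JOIN warehouse.customers c ON t.customer_id = c.customer_id""".stripMargin)

    // Further analysis on the resulting DataFrame
    val bySegment = enriched.groupBy("segment").avg("txn_amount")

    // Persist the output as a Hive table for downstream reporting
    bySegment.write.mode("overwrite").saveAsTable("analytics.avg_txn_by_segment")

    spark.stop()
  }
}
```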
Big Data Developer
Confidential
Responsibilities:
- Responsible for loading customer data and event logs into HDFS using Sqoop and Teradata QueryGrid.
- Created Hive tables to store the variable data formats of input data coming from different portfolios.
- Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
- Used the Apache Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Optimized Hive analytics SQL queries and tuned job performance.
- Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
- Imported and exported data (SQL Server, Oracle, CSV, and text files) between local/external file systems, RDBMSs, and HDFS.
- Wrote Apache Spark map/reduce code in Scala for data processing when consuming unstructured text data (see the sketch after this section).
- Exported data structures from Oracle using Sqoop and analyzed the data using Hadoop components such as Hive.
- Used ZooKeeper to coordinate cluster services and installed the Oozie workflow engine to run multiple Hive jobs.
- Created and maintained technical documentation for launching Hadoop Kudu clusters and for executing Hive queries and Pig scripts.
- Designed a data processing warehouse using Hive; created and managed Hive tables in Hadoop.
Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, MapReduce, HBase, UNIX Shell Scripting.
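A minimal sketch of the Scala map/reduce-style processing of unstructured text mentioned above; the HDFS paths and the log-level extraction rule are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object TextLogProcessing {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("TextLogProcessing").getOrCreate()
    val sc = spark.sparkContext

    // Unstructured text files landed in HDFS (hypothetical path)
    val lines = sc.textFile("hdfs:///data/raw/applogs/*.log")

    // Map/reduce-style processing: tokenize, keep log-level tokens, count occurrences
    val levels = Set("INFO", "WARN", "ERROR")
    val levelCounts = lines
      .flatMap(_.split("\\s+"))
      .filter(token => levels.contains(token))
      .map(level => (level, 1L))
      .reduceByKey(_ + _)

    // Write the counts back to HDFS as text
    levelCounts.saveAsTextFile("hdfs:///data/processed/log_level_counts")

    spark.stop()
  }
}
```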
Java Data Developer
Confidential
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Developed this application based on MVC architecture using the open-source Spring framework.
- Implemented the presentation layer with HTML and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Developed client-side validations using JavaScript.
- Deployed the applications on WebSphere Application Server.
- Wrote complex SQL queries and stored procedures.
- Developed templates and screens in HTML and JavaScript.
- Fixed bugs and performed unit testing with test cases using JUnit.
- Actively involved in system testing and implemented the service layer using Spring IoC.
- Used Spring framework AOP features extensively and monitored logs using Log4j.
- Prepared the installation guide, customer guide, and configuration document, which were delivered to the customer along with the product.
- Provided technical support for production environments: resolving issues, analyzing defects, and providing and implementing solutions.
Environment: Java, J2EE 1.4, Servlets 3.0, JDBC, JavaScript, Spring 2.0, MySQL 5.0, JUnit, Eclipse IDE 2.1.