Big Data Developer Resume
SUMMARY
- 7+ years of overall IT experience across a variety of industries, including hands-on experience with Big Data technologies.
- 4+ years of comprehensive experience as a Hadoop Developer.
- Data analysis, data modeling, and implementation of enterprise-class systems spanning Big Data, data integration, and object-oriented programming.
- Well versed in installing, configuring, supporting, and managing Hadoop clusters and the underlying Big Data infrastructure.
- Expertise in writing Hadoop Jobs for analyzing data using Hive and Pig.
- Experience in working with Tez and MapReduce programs on Apache Hadoop.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Good understanding of Data Mining and Machine Learning techniques.
- Good experience in writing Spark applications using Python and Scala.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs written in Python.
- Extensive experience with SQL, PL/SQL and database concepts.
- Knowledge of NoSQL databases such as HBase and MongoDB.
- Knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Experience with databases such as DB2, Oracle 8i, MySQL, SQL Server, and MS Access.
- Strong programming skills in Core Java, and J2EE technologies.
- Strong experience in Object-Oriented Analysis and Design, iterative Agile programming methodologies, Test-Driven Development (TDD), and maintenance.
- Experienced in Web Services approach for Service Oriented Architecture (SOA).
- Extensive use of open-source software such as the Apache Tomcat 6.0 web/application server and the Eclipse 3.x IDE.
- Experience in communicating with team members to discuss designs and solutions to problems.
TECHNICAL SKILLS
Programming Languages: Python, Scala, and Java.
Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Oozie, Impala.
Database Systems: Oracle, SQL, MS-SQL Server, MS-Access
Operating Systems: Linux, Unix, Windows 7/8
Programming Tools: Eclipse 2.1/3.7, Visual Studio.
PROFESSIONAL EXPERIENCE
Big Data Developer
Confidential
Responsibilities:
- Analyzed large data sets by running Hive queries and Pig scripts.
- Migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets; maintained the Hadoop cluster on AWS EMR.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the sketch after this section).
- Developed the strategy and implementation for integrating Impala with the existing RDBMS ecosystem using Apache Spark.
- Used AWS Glue for extraction, transformation, and loading (ETL) of data from heterogeneous source systems.
- Created Hive tables, loaded and analyzed data using Hive and Spark queries, and added new data to the files under Impala's control.
- Developed simple to complex MapReduce jobs in Scala using Hive and Spark.
- Worked with application teams to install the operating system, Hadoop updates, patches, and version upgrades as required.
- Developed multiple MapReduce jobs in Scala for data processing and cleaning.
- Loaded data from the Linux file system into HDFS and managed data from multiple sources.
- Installed and configured Hive and wrote Hive UDFs.
- Extracted data from Oracle through Sqoop and processed it in Spark.
- Ran Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, and assisted in exporting analyzed data to relational databases using Sqoop.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Spark, Impala, Kudu, Linux, and AWS Cloud.
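Illustrative sketch of the S3-to-Spark flow described above (a minimal example, assuming a Spark 2.x session on EMR; the bucket, paths, and column names are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object S3UsageAnalysis {
  def main(args: Array[String]): Unit = {
    // Spark session; on EMR the s3:// connector is available out of the box
    val spark = SparkSession.builder().appName("S3UsageAnalysis").getOrCreate()

    // Hypothetical input path: usage events landed in S3 as JSON
    val events = spark.read.json("s3://example-bucket/usage/events/")

    // Transformations: drop bad records and aggregate usage per customer
    val usageByCustomer = events
      .filter(col("customerId").isNotNull)
      .groupBy(col("customerId"))
      .agg(count(lit(1)).as("eventCount"), sum("durationSec").as("totalDurationSec"))

    // Action: write the aggregated result back to S3 as Parquet
    usageByCustomer.write.mode("overwrite").parquet("s3://example-bucket/usage/aggregates/")

    spark.stop()
  }
}
```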
Big Data Developer
Confidential
Responsibilities:
- Worked on Spark SQL queries and DataFrames: imported data from various data sources, performed transformations and read/write operations, and saved the results to output directories in HDFS (see the sketch after this section).
- Built S3 buckets, managed policies for them, and used S3 and Glacier for storage and backup on AWS.
- Involved in ETL file movements between HDFS and AWS S3; worked extensively with S3 buckets in AWS.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and PySpark.
- Analyzed system failures, identified root causes, and recommended courses of action.
- Converted all Hadoop jobs to run in AWS EMR by configuring the cluster according to the data size.
- Designed a data processing warehouse using Impala.
- Developed Hive Scripts to extract the data from the web server output files to load into HDFS.
- Implemented custom UDFs for Kudu, developed Hive UDFs to pre-process the data for analysis, and built Spark jobs for the analysts.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Spark.
- Coordinated AWS cluster services through ZooKeeper.
- Collected log data from web servers and integrated it into HDFS using Impala.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' Scala MapReduce jobs.
- Managed and reviewed Hadoop log files and used Spark to analyze point-of-sale data and coupon usage.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation.
- Worked with highly engaged Informatics, Scientific Information Management, and enterprise IT teams.
Environment: Hadoop, HBase, HDFS, Hive, Spark, Spark SQL, Pig, ZooKeeper, Oozie, Impala, AWS.
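A minimal sketch of the Spark SQL / DataFrame work referenced above, assuming Hive support is enabled on the cluster; the database, table, column names, and output path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HdfsEtlJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HdfsEtlJob")
      .enableHiveSupport() // lets Spark SQL see tables registered in the Hive metastore
      .getOrCreate()

    // Source table (hypothetical) read as a DataFrame
    val orders = spark.table("staging.orders")

    // The kind of Hive/SQL aggregation rewritten as DataFrame transformations
    val dailyRevenue = orders
      .where(col("status") === "COMPLETED")
      .groupBy(col("order_date"))
      .agg(sum("amount").as("revenue"))

    // Save the result to an output directory in HDFS
    dailyRevenue.write.mode("overwrite").parquet("hdfs:///data/curated/daily_revenue")

    spark.stop()
  }
}
```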
Big Data Developer
Confidential
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Used the Spark API over Hadoop YARN to perform analytics on data and monitor scheduling.
- Performed performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Ingested data from relational databases into HDFS using Sqoop; performed data processing and enrichment using HiveQL.
- Built exception files for all non-compliant data using Hive; created Hive external tables for semantic data, loaded the data into them, and queried the data using HQL.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; stored the generated output and used it to create Spark DataFrames for further analysis (see the sketch after this section).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data into HDFS using Hive and ZooKeeper.
- Responsible for creating Hive tables, loading data and writing hive queries.
- Handled importing data from various data sources, performed transformations using Hive, Scala MapReduce, and Apache Spark, and loaded the data into HDFS.
- Ran MapReduce programs on log data to transform it into a structured form and derive user location, age group, and time spent.
- Worked with and learned a great deal from AWS Cloud services such as EC2, S3, EBS, RDS, and VPC.
- Migrated an existing on-premises application to AWS using Glue ETL. Used AWS services such as EC2 and S3 for processing and storing small data sets; maintained the Hadoop cluster on AWS EMR.
- Extracted data from Teradata into HDFS using QueryGrid and Sqoop.
- Installed the Oozie workflow engine to run multiple Hive and Apache Spark jobs that run independently based on time and data availability.
- Managed and reviewed Hadoop log files.
Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, Linux, MapReduce, HBase, Impala, UNIX Shell Scripting.
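A minimal sketch of running Spark analytics over Hive on YARN as described above (submitted with spark-submit --master yarn); the table and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HiveAnalytics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveAnalytics")
      .enableHiveSupport()
      .getOrCreate()

    // HiveQL executed through Spark SQL; the result comes back as a DataFrame
    val enriched = spark.sql(
      """SELECT c.customer_id, c.segment, t.txn_amount
        |FROM warehouse.transactions t
        |JOIN warehouse.customers c ON t.customer_id = c.customer_id""".stripMargin)

    // Further analysis on the resulting DataFrame
    val bySegment = enriched.groupBy("segment").avg("txn_amount")

    // Persist the output as a Hive table for downstream reporting
    bySegment.write.mode("overwrite").saveAsTable("analytics.avg_txn_by_segment")

    spark.stop()
  }
}
```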
Big Data Developer
Confidential
Responsibilities:
- Responsible for loading customer data and event logs into HDFS using Sqoop and Teradata QueryGrid.
- Created Hive tables to store the variable data formats of input data coming from different portfolios.
- Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
- Used the Apache Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Optimized Hive analytics SQL queries and tuned job performance.
- Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
- Imported and exported data (SQL Server, Oracle, CSV, and text files) between local/external file systems, RDBMSs, and HDFS.
- Wrote Apache Spark map/reduce code in Scala for data processing when consuming unstructured text data (see the sketch after this section).
- Exported data structures from Oracle using Sqoop and analyzed the data using Hadoop components such as Hive.
- Used ZooKeeper to coordinate cluster services and installed the Oozie workflow engine to run multiple Hive jobs.
- Created and maintained technical documentation for launching Hadoop Kudu clusters and for executing Hive queries and Pig scripts.
- Designed a data processing warehouse using Hive; created and managed Hive tables in Hadoop.
Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, MapReduce, HBase, UNIX Shell Scripting.
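A minimal sketch of the Scala map/reduce-style processing of unstructured text mentioned above; the HDFS paths and the log-level extraction rule are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object TextLogProcessing {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("TextLogProcessing").getOrCreate()
    val sc = spark.sparkContext

    // Unstructured text files landed in HDFS (hypothetical path)
    val lines = sc.textFile("hdfs:///data/raw/applogs/*.log")

    // Map/reduce-style processing: tokenize, keep log-level tokens, count occurrences
    val levels = Set("INFO", "WARN", "ERROR")
    val levelCounts = lines
      .flatMap(_.split("\\s+"))
      .filter(token => levels.contains(token))
      .map(level => (level, 1L))
      .reduceByKey(_ + _)

    // Write the counts back to HDFS as text
    levelCounts.saveAsTextFile("hdfs:///data/processed/log_level_counts")

    spark.stop()
  }
}
```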
Java Data Developer
Confidential
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Developed this application based on MVC architecture using the open-source Spring framework.
- Implemented the presentation layer with HTML and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Developed client-side validations using JavaScript.
- Deployed the applications on WebSphere Application Server.
- Wrote complex SQL queries and stored procedures.
- Developed templates and screens in HTML and JavaScript.
- Fixed bugs and performed unit testing with test cases using JUnit.
- Actively involved in system testing and implemented the service layer using Spring IoC.
- Used Spring framework AOP features extensively and monitored logs using Log4j.
- Prepared the installation guide, customer guide, and configuration document, which were delivered to the customer along with the product.
- Provided technical support for production environments: resolving issues, analyzing defects, and providing and implementing solutions.
Environment: Java, J2EE 1.4, Servlets 3.0, JDBC, JavaScript, Spring 2.0, MySQL 5.0, JUnit, Eclipse IDE 2.1.