
Hadoop/Big Data Developer Resume


Chicago

PROFESSIONAL SUMMARY:

  • Over 8 years of IT experience in analysis, design, and development using Hadoop, Java/J2EE, and SQL, including Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Scala, Spark, Impala, Oozie, Flume, HCatalog, and Sqoop.
  • Used Flume to collect data and populate Hadoop.
  • Architected, designed, and developed Big Data solutions for various implementations.
  • Worked on data modeling using various machine learning algorithms via R and Python (GraphLab).
  • Worked with programming languages such as Core Java and Scala.
  • Worked on HBase for quick lookups (updates, inserts, and deletes) in Hadoop.
  • Experience in data modeling, complex data structures, data processing, data quality, and data lifecycle.
  • Experience with Amazon AWS cloud services including EC2, S3, EBS, ELB, AMI, IAM, Route 53, Auto Scaling, CloudFront, CloudWatch, and Security Groups.
  • Solid experience in developing and deploying applications using WebLogic, Apache Tomcat, and JBoss.
  • Strong experience with SQL, PL/SQL, and database concepts.
  • Experience with NoSQL databases such as HBase and Cassandra.
  • Good understanding of job workflow scheduling and monitoring tools like Oozie and Control-M.
  • Developed ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools.
  • Experience in running Map-Reduce and Spark jobs over YARN.
  • Hands-on experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
  • Designed, configured and deployed Amazon Web Services (AWS) for a multitude of applications utilizing the AWS stack (Including EC2, Route53, S3, RDS, CloudFormation, Cloud Watch, SQS, IAM), focusing on high-availability, fault tolerance, and auto-scaling.
  • Participated in design reviews, code reviews, unit testing and integration testing.
  • Worked with HDFS, NameNode, JobTracker, DataNode, TaskTracker, and MapReduce concepts.
  • Expertise in SFDC administrative tasks like creating Profiles, Roles, OWD, Field Dependencies, Custom Objects, Page Layouts, Validation Rules, Approvals, Workflow Rules, Security and Sharing Rules, Delegated Administration, Tasks and Actions, Public Groups, and Queues.
  • Experienced in Developing Triggers, Batch Apex, Scheduled Apex classes.
  • Hands-on experience in Sales Cloud, Service Cloud, Chatter, Marketing Cloud, Customer Portal, and Partner Portal; recommended solutions to improve business processes using Salesforce CRM.
  • Strong understanding of CRM business processes such as Forecasting, Campaign, Lead, Order, Account, and Case Management.
  • Experience working with developer toolkits like Force.com IDE, Force.com Ant Migration Tool, Eclipse IDE, and Maven.
  • Worked on production requests for Salesforce.com across Service Cloud, Sales Cloud, Marketing Cloud, Apttus CPQ, and managed packages like FieldFX and Flousum.
  • Gained hands-on experience with Lightning Connect to access real-time data from third-party systems using external objects and connectors in Salesforce.
  • Extensively worked on Salesforce.com Environments which includes Creation/Refreshing of Sandboxes and Configuring connections between Production and Sandbox Environments.
  • Experience in front-end technologies like HTML, CSS, HTML5, CSS3, and AJAX.
  • Experience in building high performance and scalable solutions using various Hadoop ecosystem tools like Pig, Hive, Sqoop, Spark, Solr and Kafka.
  • Defined real-time data streaming solutions across the cluster using Spark Streaming, Apache Storm, Kafka, NiFi, and Flume.
  • Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Zookeeper and Apache Storm. 
  • Good knowledge of HDFS high availability (HA) and the different daemons of Hadoop clusters, including ResourceManager, NodeManager, NameNode, and DataNode.
  • Familiar with data warehousing and ETL tools like Informatica.
  • Defined extract-transform-load (ETL) and extract-load-transform (ELT) processes for the Data Lake.
  • Familiar with Core Java, with a strong understanding and working knowledge of object-oriented concepts, Collections, multithreading, data structures, and algorithms, as well as JSP, Servlets, JDBC, and HTML.
  • Knowledge of implementing Hortonworks (HDP 2.3 and HDP 2.1) and Cloudera (CDH3, CDH4, CDH5) distributions.

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: HDFS, MapReduce, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, Zookeeper, AWS, Cloudera, Hortonworks, Kafka, Avro, BigQuery.

Languages: Core Java, XML, HTML and HiveQL.

J2EE Technologies: Servlets, JSP, JMS, JSTL, AJAX, DOJO, JSON and Blaze DS.

Frameworks: Spring 2, Struts 2 and Hibernate 3.

Hadoop Ecosystem: HDFS, MapReduce, Hive, Yarn, Pig, HBase, Sqoop, Oozie, Flume, Zookeeper, Spark, Impala, Storm and Kafka.

Reporting Tools: BIRT 2.2.

Application & Web Servers: WebSphere 6.0, JBoss 4.x, and Tomcat 5.

Scripting Languages: JavaScript, AngularJS, Pig Latin, Python 2.7, and Scala.

Databases (SQL/NoSQL): Oracle 9i, SQL Server 2005, MySQL, HBase, and MongoDB 2.2.

IDE: Eclipse and EditPlus.

PM Tools: MS MPP, Risk Management, ESA.

Other Tools: SVN, Apache Ant, JUnit, StarUML, TOAD, PL/SQL Developer, Perforce, JIRA, Bugzilla, Visual Source, QC, Agile methodology.

EAI Tools: TIBCO 5.6.

Bug Tracking/Ticketing: Mercury Quality Center and ServiceNow.

Operating Systems: Windows 98/2000, Linux/Unix, and Mac.

PROFESSIONAL EXPERIENCE:

Confidential, Chicago

Hadoop/Big Data Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Installed and configured Hortonworks HDP 2.x and Cloudera (CDH 5.5.1) clusters in dev and production environments.
  • Performed volumetric analysis for 43 feeds (current approximate data size: 70 TB), based on which the size of the production cluster was decided.
  • Wrote multiple Spark jobs to perform data quality checks on data before files were moved to the data processing layer.
  • Migrated complex MapReduce programs into Spark RDD transformations and actions (see the sketch after this list).
  • Implemented Kafka high-level consumers to read data from Kafka partitions and move it into HDFS.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including MapReduce, Hive, and Spark.
  • Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Evaluated the performance of Apache Spark in analyzing genomic data.
  • Implemented complex Hive UDFs to execute business logic within Hive queries.
  • Implemented Impala for data analysis.
  • Prepared Linux shell scripts for automating the process.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • Automated all jobs, from pulling data from different data sources such as MySQL and pushing the result datasets to HDFS, to running MR, Pig, and Hive jobs, using Kettle and Oozie (workflow management).
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data with MapReduce, Hive, and Pig.
  • Involved in loading data from the Linux file system, servers, and Java web services using Kafka producers and partitions.
  • Evaluated usage of Oozie for Workflow Orchestration.
  • Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from various sources.
  • Created partitioned tables in Hive and mentored the analyst and test teams in writing Hive queries.
  • Involved in cluster setup, monitoring, test benchmarks for results.
  • Responsible for coding SQL Statements and Stored procedures for back end communication using JDBC.
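
A minimal Java sketch of the MapReduce-to-Spark migration pattern referenced in the list above: the former map and reduce phases become RDD transformations, and a single action triggers the job. The input path, field layout, and data-quality rule are hypothetical placeholders.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class MapReduceToSpark {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("MapReduceToSpark");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Former mapper input: one CSV record per line (hypothetical path and layout).
        JavaRDD<String> lines = sc.textFile("hdfs:///data/raw/events");

        JavaPairRDD<String, Integer> counts = lines
                .map(line -> line.split(","))
                .filter(f -> f.length >= 2 && !f[1].isEmpty())   // simple data-quality check
                .mapToPair(f -> new Tuple2<>(f[1], 1))           // map phase: emit (key, 1)
                .reduceByKey(Integer::sum);                      // reduce phase: sum per key

        // The action triggers execution, replacing the old MapReduce driver submission.
        counts.saveAsTextFile("hdfs:///data/processed/event_counts");
        sc.stop();
    }
}
```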

Environment: Hadoop, Spark, HDFS, Pig, Hive, Flume, Sqoop, Kafka, Oozie, HBase, Zookeeper, MySQL, shell scripting, Linux Red Hat, Core Java 7, Eclipse.

Confidential, Dallas, TX

Hadoop/Big Data Developer

Responsibilities:

  • Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, and SPLIT to extract data from data files and load it into HDFS.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Installed and configured Hortonworks HDP 2.x and Cloudera (CDH 5.5.1) clusters in dev and production environments.
  • Performed volumetric analysis for 43 feeds (current approximate data size: 70 TB), based on which the size of the production cluster was decided.
  • Wrote multiple Spark jobs to perform data quality checks on data before files were moved to the data processing layer.
  • Worked on capacity planning for the production cluster.
  • Installed the Hue browser.
  • Involved in loading data from UNIX file system to HDFS.
  • Created data pipelines for different mobile application events to filter and load consumer response data from Urban Airship in an AWS S3 bucket into Hive external tables at an HDFS location.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Worked on installation of Hortonworks 2.1 on Azure Linux servers.
  • Worked on upgrading the Hadoop cluster from HDP 2.1 to HDP 2.3.
  • Responsible for the implementation and ongoing administration of Hadoop infrastructure.
  • Managed and reviewed Hadoop log files.
  • Imported and exported data between different databases such as MySQL and other RDBMSs and HDFS/HBase using Sqoop.
  • Worked on indexing HBase tables using Solr, including indexing JSON and nested data.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
  • Day-to-day responsibilities included solving developer issues, moving code deployments from one environment to another, providing access to new users, providing quick fixes to reduce impact, and documenting issues to prevent them in the future.
  • Added and removed components through Ambari.
  • Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions, and used Spark to extract the schema of the JSON files (see the sketch after this list).
  • Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.
  • Involved in configuring Flume, Avro, and HBase.
  • Monitored workload, job performance, and capacity planning.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Extensively used TOAD for source and target database activities.
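
A rough Java sketch of the kind of Spark application described above for filtering JSON data from S3 and writing it to HDFS with partitions. The bucket name, filter condition, and partition column are hypothetical; Spark infers the JSON schema on read.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class S3JsonToHdfs {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("S3JsonToHdfs")
                .getOrCreate();

        // Read raw JSON events from S3; Spark infers the schema from the files.
        Dataset<Row> events = spark.read().json("s3a://example-bucket/raw/events/");
        events.printSchema();  // inspect the extracted schema

        // Keep only usable consumer-response records (hypothetical filter column).
        Dataset<Row> filtered = events.filter("response_type IS NOT NULL");

        // Write to HDFS partitioned by event date (hypothetical partition column).
        filtered.write()
                .mode(SaveMode.Overwrite)
                .partitionBy("event_date")
                .parquet("hdfs:///data/mobile/events/");

        spark.stop();
    }
}
```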

Environment: Hadoop, MapReduce, TAC, HDFS, HBase, HDP (Hortonworks), AWS S3, EMR, Airflow, Sqoop, Spark SQL, Hive ORC, Data Processing Layer, Hue, Azure, UNIX, MySQL, RDBMS, Ambari, Solr Cloud, Lily HBase, Cron, JSON, XML, Parquet.

Confidential, California, U.S

Hadoop/Big Data Developer

Responsibilities:

  • Installing, configuring and testing Hadoop ecosystem components like MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Hue and HBase.
  • Imported data from various sources into HDFS and Hive using Sqoop.
  • Involved in writing custom MapReduce, Pig and Hive programs.
  • Experience in writing customized UDFs in Java to extend Hive and Pig Latin functionality (see the sketch after this list).
  • Created Partitions and Buckets in Hive for both Managed and External tables for optimizing performance.
  • Worked on several PoCs involving NoSQL databases like HBase, MongoDB, and Cassandra.
  • Configured Tez as execution engine for Hive queries to improve the performance.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS and performed the real time analytics on the incoming data.
  • Hands-on experience in Spark and Spark Streaming, creating RDDs and applying transformations and actions on them.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
  • In-depth knowledge of Scala and experienced in building the Spark applications using Scala.
  • Configured Flume to stream data into HDFS and Hive using HDFS Sinks and Hive sinks.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Involved in scheduling Oozie workflow engine to run multiple Hive, Pig and Spark jobs.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Experience in Commissioning, Decommissioning, Balancing, and Managing Nodes and tuning server for optimal performance of the cluster.
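
A minimal Java sketch of a custom Hive UDF along the lines of those mentioned above. The class name and normalization logic are hypothetical; it uses the classic UDF base class and would be packaged into a jar and registered inside Hive.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Hypothetical Hive UDF that trims and upper-cases a string column.
 * Usage in Hive after packaging into a jar:
 *   ADD JAR /path/to/udfs.jar;
 *   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
 *   SELECT normalize_code(product_code) FROM products;
 */
public class NormalizeCode extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;   // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```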

Environment: Hadoop, MapReduce, HDFS, HBase, HDP (Hortonworks), Sqoop, Data Processing Layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, Solr Cloud, Lily HBase, Cron.

Confidential, Iowa, U.S

Hadoop/Big Data Developer

Responsibilities:
  • Installed and configured Hadoop MapReduce, Flume, Avro, and HBase.
  • Involved in configuring Flume, Avro, and HBase.
  • Wrote HBase queries to compute different metrics.
  • Gained good business knowledge of transaction processing, fraud suspect identification, the appeals process, etc.
  • Developed Pig scripts to transform data and load it into HBase tables.
  • Created Hive snapshot tables and Hive ORC tables from existing Hive tables.
  • Worked on evaluation and analysis of the Hadoop cluster and different big data analytic tools such as HBase; developed MapReduce programs to perform data filtering on unstructured data.
  • Optimized Hive joins for large tables and developed MapReduce code for a full outer join of two large tables.
  • Experience in using HDFS and MySQL; deployed HBase integration to perform OLAP operations on HBase data.
  • Created HBase tables for random reads/writes by the MapReduce programs (see the sketch after this list).
  • Used Talend Big Data Open Studio 5.6.2 to create a framework for executing extracts.
  • Used Spark to parse XML files, extract values from tags, and load them into multiple Hive tables using map classes.
  • Used different big data components in Talend such as tHiveRow, tHiveInput, tHDFSCopy, tHDFSPut, tHDFSGet, tMap, tDenormalize, and tFlowToIterate.
  • Scheduled different Talend jobs using TAC (Talend Administration Center).
  • Played a major role in the entire SDLC, from working with the business to gather requirements through analysis, design, testing, implementation, and loading of the data warehouse.
  • Worked closely with the ETL architect and the business in building source-to-target mappings.
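
A minimal sketch of the HBase table creation and random read/write access described above, written against the HBase 2.x Java client; the table name, column family, and row-key layout are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseQuickAccess {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            TableName name = TableName.valueOf("transactions");
            if (!admin.tableExists(name)) {
                admin.createTable(TableDescriptorBuilder.newBuilder(name)
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("d"))
                        .build());
            }

            try (Table table = conn.getTable(name)) {
                // Write: one row keyed by transaction id (hypothetical layout).
                Put put = new Put(Bytes.toBytes("txn#000123"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("125.00"));
                table.put(put);

                // Read it back: a random lookup by row key.
                Result row = table.get(new Get(Bytes.toBytes("txn#000123")));
                System.out.println(Bytes.toString(
                        row.getValue(Bytes.toBytes("d"), Bytes.toBytes("amount"))));
            }
        }
    }
}
```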

Environment: Sqoop, Data Processing Layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, Solr Cloud, PL/SQL, TOAD, Java.

Confidential

Hadoop/Big Data Developer

Responsibilities:

  • Installing and configuring fully distributed multinode Hadoop Clusters.
  • Installing Hadoop ecosystem components (HDFS, YARN, Pig, Hive, Kafka, Flume, Spark, and HBase).
  • Involved in Hadoop Cluster environment administration that includes cluster capacity planning, performance tuning, cluster Monitoring and Troubleshooting.
  • Deployed and managed Cloudera Hadoop clusters with CDH 5.5.1 and CDH 5.7.1.
  • Implemented and maintained Hortonworks cluster.
  • Configured and managed Hadoop Cluster security with Kerberos, CDH with LDAP; automated tasks by setting up scripts.
  • Managed the backup and disaster recovery for Hadoop data including FSimage and backups.
  • Coordinating and managing relations with vendors, IT developers and end users.
  • Managing the work streams, process and coordinate the team members and their activities to ensure that the technology solutions are in line with the overall vision and goals.
  • Designed, implemented, and reviewed features and enhancements to Cassandra.
  • Integrated the Cassandra Query Language (CQL) for Apache Cassandra.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data between MySQL/Oracle and HDFS, Hive, and HBase based on customer requirements.
  • Used Flume to handle streaming data and loaded the data into the Hadoop cluster.
  • Used Sqoop to move structured data from MySQL to HDFS, Hive, Pig, and HBase.
  • Experience in writing Hive JOIN queries.
  • Developed MapReduce programs for different types of files using combiners along with UDFs and UDAFs (see the sketch after this list).
  • Configured connectivity to various databases (Oracle 11g, SQL Server 2005).
  • Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Experience in providing security for Hadoop Cluster with Kerberos.
  • Provided cluster coordination services through ZooKeeper.
  • Involved in loading data from UNIX file system to HDFS.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
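
A minimal Java sketch of a MapReduce job with a combiner, along the lines of the programs described above. The record format and field position are hypothetical; the reducer doubles as the combiner because the aggregation is a simple associative sum.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCount {

    // Emits (event_type, 1) per input line; the field position is hypothetical.
    public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text eventType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length > 1) {
                eventType.set(fields[1]);
                context.write(eventType, ONE);
            }
        }
    }

    // Sums counts per key; safe to reuse as the combiner since summation is associative.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event count");
        job.setJarByClass(EventCount.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);   // local pre-aggregation on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```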

Environment: HDFS, YARN, Hive, Flume, Cloudera Manager, Hortonworks, Sqoop, MySQL, HBase, Cassandra, MapReduce, Spark, UNIX Shell Scripting, Zookeeper.

Confidential

Java Developer

Responsibilities:

  • Designed classes in UML using OOAD techniques with the help of the Rational Rose tool.
  • Created user-friendly GUI interfaces and web pages using HTML and DHTML embedded in JSP.
  • Used JavaScript for client-side validations.
  • Designed and developed a generic validator framework for modules and injected these validators using the Hibernate framework.
  • Created Hibernate POJOs and Hibernate mapping files for all database tables.
  • Developed GUI screens using JSF (IBM implementation) and AJAX functionality.
  • Developed and deployed EJB's (Session and Entity) to implement the business logic and to handle various interactions with the database.
  • Involved in debugging the application.
  • Developed Servlets using JDBC for storing user data in and retrieving it from the SQL database (see the sketch after this list).
  • Used WebLogic Application Server to deliver a new class of enterprise applications that enhance business interactions and transactions between a company and its key constituencies.
  • Used WebLogic Application Server to deliver high performance and scalability.
  • Wrote database objects such as triggers and stored procedures in SQL.
  • Interacted with the users and documented the System.
  • Used HP QA to manage the defects and issues.
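
A minimal sketch of a JDBC-backed Servlet like those described above. The servlet mapping (via web.xml), table, and connection URL are hypothetical, and the sketch uses modern try-with-resources for brevity; a JDBC 4 driver on the classpath is assumed (older drivers would need Class.forName).

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Mapped to /users in web.xml (Servlet 2.x style deployment descriptor).
public class UserServlet extends HttpServlet {
    private static final String URL = "jdbc:oracle:thin:@dbhost:1521:app";  // hypothetical

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String id = req.getParameter("id");
        try (Connection conn = DriverManager.getConnection(URL, "appuser", "secret");
             PreparedStatement ps = conn.prepareStatement("SELECT name FROM users WHERE id = ?")) {
            ps.setInt(1, Integer.parseInt(id));
            try (ResultSet rs = ps.executeQuery()) {
                resp.setContentType("text/plain");
                resp.getWriter().println(rs.next() ? rs.getString("name") : "user not found");
            }
        } catch (SQLException e) {
            throw new IOException("Database lookup failed", e);
        }
    }
}
```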

Environment: JSP 2.0, JDBC, HTML, OOAD, Servlets, Web Services, Rational Rose, WSAD, UML, Java, EJB, JSF, QA, Hibernate, AJAX, Windows 7/XP, CVS, XML/XSL.
