
Hadoop Developer Resume


Hartford, CT

SUMMARY

  • Over 9 years of experience in Information Technology, including 4 years in Big Data/Hadoop.
  • Experience working with BI teams to translate big data requirements into Hadoop-centric solutions.
  • Experience in performance tuning Hadoop clusters based on analysis of the existing infrastructure.
  • Strong experience with SQL, SSIS, SSRS, and Crystal Reports.
  • Hands-on experience with QlikView and Tableau.
  • Experience in ETL and Business Intelligence solutions, Hadoop big data, and SQL development.
  • Strong object-oriented concepts with full software development life cycle experience: requirements gathering, conceptual design, analysis, detailed design, development, mentoring, and system and user acceptance testing.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce.
  • Used different Hive SerDes such as RegexSerDe.
  • Worked on the DataTorrent platform and launched applications on it.
  • Provided technical assistance for developing and executing test plans and test cases in R as per client requirements.
  • Experience with Hadoop and Spark clusters and stream processing using Spark Streaming.
  • Extensive knowledge of data serialization formats such as Avro and SequenceFiles.
  • Excellent understanding and knowledge of NoSQL databases like HBase
  • Experience supporting data analysts in running Pig and Hive queries.
  • Developed MapReduce programs to perform data analysis.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience writing shell scripts to dump shared data from MyBI servers to HDFS.
  • Experience building data pipelines with Pig and Hive from Teradata and Netezza data sources; these pipelines used custom UDFs to extend the ETL functionality.
  • Highly knowledgeable in the Writable and WritableComparable interfaces, the Mapper and Reducer base classes, and Hadoop data types such as IntWritable, ByteWritable, and Text (see the sketch after this list).
  • Hands-on experience with technologies such as Spark, Kafka, Apache Drill, Platfora, Sqoop, and Flume.
  • Experience in using Oozie 0.1 for managing Hadoop jobs.
  • Experience in cluster coordination using Zookeeper.
  • Extensive development experience with IDEs such as Eclipse, NetBeans, Forte, and STS.
  • Working on implementing Spark for fast processing and creating reports in Tableau for campaign management.
  • Background with traditional databases such as Oracle and SQL Server, and with ETL tools and processes. Expertise in Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of big data.
  • Expertise in relational databases such as Oracle and MySQL.
  • Experience designing both time-driven and data-driven automated workflows using Oozie 3.0 to run Hadoop MapReduce 2.0 jobs.
  • Experience in installation, configuration, support, and management of Cloudera's Hadoop platform, including CDH4 and CDH5 clusters.
  • Experienced in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
  • Experienced in implementing SSL/TLS to protect all network communications.
  • Development experience with Java/J2EE applications including JSP, Servlets, JDBC, Java Beans, HTML, JavaScript, XML, DHTML, CSS, complex SQL queries, Web Services, SOAP and data analysis
  • Extensive experience working with customers to gather the information needed to analyze, debug, and provide data or code fixes for technical problems, and to build and unit/integration test a service patch for each version release.
  • Experienced in user acceptance and system testing and in providing technical solution documents for users.
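
The bullet on Writable types above refers to the standard Hadoop MapReduce programming model. The following is a minimal sketch of that Writable-based API; the EventCount job, its tab-delimited input layout, and the choice of field are hypothetical illustrations, not programs taken from this resume.

    // Hypothetical EventCount job: count occurrences of an event code per input record.
    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class EventCount {

        // Mapper: emits (event code, 1) for each tab-delimited input record.
        public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text eventCode = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                if (fields.length > 1) {
                    eventCode.set(fields[1]);          // assumption: event code sits in column 2
                    context.write(eventCode, ONE);
                }
            }
        }

        // Reducer: sums the counts emitted for each event code.
        public static class EventReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable total = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                total.set(sum);
                context.write(key, total);
            }
        }
    }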

TECHNICAL SKILLS

Programming Languages: Java 1.4, C++, C, SQL, Python, R, Pig, PL/SQL.

Java Technologies: JDBC.

Frameworks: Jakarta Struts 1.1, JUnit, and JTest.

Databases: Oracle 8i/9i, MySQL, MS SQL Server; NoSQL: MongoDB.

IDE’s & Utilities: Eclipse and JCreator, NetBeans.Web Dev.

Technologies: HTML, XML.

Protocols: TCP/IP, HTTP and HTTPS.

Operating Systems: Linux, macOS, Windows 98/2000/NT/XP.

Hadoop Ecosystem: Hadoop MapReduce, Sqoop, Datameer, Hive, Pig, HBase, HDFS, Platfora, YARN, Splunk, Spark, Oozie.

PROFESSIONAL EXPERIENCE

Confidential

Senior Hadoop Developer

Responsibilities:

  • Worked with partitioned and bucketed tables in Hive and designed both managed and external tables to optimize performance.
  • Solved performance issues in Hive scripts with an understanding of joins, grouping, and aggregation.
  • Created and maintained Sqoop jobs with full-refresh and incremental loads to populate Hive external tables.
  • Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis.
  • Worked with highly unstructured and semi structured data.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration. Imported data from Oracle and third-party vendors using Sqoop and FTP.
  • Effectively managed client expectations and delivered work to a consistently high standard, ensuring deadlines were met.
  • Built BI reports in Tableau on top of Spark using Spark SQL.
  • Developed Hadoop Streaming MapReduce jobs using Python.
  • Experienced in using SequenceFile, Avro, and Parquet file formats.
  • Wrote optimized Hive queries for both batch processing and ad hoc querying.
  • Implemented LZO and Snappy compression formats. Wrote Pig UDFs to perform data cleansing and transformation for ETL activities. Transformed log files into structured data using Hive and Pig loaders.
  • Wrote Hive UDFs for data analysis and Hive table loads. Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in their Hive queries (see the sketch after this list).
  • Preprocessed logs and semi-structured content stored on HDFS using Pig, then loaded the processed data into the Hive warehouse so business analysts could write Hive queries against it.
  • Designed, Puppetized and deployed big data analytics data services platform (Hadoop, Storm, Kafka, etc.)
  • Extensively worked on end-to-end data pipeline orchestration using Oozie.
  • Used Pig to do data transformations, event joins, filter bot traffic and some pre-aggregations before storing the data onto HDFS.
  • Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
  • Created Pig pipeline scripts to produce file dumps for downstream systems such as SAS and Tableau.
  • Implemented test scripts to support test driven development and continuous integration.
  • Exported and analyzed data from the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Good knowledge of building Apache Spark applications using Python.
  • Experience monitoring cluster health, adding and decommissioning DataNodes, scheduling and monitoring jobs, and performing maintenance activities.
  • Worked on cluster installation using Cloudera Manager, DataNode and NameNode recovery, capacity planning, and slot configuration.
  • Experienced in moving a SQL-based system to Hadoop-based infrastructure. Created an FTP-based interface for incremental loads of transaction data by date.
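
As a concrete illustration of the reusable Hive UDF libraries mentioned above, here is a minimal sketch written against the classic org.apache.hadoop.hive.ql.exec.UDF API. The mask_account function, its masking rule, and its name are hypothetical, not the actual UDFs built for this engagement.

    // Hypothetical UDF: mask all but the last four characters of a string column.
    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    @Description(name = "mask_account",
                 value = "_FUNC_(str) - masks all but the last four characters of str")
    public class MaskAccount extends UDF {
        private final Text result = new Text();

        public Text evaluate(Text input) {
            if (input == null) {
                return null;                              // preserve Hive NULL semantics
            }
            String s = input.toString();
            int keep = Math.min(4, s.length());
            StringBuilder masked = new StringBuilder();
            for (int i = 0; i < s.length() - keep; i++) {
                masked.append('*');
            }
            masked.append(s.substring(s.length() - keep));
            result.set(masked.toString());
            return result;
        }
    }

Once packaged into a JAR, a UDF like this would typically be registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.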

Environment: CDH 5.4.5, Hadoop 2.5, Hive 4.0.0, Pig 1.5.0, Oozie 4.0.0, Sqoop 1.4.5, Flume 1.5.0, Apache Spark 1.1.0, MySQL, Oracle 11g, R, Python, SQL Developer, DB2, Linux.

Confidential, New York, NY

Senior Hadoop Developer

Responsibilities:

  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Integrated the scheduler with Oozie workflows to pull data from multiple data sources in parallel using fork actions.
  • Created Data Pipeline of MapReduce programs using Chained Mappers.
  • Experience using Spark machine learning techniques implemented in Scala.
  • Wrote optimized Hive queries for both batch processing and ad hoc querying.
  • Implemented optimized joins across different data sets to derive prospect ZIP data using MapReduce.
  • Implemented complex MapReduce programs to perform map-side joins using the distributed cache in Java (see the sketch after this list).
  • Created the high-level design for the data ingestion and data extraction modules, and enhanced a Hadoop MapReduce job to join the incoming slices of data and pick only the fields needed for further processing.
  • Documented and explained implemented processes and configurations during upgrades.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Processed results were consumed by Hive, scheduling applications, and various other BI reports through multi-dimensional data warehouse models.
  • Built shell scripts to execute Hive scripts on Linux, processing and extracting terabytes of data from different data warehouses and preparing datasets for ad hoc analysis and business reporting, all within a distributed environment.
  • Developed several advanced MapReduce programs to process data files received.
  • Created partitions and buckets based on state to support bucket-based Hive joins for further processing.
  • Created Hive generic UDFs to process business logic that varies by policy.
  • Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
  • Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Worked on SAS-to-Hadoop migration for fraud analytics and provided predictive analysis.
  • Worked on SAS-to-Hadoop migration for campaign and response analysis.
  • Analyzed and designed Hadoop directory structures for Archive data.
  • Developed unit test cases using the JUnit and MRUnit testing frameworks.
  • Experienced in monitoring the cluster using Cloudera Manager.
  • Worked on AWS clusters.
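
Below is a minimal sketch of the map-side join technique with the distributed cache referenced above. The zip_lookup.txt side file, its comma-separated layout, and the tab-delimited main input are assumptions made for illustration; the production jobs are not reproduced here.

    // Hypothetical map-side join: a small ZIP-to-region lookup table is cached on every
    // node and joined in memory during map(), so no reduce phase is needed for the join.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ZipJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> zipToRegion = new HashMap<>();
        private final Text outKey = new Text();
        private final Text outValue = new Text();

        @Override
        protected void setup(Context context) throws IOException {
            // The driver registers the side file with job.addCacheFile(new URI(".../zip_lookup.txt"));
            // MRv2 symlinks cached files into the task's working directory under their file name.
            URI[] cacheFiles = context.getCacheFiles();
            if (cacheFiles != null) {
                for (URI uri : cacheFiles) {
                    String localName = new Path(uri.getPath()).getName();
                    try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
                        String line;
                        while ((line = reader.readLine()) != null) {
                            String[] parts = line.split(",");      // assumed layout: zip,region
                            if (parts.length == 2) {
                                zipToRegion.put(parts[0], parts[1]);
                            }
                        }
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");        // assumed layout: prospectId \t zip \t ...
            if (fields.length >= 2) {
                String region = zipToRegion.getOrDefault(fields[1], "UNKNOWN");
                outKey.set(fields[0]);
                outValue.set(fields[1] + "\t" + region);
                context.write(outKey, outValue);                   // record joined on the map side
            }
        }
    }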

Environment: Hadoop, HDFS, HBase, MapReduce, Java, R, Hive, SAS, Pig, SQL, Sqoop, Flume, Oozie, Hue, ETL, Cloudera Manager, Spark, Cloudera Hadoop distribution, MySQL.

Confidential, Hartford, CT

Hadoop Developer

Responsibilities:

  • Worked on evaluation and analysis of the Hadoop cluster and different big data analytics tools, including Pig, the HBase database, and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from LINUX file system to Hadoop Distributed File System.
  • Created HBase tables to store various formats of PII data coming from different portfolios (see the sketch after this list).
  • Experience in managing and reviewing Hadoop log files.
  • Used Datameer to analyze transaction data for the client. Installed, configured, and managed Datameer users on the Hadoop cluster.
  • Exported the analyzed and processed data to relational databases using Sqoop for visualization and report generation by the BI team.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Worked on performing a major cluster upgrade from CDH3u6 to CDH 4.4.0.
  • Created dashboards using Tableau to analyze data for reporting.
  • Supported setting up the QA environment and updating configurations for implementation scripts with Pig and Sqoop.
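
As an illustration of the HBase table work above, here is a minimal sketch using the HBaseAdmin/HTableDescriptor client API of that era. The portfolio_pii table name and its column families are hypothetical, not the actual PII schema.

    // Hypothetical table: one column family per data format arriving from the portfolios.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreatePortfolioTable {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();       // reads hbase-site.xml from the classpath
            HBaseAdmin admin = new HBaseAdmin(conf);
            try {
                HTableDescriptor table = new HTableDescriptor("portfolio_pii");
                table.addFamily(new HColumnDescriptor("json"));     // semi-structured payloads
                table.addFamily(new HColumnDescriptor("csv"));      // delimited extracts
                if (!admin.tableExists("portfolio_pii")) {
                    admin.createTable(table);
                }
            } finally {
                admin.close();
            }
        }
    }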

Environment: Hadoop, HDFS, SAS, Pig, Datameer, Sqoop, SQL, Python, HBase, Shell Scripting, Red Hat Linux.

Confidential - Los Angeles, CA

Hadoop Consultant

Responsibilities:

  • Exported data from DB2 to HDFS using Sqoop and NFS mount approach.
  • Moved data from HDFS to Cassandra using Map Reduce and BulkOutputFormat class.
  • Developed Map Reduce programs for applying business rules on the data.
  • Developed and executed Hive queries to denormalize the data.
  • Installed and configured Hadoop Cluster for development and testing environment.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
  • Configured WebHDFS to support REST API and JDBC connectivity for external clients' operations.
  • Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports by our BI team.
  • Dumped data into HDFS using Sqoop for analysis.
  • Developed data pipelines using Pig and Hive from Teradata and Netezza data sources; these pipelines used custom UDFs to extend the ETL functionality.
  • Developed job flows in Oozie to automate the workflows for extracting data from Teradata and Netezza (see the sketch after this list).
  • Developed a data pipeline from Hadoop into DB2 containing the user purchasing data.
  • Implemented partitioning, dynamic partitions, and buckets in Hive, and wrote MapReduce programs to analyze and process the data.
  • Streamlined Hadoop jobs and workflow operations using Oozie workflow engine.
  • Involved in product life cycle developed using Scrum methodology.
  • Involved in mentoring the team through technical discussions and technical reviews.
  • Involved in code reviews and verifying bug analysis reports.
  • Automated work flows using shell scripts.
  • Performance tuned Hive queries written by other developers.
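
The following is a minimal sketch of driving such an Oozie workflow from Java through the Oozie client API. The Oozie URL, HDFS paths, and workflow parameters (other than the standard application-path property) are assumptions; the workflow.xml that defines the Teradata/Netezza extraction actions is not shown.

    // Hypothetical submission of an extraction workflow and a simple status poll.
    import java.util.Properties;

    import org.apache.oozie.client.OozieClient;
    import org.apache.oozie.client.OozieClientException;
    import org.apache.oozie.client.WorkflowJob;

    public class SubmitExtractionWorkflow {
        public static void main(String[] args) throws OozieClientException, InterruptedException {
            OozieClient client = new OozieClient("http://oozie-host:11000/oozie");   // assumed URL

            Properties conf = client.createConfiguration();
            conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/apps/extract/workflow.xml");
            conf.setProperty("nameNode", "hdfs://namenode:8020");                    // assumed workflow parameters
            conf.setProperty("jobTracker", "jobtracker-host:8021");
            conf.setProperty("sourceDb", "teradata_sales");

            String jobId = client.run(conf);                  // submit and start the workflow
            System.out.println("Submitted workflow: " + jobId);

            // Poll until the workflow leaves the RUNNING state.
            while (client.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
                Thread.sleep(10000);
            }
            System.out.println("Final status: " + client.getJobInfo(jobId).getStatus());
        }
    }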

Environment: Hadoop, HDFS, Hive, MapReduce 2.0, Sqoop 2.0.0, Oozie 3.0, SQL, Shell Scripting, Ubuntu, Red Hat Linux.

Confidential

Responsibilities:

  • Responsible for analyzing business requirements and detail design of the software.
  • Designed and developed the front-end user interface.
  • Developed a web-based console (JSP, Servlets, JavaBeans, JavaScript, CSS, XHTML) for reporting and life cycle management.
  • Established JDBC connectivity to Oracle 10g.
  • Wrote SQL queries to insert into and update the database. Used JDBC to invoke stored procedures (see the sketch after this list).
  • Involved with project manager in creating detailed project plans.
  • Designed technical documents using UML.
  • Involved in developing presentation layer using JSP, AJAX, and JavaScript.
  • Created JUnit test cases following test-driven development.
  • Responsible for implementing DAOs and POJOs using Hibernate reverse engineering, AOP, and the service layer.
  • Used Spring, the MVC pattern, and the Struts framework, and followed test-driven development.
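
As an illustration of invoking stored procedures over JDBC as mentioned above, here is a minimal sketch using CallableStatement. The update_policy_status procedure, its parameters, and the connection details are hypothetical, not objects from the application's actual schema.

    // Hypothetical call to an Oracle stored procedure with one OUT parameter.
    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Types;

    public class PolicyStatusUpdater {
        public static void updateStatus(long policyId, String status) throws SQLException {
            String url = "jdbc:oracle:thin:@dbhost:1521:ORCL";       // assumed connection details
            try (Connection conn = DriverManager.getConnection(url, "appuser", "secret");
                 CallableStatement call = conn.prepareCall("{call update_policy_status(?, ?, ?)}")) {
                call.setLong(1, policyId);
                call.setString(2, status);
                call.registerOutParameter(3, Types.INTEGER);         // rows affected, returned by the procedure
                call.execute();
                System.out.println("Rows updated: " + call.getInt(3));
            }
        }
    }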

Environment: Rational Application Developer (RAD) 7.5, Web Sphere Portal Server 6.1, Java 1.6, J2EE, JSP 2.1, Servlets 3, JSF 1.2, Spring 2.5, Hibernate 2.0, Web Sphere 6.1, AXIS, Oracle 10g, JUnit, XML, HTML, Java Script, AJAX, CSS, Rational Clear Case.

Confidential, Minnetonka, MN

JAVA Developer

Responsibilities:

  • Extensively used Core Java, Servlets, JSP and XML
  • Used Struts 1.2 in presentation tier
  • Generated the Hibernate XML and Java Mappings for the schemas
  • Used DB2 Database to store the system data
  • Actively involved in the system testing
  • Involved in fixing bugs and unit testing with JUnit test cases (see the sketch after this list)
  • Wrote complex SQL queries and stored procedures
  • Used asynchronous JavaScript for a better and faster interactive front end
  • Used IBM WebSphere as the application server
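
Below is a minimal sketch of a JUnit 4 test in the style referenced above. The PremiumCalculator class under test is a hypothetical stand-in, not a class from the original application.

    import static org.junit.Assert.assertEquals;

    import org.junit.Before;
    import org.junit.Test;

    // Hypothetical class under test (would normally live in its own source file).
    class PremiumCalculator {
        double premiumFor(double basePremium, boolean highRisk) {
            return highRisk ? basePremium * 1.10 : basePremium;
        }
    }

    // Unit tests covering both branches of the surcharge rule.
    public class PremiumCalculatorTest {
        private PremiumCalculator calculator;

        @Before
        public void setUp() {
            calculator = new PremiumCalculator();
        }

        @Test
        public void appliesTenPercentSurchargeForHighRiskPolicies() {
            // 1000.00 base premium + 10% high-risk surcharge = 1100.00
            assertEquals(1100.00, calculator.premiumFor(1000.00, true), 0.001);
        }

        @Test
        public void leavesLowRiskPremiumUnchanged() {
            assertEquals(1000.00, calculator.premiumFor(1000.00, false), 0.001);
        }
    }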

Environment: Java 1.2/1.3, Swing, Applet, Servlet, JSP, XML, HTML, Java Script, Oracle, DB2, PL/SQL

Confidential

Programmer Analyst/Java Developer

Responsibilities:

  • Involved in complete software development life cycle - Requirement Analysis, Conceptual Design, and Detail design, Development, System and User Acceptance Testing.
  • Involved in Design and Development of the System using Rational Rose and UML.
  • Involved in Business Analysis and developed Use Cases, Program Specifications to capture the business functionality.
  • Designed the system using JSPs and Servlets.
  • Designed the application using the Process Object, DAO, Data Object, Value Object, Factory, and Delegation patterns (see the sketch after this list).
  • Involved in the design and development of Presentation Tier using JSP, HTML and JavaScript.
  • Involved in integrating the concept of RFID in the software and developing the code for its API.
  • Coordinated between teams as a project coordinator, organizing design and architecture meetings.
  • Designed and developed class diagrams, identifying objects and their interactions to specify sequence diagrams for the system using Rational Rose.
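
As an illustration of the DAO and Value Object patterns listed above, here is a minimal sketch. The Shipment entity, its table, and its columns are hypothetical, not the system's actual domain model.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Value Object carrying shipment data between layers.
    class Shipment {
        private final String trackingId;
        private final String status;

        Shipment(String trackingId, String status) {
            this.trackingId = trackingId;
            this.status = status;
        }

        String getTrackingId() { return trackingId; }
        String getStatus()     { return status; }
    }

    // DAO interface: the service layer depends only on this abstraction.
    interface ShipmentDao {
        Shipment findByTrackingId(String trackingId) throws SQLException;
    }

    // JDBC implementation; the connection is supplied by the caller (for example, a factory or pool).
    class JdbcShipmentDao implements ShipmentDao {
        private final Connection connection;

        JdbcShipmentDao(Connection connection) {
            this.connection = connection;
        }

        @Override
        public Shipment findByTrackingId(String trackingId) throws SQLException {
            String sql = "SELECT tracking_id, status FROM shipments WHERE tracking_id = ?";
            try (PreparedStatement stmt = connection.prepareStatement(sql)) {
                stmt.setString(1, trackingId);
                try (ResultSet rs = stmt.executeQuery()) {
                    return rs.next() ? new Shipment(rs.getString(1), rs.getString(2)) : null;
                }
            }
        }
    }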

Environment: JDK 1.3, J2EE, JSP, Servlets, HTML, XML, UML, Rational Rose, AWT, WebLogic 5.1, Oracle 8i, SQL, PL/SQL.
