Hadoop Admin/Developer Resume
Columbus, OH
PROFESSIONAL SUMMARY:
- Over 7 years of experience across Hadoop, Java, and ETL, including extensive experience with Big Data technologies and in the development of standalone and web applications in multi-tiered environments using Java, Hadoop, Hive, HBase, and Pig
- Good work experience on large-scale systems development projects, especially enterprise distributed systems.
- Strong knowledge of JCL
- Very good understanding of Hadoop ecosystem components such as Sqoop, Spark, and YARN.
- Strong working experience in rule-based decision making, information parsing, and complex data processing using Schematron and Drools.
- Experience in Data Analysis, Data Validation, Data Verification, Data Cleansing, Data Completeness, and identifying data mismatches.
- Experience working with MapReduce, Pig scripts, and Hive Query Language (HiveQL).
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts
- Extending Hive and Pig core functionality by writing custom UDFs
- Experience in analyzing data using Hive QL, Pig Latin, and custom MapReduce programs in Java.
- Extensive experience with SQL, PL/SQL, PostgreSQL and database concepts
- Knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra
- Knowledge of job workflow scheduling and monitoring tools like Oozie and Zookeeper
- Experience in Amazon AWS cloud services (EC2, EBS, S3, SQS)
- Utilized Storm for processing large volumes of data.
- Exposure to administrative tasks such as installing Hadoop and ecosystem components like Hive and Pig
- Monitoring and support through Nagios and Ganglia.
- Experience in cluster automation using Shell Scripting
- Handled several techno-functional responsibilities including estimates, identifying functional and technical gaps, requirements gathering, designing solutions, development, developing documentation, and production support
- An individual with excellent interpersonal and communication skills, strong business acumen, creative problem solving skills, technical competency, team-player spirit, and leadership skills
- Hands-on experience in Scala, Kafka, and Storm.
- Experience in implementing Spark using Scala and Spark SQL for faster analysis and processing of data (an illustrative sketch follows this list).
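As a minimal, hypothetical sketch of the Spark SQL work summarized above (shown in Java rather than the Scala used on the projects; the path, view name, and columns are placeholder assumptions):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSqlSketch {
    public static void main(String[] args) {
        // Spark session; the master is normally supplied by spark-submit.
        SparkSession spark = SparkSession.builder()
                .appName("spark-sql-sketch")
                .getOrCreate();

        // Load a JSON dataset from HDFS (placeholder path).
        Dataset<Row> events = spark.read().json("hdfs:///data/events");

        // Register a temporary view and aggregate with Spark SQL.
        events.createOrReplaceTempView("events");
        Dataset<Row> counts = spark.sql(
                "SELECT user_id, COUNT(*) AS event_count FROM events GROUP BY user_id");

        counts.show();
        spark.stop();
    }
}
```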
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Pig, Hive, Impala, HBase, Cassandra, Sqoop, Oozie, Zookeeper, Flume
Java & J2EE Technologies: Core Java
IDE Tools: Eclipse, NetBeans
Programming Languages: COBOL, Java, KSH, and markup languages
Databases: Oracle, MySQL, DB2, IMS, PostgreSQL
Operating Systems: Windows 95/98/2000/XP/Vista/7, Unix
Reporting Tools: Tableau
Other Tools: PuTTY, WinSCP, EDI (Gentran), Streamweaver, Compuset
WORK EXPERIENCE:
Confidential, Columbus, OH
Hadoop Admin/Developer
Responsibilities:
- Worked on Hadoop cluster scaling from 4 nodes in development environment to 8 nodes in pre-production stage and up to 24 nodes in production.
- Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used HiveQL queries to search for particular strings in Hive tables stored in HDFS.
- Possess good Linux and Hadoop system administration skills, networking, shell scripting, and familiarity with open-source configuration management and deployment tools.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality (an illustrative UDF sketch follows this list).
- Created HBase tables to store data in various formats coming from different sources.
- Used Maven to build and deploy code to the YARN cluster.
- Good knowledge of building Apache Spark applications using Scala.
- Developed several business services as Java RESTful web services using the Spring MVC framework.
- Managed and scheduled jobs to remove duplicate log data files in HDFS using Oozie.
- Used Apache Oozie for scheduling and managing the Hadoop Jobs. Knowledge on HCatalog for Hadoop based storage management.
- Expert in designing and creating data ingest pipelines using technologies such as Spring Integration and Apache Storm with Kafka.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Implemented test scripts to support test driven development and continuous integration.
- Moved data from HDFS to a MySQL database and vice versa using Sqoop.
- Responsible for managing data coming from different sources.
- Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to determine which better suits the current requirements.
- Used File System check (FSCK) to check the health of files in HDFS.
- Developed the UNIX shell scripts for creating the reports from Hive data.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Used Java and J2EE application development skills with Object-Oriented Analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
- Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
- Created a complete processing engine based on Cloudera's distribution.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
- Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
- Used Spark Streaming to collect data from Kafka in near real time and perform the necessary transformations and aggregations on the fly to build the common learner data model, persisting the data in a NoSQL store (HBase).
- Configured Kerberos for the clusters
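A minimal sketch of the kind of custom Hive UDF referenced above (illustrative only; the class name and behavior are assumptions rather than the actual production UDFs):

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Illustrative Hive UDF: trims whitespace and lower-cases a string column. */
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;            // Hive passes NULLs through unchanged.
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Such a UDF is packaged into a JAR, registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, and then called like any built-in function inside HiveQL queries.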
Environment: Java, UNIX, HDFS, Pig, Python, Hive, MapReduce, Sqoop, Spring, NoSQL DBs, Cassandra, HBase, AWS, Linux, Chef, Flume, Hortonworks, Maven, Oozie, Spark, YARN, Shell scripting, JCL
Confidential, Piscataway NJ
Hadoop Developer
Responsibilities:
- Involved in complete Implementation lifecycle, writing custom MapReduce, Pig and Hive programs.
- Exported the analyzed data to the RDBMS using Sqoop for visualization and to generate reports for the BI team.
- Used HiveQL queries to search for particular strings in Hive tables stored in HDFS.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins (a map-side join sketch follows this list).
- Installed and configured Hive and wrote Hive UDFs.
- Managed HBase tables to store data in various formats.
- Implemented technical solutions for POCs, writing code using technologies such as YARN, Python, and Microsoft SQL Server.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Managed Amazon Web Services (AWS) EC2 with Puppet.
- Utilized Python regular expression operations (NLP) to analyze customer reviews.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile, and network devices, and pushed it to HDFS.
- Monitoring Hadoop cluster using tools like Nagios, Ganglia, Ambari and Cloudera Manager
- Experienced in developing Hadoop integrations for data ingestion, data mapping, and data processing capabilities.
- Managed data coming from different portfolios.
- Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to determine which better suits the current requirements.
- Expertise in AWS data migration between different database platforms, such as SQL Server to Amazon Aurora, using the RDS tool.
- Created and consumed RESTful web services with Node.js and MongoDB.
- Supported tuple processing and writing data with Storm by providing Storm-Kafka connectors.
- Utilized Java and MySQL day to day to debug and fix issues with client processes.
- Monitored system health and logs and responded accordingly to any warning or failure conditions.
- Developed scripts to automate routine DBA tasks using Linux/UNIX shell scripts (e.g., database refreshes, backups, monitoring).
- Streamed data in real time using Spark with Kafka and stored the stream data to HDFS using Scala.
- Analyzed large datasets to determine the optimal way to aggregate and report on them.
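A minimal sketch of the map-side join optimization mentioned above, using the Hadoop distributed cache (hypothetical paths, field layout, and class names; not the actual project code):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/** Illustrative map-side join: a small lookup table is shipped via the distributed cache. */
public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> lookup = new HashMap<String, String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // The driver registers the small dataset with, e.g.:
        // job.addCacheFile(new URI("hdfs:///data/lookup/small_table.csv"));  // placeholder path
        URI[] cacheFiles = context.getCacheFiles();
        FileSystem fs = FileSystem.get(context.getConfiguration());
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path(cacheFiles[0].getPath()))));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2);   // key,value per line
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
            }
        } finally {
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Join each large-table record against the in-memory lookup; no reduce phase is needed.
        String[] fields = value.toString().split(",", 2);
        String joined = lookup.get(fields[0]);
        if (fields.length == 2 && joined != null) {
            context.write(new Text(fields[0]), new Text(fields[1] + "," + joined));
        }
    }
}
```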
Environment: Hadoop (Hortonworks), Java, Python, UNIX, HDFS, Chef, Pig, Hive, MapReduce, Sqoop, NoSQL DBs, Cassandra, HBase, Maven, Linux, Flume, Oozie
Confidential, Schaumburg, IL
Hadoop Admin andDeveloper
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Proactively monitored systems and services; handled architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Developed Spark jobs using Scala in the test environment for faster data processing and used Spark SQL for querying.
- Worked on NoSQL databases including MongoDB, Cassandra, and HBase.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile, and network devices, and pushed it to HDFS.
- Used Storm for real-time processing of data.
- Developed Puppet scripts to install Hive, Sqoop, etc. on the nodes
- Performed data backup and synchronization using Amazon Web Services.
- Worked on Amazon Web Services as the primary cloud platform
- Supported MapReduce programs running on the cluster.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Designed and implemented DR and OR procedures
- Used Spring Framework with Hibernate to map to Oracle database
- Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions
- Involved in configuring Hive and writing Hive UDFs
- Worked on DevOps tools like Chef, Ansible, and Jenkins to configure and maintain the production environment.
- Worked on installing and configuring EC2 instances on Amazon Web Services (AWS) for establishing clusters on cloud
- Hands-on experience with Sun ONE Application Server, WebLogic Application Server, WebSphere Application Server, WebSphere Portal Server, and J2EE application deployment technology.
- Wrote automation scripts to monitor HDFS and HBase through cron jobs.
- Used MRUnit for debugging MapReduce jobs that use sequence files containing key-value pairs (an MRUnit test sketch follows this list).
- Developed a high-performance cache, making the site stable and improving its performance.
- Proficient with SQL and have a good understanding of Informatica.
- Provided administrative support for parallel computation research on a 24-node Fedora Linux cluster.
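A minimal sketch of the MRUnit testing mentioned above (the mapper under test, WordCountMapper, is a hypothetical placeholder, as are the inputs and expected outputs):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

/** Illustrative MRUnit test for a hypothetical mapper that emits (token, 1) pairs. */
public class WordCountMapperTest {

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        // WordCountMapper is a placeholder name for the mapper under test.
        mapDriver = MapDriver.newMapDriver(new WordCountMapper());
    }

    @Test
    public void emitsOneCountPerToken() throws IOException {
        mapDriver.withInput(new LongWritable(0), new Text("hadoop hive"))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .withOutput(new Text("hive"), new IntWritable(1))
                 .runTest();
    }
}
```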
Environment: Hadoop, MapReduce, AWS EC2, HDFS, Chef, Jenkins, Hive, Spark, Kafka, CouchDB, Flume, Cassandra, Hibernate, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, MRUnit, Informatica, JUnit, Tomcat 6, JDBC, JNDI, Maven, Eclipse.
Confidential, New York, NY
Hadoop Admin
Responsibilities:
- Good understanding and related experience with Hadoop stack - internals, Hive, Pig and Map/Reduce.
- Wrote MapReduce jobs to discover trends in data usage by users (a minimal sketch follows this list).
- Involved in defining job flows.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Extensive experience in testing, debugging, and deploying MapReduce jobs on Hadoop platforms.
- Involved in loading data from the UNIX file system to HDFS.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Installed and configured Hive and wrote HiveQL scripts.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Extensive usage of Struts, HTML, CSS, JSP, jQuery, AJAX, and JavaScript for interactive pages.
- Developed workflows to process Flume log data using Apache Spark in Scala.
- Used Ganglia to monitor the cluster around the clock.
- Assist the team in their development & deployment activities.
- Instrumental in preparing TDD and developing Java web services for WU applications for many of the money transfer functionalities.
- Used web services concepts like SOAP, WSDL, JAXB, and JAXP to interact with other projects within the Supreme Court for sharing information.
- Involved in developing Database access components using Spring DAO integrated with Hibernate for accessing the data.
- Involved in writing HQL queries, Criteria queries and SQL queries for the Data access layer.
- Involved in managing deployments using XML scripts.
- Testing: unit testing through JUnit and integration testing in the staging environment.
- Followed Agile SCRUM principles in developing the project.
- Involved in development of SQL Server Stored Procedures and SSIS DTSX Packages to automate regular mundane tasks as per business needs.
- Coordinated with offshore/onshore teams, collaborating and arranging weekly meetings to discuss and track development progress.
- Involved in coordinating for Unit Testing, Quality Assurance, User Acceptance Testing and Bug Fixing.
- Coordinated with the team on peer reviews and collaborative system-level testing. Worked hands-on with the ETL process, handling imports of data from various sources and performing transformations.
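A minimal sketch of the usage-trend MapReduce work referenced above (the class names and the assumption that the user id is the first tab-separated field are hypothetical):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** Illustrative mapper/reducer pair that counts records per user from delimited log lines. */
public class UsageByUser {

    public static class UsageMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assume the user id is the first tab-separated field of each log line.
            String[] fields = value.toString().split("\t");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                context.write(new Text(fields[0]), ONE);
            }
        }
    }

    public static class UsageReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text user, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable count : counts) {
                total += count.get();
            }
            context.write(user, new IntWritable(total));
        }
    }
}
```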
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java (JDK 1.6), HTML, JavaScript, XML, XSLT, jQuery, AJAX, Web Services, JNDI, SQL Server, Struts 2.0, Hibernate.
Confidential
Java Developer
Responsibilities:
- Involved in the elaboration, construction and transition phases of the Rational Unified Process.
- Designed and developed necessary UML Diagrams like Use Case, Class, Sequence, State and Activity diagrams using IBM Rational Rose.
- Used IBM Rational Application Developer (RAD) for development.
- Extensively applied various design patterns such as MVC-2, Front Controller, Factory, Singleton, Business Delegate, Session Façade, Service Locator, DAO etc. throughout the application for a clear and manageable distribution of roles.
- Implemented the project as a multi-tier application using Jakarta Struts Framework along with JSP for the presentation tier.
- Used the Struts Validation Framework for validation and Struts Tiles Framework for reusable presentation components at the presentation tier.
- Developed various Action classes that route requests to the appropriate handlers (an illustrative Action class sketch follows this list).
- Developed Session Beans to process user requests and Entity Beans to load and store information from database.
- Used JMS (MQSeries) for reliable and asynchronous messaging between the different components.
- Worked extensively with Node.js, Angular.js, etc.
- Wrote stored procedures and complex queries for IBM DB2.
- Designed and used JUnit test cases during the development phase.
- Extensively used log4j for logging throughout the application.
- Used CVS for efficiently managing the source code versions with the development team.
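A minimal sketch of the kind of Struts Action class described above (the class name, request parameter, and forward names are hypothetical; real mappings would live in struts-config.xml):

```java
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

/** Illustrative Struts Action that routes a request to a success or failure view. */
public class LookupCustomerAction extends Action {

    @Override
    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request, HttpServletResponse response)
            throws Exception {
        // Validate the incoming parameter and pick the forward declared in struts-config.xml.
        String customerId = request.getParameter("customerId");
        if (customerId == null || customerId.trim().isEmpty()) {
            return mapping.findForward("failure");
        }
        request.setAttribute("customerId", customerId.trim());
        return mapping.findForward("success");
    }
}
```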
Environment: JDK, J2EE, Web Services (SOAP, WSDL, JAX-WS), Hibernate, Spring, Servlets, JSP, JavaBeans, NetBeans, Oracle SQL Developer, JUnit, Clover, CVS, Log4j, PL/SQL, Oracle, WebSphere Application Server, Tomcat Web Server, Node.js
Confidential
JAVA Developer
Responsibilities:
- Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC)
- Reviewed the functional, design, source code and test specifications
- Involved in complete front-end development using JavaScript and CSS.
- Created real-time web applications using Node.js.
- Authored functional, design, and test specifications.
- Implemented Backend, Configuration DAO, XML generation modules of DIS
- Analyzed, designed and developed the component
- Used JDBC for database access (a minimal JDBC sketch follows this list).
- Used Data Transfer Object (DTO) design patterns
- Unit testing and rigorous integration testing of the whole application
- Wrote and executed test scripts using JUnit.
- Actively involved in system testing
- Developed XML parsing tool for regression testing
- Prepared the Installation, Customer guide and Configuration document which were delivered to the customer along with the product.
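A minimal sketch of the JDBC access pattern noted above (connection URL, credentials, table, and columns are placeholder assumptions; written in the pre-Java-7 try/finally style matching the JDK 1.5 environment):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Illustrative JDBC lookup using a parameterized query. */
public class OrderDao {

    public String findStatus(long orderId) throws SQLException {
        // Placeholder connection details; real code would use a pooled DataSource.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:XE", "app_user", "app_password");
        try {
            PreparedStatement stmt = conn.prepareStatement(
                    "SELECT status FROM orders WHERE order_id = ?");
            try {
                stmt.setLong(1, orderId);
                ResultSet rs = stmt.executeQuery();
                return rs.next() ? rs.getString("status") : null;
            } finally {
                stmt.close();
            }
        } finally {
            conn.close();
        }
    }
}
```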
Environment: Java, JavaScript, HTML, CSS, JDK 1.5.1, JDBC, JUnit, Node.js, Oracle 10g, XML, XSL, and UML