Hadoop Consultant Resume
Plano, TX
SUMMARY
- In-depth knowledge of data warehousing concepts with emphasis on ETL and full life cycle development, including requirement analysis, design, development, testing, and implementation.
- Strong hands-on experience with the Hadoop framework and its ecosystem, including HDFS architecture, MapReduce programming, Hive, Pig, Sqoop, HBase, and Oozie.
- Worked on disaster recovery and management for Hadoop clusters.
- Involved in building a multi-tenant cluster.
- Experience migrating mainframe data and batch workloads to Hadoop.
- Experience developing Spark applications in Scala.
- Worked extensively with AWS services, with a broad and in-depth understanding of each of them.
- Highly experienced in writing Hive and Impala queries.
- Highly experienced in working with business users to gather requirements and prepare mapping documents and high-level/low-level design documents.
- Experienced in migrating DataStage components and database components to higher environments.
- Strong experience writing Teradata SQL and tuning SQL performance.
- Extensive experience and fluency with UNIX commands (shell, AWK, sed, wildcards).
- Experienced in writing UNIX shell scripts.
TECHNICAL SKILLS
ETL Tools: IBM InfoSphere DataStage 11.5/8.7/8.5, IBM WebSphere DataStage 8.0.1 (Designer, Director, Administrator), Ascential DataStage 7.5.2 (Designer, Director, Administrator, Manager).
Database: Netezza, Oracle 11g/10g/9i/8i, IBM DB2/UDB, Teradata 13, SQL Server 2003/2005/2008 and IBM Informix.
Data Warehousing: Star and Snowflake schema modeling, facts and dimensions, physical and logical data modeling, Erwin, Business Objects XI R2, Cognos, ReportNet, Metadata Workbench
Operating Systems: Windows 7/NT/XP, UNIX, Linux, Solaris, MS-DOS
Languages/Scripting: C, C++, Java, D2K, Visual Basic, PL/SQL, UNIX Shell scripts, DSBASIC, Scala
Testing/Defect Tracking: HP Quality Center, Test Director, Bugzilla
Hadoop Technologies: HBase, Hive, Impala, Sqoop, Flume, HDFS, Oozie, ZooKeeper, Spark, Pig, Kafka, Sentry, AWS
IDEs: Eclipse, NetBeans, IntelliJ
PROFESSIONAL EXPERIENCE
Confidential, Plano, TX
Hadoop Consultant
Environment: Hadoop 2.6.2, Hortonworks, Hive, Spark SQL, Sqoop, Oozie, shell scripting, Netezza, DataStage 11.5, Control-M, AWS, RTC, Spark, Python, Git, Bamboo.
Responsibilities:
- Involved in gathering business user requirements.
- Created high level design documents.
- Analyzed and understood source data from different vendors and source systems.
- Involved in creating mapping documents.
- Involved in improving the architecture flow for existing warehouse design.
- Created jobs to bring data from different source systems using a common framework, transform it according to business requirements, and load it into target tables.
- Performed unit testing on target tables based on business requirements.
- Involved in creating common scripts and jobs to improve the common architecture.
- Deployed, secured, and troubleshot applications using AWS services.
- Installed the AWS CLI and controlled various AWS services through shell/Bash scripting (an illustrative Python equivalent appears after this list).
- Created Control-M jobs for scheduling DataStage jobs.
- Created Sqoop jobs for importing data from Netezza to HDFS.
- Involved in creating the data model on the Hadoop file system.
- Created Spark jobs in Python to transform data using datasets (see the PySpark sketch after this list).
- Created unit test cases and executed Hive queries.
- Configured Oozie workflows to invoke Spark and Sqoop jobs.
- Involved in production deployment and support.
- Involved in fixing bugs and defects.
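
A minimal PySpark sketch of the kind of transform-and-load job described above; the paths, table name, and column names are illustrative assumptions, not actual project artifacts.

```python
# Illustrative PySpark job: read a raw extract from HDFS, apply a simple
# business transformation, load the result into a Hive table, and run a
# basic Hive query as a unit-style check. Paths/tables/columns are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("daily-transform")        # hypothetical job name
         .enableHiveSupport()
         .getOrCreate())

# Raw data landed on HDFS (placeholder path).
raw = spark.read.parquet("hdfs:///data/raw/orders")

# Example transformations: standardize the order date and derive an amount band.
transformed = (raw
               .withColumn("order_date", F.to_date("order_ts"))
               .withColumn("amount_band",
                           F.when(F.col("amount") >= 1000, "HIGH").otherwise("LOW")))

# Load into the target Hive table (placeholder name).
transformed.write.mode("overwrite").saveAsTable("dw.orders_fact")

# Simple validation executed as a Hive query.
spark.sql("SELECT amount_band, COUNT(*) AS cnt FROM dw.orders_fact GROUP BY amount_band").show()
```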
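
The AWS automation above was done with the AWS CLI from shell/Bash scripts; to keep the examples in one language, the sketch below shows the analogous operation in Python with boto3 rather than the CLI. The bucket, key, and file path are hypothetical placeholders.

```python
# Illustrative only: the actual automation used the AWS CLI from shell scripts.
# This is the analogous operation in Python with boto3; bucket/key/path are
# hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

def upload_artifact(local_path: str, bucket: str, key: str) -> None:
    """Upload a deployment artifact to S3 and confirm it landed."""
    s3.upload_file(local_path, bucket, key)
    head = s3.head_object(Bucket=bucket, Key=key)
    print(f"uploaded s3://{bucket}/{key} ({head['ContentLength']} bytes)")

if __name__ == "__main__":
    upload_artifact("build/app.jar", "example-deploy-bucket", "releases/app.jar")
```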
Confidential, Charlotte, NC
Big Data Developer
Environment: Hadoop 2.0, Cloudera CDH 5.7, Pig, Hive, Impala, Sqoop, Oozie, HBase, ZooKeeper, shell scripting, Scala, Spark, Java, Oracle, MySQL, Cassandra, Sentry, Falcon, AIX 7.0, Git version control, JIRA, Maven, Bamboo, Jenkins.
Responsibilities:
- Implemented a data interface to fetch customer information through a REST API, pre-processed the data using MapReduce, and stored it in HDFS (see the ingestion sketch after this list).
- Involved in business requirement analysis.
- Involved in creating micro-level and macro-level design documents.
- Imported data from RDBMS to HDFS using Sqoop import/export options.
- Configured Oozie workflows to automate data flow, preprocessing, and cleaning tasks using Hadoop actions; used Oozie for shell actions, Java actions, and ETL.
- Migrated ETL logic from DataStage to Spark using Scala.
- Implemented new features in a JSF application using ICEfaces tables on the front end and JDBC for back-end data access.
- Performance-tuned Spark jobs by choosing between map-side and reduce-side joins based on the data.
- Implemented common Spark jobs for each customer to move data from traditional files into the IBM JDM data warehouse via HDFS.
- Implemented HBase features such as compression and applied them when designing and building Spark jobs.
- Experienced in optimizing the shuffle and sort phase of MapReduce jobs.
- Implemented device-based business logic using Hive UDFs to perform ad-hoc queries on structured data.
- Involved in migrating build jobs from Jenkins to Bamboo.
- Worked extensively with Hive DDL and Hive Query Language (HQL).
- Rebalanced the cluster after adding/removing nodes or after major data cleanups.
- Cluster Management using Cloudera Manager.
- Experienced in resolving NameNode checkpoint failures.
- Configured Sqoop and exported/imported data into and out of HDFS.
- Implemented dashboards that internally use Hive queries to perform analytics on structured, Avro, and JSON data to meet business requirements (see the analytics sketch after this list).
- Involved in migrating the jobs from development to higher environments.
- Involved in fixing bugs and providing production support.
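
A hedged sketch of the REST-based customer ingestion described above. The original pre-processing used MapReduce; to keep the examples in one language, this sketch does the equivalent light cleanup in Python/PySpark before landing the data on HDFS. The endpoint URL, field names, and paths are assumptions.

```python
# Illustrative sketch: fetch customer records over REST, apply light cleanup
# (stand-in for the MapReduce pre-processing step), and store them in HDFS.
# Endpoint URL, field names, and HDFS path are hypothetical placeholders.
import json
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("customer-ingest").getOrCreate()

# Pull customer records from the REST endpoint.
resp = requests.get("https://api.example.com/customers", timeout=30)
resp.raise_for_status()
records = resp.json()          # assumed to be a list of customer dicts

# Light pre-processing: drop records without an id, normalize the name field.
cleaned = [
    {**r, "name": r.get("name", "").strip().upper()}
    for r in records
    if r.get("customer_id") is not None
]

# Land the cleaned records in HDFS as JSON lines.
rdd = spark.sparkContext.parallelize([json.dumps(r) for r in cleaned])
spark.read.json(rdd).write.mode("overwrite").json("hdfs:///data/customers/cleaned")
```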
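
A hedged Spark SQL sketch of the Hive-style analytics behind the dashboards mentioned above, reading Avro and JSON sources. The file locations, view names, columns, and query are illustrative assumptions.

```python
# Illustrative analytics sketch: register Avro and JSON sources as views and run
# an HQL-style aggregation of the kind the dashboards relied on. Paths, views,
# and columns are placeholders; reading Avro assumes the spark-avro package is
# available on the cluster.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dashboard-analytics")
         .enableHiveSupport()
         .getOrCreate())

spark.read.format("avro").load("hdfs:///data/events_avro") \
     .createOrReplaceTempView("events")
spark.read.json("hdfs:///data/customers_json") \
     .createOrReplaceTempView("customers")

result = spark.sql("""
    SELECT c.region, COUNT(*) AS event_cnt
    FROM events e
    JOIN customers c ON e.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY event_cnt DESC
""")
result.show()
```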
Confidential
Teradata/ETL Developer
Environment: IBM InfoSphere DataStage 8.7 (Designer, Manager, Director, and Administrator), Teradata V2R6/V2R12, AutoSys scheduling tool, Oracle 10g/9i, Toad, shell scripts, AIX 7.0.
Responsibilities:
- Analyzed the functional side of the project by interacting with functional experts to design and write technical specifications.
- Involved in creating detailed technical design documents based on the technical specifications.
- Worked with ETL architects to define the ETL process.
- Created ETL designs and developed simple, medium, and complex ETL jobs using DataStage and Teradata.
- Created DataStage jobs (ETL processes) to continually populate the data warehouse from different source systems such as the ODS and flat files.
- Created common stored procedures and user-defined functions in Teradata.
- Extracted data from sources such as Oracle, flat files, and complex flat files.
- Involved in writing UNIX scripts for data cleansing/data masking.
- Created common audit and error-logging processes and a job monitoring and reporting mechanism using DataStage.
- Created, optimized, reviewed, and executed Teradata SQL test queries to validate transformation rules used in source-to-target mappings/source views and to verify data in target tables (a validation sketch follows this list).
- Responsible for peer-reviewing the ETL jobs developed by the team.
- Developed unit test scripts, test conditions, expected results, and sample data for ETL jobs.
- Created user manuals, job run guides, and other documentation for the developed components to give clients a clear understanding.
- Involved in migration of ETL code to different environments (SIT, UAT, PROD).
- Worked closely with the testing team to identify and fix defects.
- Provided production support for the developed ETL components; created tickets and resolved issues based on priority.
- Worked on change requests per client and project technical specification needs.
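
A hedged sketch of the kind of source-to-target validation query described above, wrapped in Python with the teradatasql DB-API driver so the examples stay in one language (the actual work was hand-written Teradata SQL). Host, credentials, table names, and the checked rule are hypothetical.

```python
# Illustrative only: a source-to-target reconciliation check expressed as a
# Teradata SQL query run from Python via the teradatasql driver. All names,
# credentials, and the validated rule are hypothetical placeholders.
import teradatasql

VALIDATION_SQL = """
    SELECT COUNT(*) AS mismatch_cnt
    FROM stage.orders_src s
    LEFT JOIN dw.orders_tgt t
      ON s.order_id = t.order_id
    WHERE t.order_id IS NULL           -- rows missing from the target
       OR s.order_amt <> t.order_amt   -- rows whose amounts were transformed incorrectly
"""

with teradatasql.connect(host="tdhost.example.com", user="etl_qa", password="***") as con:
    with con.cursor() as cur:
        cur.execute(VALIDATION_SQL)
        print("source/target mismatches:", cur.fetchone()[0])
```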
Confidential, Memphis, TN
Research Assistant
Responsibilities:
- Analyzed raw data received from the satellite.
- Studied the use of the range-Doppler algorithm.
- Created a Range Doppler Algorithm (RDA) program in MATLAB that takes input image data.
- Developed the range-Doppler algorithm in MATLAB (a minimal sketch of the range-compression step follows this list).
- Compiled and executed the developed code using Simulink.
- Captured the results as images without noise.
- Achieved block-processing efficiency using frequency-domain operations in both range and azimuth.
- Implemented range cell migration correction between the two one-dimensional (range and azimuth) processing steps.
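
The original implementation was in MATLAB; to keep the examples in one language, below is a minimal NumPy sketch of the range-compression step of the range-Doppler algorithm, i.e. frequency-domain matched filtering of a linear FM (chirp) pulse. All signal parameters are illustrative, not values from the project.

```python
# Minimal NumPy sketch (Python stand-in for the original MATLAB code) of the
# range-compression step of the range-Doppler algorithm: matched filtering of a
# linear FM (chirp) pulse performed in the frequency domain. Parameters are
# illustrative placeholders only.
import numpy as np

fs = 60e6          # sampling rate (Hz)
pulse_len = 10e-6  # chirp duration (s)
chirp_rate = 4e12  # linear FM rate (Hz/s)

t = np.arange(0, pulse_len, 1 / fs)
chirp = np.exp(1j * np.pi * chirp_rate * t ** 2)   # reference chirp replica

# Simulated raw range line: the chirp delayed by 200 samples, plus noise.
raw = np.zeros(4096, dtype=complex)
raw[200:200 + chirp.size] = chirp
raw += 0.1 * (np.random.randn(raw.size) + 1j * np.random.randn(raw.size))

# Frequency-domain matched filter: multiply the raw spectrum by the conjugate
# of the chirp spectrum, then transform back. This is the block-efficient,
# frequency-domain operation applied along range (and analogously along azimuth).
n = raw.size
compressed = np.fft.ifft(np.fft.fft(raw) * np.conj(np.fft.fft(chirp, n)))

# The compressed peak should appear near the original delay of 200 samples.
print("peak at sample:", int(np.argmax(np.abs(compressed))))
```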