ETL Developer Resume
Minneapolis, MN
SUMMARY
- 6+ years of experience in the IT industry spanning development, maintenance and support, and migration projects in Big Data, ETL, Machine Learning, and Natural Language Processing technologies.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience in moving data between HDFS and both RDBMS and unstructured sources using Sqoop and Flume.
- Experience in developing and implementing MapReduce jobs in Python to process and analyze large datasets (a minimal Hadoop Streaming sketch follows this summary).
- Experience in developing Pig Latin and HiveQL scripts for data analysis and ETL, and in extending default functionality with User Defined Functions (UDFs) for data-specific processing.
- Experience with Informatica 8.6.x and above (Source Analyzer, Mapping Designer, Mapplet Designer, Transformation Developer, Warehouse Designer, Repository Manager, and Workflow Manager/Server Manager).
- Extensively worked with different data sources: non-relational sources such as flat files and XML files, and relational sources such as Oracle, SQL Server, and DB2.
- Extensively worked on the Extraction, Transformation, and Loading of data from multiple sources into the Data Warehouse.
- Experience in building models using various supervised and unsupervised learning algorithms.
- Proficiency in machine learning techniques for feature extraction, statistical and probabilistic modeling of data, and classifier techniques for pattern recognition problems.
- Good knowledge of job scheduling and monitoring through Oozie, Autosys, and ZooKeeper.
- Experience in NoSQL databases such as HBase.
- Good experience in Python, UNIX, and shell scripting.
- Good knowledge of tuning the performance of SQL queries and ETL processes.
- Experience with tools such as NLTK, scikit-learn, R, and MATLAB.
- Experienced in working with tools such as TOAD, SQL Server Management Studio, and SQL*Plus for development and customization.
- Experienced in working in the post-development cycle and supporting applications in Production.
- Excellent analytical and problem-solving skills.
- Effective working relationships with client teams to understand support requirements and manage client expectations.
- Excellent interpersonal and communication skills; technically competent and results-oriented, with the ability to work effectively both as a team member and independently.
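
As a minimal illustration of the Python MapReduce work mentioned above, the pair of scripts below sketches a Hadoop Streaming job that counts records per key. The tab-separated input layout and field meanings are assumptions for illustration, not taken from any specific project.

    #!/usr/bin/env python
    # mapper.py - minimal Hadoop Streaming mapper (illustrative; the
    # tab-separated input layout is an assumption)
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed records
        print("%s\t1" % fields[0])  # emit the key with a count of 1

    #!/usr/bin/env python
    # reducer.py - sums counts per key; Streaming delivers keys grouped and sorted
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

Scripts like these are typically submitted with the hadoop-streaming JAR, passing the two files via the -mapper and -reducer options.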
TECHNICAL SKILLS
Technologies: Big Data, Machine Learning, ETL, NLP, Pattern Recognition.
ETL Tools: Informatica Power Center 9.x/8.x, Informatica Power Exchange, Reporting Service, Metadata Manager.
Languages: HiveQL, MATLAB, Oracle PL/SQL, Perl, Python, Pig Latin, R, SQL, UNIX Shell Script.
Databases: Oracle 11g/10g, DB2 8.0/7.0, Confidential SQL Server 2008/12, HBase.
Environment: Windows XP/2008/2003/2000/NT/98/95, UNIX, LINUX.
Other Tools: Autosys, Sqoop, Flume, Oozie, ZooKeeper, IBM Attila, Weka, NLTK, scikit-learn, SRILM, TOAD.
PROFESSIONAL EXPERIENCE
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Built a framework for storing and processing input data from various sources.
- Hands-on experience with Hadoop and its components, including HDFS, MapReduce, Apache Pig, Hive, Sqoop, HBase, and Oozie.
- Extensive experience in setting up Hadoop clusters.
- Good working knowledge of MapReduce and Apache Pig; involved in writing Pig scripts to reduce job execution time.
- Created Hive tables to store the processed results in a tabular format
- Developed Sqoop scripts to move data between the MySQL database and HDFS for processing in Pig (an illustrative wrapper follows this list).
- Wrote script files for processing data and loading it into HDFS; wrote CLI commands for HDFS operations.
- Developed UNIX shell scripts for creating reports from Hive data.
- Analyzed requirements to set up the cluster. Moved all log/text files generated by various products into HDFS.
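
A minimal sketch of the Sqoop scripting mentioned above, written as a Python wrapper around the Sqoop CLI. The MySQL endpoint, credentials path, and table names are hypothetical placeholders, not the project's actual values.

    #!/usr/bin/env python
    # sqoop_import.py - illustrative wrapper around the Sqoop CLI; the MySQL
    # endpoint, user, password file, and table names are hypothetical
    import subprocess

    def sqoop_import(table, target_dir):
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:mysql://dbhost:3306/sales",
            "--username", "etl_user",
            "--password-file", "/user/etl/.sqoop.pwd",  # keeps the password off the command line
            "--table", table,
            "--target-dir", target_dir,
            "--fields-terminated-by", "\t",
            "--num-mappers", "4",
        ]
        subprocess.run(cmd, check=True)  # raise if Sqoop exits non-zero

    if __name__ == "__main__":
        sqoop_import("orders", "/data/raw/orders")

The imported tab-delimited files can then be loaded directly by Pig scripts or exposed to Hive as external tables.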
Environment: Red Hat Linux, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, Python, HBase, MRUnit.
Confidential, New Jersey
Hadoop Developer
Responsibilities:
- Developed framework related components and Hive analytical queries to extract business critical information as per the business requirements.
- Developed the custom record reader to handle specific inputs.
- Developed Hive scripts for processing logs to identify critical user information, such as the number of shares purchased.
- Created Hive queries to determine sell-to-cover information.
- Scheduled the workflows using the Oozie workflow scheduler.
- Used MapReduce to find the frequency of stock options each year.
- Improved Hive query performance by implementing partitioning and clustering (a DDL sketch follows this list).
- Imported and exported data between web servers and HDFS/Hive.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, pushing it to HDFS.
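
As a rough sketch of the partitioning and clustering improvement described above, the script below creates a date-partitioned, bucketed Hive table and loads one day of log data. The table and column names, bucket count, and file format are illustrative assumptions.

    #!/usr/bin/env python
    # hive_ddl.py - illustrative: partitioned, bucketed Hive table for web logs;
    # all table and column names are hypothetical
    import subprocess

    DDL = """
    CREATE TABLE IF NOT EXISTS web_logs (
      user_id STRING,
      url STRING,
      status INT
    )
    PARTITIONED BY (log_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS RCFILE;
    """

    LOAD = """
    SET hive.enforce.bucketing=true;
    INSERT OVERWRITE TABLE web_logs PARTITION (log_date='{d}')
    SELECT user_id, url, status FROM staging_web_logs WHERE log_date='{d}';
    """

    def run_hive(sql):
        subprocess.run(["hive", "-e", sql], check=True)  # assumes the hive CLI is on PATH

    if __name__ == "__main__":
        run_hive(DDL)
        run_hive(LOAD.format(d="2013-06-01"))

Partition pruning on log_date keeps single-day queries from scanning the whole table, while bucketing on user_id helps joins and sampling.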
Environment: Hadoop, CDH4, Hive, Sqoop, Oozie, Python, UNIX Shell Scripting, MapReduce, HBase, Pig, and Flume.
Confidential, Minneapolis, MN
ETL Developer
Responsibilities:
- Prepared technical design/specifications for data Extraction, Transformation and Loading.
- Worked on Informatica utilities: Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, and Transformation Developer.
- Analyzed the sources, transformed and mapped the data, and loaded it into targets using Informatica Power Center Designer.
- Created reusable transformations to load data from operational data source to Data Warehouse and involved in capacity planning and storage of data.
- Used Variables and Parameters in the mappings to pass the values between mappings and sessions.
- Used data miner to process raw data from flat files.
- Created Stored Procedures, Functions, Packages and Triggers using PL/SQL.
- Implemented restart strategy and error handling techniques to recover failed sessions.
- Building Reports according to user Requirement.
- Used UNIX shell scripts to automate pre-session and post-session processes (an illustrative pre-session check follows this list).
- Performed performance tuning to improve data extraction, processing, and load times.
- Wrote complex SQL Queries involving multiple tables with joins.
- Implemented best practices as per the standards while designing technical documents and developing Informatica ETL process.
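
The pre/post-session automation above used UNIX shell scripts; the sketch below shows the same idea in Python for consistency with the other examples. The file paths and names are hypothetical.

    #!/usr/bin/env python
    # pre_session.py - illustrative pre-session check: fail fast if the source
    # flat file is missing, otherwise archive a timestamped copy before the
    # Informatica session runs; paths are hypothetical
    import os
    import shutil
    import sys
    import time

    SRC = "/data/inbound/orders.dat"
    ARCHIVE_DIR = "/data/archive"

    def main():
        if not os.path.isfile(SRC):
            sys.stderr.write("source file missing: %s\n" % SRC)
            sys.exit(1)  # non-zero exit code makes the scheduler flag the run
        stamp = time.strftime("%Y%m%d%H%M%S")
        if not os.path.isdir(ARCHIVE_DIR):
            os.makedirs(ARCHIVE_DIR)
        shutil.copy2(SRC, os.path.join(ARCHIVE_DIR, "orders.%s.dat" % stamp))

    if __name__ == "__main__":
        main()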
Environment: Informatica Power Center 8.x, Oracle 10g, SQL Server 2008, Autosys, Toad 9.0.1, UNIX, SQL Developer, SQL.
Confidential, Camp Hill, PA
Informatica Developer
Responsibilities:
- Analyzed the source data and coordinated with the Data Warehouse team in developing the relational model.
- Designed and developed logical and physical models to store data retrieved from other sources including legacy systems.
- Extensively used Informatica Power Center 8.1.1 to extract data from various sources and load in to staging database.
- Interacted with business representatives for needs analysis and to define business and functional specifications; participated in the design team and in user requirement gathering.
- Performed source data analysis; primary data sources were Oracle and SQL Server 2005.
- Extensively used ETL to load data from multiple sources into the staging area (Oracle 10g) using Informatica Power Center 8.1.1.
- Performed migration of mappings and workflows from Development to Test and to Production Servers.
- Involved in the development of Informatica mappings and mapplets, and tuned them for optimum performance, dependencies, and batch design.
- Worked with pre- and post-session tasks, and extracted data from the transaction system into the staging area; knowledgeable in identifying fact and dimension tables.
- Participated in all facets of the software development cycle, including providing input on requirement specifications, high-level design documents, and user guides.
- Tuned sources, targets, mappings and sessions to improve the performance of data load.
- Involved in unit testing and documentation (a row-count reconciliation sketch follows).
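
A minimal sketch of the kind of unit-test check used in ETL work like this: reconciling row counts between a source table and its warehouse target. It works with any DB-API 2.0 connection (cx_Oracle, pyodbc, and the like); the table names are placeholders.

    # reconcile.py - illustrative ETL unit-test helper: compare source and
    # target row counts after a load; table names are placeholders
    def row_count(conn, table):
        cur = conn.cursor()
        cur.execute("SELECT COUNT(*) FROM %s" % table)  # trusted table name, not user input
        (n,) = cur.fetchone()
        cur.close()
        return n

    def assert_counts_match(src_conn, tgt_conn, src_table, tgt_table):
        src_n = row_count(src_conn, src_table)
        tgt_n = row_count(tgt_conn, tgt_table)
        assert src_n == tgt_n, "count mismatch: %s=%d vs %s=%d" % (
            src_table, src_n, tgt_table, tgt_n)

Checks like this can be extended to column-level checksums when plain counts are not discriminating enough.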
Environment: Informatica 8.1.1, Power Exchange, Oracle 10g, PL/SQL, Toad 9.4, SQL Server 2005, Windows NT, UNIX Shell Scripting.
Confidential
Software Engineer
Responsibilities:
- Developed a speech recognition engine through acoustic and language model training and testing using IBM Voice Tailor, shell, and Perl in a Linux environment.
- Developed a voicemail-to-text engine for the Spanish and French languages.
- Responsible for complete development from architecture to production.
- Coordinating project schedules, architecture, design, build and release schedules to ensure timely delivery to internal and external customers.
- Involved in coding, testing and implementing applications.
- Punctuation Generation: independently created a punctuation generation model for both English and French voicemail messages (a toy classifier sketch follows this list).
- The system was built after conducting various experiments using different feature selection mechanisms and machine learning algorithms.
- Cluster based Language Modeling: Research was conducted by building Language models using various clustering techniques to reduce perplexity in mixed domain data.
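
A toy sketch of the feature-based punctuation model described above: for each word boundary in a voicemail transcript, a classifier predicts whether a comma, a period, or nothing follows. The features and training data here are made up for illustration; the actual project used its own feature selection mechanisms and learning algorithms.

    #!/usr/bin/env python
    # punct.py - toy punctuation-prediction sketch using scikit-learn;
    # features and training data are illustrative only
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def features(words, i):
        # simple lexical and positional features at boundary i
        return {
            "w": words[i],
            "w+1": words[i + 1] if i + 1 < len(words) else "</s>",
            "pos": i / float(len(words)),
        }

    # toy training data: one label per word boundary
    words = "hi john please call me back tomorrow".split()
    labels = ["COMMA", "NONE", "NONE", "NONE", "NONE", "NONE", "PERIOD"]

    X = [features(words, i) for i in range(len(words))]
    model = make_pipeline(DictVectorizer(), LogisticRegression())
    model.fit(X, labels)
    print(model.predict([features(words, 0)]))  # predicts a label for the first boundary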
Environment: Linux, IBM Attila, SRILM, Shell Scripting, Python, R, MATLAB.