ETL + Hadoop Developer Resume
Woodcliff Lake, New Jersey
SUMMARY:
- Over 10 years of experience in IT development on data warehouse ETL tools (Informatica, Talend 6.0/6.3, DataStage) along with Big Data.
- 3.8 years of working experience with Talend (ETL tool), developing and leading end-to-end implementations of Big Data projects; comprehensive experience as a Hadoop Developer across the Hadoop ecosystem (Cloudera and Hortonworks), including Hadoop, MapReduce, Hadoop Distributed File System (HDFS), Hive, Impala, PostgreSQL, YARN, Oozie, Hue, Alation, Spark, and Python.
- 10+ years of experience in Oracle, PL/SQL, SQL tuning, and Unix/Linux.
- 4+ years of experience in Teradata/DB2/SQL Server.
- 3+ years of experience with Hadoop and related systems, including distributed systems and data management.
- Worked on importing and exporting data between databases such as Oracle and Teradata and HDFS/Hive using Sqoop.
- Worked on data integration projects in Redshift/Greenplum/Exadata/Vertica/Oracle/SQL Server/Informatica/Talend/BigQuery environments.
- 3 years of experience writing MapReduce programs in Java.
- Working experience using Java to implement packages, classes, inheritance, polymorphism, and encapsulation.
- Experience designing, reviewing, implementing and optimizing data transformation processes in the Hadoop and Talend/Informatica ecosystems. Able to consolidate, validate and cleanse data from a vast range of sources - from applications and databases to files and Web services.
- Capable of extracting data from an existing database, Web sources or APIs. Experience designing and implementing fast and efficient data acquisition using Big Data processing techniques and tools.
- Involved in creating tables, partitioning and bucketing tables, and creating UDFs in Hive.
- Strong troubleshooting and problem-solving skills with a logical and pragmatic attitude.
- Team player with strong oral and interpersonal skills
- Work with business to gather requirements and define the Data Quality solutions for data profiling, standardization and cleansing etc.
- Define and contribute to development of standards, guidelines, design patterns and common development frameworks & components
- Experience in analyzing data using HiveQL, Spark SQL, PostgreSQL, Pig Latin, and custom MapReduce programs in Java (see the Spark SQL sketch below).
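As a quick illustration of the HiveQL/Spark SQL analysis mentioned above, here is a minimal PySpark sketch; the database, table, and column names (sales_db.orders, order_dt, region, amount) are placeholders invented for the example.

```python
# Minimal sketch: query an existing partitioned Hive table through Spark SQL.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-analysis-sketch")
         .enableHiveSupport()          # lets spark.sql() see the Hive metastore
         .getOrCreate())

# Aggregate with plain HiveQL-style SQL; the WHERE clause prunes date partitions.
daily_totals = spark.sql("""
    SELECT order_dt, region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM   sales_db.orders
    WHERE  order_dt >= '2018-01-01'
    GROUP  BY order_dt, region
""")

daily_totals.show(20, truncate=False)
```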
TECHNICAL SKILLS:
Hadoop ecosystem: MapReduce, Sqoop, Hive, Impala, Oozie, Hue, Pig, HBase, HDFS, ZooKeeper, YARN, Spark
ETL: Talend Big Data Studio 6.0/6.3/6.5, Informatica PowerCenter 6.2/7.1/8.1/8.6/9.1/9.5/9.6.1, IDQ, IDE, DataStage 7.5/8.1
Databases: Teradata 13.0/14.10, Exadata, Oracle 9i/10g/11g (SQL, PL/SQL basics), SQL Server 2005, DB2, Greenplum, Vertica, Redshift
Tools: Toad, Microsoft Visio, WinSCP, Appworx, Control-M, Remedy, AutoSys, Jenkins, GitHub, Jira, Talend Administration Center
Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, Unix shell, Python scripting
Operating Systems: Windows 98/2000/XP/7, UNIX, Linux
Domain Knowledge: Finance, Banking, Telecom, Healthcare, Insurance, Manufacturing, PlayStation
Methodologies: Agile, Waterfall.
PROFESSIONAL EXPERIENCE:
Confidential
ETL+Hadoop Developer
Responsibilities:
- Attend daily meetings with the customer to identify the exact requirements and provide technical solutions that meet them.
- Perform dependency analysis on code to establish data lineage.
- Peer review code developed within the team to make sure every requirement is captured correctly and the code follows client standards; apply pragmatic performance optimizations so that jobs run well and complete within the given time frame.
- Pull data from Oracle and land it on HDFS as Avro files, then convert the Avro files to Parquet; to resolve performance issues, data is loaded into Hive/Impala as Parquet (see the sketch below).
- Pull data from a web service (tHttpsProxy), cleanse it, and store it on HDFS; then apply business logic through Talend (ETL) and push the results to HDFS so the data can be accessed from Hive and Impala.
- Migrate data from Oracle to the data lake using Sqoop, Spark, and Talend (ETL tool).
- Use batch jobs to create Spark jobs that load data onto HDFS.
- Perform unit and integration testing for each component to make sure the code works correctly and does not impact other systems.
- Use Job Conductor to deploy the job (.zip) files and to schedule and monitor jobs.
- Migrate code from the development to the production environment using Nexus; schedule jobs to run in production through Control-M.
- Implement partitioning, dynamic partitions, and bucketing in Hive.
- Implement Hive UDFs for evaluating, filtering, loading, and storing data.
- Design managed and external tables in Hive to improve performance.
- Use auto map join and avoid skew joins, optimize the LIMIT operator, enable parallel execution, enable MapReduce strict mode, and use a single reducer for multi GROUP BY functions.
- Load data from different sources (databases and files) into Hive using Talend (standard, MapReduce, and Spark jobs); monitor system health and logs and respond to any warning or failure conditions.
Environment: Hadoop (Cloudera 5.10), HDFS, Hive, Impala, Sqoop, Spark, Oracle, UNIX, Talend Big Data Studio 6.3/6.5, Control-M, Nexus
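A hedged PySpark sketch of the Avro-to-Parquet step described above: an Oracle extract staged on HDFS as Avro is rewritten as Parquet and registered so Hive/Impala can query it. The paths, database, and table names are invented for illustration, and reading Avro assumes the spark-avro package is on the classpath (the format name differs slightly across Spark versions).

```python
# Sketch only: convert a staged Avro extract to Parquet and register it for Hive/Impala.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("avro-to-parquet-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Avro files produced by the Oracle extract (hypothetical path).
raw = spark.read.format("avro").load("hdfs:///data/raw/orders_avro")

# Rewrite as Parquet at an external location and register the table in the metastore,
# so Hive and Impala can query the columnar copy.
spark.sql("CREATE DATABASE IF NOT EXISTS curated")
(raw.write
    .mode("overwrite")
    .option("path", "hdfs:///data/curated/orders_parquet")
    .format("parquet")
    .saveAsTable("curated.orders"))

# Note: on the Impala side, a REFRESH / INVALIDATE METADATA is typically needed
# before the newly written files become visible.
```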
Confidential, Woodcliff Lake, New Jersey
ETL+Hadoop Developer
Responsibilities:
- Understand the current system (Warranty/Sales); analyze high-level system specifications, business requirements, and/or use cases.
- Attend daily meetings with the customer to identify the exact requirements and provide technical solutions that meet them.
- Migrate data from Oracle and DB2 to the data lake using Sqoop, Spark, and Talend (ETL tool).
- Create Hive queries to help market analysts analyze data against reference tables; use different Talend components (tOracleInput, tOracleOutput, tHiveInput, tHiveOutput, tHiveRow, tDB2Input, tDB2Output, tUniqRow, tAggregateRow, tRunJob, tPreJob, tPostJob, tMap, tJavaRow, tJavaFlex, tFilterRow, tXMLMap, tFileInputXML, tFileOutputXML, tExtractXMLField, etc.) to develop standard jobs.
- Use batch jobs to create Spark jobs that load data onto HDFS.
- Use Job Conductor to deploy the job (.zip) files and to schedule and monitor jobs.
- Implement partitioning, dynamic partitions, and bucketing in Hive (see the sketch below).
- Implement Hive UDFs for evaluating, filtering, loading, and storing data.
- Design managed and external tables in Hive to improve performance.
- Use auto map join and avoid skew joins, optimize the LIMIT operator, enable parallel execution, enable MapReduce strict mode, and use a single reducer for multi GROUP BY functions.
- Load data from different sources (databases and files) into Hive using Talend (standard, MapReduce, and Spark jobs); monitor system health and logs and respond to any warning or failure conditions.
- Use the Alation API interface to query data and manage tables.
Environment: Hadoop (Hortonworks), HDFS, Hive, Impala, Sqoop, Spark, Oracle, DB2, UNIX, Talend Big Data Studio 6.3
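An illustrative sketch of the Hive partitioning and dynamic-partition load pattern referenced above (bucketing is left out to keep the example short); the warranty.claims table, its columns, and the warranty.claims_stg staging table are hypothetical.

```python
# Sketch only: partitioned Hive table plus a dynamic-partition load from a staging table.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-dynamic-partition-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS warranty")

# Target table partitioned by claim date (hypothetical schema).
spark.sql("""
    CREATE TABLE IF NOT EXISTS warranty.claims (
        claim_id  BIGINT,
        vin       STRING,
        claim_amt DOUBLE
    )
    PARTITIONED BY (claim_dt STRING)
    STORED AS PARQUET
""")

# Allow partition values to come from the data instead of being hard-coded.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Each row lands in the partition matching its claim_dt value; the staging table
# warranty.claims_stg is assumed to exist for this example.
spark.sql("""
    INSERT OVERWRITE TABLE warranty.claims PARTITION (claim_dt)
    SELECT claim_id, vin, claim_amt, claim_dt
    FROM   warranty.claims_stg
""")
```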
Confidential, San Diego, CA
ETL+Hadoop Developer
Responsibilities:
- Understand the current system (DPE/GFM); analyze high-level system specifications, business requirements, and/or use cases.
- Attend daily meetings with the customer to identify the exact requirements and provide technical solutions that meet them.
- Migrating Exadata/Informatica projects to Talend.
- Experience working in the AWS cloud.
- Handle big data in the cloud using tS3Get, tS3Put, tS3Delete, tS3Copy, etc. (see the boto3 sketch below).
- Use batch jobs to create Spark jobs that load data onto HDFS.
- Use Job Conductor to deploy the job (.zip) files and to schedule and monitor jobs.
- Implement partitioning, dynamic partitions, and bucketing in Hive.
- Migrate data from relational databases (Oracle Exadata) or external data sources to HDFS.
- Implement Hive UDFs for evaluating, filtering, loading, and storing data.
- Design managed and external tables in Hive to improve performance.
- Use auto map join and avoid skew joins, optimize the LIMIT operator, enable parallel execution, enable MapReduce strict mode, and use a single reducer for multi GROUP BY functions.
- Load data from different sources (databases and files) into Hive using Talend (standard, MapReduce, and Spark jobs); monitor system health and logs and respond to any warning or failure conditions.
- Use the API interface (Hue) to query data and manage tables.
Environment: Hadoop (Cloudera), HDFS, Redshift, Hive, Impala, Sqoop, Spark, Oracle (Exadata/HotMPP), UNIX, Informatica 9.6.1, Talend Big Data Studio 6.3
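The S3 handling in this project was done with Talend's tS3Put/tS3Get/tS3Copy/tS3Delete components; the boto3 sketch below only illustrates the equivalent S3 operations, and the bucket names, keys, and file paths are placeholders.

```python
# Illustrative only: the same S3 get/put/copy/delete steps the Talend components performed.
import boto3

s3 = boto3.client("s3")   # credentials resolved from the environment / IAM role

# tS3Put equivalent: stage a local extract file into the landing bucket.
s3.upload_file("/data/extracts/orders_20180601.csv",
               "dpe-landing-bucket", "orders/orders_20180601.csv")

# tS3Get equivalent: pull a processed file back for a downstream load.
s3.download_file("dpe-curated-bucket", "orders/orders_20180601.parquet",
                 "/data/incoming/orders_20180601.parquet")

# tS3Copy equivalent: archive the landed file.
s3.copy_object(Bucket="dpe-archive-bucket",
               Key="orders/2018/06/orders_20180601.csv",
               CopySource={"Bucket": "dpe-landing-bucket",
                           "Key": "orders/orders_20180601.csv"})

# tS3Delete equivalent: clean up the landing copy once archived.
s3.delete_object(Bucket="dpe-landing-bucket", Key="orders/orders_20180601.csv")
```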
Confidential, SFO,CA
ETL+Hadoop Developer
Responsibilities:
- Provides expertise during the initial phases of the project; analyzes high-level system specifications, business requirements, and/or use cases; converts the information into the appropriate level of specification and a system design plan for development.
- Provides appropriate documentation for design decisions, estimating assumptions, code modules, and performance metrics as required by organization standards.
- Uses comprehensive application and/or technical knowledge to provide guidance and technical leadership to project or maintenance resources. Maintains an awareness of other projects and their possible effect on ongoing projects.
- Build data systems and data pipelines that extract, classify, merge, and deliver new insights.
- Data ingestion, aggregation, loading, and transformation of large sets of structured, semi-structured, and unstructured data into Hadoop (data lake).
- Developed Spark code and Spark SQL for faster testing and processing of real-time structured and unstructured data.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response (see the sketch below).
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Implemented Hive UDFs for evaluating, filtering, loading, and storing data.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Performed data migration from relational databases (Oracle, Teradata) or external data sources to HDFS using Sqoop.
- Designed both managed and external tables in Hive to optimize performance. To improve performance further, used auto map join and avoided skew joins, optimized the LIMIT operator, enabled parallel execution, enabled MapReduce strict mode, and used a single reducer for multi GROUP BY.
- Regular monitoring of Hadoop Cluster to ensure installed applications are free from errors and warnings.
- Developed MapReduce programs using combiners, sequence files, compression techniques, chained jobs, and the multiple-input and multiple-output APIs.
- Loaded data from different sources (databases and files) into Hive using Talend.
- Monitored system health and logs and responded to any warning or failure conditions.
Environment: Hadoop (Cloudera), HDFS, MapReduce, Hive, Sqoop, Spark, DB2, Oracle, Teradata, Eclipse, UNIX, Talend Big Data Studio 6.0
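A minimal sketch of the Spark RDD step described above (load raw records, cache them in memory, aggregate); the HDFS path and the pipe-delimited record layout are assumptions made for the example.

```python
# Sketch only: cache raw records as an RDD and aggregate them in memory.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-aggregation-sketch").getOrCreate()
sc = spark.sparkContext

# Assumed record layout, one per line: device_id|event_type|bytes
events = sc.textFile("hdfs:///data/events/2016/06/01/*.log").cache()   # keep in memory for reuse

bytes_per_device = (events
                    .map(lambda line: line.split("|"))
                    .map(lambda f: (f[0], int(f[2])))      # (device_id, bytes)
                    .reduceByKey(lambda a, b: a + b))      # in-memory shuffle + sum

for device, total in bytes_per_device.take(10):
    print(device, total)
```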
Confidential, Northbrook, IL
ETL+Hadoop Developer Project: Confidential
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Handled importing of data from multiple data sources (Oracle, SQL Server) using Sqoop; performed cleaning, transformations, and joins using Pig.
- Pushed data as delimited files into HDFS using Talend Big Data Studio.
- Involved in writing MapReduce programs in Java (see the sketch below).
- Loaded and transformed data into HDFS from large sets of structured data in Oracle/SQL Server using Talend Big Data Studio.
- Exported analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Experience in providing support to data analysts running Hive queries.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Created Hive tables and partitions to store different data formats.
- Involved in loading data from UNIX file system to HDFS.
- Experience in managing and reviewing Hadoop log files.
- Consolidate all defects, report them to the PM/leads for prompt fixes by the development teams, and drive them to closure.
Environment: Apache Hadoop x.2, MapReduce, Hive, Sqoop, Spark, SQL, Eclipse, Unix scripting, Oracle, SQL Server, Talend Big Data Studio 6.0
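The MapReduce programs on this project were written in Java; the Hadoop Streaming style mapper/reducer below is only a Python sketch of the same map/shuffle/reduce pattern, with an assumed "policy_id,state,premium" record layout.

```python
# Sketch only: a streaming-style mapper and reducer that sum premiums per state.
# Submitted in practice through the Hadoop Streaming jar, with this script acting
# as both the mapper and the reducer; the CSV layout is an assumption.
import sys

def mapper():
    # Emit (state, premium) pairs, one per input record.
    for line in sys.stdin:
        fields = line.strip().split(",")
        if len(fields) == 3:
            print(f"{fields[1]}\t{fields[2]}")

def reducer():
    # Sum premiums per state; Hadoop delivers keys grouped and sorted.
    current_state, total = None, 0.0
    for line in sys.stdin:
        state, premium = line.strip().split("\t")
        if state != current_state:
            if current_state is not None:
                print(f"{current_state}\t{total:.2f}")
            current_state, total = state, 0.0
        total += float(premium)
    if current_state is not None:
        print(f"{current_state}\t{total:.2f}")

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "map":
        mapper()
    else:
        reducer()
```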
Confidential, Dublin, OH
ETL developer
Responsibilities:
- Attend daily meetings with the customer to identify the exact requirements and provide technical solutions that meet them.
- Gather and analyze business and technical requirements.
- Work with business to gather requirements and define the Data Quality solutions for data profiling, standardization and cleansing etc.
- Define and contribute to development of standards, guidelines, design patterns and common development frameworks & components
- Working effectively in a distributed global team environment
- Prepare Unit Test Plan/ Design (with direction from customer).
- Experience in preparing HLD, LLD, UTC, and technical design documents.
Environment: Informatica 9.6.1, Oracle 11g, SQL Server, DB2, Unix, Windows 7
Confidential
ETL Module Lead
Responsibilities:
- Performed the data profiling and analysis making use of Informatica Data Explorer (IDE) and Informatica Data Quality (IDQ).
- Provide solutions for data quality operations and Informatica ETL Processes to support Data Integration and Reporting requirements.
- Data Profiling, Cleansing, Standardizing using IDQ and integrating with Informatica suite of tools
- Develop and contribute to strategic vision for Data Quality and Data Archive
- Perform hands-on development on the Data Quality tools (Informatica Developer, Analyzer).
- Work with business to gather requirements and define the Data Quality solutions for data profiling, standardization and cleansing etc.
- Define and contribute to development of standards, guidelines, design patterns and common development frameworks & components
- Working effectively in a distributed global team environment
- Informatica administration activities (using the Admin Console): create folders and user accounts, view and manage folder permissions and privileges, start and stop services, and view service status and logs.
- Informatica application support and maintenance.
- Act as tertiary escalation contact for issue resolution and problem management.
Environment: Informatica 8.6/9.1, IDQ, IDE, Business Objects XI R3, Teradata, Oracle 10g, Appworx, Unix, Remedy, Windows XP
Confidential
ETL Module Lead
Responsibilities:
- Gather requirements by attending daily calls with the onsite team.
- As a module lead, involved in mentoring the team to achieve goals on time.
- Involved in preparing the HLD, LLD, and UTC documents.
- Proper understanding & analysis of the requirement.
- Used the different DataStage client tools (Designer, Director, and Manager) extensively.
- Involved in defining and preparing the unit test plan.
- Involved in reviewing the code before delivering the objects.
- Integration Testing.
Environment: Informatica 8.6, DataStage 7.5/8.1, Oracle 10g, Linux, AutoSys, Windows XP
Confidential
ETL developer
Responsibilities:
- Successfully handled team members for the BENCAP and DBP modules.
- Proper understanding & analysis of the requirement.
- Designing according to the requirements (high level to low level).
- Implementation of mapping.
- Unit test Plan preparation.
- Code review and Unit Testing.
- Product Testing, Integration Testing.
- Code migration from the development to the production environment.
- High level ETL Production Support documentation.
Environment: Informatica 6.1/7.1, Oracle 9i, Unix, Windows XP