
Quantus Spark Developer/Big Data Engineer Resume


Piscataway, NJ

SUMMARY

  • Over 12 years of strong experience in software development using Big Data/Hadoop ecosystems, Apache Spark, Python, and ETL technologies.
  • Proven experience working with world-class organizations, including Confidential, Group.1 Confidential, Confidential, DXC Technology, CSX Transportation, and NTT Data Americas.
  • Hands-on experience with major components of the Hadoop ecosystem, such as Spark, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, and Kafka.
  • Work experience with cloud infrastructure such as Amazon Web Services.
  • Experience importing and exporting data with Sqoop between Oracle/mainframe DB2 and HDFS/data lake.
  • Experience developing shell scripts, Oozie workflows, and Python scripts.
  • Involved in all phases of the Software Development Life Cycle (SDLC): requirements, analysis/design, development, testing, implementation, and maintenance; working knowledge of project methodologies (e.g., Waterfall, Agile, and Scrum).
  • Hands-on data modeling, design, and development skills to implement and manage IT projects; perform logical analyses of scientific problems and formulate statistical models for data and analytics solutions.
  • Experience in installing software applications, writing test cases, and debugging and testing batch and online applications.
  • Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
  • Expertise in Oracle Data Integrator 10g/11g/12c/Big Data Edition, building Extract, Load, and Transform (ELT) processes for enterprise integration and data warehousing efforts.
  • Hands-on experience in all phases of big data applications, including data ingestion, data analytics, and data visualization, and in building data-lake-based data marts to support data science and machine learning.
  • Developed scalable and reliable data solutions to move data across systems from multiple sources in real time as well as batch modes.
  • Knowledge of implementing advanced procedures such as text analytics and processing using Apache Spark with Python.
  • Experience designing and handling various data ingestion patterns (batch and near real time) using Sqoop, DistCp, Apache Storm, Flume, and Apache Kafka.
  • Experience designing and handling various data transformation/filtration patterns using Pig, Hive, and Python.
  • Strong knowledge of Hadoop architecture and its components, such as HDFS, JobTracker and TaskTracker, NameNode and DataNode, Secondary NameNode, and MapReduce programming.
  • Proficient in various design patterns for data ingestion and data modeling, with an in-depth understanding of data structures and algorithms.
  • Proficient in writing stored procedures, complex SQL queries, packages, functions, and database triggers, and in optimizing SQL for performance; strong data analysis skills using Python, Hive, Apache Spark, MS Excel, and Access DB (see the sketch after this list).
  • Proficient in mapping business requirements, use cases, scenarios, business analysis, and workflow analysis; act as a liaison between business units, technology, and IT support teams.
  • Good at writing reusable, testable, and efficient code, and able to integrate multiple data sources and databases into one system.
  • Good at working as a team player and technical lead with minimal supervision; have worked in the onsite-offshore model and have experience managing remote teams.
  • Experienced in creating the documentation needed for project implementation (rollouts, contingency plans, communications, dependencies, etc.) and in performing annual BCP (Business Continuity Plan) and disaster recovery exercises.
  • Good understanding of end-to-end system processes and their interfaces and dependencies with other processes, application design documents, functionality, data flow, and technical aspects.
  • Able to multi-task across applications and to take ownership of, and coordinate, tasks from implementation through test and production.
  • Able to deliver high-quality results under tight deadlines and to work in team environments with diverse stakeholders.
  • Good facilitation and coordination skills to guide and direct key parts of project work, such as meetings, planning sessions, and training of team members.
  • Expertise in developing use cases, sequence diagrams, and class diagrams; adhere to SCM (Software Configuration Management) during implementation.
  • Possess strong interpersonal skills and excellent analytical and problem-solving skills.
  • Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
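As an illustration of the Spark-based data analysis referenced above, here is a minimal PySpark sketch that aggregates a Hive table with the DataFrame API. The database, table, and column names (billing.invoices, account_id, balance_due) are hypothetical placeholders, not values from any specific engagement.

    # Minimal PySpark sketch: aggregate a Hive table with the DataFrame API.
    # All table/column names below are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("hive-aggregation-sketch")
        .enableHiveSupport()   # needed to read tables from the Hive metastore
        .getOrCreate()
    )

    invoices = spark.table("billing.invoices")

    # Top 10 accounts by outstanding balance
    top_accounts = (
        invoices
        .groupBy("account_id")
        .agg(F.sum("balance_due").alias("total_due"))
        .orderBy(F.desc("total_due"))
        .limit(10)
    )
    top_accounts.show()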

TECHNICAL SKILLS

Hadoop/Big Data Technologies: Hadoop (Hortonworks, Cloudera): HDFS, MapReduce, Pig, HBase, Kafka, Spark, ZooKeeper, Hive, Oozie, Sqoop, Flume, Storm, Impala.

Programming Languages: Java (JDK 1.4/1.5/1.6), HTML, SQL, PL/SQL, Apache Spark, Python, Scala, COBOL, JCL, REXX, FOCUS, DB2 stored procedures, XML, Windows batch scripts, Linux shell scripting.

ERP: PeopleSoft 9.1 and 9.2

Operating Systems: UNIX, Linux, Windows

Application Servers: IBM WebSphere, CICS

Deployment Tools: DMS (Deployable Management System), IBM UrbanCode/Jenkins, Change Management

Messaging Services: Message Broker, MQ, Apache Kafka, Amazon SQS

Databases: Netezza, MySQL 4.x/5.x, Oracle, IBM DB2, IMS DB/DC

Project Management Tools: VSS (Visual Source Safe), SharePoint, REMEDY, TRAC, JIRA

Transfer Protocols: FTP, SFTP, MFT, Telnet, EDIFACT

Java IDEs: IntelliJ, Eclipse 3.x, IBM WebSphere Application Developer, IBM RAD 7.0

Scheduling Tools: Control-M, ESPX, crontab, AutoSys, and Oozie

Development Tools: TOAD, SQL Developer, Maven, Application Designer, App Engine, CI, SQR, SQL, PS-Query, PeopleCode, Data Mover, Eclipse, BMCDB2, DB2MENU, IBM Data Studio, Optim, MAINVIEW, KADET, SIA, Anaconda, GTB (CICS Map Design Tool), IBM Serena ChangeMan, ROVR (Registration Ownership Versioning Route), RMS, Troux, SPAR.

PROFESSIONAL EXPERIENCE

Quantus Spark Developer/Big Data Engineer

Confidential, Piscataway, NJ

Responsibilities:

  • Teamed up with architects to design the B2C (Bill to Cash) data model to support cost-effective data analytics and drive business strategy for collections, credits, and PTP (Promise to Pay).
  • As part of the B2C batch modernization initiative, analyzed the existing batch ingestion built in Oracle Data Integrator and developed a PySpark application as the ETL tool; this reduced batch ingestion time from 3.5 hours to 15 minutes (see the sketch after this list).
  • Identified data points and data domains, analyzed complex ETLs, performed data modeling, and transformed the data into the Hadoop data lake.
  • Designed and architected Spark programs using Python to compare the performance of Spark with Hive and SQL, and developed Python scripts using both RDDs and DataFrames/SQL/Datasets in Spark 1.6 for data aggregation, queries, and writing data.
  • Created Kafka data pipelines to ingest credit data into the data lake.
  • Employed the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive, and optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and RDDs.
  • Decommissioned the existing ODI ETL package, created a new PySpark application as the ETL tool, and ingested data into the corporate data lake.
  • Developed Sqoop jobs and Hive scripts to import 15 TB of rolling data from Oracle 10g/11g into Hive, and created a raw dataset of 850 columns by reverse-engineering the Hive fact tables.
  • Developed Hive scripts and Oozie workflows to split the large dataset into multiple fact tables using Snappy compression.
  • Created partitions within Hive tables and transformed data from legacy tables into HDFS and Hive.
  • Wrote shell scripts and Oozie workflows to automate the ingestion process.
  • Participated in user sessions for the credit and payments swim-lane use case.
  • Developed a monthly archival process using Oozie to ingest Oracle tables into Hive.
  • Acted as subject matter expert to help refine business processes across multiple business lines (corporate finance and marketing across wireline and wireless LOBs), working closely with other application teams (ERP, CRM, financial billing, and Bill to Cash systems) to simplify IT processes and empower data scientists to help enable the business.
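A hedged sketch of the batch-ingestion pattern described above: read an Oracle table over JDBC with PySpark and land it as a partitioned Hive table. The connection string, credentials, and table names are illustrative assumptions, not the actual project values.

    # Sketch: Oracle-to-Hive batch ingestion with PySpark over JDBC.
    # URL, credentials, and table names are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("b2c-batch-ingest")
        .enableHiveSupport()
        .getOrCreate()
    )

    source_df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")
        .option("dbtable", "B2C.CREDIT_EVENTS")   # hypothetical source table
        .option("user", "etl_user")
        .option("password", "********")
        .option("fetchsize", "10000")             # larger fetches cut round trips
        .load()
    )

    # Stamp each row with the load date and land it as a partitioned Hive table
    (
        source_df
        .withColumn("load_date", F.current_date())
        .write
        .mode("overwrite")
        .partitionBy("load_date")
        .format("parquet")
        .saveAsTable("datalake.credit_events")    # hypothetical target table
    )

Running this requires the Oracle JDBC driver on the Spark classpath (for example, supplied via spark-submit --jars).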

Security and PeopleSoft

Confidential, Bloomington, IL

Responsibilities:

  • Teamed up with architects to design a Spark model for processing security logs and identifying anomalies using graph network analysis.
  • Developed graph network analysis to detect node-level and community anomalies using the egonet method and the traffic dispersion method, built with Scala, Spark, and the GraphX library.
  • Developed scalable and reliable data solutions to move data across systems from multiple sources in real-time as well as batch modes.
  • Utilized expertise in models that leverage the newest data sources, technologies, and tools, such as Python, Hadoop, Spark, and AWS, as well as other cutting-edge Big Data tools and applications.
  • Performed Oracle and DB2 data ingestion using Sqoop.
  • Performed near-real-time and batch ingestion of syslog files into HDFS using Flume and Kafka.
  • Performed data filtration and transformation using Pig and created Hive schemas for the structured data.
  • Used RDDs to perform transformations on datasets as well as actions such as count, reduce, and first (see the sketch after this list).
  • Implemented checkpoints that persist RDDs to disk to handle job failures and aid debugging.
  • Developed an integration solution to bridge external systems and HDFS, and reduced model run times through performance tuning so the business could run the models hundreds of times a day.
  • Good knowledge of Spark platform parameters such as memory, cores, and executors.
  • Used the Spark DataFrame API over the Cloudera platform to perform analytics on Hive data.
  • Analyzed large datasets and developed graphs using the Python libraries NumPy, SciPy, scikit-learn, and pandas.
  • Responsible for managing and scheduling jobs on a Hadoop cluster.
  • Implemented test scripts to support test driven development and continuous integration.
  • Used deployment tools such as IBM UrbanCode and Jenkins for deployments.
  • Created Control-M jobs to automate workflows, and fixed various production issues during user acceptance testing.
  • Participated in global project planning and roadmap definition, interacting with external partners, customers, and vendors.
  • Expert-level knowledge of Amazon EC2, Amazon S3, Amazon SimpleDB, Amazon RDS, Amazon Elastic Load Balancing, Amazon SQS, and other services of the AWS family.
  • Well versed in high availability, fault tolerance, scalability, database concepts, system and software architecture, security, IT infrastructure, virtualization, and Internet technologies.
  • Participated in the end-to-end transition from the client by attending brainstorming and knowledge acquisition (KA) sessions, and helped achieve standstill of PeopleSoft HR applications such as Environment and AppTech.
  • Prepared project management documents such as the KCD (Knowledge Capture Document), topology diagrams, and application-related documents.
  • Good understanding of the Oracle HCM suite of applications, including HRMS (Human Resources Management Systems), Employee Benefits Administration (OAB), Payroll, Asset Management, and Finance modules.
  • Responsible for identifying sensitive information at the NPI (Non-Public Information), PI (Personal Information), PHI (Protected Health Information), and SPI (Sensitive Personal Information) levels, and for adding the metadata of Oracle, flat-file, and IMS data files to IIS (IBM InfoSphere Information Server) using Optim and the Metadata Update Tool.
  • Developed Python programs for generating ITSS reports and Process Scheduler alerts and for processing jobs.
  • Using analytic languages such as Python and VBA, developed tools for the HCM technical team's daily use.
  • Helped application teams fix data issues, Oracle-delivered COBOL issues, and PIA issues, and coordinated with Oracle on resolving Oracle SRs in the HCM application.
  • Responsible for generating image copies using REXX procedures and REORG jobs using BMC for all PeopleSoft environments.
  • Responsible for adding and maintaining assets (metadata) and sensitivity classifications in IIS (IBM InfoSphere Information Server).
  • Responsible for conducting annual access cleanups, DSA and AUTHID cleanups, and worked with enterprise teams on the successful completion of major projects such as Sensitive Data in Test and Production Data Protection - Sensitive Data Identification and Ownership.
  • Created the organizational IT roadmap; interacted closely with business partners and the service delivery team (SDT) in defining IT objectives.
  • Created primary database storage structures (tablespaces) once developers had designed an application.
  • Adhered to HIPAA (Health Insurance Portability and Accountability Act) and data protection standards.
  • Modified database structures as necessary based on information from application developers, and managed database backup, security, and controls, ensuring system security and appropriate user access.
  • Performed annual BCP (Business Continuity Plan) and disaster recovery exercises.
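A minimal sketch of the RDD pattern referenced above: transformations, actions, and a disk checkpoint so a long lineage can be recovered after a failure. The HDFS paths and the " ERROR " log convention are assumptions for illustration.

    # Sketch: RDD transformations/actions with checkpointing in PySpark.
    # HDFS paths and the log format are illustrative assumptions.
    from pyspark import SparkContext

    sc = SparkContext(appName="syslog-rdd-sketch")
    sc.setCheckpointDir("hdfs:///tmp/checkpoints")   # checkpoint files land in HDFS

    lines = sc.textFile("hdfs:///data/syslog/")      # one record per log line

    # Transformations: keep error lines, key them by their first token (the date)
    errors = (
        lines
        .filter(lambda line: " ERROR " in line)
        .map(lambda line: (line.split()[0], 1))
    )
    daily_errors = errors.reduceByKey(lambda a, b: a + b)

    daily_errors.checkpoint()     # truncate the lineage; materialized on next action

    # Actions
    print(daily_errors.count())   # number of distinct days with errors
    print(daily_errors.first())   # one (day, count) pair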

Corporate Systems Professional Application Developer

Confidential

Responsibilities:

  • Responsibilities included attending use-case workshops, preparing understanding documents, estimation, development, testing, and implementation.
  • Loaded flat files and DB2 table data from various applications into HDFS for further processing.
  • Wrote and executed Apache Pig scripts on top of the HDFS data.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
  • Created script files for processing data and loading it into HDFS.
  • Wrote CLI commands for HDFS.
  • Developed UNIX shell scripts for creating reports from Hive data.
  • Analyzed the requirements for setting up a cluster.
  • Set up cron jobs to delete Hadoop logs, old local job files, and cluster temp files.
  • Wrote MapReduce code that takes log files as input, parses them, and structures them in a tabular format to facilitate effective querying of the log data (see the sketch after this list).
  • Coordinated effectively with release teams during implementation.
  • Prepared deployment plans for implementation projects.
  • Completed all documentation needed for production, such as change management templates, instructions, and deliverables.
  • Coordinated PROD and TEST implementation and checkout.
  • Shared knowledge and experience with other team members during knowledge-sharing sessions.
  • Involved in the system design and architecture of the SAS Travel Agent Redemption project, SAS Campaign Credits, and NDP (New Data Project), implemented using IBM mainframe technologies.
  • Coordinated and worked on critical IBM mainframe applications such as Corporate Systems and Columbus Tracking Systems.
  • Interacted with business partners and customers in various phases of the SDLC.
  • Directed and led the work of other team members effectively.
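One common way to express the log-parsing MapReduce step above in Python is a Hadoop Streaming mapper. This hedged sketch assumes a simple "timestamp level message" log layout, which is an illustrative assumption rather than the actual log format.

    #!/usr/bin/env python
    # Sketch: Hadoop Streaming mapper that turns raw log lines into
    # tab-separated fields for tabular querying (e.g., from Hive or Pig).
    # The assumed "timestamp level message" layout is illustrative.
    import sys

    for line in sys.stdin:
        parts = line.rstrip("\n").split(" ", 2)   # timestamp, level, message
        if len(parts) == 3:
            timestamp, level, message = parts
            print("\t".join([timestamp, level, message]))

Under Hadoop Streaming, a script like this runs as the -mapper of a map-only streaming job, and the tab-separated output can then sit behind a Hive external table for querying.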

Lead and Senior Software Developer

Confidential, Jacksonville, FL

Responsibilities:

  • Recommended and established business cases, and assisted the project manager and others in building acceptance of newly proposed program modifications, methods, and procedures; conducted and provided moderately complex cost/benefit analyses for these proposals.
  • Interacted with business partners and customers in various phases of the SDLC.
  • Coordinated work requests in the EDI, DARTS, TAXI, Freight Claims, and SMS areas.
  • Worked on super-critical and business-critical transactions in EDI and TAXI.
  • Developed and enhanced the existing EDI and SMS systems as specified by the customer, using CICS, COBOL, batch, DB2, and IMS.
  • Performed enhancements, maintenance, and testing; assigned tasks to team members; provided guidance; and conducted trainings and project meetings.
  • Interacted with onsite counterparts and users.
  • Trained team members, assigned tasks, and helped them deliver the assigned tasks on time with no issues.
  • Coordinated production releases, including major, minor, and hotfix releases.
  • Prepared project-related documents and WSRs.
  • Migrated OS/VS COBOL programs to Enterprise COBOL.
  • Converted, tested, and delivered the converted programs, and reviewed components to meet quality assurance standards.
