Senior Big Data Developer and Pipeline Designer Resume
SUMMARY:
- Software Development | Machine Learning | Big Data | Quality Assurance
- More than 10 years of professional IT experience, along with management skills supported by a Master of Business Management degree.
- Hands-on experience in cost-effective, multi-tiered, high-performance distributed enterprise applications across industries, including software architecture design, development, and testing.
- Develop strong client relationships using interpersonal communication skills.
- Skilled in all facets of product life-cycle development from requirements gathering, analysis and conceptual design through architecture and implementation.
CORE STRENGTHS IN:
- Object-oriented Analysis, Design & Programming
- Big Data
- Core Java, J2EE & Web Services
- Data Modeling
- Scala, Python, and Machine Learning
- Design Patterns, Agile
TECHNICAL SKILLS:
Platforms: Cloudera, Hortonworks, AWS, Azure
Languages: Java, J2EE, Python (2.7), Scala, SQL, JavaScript, Unix Shell, HTML
Databases: Oracle 10g/9/8/7, Vertica, Teradata, Sybase, MySQL, Elasticsearch
Frameworks: Spring Web MVC 1.2/2/3.1, Struts 1.2/2.0, Hibernate 2.0/3.0, JSF & Hadoop
Application Servers: WebSphere Application Server 6.1/7.0, WebLogic 7.1, Tomcat 5.0 & 6.0
Portal Servers: WebLogic 7.1/8.1, WebSphere Application Server v6.x
Methodologies: Agile & Waterfall
Technologies: EJB, JDBC, Servlets, JavaBeans, Java Threads, JSP, JSON, JavaScript, Spring Boot, AngularJS, jQuery, RMI, JNDI, XML, SOA, OSB engine, Web Services, RESTful, WSDL, IDOS engine, Tibco, JMS, JUnit; Hadoop 2.0, MapReduce, Hive, Cassandra, Kafka, ZooKeeper, Spark, Scala, Impala, Flume, Sentry, and Kerberos
Development Tools: Rational Application Developer (RAD), Eclipse 3.x, Sublime Text
Defect Tracking Tools: Application Lifecycle Management (ALM), Rational Team Concert (RTC)
PROFESSIONAL EXPERIENCE:
Confidential
Senior Big Data Developer and Pipeline Designer
Responsibilities:
- Designed an end-to-end ETL pipeline for data arriving from external sources into the company's GBI division and stored in several data warehouses.
- Studied the existing internal pipeline framework, written in Python, and designed the external pipeline in parallel to conform with it.
- Used the Confidential cloud blob store as the landing zone for incoming data, reading from it and writing into HDFS so the data could be worked with dynamically.
- Coded the main part of the pipeline in Scala, integrated with Spark as the transformation engine.
- Used Hive as the HDFS data warehousing tool, creating multilevel partitioning on attributes such as batch journal date/time, run date/time, and session ID to speed up data fetches for Spark and for endpoints such as Oracle, Teradata, and Vertica.
- In the Spark job, after reading the multilevel-partitioned data from Hive, exposed it to a rule evaluation process in which rules supplied by another business entity were applied against success/failure criteria, then saved the evaluation results into Vertica and Teradata databases (see the sketch after this list).
- Created data models in Vertica, Oracle, and Teradata to store the transformed data in normalized form, as well as denormalized views for display through Tableau.
- Developed a well-structured framework implementing the entire pipeline functionality, built from a Maven template.
- Provisioned the Hadoop and Spark clusters so that the developed ETL framework complied with the cluster setup and business requirements.
- Created test cases for the framework and handed them off to the QA team for further testing.
- Created sub-pipeline EL processes, such as fetching metadata from Oracle and keeping it in Hive so that dynamically changing metadata is available to Spark jobs.
- Designed and developed a completely separate configuration mechanism to streamline the pipeline, not only for specific external sources but for any source currently in use or added in the future.
- Used the Autosys scheduler to control Spark jobs in dev/UAT/prod environments.
- Collaborated frequently with other team members to make sure their work aligned with the main pipeline I developed.
- Designed and developed an end-to-end disaster recovery (DR) process between clusters in either direction. The DR framework copies any entity from the primary cluster to a secondary cluster, keeping all data and scheduled processes in sync between the clusters of choice.
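Below is a minimal Scala/Spark sketch of the rule-evaluation step described above: it reads a multilevel-partitioned Hive table, applies a success/failure rule, and persists the results over JDBC. The database, table, column, and rule names (gbi_landing.external_batches, rules.evaluation_results, amount, currency) are hypothetical placeholders, not the project's actual schema.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when, lit}

object RuleEvaluationJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("external-source-rule-evaluation")
      .enableHiveSupport()                       // read Hive-managed, partitioned tables
      .getOrCreate()

    // Partition pruning on the multilevel partitions (batch date / session ID)
    val batch: DataFrame = spark.table("gbi_landing.external_batches")
      .where(col("batch_date") === "2018-06-30" && col("session_id") === "S-1042")

    // Hypothetical rule: amounts must be positive and currency must be present
    val evaluated = batch.withColumn(
      "rule_status",
      when(col("amount") > 0 && col("currency").isNotNull, lit("SUCCESS"))
        .otherwise(lit("FAILURE"))
    )

    // Persist evaluation results to a downstream warehouse over JDBC (Vertica shown here)
    evaluated.write
      .format("jdbc")
      .option("url", "jdbc:vertica://vertica-host:5433/warehouse")   // placeholder connection
      .option("dbtable", "rules.evaluation_results")
      .option("user", sys.env.getOrElse("DB_USER", ""))
      .option("password", sys.env.getOrElse("DB_PASS", ""))
      .mode("append")
      .save()

    spark.stop()
  }
}
```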
Confidential
Senior Big Data Developer and Architect
Responsibilities:
- Designed a data ingestion platform that lands data in a landing zone (a standalone server), transfers it into HDFS via polling schedulers, and then sends it to end storage points such as Hive, Cassandra, and Elasticsearch (a fan-out sketch follows this list).
- Built the data ingestion platform with an ETL process and data warehousing across diverse data lake storages, as well as a data analytics application pipeline whose results are saved back to the data lakes as datasets.
- Created many HDFS clusters on top of Cloudera Manager using the Cloudera Director platform, and continuously worked on new cluster creation, optimization, and upgrades of the technologies embedded in Cloudera Manager.
- Kerberized clusters from the ground up on AWS and Azure servers, integrating FreeIPA LDAP provisioning for Unix users so that only authorized users can access any cluster component with single sign-on (provisioning users by their Unix names only).
- Developed an end-to-end data ingestion product on top of the platform described above: it takes a single Spark job or machine learning script and runs it against the data lake fed by the built-in data pipeline, or creates a workflow that drives the pipeline from source to endpoint, with an automated equivalent of the same process requiring no user interaction.
- Gained expertise in Jupyter Notebook, from installation through implementation of API functionality, exposing Spark jobs and applying machine learning algorithms with Python libraries such as pandas, NumPy, scikit-learn, and SciPy.
- Wrote cron jobs in Bash for scheduling and other processes requiring scripting.
- Developed APIs in Python and Scala to ingest, process, and store data, and exposed the data to existing and newly built environments through REST APIs that I built end to end.
- Used Cassandra, Elasticsearch, MySQL, and MongoDB, drawing on the full range of features these data stores provide.
- Used Cloud Foundry to deploy and exercise the written code across environments.
- Used Docker to build a sub-platform for submitting Spark jobs more efficiently and quickly.
- Gained significant experience with AWS VPC and surrounding technologies while hosting our platform in Amazon-based containers.
- Worked with both the Cloudera Manager and Hortonworks distribution platforms.
- Still working on this project and will add more experience as it progresses; I have kept the details brief here and am happy to discuss them further in an interview.
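A minimal Scala/Spark sketch of the fan-out described in the first bullet above, assuming the spark-cassandra-connector and elasticsearch-hadoop libraries are on the classpath; the HDFS path, keyspace, table, and index names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object IngestionFanOut {
  def main(args: Array[String]): Unit = {
    // Assumes spark-cassandra-connector and elasticsearch-hadoop are on the classpath.
    val spark = SparkSession.builder()
      .appName("ingestion-fan-out")
      .enableHiveSupport()
      .getOrCreate()

    // Raw files already moved from the landing zone into HDFS by the polling scheduler
    val events = spark.read.json("hdfs:///data/landing/events/")   // placeholder path

    // 1. Hive: keep a queryable copy in the data lake
    events.write.mode("append").saveAsTable("lake.events")

    // 2. Cassandra: low-latency lookups by key (hypothetical keyspace/table)
    events.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "lake", "table" -> "events"))
      .mode("append")
      .save()

    // 3. Elasticsearch: full-text search over the same records (hypothetical index)
    events.write
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "es-host:9200")
      .mode("append")
      .save("events/doc")

    spark.stop()
  }
}
```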
Confidential, New York
Senior IT Specialist
Responsibilities:
- Delivered data warehousing solutions while working with a variety of database technologies, with a proven history of building large-scale data processing systems.
- Architected highly scalable distributed systems using various open-source tools, and designed and optimized large, multi-terabyte data warehouses secured with tools such as Apache Sentry and Kerberos.
- Designed and implemented fast, efficient data acquisition using big data processing techniques and tools: Hive, Impala, Spark (Streaming and SQL), Kafka, and Flume (a streaming sketch follows this list).
- Installed, configured, supported, and managed the Hadoop cluster and its underlying infrastructure using major ecosystem components (HDFS, Spark, Scala, Hive, Impala, Kafka, and Flume), and deployed an Apache Lucene/Solr search server to speed up searches of customer documents as well as documents for internal use.
- Worked on data ingestion, pulling approximately 1.5 to 2 TB of data through a gateway, delegating it to a data pipeline, and running the Spark engine to distribute the data into different data stores across various data warehouse endpoints.
- Took part in the installation, configuration, and maintenance of Hadoop clusters for application development and of Hadoop tools such as Hive, Impala, and ZooKeeper; managed and reviewed log files, loaded log data into HDFS using Flume, and created MapReduce jobs integrated with Spark and Scala to prepare data for search, aggregation, and processing.
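A minimal Scala sketch of the Kafka-to-HDFS acquisition pattern referenced above, using Spark Structured Streaming; the broker addresses, topic name, and HDFS paths are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs")
      .getOrCreate()

    // Hypothetical topic and brokers; in practice these come from the cluster config
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "customer-events")
      .load()

    // Kafka delivers key/value as bytes; keep the payload as a string column
    val payload = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    // Land the stream in HDFS as Parquet so Hive/Impala can query it
    val query = payload.writeStream
      .format("parquet")
      .option("path", "hdfs:///warehouse/raw/customer_events")
      .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
      .start()

    query.awaitTermination()
  }
}
```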
Confidential, New Jersey
IT Specialist
Responsibilities:
- Developed dynamic web applications with complex business logic using various interface, helper, and utility classes, built on frameworks such as Struts, Spring, and Hibernate.
- Used Spring Core for dependency injection/inversion of control (IoC), integrated it with frameworks such as Struts and Hibernate, and worked with Struts components such as action mappings, Action classes, and DispatchAction classes.
- Provided requirements, architecture consultation, and development on a multi-million-dollar project.
- Collaborated on design and contributed class and sequence diagrams, ensuring delivery of high-quality code and fail-fast behavior through Test-Driven Development (TDD), using most of the available testing features.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Created Hive external tables, loaded data into them, and queried the data using HiveQL; worked on importing and exporting data between Oracle/MySQL and HDFS/Hive (see the sketch after this list).
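A minimal sketch of the external-table pattern referenced above, with the HiveQL issued through Spark's Hive support (the same DDL also runs in the Hive CLI); the staging.orders schema and HDFS location are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ExternalTableSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-external-table-setup")
      .enableHiveSupport()
      .getOrCreate()

    // External table over files exported from Oracle/MySQL into HDFS (hypothetical layout)
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders (
        order_id BIGINT,
        customer_id BIGINT,
        amount DECIMAL(12,2),
        order_date STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION 'hdfs:///staging/orders'
    """)

    // HiveQL query against the external table
    spark.sql("""
      SELECT customer_id, SUM(amount) AS total_spend
      FROM staging.orders
      GROUP BY customer_id
    """).show(20)

    spark.stop()
  }
}
```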
Confidential
Software Java Developer
Responsibilities:
- Designed and developed enterprise applications using object-oriented programming and Java/J2EE technologies, following the software development life cycle (SDLC) and commonly applying the Singleton design pattern.
- Worked extensively in the Eclipse IDE developing applications using EJB entity beans and stateless and stateful session beans.
- Built enterprise applications and distributed systems using technologies such as Java and J2EE (Servlets, JSP, JSF, EJB, Struts, Hibernate).
- Identified potentially show-stopping bugs while developing and testing application code and web services (SOAP and RESTful) using a top-down, WSDL-first approach to service implementation.
- Wrote test cases using the JUnit framework.
Confidential
Java Developer
Responsibilities:
- Wrote Java classes and properties
- Experienced in multithreading, data structures, algorithms, object-oriented design, and design patterns, as well as SQL query generation and data modeling.
- Learned the Oracle database: table structures, table manipulation, and stored procedures.
- Contributed to the development of client-side and server-side code for external and internal web applications.