AWS / Hadoop / Scala Developer Resume
CA
SUMMARY
- 7+ years of IT experience in analysis, design, and development within Scala, Spark, Hadoop, and HDFS environments, with additional experience in Java and J2EE.
- Experienced in developing and implementing MapReduce programs on Hadoop per business requirements.
- Excellent experience with Scala, Apache Spark, Spark Streaming, pattern matching, and MapReduce.
- Developed ETL test scripts based on technical specifications, data design documents, and source-to-target mappings.
- Able to build deployments on AWS, write build scripts (Boto3 & AWS CLI), and automate solutions using shell and Python.
- Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, Auto Scaling groups, and the AWS CLI.
- Experienced in installing, configuring, and administering Hadoop clusters on the major distributions Hortonworks and Cloudera.
- Responsible for logical and physical data modelling for various data sources on Amazon Redshift. Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Wrote scripts to automate data load and performed data transformation operations.
- Experienced in working with different data sources like Flat files, Spreadsheet files, log files and Databases.
- Experienced in working with Flume to load log data from multiple sources directly into HDFS.
- Excellent experience with Apache Hadoop ecosystem components such as the Hadoop Distributed File System (HDFS), MapReduce, Sqoop, Apache Spark, and Scala.
- Wrote scripts and an indexing strategy for a migration to Amazon Redshift from SQL Server and MySQL databases.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases, and with core Java concepts such as OOP, multithreading, collections, and IO.
- Developed Spark scripts using the Scala shell as required, processing both schema-oriented and non-schema-oriented data with Scala and Spark (see the sketch after this summary).
- Experience with the Oozie workflow engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
- Experience with the MapReduce and Pig programming models, and with installation and configuration of Hadoop, HBase, Hive, Pig, Sqoop, and Flume using Linux commands.
- Experience in managing and reviewing Hadoop log files using Flume and Kafka; also developed Pig UDFs and Hive UDFs to pre-process data for analysis.
- Experience with NoSQL databases such as HBase and Cassandra. Supported the weekly production maintenance window and the Solr deployment process for new environments.
- Experience in UNIX shell scripting. Proficient in Linux/UNIX and Windows operating systems.
- Experienced in setting up data gathering tools such as Flume and Sqoop.
- Extensive knowledge of ZooKeeper for various types of centralized configuration.
- Knowledge of monitoring and managing Hadoop cluster using Hortonworks.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Experienced in analyzing, designing, and developing ETL strategies and processes, and writing ETL specifications.
- Experienced in developing applications using Java, Python, and UNIX shell scripting.
- Good interpersonal, communication, and problem-solving skills; a motivated team player.
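A minimal Scala sketch of the kind of Spark processing described above, reading schema-oriented JSON alongside non-schema-oriented log text; the paths, app name, and the Spark 2.x SparkSession API are illustrative assumptions rather than details taken from this resume:

```scala
import org.apache.spark.sql.SparkSession

object SampleIngest {
  def main(args: Array[String]): Unit = {
    // Hypothetical session setup; cluster settings would normally come from spark-submit.
    val spark = SparkSession.builder().appName("sample-ingest").getOrCreate()

    // Schema-oriented data: JSON records whose schema Spark infers on read.
    val events = spark.read.json("hdfs:///data/events/")   // hypothetical path
    events.printSchema()

    // Non-schema-oriented data: raw log lines handled as plain text.
    val logs = spark.read.textFile("hdfs:///data/logs/")    // hypothetical path
    val errorLines = logs.filter(_.contains("ERROR")).count()
    println(s"Error lines: $errorLines")

    spark.stop()
  }
}
```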
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Linux, Oozie, ZooKeeper, Spark, Storm & Kafka
Java & J2EE Technologies: Core Java
IDEs: Eclipse, NetBeans
Big data Analytics: Datameer 2.0.5
Frameworks: MVC, Struts, Hibernate, Spring
Programming languages: C, C++, Java, Python, Ant scripts, Linux shell scripts
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server
Web Servers: Web Logic, Web Sphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP, FTP
ETL Tools: Talend, Informatica, Pentaho, SSRS, SSIS, BO, Crystal Reports, Cognos
Testing: Win Runner, Load Runner, QTP
PROFESSIONAL EXPERIENCE
Confidential, CA
AWS / Hadoop / Scala Developer
Responsibilities:
- Implemented Hadoop cluster on Cloudera and assisted with performance tuning, monitoring and troubleshooting.
- Installed and configured MapReduce, Hive, and HDFS.
- Implementations were done using the Spark APIs and Spark SQL, written in Python.
- Created, altered, and deleted topics (Kafka queues) as required. Performed performance tuning using partitioning and bucketing of Impala tables, and converted the data into relational format for loading into Redshift.
- Developed UDFs in Java for Hive and Pig, and worked on reading multiple data formats from HDFS using Scala.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS. Used big data tooling to load large volumes of source files from S3 into Redshift.
- Worked on iterative data validation and processing on Spark with the help of Scala.
- Created data partitions on large data sets in S3 and DDL on the partitioned data. Analyzed the SQL scripts and designed the solution to implement it using Scala. Developed analytical components using Scala, Spark, and Spark Streaming.
- Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
- Extensive experience with Spark Streaming (version 1.5.2) through the core Spark API, using Scala and Java to transform raw data from several data sources into baseline data.
- Monitored and troubleshot Hadoop jobs using the YARN Resource Manager and EMR job logs using Genie and Kibana.
- Created RDDs in Spark and extracted data from the data warehouse onto the Spark RDDs.
- Created indexes for various statistical parameters on Elasticsearch and generated visualizations using Kibana.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
- Involved in Configuring core-site.xml and mapred-site.xml according to the multi node cluster environment.
- Involved in the development of Pig UDFs to pre-process the data for analysis. Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
- Computed various metrics using Java MapReduce to calculate measures that define user experience, revenue, etc.
- Created, modified, and executed DDL and ETL scripts for de-normalized tables to load data into Hive and AWS Redshift tables.
- Implemented Spark using Python and Spark SQL for faster testing and processing of data.
- Developed Spark scripts using Python shell commands as per the requirement.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest claim data and financial histories into HDFS for analysis. Used the Curator API on Elasticsearch for data backup and restore.
- Used Hive partitioning and bucketing for performance optimization of the Hive tables and created around 20,000 partitions. Imported and exported data into HDFS and Hive using Sqoop.
- Consumed data from Kafka queues using Spark (see the sketch after this list). Configured different topologies for the Spark cluster and deployed them on a regular basis.
- Ran monthly security checks through UNIX and Linux environment and installed security patches required to maintain high level security to the clients.
- Involved in loading data from the Linux file system to HDFS, and importing and exporting data into HDFS and Hive using Sqoop.
- Managed workflow and scheduling for complex MapReduce jobs using Apache Oozie.
- Developed a web application for the department to store and retrieve employee data using Struts, REST services, and a MySQL database.
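A minimal sketch of the Kafka-to-Spark-Streaming consumption mentioned above, assuming the Spark 1.x direct-stream connector for the Kafka 0.8 line; the broker list, topic name, record layout, and HDFS staging path are hypothetical, and loading the landed files into the partitioned Hive or Redshift tables would be a separate step:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaIngestJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-ingest")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Hypothetical broker list and topic name.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics = Set("claims-events")

    // Direct (receiver-less) stream from Kafka; each record is a (key, value) pair.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Keep only well-formed delimited records, then land each micro-batch in an HDFS staging path.
    stream.map(_._2)
      .filter(_.split(",", -1).length == 4)
      .foreachRDD { rdd =>
        if (!rdd.isEmpty()) {
          rdd.saveAsTextFile(s"hdfs:///staging/claims/${System.currentTimeMillis()}")
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```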
Environment: Hadoop, MapReduce, AWS, EMR, HBase, Redshift, Power BI, Elasticsearch, Hive, Impala, Pig, Sqoop, HDFS, Flume, Oozie, Spark, Spark SQL, Python, Spark Streaming, Scala, IntelliJ, Kafka and Cloudera.
Confidential - Houston, TX
Big Data/ Talend Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked extensively with Flume for importing social media data.
- Created Talend Mappings to populate the data into Staging, Dimension and Fact tables.
- Worked on a project to retrieve log messages procured by leveraging Spark Streaming.
- Designed Oozie jobs for the automated processing of similar data, collecting the data using Spark Streaming.
- Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs. Used Scala collection framework to store and process the complex consumer information.
- Used Scala functional programming concepts to develop business logic.
- Assisted in designing and programming of object-oriented databases with Python and other languages.
- Worked with XML, extracting tag information from compressed blob datatypes using XPath and Scala XML libraries.
- Developed Spark scripts by using Scala IDE as per the business requirement.
- Developed Pig scripts in the areas where extensive coding needs to be reduced.
- Worked with Spark Streaming to ingest data into the Spark engine. Extensively used FORALL and BULK COLLECT to fetch large volumes of data from tables.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Handled importing of data from various data sources using Sqoop, performed transformations using Hive, MapReduce and loaded data into HDFS.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
- Worked on running reports in a Linux environment, wrote shell scripts to generate those reports, and used Linux to manage files.
- Hands-on experience productionizing Hadoop applications: administration, configuration management, monitoring, debugging, and performance tuning.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database, and Sqoop.
- Performed data processing using Spark.
- Developed Spark scripts by writing custom RDDs in Scala for data transformations and performed actions on RDDs. Translated high-level design specifications into simple ETL coding and mapping standards.
- Provided cluster coordination services through ZooKeeper. Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
- Developed complex Talend job mappings to load data from various sources using different components. Designed, developed, and implemented solutions using Talend Integration Suite.
- Partitioned data streams using Kafka. Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used the Kafka producer 0.8.3 APIs to produce messages (a minimal producer sketch follows this list).
- Built big data solutions using HBase, handling millions of records for different data trends and exporting them to Hive.
- Designed and developed database operations in PostgreSQL. Experience working with the NoSQL database HBase as well as with PostgreSQL.
- Developed scripts in Hive to perform transformations on the data and load to target systems for use by the data analysts for reporting.
- Developed programs in Java and Scala/Spark to reformat data after extraction from HDFS for analysis.
- Developed Spark code and Spark SQL/Streaming with Scala and SBT for faster processing and testing of data.
- Used the Scala collections framework, along with the Lift and Play frameworks, to store and process complex employer information. Based on the offers set up for each client, requests were post-processed and offers were served.
- Designed an application which receives data from several source systems and ingests it into a PostgreSQL database.
- Used Oozie as the workflow engine and Falcon for job scheduling. Debugged technical issues and resolved errors.
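A minimal sketch of publishing keyed messages with the Scala producer API from the Kafka 0.8 line referenced above; the broker list, topic name, and record contents are hypothetical, and the message key is what spreads records across the topic's partitions:

```scala
import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

object EventPublisher {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("metadata.broker.list", "broker1:9092,broker2:9092") // hypothetical brokers
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    props.put("request.required.acks", "1")   // wait for the partition leader to acknowledge
    props.put("producer.type", "async")       // batch sends for higher throughput

    val producer = new Producer[String, String](new ProducerConfig(props))

    // The message key determines the partition, spreading load across the topic's partitions.
    val events = Seq("customer-17" -> "click,sku-1001", "customer-42" -> "view,sku-2002")
    events.foreach { case (key, value) =>
      producer.send(new KeyedMessage[String, String]("consumer-events", key, value))
    }

    producer.close()
  }
}
```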
Environment: Hadoop, Talend 6.0 Studio, Linux, HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Oozie, Flume, ZooKeeper, Java, SQL, scripting, Scala, SBT, PostgreSQL, Spark, Kafka.
Confidential, Plano, TX
Big Data/Hadoop Developer
Responsibilities:
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Designed an application which receives data from several source systems and ingests it into a PostgreSQL database.
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating HIVE with existing applications.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Involved in loading data from Linux file system to HDFS.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files. Installed and deployed IBM WebSphere.
- Implemented the NoSQL database HBase and managed the other tools and processes running on YARN.
- Tested raw data and executed performance scripts. Shared responsibility for administration of Hadoop, Hive, and Pig. Analyzed, validated, and documented changed records for the IBM web application.
- Experience in deploying applications on heterogeneous application servers: Tomcat, WebLogic, IBM WebSphere, and Oracle Application Server.
- Responsible for developing MapReduce programs using text analytics and pattern matching algorithms.
- Set up and benchmarked Hadoop/HBase clusters for internal use. Assisted the development team in installing single-node Hadoop 224 on a local machine.
- Participated in architectural and design decisions with respective teams. Developed in-memory data grid solution across conventional and cloud environments using Oracle Coherence.
- Used Pig to do transformations, event joins, filters, and some pre-aggregations before storing the data in HDFS.
- Performed analysis with the data visualization tool Tableau. Wrote Pig scripts for data processing.
- These new data items are used for further analytics/reporting purposes, with Cognos reports as the BI component.
- Designed the database, created tables, and wrote complex SQL queries and stored procedures as per the requirements.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for the reporting dashboard. Loaded the aggregated data into DB2 for that dashboard.
Environment: Big Data/Hadoop, Python, Java, Agile, Spark Streaming, PostgreSQL, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, DB2, NoSQL, HBase, IBM WebSphere, Tomcat and Tableau.
Confidential, Peoria, IL
Java developer
Responsibilities:
- Prepared High-Level and Low-Level Design documents implementing applicable design patterns, with UML diagrams to depict components and class-level details.
- Interacting with the system analysts & business users for design & requirement clarification.
- Developed web services using SOAP, SOA, WSDL, and Spring MVC, and developed DTDs and XSD schemas for XML (parsing, processing, and design) to communicate with the Active Directory application using a RESTful API.
- Responsible for the API (SOAP, REST) development and onboarding of the APIs to build the next generation IVR product.
- Very good knowledge of SOAP and REST web service layers.
- Developed JSPs according to requirements. Wrote AngularJS controllers, views, and services.
- Developed integration services using SOA, Web Services, SOAP, and WSDL.
- Designed, developed and maintained the data layer using the ORM framework in Hibernate.
- Involved in analysis, design, development, and production of the application and developed UML diagrams. Developed HTML reports for various modules as per the requirements.
- Used the Spring framework's JMS support for writing to JMS queues and HibernateDaoSupport for interfacing with the database, and integrated Spring with JSF.
- Extensively used the Struts framework for MVC, UI design, and validations.
- Used Struts Tiles libraries for web page layout and performed validations using the Struts validation framework.
- Created components to extract application messages stored in XML files. Used Ant for builds; the application is deployed on the JBoss application server.
- Used JAVA, J2EE application development skills with Object Oriented Analysis and extensively involved throughout Software Development Life Cycle (SDLC).
Environment: Java, JDBC, Spring, JSP, JBoss, Servlets, Web Services, Maven, Jenkins, Flex, HTML, AngularJS, Hibernate, JavaScript, Eclipse, Struts, SQL Server 2000.
Confidential
Java Project
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development, and unit testing.
- Developed and deployed UI layer logics of sites using JSP, XML, JavaScript, HTML/DHTML, and Ajax.
- The Agile Scrum methodology was followed for the development process.
- Developed proto-type test screens in HTML and JavaScript.
- Involved in developing JSPs for client data presentation and client-side data validation within the forms.
- Experience in writing PL/SQL stored procedures, functions, triggers, Oracle reports, and complex SQL queries.
- Worked with JavaScript to perform client-side form validations. Introduced an innovative logging approach for all interdependent applications.
- Used Struts tag libraries as well as Struts tile framework.
- Used JDBC with the Oracle thin driver to access the database for application optimization and efficiency.
- Created connection through JDBC and used JDBC statements to call stored procedures.
- Client side validation done using JavaScript.
- Used the Data Access Object pattern to make the application more adaptable to future and legacy databases.
- Actively involved in tuning SQL queries for better performance.
- Developed the application by using the Spring MVC framework.
- Used the Collections framework to transfer objects between the different layers of the application.
- Developed data mapping to create a communication bridge between various application interfaces using XML, and XSL.
- Proficient in developing applications with exposure to Java, JSP, UML, Oracle (SQL, PL/SQL), HTML, JUnit, JavaScript, Servlets, Swing, DB2, and CSS.
- Actively involved in code review and bug fixing for improving the performance.
- Documented application for its functionality and its enhanced features.
- Successfully delivered all product deliverables with zero defects.
Environment: Spring MVC, Oracle (SQL, PL/SQL), J2EE, Java, Struts, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MS SQL Server 2008