Sr. Big Data Engineer Resume
Texas
SUMMARY
- Over 8 years of professional IT experience, including 4+ years with Big Data ecosystem technologies such as Hadoop, Pig, Hive, Sqoop and HBase, designing and implementing MapReduce jobs to support distributed processing of large data sets on Hadoop clusters.
- Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning and monitoring.
- Excellent understanding of Hadoop architecture and its components, including HDFS (NameNode, DataNode), MapReduce (JobTracker, TaskTracker) and the MapReduce programming paradigm.
- Hands-on experience installing, configuring and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Falcon, Pig, Storm, Kafka, ZooKeeper, YARN and Lucene.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Good working experience in multiple programming languages like Java, Python and Scala.
- Involved in generating automated Falcon and Oozie configuration scripts (YAML files) using Ruby.
- Good knowledge of BI reporting tools such as Tableau.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and core Java design patterns.
- Experienced in SQA (Software Quality Assurance), including manual and automated testing with tools such as Selenium RC/IDE/WebDriver/Grid, JUnit, LoadRunner, JProfiler and RFT (Rational Functional Tester).
- Proficient in deploying applications on J2EE application servers such as WebSphere, WebLogic, GlassFish, Tuxedo and JBoss, and on the Apache Tomcat web server.
- Expertise in developing applications using J2EE Architectures / frameworks like Struts, Spring Framework and SDP (Qwest Communications) Framework.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
- Experience with NoSQL data stores (HBase, Accumulo and MongoDB).
- Implemented POCs using Amazon Cloud components (S3, EC2, Elastic Beanstalk and SimpleDB).
- Experience in database design using PL/SQL to write stored procedures, functions and triggers, and strong experience writing complex queries for Oracle 8i/9i/10g.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
- Ability to meet deadlines and handle multiple tasks; flexible with work schedules and a good communicator.
TECHNICAL SKILLS
Java Technologies: Java, JDK 1.2/1.3/1.4/1.5/1.6.
J2EE Technologies: JSP, JavaBeans, Servlets, JDBC, JPA 1.0, EJB 3.0, JNDI, JOLT, Amazon Cloud (S3, EC2, Elastic Beanstalk and RDS).
Languages: C, C++, PL/SQL, Python and Java.
Frameworks: Hadoop (HDFS, MapReduce, Pig, Hive, HBase, Mahout, Falcon, Oozie, Accumulo, ZooKeeper, YARN, Lucene), Struts 1.x and Spring 3.x.
Web Technologies: XHTML, HTML, XML, XSLT, XPath, CSS, DOM, JavaScript, AngularJS, AJAX, jQuery, GWT, WSDL, SOA, Web Services, Perl, VBScript.
Application Servers: WebLogic 8.1/9.1/10.x, WebSphere 5.x/6.x/7.x, Tuxedo 7.x/9.x, GlassFish 2.x, JBoss 4.x/5.x.
Web Servers: Apache Tomcat 4.0/5.5, Java Web Server 2.0.
Operating Systems: Windows XP/NT/9x/2000, MS-DOS, UNIX, Linux, Solaris, AIX.
Databases: SQL, PL/SQL, Oracle 9i/10g, MySQL, Microsoft Access, SQL Server; NoSQL (HBase, MongoDB).
IDEs: Eclipse 3.x, MyEclipse 8.x, RAD 7.x and JDeveloper 10.x.
Distribution: Cloudera/Hortonworks
Version Control: Win CVS, VSS, PVCS, Subversion, GIT
PROFESSIONAL EXPERIENCE
Confidential, Texas
Sr. Big Data Engineer
Responsibilities:
- Extensive experience programming MapReduce jobs to process complex data formats (structured, semi-structured and unstructured data)
- Experience loading external data to Hadoop environments using tools like Sqoop and Flume
- Scripting experience in Pig for data cleansing and transformation.
- Experience working with very large data sets and building programs that leverage Hadoop's parallel processing capabilities when loading data into Hive.
- Applied Hive join optimization techniques and best practices when writing Hive scripts.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Hands-on experience writing custom UDFs and SerDes in Hive.
- Implemented batch analytics leveraging Spark DataFrames and Spark SQL.
- Designed and implemented a real-time analytics platform using Spark Streaming and Structured Streaming to ingest and process billions of events at scale, storing the aggregated results in NoSQL stores.
- Developed custom Spark UDFs using PySpark (an illustrative sketch follows this list).
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Extensive experience creating and managing EMR clusters and EC2 instances.
- Experience processing data through S3 buckets and managing S3 IAM policies.
- Hands-on experience managing SNS and SQS alerts and triggering them programmatically.
- Experience installing and configuring the ELK stack to analyze application log data.
- Ingested log data from application servers into Elasticsearch using Logstash.
- Created Kibana dashboards to visualize and understand the data stored in Elasticsearch.
- Interacted with data scientists and industry experts to understand how data needs to be converted, loaded and presented.
- Worked with both technical and business-oriented teams as part of daily activities.
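As a minimal, illustrative sketch of the PySpark UDF and Spark SQL work described above (the application name, Hive table, column names and tiering rule below are hypothetical placeholders, not taken from the original project):

```python
# Minimal PySpark sketch: a custom UDF applied over a partitioned Hive table.
# Table, column and function names here are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("batch-analytics-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical custom UDF: bucket event amounts into coarse tiers.
def amount_tier(amount):
    if amount is None:
        return "unknown"
    return "high" if amount >= 1000 else "low"

amount_tier_udf = udf(amount_tier, StringType())

# Read one partition of a (hypothetical) partitioned/bucketed Hive table
# and compute a simple reporting metric per tier.
events = spark.table("analytics.events").where(col("event_date") == "2016-01-01")
metrics = (events
           .withColumn("tier", amount_tier_udf(col("amount")))
           .groupBy("tier")
           .count())
metrics.show()
```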
Environment: Big Data, RedHat Linux, HDFS, Map-Reduce, Hive, Pig, Sqoop, SQL server, Flume, Zookeeper, Oozie, AWS, S3, DB2, Spark, Kafka, GIT and ELK stack (Elasticsearch, Logstash and Kibana).
Confidential, Phoenix, AZ
Sr. Hadoop Developer
Responsibilities:
- Designed, developed and supported a Hadoop based data processing pipeline to process the customer eligibility data.
- Responsible for data accuracy, scalability and integrity on Hadoop platform.
- Worked with management to align solutions with business strategy and objectives.
- Wrote MapReduce jobs to process the data and store it in HDFS.
- Used Pig as an ETL tool for transformations, event joins, bot-traffic filtering and pre-aggregations before storing the data in HDFS.
- Used Sqoop to connect to SQL Server and Oracle databases, move the pivoted data to HDFS and create Hive tables for data analysis.
- Stored the eligibility data in HBase (NoSQL), providing real-time access to the data through REST API calls (an illustrative sketch follows this list).
- Worked in Agile environment (Scrum), which uses Rally to maintain the story points.
- Developed JUnit test cases for testing the application.
- Developed Oozie workflows to automate the application's jobs.
- Used Apigee authentication to secure APIs.
- Developed automated deployments using Jenkins as the continuous integration (CI) tool and Maven as the build tool.
- Made the processed output accessible to Datameer for visualization.
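A minimal sketch of the REST-based access pattern mentioned above, assuming the HBase REST gateway is running; the host, port, table name and row key are hypothetical placeholders:

```python
# Minimal sketch: reading one row through the HBase REST gateway.
# Host, port, table name and row key are hypothetical placeholders.
import base64
import requests

HBASE_REST = "http://hbase-rest-host:8080"  # assumed gateway endpoint

def get_eligibility_row(table, row_key):
    """Fetch one row as JSON and decode its base64-encoded cells."""
    resp = requests.get(f"{HBASE_REST}/{table}/{row_key}",
                        headers={"Accept": "application/json"},
                        timeout=10)
    resp.raise_for_status()
    decoded = {}
    for row in resp.json().get("Row", []):
        for cell in row.get("Cell", []):
            column = base64.b64decode(cell["column"]).decode("utf-8")
            value = base64.b64decode(cell["$"]).decode("utf-8")
            decoded[column] = value
    return decoded

if __name__ == "__main__":
    print(get_eligibility_row("eligibility", "member-12345"))
```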
Environment: Hadoop, HDFS, Sqoop, Pig, Hive, HBase, Map Reduce, Maven, Jenkins, Oozie, Tomcat, Datameer and Jmockit.
Confidential, Minneapolis, MN
Sr. Hadoop Developer
Responsibilities:
- Configured different topologies for the PriceIndexing Storm cluster and deployed them on a regular basis.
- Consumed data from the Kafka queue using Storm and applied a rules engine to determine whether to drive sales or drive margin.
- Used Sqoop to connect to DB2, move the data to HDFS and create Hive tables.
- Developed job processing scripts using Oozie workflow.
- Involved in the design and implementation of the Hadoop solution.
- Developed integration code for accessing Oracle and DB2 databases.
- Involved in day-to-day standups and worked closely with clients and BAs.
- Developed unit test cases using the JMockit framework and automated the scripts.
- Worked in an Agile environment using VersionOne to maintain story points.
- Set up cluster environments on OpenStack servers for Storm, Kafka and ZooKeeper.
- Maintained different cluster security settings and involved in creation and termination of multiple cluster environments.
- Installed and ran MongoDB with multiple instances per server.
- Configured server-side settings for MongoDB database servers.
- Created high-availability, load-balanced MongoDB clusters.
- Implemented and managed high availability (replication) and load balancing (sharding) for MongoDB clusters holding terabytes of data (a minimal sketch follows this list).
- Optimized MongoDB CRUD operations.
- Monitored deployments for capacity and performance.
- Implemented MMS (MongoDB Management Service) monitoring and backup on cloud and on-premise servers.
- Migrated MongoDB sharded/replica clusters from one datacenter to another without downtime.
- Managed and monitored large production MongoDB sharded cluster environments holding terabytes of data.
- Implemented the ELK stack (Elasticsearch, Logstash, Kibana) to collect and analyze audit logs produced by the PriceIndexing Storm cluster: Logstash for log collection, Elasticsearch for search and Kibana for data visualization.
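A minimal sketch of enabling sharding with PyMongo, assuming a mongos router is reachable; the host, database, collection and shard key below are hypothetical placeholders:

```python
# Minimal sketch: enabling sharding for a collection via a mongos router.
# Host, database, collection and shard key are hypothetical placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-router:27017")

# Enable sharding on the database, then shard the collection on a hashed key.
client.admin.command("enableSharding", "pricing")
client.admin.command("shardCollection", "pricing.price_events",
                     key={"_id": "hashed"})

# Inspect the shards registered with the cluster.
print(client.admin.command("listShards"))
```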
Environment: Big data, HDFS, Hive, Sqoop, Oozie, Storm, Kafka, MongoDB, Git, VersionOne, Maven, J2EE, Jmockit, Jenkins, Logstash, Elasticsearch and Kibana.
Confidential, Boston, MA
Hadoop Developer
Responsibilities:
- Launched and set up Hadoop-related tools on AWS, including configuring the different Hadoop components.
- Used Sqoop to connect to SQL Server and Oracle databases and move the pivoted data into Hive tables stored as Avro files.
- Managed the Hive database, which involved data ingestion and indexing.
- Exported data from Avro files and indexed the documents in SequenceFile or SerDe-backed formats (an illustrative Avro-reading sketch follows this list).
- Hands-on experience writing custom UDFs as well as custom input and output formats.
- Scheduled Hive jobs using Oozie and Falcon process files.
- Developed Map Reduce jobs to store the data in to HBase tables.
- Involved in design and architecture of custom Lucene storage handler.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Understanding of the Ruby scripts used to generate YAML files.
- Maintained the test mini-cluster using Vagrant and VMware Fusion.
- Experience in working with BI reporting tools to create dashboards.
- Involved in GUI development using JavaScript, AngularJS and Guice.
- Developed unit test cases using the JMockit framework and automated the scripts.
- Worked in an Agile (Kanban) environment using Jira to maintain story points.
- Involved in implementing Kerberos secured environment for Hadoop cluster.
- Maintained builds in Bamboo and resolved build failures.
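A minimal sketch of reading Avro-backed export data in Python, assuming the fastavro library; the file path and field names are hypothetical placeholders:

```python
# Minimal sketch: iterating over records in an Avro container file.
# The file path and field names are hypothetical placeholders.
from fastavro import reader

def read_avro_records(path):
    """Yield records from an Avro container file one at a time."""
    with open(path, "rb") as fh:
        for record in reader(fh):
            yield record

if __name__ == "__main__":
    for rec in read_avro_records("/data/exports/customers.avro"):
        print(rec.get("customer_id"), rec.get("status"))
```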
Environment: Hadoop, Big Data, Hive, HBase, Sqoop, Accumulo, Oozie, Falcon, HDFS, MapReduce, Jira, Bitbucket, Maven, Bamboo, J2EE, Guice, AngularJS, JMockit, Lucene, Storm, Ruby, Unix, SQL, AWS (Amazon Web Services).
Confidential, Hoffman Estates, IL
Hadoop Developer
Responsibilities:
- Involved in the design and development phases of the Software Development Life Cycle (SDLC) using Scrum methodology.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer profiles and purchase histories into HDFS for analysis.
- Developed job flows in Oozie to automate the workflow for extracting data from warehouses.
- Used Pig as an ETL tool for transformations, event joins, bot-traffic filtering and pre-aggregations before storing the data in HDFS.
- Applied pattern-matching algorithms in Hive to match customers' spending habits with loyalty points and stored the output in HBase (an illustrative sketch follows).
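The original pipeline used Pig and Java MapReduce; purely as an illustrative sketch of the aggregation idea, here is a Hadoop Streaming-style mapper/reducer in Python with hypothetical field positions and a hypothetical loyalty-point rule:

```python
# Illustrative Hadoop Streaming-style sketch (the original used Java MapReduce/Pig).
# Field positions and the points rule are hypothetical placeholders.
import sys

def mapper():
    """Emit (customer_id, purchase_amount) pairs from tab-separated input."""
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            customer_id, amount = fields[0], fields[2]
            print(f"{customer_id}\t{amount}")

def reducer():
    """Sum purchase amounts per customer (input sorted by key) and emit points."""
    current_id, total = None, 0.0
    for line in sys.stdin:
        customer_id, amount = line.rstrip("\n").split("\t")
        if customer_id != current_id and current_id is not None:
            print(f"{current_id}\t{int(total // 10)}")  # 1 point per $10 (hypothetical)
            total = 0.0
        current_id = customer_id
        total += float(amount)
    if current_id is not None:
        print(f"{current_id}\t{int(total // 10)}")

if __name__ == "__main__":
    (mapper if sys.argv[1:] == ["map"] else reducer)()
```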
Environment: JDK 1.6, RHEL, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Mahout, HBase
Confidential, Houston, TX
Java/J2EE Developer
Responsibilities:
- Used the Hibernate ORM tool as the persistence layer, using database and configuration data to provide persistence services (and persistent objects) to the application.
- Responsible for developing the DAO layer using Spring MVC and Hibernate configuration XMLs, and for managing CRUD operations (insert, update and delete).
- Implemented dependency injection with the Spring framework.
- Developed reusable services using BPEL to transfer data.
- Created JUnit test cases, and Development of JUnit classes.
- Implemented and configured Log4j for application logging, including enabling and disabling logging in the application.
- Developed rich user interfaces using HTML, JSP, AJAX, JSTL, JavaScript, jQuery and CSS.
- Implemented PL/SQL queries and procedures to perform database operations.
- Wrote UNIX Shell scripts and used UNIX environment to deploy the EAR and read the logs.
Environment: Java, Jest, SOA Suite 10g (BPEL), Struts, Spring, Hibernate, Web Services (JAX-WS), JMS, EJB, WebLogic 10.1 Server, JDeveloper, SQL Developer, HTML, LDAP, Maven, XML, CSS, JavaScript, JSON, SQL, PL/SQL, Oracle, JUnit, CVS and UNIX/Linux.
Confidential
SQL Server Developer
Responsibilities:
- Created new database objects such as procedures, functions, packages, triggers, indexes and views using T-SQL in development and production SQL Server environments.
- Developed Database Triggers to enforce Data integrity and additional Referential Integrity.
- Developed SQL queries to fetch complex data from different tables in remote databases using joins and database links, formatted the results into reports and kept logs.
- Involved in performance tuning and monitoring of both T-SQL and PL/SQL blocks.
- Wrote T-SQL procedures to generate DML scripts that modified database objects dynamically based on user inputs.
Environment: SQL Server 7.0, Oracle 8i, Windows NT, C++, HTML, T-SQL, PL/SQL, SQL Loader.