Senior Data Engineer Resume
Sunnyvale, CA
SUMMARY
- Results-oriented software developer with more than 8 years of experience, passionate about learning new technologies to solve real-world problems. An avid learner, good at taking initiative.
- Excellent understanding of Hadoop architecture and its components, including HDFS, NameNode, DataNode, JobTracker, TaskTracker, and MapReduce.
- Hands-on experience with the Hadoop ecosystem, including Hive, Sqoop, HBase, Flume, and Oozie.
- Hands-on experience using Spark Streaming and Spark batch processing for streaming and batch data.
- Used Spark SQL and HQL queries to analyze data in HDFS.
- Experience in Hadoop Distributions: Cloudera.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop. Designed data flows with Flume and configured its individual components.
- Experience maintaining distributed storage (HDFS) and columnar storage (HBase).
- Worked with relational databases like Oracle and NoSQL databases like HBase.
- Experience in analyzing data using Pig scripting and Hive Queries.
- Experience applying optimization techniques in Hive and MapReduce jobs.
- Experience in job flow scheduling systems like Oozie.
- Good knowledge of AWS services (EMR, EC2, S3).
- Experience in core Java technologies, including multi-threading and collections.
- Experience developing applications using Java/J2EE technologies. Experience with Object-Relational Mapping (ORM) persistence technologies such as Hibernate.
- Experience working with web servers and application servers such as Apache Tomcat, WebLogic, and WebSphere.
- Experience automating manual test cases using the JUnit framework and helping teams enhance their JUnit infrastructure.
- Good knowledge of OLAP and OLTP processes.
- Good knowledge of reporting tools such as Webi, Crystal Reports, and Tableau.
- Experienced with source control systems such as Git and SVN.
- Worked in agile software development teams.
- Excellent communication, interpersonal, and problem-solving skills; a strong team player with a positive attitude.
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, Flume, Spark
Hadoop Distribution: Cloudera (CDH 5.4)
Operating System: Windows, UNIX/Linux
Languages: Java, SQL, UNIX/Linux Shell Scripts, Python, Scala
Java Technologies: JDBC 3.0, Servlets, JSP
Web/App Servers: Tomcat 5.0
ORM: Hibernate
Markup Languages: HTML, CSS
IDEs: IntelliJ IDEA, Eclipse 4.2
Databases: Teradata 14.0, Oracle 8.0
Reporting Tools: Business Intelligence (Webi, Crystal Reports), Tableau
SCM Tools: Git
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, Sunnyvale, CA
Senior Data Engineer
Responsibilities:
- Worked on a data pipeline ingesting petabytes of data from servers into Hadoop using Flume and Kafka, extracting meaningful information from these logs with Spark. Avro files ingested by Flume are processed by Spark Streaming, the extracted information is stored in Parquet format, and Hive and Impala tables are created on top of this data so that analytics can be performed on it (see the Spark sketch after this list).
- Performed enhancements on existing jobs, based on requirements, using Spark and Scala.
- Created streaming and batch jobs using Spark to process the incoming logs from Flume.
- Created a compaction job to compact the Avro and Parquet files.
- Orchestrated the entire data pipeline using Oozie and maintained the jobs, resolving issues such as stuck jobs and reruns.
- Created Grok files and used the Grok tool to parse the JSON data in the Avro files.
- Resolved issues related to the CDH 6.2 upgrade and upgraded the code from Spark 1.6 to Spark 2.3 as part of the CDH upgrade.
- Designed the process for copying data from one cluster to another and configured the job for the target cluster to process the data.
- Designed new jobs to perform ETL operations on new logs and launched them in production.
- Wrote complex Hive queries to perform data analytics.
- Troubleshot Hadoop and Spark job issues such as out-of-memory errors, RPC protocol failures, and malformed JSON using the Resource Manager, Hue, and the command line.
- Wrote shell scripts to simulate the streaming job using batch data and to test the streaming jobs.
- Designed a customized testing framework to test the existing Spark jobs using YAML files and a Scala testing framework.
- Designed a data pipeline to perform ETL of data from the Hive server to Cassandra using notebooks, Python, and Java web services.
- Created web services and wrote Python code connecting publishIt() in a Jupyter notebook to a Python script, which in turn invokes the web service and processes the data.
- Using the Jupyter UI, wrote a query and a cron expression that connect to the Cassandra database and store the data in the column family LookupQuery under the keyspace event, using JavaScript.
- Enhanced the EdwProxy service in Java to capture queries based on the cron expression from the Cassandra column family LookupQuery, run them using Hive JDBC, and store the results back in a Cassandra column family; also researched limitations of the Hive JDBC driver compared to other databases' JDBC drivers.
- Set up remote debugging in IntelliJ, connecting to the process running on the server with debug mode enabled and tunneling from the local machine.
- Deployed the processes to the dev server from Git using Jenkins and tested them.
- Tested the code on a virtual cluster created with Docker using dtest.
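For illustration, a minimal batch-style sketch of the Avro-to-Parquet-to-Hive step described above, in Java. The paths, column names, and table name are hypothetical, and it assumes a spark-avro package is on the classpath; the production job ran as a Spark Streaming application rather than this simplified batch form.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

/** Hypothetical batch variant of the log-curation job: Avro in, Parquet out, Hive table on top. */
public class LogCurationJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("log-curation")
                .enableHiveSupport()          // needed so spark.sql() can create Hive tables
                .getOrCreate();

        // Read the Avro files landed by Flume (path and format provider are placeholders;
        // assumes the spark-avro package is available).
        Dataset<Row> events = spark.read()
                .format("com.databricks.spark.avro")
                .load("hdfs:///data/raw/events");

        // Keep only the fields needed downstream and write them as date-partitioned Parquet.
        events.select("event_date", "host", "level", "message")
                .write()
                .mode(SaveMode.Append)
                .partitionBy("event_date")
                .parquet("hdfs:///data/curated/events");

        // Expose the curated data to Hive/Impala as an external table.
        spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS curated_events "
                + "(host STRING, level STRING, message STRING) "
                + "PARTITIONED BY (event_date STRING) STORED AS PARQUET "
                + "LOCATION 'hdfs:///data/curated/events'");
        spark.sql("MSCK REPAIR TABLE curated_events");

        spark.stop();
    }
}
```

Partitioning the Parquet output by date keeps the Hive/Impala queries selective, and MSCK REPAIR TABLE registers newly written partitions with the metastore.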
Environment: Hadoop, MapReduce, Spark 1.6, Spark 2.3, YARN, Hive, Hue, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g, Core Java, CDH 5.9, CDH 6.2, IntelliJ, Agile, Cassandra, Jupyter, Python 2.7, Jenkins, Git.
Confidential, Foster City, CA
Senior Hadoop Developer
Responsibilities:
- Designed a data pipeline ingesting data from Oracle into Hadoop and sending it from Hadoop to the mainframe using Connect:Direct. The mainframe processes the data, updates the records, and sends the data back to Hadoop. After processing with Hive and Java MapReduce, the data is loaded via Sqoop into the staging table in the Oracle database.
- Involved in setting up the Hadoop infrastructure for the project and designed the production deployment process.
- Filtered the required data using Hive queries, generated a file from it, and sent it as a request to the external mainframe server using Connect:Direct.
- Developed shell scripts and Java code to perform cryptographic operations on the data by converting it into a JSON object and sending the request to RESTful web services for data security (see the sketch after this list).
- Created Hive tables in Avro format, a flexible format for communicating with the web service for encryption and decryption.
- Developed complex Hive queries to aggregate and validate the data; Hive queries run from shell scripts capture faulty records and store them in log files.
- Sent files from Hadoop to other external sources using the SFTP protocol.
- Wrote optimized Sqoop queries to export data from Hadoop to Oracle and created an optimized Sqoop job to migrate these records to the Oracle database.
- Analyzed Hadoop jobs using MapReduce logs in Hue.
- Maintained the scripts in a Git repository. The scripts are deployed to the QA environment using Jenkins, and artifacts are published to Artifactory via Jenkins. These artifacts are later deployed to the production server as a .bsx file and unpacked to install the code.
- Used the CA Workload Automation tool (DSERIES) to orchestrate the jobs.
- Followed Agile methodology, providing daily updates in the Agile portal and Scrum meetings.
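A minimal sketch of the JSON-over-REST encryption call described above, in Java using only the standard library. The endpoint URL, field names, and response handling are placeholders, not the actual service contract.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

/** Hypothetical client for the encryption web service: wraps a field in JSON and POSTs it. */
public class EncryptionClient {

    // Placeholder endpoint; the real service URL is not part of this sketch.
    private static final String ENCRYPT_URL = "https://security-gateway.example.com/api/encrypt";

    public static String encrypt(String fieldName, String clearValue) throws Exception {
        // Minimal JSON body; a real implementation would use a JSON library to handle escaping.
        String payload = String.format("{\"field\":\"%s\",\"value\":\"%s\"}", fieldName, clearValue);

        HttpURLConnection conn = (HttpURLConnection) new URL(ENCRYPT_URL).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }

        // Read the ciphertext returned by the service.
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }
}
```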
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g, Core Java, Cloudera HDFS, Eclipse, Agile.
Confidential, Richmond, VA
Senior Hadoop Developer
Responsibilities:
- Involved in all phases of the Big Data implementation, including requirement analysis, design, development, building, testing, and deployment of the Hadoop cluster in fully distributed mode.
- Created Linux and Python scripts to automate the daily ingestion of raw data.
- Processed the raw data using Hive jobs scheduled in crontab.
- Developed Hive UDFs to derive the MDK and GeoIP values (see the sketch after this list).
- Moved data to the appropriate partition based on the record-level timestamp (as the log files contain more than one day's worth of data).
- Compressed transformed/enriched data files with the BZip2Codec.
- Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
- Implemented two different processes for internal and external weblogs.
- Managed and reviewed Hadoop log files.
- Imported real-time data into Hadoop using Flume.
- Involved in ETL, data integration, and migration.
- Supported and troubleshot Hive programs running on the cluster.
- Involved in fixing issues arising from duration testing.
- Handled structured, semi-structured, and unstructured data.
- Automated the History and Purge Process.
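A minimal sketch of a Hive UDF of the kind mentioned above. The class name and the lookup rule are placeholders; the real UDF resolved GeoIP values from an external data source.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Hypothetical GeoIP-style Hive UDF; the lookup logic is a stand-in to keep the sketch self-contained. */
public class GeoIpCountryUdf extends UDF {

    public Text evaluate(Text ipAddress) {
        if (ipAddress == null) {
            return null;
        }
        // Placeholder lookup; a real implementation would consult a GeoIP database.
        String country = lookupCountry(ipAddress.toString());
        return country == null ? null : new Text(country);
    }

    private String lookupCountry(String ip) {
        // Hypothetical rule used only for illustration.
        return ip.startsWith("10.") ? "INTERNAL" : "UNKNOWN";
    }
}
```

Such a UDF is packaged into a jar, registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, and then called like any built-in function from the enrichment queries.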
Environment: Hadoop 2.x, Hive 0.13.1, Python, Unix Scripts, HDP 2.3, Redhat Linux
Confidential, Maplewood, MN
Hadoop Developer
Responsibilities:
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in writing MapReduce jobs.
- Used Sqoop and HDFS put/copyFromLocal to ingest data.
- Worked with different file formats such as text files, Avro data files, and SequenceFiles, using compression codecs such as Snappy and gzip.
- Used Pig to do transformations, event joins, filter both traffic and some pre-aggregations before storing the data onto HDFS.
- Involved in developing Pig UDFs for the needed functionality that is not out of the box available from Apache Pig.
- Managed and reviewed Hadoop log files.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Involved in developing Hive DDL to create, alter, and drop Hive tables.
- Involved in developing Hive UDFs for the needed functionality that is not out of the box available from Apache Hive.
- Computed metrics that define user experience, revenue, and similar measures using Java MapReduce and core Java concepts such as collections (see the sketch after this list).
- Used Pig as an ETL tool to do transformations, event joins, filters, and some pre-aggregations before storing the data in HDFS.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS. Designed and implemented various metrics that can statistically signify the success of an experiment.
- Used Eclipse and ANT to build the application.
- Used Sqoop for importing and exporting data into HDFS and Hive.
- Involved in processing ingested raw data using MapReduce, Apache Pig and Hive.
- Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
- Involved in pivoting HDFS data from rows to columns and columns to rows.
- Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop and HDFS get/copyToLocal.
- Involved in developing shell scripts to orchestrate the execution of the other scripts (Pig, Hive, MapReduce) and move data files within and outside HDFS.
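A minimal sketch of a Java MapReduce metric job of the kind described above, computing per-user event counts. The input delimiter, field position, and class names are assumptions, not the actual metric definitions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Hypothetical per-user event-count metric; field positions and delimiter are assumptions. */
public class UserEventCount {

    public static class EventMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text userId = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");   // assumed tab-delimited log line
            if (fields.length > 1) {
                userId.set(fields[1]);                         // assumed: user id in the second column
                context.write(userId, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable v : values) {
                total += v.get();
            }
            context.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "user-event-count");
        job.setJarByClass(UserEventCount.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);   // sum is associative, so the reducer doubles as a combiner
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```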
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Core Java, Cloudera HDFS, Eclipse.
Confidential, Kent, WA
Java/J2EE Developer
Responsibilities:
- Involved in complete life cycle of software development including designing, developing, testing, and deployment of application.
- Designed and developed picking/put-away screens using Eclipse, HTML, JSP, Servlets, and JavaScript.
- Developed the user interface using CSS/HTML, JSTL, and jQuery.
- Implemented the web interface using Spring MVC.
- Developed server-side components using Spring Core, Spring Authentication Manager, and Spring CRUD Repository.
- Utilized Hibernate for Object/Relational Mapping purposes for transparent persistence to the Oracle 10g Database.
- Created Java servlets, JSPs, and other classes deployed as an EAR file, connecting to the Oracle database using JDBC and Hibernate.
- Involved in software development using Agile methods.
- Involved in reviews of test scenarios to ensure requirement coverage.
- Developed a tool to retrieve and send data to a third-party service provider. This feature was implemented using REST.
- Used JSON as the response type in REST services.
- Performed unit testing and performance testing using JUnit and Mockito.
- Involved in performance and SQL query optimization.
- Developed REST-based services using the Spring MVC architecture (see the sketch after this list).
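A minimal sketch of a Spring MVC REST endpoint returning JSON, of the kind described above. The resource name, URL, and DTO are hypothetical; the real controllers backed the picking/put-away and related screens.

```java
import java.util.Arrays;
import java.util.List;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

/** Hypothetical Spring MVC endpoint; names and data are placeholders, not the real application API. */
@Controller
@RequestMapping("/api/orders")
public class OrderController {

    /** Minimal DTO; with Jackson on the classpath Spring serializes it to JSON. */
    public static class Order {
        public long id;
        public String status;
        public Order(long id, String status) { this.id = id; this.status = status; }
    }

    // GET /api/orders/{customerId} returns JSON via @ResponseBody and the JSON message converter.
    @RequestMapping(value = "/{customerId}", method = RequestMethod.GET, produces = "application/json")
    @ResponseBody
    public List<Order> ordersForCustomer(@PathVariable("customerId") long customerId) {
        // Canned data in place of the real service/DAO layer.
        return Arrays.asList(new Order(1L, "PICKED"), new Order(2L, "PUT_AWAY"));
    }
}
```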
Confidential, Kansas City, MO
Java/J2EE Developer
Responsibilities:
- Responsible for developing new requirements for the presentation layer according to the FSA and USDA style guides.
- Preparing technical specifications, work assignments, coding and unit testing.
- Responsible for the development of Assessment Calculation reports module.
- JUnit was used for unit testing of the application.
- Maven was used to build the application in the Eclipse 3.2 IDE.
- Responsible for implementing the reports module using the Spring MVC framework.
- Responsible for reviewing and approving project documents such as the design document and database design documents.
- Responsible for designing the process configuration and event representation.
- Developed SOAP web services, WSDL files, and event processing (see the sketch after this list).
- Responsible for the requirement gathering from the customer and finalizing the features.
- Responsible for regularly updating the senior project manager on project status.
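A minimal JAX-WS sketch of a SOAP web service with a generated WSDL, as described above. The service name, operation, and formula are placeholders, not the actual assessment-calculation logic.

```java
import javax.jws.WebMethod;
import javax.jws.WebParam;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

/** Hypothetical JAX-WS endpoint standing in for the assessment-calculation service. */
@WebService
public class AssessmentService {

    @WebMethod
    public double calculateAssessment(@WebParam(name = "principal") double principal,
                                      @WebParam(name = "rate") double rate) {
        // Placeholder formula; the real calculation rules lived in the reports module.
        return principal * rate;
    }

    public static void main(String[] args) {
        // Publishing the endpoint exposes the WSDL at <address>?wsdl.
        Endpoint.publish("http://localhost:8080/assessment", new AssessmentService());
    }
}
```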
Environment: JDK 6.0, EJB, JSP, WMI, Portlets, JetSpeed Portal Server 2.1.3, GlassFish V2, NetBeans IDE 6.0, JBoss, Dojo, Web Services, VBScript, Apache POI, Hibernate 3.0, jQuery, Ajax, JMX, LDAP, JMS, XML, JSTL, Struts 2.
Confidential, Jacksonville, FL
Java /J2EE Developer
Responsibilities:
- Involved in Analysis, design, coding and testing.
- Providing analysis, architecture, design, code, review and perform integration/unit testing.
- Designed service-oriented middle-tier libraries/components to communicate with other applications and the database.
- Generated User interface Templates using JSP, HTML, CSS.
- Developed the Java Code using JBuilder as IDE.
- The presentation tier of the application was built entirely on the Struts framework.
- Developed many action classes and action forms.
- Heavily used Struts tag libraries, exception handling, and validators in Struts development.
- Developed client-side validations using JavaScript.
- Extensively involved in requirement analysis, design analysis, bug fixes and documentation.
- Followed Agile software methodology, using short iteration sprints and Scrum.
- Implemented Hibernate framework for communicating with database.
- Involved in writing and executing complex HQL queries (see the sketch after this list).
- Used REST web services to integrate with external modules.
- Responsible for Query Optimization of Library system and accounting system.
- Responsible for migrating MySQL 4.0 to MySQL 5.0.
- Performed unit testing using JUnit; the Advanced REST Client was used to test REST services.
- Log4j was used for logging.
- Used Maven for building the application.
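A minimal sketch of a Hibernate HQL lookup of the kind described above. The Book entity, its properties, and the DAO name are hypothetical examples for the library system, not the actual mappings.

```java
import java.util.Date;
import java.util.List;

import org.hibernate.Query;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

/** Hypothetical DAO showing the HQL style used; "Book" is assumed to be a mapped entity. */
public class OverdueBookDao {

    private final SessionFactory sessionFactory;

    public OverdueBookDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public List<?> findOverdue(Date asOf) {
        Session session = sessionFactory.openSession();
        try {
            // HQL queries entities and their properties rather than tables and columns.
            Query query = session.createQuery(
                    "from Book b where b.dueDate < :asOf and b.returned = false order by b.dueDate");
            query.setParameter("asOf", asOf);
            return query.list();
        } finally {
            session.close();
        }
    }
}
```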
Environment: Java 1.4, JSP, Log4j, JBuilder, JUnit, Maven, Hibernate, MySQL 4.x, HQL, HTML, CSS, Ajax, XML, Struts framework, Text Pad, Windows 2000/NT.
Confidential
Software Developer
Responsibilities:
- Developed and refactored web pages using HTML, JSP, JavaScript, and CSS.
- Implemented complex business logic in core Java.
- Worked with the Eclipse 3.2 IDE as the application development environment.
- Implemented the Struts Model View Control (MVC) structure.
- Configured Struts-Config.xml.
- Designed database components using SQL and PL/SQL.
- Performed unit and functional testing on code changes.
- Created and maintained data in the SQL Server database.
- Used JDBC for database connectivity. Designed and implemented complex SQL queries (see the sketch after this list).
- Used JavaScript for client-side validations.
- Optimized application performance using various JSP and EJB techniques, such as caching static and dynamic data, partially flushing data, and choosing the right include mechanism.
- Ant was used for building the application.
- Analysis & study of the new enhancements and guiding the team on the requirements.
Environment: Struts 1.1, Eclipse 3.2, WebSphere 4.0, JSP, HTML, CSS, JDBC, Ant 1.5 and SQL Server 2000.