Big Data Engineer Resume
Bronx, NY
SUMMARY:
- 9+ years of experience in information technology, developing and testing Java/J2EE applications, with expertise in Hadoop/Big Data development and web-based technologies over a range of back-end databases.
- Good knowledge of and exposure to Big Data processing using the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce (MRv1 and YARN), Sqoop, Flume, Kafka, Oozie, ZooKeeper, Spark and Impala.
- Experience with the Cloudera, Hortonworks, MapR and Amazon Web Services distributions of Hadoop.
- Experience in installing, configuring and using ecosystem components like Hadoop MapReduce, Sqoop, Pig, Hive, HBase, HDFS, Impala and Spark.
- Good knowledge of and exposure to the Hadoop architecture and its components, such as HDFS, JobTracker, NameNode, DataNode and TaskTracker.
- Experience in writing custom UDFs in Java to extend Hive and Pig core functionality.
- Good understanding of NoSQL databases and hands-on experience with Apache HBase.
- Expertise in using Sqoop to transfer data between the Hadoop ecosystem and structured data storage in an RDBMS such as MySQL, Oracle, Teradata or DB2.
- Extensive experience in Oracle database design and application development, with in-depth knowledge of SQL and PL/SQL.
- Expertise in various Java/J2EE technologies like JSP, Servlets, Hibernate, Struts and Spring.
- Experience in using the Oozie, Control-M and Autosys workflow engines for managing and scheduling Hadoop jobs.
- Experience with Software Development Life Cycle (SDLC) models such as Waterfall and Agile methodologies, including Test-Driven Development, Scrum and pair programming.
- Good knowledge of web-based UI development using jQuery UI, jQuery, ExtJS, CSS3, HTML, HTML5, XHTML and JavaScript.
- Experience with unit, functional, system and integration testing of applications using JUnit, Mockito, PowerMock, EasyMock, Jasmine and Cucumber.
- Experience in using IDEs like Eclipse and Visual Studio, and DBMSs like Oracle and MySQL.
- Experience working with Windows and Linux-based operating systems such as Windows 7/8, Ubuntu, CentOS and Fedora.
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Apache Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Flume, Hue, HBase, YARN, Oozie, ZooKeeper, MapR Converged Data Platform, Apache Spark, Apache Kafka
Web Technologies: JavaScript, HTML, CSS, XML, AJAX, SOAP
Frameworks: Spring, Hibernate, Struts
Languages: Java, Python, C, C++, SQL, PL/SQL, Ruby, Bash and Perl
SQL/NoSQL Databases: Apache HBase, MongoDB, Cassandra, MS SQL Server, MySQL
Application Servers: WebLogic, WebSphere, Apache Tomcat & JBoss
Testing Frameworks: JUnit, Mockito, PowerMock, EasyMock, Jasmine, Cucumber
Version Control: Git, Subversion, CVS, ClearCase
Documentation Tools: MS Office, iWork, MS Project, MS SharePoint
Operating Systems: Windows, Mac OS, Linux (Ubuntu, CentOS, Fedora)
PROFESSIONAL EXPERIENCE:
Confidential, Bronx, NY
Big Data Engineer
Responsibilities:
- Worked with business stakeholders to translate business objectives and requirements into technical requirements and designs.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from multiple source systems into the Macy's Hadoop Data Lake.
- Developed a process for Sqooping data from multiple sources such as SQL Server, Oracle, Teradata and DB2.
- Migrated all SQL sources to the Hadoop Data Lake, loading or moving the data into one or more logical layers (Staging, Landing and Semantic) with the same schema as the source.
- Developed Scala wrappers to generate HiveQL scripts that load or move data between the different logical layers of the Hive Hadoop Data Lake.
- Involved in creating Hive tables, loading data and writing Hive queries per business requirements.
- Performed data transformations in Hive and Spark SQL.
- Implemented partitioning and bucketing of data in Hive to improve performance (see the sketch after this list).
- Developed and supported MapReduce programs running on the cluster.
- Wrote both DML and DDL operations against the Cassandra NoSQL database.
- Developed analytical components using Scala, Spark and Spark Streaming.
- Implemented Flume, Spark and Spark Streaming for real-time data processing.
- Developed a prototype for Big Data analysis using Spark, RDDs, DataFrames and the Hadoop ecosystem with CSV, JSON and Parquet files on HDFS.
- Developed Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizations using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
- Wrote Spark programs in Scala and migrated existing MapReduce programs to Spark.
- Created source-to-target mapping documents tracing source fields to destination fields.
- Developed a shell script to create Staging, Landing and Semantic tables with the same schema as the source.
- Developed HiveQL scripts to perform transformation logic and load the data from the Staging zone into the Landing and Semantic zones.
- Responsible for debugging and optimizing Hive scripts.
- Automated all jobs that pull data from FTP servers or SQL sources into Hive tables using Control-M.
- Created HBase tables to store the final aggregated data from the Hadoop system.
- Generated reports on Hive tables for different scenarios using Tableau.
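For illustration, a minimal Java sketch of the partitioned, bucketed Hive load described above, driven through Spark SQL; the table names, columns and bucket count are hypothetical stand-ins, not the actual schema:

    import org.apache.spark.sql.SparkSession;

    public class SemanticLoader {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("staging-to-semantic")
                    .enableHiveSupport()   // lets spark.sql() run HiveQL against the metastore
                    .getOrCreate();

            // Hypothetical Semantic-layer table, partitioned by date and bucketed by customer.
            spark.sql("CREATE TABLE IF NOT EXISTS semantic.orders ("
                    + " order_id BIGINT, customer_id BIGINT, amount DECIMAL(12,2))"
                    + " PARTITIONED BY (order_date STRING)"
                    + " CLUSTERED BY (customer_id) INTO 32 BUCKETS"
                    + " STORED AS ORC");

            // Dynamic-partition load from a same-schema Staging table.
            spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict");
            spark.sql("INSERT OVERWRITE TABLE semantic.orders PARTITION (order_date)"
                    + " SELECT order_id, customer_id, amount, order_date FROM staging.orders");

            spark.stop();
        }
    }

Partitioning prunes whole directories at query time, while bucketing clusters rows by a hash of customer_id so joins and sampling on that key read fewer files.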
Environment: HDP 2.2.4.2, Hive, Pig, Oozie, Sqoop, Flume, Spark, Spark SQL, Scala, HBase, Cassandra, SAP HANA, SAP BODS, Tableau
Confidential, Gardner, KS
Sr. Big Data Engineer
Responsibilities:
- Analyzed the Hadoop cluster using different big data analytic tools, including Kafka, Pig, Hive and MapReduce.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala (see the sketch after this list).
- Implemented Spark jobs using Scala and Spark SQL for faster analysis and processing of data.
- Handled importing and exporting data into HDFS and Hive using Sqoop and Kafka.
- Involved in creating Hive tables, loading the data and writing Hive queries, which run internally as MapReduce jobs.
- Designed and developed ETL workflows in Java for processing data in HDFS/HBase, orchestrated with Oozie.
- Imported unstructured data into HDFS using Flume.
- Wrote complex Hive queries and UDFs.
- Exported data from HDFS into an RDBMS using Sqoop for report generation and visualization purposes.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Used Flume extensively to gather and move log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
- Developed shell scripts to ease execution of the other scripts (Pig, Hive and MapReduce) and to move data files into and out of HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Worked with cloud services like Amazon Web Services (AWS) and was involved in ETL, data integration and migration.
- Worked with NoSQL databases like HBase and Cassandra, creating tables to load large sets of semi-structured data.
- Built Java APIs for retrieval and analysis against NoSQL databases such as HBase and Cassandra.
- Loaded data from the UNIX file system into HDFS.
- Analyzed Cassandra and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
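A minimal Java sketch of the Kafka-to-HDFS streaming path described above, using the spark-streaming-kafka-0-10 integration; the broker address, topic name, group id and output path are placeholders:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaToHdfs {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("kafka-to-hdfs");
            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(30));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092"); // placeholder broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "hdfs-loader");
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                    KafkaUtils.createDirectStream(
                            ssc,
                            LocationStrategies.PreferConsistent(),
                            ConsumerStrategies.<String, String>Subscribe(
                                    Collections.singletonList("events"), kafkaParams));

            // Write each 30-second batch of message payloads to a time-stamped HDFS directory.
            JavaDStream<String> values = stream.map(ConsumerRecord::value);
            values.foreachRDD((rdd, time) -> {
                if (!rdd.isEmpty()) {
                    rdd.saveAsTextFile("hdfs:///data/landing/events/" + time.milliseconds());
                }
            });

            ssc.start();
            ssc.awaitTermination();
        }
    }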
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, HBase, Apache Spark, Oozie Scheduler, Java, UNIX Shell Scripts, Kafka, Git, Maven, PL/SQL, Python, Scala, Cloudera
Confidential, Columbus, OH
Sr. Big Data/Hadoop Developer
Responsibilities:
- Worked on the BI team on Big Data Hadoop cluster implementation and data integration, developing large-scale system software.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes, and loaded data into HDFS.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Imported and exported data between Oracle/DB2 and HDFS/Hive using Sqoop.
- Created ALTER, INSERT and DELETE queries involving lists, sets and maps in DataStax Cassandra (see the sketch after this list).
- Architected Hadoop clusters with CDH4 on CentOS, managed with Cloudera Manager.
- Managed Hadoop jobs using the Oozie workflow scheduler for MapReduce, Hive, Pig and Sqoop actions.
- Initiated and completed a proof of concept on Flume for pre-processing, showing increased reliability and easier scalability compared with traditional MSMQ.
- Used Flume to collect log data from different sources and load it into Hive tables, using different SerDes to store the data in JSON, XML and SequenceFile formats.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Supported setting up the QA environment and updating configurations for implementing Pig and Sqoop scripts.
- Implemented testing scripts to support test-driven development and continuous integration.
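As an illustration of the Cassandra collection work above, a minimal Java sketch using the DataStax Java driver (3.x API); the contact point, keyspace and table are hypothetical:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CassandraCollectionsDemo {
        public static void main(String[] args) {
            // Placeholder contact point; assumes a keyspace named "demo" already exists.
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("demo")) {

                // Hypothetical table with list, set and map columns.
                session.execute("CREATE TABLE IF NOT EXISTS customer_profile ("
                        + " customer_id bigint PRIMARY KEY,"
                        + " phone_numbers list<text>,"
                        + " emails set<text>,"
                        + " preferences map<text, text>)");

                // ALTER: add a column to the existing table.
                session.execute("ALTER TABLE customer_profile ADD last_login timestamp");

                // INSERT: collection literals for the list, set and map columns.
                session.execute("INSERT INTO customer_profile"
                        + " (customer_id, phone_numbers, emails, preferences)"
                        + " VALUES (42, ['555-0100'], {'a@example.com'}, {'theme': 'dark'})");

                // DELETE: remove a single map entry rather than the whole row.
                session.execute("DELETE preferences['theme'] FROM customer_profile"
                        + " WHERE customer_id = 42");
            }
        }
    }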
Environment: Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.6), Hadoop distributions from Hortonworks, Cloudera and MapR, DataStax, IBM DataStage 8.1 (Designer, Director, Administrator), PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting
Confidential, Arlington, VA
Sr. Java/J2EE Developer
Responsibilities:
- Developed the application using the Struts framework, which leverages the classic Model-View-Controller (MVC) architecture; produced UML diagrams such as use cases, class diagrams, interaction diagrams and activity diagrams.
- Participated in requirement gathering and converted the requirements into technical specifications.
- Worked extensively on the user interface for a few modules using JSPs, JavaScript and Ajax.
- Created business logic using servlets and session beans and deployed them on WebLogic Server.
- Developed XML schemas and web services for data maintenance and structures.
- Implemented the Web Service client for the login authentication, credit reports and applicant information using Apache Axis 2 Web Service.
- Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Maintained third-party software and databases with updates/upgrades, performance tuning and monitoring.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Created UDFs to calculate the pending payment for a given residential or small-business customer, used in Pig and Hive scripts (see the sketch after this list).
- Managed data coming from different sources.
- Developed Shell, Perl and Python scripts to automate and provide control flow to Pig scripts.
- Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management.
- Wrote JUnit test cases for unit testing of classes.
- Developed templates and screens in HTML and JavaScript.
- Involved in integrating Web Services using WSDL and UDDI
- Built and deployed Java applications into multiple UNIX-based environments and produced both unit and functional test results along with release notes.
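For illustration, a minimal Java sketch of a pending-payment Hive UDF like the one described above; the arithmetic and names are hypothetical stand-ins for the original business rule:

    import org.apache.hadoop.hive.ql.exec.UDF;

    // Hypothetical rule: pending payment = amount billed minus amount paid, floored at zero.
    public class PendingPaymentUDF extends UDF {
        public Double evaluate(Double billed, Double paid) {
            if (billed == null) {
                return null; // propagate NULL for rows with no billing data
            }
            double owed = billed - (paid == null ? 0.0 : paid);
            return Math.max(owed, 0.0);
        }
    }

Packaged as a JAR, the function would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION pending_payment AS 'PendingPaymentUDF'; a Pig equivalent would extend org.apache.pig.EvalFunc.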
Environment: JDK 1.5, J2EE 1.4, Struts 1.3, Kafka, Storm, JSP, Servlets 2.5, WebSphere 6.1, HTML, XML, Ant 1.6, Perl, Python, JavaScript, JUnit 3.8
Confidential
Sr. Java/J2EE Developer
Responsibilities:
- Designed and developed the application using Agile methodology.
- Implemented new modules and change requests, and fixed defects identified in pre-production and production environments.
- Wrote technical design document with class, sequence, and activity diagrams in each use case.
- Created wiki pages in Confluence for documentation.
- Developed various reusable helper and utility classes which were used across all modules of the application.
- Involved in developing XML compilers using XQuery.
- Developed the application using the Spring MVC framework, implementing controller and service classes (see the sketch after this list).
- Wrote the Spring configuration XML file containing bean declarations and the declarations of their dependent objects.
- Used Hibernate as the persistence framework, creating DAOs and using Hibernate for ORM mapping.
- Wrote Java classes to test the UI and web services through JUnit.
- Performed functional and integration testing and was extensively involved in critical release/deployment activities.
- Designed rich user interface applications using JSP, JSP tag libraries, Spring tag libraries, JavaScript, CSS and HTML.
- Used SVN for version control and Log4j to log both user-interface and domain-level messages.
- Used SoapUI for testing the web services.
- Used Maven for dependency management and project structure.
- Created deployment documents for various environments such as Test, QC and UAT.
- Involved in system wide enhancements supporting the entire system and fixing reported bugs.
- Explored Spring MVC, Spring IoC, Spring AOP and Hibernate in creating the POC.
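A minimal Java sketch of the Spring MVC controller/service layering described above; the class names, mapping and view are illustrative, not the original application's code:

    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;

    // Hypothetical domain object; fields and accessors omitted for brevity.
    class Account {
    }

    // Hypothetical service interface; the implementation would be declared as a bean
    // in the Spring configuration XML and delegate to a Hibernate-backed DAO.
    interface AccountService {
        Account findById(long id);
    }

    @Controller
    public class AccountController {

        private final AccountService accountService;

        public AccountController(AccountService accountService) {
            this.accountService = accountService; // wired via the XML bean definition
        }

        @RequestMapping(value = "/accounts/{id}", method = RequestMethod.GET)
        public String showAccount(@PathVariable("id") long id, Model model) {
            model.addAttribute("account", accountService.findById(id));
            return "accountDetail"; // logical view name, resolved to a JSP
        }
    }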
Environment: Java, J2EE, JSP, Spring, Hibernate, CSS, JavaScript, Oracle, JBoss, Maven, Eclipse, JUnit, Log4j, AJAX, Web services, JNDI, JMS, HTML, XML, XSD, XML Schema.