Sr. Big Data/Spark Developer Resume
Albany, NY
SUMMARY
- 8+ years of experience in all phases of the Software Development Life Cycle on Java/J2EE, including extensive work on Big Data and cloud-based applications spanning multiple technologies and business domains
- Hands-on experience with major Hadoop ecosystem components such as MapReduce, HDFS, YARN, Sqoop, Hive, Pig, Oozie, HBase, and Spark
- Experience as a Hadoop and Java Developer
- Extensive experience in core and advanced Java programming, including object-oriented concepts, Collections, Generics, exception handling, JDBC, Servlets, JSP, Struts, and multithreading
- Developed Java applications using advanced Java concepts such as Servlets, JSP, JSTL, JDBC, JavaScript, and jQuery
- Experience with NoSQL databases, including HBase
- Extensive knowledge of creating and monitoring Hadoop clusters on VMs, Hortonworks Sandbox, and Cloudera on Linux/Red Hat OS
- Experience using Pig scripts to extract data from data files and load it into HDFS
- Good knowledge of importing and exporting data between RDBMS and HDFS using Sqoop
- Worked with Oozie to run workflow jobs with actions that execute Hive and Pig jobs
- Good knowledge of job scheduling and monitoring tools such as Oozie and ZooKeeper
- Knowledge of designing and developing mobile applications using Java technologies such as JDBC and IDE tools such as Eclipse
- Extensive experience in SQL query processing, execution, optimization, and performance tuning
- Experience in creating tables, views, triggers, stored procedures, packages, functions, and indexes in Oracle databases
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data
- Experienced in working with the Spark ecosystem using Spark SQL and Scala on different file formats such as text, Avro, and Parquet
- In-depth knowledge of developing user interface applications using HTML, CSS, and JavaScript
- Extensive experience with both Hadoop 1 (master-slave) and Hadoop 2 (YARN) architectures
- Extensive knowledge of and experience with real-time data streaming technologies such as Kafka, Storm, and Spark Streaming
- Developed Kafka producers and consumers for streaming millions of events per second
- Imported data from Kafka consumers into HBase using Spark Streaming
- Extensively involved in designing, reviewing, and optimizing data transformation processes using Apache Storm
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the sketch after this list)
- Loaded data into Spark and performed in-memory computation to generate the output response
- Experience using Spark SQL to load tables into HDFS and run select queries on them
- In-depth knowledge of Scala and experience building Spark applications with it
- Experience includes design, development, and coding in Java, Oracle PL/SQL, and C
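A minimal sketch of the Kafka-to-Spark Streaming pattern described above, in Scala using the spark-streaming-kafka-0-10 integration; the broker address, topic name, and consumer group are placeholders, and the downstream sink (e.g., HBase) is only indicated in a comment:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object LogStreamJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToSparkStreaming")
    // Incoming events are grouped into 10-second micro-batches for the Spark engine
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",            // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "event-consumers",                   // placeholder consumer group
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("server-logs"), kafkaParams)
    )

    // Each micro-batch arrives as an RDD; this is where records would be
    // written to a sink such as HBase or HDFS
    stream.map(_.value()).foreachRDD { rdd =>
      println(s"Processed ${rdd.count()} events in this batch")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```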
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, NiFi, Flume, HBase, ZooKeeper, Hue, Cloudera, Hortonworks Sandbox, Spark, Scala, Kafka, Storm, Teradata, Cassandra
Programming Skills: C, C++, Core Java, Shell Scripting, PL/SQL, Scala, Python
Java/J2EE: J2EE, JSF, Servlets, Struts, Spring
Web Technologies: HTML, CSS, XML, JDBC, JSP, JSTL, Web Services
Operating System: Windows, Linux, Unix
Design: UML, E-R Modelling, Rational Rose
Tools: Eclipse, TOAD, Maven, Git, Amazon AWS, Bitbucket
Database: MySQL, SQLite, Oracle 11g/10g, Microsoft SQL Server 2016, RDBMS, HBase
Methodologies: Waterfall, Agile
PROFESSIONAL EXPERIENCE
Confidential, Albany, NY
Sr. Big Data/Spark Developer
Responsibilities:
- Extracted data from relational databases and migrated it to Hadoop using Spark processes
- Developed a hash-key rule used as the primary key across all tables
- Enabled compression at various phases (intermediate data, final output) to improve the performance of Hive queries
- Developed ETL processes to load data from multiple sources into HDFS using Flume and Sqoop, performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools
- Used the ORC (Optimized Row Columnar) file format to improve the performance of Hive queries (see the sketch after this list)
- Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and SQL
- Analyzed large data sets by writing PySpark scripts and Hive queries
- Developed and implemented database structures and software modules using SQL and MS SQL Server
- Extracted data from Teradata into HDFS using Sqoop
- Involved in the requirement gathering and analysis phase, documenting business requirements through meetings with various business users
- Developed UDFs in Java for Hive and Pig queries
- Collected logs from source systems into HDFS using Kafka and performed analytics on them
- Loaded data sets from source CSV files into Hive and Cassandra using Spark/PySpark
- Migrated computational code from HQL to PySpark
- Performed data extraction, aggregation, and analysis in HDFS using PySpark and stored the required data in Hive
- Developed Python code to gather data from HBase (Cornerstone) and designed the solution for implementation using PySpark
- Supported the senior engineer in installing and configuring Hadoop clusters using Cloudera CDH4
- Performed benchmarking of the NoSQL database HBase
- Involved in everyday Scrum meetings to discuss the progress of the project
- Active in making Scrum meetings more productive
- Developed Spark scripts using the Scala shell as per requirements
- Wrote Spark programs for faster data processing than standard MapReduce programs
- Worked on Spark using Scala and Spark SQL for faster testing and processing of data
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming
- Imported data from Kafka consumers into HBase using Spark Streaming
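A minimal sketch of the ORC-backed Hive table and compression settings referenced above, expressed through Spark SQL with Hive support; the compression properties are illustrative, and the database, table, and column names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object OrcHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OrcHiveLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Compress intermediate and final output of Hive-style jobs (illustrative settings)
    spark.sql("SET hive.exec.compress.intermediate=true")
    spark.sql("SET hive.exec.compress.output=true")

    // ORC-backed, Snappy-compressed target table
    spark.sql("""
      CREATE TABLE IF NOT EXISTS analytics.customer_txn (
        txn_id      STRING,
        customer_id STRING,
        amount      DOUBLE
      )
      STORED AS ORC
      TBLPROPERTIES ("orc.compress" = "SNAPPY")
    """)

    // Reload the table from a staging source
    spark.sql("""
      INSERT OVERWRITE TABLE analytics.customer_txn
      SELECT txn_id, customer_id, amount FROM staging.customer_txn_raw
    """)

    spark.stop()
  }
}
```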
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Scala, Kafka, PySpark, Hortonworks, Oozie, HBase, Python, Agile, MS SQL Server, HTML.
Confidential, Cary, NC
Big Data/Hadoop Developer
Responsibilities:
- Built generic, reusable components for use throughout the project
- Developed Spark applications using Scala for easier Hadoop transitions
- Developed a reusable shell script for data transfer from source Hive tables to targets based on
- Loaded and transformed large sets of structured data from Oracle/SQL Server into HDFS using Talend Big Data Studio
- Developed a reusable shell script to verify data transfer from Hive to SQL Server
- Developed a reusable wrapper script that combines column mapping, file transfer, and configuration file import in a single script
- Automated the entire flow of ETL, data analysis, and data export using reusable components
- Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop
- Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS)
- Worked with the Oozie workflow manager to schedule Hadoop jobs, including highly intensive ones
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Extensively used HiveQL to query data in Hive tables and load data into them
- Worked with Senior Engineer on configuring Kafka for streaming data.
- Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and SQL
- Created UDFs in Pig and Hive and applied partitioning and bucketing techniques in Hive for performance improvement
- Extended Hive and Pig core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregate Functions (UDAF) written in Python (see the sketch after this list)
- Loaded data from the Linux file system into HDFS; wrote customized Hive UDFs in Python where the required functionality was too complex for built-in functions
- Analyzed large data sets by writing PySpark scripts and Hive queries
- Installed, configured, supported, and managed Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS)
- Created indexes and tuned SQL queries in Hive; handled database connectivity using Sqoop
- Migrated mappings to the testing and production environments and introduced Informatica concepts to the testing team
- Monitored nightly jobs that export data out of HDFS for offsite storage as part of HDFS backup
- Used Pig to analyze large data sets and write the results back to HBase
- Implemented Kerberos security in all environments
- Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on the data
- Worked with various Hadoop ecosystem tools such as Sqoop, Hive, Pig, Oozie, and Kafka
- Involved in everyday Stand-up meetings to discuss the progress of the project
- Active in making Scrum meetings more productive
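A minimal sketch of a custom Hive UDF of the kind described above; the project's UDFs were written in Python and Java, so this JVM version is illustrative only, and the class, jar path, and column names are placeholders:

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Masks an account number, keeping only the last four characters visible.
// Hive resolves the evaluate() method by reflection at query time.
class MaskAccount extends UDF {
  def evaluate(value: String): String = {
    if (value == null || value.length <= 4) value
    else "*" * (value.length - 4) + value.takeRight(4)
  }
}

// Registered and used from HiveQL roughly as:
//   ADD JAR hdfs:///udfs/mask-account.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccount';
//   SELECT mask_account(account_no) FROM customer_accounts;
```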
Environment: Hadoop, MapReduce, Scala, Kafka, HDFS, Hive, Pig, Sqoop, Oozie, HBase, Python, Agile
Confidential, Boston, MA
Jr. Big Data/Hadoop Developer
Responsibilities:
- Ingested data received from various relational database providers into HDFS for analysis and other big data operations
- Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources
- Used default MapReduce Input and Output Formats
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming
- Developed HQL queries to implement select, insert, update, and delete operations on the database by creating HQL named queries
- Built Spark Scripts by utilizing Scala shell commands depending on the requirement.
- Loaded and transformed large sets of structured and semi-structured data (see the sketch after this list)
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS
- Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster
- Developed simple to complex MapReduce jobs using Java, and scripts using Hive and Pig
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed formats
- Worked on migrating the old Java stack to a Typesafe stack using Scala for backend programming
- Analyzed user needs and planned information flows using UML and Rational Rose
- Exported filtered data into HBase for fast querying
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster
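A minimal sketch in Spark/Scala of loading and combining the structured (CSV) and semi-structured (JSON) data described above; the original jobs were Java MapReduce, so this is illustrative only, and the paths, column names, and output location are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MultiFormatIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MultiFormatIngest").getOrCreate()

    // Structured input: CSV with a header row
    val csvDf = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/events_csv/")

    // Semi-structured input: newline-delimited JSON
    val jsonDf = spark.read.json("hdfs:///data/raw/events_json/")

    // Normalize both sources to a common shape, then aggregate
    val totals = csvDf.select("event_id", "event_type", "amount")
      .unionByName(jsonDf.select("event_id", "event_type", "amount"))
      .filter(col("amount") > 0)
      .groupBy("event_type")
      .agg(sum("amount").alias("total_amount"))

    totals.write.mode("overwrite").parquet("hdfs:///data/curated/event_totals/")
    spark.stop()
  }
}
```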
Environment: Apache Hadoop (Cloudera), HBase, Hive, Pig, Scala, Kafka, Spark, MapReduce, Sqoop, Oozie, Eclipse, Java.
Confidential, Las Vegas, Nevada
Jr. Hadoop Developer
Responsibilities:
- Involved in gathering and analyzing user requirements.
- Responsible for Installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Built microservices for the delivery of software products across the enterprise.
- Developed a strategy for integrating the internal security model into new projects with Spring Security and Spring Boot.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Imported data from different sources such as HDFS and HBase into Spark RDDs (see the sketch after this list).
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Installing, Upgrading and Managing Hadoop Clusters
- Used Oozie operational services for batch processing and dynamic workflow scheduling.
- Administered, installed, upgraded, and managed distributions of Hadoop, Hive, and HBase.
- Advanced knowledge in performance troubleshooting and tuning Hadoop clusters.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning Hive tables and running scripts in parallel to reduce their run time.
- Performed CRUD operations such as insert, update, and delete on data in MongoDB.
- Worked on MongoDB database design and indexing techniques.
- Worked on creating end-to-end data pipeline orchestration using Oozie.
- Worked on Amazon Web Services (AWS) cloud offerings such as Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic MapReduce (EMR), Amazon SimpleDB, Amazon CloudWatch, SNS, SQS, and Lambda.
- Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications to support their adoption as part of the Big Data Hadoop initiative.
- Participated in code reviews and weekly meetings.
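A minimal sketch of loading HDFS data into a Spark RDD and transforming it, as mentioned above; the input path and the pipe-delimited record layout are assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HdfsToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HdfsToRdd"))

    // Raw pipe-delimited transaction logs landed on HDFS (placeholder path/layout)
    val raw = sc.textFile("hdfs:///data/transactions/*.log")

    val totals = raw
      .map(_.split("\\|"))
      .filter(_.length >= 3)              // drop malformed records
      .map(f => (f(0), f(2).toDouble))    // (customerId, amount)
      .reduceByKey(_ + _)                 // total spend per customer

    totals.saveAsTextFile("hdfs:///data/output/customer_totals")
    sc.stop()
  }
}
```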
Environment: Spring Boot, Microservices, AWS, Map Reduce, HDFS, Hive, Pig, SQL, Sqoop, Oozie, Shell scripting, Cron Jobs, Perl scripting, Apache Kafka, J2EE.
Confidential
Java Developer
Responsibilities:
- Coordinating with Onsite Team to get business requirements.
- Providing estimation in terms of man power, number of days to develop the requirements.
- Responsible for designing and developing the application.
- Applying knowledge of programming techniques to develop the application.
- Followed Scrum methodology: iterative planning, weekly sprints, incremental and test-driven development, continuous integration, and quality assurance.
- Created Java based Servlets for the web tier.
- Developed server-side application code using Java, JavaScript, HTML, CSS and XML.
- Developed the UI using JSP pages and JavaScript validations.
- Wrote Java and JSP code using customized JSP tags and Content Server tags for rendering assets.
- Sent status reports and attended weekly team meetings to report project status.
- Provided maintenance and enhancement support for completed deliverables.
- Designed and developed the web interface for the admin module using the Struts MVC framework to search, reset passwords, and lock/unlock user accounts.
- Extensively used Struts Validator for server-side validations and JavaScript for client-side validations.
- Developed stored procedures and queries to extract customer data from the database (see the sketch after this list).
- Used Subversion for Version Control Management.
- Extensively used Struts tag libraries (Bean, Logic, and HTML tags) and custom tag libraries.
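A minimal sketch of the kind of parameterized customer-data query used in this role, shown in Scala over plain JDBC for consistency with the other sketches (the original code was Java); the connection URL, credentials, and table/column names are placeholders, and an Oracle JDBC driver is assumed on the classpath:

```scala
import java.sql.DriverManager

object CustomerQuery {
  def main(args: Array[String]): Unit = {
    // Placeholder Oracle connection details
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@db-host:1521:ORCL", "app_user", "app_password")
    try {
      // Parameterized query keeps the customer extraction safe from SQL injection
      val stmt = conn.prepareStatement(
        "SELECT customer_id, customer_name, email FROM customers WHERE region = ?")
      stmt.setString(1, "NORTHEAST")
      val rs = stmt.executeQuery()
      while (rs.next()) {
        println(s"${rs.getString("customer_id")}  ${rs.getString("customer_name")}")
      }
      rs.close()
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```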
Environment: Java, J2EE, Oracle 9i, Struts, Tomcat, JavaScript, HTML, Servlets, CSS, XML, JSP, Scrum methodology, PL/SQL Developer, Pentium III/IV, Windows XP, WinCVS.