Sr. Big Data/Hadoop Developer Resume
Jacksonville, FL
SUMMARY:
- 7+ years of extensive product development experience in Data Analytics, Data Modeling, and Software Development using Java and Big Data technologies (Hadoop, Apache Spark).
- Real-time experience with major Hadoop ecosystem components such as MapReduce, HDFS, YARN, Sqoop, Hive, Pig, HBase, and Spark.
- Experience as a Hadoop and Java Developer.
- Experience with NoSQL databases, including HBase.
- Extensive knowledge of creating and monitoring Hadoop clusters on VMs, Hortonworks Sandbox, and Cloudera on Linux.
- Loaded data into Spark and performed in-memory computation to generate the output response.
- Experience using Spark SQL to load tables into HDFS and run select queries on them.
- Experience using Pig scripts to extract data from data files and load it into HDFS.
- Good knowledge of importing and exporting data between RDBMS and HDFS using Sqoop.
- Good knowledge of cluster coordination and monitoring tools like ZooKeeper.
- Knowledge of designing and developing mobile applications using Java technologies like JDBC and IDE tools like Eclipse.
TECHNICAL SKILLS:
Technologies: Core Java, Python, Apache Spark, Hadoop, Hive, Impala, Sqoop, Kafka, Spark Streaming, AWS S3 storage, NoSQL (Cassandra and HBase), RDBMS, REST APIs, and shell scripting.
Environment: Linux (Ubuntu), Cloudera distribution, Eclipse.
Process/Methodologies: Agile, Scrum, Git, Jira, Confluence, and Bitbucket.
PROFESSIONAL EXPERIENCE:
Sr. Big Data/Hadoop Developer
Confidential, Jacksonville, FL
Responsibilities:
- Extracted data from the databases and migrated it to HDFS using Spark processes.
- Designed the architecture of new modules to handle different types of data, relations between data, and data dependencies.
- Enabled compression at various phases, such as on intermediate data and final output, to improve the performance of Hive queries.
- Used the ORC (Optimized Row Columnar) file format to improve the performance of Hive queries.
- Extracted data from RDBMS into HDFS using Sqoop.
- Analyzed and integrated raw data from various sources.
- Extracted data for transformation, calculation, and aggregation.
- Collected all logs from source systems into HDFS using Kafka and performed analytics on them.
- Loaded all data sets into Hive from source CSV files using Spark-Java jobs (see the sketch after this list).
- Migrated computational code from HQL to Spark SQL.
- Performed data extraction, aggregation, and analysis in HDFS using core Spark and Spark SQL, and stored the required data in Hive.
- Wrote Spark jobs in Java to perform transformation and aggregation on the prepared data.
- Wrote JUnit test cases for code validation; used Mockito and Lombok to improve the JUnit tests.
- Refactored and improved code quality.
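A minimal sketch of the Spark-Java CSV-to-Hive load described in the list above, assuming Hive support is enabled on the cluster; the HDFS path and the database/table names are placeholders for illustration:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class CsvToHiveLoader {
    public static void main(String[] args) {
        // Hive support requires a metastore reachable from the cluster.
        SparkSession spark = SparkSession.builder()
                .appName("CsvToHiveLoader")
                .enableHiveSupport()
                .getOrCreate();

        // Placeholder source path; header row and schema inference for simplicity.
        Dataset<Row> csv = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/source/transactions.csv");

        // Store as ORC, matching the Hive tuning noted above; placeholder table name.
        csv.write()
                .mode(SaveMode.Overwrite)
                .format("orc")
                .saveAsTable("analytics.transactions");

        spark.stop();
    }
}
```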
Environment: Core Java, JUnit, Mockito, Lombok, Apache Spark, Spark SQL, Hive, Sqoop, HBase, and RDBMS.
Big Data Developer
Confidential, Dorchester, MA
Responsibilities:
- Designed the database architecture for data migration from MySQL to Cassandra.
- Used Spark Streaming to process data from MySQL to Cassandra with Spark jobs written in Java (see the sketch after this list).
- Designed the architecture of new modules to handle different types of data, relations between data, and data dependencies.
- Enabled compression at various phases, such as on intermediate data and final output, to improve the performance of Hive queries.
- Used the ORC (Optimized Row Columnar) file format to improve the performance of Hive queries.
- Created reports on aggregated data.
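The MySQL-to-Cassandra movement described above, sketched here as the batch JDBC analogue of the streaming job for brevity; it assumes the spark-cassandra-connector is on the classpath, and the connection details, credentials, keyspace, and table names are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class MySqlToCassandra {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("MySqlToCassandra")
                // Placeholder Cassandra contact point for the connector.
                .config("spark.cassandra.connection.host", "127.0.0.1")
                .getOrCreate();

        // Batch JDBC read from MySQL; URL, table, and credentials are placeholders.
        Dataset<Row> orders = spark.read()
                .format("jdbc")
                .option("url", "jdbc:mysql://localhost:3306/shop")
                .option("dbtable", "orders")
                .option("user", "etl")
                .option("password", "secret")
                .load();

        // Write through the connector's DataSource API; keyspace/table are placeholders.
        orders.write()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "shop")
                .option("table", "orders")
                .mode(SaveMode.Append)
                .save();

        spark.stop();
    }
}
```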
Environment: Core Java, Apache Spark, Kafka, Spark Streaming, Cassandra, and MySQL.
Sr. Hadoop Developer
Confidential
Responsibilities:
- Involved in ingesting data received from various relational database providers onto HDFS for analysis and other big data operations.
- Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
- Used the default MapReduce InputFormat and OutputFormat.
- Supported setting up the QA environment and updating configurations for implementing scripts with Hive and Sqoop.
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming (see the sketch after this list).
- Loaded and transformed large sets of structured and semi-structured data.
- Involved in developing shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move data files within and outside of HDFS.
- Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
- Performed R&D on sharing the state of Apache Spark RDDs between different Spark applications/jobs using Apache Ignite, and prepared a POC on the same.
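A minimal sketch of consuming the pipelined server logs with Spark Streaming's Kafka 0.10 direct stream in Java; the broker address, topic name, and group id are placeholders, and the per-batch ERROR count stands in for the real analytics:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class LogStreamConsumer {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("LogStreamConsumer");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092"); // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "log-consumers");          // placeholder group
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("server-logs"), kafkaParams));

        // Count ERROR lines per batch as a simple stand-in analytic.
        stream.map(ConsumerRecord::value)
              .filter(line -> line.contains("ERROR"))
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```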
Environment: Core Java, Apache Spark, Hadoop, HDFS, MapReduce, Kafka, Sqoop, Hive, ZooKeeper, and Linux (Ubuntu).
Jr. Hadoop Developer
Confidential
Responsibilities:
- Designed the database architecture.
- Involved in gathering and analyzing user requirements.
- Responsible for installation and configuration of Hive, Sqoop, and shell scripts on the Hadoop cluster.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Developed simple and complex programs in Java for data analysis on different data formats.
- Involved in moving all log files generated from various sources to HDFS for further processing through Java cron jobs (see the sketch after this list).
- Imported data from different sources like HDFS into Spark RDDs.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Sqoop scripts on the data.
- Installed, upgraded, and managed Hadoop clusters.
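A minimal sketch of the kind of Java job run from cron to ship local log files into HDFS via the Hadoop FileSystem API; both paths are placeholders, and it assumes the Hadoop client configuration (core-site.xml) is on the classpath:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LogShipper {
    public static void main(String[] args) throws IOException {
        // Picks up fs.defaultFS from core-site.xml to locate the NameNode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path localLogs = new Path("/var/log/app/app.log"); // placeholder source
        Path hdfsTarget = new Path("/data/raw/logs/");     // placeholder landing zone

        // copyFromLocalFile(delSrc, overwrite, src, dst): ship the file and
        // keep the local copy for log rotation to clean up.
        fs.copyFromLocalFile(false, true, localLogs, hdfsTarget);
        fs.close();
    }
}
```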
Environment: Hadoop, MapReduce, HDFS, Hive, MySQL, Sqoop, shell scripting, cron jobs, Apache Kafka, and Core Java.
Java-Hadoop Developer
Confidential
Responsibilities:
- Designed the database architecture; involved in gathering and analyzing user requirements.
- Developed the database architecture design for the data store on Impala.
- Performed data preparation and transformation on data from Kafka stored in an AWS S3 bucket.
- Performed calculation and aggregation on the prepared data in Impala for reporting purposes.
- Created APIs to retrieve the aggregated data (see the sketch below).
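A minimal sketch of an endpoint for serving the aggregated data over REST, using the JDK's built-in HttpServer to avoid extra dependencies; the route and the hard-coded JSON body are placeholders standing in for a lookup against the Impala-backed aggregates:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

import com.sun.net.httpserver.HttpServer;

public class AggregateApi {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

        // GET /daily-totals: the body below is placeholder data standing in
        // for the result of an aggregate query against Impala.
        server.createContext("/daily-totals", exchange -> {
            byte[] body = "{\"totals\": \"placeholder\"}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });

        server.start();
    }
}
```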
Environment: Core Java, Hadoop, Apache Kafka, ZooKeeper, HDFS, Hive, MySQL, AWS S3 storage, and REST APIs.