Big Data Developer Resume
San Diego, CA
SUMMARY:
- 8 years of experience in software design and development, including 5+ years in Hadoop, data warehousing solutions, Big Data analytics, and development.
- Sound knowledge of and familiarity with the data journey (data ingestion > transformation > discovery > advanced analytics).
- Experience in installation, configuration, supporting and managing Hadoop clusters.
- Extensive experience in developing complex MapReduce programs against structured and unstructured data.
- Experience in tuning and troubleshooting performance issues in Hadoop clusters.
- Extensive experience in working with structured data in Hive and improving performance through advanced techniques such as bucketing, partitioning, and optimizing self-joins.
- Experience in using tools such as Sqoop, Flume, Kafka, NiFi, and Pig to ingest structured, semi-structured, and unstructured data into the cluster, and in creating complex workflows using Oozie.
- Experienced in working with Spark data structures such as RDDs, Datasets, and DataFrames; used both Spark's built-in Web UI and instrumentation such as Ganglia to monitor Spark jobs, and improved processing times through partitioning, broadcasting, and checkpointing practices.
- Experience in creating DStreams and DataFrames from streaming sources such as Flume and Kafka and performing real-time Spark transformations and actions on them (a minimal sketch follows this summary).
- Excellent understanding and knowledge of ETL tools like Informatica, Talend and BI tools like Tableau.
- Expertise in working with major NoSQL database solutions such as Cassandra, HBase, and MongoDB.
- Hands-on scripting experience in Python and Linux/Unix shell.
- Good working knowledge of the AWS stack for big data analytics (S3, EMR, EC2, Kinesis, DynamoDB, Redshift, Elasticsearch).
- Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
- Built predictive and analytic models using various Machine Learning algorithms with Spark ML.
- Implemented K-Means clustering, Logistic Regression, and SVM classifiers for various business scenarios (see the Spark ML sketch after this summary).
- Excellent communication and analytical skills and flexible to adapt to evolving technology.
- Ability to think independently, creatively solve problems, and to connect various thoughts, observations, and results into innovative solutions.
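The snippet below is a minimal sketch of the Kafka-to-DStream pattern mentioned above, not the production pipeline itself; the broker address, topic name, consumer group, and 5-second batch interval are illustrative assumptions.

```java
import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka010.*;

public class KafkaStreamSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaDStreamExample");
        // Micro-batch interval of 5 seconds (illustrative choice)
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");   // assumed broker address
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "example-consumer-group");  // assumed consumer group
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Arrays.asList("events");    // assumed topic name

        // Create a direct DStream from Kafka
        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Simple real-time transformation: count records in each micro-batch
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```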
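As a hedged illustration of the Spark ML work summarized above, the sketch below fits a K-Means model and a Logistic Regression classifier on a pre-vectorized training set; the input path, k value, and hyperparameters are assumptions, not values from the original projects.

```java
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkMlSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("SparkMLExample").getOrCreate();

        // Assumes a training set already vectorized into a "features" column
        // with a numeric "label" column (path and column names are illustrative).
        Dataset<Row> training = spark.read().format("libsvm").load("data/training.libsvm");

        // K-Means clustering with an assumed k of 3
        KMeansModel clusters = new KMeans().setK(3).setSeed(1L).fit(training);
        clusters.transform(training).select("prediction").show(5);

        // Logistic Regression classifier with illustrative hyperparameters
        LogisticRegressionModel lrModel = new LogisticRegression()
            .setMaxIter(10)
            .setRegParam(0.01)
            .fit(training);
        lrModel.transform(training).select("label", "prediction").show(5);

        spark.stop();
    }
}
```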
TECHNICAL SKILLS:
- Big Data Ecosystem: Hadoop, MapReduce, HDFS, YARN, Apache Sqoop, Apache Hive, Impala, Apache Spark, Spark MLlib, Apache Flume, Kafka, Apache NiFi, Storm, Oozie, Zookeeper, Hue, Ambari, Mesos, Apache Solr, Apache Lucene, Elasticsearch, Apache Ranger, Apache Knox, Kerberos
- AWS: Amazon S3, Kinesis, DynamoDB, Redshift, Amazon IAM
- NoSQL Databases: HBase, Cassandra, MongoDB
- Distributions: CDH 5.x/4.x, Hortonworks 2.6
- File Formats: Apache Avro, Parquet, JSON, CSV, RC Files
- Analytics & Visualization: Tableau, Zeppelin/Jupyter Notebooks, Pandas
- Languages: Java, C, C++, Scala, Python, JavaScript
- Web Technologies: SOAP, REST, jQuery, AJAX, XML, HTML, CSS
- Version Control: Git, SVN, CVS
- Operating Systems: Linux (Debian, openSUSE, Arch, Fedora), Windows, Mac OS
PROFESSIONAL EXPERIENCE:
Big Data Developer
Confidential, San Diego, CA
Responsibilities:
- Developed and implemented real-time data pipelines with Spark Streaming, Kafka, and Cassandra to replace the existing lambda architecture without losing its fault-tolerant capabilities.
- Created a Spark Streaming application to consume real-time data from Kafka sources and applied real-time analysis models that are updated as new data arrives in the stream.
- Designed and developed data integration programs in a Hadoop environment with NoSQL data store Cassandra for data access and analysis.
- Integrated the NoSQL database HBase with Apache Spark to move bulk data into HBase.
- Responsible for developing data pipeline using Flume, Sqoop and Spark to extract the data from warehouses and weblogs and store it in HDFS. Automated the process by using Oozie workflows.
- Experienced in writing Hive UDFs to sort struct fields and return complex data types based on the required schema (a simplified UDF sketch follows this list).
- Experienced with both batch processing and stream processing of data sources using Apache Spark.
- Responsible for handling large datasets using partitions, Spark in-memory capabilities, broadcasts, efficient joins, transformations, and other techniques during the ingestion process itself (see the join sketch after this list).
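A much-simplified sketch of the Hive UDF extension point referenced above: the real UDFs sorted struct fields and returned complex types, whereas this example only normalizes a string. The function and class names are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simplified Hive UDF: trims and upper-cases a code value.
@Description(name = "normalize_code", value = "_FUNC_(str) - trims and upper-cases a code value")
public class NormalizeCodeUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```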
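The join sketch below illustrates the broadcast-join and repartitioning techniques mentioned in the last bullet, assuming hypothetical events/users Parquet datasets and a user_id join key.

```java
import static org.apache.spark.sql.functions.broadcast;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BroadcastJoinSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("BroadcastJoinExample").getOrCreate();

        // Large fact table and a small dimension table (paths are illustrative)
        Dataset<Row> events = spark.read().parquet("hdfs:///data/events");
        Dataset<Row> users  = spark.read().parquet("hdfs:///data/users");

        // Broadcasting the small side avoids shuffling the large table
        Dataset<Row> joined = events.join(broadcast(users), "user_id");

        // Repartition by a high-cardinality key before an expensive aggregation
        joined.repartition(200, joined.col("user_id"))
              .groupBy("user_id")
              .count()
              .write()
              .mode("overwrite")
              .parquet("hdfs:///data/event_counts");

        spark.stop();
    }
}
```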
Hadoop Developer
Confidential, Overland Park, Kansas
Responsibilities:
- Created Spark Streaming applications to aggregate large volumes of clickstream data by session and store the aggregated data in HDFS.
- Optimized the processing time of the applications by caching and partitioning data where appropriate, and tuned the Spark applications via configuration changes.
- Developed Kafka producers and consumers for message handling and wrote Spark scripts to parse the logs and structure them in tabular format to facilitate effective querying of the log data (a producer sketch follows this list).
- Improved the performance of data transformations by optimizing the existing Hadoop MapReduce algorithms using Spark Context, Spark SQL, DataFrames, and Spark on YARN.
- Worked with Teradata Tools and Utilities (FastLoad, MultiLoad, BTEQ, FastExport)
- Developed UDFs in Python to add new functionality and enhance existing functionality in Pig and Hive scripts.
- Worked on importing and exporting data from various NoSQL databases frequently.
- Involved in Unit testing and delivered Unit test plans and results documents.
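A minimal Kafka producer sketch in the spirit of the message-handling work above; the broker address, the "weblogs" topic, and the sample log line are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                          // wait for full acknowledgement

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One raw log line published to the assumed "weblogs" topic
            String logLine = "127.0.0.1 - - [10/Oct/2023:13:55:36] \"GET /index.html HTTP/1.1\" 200";
            producer.send(new ProducerRecord<>("weblogs", "host-01", logLine));
            producer.flush();
        }
    }
}
```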
Hadoop Developer
Confidential, Atlanta, GA
Responsibilities:
- Installed, configured, supported, and managed Hadoop clusters.
- Used Storm to extract data by designing a topology as per client requirements.
- Optimized Hive queries to extract the customer information from Cassandra.
- Worked on fine-tuning search queries and designing tables, views, and indexes in Cassandra, and wrote DDL and DML scripts for data store operations.
- Used Zookeeper for cluster coordination and Kafka offset monitoring.
- Developed and implemented POCs to load data from Kafka connectors into Cassandra and HDFS.
- Integrated MapReduce with HBase to bulk-load data into HBase using MapReduce programs.
- Wrote MapReduce jobs in Java to parse the web logs and store them in HDFS, and used MRUnit to test and debug the MapReduce programs (a mapper sketch follows this list).
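The mapper below is a simplified sketch of the web-log parsing jobs described above, assuming a space-delimited combined log format; the field position and class name are illustrative.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (status code, 1) for every parsed web-log line; malformed lines are skipped.
public class WebLogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text statusCode = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumes the HTTP status code sits in field 8 of the log line
        String[] fields = value.toString().split(" ");
        if (fields.length > 8) {
            statusCode.set(fields[8]);
            context.write(statusCode, ONE);
        }
    }
}
```

A mapper like this can be exercised in isolation with MRUnit's MapDriver, which is the usual way to unit-test MapReduce code before cluster runs.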
Java Developer
Confidential
Responsibilities:
- Involved in full life cycle development as part of the core design and architecture team.
- Involved in implementing all requirements using a wide range of technologies, including Java, J2EE, the Executor Service framework, web services, and associated technologies.
- Developed business components using Java objects and used the Hibernate framework to map Java classes to the database.
- Worked with POJO class mappings using Spring and Hibernate, and extensively used jQuery and JavaScript for validations.
- Used Hibernate for the backend persistence and designed and built SOAP web service interfaces.
- Designed and developed the Hibernate configuration and the session-per-request design pattern for database connectivity and transaction-scoped session access; used HQL and SQL for fetching and storing data (a minimal entity/HQL sketch follows this list).
- Worked in a multi-threaded environment for process-based applications to interrupt threads in embedded systems, and was actively involved in deployment planning and build setup as per technical specifications.
- Worked on enhancements to already implemented applications, focusing on high-performance improvements to new and existing software.
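A minimal entity/HQL sketch of the Hibernate mapping and session-per-request pattern described above; the Customer entity, table name, and query are hypothetical examples, not the application's actual model.

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;
import org.hibernate.Session;

// Plain Java object mapped to an assumed "customers" table
@Entity
@Table(name = "customers")
public class Customer {
    @Id
    @GeneratedValue
    private Long id;
    private String name;
    private String email;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }

    // HQL query run within a session-per-request transaction (field names are illustrative)
    public static List<Customer> findByEmailDomain(Session session, String domain) {
        return session
            .createQuery("from Customer c where c.email like :domain", Customer.class)
            .setParameter("domain", "%" + domain)
            .getResultList();
    }
}
```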
Java Developer
Confidential
Responsibilities:
- Involved in architecting and implementing all requirements using a wide range of technologies, including Java and Spring MVC, and used Hibernate for backend persistence.
- Used Spring framework for dependency injection and integrated with Hibernate and JSF.
- Involved in writing the Spring configuration XML file containing object declarations and dependencies (a wiring sketch follows this list).
- Developed core Java classes using OOP concepts.
- Provided technical support for client-side issues.
- Performed business systems analysis and proposed solutions that fit requirements.
- Deployment planning and build setup as per technical specifications.
- Worked on creating POJO classes for the Spring framework.
- Worked on enhancements to already implemented applications, focusing on high-performance improvements to new and existing software.
- Worked in a multi-threaded environment.
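A small wiring sketch of the XML-driven dependency injection described above; the bean names, the ReportService/ReportDao classes, and applicationContext.xml are assumptions for illustration.

```java
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

// POJO wired through an XML bean definition (bean ids and the XML file name are assumed)
public class ReportService {
    private ReportDao reportDao;   // injected dependency, declared as a <property> in the XML

    public void setReportDao(ReportDao reportDao) {
        this.reportDao = reportDao;
    }

    public void printReportCount() {
        System.out.println("Reports: " + reportDao.countReports());
    }

    public static void main(String[] args) {
        // Loads bean definitions from an assumed applicationContext.xml on the classpath
        ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml");
        ReportService service = context.getBean("reportService", ReportService.class);
        service.printReportCount();
    }
}

// Simple DAO interface used only to illustrate the injected collaborator
interface ReportDao {
    long countReports();
}
```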