Sr. Big Data Developer Resume

McLean, VA

SUMMARY

  • 8+ years of experience with emphasis on designing and implementing statistically significant analytic solutions on Big Data technologies and Java-based enterprise applications.
  • 4 years of implementation and extensive working experience with a wide array of tools in the Big Data stack such as HDFS, Spark, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Kafka, and ZooKeeper.
  • Experience with different Hadoop distributions such as Cloudera, Hortonworks, EMR, and MapR.
  • Good exposure to NameNode Federation and MapReduce 2.0 (MRv2)/YARN.
  • Hands-on experience with different big data ingestion tools such as Flume, Sqoop, and Kafka.
  • Experienced in implementing schedulers using Oozie, crontab, and shell scripts.
  • Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
  • Experienced in loading streaming data into HDFS using the Kafka messaging system.
  • Worked on the ELK stack: Elasticsearch, Logstash, and Kibana.
  • Experience with different Spark modules such as Spark SQL, Spark MLlib, Spark Streaming, and GraphX.
  • Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core.
  • Experienced in developing Spark programs using the Scala and Java APIs (a minimal Java API sketch follows this list).
  • Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Involved in integrating Hive queries into the Spark environment using Spark SQL.
  • Experience in using MapReduce design patterns to solve complex MapReduce problems.
  • Experience in developing real-time streaming solutions using DStreams, accumulator variables, broadcast variables, and RDD caching for Spark Streaming.
  • Experienced in the data cleansing process using Pig Latin operations and UDFs.
  • Expertise in implementing ad-hoc queries and incorporating complex business logic in HiveQL.
  • Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitioning, bucketing, and internal/external tables.
  • Good knowledge in working with NoSQL databases including HBase, MongoDB, Cassandra and Neo4J.
  • Expertise in using Kafka as a messaging system to implement real-time streaming solutions.
  • Experience in Apache Solr to implement indexing, and wrote custom Solr segments to optimize search.
  • Worked on HBase to perform real-time analytics and experienced in CQL to extract data from Cassandra tables.
  • Responsible for building scalable distributed data solutions using DataStax Cassandra.
  • Proficient with cluster management and configuring Cassandra databases.
  • Expertise in writing real-time processing applications using spouts and bolts in Storm.
  • Worked on MongoDB for distributed storage and processing.
  • Experience in configuring ZooKeeper to coordinate the servers in clusters and to maintain data consistency.
  • Worked with file formats such as Avro, Parquet, and CSV, and compression techniques such as LZO, GZIP, Bzip2, and Snappy.
  • Good knowledge of Cloudera distributions and of Amazon Simple Storage Service (S3), Amazon EC2, and Amazon EMR.
  • Good understanding of MPP databases such as HP Vertica and Impala.
  • Built secure AWS solutions by creating VPCs with private and public subnets.
  • Experience in building web-based, enterprise-level, and stand-alone applications using JSP, Struts, Spring, Hibernate, JSF, and RESTful web services.
  • Experienced with the Maven build tool and continuous integration tools such as Jenkins.
  • Experience with development tools such as Eclipse, IntelliJ, and SBT.
  • Experience in automating database activities with UNIX shell scripts.
  • Good working experience with Big Data ETL/BI tools such as Pentaho and Tableau.
  • Experienced in Agile Scrum, Waterfall, and Test-Driven Development methodologies.
  • Experienced in ticketing tools such as Jira and ServiceNow.
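
For illustration, below is a minimal sketch of the kind of Spark program written against the Java API mentioned above; the application name, HDFS paths, and field positions are placeholder assumptions rather than details of any actual engagement.

```java
// Minimal sketch (not production code): a Spark job using the Java API that
// reads delimited text from HDFS, drops malformed records, and counts events
// per type. Paths and field positions are illustrative assumptions.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class EventCountJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("EventCountJob");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Transformations: load raw lines, split them, and drop malformed records.
            JavaRDD<String[]> records = sc.textFile("hdfs:///data/events/input")
                    .map(line -> line.split(","))
                    .filter(fields -> fields.length >= 3);

            // Key each record by event type (assumed to be the second column) and count.
            JavaPairRDD<String, Long> counts = records
                    .mapToPair(fields -> new Tuple2<>(fields[1], 1L))
                    .reduceByKey((a, b) -> a + b);

            // Action: write the aggregated counts back to HDFS.
            counts.saveAsTextFile("hdfs:///data/events/output");
        }
    }
}
```

A Scala version would follow the same transformation/action shape, typically using case classes in place of the raw string arrays.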

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Solr, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Amazon EMR

Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++.

NoSQL Databases: Cassandra, MongoDB, HBase, Neo4j

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and struts.

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB.

Development Methodologies: Agile, waterfall.

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON.

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j.

Frameworks: Struts, Spring and Hibernate.

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat.

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle.

RDBMS: Teradata, Oracle 9i, 10g, 11g, MS SQL Server, MySQL and DB2.

Operating systems: UNIX, LINUX, Mac OS and Windows Variants.

ETL Tools: Talend, Informatica, Pentaho.

PROFESSIONAL EXPERIENCE

Sr. Big Data Developer

Confidential, McLean, VA

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, Oozie, ZooKeeper, Sqoop, Spark, Kafka, and Impala with the Cloudera distribution.
  • Hands-on experience with Cloudera Hue to import data through its graphical user interface.
  • Imported metadata from relational databases such as Oracle and MySQL using Sqoop.
  • Developed and configured Kafka for log aggregation, gathering physical log files from servers and placing them in a central location such as HDFS for processing.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Developed multiple Kafka producers and consumers as per the software requirement specifications (a minimal producer/consumer sketch follows this list).
  • Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
  • Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the data received from Kafka and persisted it into Cassandra.
  • Configured Spark Streaming to receive real-time data from Kafka and store it in HDFS.
  • Developed Spark scripts using Scala and Spark SQL for faster testing and processing of data.
  • Implemented Spark SQL with various data sources such as JSON, Parquet, ORC, and Hive.
  • Experienced in using Spark Core for joining data to deliver reports and flag fraudulent activities.
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
  • Worked with the data science team to build statistical models with Spark MLlib and PySpark.
  • Worked with MLlib algorithms for streaming data, such as linear regression and K-means clustering.
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics using Hive.
  • Implemented the YARN Capacity Scheduler in various environments and tuned configurations according to application-wise job loads.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs and load data into HDFS.
  • Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping.
  • Created Cassandra tables to store data in various formats coming from different sources.
  • Good knowledge of data manipulation, tombstones, and compactions in Cassandra.
  • Experience working with CQL (Cassandra Query Language) to retrieve data from the Cassandra cluster by running CQL queries.
  • Involved in maintaining the Big Data servers using Ganglia and Nagios.
  • Developed schedulers that communicated with cloud-based services (AWS) to retrieve data.
  • Worked with and learned a great deal from AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
  • Migrated an existing on-premises application to AWS, using EC2 and S3 for small data set processing and storage.
  • Experienced in maintaining the Hadoop cluster on AWS EMR.
  • The entire process included complex data extraction, cleansing, filtering, mapping, validation, transformation, and loading into various dimension and fact tables.
  • Extracted data from various data sources including OLE DB, Greenplum, Excel, flat files, and XML.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Applied fine-tuning mechanisms such as indexing, partitioning, and bucketing to tune the Teradata/Hive databases, which helped business users fetch reports more efficiently.
  • Exported the analyzed data to Cassandra using Sqoop to generate reports for the BI team.
  • Wrote complex Hive queries involving external Hive tables dynamically partitioned on date, which store a rolling-window time period of user viewing history.
  • Implemented ETL standards utilizing proven data processing patterns with open-source tools such as Talend and Pentaho for more efficient processing.
  • Well versed in data warehousing ETL concepts using Informatica PowerCenter, OLAP, OLTP, and AutoSys.
  • Worked with a cluster of nodes.
  • Used the external tables in Impala for data analysis.
  • Experienced with full-text search and faceted search using Solr, and implemented data querying with Solr.
  • Experienced with reporting tools like Tableau to generate reports.
  • Worked on different file formats (ORCFILE, TEXTFILE) and different Compression Codecs (GZIP, SNAPPY, LZO).
  • Worked with the Scrum team to deliver agreed user stories on time in every sprint.
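
Below is a minimal sketch of the kind of Kafka producer/consumer pair referenced above; the broker address, topic name, and consumer group id are placeholder assumptions.

```java
// Minimal sketch (not production code) of a Kafka producer and consumer pair.
// Broker address, topic name, and group id are placeholder assumptions.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ServerLogPipeline {
    private static final String BROKERS = "broker1:9092";   // assumed broker list
    private static final String TOPIC = "server-logs";      // assumed topic name

    // Publishes one raw log line to the topic.
    public static void produce(String logLine) {
        Properties props = new Properties();
        props.put("bootstrap.servers", BROKERS);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>(TOPIC, logLine));
        }
    }

    // Reads log lines off the topic and prints them as a stand-in for downstream processing.
    public static void consume() {
        Properties props = new Properties();
        props.put("bootstrap.servers", BROKERS);
        props.put("group.id", "log-ingest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList(TOPIC));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```

In the actual pipeline the consumed records were fed to Spark Streaming rather than printed; the print loop here only stands in for downstream processing.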

Environment: Hadoop, YARN, Spark- Core, Spark Streaming, AWS EC2, S3, AWS EMR, Spark-SQL, GraphX, Scala, PySpark, Kafka, Hive, Pig, Sqoop, Solr, Impala, Cassandra, Informatica, Cloudera, Maven, Agile, GitHub, Tableau.

Big Data Developer

Confidential, Shreveport, LA

Responsibilities:

  • Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
  • Experienced in the design and deployment of the Hadoop cluster and different big data analytic tools including Pig, Hive, Flume, HBase, and Sqoop.
  • Imported weblogs and unstructured data using Apache Flume and stored them in the Flume channel.
  • Loaded CDRs from relational databases using Sqoop and from other sources into the Hadoop cluster via Flume.
  • Developed business logic in a Flume interceptor in Java (a minimal interceptor sketch follows this list).
  • Implemented quality checks and transformations using Flume interceptors.
  • Developed simple and complex MapReduce programs in Hive, Pig, and Python for data analysis on different data formats.
  • Performed data transformations by writing MapReduce and Pig scripts as per business requirements.
  • Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, Avro data files, and sequence files for log files.
  • Developed various Python scripts to find vulnerabilities in SQL queries by doing SQL injection, permission checks, and analysis.
  • Experienced with Kerberos authentication to establish more secure network communication on the cluster.
  • Analyzed substantial data sets by running Hive queries and Pig scripts.
  • Managed and reviewed Hadoop and HBase log files.
  • Experience in creating, dropping, and altering tables at run time without blocking updates and queries, using HBase and Hive.
  • Experienced in writing Spark Applications in Scala and Python.
  • Used SparkSQL to handle structured data in Hive.
  • Imported semi-structured data from Avro files using Pig to make serialization faster.
  • Processed web server logs by developing multi-hop Flume agents using the Avro sink, and loaded the data into MongoDB for further analysis.
  • Experienced in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
  • Experienced in connecting Avro sink ports directly to Spark Streaming for analysis of weblogs.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
  • Managed and scheduled jobs on the Hadoop cluster using Oozie workflows and Java schedulers.
  • Continuously monitored and managed the Hadoop cluster through the Hortonworks (HDP) distribution.
  • Configured various views in Ambari such as the Hive view, Tez view, and YARN Queue Manager.
  • Involved in review of functional and non-functional requirements.
  • Indexed documents using Elasticsearch.
  • Worked on MongoDB for distributed storage and processing.
  • Implemented collections and the aggregation framework in MongoDB.
  • Implemented B-tree indexing on the data files stored in MongoDB.
  • Good knowledge of MongoDB CRUD operations.
  • Responsible for using a Flume sink to remove data from the Flume channel and deposit it in a NoSQL database such as MongoDB.
  • Implemented a Flume NG MongoDB sink to load JSON-styled data into MongoDB.
  • Involved in loading data from UNIX file system and FTP to HDFS.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Loaded JSON-styled documents into a NoSQL database (MongoDB) and deployed the data in the cloud service Amazon Redshift.
  • Responsible for developing a data pipeline on AWS to extract data from weblogs and store it in Amazon EMR.
  • Used ZooKeeper to provide coordination services to the cluster.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop, and later generated data visualizations using Tableau.
  • Experience in processing large volumes of data and in parallel execution of processes using Talend functionality.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Experience in optimizing Map Reduce Programs using combiners, partitioners and custom counters for delivering the best results.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs and data.
  • Followed Agile methodology for the entire project.
  • Experienced in Extreme Programming, Test-Driven Development, and Agile Scrum.
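
Below is a minimal sketch of a custom Flume interceptor of the kind described above; the class name and the quality check itself (non-empty, pipe-delimited records) are assumptions for illustration.

```java
// Minimal sketch (not production code) of a Flume interceptor that performs a
// simple quality check and transformation on each event. The delimiter check
// and lower-casing are illustrative assumptions.
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class QualityCheckInterceptor implements Interceptor {

    @Override
    public void initialize() {
        // No state to set up for this sketch.
    }

    @Override
    public Event intercept(Event event) {
        String body = new String(event.getBody(), StandardCharsets.UTF_8);
        // Quality check: drop empty or malformed records (assumed to be pipe-delimited).
        if (body.isEmpty() || !body.contains("|")) {
            return null;
        }
        // Simple transformation: normalize to lower case before passing the event on.
        event.setBody(body.toLowerCase().getBytes(StandardCharsets.UTF_8));
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        List<Event> intercepted = new ArrayList<>(events.size());
        for (Event event : events) {
            Event result = intercept(event);
            if (result != null) {
                intercepted.add(result);
            }
        }
        return intercepted;
    }

    @Override
    public void close() {
        // Nothing to release.
    }

    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new QualityCheckInterceptor();
        }

        @Override
        public void configure(Context context) {
            // No custom properties in this sketch.
        }
    }
}
```

In the agent configuration the interceptor would be attached to the source by setting the interceptor type to the Builder's fully qualified class name.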

Environment: Hortonworks(HDP), Hadoop, Spark, Sqoop, Flume, Elastic Search, AWS, EC2, S3, Pig, Hive, MongoDB, Java, Python, MapReduce, HDFS, Tableau, Informatica.

Big Data Developer

Confidential, Mobile, AL

Responsibilities:

  • Involved in handling large amounts of data coming from various sources, as well as HDFS maintenance and loading of structured and unstructured data.
  • Involved in defining job flows, managing and reviewing log files.
  • Developed Map Reduce jobs in Java to perform data cleansing and pre-processing.
  • Migrated large amounts of data from various databases such as Oracle, Netezza, and MySQL to Hadoop.
  • Imported bulk data into HBase using MapReduce programs.
  • Developed and wrote Apache Pig scripts and Hive scripts to process HDFS data.
  • Imported the weblogs using Flume.
  • Performed analytics on time series data stored in HBase using the HBase API.
  • Designed and implemented incremental imports into Hive tables.
  • Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Design and Implementation of Batch jobs using Sqoop, MR2, PIG, Hive.
  • Involved with File Processing using Pig Latin.
  • Scheduled jobs using Oozie workflow Engine.
  • Worked on various compression techniques like GZIP and LZO.
  • Ingesting Log data from various web servers into HDFS using Apache Flume.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
  • Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources (a minimal UDF sketch follows this list).
  • Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for such sites.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Active involvement in SDLC phases (design, development, testing), code reviews, etc.
  • Active involvement in Scrum meetings and Followed Agile Methodology for implementation.
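
Below is a minimal sketch of a Hive UDF of the kind described above; the masking logic and class name are illustrative assumptions.

```java
// Minimal sketch (not production code) of a Hive UDF, using the legacy UDF API.
// It masks all but the last four characters of a value; the logic is an
// illustrative assumption, not actual business logic.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MaskUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String value = input.toString();
        if (value.length() <= 4) {
            return input;
        }
        // Replace every character of the prefix with '*' and keep the last four.
        String masked = value.substring(0, value.length() - 4).replaceAll(".", "*")
                + value.substring(value.length() - 4);
        return new Text(masked);
    }
}
```

The UDF would be registered in HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION before being called in queries.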

Environment: Java, Hadoop, MapReduce, Pig, Hive, Oozie, Linux, Sqoop, Flume, Oracle, MySQL, Eclipse, AWS EC2, Cloudera.

Java Developer

Confidential

Responsibilities:

  • Understood the requirements and the technical aspects and architecture of the existing system.
  • Helped design the application using the Spring MVC framework, with front-end interactive page design using HTML, JSP, JSTL, CSS, JavaScript, jQuery, and AJAX (a minimal controller sketch follows this list).
  • Utilized various JavaScript and jQuery libraries and AJAX for form validation and other interactive features.
  • Involved in writing SQL queries for fetching data from the Oracle database.
  • Developed a multi-tiered web application using J2EE standards.
  • Designed and developed Web Services to store and retrieve user profile information from database.
  • Used Apache Axis to develop web services and SOAP protocol for web services communication.
  • Used the Spring DAO concept to interact with the database using JdbcTemplate and HibernateTemplate.
  • Well experienced in deploying and configuring applications on application servers such as WebLogic, WebSphere, and Apache Tomcat.
  • Followed AGILE Methodology and SCRUM to deliver the product with cross-functional skills.
  • Used JUnit to test persistence and service tiers. Involved in unit test case preparation.
  • Hands-on experience with software configuration/change control processes and tools such as Subversion (SVN), Git, CVS, and ClearCase.
  • Worked closely with onshore and offshore team members on development when there were dependencies.
  • Involved in sprint planning, code review, and daily standup meetings to discuss the progress of the application.
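
Below is a minimal sketch of a Spring MVC controller in the style described above; the controller, service, and view names are hypothetical placeholders.

```java
// Minimal sketch (not production code) of a Spring MVC controller backed by a
// hypothetical service layer; names and the view mapping are assumptions.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

// Hypothetical service-layer interface; the real application would back this with a DAO.
interface UserProfileService {
    Object findById(String userId);
}

@Controller
public class UserProfileController {

    private final UserProfileService profileService;

    @Autowired
    public UserProfileController(UserProfileService profileService) {
        this.profileService = profileService;
    }

    // Handles GET /profile?id=..., loads the profile, and hands it to the JSP view.
    @RequestMapping(value = "/profile", method = RequestMethod.GET)
    public String showProfile(@RequestParam("id") String userId, Model model) {
        model.addAttribute("profile", profileService.findById(userId));
        return "profile"; // resolved to a JSP (e.g. /WEB-INF/jsp/profile.jsp) by the view resolver
    }
}
```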

Environment: HTML, Ajax, Servlets, JSP, SQL, JavaScript, CSS, XML, SOAP, Tomcat Server, Hibernate, JDBC, Agile, Git, SVN.

Java Developer

Confidential

Responsibilities:

  • Participated in all phases of the software development life cycle (SDLC), which include development, testing, implementation, and maintenance.
  • Involved in collecting client requirements and preparing the design documents.
  • Implemented the Spring MVC architecture and Spring BeanFactory using IoC and AOP concepts.
  • Developed the Java classes to execute the business logic and to collect the input data from the users, using Java and Oracle.
  • Involved in the creation of scripts to create, update, and delete data from the tables.
  • Followed Agile methodology to analyze, define, and document the application, which will support functional and business requirements.
  • Wrote JSPs using HTML tags to design the UI for different pages.
  • Extensively used OOD concepts in overall design and development of the system.
  • Developed user interface using Spring JSP to simplify the complexities of the application.
  • Responsible for Development, unit testing and implementation of the application.
  • Used Agile methodology to design, develop and deploy the changes.
  • Extensively used tools like AccVerify, Checkstyle, and Clockworks to check the code.

Environment: Java, JSP, JDBC, HTML, XSL, Spring, CSS, JavaScript, Oracle 8i, XML, WebLogic.
