Sr. Big Data Developer Resume
McLean, VA
SUMMARY
- 8+ years of experience designing and implementing analytic solutions on Big Data technologies and Java-based enterprise applications.
- 4 years of implementation and extensive working experience with a wide array of tools in the Big Data stack, including HDFS, Spark, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Kafka, and ZooKeeper.
- Experience with different distributions such as Cloudera, Hortonworks, EMR, and MapR.
- Good exposure to NameNode Federation and MapReduce 2.0 (MRv2)/YARN.
- Hands-on experience with big data ingestion tools such as Flume, Sqoop, and Kafka.
- Experienced in implementing job scheduling using Oozie, crontab, and shell scripts.
- Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Experienced in loading streaming data into HDFS using the Kafka messaging system.
- Worked on the ELK stack: Elasticsearch, Logstash, and Kibana.
- Experience with different Spark modules such as Spark SQL, Spark MLlib, Spark Streaming, and GraphX.
- Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core (a brief sketch follows this summary).
- Experienced in developing Spark programs using the Scala and Java APIs.
- Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Involved in integrating Hive queries into the Spark environment using Spark SQL.
- Experience in using MapReduce design patterns to solve complex MapReduce problems.
- Experience in developing real-time streaming solutions using DStreams, accumulator variables, broadcast variables, and RDD caching in Spark Streaming.
- Experienced in data cleansing using Pig Latin operations and UDFs.
- Expertise in implementing ad-hoc queries and incorporating complex business logic in HiveQL.
- Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitioning, bucketing, and internal/external tables.
- Good knowledge of NoSQL databases including HBase, MongoDB, Cassandra, and Neo4j.
- Expertise in using Kafka as a messaging system to implement real-time streaming solutions.
- Experience in Apache Solr for indexing; wrote custom Solr segments to optimize search.
- Worked on HBase to perform real-time analytics and experienced in using CQL to extract data from Cassandra tables.
- Responsible for building scalable distributed data solutions using DataStax Cassandra.
- Proficient with cluster management and configuring the Cassandra database.
- Expertise in writing real-time processing applications using spouts and bolts in Storm.
- Worked on MongoDB for distributed storage and processing.
- Experience in configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
- Worked with file formats such as Avro, Parquet, and CSV, and compression codecs such as LZO, GZIP, bzip2, and Snappy.
- Good knowledge of Cloudera distributions and of Amazon S3, Amazon EC2, and Amazon EMR.
- Good understanding of MPP databases and engines such as HP Vertica and Impala.
- Built secure AWS solutions by creating VPCs with private and public subnets.
- Experience in building web-based, enterprise-level, and standalone applications using JSP, Struts, Spring, Hibernate, JSF, and RESTful web services.
- Experienced with the Maven build tool and continuous integration with Jenkins.
- Experience with development tools such as Eclipse, IntelliJ, and SBT.
- Experience in automating database activities with Unix shell scripts.
- Good working experience with Big Data ETL/BI tools such as Pentaho and Tableau.
- Experienced in Agile Scrum, Waterfall, and Test-Driven Development methodologies.
- Experienced with ticketing tools such as Jira and ServiceNow.
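The sketch below is a minimal, illustrative example of the kind of Spark RDD transformations and actions referenced in the summary, written against the Spark Java API; the HDFS paths, the pipe delimiter, and the field positions are hypothetical.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class EventCountJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("EventCountJob"); // master supplied by spark-submit
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Hypothetical input: one pipe-delimited event record per line in HDFS.
            JavaRDD<String> lines = sc.textFile("hdfs:///data/events/input");

            // Transformations: parse, filter out malformed rows, key by event type, aggregate.
            JavaPairRDD<String, Integer> counts = lines
                    .map(line -> line.split("\\|"))
                    .filter(fields -> fields.length > 2)              // basic cleansing
                    .mapToPair(fields -> new Tuple2<>(fields[1], 1))  // event type assumed in column 1
                    .reduceByKey(Integer::sum)
                    .cache();                                         // reuse the result across actions

            // Action: materialize the counts and write them back to HDFS.
            counts.saveAsTextFile("hdfs:///data/events/output");
        }
    }
}
```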
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Solr, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Amazon EMR
Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++.
NoSQL Databases: Cassandra, MongoDB, HBase, Neo4j
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts.
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB.
Development Methodologies: Agile, Waterfall.
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery and CSS, AngularJS, ExtJS and JSON.
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j.
Frameworks: Struts, Spring and Hibernate.
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat.
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle.
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2.
Operating systems: UNIX, LINUX, Mac OS and Windows Variants.
ETL Tools: Talend, Informatica, Pentaho.
PROFESSIONAL EXPERIENCE
Sr. Big Data Developer
Confidential, McLean, VA
Responsibilities:
- Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, Oozie, Zookeeper, Sqoop, Spark, Kafka, and Impala with the Cloudera distribution.
- Hands-on experience with Cloudera Hue to import data through its graphical user interface.
- Imported metadata from relational databases such as Oracle and MySQL using Sqoop.
- Developed and configured Kafka for log aggregation, gathering physical log files from servers and placing them in a central store such as HDFS for processing.
- Configured, deployed, and maintained multi-node development and test Kafka clusters.
- Developed multiple Kafka producers and consumers as per the software requirement specifications (a minimal producer sketch follows this list).
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
- Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
- Used Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka and persisted it into Cassandra.
- Configured Spark Streaming to receive real-time data from Kafka and store it in HDFS.
- Developed Spark scripts using Scala and Spark SQL for faster testing and processing of data.
- Implemented Spark SQL with various data sources such as JSON, Parquet, ORC, and Hive.
- Experienced in using Spark Core for joining data to deliver reports and detect fraudulent activity.
- Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
- Worked with the data science team to build statistical models with Spark MLlib and PySpark.
- Worked with MLlib algorithms for streaming data, such as linear regression and K-means clustering.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
- Implemented the YARN Capacity Scheduler in various environments and tuned configurations according to application-specific job loads.
- Developed Oozie workflows to schedule and run multiple Hive and Pig jobs and to load data into HDFS.
- Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.
- Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping.
- Created Cassandra tables to store data arriving in various formats from various sources.
- Good knowledge of data manipulation, tombstones, and compaction in Cassandra.
- Experienced with CQL (Cassandra Query Language) for retrieving data from the Cassandra cluster.
- Involved in maintaining the Big Data servers using Ganglia and Nagios.
- Developed schedulers that communicated with cloud-based services (AWS) to retrieve data.
- Worked extensively with AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
- Migrated an existing on-premises application to AWS; used EC2 and S3 for processing and storing small data sets.
- Experienced in maintaining the Hadoop cluster on AWS EMR.
- The end-to-end process included extracting, cleansing, filtering, mapping, validating, and transforming complex data and loading it into the appropriate dimension and fact tables.
- Extracted data from various data sources including OLE DB, Greenplum, Excel, flat files, and XML.
- Involved in running Hadoop Streaming jobs to process terabytes of text data.
- Applied tuning mechanisms such as indexing, partitioning, and bucketing to the Teradata/Hive databases, which helped business users fetch reports more efficiently.
- Exported the analyzed data to Cassandra using Sqoop and generated reports for the BI team.
- Wrote complex Hive queries against external Hive tables dynamically partitioned by date, which store a rolling time window of user viewing history.
- Implemented ETL standards utilizing proven data processing patterns with open-source tools such as Talend and Pentaho for more efficient processing.
- Well versed in data warehousing ETL concepts using Informatica PowerCenter, OLAP, OLTP, and AutoSys.
- Worked with a multi-node Hadoop cluster.
- Used external tables in Impala for data analysis.
- Experienced with full-text and faceted search using Solr and implemented data querying with Solr.
- Experienced with reporting tools such as Tableau to generate reports.
- Worked on different file formats (ORC, text) and different compression codecs (GZIP, Snappy, LZO).
- Worked with the Scrum team to deliver agreed user stories on time in every sprint.
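The following is a minimal sketch of the kind of Kafka producer described for this role, publishing server log lines to a topic consumed downstream by Spark Streaming; the broker address, topic name, and sample log line are hypothetical.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ServerLogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker address
        props.put("acks", "all");                          // wait for full acknowledgement
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Publish each server log line, keyed by host, to a topic read by the streaming job.
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            String logLine = "2017-06-01T00:00:00Z host01 GET /index.html 200";
            producer.send(new ProducerRecord<>("server-logs", "host01", logLine));
        } // closing the producer flushes any pending sends
    }
}
```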
Environment: Hadoop, YARN, Spark Core, Spark Streaming, AWS EC2, S3, AWS EMR, Spark SQL, GraphX, Scala, PySpark, Kafka, Hive, Pig, Sqoop, Solr, Impala, Cassandra, Informatica, Cloudera, Maven, Agile, GitHub, Tableau.
Big Data Developer
Confidential, Shreveport, LA
Responsibilities:
- Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
- Experienced in designing and deploying the Hadoop cluster and different big data analytic tools including Pig, Hive, Flume, HBase, and Sqoop.
- Imported weblogs and unstructured data using Apache Flume and stored them in the Flume channel.
- Loaded CDRs from a relational database using Sqoop and from other sources into the Hadoop cluster using Flume.
- Developed business logic in a Flume interceptor in Java (a minimal interceptor sketch follows this list).
- Implemented quality checks and transformations using Flume interceptors.
- Developed simple and complex MapReduce jobs using Hive, Pig, and Python for data analysis on different data formats.
- Performed data transformations by writing MapReduce and Pig scripts as per business requirements.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for log data.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and analysis.
- Used Kerberos authentication to establish more secure network communication on the cluster.
- Analyzed substantial data sets by running Hive queries and Pig scripts.
- Managed and reviewed Hadoop and HBase log files.
- Experience in creating, dropping, and altering tables at runtime without blocking updates and queries, using HBase and Hive.
- Experienced in writing Spark Applications in Scala and Python.
- Used Spark SQL to handle structured data in Hive.
- Imported semi-structured data from Avro files using Pig to make serialization faster.
- Processed web server logs by developing multi-hop Flume agents using the Avro sink and loaded the data into MongoDB for further analysis.
- Experienced in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Experienced in connecting Avro sink ports directly to Spark Streaming for analysis of weblogs.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
- Managed and scheduled jobs on the Hadoop cluster using Oozie workflows and Java schedulers.
- Continuously monitored and managed the Hadoop cluster through the Hortonworks (HDP) distribution.
- Configured various views in Ambari such as the Hive view, Tez view, and YARN Queue Manager.
- Involved in review of functional and non-functional requirements.
- Indexed documents using Elasticsearch.
- Worked on MongoDB for distributed storage and processing.
- Implemented collections and the aggregation framework in MongoDB.
- Implemented B-tree indexing on the data files stored in MongoDB.
- Good knowledge of MongoDB CRUD operations.
- Responsible for using a Flume sink to remove data from the Flume channel and deposit it in a NoSQL database such as MongoDB.
- Implemented the Flume NG MongoDB sink to load JSON-style data into MongoDB.
- Involved in loading data from UNIX file system and FTP to HDFS.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Loaded JSON-style documents into the NoSQL database MongoDB and deployed the data to the Amazon Redshift cloud service.
- Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in Amazon EMR.
- Used ZooKeeper to provide coordination services to the cluster.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
- Involved in migrating tables from RDBMS into Hive tables using Sqoop, and later generated data visualizations using Tableau.
- Experience in processing large volumes of data and in parallel execution of processes using Talend.
- Designed and implemented Spark jobs to support distributed data processing.
- Experience in optimizing MapReduce programs using combiners, partitioners, and custom counters to deliver the best results.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Followed Agile methodology for the entire project.
- Experienced in Extreme Programming, Test-Driven Development, and Agile Scrum.
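A minimal sketch of a Flume interceptor of the kind described above, implementing a simple quality check; the header key and the drop rule are illustrative assumptions, not the actual business logic.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

/** Drops empty events and tags the remaining ones with a data-quality header. */
public class QualityCheckInterceptor implements Interceptor {

    @Override
    public void initialize() {
        // no state to set up
    }

    @Override
    public Event intercept(Event event) {
        String body = new String(event.getBody(), StandardCharsets.UTF_8).trim();
        if (body.isEmpty()) {
            return null;                                    // returning null drops the event
        }
        event.getHeaders().put("quality", "checked");       // hypothetical header key
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        List<Event> passed = new ArrayList<>();
        for (Event event : events) {
            Event checked = intercept(event);
            if (checked != null) {
                passed.add(checked);
            }
        }
        return passed;
    }

    @Override
    public void close() {
        // nothing to release
    }

    /** Flume instantiates interceptors through a Builder named in the agent configuration. */
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new QualityCheckInterceptor();
        }

        @Override
        public void configure(Context context) {
            // no configurable properties in this sketch
        }
    }
}
```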
Environment: Hortonworks (HDP), Hadoop, Spark, Sqoop, Flume, Elasticsearch, AWS, EC2, S3, Pig, Hive, MongoDB, Java, Python, MapReduce, HDFS, Tableau, Informatica.
Big Data Developer
Confidential, Mobile, AL
Responsibilities:
- Involved in handling large amounts of data coming from various sources, and involved in HDFS maintenance and loading of structured and unstructured data.
- Involved in defining job flows and managing and reviewing log files.
- Developed MapReduce jobs in Java to perform data cleansing and pre-processing.
- Migrated large amounts of data from various databases such as Oracle, Netezza, and MySQL to Hadoop.
- Imported bulk data into HBase using MapReduce programs.
- Developed and wrote Apache Pig scripts and Hive scripts to process HDFS data.
- Imported weblogs using Flume.
- Performed analytics on time-series data stored in HBase using the HBase API.
- Designed and implemented incremental imports into Hive tables.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Designed and implemented batch jobs using Sqoop, MRv2, Pig, and Hive.
- Involved with File Processing using Pig Latin.
- Scheduled jobs using Oozie workflow Engine.
- Worked on various compression techniques like GZIP and LZO.
- Ingested log data from various web servers into HDFS using Apache Flume.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources (a small UDF sketch follows this list).
- Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Active involvement in SDLC phases (design, development, testing), code reviews, etc.
- Active involvement in Scrum meetings and followed Agile methodology for implementation.
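A minimal sketch of a Java Hive UDF of the kind referenced above; the function name and the normalization logic are illustrative assumptions.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Simple Hive UDF that normalizes a string column to trimmed lower case. */
public class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                       // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

In use, the JAR would be added to the Hive session with ADD JAR and the function registered with CREATE TEMPORARY FUNCTION before being called in a query.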
Environment: Java, Hadoop, MapReduce, Pig, Hive, Oozie, Linux, Sqoop, Flume, Oracle, MySQL, Eclipse, AWS EC2, Cloudera.
Java Developer
Confidential
Responsibilities:
- Understood the requirements, technical aspects, and architecture of the existing system.
- Helped design the application using the Spring MVC framework, with front-end interactive page design using HTML, JSP, JSTL, CSS, JavaScript, jQuery, and AJAX.
- Utilized various JavaScript and jQuery libraries and AJAX for form validation and other interactive features.
- Involved in writing SQL queries to fetch data from the Oracle database.
- Developed a multi-tiered web application using J2EE standards.
- Designed and developed web services to store and retrieve user profile information from the database.
- Used Apache Axis to develop web services and the SOAP protocol for web service communication.
- Used the Spring DAO concept to interact with the database using the JDBC template and Hibernate template (a minimal DAO sketch follows this list).
- Well experienced in deploying and configuring applications on application servers such as WebLogic, WebSphere, and Apache Tomcat.
- Followed Agile methodology and Scrum to deliver the product with cross-functional skills.
- Used JUnit to test the persistence and service tiers; involved in unit test case preparation.
- Hands-on experience with software configuration/change control processes and tools such as Subversion (SVN), Git, CVS, and ClearCase.
- Worked closely with onshore and offshore team members during development when there were dependencies.
- Involved in sprint planning, code review, and daily standup meetings to discuss the progress of the application.
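A minimal sketch of the Spring DAO approach mentioned above, using JdbcTemplate; the table, column, and class names are hypothetical.

```java
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

/** DAO that reads user profile data through Spring's JdbcTemplate. */
public class UserProfileDao {

    private final JdbcTemplate jdbcTemplate;

    public UserProfileDao(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    /** Returns the display name for a user, or null if the user does not exist. */
    public String findDisplayName(long userId) {
        // Hypothetical table and column names.
        List<String> names = jdbcTemplate.query(
                "SELECT display_name FROM user_profile WHERE user_id = ?",
                (rs, rowNum) -> rs.getString("display_name"),
                userId);
        return names.isEmpty() ? null : names.get(0);
    }
}
```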
Environment: HTML, Ajax, Servlets, JSP, SQL, JavaScript, CSS, XML, SOAP, Tomcat Server, Hibernate, JDBC, Agile, Git, SVN.
Java Developer
Confidential
Responsibilities:
- Participated in all phases of the software development life cycle (SDLC), including development, testing, implementation, and maintenance.
- Involved in collecting client requirements and preparing the design documents.
- Implemented the Spring MVC architecture and Spring bean factory using IoC and AOP concepts (a minimal controller sketch follows this list).
- Developed the Java classes that execute the business logic and collect input data from users, using Java and Oracle.
- Involved in creating scripts to create, update, and delete data in the tables.
- Followed Agile methodology to analyze, define, and document the application supporting the functional and business requirements.
- Wrote JSPs using HTML tags to design the UI for different pages.
- Extensively used OOD concepts in the overall design and development of the system.
- Developed the user interface using Spring and JSP to simplify the complexities of the application.
- Responsible for development, unit testing, and implementation of the application.
- Used Agile methodology to design, develop, and deploy changes.
- Extensively used tools such as AccVerify, Checkstyle, and Klocwork to check the code.
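A minimal sketch of a Spring MVC controller of the kind described above; the URL mapping, request parameter, view name, and model attribute are hypothetical.

```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

/** Minimal Spring MVC controller that returns a JSP view name. */
@Controller
public class AccountController {

    @RequestMapping(value = "/account", method = RequestMethod.GET)
    public String showAccount(@RequestParam("id") long accountId, Model model) {
        // In the real application this value would come from a service/DAO layer.
        model.addAttribute("accountId", accountId);
        return "accountDetails";   // resolved to a JSP (e.g. /WEB-INF/jsp/accountDetails.jsp) by the view resolver
    }
}
```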
Environment: Java, JSP, JDBC, HTML, XSL, Spring, CSS, JavaScript, Oracle 8i, XML, WebLogic.