Sr. Big Data Developer Resume
McLean, VA
SUMMARY
- 8+ years of experience designing and implementing analytic solutions on Big Data technologies and Java-based enterprise applications.
- 4 years of implementation and extensive working experience with a wide array of tools in the Big Data stack, including HDFS, Spark, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Kafka, and ZooKeeper.
- Experience with different distributions such as Cloudera, Hortonworks, EMR, and MapR.
- Good exposure to NameNode Federation and MapReduce 2.0 (MRv2)/YARN.
- Hands-on experience with big data ingestion tools such as Flume, Sqoop, and Kafka.
- Experienced in implementing job scheduling using Oozie, crontab, and shell scripts.
- Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Experienced in loading streaming data into HDFS using the Kafka messaging system.
- Worked on the ELK stack: Elasticsearch, Logstash, and Kibana.
- Experience with different Spark modules such as Spark SQL, Spark MLlib, Spark Streaming, and GraphX.
- Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core (a brief sketch follows this summary).
- Experienced in developing Spark programs using the Scala and Java APIs.
- Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Involved in integrating Hive queries into the Spark environment using Spark SQL.
- Experience in using MapReduce design patterns to solve complex MapReduce problems.
- Experience in developing real-time streaming solutions using DStreams, accumulator variables, broadcast variables, and RDD caching in Spark Streaming.
- Experienced in data cleansing using Pig Latin operations and UDFs.
- Expertise in implementing ad-hoc queries and incorporating complex business logic in HiveQL.
- Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitioning, bucketing, and internal/external tables.
- Good knowledge of NoSQL databases including HBase, MongoDB, Cassandra, and Neo4j.
- Expertise in using Kafka as a messaging system to implement real-time streaming solutions.
- Experience in Apache Solr for indexing; wrote custom Solr segments to optimize search.
- Worked on HBase to perform real-time analytics and experienced in using CQL to extract data from Cassandra tables.
- Responsible for building scalable distributed data solutions using DataStax Cassandra.
- Proficient with cluster management and configuring the Cassandra database.
- Expertise in writing real-time processing applications using spouts and bolts in Storm.
- Worked on MongoDB for distributed storage and processing.
- Experience in configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
- Worked with file formats such as Avro, Parquet, and CSV, and compression codecs such as LZO, GZIP, bzip2, and Snappy.
- Good knowledge of Cloudera distributions and of Amazon S3, Amazon EC2, and Amazon EMR.
- Good understanding of MPP databases and engines such as HP Vertica and Impala.
- Built secure AWS solutions by creating VPCs with private and public subnets.
- Experience in building web-based, enterprise-level, and standalone applications using JSP, Struts, Spring, Hibernate, JSF, and RESTful web services.
- Experienced with the Maven build tool and continuous integration with Jenkins.
- Experience with development tools such as Eclipse, IntelliJ, and SBT.
- Experience in automating database activities with Unix shell scripts.
- Good working experience with Big Data ETL/BI tools such as Pentaho and Tableau.
- Experienced in Agile Scrum, Waterfall, and Test-Driven Development methodologies.
- Experienced with ticketing tools such as Jira and ServiceNow.
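The sketch below is a minimal, illustrative example of the kind of Spark RDD transformations and actions referenced in the summary, written against the Spark Java API; the HDFS paths, the pipe delimiter, and the field positions are hypothetical.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class EventCountJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("EventCountJob"); // master supplied by spark-submit
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Hypothetical input: one pipe-delimited event record per line in HDFS.
            JavaRDD<String> lines = sc.textFile("hdfs:///data/events/input");

            // Transformations: parse, filter out malformed rows, key by event type, aggregate.
            JavaPairRDD<String, Integer> counts = lines
                    .map(line -> line.split("\\|"))
                    .filter(fields -> fields.length > 2)              // basic cleansing
                    .mapToPair(fields -> new Tuple2<>(fields[1], 1))  // event type assumed in column 1
                    .reduceByKey(Integer::sum)
                    .cache();                                         // reuse the result across actions

            // Action: materialize the counts and write them back to HDFS.
            counts.saveAsTextFile("hdfs:///data/events/output");
        }
    }
}
```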
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Solr, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Amazon EMR
Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++.
NoSQL Databases: Cassandra, MongoDB, HBase, Neo4j
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts.
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB.
Development Methodologies: Agile, Waterfall.
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery and CSS, AngularJS, ExtJS and JSON.
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j.
Frameworks: Struts, Spring and Hibernate.
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat.
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle.
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2.
Operating systems: UNIX, LINUX, Mac OS and Windows Variants.
ETL Tools: Talend, Informatica, Pentaho.
PROFESSIONAL EXPERIENCE
Sr. Big Data Developer
Confidential, McLean, VA
Responsibilities:
- Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, Oozie, Zookeeper, Sqoop, Spark, Kafka, and Impala with the Cloudera distribution.
- Hands-on experience with Cloudera Hue to import data through its graphical user interface.
- Imported metadata from relational databases such as Oracle and MySQL using Sqoop.
- Developed and configured Kafka for log aggregation, gathering physical log files from servers and placing them in a central store such as HDFS for processing.
- Configured, deployed, and maintained multi-node development and test Kafka clusters.
- Developed multiple Kafka producers and consumers as per the software requirement specifications (a minimal producer sketch follows this list).
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
- Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
- Used Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka and persisted it into Cassandra.
- Configured Spark Streaming to receive real-time data from Kafka and store it in HDFS.
- Developed Spark scripts using Scala and Spark SQL for faster testing and processing of data.
- Implemented Spark SQL with various data sources such as JSON, Parquet, ORC, and Hive.
- Experienced in using Spark Core for joining data to deliver reports and detect fraudulent activity.
- Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
- Worked with the data science team to build statistical models with Spark MLlib and PySpark.
- Worked with MLlib algorithms for streaming data, such as linear regression and K-means clustering.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
- Implemented the YARN Capacity Scheduler in various environments and tuned configurations according to application-specific job loads.
- Developed Oozie workflows to schedule and run multiple Hive and Pig jobs and to load data into HDFS.
- Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.
- Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping.
- Created Cassandra tables to store data arriving in various formats from various sources.
- Good knowledge of data manipulation, tombstones, and compaction in Cassandra.
- Experienced with CQL (Cassandra Query Language) for retrieving data from the Cassandra cluster.
- Involved in maintaining the Big Data servers using Ganglia and Nagios.
- Developed schedulers that communicated with cloud-based services (AWS) to retrieve data.
- Worked extensively with AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
- Migrated an existing on-premises application to AWS; used EC2 and S3 for processing and storing small data sets.
- Experienced in maintaining the Hadoop cluster on AWS EMR.
- The end-to-end process included extracting, cleansing, filtering, mapping, validating, and transforming complex data and loading it into the appropriate dimension and fact tables.
- Extracted data from various data sources including OLE DB, Greenplum, Excel, flat files, and XML.
- Involved in running Hadoop Streaming jobs to process terabytes of text data.
- Applied tuning mechanisms such as indexing, partitioning, and bucketing to the Teradata/Hive databases, which helped business users fetch reports more efficiently.
- Exported the analyzed data to Cassandra using Sqoop and generated reports for the BI team.
- Wrote complex Hive queries against external Hive tables dynamically partitioned by date, which store a rolling time window of user viewing history.
- Implemented ETL standards utilizing proven data processing patterns with open-source tools such as Talend and Pentaho for more efficient processing.
- Well versed in data warehousing ETL concepts using Informatica PowerCenter, OLAP, OLTP, and AutoSys.
- Worked with a multi-node Hadoop cluster.
- Used external tables in Impala for data analysis.
- Experienced with full-text and faceted search using Solr and implemented data querying with Solr.
- Experienced with reporting tools such as Tableau to generate reports.
- Worked on different file formats (ORC, text) and different compression codecs (GZIP, Snappy, LZO).
- Worked with the Scrum team to deliver agreed user stories on time in every sprint.
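The following is a minimal sketch of the kind of Kafka producer described for this role, publishing server log lines to a topic consumed downstream by Spark Streaming; the broker address, topic name, and sample log line are hypothetical.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ServerLogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker address
        props.put("acks", "all");                          // wait for full acknowledgement
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Publish each server log line, keyed by host, to a topic read by the streaming job.
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            String logLine = "2017-06-01T00:00:00Z host01 GET /index.html 200";
            producer.send(new ProducerRecord<>("server-logs", "host01", logLine));
        } // closing the producer flushes any pending sends
    }
}
```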
Environment: Hadoop, YARN, Spark Core, Spark Streaming, AWS EC2, S3, AWS EMR, Spark SQL, GraphX, Scala, PySpark, Kafka, Hive, Pig, Sqoop, Solr, Impala, Cassandra, Informatica, Cloudera, Maven, Agile, GitHub, Tableau.
Big Data Developer
Confidential, Shreveport, LA
Responsibilities:
- Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
- Experienced in designing and deploying the Hadoop cluster and different big data analytic tools including Pig, Hive, Flume, HBase, and Sqoop.
- Imported weblogs and unstructured data using Apache Flume and stored them in the Flume channel.
- Loaded CDRs from a relational database using Sqoop and from other sources into the Hadoop cluster using Flume.
- Developed business logic in a Flume interceptor in Java (a minimal interceptor sketch follows this list).
- Implemented quality checks and transformations using Flume interceptors.
- Developed simple and complex MapReduce jobs using Hive, Pig, and Python for data analysis on different data formats.
- Performed data transformations by writing MapReduce and Pig scripts as per business requirements.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for log data.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and analysis.
- Used Kerberos authentication to establish more secure network communication on the cluster.
- Analyzed substantial data sets by running Hive queries and Pig scripts.
- Managed and reviewed Hadoop and HBase log files.
- Experience in creating, dropping, and altering tables at runtime without blocking updates and queries, using HBase and Hive.
- Experienced in writing Spark Applications in Scala and Python.
- Used Spark SQL to handle structured data in Hive.
- Imported semi-structured data from Avro files using Pig to make serialization faster.
- Processed web server logs by developing multi-hop Flume agents using the Avro sink and loaded the data into MongoDB for further analysis.
- Experienced in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Experienced in connecting Avro sink ports directly to Spark Streaming for analysis of weblogs.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
- Managed and scheduled jobs on the Hadoop cluster using Oozie workflows and Java schedulers.
- Continuously monitored and managed the Hadoop cluster through the Hortonworks (HDP) distribution.
- Configured various views in Ambari such as the Hive view, Tez view, and YARN Queue Manager.
- Involved in review of functional and non-functional requirements.
- Indexed documents using Elasticsearch.
- Worked on MongoDB for distributed storage and processing.
- Implemented collections and the aggregation framework in MongoDB.
- Implemented B-tree indexing on the data files stored in MongoDB.
- Good knowledge of MongoDB CRUD operations.
- Responsible for using a Flume sink to remove data from the Flume channel and deposit it in a NoSQL database such as MongoDB.
- Implemented the Flume NG MongoDB sink to load JSON-style data into MongoDB.
- Involved in loading data from UNIX file system and FTP to HDFS.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Loaded JSON-style documents into the NoSQL database MongoDB and deployed the data to the Amazon Redshift cloud service.
- Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in Amazon EMR.
- Used ZooKeeper to provide coordination services to the cluster.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
- Involved in migrating tables from RDBMS into Hive tables using Sqoop, and later generated data visualizations using Tableau.
- Experience in processing large volumes of data and in parallel execution of processes using Talend.
- Designed and implemented Spark jobs to support distributed data processing.
- Experience in optimizing MapReduce programs using combiners, partitioners, and custom counters to deliver the best results.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Followed Agile methodology for the entire project.
- Experienced in Extreme Programming, Test-Driven Development, and Agile Scrum.
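A minimal sketch of a Flume interceptor of the kind described above, implementing a simple quality check; the header key and the drop rule are illustrative assumptions, not the actual business logic.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

/** Drops empty events and tags the remaining ones with a data-quality header. */
public class QualityCheckInterceptor implements Interceptor {

    @Override
    public void initialize() {
        // no state to set up
    }

    @Override
    public Event intercept(Event event) {
        String body = new String(event.getBody(), StandardCharsets.UTF_8).trim();
        if (body.isEmpty()) {
            return null;                                    // returning null drops the event
        }
        event.getHeaders().put("quality", "checked");       // hypothetical header key
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        List<Event> passed = new ArrayList<>();
        for (Event event : events) {
            Event checked = intercept(event);
            if (checked != null) {
                passed.add(checked);
            }
        }
        return passed;
    }

    @Override
    public void close() {
        // nothing to release
    }

    /** Flume instantiates interceptors through a Builder named in the agent configuration. */
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new QualityCheckInterceptor();
        }

        @Override
        public void configure(Context context) {
            // no configurable properties in this sketch
        }
    }
}
```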
Environment: Hortonworks (HDP), Hadoop, Spark, Sqoop, Flume, Elasticsearch, AWS, EC2, S3, Pig, Hive, MongoDB, Java, Python, MapReduce, HDFS, Tableau, Informatica.
Big Data Developer
Confidential, Mobile, AL
Responsibilities:
- Involved in handling large amounts of data coming from various sources, and involved in HDFS maintenance and loading of structured and unstructured data.
- Involved in defining job flows and managing and reviewing log files.
- Developed MapReduce jobs in Java to perform data cleansing and pre-processing.
- Migrated large amounts of data from various databases such as Oracle, Netezza, and MySQL to Hadoop.
- Imported bulk data into HBase using MapReduce programs.
- Developed and wrote Apache Pig scripts and Hive scripts to process HDFS data.
- Imported weblogs using Flume.
- Performed analytics on time-series data stored in HBase using the HBase API.
- Designed and implemented incremental imports into Hive tables.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Designed and implemented batch jobs using Sqoop, MRv2, Pig, and Hive.
- Involved with File Processing using Pig Latin.
- Scheduled jobs using Oozie workflow Engine.
- Worked on various compression techniques like GZIP and LZO.
- Ingested log data from various web servers into HDFS using Apache Flume.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources (a small UDF sketch follows this list).
- Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Active involvement in SDLC phases (design, development, testing), code reviews, etc.
- Active involvement in Scrum meetings and followed Agile methodology for implementation.
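A minimal sketch of a Java Hive UDF of the kind referenced above; the function name and the normalization logic are illustrative assumptions.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Simple Hive UDF that normalizes a string column to trimmed lower case. */
public class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                       // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

In use, the JAR would be added to the Hive session with ADD JAR and the function registered with CREATE TEMPORARY FUNCTION before being called in a query.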
Environment: Java, Hadoop, MapReduce, Pig, Hive, Oozie, Linux, Sqoop, Flume, Oracle, MySQL, Eclipse, AWS EC2, Cloudera.
Java Developer
Confidential
Responsibilities:
- Understood the requirements, technical aspects, and architecture of the existing system.
- Helped design the application using the Spring MVC framework, with front-end interactive page design using HTML, JSP, JSTL, CSS, JavaScript, jQuery, and AJAX.
- Utilized various JavaScript and jQuery libraries and AJAX for form validation and other interactive features.
- Involved in writing SQL queries to fetch data from the Oracle database.
- Developed a multi-tiered web application using J2EE standards.
- Designed and developed web services to store and retrieve user profile information from the database.
- Used Apache Axis to develop web services and the SOAP protocol for web service communication.
- Used the Spring DAO concept to interact with the database using the JDBC template and Hibernate template (a minimal DAO sketch follows this list).
- Well experienced in deploying and configuring applications on application servers such as WebLogic, WebSphere, and Apache Tomcat.
- Followed Agile methodology and Scrum to deliver the product with cross-functional skills.
- Used JUnit to test the persistence and service tiers; involved in unit test case preparation.
- Hands-on experience with software configuration/change control processes and tools such as Subversion (SVN), Git, CVS, and ClearCase.
- Worked closely with onshore and offshore team members during development when there were dependencies.
- Involved in sprint planning, code review, and daily standup meetings to discuss the progress of the application.
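A minimal sketch of the Spring DAO approach mentioned above, using JdbcTemplate; the table, column, and class names are hypothetical.

```java
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

/** DAO that reads user profile data through Spring's JdbcTemplate. */
public class UserProfileDao {

    private final JdbcTemplate jdbcTemplate;

    public UserProfileDao(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    /** Returns the display name for a user, or null if the user does not exist. */
    public String findDisplayName(long userId) {
        // Hypothetical table and column names.
        List<String> names = jdbcTemplate.query(
                "SELECT display_name FROM user_profile WHERE user_id = ?",
                (rs, rowNum) -> rs.getString("display_name"),
                userId);
        return names.isEmpty() ? null : names.get(0);
    }
}
```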
Environment: HTML, Ajax, Servlets, JSP, SQL, JavaScript, CSS, XML, SOAP, Tomcat Server, Hibernate, JDBC, Agile, Git, SVN.
Java Developer
Confidential
Responsibilities:
- Participated in all phases of the software development life cycle (SDLC), including development, testing, implementation, and maintenance.
- Involved in collecting client requirements and preparing the design documents.
- Implemented the Spring MVC architecture and Spring bean factory using IoC and AOP concepts (a minimal controller sketch follows this list).
- Developed the Java classes that execute the business logic and collect input data from users, using Java and Oracle.
- Involved in creating scripts to create, update, and delete data in the tables.
- Followed Agile methodology to analyze, define, and document the application supporting the functional and business requirements.
- Wrote JSPs using HTML tags to design the UI for different pages.
- Extensively used OOD concepts in the overall design and development of the system.
- Developed the user interface using Spring and JSP to simplify the complexities of the application.
- Responsible for development, unit testing, and implementation of the application.
- Used Agile methodology to design, develop, and deploy changes.
- Extensively used tools such as AccVerify, Checkstyle, and Klocwork to check the code.
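A minimal sketch of a Spring MVC controller of the kind described above; the URL mapping, request parameter, view name, and model attribute are hypothetical.

```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

/** Minimal Spring MVC controller that returns a JSP view name. */
@Controller
public class AccountController {

    @RequestMapping(value = "/account", method = RequestMethod.GET)
    public String showAccount(@RequestParam("id") long accountId, Model model) {
        // In the real application this value would come from a service/DAO layer.
        model.addAttribute("accountId", accountId);
        return "accountDetails";   // resolved to a JSP (e.g. /WEB-INF/jsp/accountDetails.jsp) by the view resolver
    }
}
```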
Environment: Java, JSP, JDBC, HTML, XSL, Spring, CSS, JavaScript, Oracle 8i, XML, WebLogic.