Senior Big Data Developer Resume
New York, NY
SUMMARY:
- 8+ years of IT experience across the complete software development life cycle, applying object-oriented analysis and design with Big Data/Hadoop ecosystem, SQL, Java, and J2EE technologies.
- 3+ most recent years spent on Big Data and Data Science, building advanced customer-insight and product-analytics platforms with open-source technologies.
- Broad experience in data mining, real-time analytics, business intelligence, machine learning, and web development.
- Strong skills in developing applications with Big Data technologies such as Hadoop, Spark, Elasticsearch, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Hortonworks, Cloudera, Mahout, Avro, and Scala.
- Skilled in programming within the MapReduce framework and the Hadoop ecosystem.
- Extensive experience designing and implementing MapReduce jobs that process large data sets in a distributed fashion across a Hadoop cluster.
- Experience implementing an inverted-indexing algorithm using MapReduce (see the sketch after this list).
- Extensive experience creating Hive tables, loading them with data, and writing Hive queries that execute internally as MapReduce jobs.
- Hands-on experience migrating complex MapReduce programs to Apache Spark RDD transformations.
- Experience setting standards and processes for Hadoop-based application design and implementation.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, and HDFS.
- Developed ETL processes that load data from multiple sources into HDFS using Flume and Sqoop, apply structural modifications with MapReduce and Hive, and analyze data with visualization/reporting tools.
- Experience writing Pig UDFs (Eval, Filter, Load, and Store) and macros.
- Experience developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Exposure to Apache Kafka for building data pipelines that carry logs as streams of messages through producers and consumers (see the Kafka sketch after this list).
- Experience integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
- Very good understanding of NoSQL databases such as MongoDB, Cassandra, and HBase.
- Experience coordinating cluster services through ZooKeeper.
- Hands-on experience setting up Apache Hadoop, MapR, and Hortonworks clusters.
- Good knowledge of Apache Hadoop cluster planning, including choosing the hardware and operating systems to host a cluster.
- Experience with Hadoop distributions such as Cloudera, Hortonworks, BigInsights, and MapR, as well as Windows Azure and Impala.
- Experience with integrated development environments such as Eclipse, NetBeans, JDeveloper, and MyEclipse.
- Excellent understanding of relational databases as they pertain to application development, using several RDBMSs including IBM DB2, Oracle 10g, MS SQL Server 2005/2008, and MySQL; strong database skills including SQL, stored procedures, and PL/SQL.
- Working knowledge of J2EE development with the Spring, Struts, and Hibernate frameworks across various projects, and expertise in web services (JAXB, SOAP, WSDL, RESTful) development.
- Experience writing tests using Specs2, ScalaTest, Selenium, TestNG, and JUnit.
- Ability to work with diverse application servers such as JBoss, Apache Tomcat, and WebSphere.
- Worked on different operating systems including UNIX/Linux, Windows XP, and other Windows versions.
- A passion for learning new things (new languages or new implementations) has kept me up to date with the latest trends and industry standards.
- Proficient in adapting to new work environments and technologies.
- Quick learner and self-motivated team player with excellent interpersonal skills.
- Well focused and able to meet expected deadlines.
- Good understanding of Scrum methodology, test-driven development, and continuous integration.
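To illustrate the inverted-indexing item above, here is a minimal sketch using the Hadoop MapReduce API from Scala; the class names, tokenization rule, and paths are illustrative assumptions, not taken from the original project.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, FileSplit}
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConverters._

// Map phase: emit (word, documentName) for every token in the split.
class InvertedIndexMapper extends Mapper[LongWritable, Text, Text, Text] {
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    val doc = ctx.getInputSplit.asInstanceOf[FileSplit].getPath.getName
    value.toString.toLowerCase.split("\\W+").filter(_.nonEmpty)
      .foreach(word => ctx.write(new Text(word), new Text(doc)))
  }
}

// Reduce phase: collapse each word's document names into a posting list.
class InvertedIndexReducer extends Reducer[Text, Text, Text, Text] {
  override def reduce(key: Text, values: java.lang.Iterable[Text],
                      ctx: Reducer[Text, Text, Text, Text]#Context): Unit = {
    val postings = values.asScala.map(_.toString).toSet.mkString(",")
    ctx.write(key, new Text(postings))
  }
}

object InvertedIndex {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "inverted-index")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[InvertedIndexMapper])
    job.setReducerClass(classOf[InvertedIndexReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[Text])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```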
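Similarly, a minimal sketch of the Kafka log-pipeline pattern described above, using the standard Kafka producer client; the broker address and the "app-logs" topic name are placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LogProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // placeholder broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Each log line read from stdin becomes one message on the topic;
    // downstream consumers (e.g. a Storm topology) read the same topic.
    scala.io.Source.stdin.getLines().foreach { line =>
      producer.send(new ProducerRecord[String, String]("app-logs", line))
    }
    producer.close()
  }
}
```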
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Avro, Hadoop Streaming, Cassandra, Oozie, ZooKeeper, Spark, Storm, Kafka
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
IDEs: Eclipse, NetBeans, WSAD, Oracle SQL Developer
Big Data Analytics: Datameer 2.0.5
Frameworks: MVC, Struts, Hibernate, Spring, MRUnit
Languages: C, C++, Java, Python, Linux shell scripts, SQL
Databases: Cassandra, MongoDB, HBase, Teradata, Oracle, MySQL, DB2
Application Servers: JBoss, WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, CSS, AJAX, JSON, Servlets, JSP
Reporting Tools: Jasper Reports, iReports
ETL Tools: Informatica, Pentaho
PROFESSIONAL EXPERIENCE:
Senior Big Data Developer
Confidential, New York, NY
Responsibilities:
- Extensively involved in the installation and configuration of the Cloudera Distribution of Hadoop (CDH) platform.
- Extracted, transformed, and loaded (ETL) data from multiple federated data sources (JSON, relational databases, etc.) with DataFrames in Spark (see the ETL sketch after this section).
- Utilized Spark SQL to extract and process data, parsing it with Datasets or RDDs in HiveContext and applying transformations and actions (map, flatMap, filter, reduce, reduceByKey).
- Extended the capabilities of DataFrames using user-defined functions in Python and Scala.
- Resolved missing fields in DataFrame rows using filtering and imputation.
- Integrated visualizations into Spark applications using Databricks and popular visualization libraries (ggplot, matplotlib).
- Trained analytical models with Spark ML estimators, including linear regression, decision trees, logistic regression, and k-means.
- Performed pre-processing on datasets prior to training, including standardization and normalization.
- Created pipelines that chain transformations, estimations, and evaluation of analytical models (see the pipeline sketch after this section).
- Evaluated model accuracy by dividing data into training and test datasets and computing metrics using evaluators.
- Tuned training hyperparameters by integrating cross-validation into pipelines.
- Used Spark MLlib functionality not present in Spark ML by converting DataFrames to RDDs and applying RDD transformations and actions.
- Troubleshot and tuned machine learning algorithms in Spark.
Environment: Spark 1.6.2, Spark MLlib, Spark ML, Hive 1.2.1, Sqoop 1.4.6, Flume 1.5.0, HBase 1.1.4, MySQL 5.6, Scala 2.11.x, PySpark 1.4.0, shell scripting, Tableau 9.2, Agile
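As a condensed illustration of the DataFrame ETL, UDF, and imputation work above, here is a sketch against the Spark 1.6 API listed in the environment; the file path, JDBC URL, table, and column names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.{avg, col, udf}

object CustomerEtl {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("customer-etl"))
    val sqlContext = new HiveContext(sc)

    // Federated sources: a JSON file and a relational table over JDBC.
    val events = sqlContext.read.json("hdfs:///data/events.json")
    val customers = sqlContext.read.format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/crm")
      .option("dbtable", "customers")
      .load()

    // Scala UDF extending DataFrame capabilities: bucket ages into bands.
    val ageBand = udf((age: Int) =>
      if (age < 30) "young" else if (age < 60) "mid" else "senior")

    // Impute a missing numeric field with the column mean.
    val meanSpend = events.agg(avg(col("spend"))).first().getDouble(0)
    val cleaned = events.na.fill(Map("spend" -> meanSpend))

    // Join, derive a column via the UDF, aggregate, and persist to Hive.
    cleaned.join(customers, "customer_id")
      .withColumn("age_band", ageBand(col("age")))
      .groupBy("age_band").agg(avg("spend"))
      .write.saveAsTable("insights.spend_by_age_band")
  }
}
```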
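And a sketch of the pipeline, train/test evaluation, and cross-validation steps, again on the Spark 1.6 ml API; the feature columns ("age", "spend", "visits") and label column ("churned") are assumed for illustration.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{StandardScaler, VectorAssembler}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.DataFrame

object ChurnModel {
  // Assemble raw columns, standardize them, fit logistic regression,
  // and tune regularization with 3-fold cross-validation.
  def train(df: DataFrame): Double = {
    val Array(trainDf, testDf) = df.randomSplit(Array(0.8, 0.2), seed = 42L)

    val assembler = new VectorAssembler()
      .setInputCols(Array("age", "spend", "visits")).setOutputCol("rawFeatures")
    val scaler = new StandardScaler()
      .setInputCol("rawFeatures").setOutputCol("features")
    val lr = new LogisticRegression().setLabelCol("churned")

    val pipeline = new Pipeline().setStages(Array(assembler, scaler, lr))
    val grid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1, 1.0)).build()
    val evaluator = new BinaryClassificationEvaluator().setLabelCol("churned")

    val cv = new CrossValidator().setEstimator(pipeline)
      .setEvaluator(evaluator).setEstimatorParamMaps(grid).setNumFolds(3)

    // Fit on the training split; report area under ROC on the held-out split.
    val model = cv.fit(trainDf)
    evaluator.evaluate(model.transform(testDf))
  }
}
```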
Big Data Engineer
Confidential, Boston, MA
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Developed simple to complex MapReduce jobs using Java, and scripts using Hive and Pig.
- Analyzed data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) for data ingestion and egress.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources (see the UDF sketch after this section).
- Loaded and transformed large sets of structured and semi-structured data.
- Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
- Exported filtered data into HBase for fast querying.
Environment: Hadoop, HBase, Hive, Pig, MapReduce, Sqoop, Oozie, Eclipse, Java
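The UDFs in this role were written in Java; for consistency with the other examples, here is an equivalent minimal sketch in Scala, with a hypothetical function name and registration commands shown in comments.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hive UDF implementing a piece of business logic: normalize phone
// numbers to digits only. Registered from the Hive CLI with:
//   ADD JAR normalize-udf.jar;
//   CREATE TEMPORARY FUNCTION normalize_phone AS 'NormalizePhone';
class NormalizePhone extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null
    else new Text(input.toString.replaceAll("[^0-9]", ""))
}
```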
Big Data Developer
Confidential, Boston, MA
Responsibilities:
- Participated in meetings and releases, working closely with teammates and managers.
- Developed on Hadoop technologies including HDFS, MapReduce2, YARN, Hive, HBase, Sqoop, Spark Streaming, and RabbitMQ (see the streaming sketch after this section).
- Translated, loaded, and streamed disparate data sets in multiple formats and from multiple sources, including Avro and JSON delivered by Kafka queues, RabbitMQ, Flume, etc.
- Translated functional and technical requirements into detailed programs running on Hadoop MapReduce and Spark.
- Migrated traditional database code to distributed-system code (mainly HiveQL).
- Migrated data between RDBMSs and HDFS/Hive with Sqoop.
- Used HBase for scalable storage and fast querying.
- Involved in application performance tuning and troubleshooting.
Environment: Hadoop, HBase, MapReduce, Spark, Flume, Sqoop, Kafka, RabbitMQ, Hive
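A minimal sketch of the Kafka-to-Spark-Streaming path above, assuming the Spark 1.x direct-stream integration for Kafka 0.8; the broker address and "app-logs" topic are placeholders.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object LogStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("log-stream"), Seconds(10))

    // Direct stream: (key, value) pairs pulled straight from Kafka partitions.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("app-logs"))

    // Count error lines in each 10-second batch.
    stream.map(_._2).filter(_.contains("ERROR")).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```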
Database Developer
Confidential, NY
Responsibilities:
- Involved in system design based on the Spring, Struts, and Hibernate frameworks.
- Implemented the business logic in standalone Java classes using core Java.
- Developed database (SQL Server) applications.
- Used Spring's HibernateTemplate to access the SQL Server database (see the DAO sketch after this section).
- Designed, implemented, and tested new features using T-SQL programming.
- Optimized existing data aggregation and reporting for better performance.
- Performed varied analyses to support organization and client improvement.
Environment: Eclipse, SQL Server 2012, Spring, HTML, JavaScript, Hibernate, JSF, JUnit; SDLC: Agile/Scrum
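A minimal sketch of data access through Spring's HibernateTemplate as described above, written in Scala for consistency with the other examples; the Customer entity and its Hibernate mapping are hypothetical.

```scala
import org.springframework.orm.hibernate3.HibernateTemplate

// Hypothetical entity; fields and mapping live in the Hibernate config.
class Customer

class CustomerDao(template: HibernateTemplate) {
  // HQL query executed through the template, which manages sessions
  // and translates Hibernate exceptions to Spring's hierarchy.
  def findAll(): java.util.List[_] = template.find("from Customer")
  def save(customer: Customer): Unit = template.save(customer)
}
```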
Software Engineer
Confidential
Responsibilities:
- Designed and coded application components with JSP, Servlets, and AJAX.
- Implemented data persistence using JDBC for database connectivity and Hibernate for database/Java object mapping (see the JDBC sketch after this section).
- Designed the logical and physical data models and generated DDL and DML scripts.
- Designed the user interface and used JavaScript for validation checks.
- Wrote SQL queries, stored procedures, and database triggers as required on the database objects.
Environment: Java, XML, Hibernate, SQL Server, Maven 2, JUnit, J2EE (JSP, JavaBeans, DAO), Eclipse, Apache Tomcat Server, Spring MVC, Spiral Methodology
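A minimal sketch of the plain JDBC connectivity described above, in Scala for consistency; the connection URL, credentials, and table are placeholders.

```scala
import java.sql.DriverManager

object JdbcExample {
  def main(args: Array[String]): Unit = {
    // Placeholder SQL Server URL and credentials.
    val conn = DriverManager.getConnection(
      "jdbc:sqlserver://dbhost:1433;databaseName=app", "user", "password")
    try {
      // Parameterized query to avoid SQL injection.
      val stmt = conn.prepareStatement(
        "SELECT id, name FROM accounts WHERE active = ?")
      stmt.setBoolean(1, true)
      val rs = stmt.executeQuery()
      while (rs.next()) println(s"${rs.getInt("id")}: ${rs.getString("name")}")
    } finally conn.close()
  }
}
```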
Software Engineer
Confidential, NJ
Responsibilities:
- Developed the XML-based data parsing system (see the parsing sketch after this section).
- Developed the system UI using Java Swing.
- Developed with the Struts/Hibernate frameworks as the MVC layer.
- Developed front-end application pages using HTML/CSS, JavaScript, and JSP.
- Developed SQL queries against the Oracle database.
Environment: Eclipse, MySQL Client 4.1, JSP, HTML, JavaScript, Spring, Hibernate; SDLC: Agile/Scrum
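A minimal sketch of XML parsing in the spirit of the first bullet above, using the scala-xml library; the document structure is invented for illustration.

```scala
import scala.xml.XML

object ParseOrders {
  def main(args: Array[String]): Unit = {
    val doc = XML.loadString(
      """<orders>
        |  <order id="1"><total>99.50</total></order>
        |  <order id="2"><total>12.00</total></order>
        |</orders>""".stripMargin)

    // Select each <order>, reading its id attribute and <total> child.
    for (order <- doc \ "order")
      println(s"order ${(order \ "@id").text}: total ${(order \ "total").text}")
  }
}
```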