Hadoop Developer And Solution Architect Resume
Houston, Texas
SUMMARY:
- Over 7 years of experience in software development environments, including 4.5 years on the Hadoop ecosystem.
- Hadoop developer with experience in Hadoop, database management system architecture, core Java, testing, and implementing Big Data solutions.
- Experience with the Cloudera and Hortonworks platforms and their ecosystems; experience installing, configuring, and using ecosystem components such as Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, and Flume.
- Installed, configured, and administered Cloudera clusters.
- Experienced in providing technical solutions to the business for applications developed on Hadoop and its ecosystem. Experience with cloud platforms such as AWS and Azure.
- Thrives on challenges and works well under pressure, with the technical expertise to learn new environments quickly, locate inefficiencies in code, and provide quick solutions.
- Extensive experience in data ingestion, big data storage planning, complex transformations, data integration, and analysis for the pharmaceutical, healthcare, and retail sectors.
- Experience in NoSQL column-oriented databases such as HBase and their integration with the Hadoop cluster.
- Wrote Python programs for web scraping and converting HTML to text.
- Set up the environments required to execute various Pig and Hive jobs.
- Executed faster MapReduce-style functions using Spark RDDs for parallel processing and for referencing datasets in HDFS, HBase, and other data sources.
- Implemented Hive SerDes for reading JSON and XML files.
- Experience configuring Hive and Oozie to use Microsoft SQL Server as the metadata store.
- Implemented Flume to load data from sensor-generated log files directly into HDFS.
- Used the Oozie platform to execute daily jobs, imported tables, and analyzed them through Tableau Server.
- Handled importing data from various data sources and performed transformations using Java, Hive, Pig, YARN, HBase, Sqoop, Oozie, Flume, Windows Azure, and ZooKeeper.
- Created internal and external tables in Hive, along with Hive serialization/deserialization (SerDe) tables in HDFS.
- Analyzed large amounts of data in formats including XML, JSON, and relational files from different data sources.
- Imported data from local machines to HDFS using Sqoop for relational data and Flume for log files.
- Wrote Java programs to read data from XML files and transfer it to HDFS (a minimal sketch follows this summary).
- Worked with efficient storage formats such as Parquet, Avro, and ORC and integrated them with Hadoop and the ecosystem (Hive, Impala, and Spark); also used compression codecs such as Snappy and Zlib.
- Converted XML and JSON files using UDFs, added JARs to the Hive library, and imported data into Hive SerDe tables.
- Created complex data types based on the input source format with the help of Pig.
- Explained reports to the project head on a weekly basis and handed over the weekly jobs and job details.
- Good experience working with analysis tools such as Tableau for regression analysis, pie charts, and bar graphs.
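A minimal sketch of the kind of Java-to-HDFS transfer described above, assuming the Hadoop client libraries are on the classpath; the file path, tag name, and output location are illustrative placeholders rather than actual project code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import java.nio.charset.StandardCharsets;

public class XmlToHdfs {
    public static void main(String[] args) throws Exception {
        // Parse the local XML file (path and tag name are example placeholders).
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new File("sensor_readings.xml"));
        NodeList records = doc.getElementsByTagName("record");

        // Connect to HDFS using the default configuration (core-site.xml on the classpath).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write one line per XML record into a text file in HDFS.
        try (FSDataOutputStream out = fs.create(new Path("/data/raw/sensor_readings.txt"))) {
            for (int i = 0; i < records.getLength(); i++) {
                String line = records.item(i).getTextContent().trim() + "\n";
                out.write(line.getBytes(StandardCharsets.UTF_8));
            }
        }
        fs.close();
    }
}
```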
TECHNICAL SKILLS:
Big Data Ecosystems: Sqoop, Flume, Hive, Pig, Oozie, Kafka, MapReduce, HBase, Spark SQL, Spark RDD, Cassandra, Avro, ORC, Talend
Big Data Platforms: Cloudera CDH 4.x, CDH 5.x, Hortonworks Sandbox, Windows Azure
Languages: Java, Python, shell scripting
IDEs and Editors: Eclipse, NetBeans, PyCharm, Sublime Text
Software development Methodologies: Agile, Waterfall.
Web Languages: HTML, CSS
Databases: Oracle 11g, MySQL, Microsoft SQL Server
ETL Tools: Informatica
NoSQL DataBases: HBase, Cassandra
Operating Systems: Linux, Windows
Business Tools: Tableau, Google Analytics, MS Excel, Power Query, GIS tools, SAS
Version Control: GitHub, GitLab
SAP Tool: SAP curriculum
Proficient Languages: HiveQL (including SerDes), SQL, Pig Latin, Sqoop
Web Services: RESTful, SOAP, WSDL
Geospatial Tools: Esri ArcGIS Online
PROFESSIONAL EXPERIENCE:
Confidential, Houston, Texas
Hadoop Developer and Solution Architect
Responsibilities:
- Developed an end-to-end solution for data ingestion from a MapR source system into a Hortonworks HDFS system.
- Developed shell scripts to pull data from a MapR cluster over WebHDFS URLs into the Hortonworks framework (a WebHDFS sketch follows this list).
- Installed and configured ZooKeeper and Kafka.
- Wrote Java programs to read data from HTML, convert it to text, and transfer it into a MySQL database.
- Developed Flume configurations to store real-time sensor log data in HDFS and cleaned it using Pig scripts with regular expressions.
- Implemented regular expressions to limit the data pulled in over the WebHDFS URL.
- Developed Hive scripts for the Raw and Enriched layers and Pig scripts to clean data from those layers.
- Developed Hive-HBase configuration tables to pull table properties while executing shell scripts, so that the source system's configuration was known and tables were pulled from the correct schema.
- Created Hive-HBase tables for data storage: Hive for the metastore and HBase for data storage in row-key format.
- Created Hive ORC managed tables from Hive external tables for faster retrieval of large amounts of data.
- Implemented Hive partitioning and bucketing on large data sets for further sub-division and fast data retrieval.
- Analyzed large amounts of data every day, including XML, JSON, and relational files from different data sources.
- Developed Pig scripts for data cleaning and rearranging columns in Hive enriched tables.
- Developed Oozie workflows to automate the ingestion process, running the shell scripts responsible for data ingestion, cleaning, and loading into Hive managed ORC tables.
- Created a Falcon cluster entity to schedule Oozie workflows every 12 hours, calling shell scripts to perform data ingestion, cleaning, and loading.
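A minimal sketch of pulling a file over a WebHDFS URL from Java, assuming a reachable WebHDFS endpoint; the host, port, and paths are illustrative, and the production work described above used shell scripts rather than this exact code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsPull {
    public static void main(String[] args) throws Exception {
        // WebHDFS OPEN operation on the source (MapR) cluster; the NameNode redirects to a DataNode.
        URL source = new URL("http://mapr-node.example.com:50070/webhdfs/v1/source/data/file.csv?op=OPEN");
        HttpURLConnection conn = (HttpURLConnection) source.openConnection();
        conn.setInstanceFollowRedirects(true);

        // Destination on the Hortonworks cluster, taken from the default HDFS configuration.
        FileSystem fs = FileSystem.get(new Configuration());
        Path target = new Path("/data/raw/file.csv");

        try (InputStream in = conn.getInputStream()) {
            // Stream the remote bytes straight into HDFS.
            IOUtils.copyBytes(in, fs.create(target), 4096, true);
        }
        fs.close();
    }
}
```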
Java/Hadoop Developer
Responsibilities:
- Processed data from our data center unit and transferred it to the various departments for further analysis.
- Implemented MapReduce Java programs for processing large data sets in HDFS (a minimal sketch follows this list).
- Implemented Hadoop framework technologies such as MapReduce, Sqoop, Hive, and Pig to analyze changes in machine-health trends based on earlier analysis of the log files.
- Analyzed the machine health data transferred to our data center unit.
- Enabled speedy reviews and first-mover advantage by using Oozie workflows to automate the data loading process into the Hadoop Distributed File System, and used the Pig language to preprocess the data.
- Worked on developing applications in Hadoop Big Data technologies: Pig, Hive, MapReduce, Oozie, Flume, and Kafka.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, Sqoop, and Flume) as well as system-specific jobs such as shell scripts.
- Imported and exported large sets of data between relational databases and HDFS using Sqoop.
- Used Java to read data from a MySQL database and transfer it to HDFS.
- Transferred log files from the log generating servers into HDFS.
- Read the generated log data from HDFS using advanced HiveQL (serialization/deserialization).
- Executed HiveQL commands on the CLI and transferred the required output data back to HDFS.
- Worked with Hive partitioning and bucketing concepts and created Hive external and internal tables with partitions.
- Assisted the project manager in problem solving with Big Data technologies for integrating Hive with HBase and Sqoop with HBase.
- Solved performance issues in Hive and Pig with an understanding of joins, grouping, and aggregation and how they translate to MapReduce.
- Wrote MapReduce programs and added external JARs for the MapReduce programs.
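A minimal sketch of a Java MapReduce job in the spirit of the log analysis described above; the class names, field layout, and HDFS paths are assumptions for illustration only:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class StatusCount {

    // Mapper: emit (statusCode, 1) per log line; assumes the status code is the third tab-separated field.
    public static class StatusMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text status = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\t");
            if (fields.length > 2) {
                status.set(fields[2]);
                context.write(status, ONE);
            }
        }
    }

    // Reducer: sum the counts for each status code.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "machine health status count");
        job.setJarByClass(StatusCount.class);
        job.setMapperClass(StatusMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/machine_logs"));
        FileOutputFormat.setOutputPath(job, new Path("/data/machine_logs_counts"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```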
Environment: Windows Azure, Hortonworks, Hive, Pig, Hive bucketing, Hive partitioning, Linux, ETL, Oozie, MapReduce, Sqoop, HBase, shell scripting.
Confidential, Fremont, CA
Java/Hadoop Developer
Responsibilities:
- Transferred data from a data source to HDFS using Sqoop to perform analysis.
- Loaded data from HDFS into the respective Hive tables for further analysis to identify trends in the data.
- Analyzed trends in customer behavior in depth, along with the causes leading to that behavior.
- Developed ad-hoc Hive queries and filtered data to increase the efficiency of process execution.
- Developed several REST web services supporting both XML and JSON to perform demand-response management.
- Developed REST/HTTP APIs for exposing web applications.
- Developed MapReduce functions using HashMap concepts in Java.
- Designed, developed, tested, and deployed web services.
- Increased the time efficiency of HiveQL and reduced execution time over the data sets by applying compression codecs such as Snappy and Zlib for MapReduce jobs.
- Created Hive partitions to store data for different trends under separate partitions.
- Have around 2+ years of experience working with Cassandra; used Cassandra to store time-series data events for the analysis process.
- Integrated Cassandra with Hive to use MapReduce properties and map meaningful columns.
- Connected the Hive tables to data analysis tools such as Tableau for graphical representation of the trends.
- Checked cluster health and proper functioning through monitoring tools such as Ambari.
- Used Ambari to check the health of the nodes and the proper functioning of ZooKeeper while using Kafka consumer and producer applications (a minimal producer sketch follows this list).
- Built Big Data solutions using HBase, handling millions of records for the different data trends and exporting them to Hive for analysis.
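A minimal sketch of a Java Kafka producer like the producer applications mentioned above; the broker address, topic name, and message format are illustrative assumptions:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class SensorEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.example.com:9092"); // example broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Send a few example time-series events keyed by device id.
            for (int i = 0; i < 3; i++) {
                String key = "device-" + i;
                String value = System.currentTimeMillis() + ",temperature,72." + i;
                producer.send(new ProducerRecord<>("sensor-events", key, value));
            }
            producer.flush();
        }
    }
}
```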
Environment: Cloudera, Hadoop, Hive, Snappy and Zlib compression, HBase, Hive bucketing and partitioning, Tableau, Excel, Hive-HBase, Esri ArcGIS, Linux, Sqoop.
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Analyzed data using the Hadoop components Hive and Pig.
- Worked hands-on with the ETL process.
- Created Spark SQL queries for faster processing of data.
- Used Spark RDDs for faster data sharing.
- Developed Hadoop streaming jobs to ingest large amount of data.
- Converted data from CSV format to text format for loading into a MySQL database using Java.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Responsible for loading data from various file systems to HDFS using Unix command-line utilities.
- Handled data imported from different data sources, performed transformations using Hive, Pig, and MapReduce, and loaded the data into HDFS.
- Wrote Hive queries to fetch data from HBase and transferred it to HDFS through Hive.
- Imported data from an RDBMS (MySQL) to HDFS using Sqoop.
- Exported the analyzed patterns back to MySQL using Sqoop.
- Used the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Used the JSON and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data, and implemented custom Hive UDFs (a minimal UDF sketch follows this list).
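A minimal sketch of a custom Hive UDF of the kind mentioned above, here extracting one field from a JSON log line; the class name, JSON field, and use of the org.json library are assumptions, and a real deployment would register the JAR with ADD JAR and CREATE TEMPORARY FUNCTION:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import org.json.JSONObject; // assumes the org.json library is bundled in the UDF JAR

public final class ExtractEventType extends UDF {
    // Hive calls evaluate() once per row; return null on bad input so the query keeps running.
    public Text evaluate(Text jsonLine) {
        if (jsonLine == null) {
            return null;
        }
        try {
            JSONObject obj = new JSONObject(jsonLine.toString());
            return new Text(obj.optString("event_type", ""));
        } catch (Exception e) {
            return null;
        }
    }
}
```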
Environment: Windows Azure, Hortonworks, Hive, Hive SerDe, Pig, Hive UDF, RDBMS, HDFS, MapReduce, Eclipse, NetBeans.
Confidential, Englewood, NJ
Hadoop Developer
Responsibilities:
- Analyzed large data sets by running Hive queries and Pig scripts (a Hive JDBC sketch follows this list).
- Worked with the Data Science team to gather requirements for various data mining projects. Involved in creating Hive tables and loading and analyzing data using Hive queries for the customer card details.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Involved in loading data from UNIX file system to HDFS.
- Created sub-queries for filtering and faster execution of data. Created multiple join tables and fetched the required data.
- Responsible for managing customer data coming from multiple sources; experienced in running Hadoop streaming jobs to process terabytes of JSON-format data.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
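One way to run Hive queries such as those above programmatically is over JDBC; this is a minimal sketch only, and the HiveServer2 URL, credentials, table, and column names are illustrative assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryRunner {
    public static void main(String[] args) throws Exception {
        // Requires the hive-jdbc driver on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver2.example.com:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT card_type, COUNT(*) AS cnt FROM customer_cards GROUP BY card_type")) {
            // Print each aggregated row from the Hive query.
            while (rs.next()) {
                System.out.println(rs.getString("card_type") + "\t" + rs.getLong("cnt"));
            }
        }
    }
}
```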
Environment: Hadoop, Hive, Snappy compression, HBase, Hive bucketing and partitioning, Tableau, Excel, Hive-HBase, Esri ArcGIS, Linux, Sqoop.
Confidential, San Diego, CA
Java Developer
Responsibilities:
- Handled multiple internal and external projects involving end-to-end testing, from requirement gathering to providing the final testing report and logging defects and recommendations in Jira. Managed projects using the Agile process.
- Developed clean, documented, and reusable code. Uploaded code to Git repositories for the application project folders.
- Experience using application servers such as Tomcat.
- Developed applications using Core Java, HTML, CSS.
- Wrote Java programs for parsing XML, JSON, and HTML files (a minimal XML parsing sketch follows this list).
- Good understanding of SOA technologies such as SOAP and RESTful web services.
- Developed Java code for parsing data out of the XML files.
- Used Jira for trouble tickets and Confluence for our knowledge base.
- Created database tables, maintaining the health records in separate tables.
- Maintained referential, domain, and column integrity using available options such as constraints.
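A minimal sketch of XML parsing with the standard DOM and XPath APIs, in the spirit of the parsing work described above; the file name, element structure, and XPath expression are illustrative assumptions:

```java
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.File;

public class PatientRecordParser {
    public static void main(String[] args) throws Exception {
        // Parse the XML document into a DOM tree (file name is an example placeholder).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("health_records.xml"));

        // Select every patient name element with an XPath expression.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList names = (NodeList) xpath.evaluate("/records/patient/name", doc, XPathConstants.NODESET);

        for (int i = 0; i < names.getLength(); i++) {
            System.out.println(names.item(i).getTextContent());
        }
    }
}
```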
Environment: Eclipse IDE, GitHub, RESTful and SOAP web services, HTML, CSS.