Hadoop Data Analyst/Developer Resume
Columbus, GA
SUMMARY
- 7+ years of professional IT experience, including 4 years in Big Data / Hadoop and big data analytics.
- Experience working with BI teams to translate big data requirements into Hadoop-centric solutions.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop ecosystem components.
- Hands-on experience with Hadoop technologies such as HDFS, Hive, Pig, Sqoop, Impala, Flume, and Spark.
- Hands-on experience writing MapReduce jobs and equivalent Hive and Pig transformations.
- Experience importing and exporting data between external systems and HDFS using Sqoop.
- Experience creating databases, tables, and views in HiveQL, Impala, and Pig Latin.
- Experience with file formats such as Avro, ORC, SequenceFile, and CSV.
- Excellent understanding of NoSQL databases such as HBase.
- Implemented proofs of concept on the Hadoop stack and various big data analytics tools, including migrations from relational databases (Oracle, MySQL) to Hadoop.
- Highly experienced database administrator: design, data modeling, installation, configuration, administration, performance monitoring, troubleshooting, and tuning of RDBMS (DB2 LUW, SQL Server, Oracle, ParAccel/Matrix), NoSQL (Cassandra, SciDB, Redis, and MongoDB) databases, and the Neo4j graph database.
- Experience installing and maintaining the entire Hadoop ecosystem (HDFS, Hive, and related components); received the Above and Confidential award for the Hadoop implementation.
- Experience transferring data between Hadoop and various data sources (DB2, Oracle, etc.) using Sqoop.
- Excellent programming experience in Java, Scala, Python, R, C, C++, Perl, ksh, JavaScript, Pig, Hive, Impala, CSS, and NLP.
- Excellent visualization skills using D3.js, Python, and R.
- Enterprise-level data architect: analysis and data integration roadmaps, real-time and batch ETL processing, and translating business needs into RDBMS and NoSQL conceptual data models.
TECHNICAL SKILLS
Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, Spark, Storm, Impala, and Kafka.
Programming: R, Python, SQL, Twitter and LinkedIn APIs, web scraping.
Databases: Oracle 10g, IBM DB2, MySQL, SQL Server, SAP RMS.
IDEs/Tools: Tableau, MS Excel Risk Solver, Anaconda, PyCharm, IPython Notebook, Amazon Web Services.
Operating Systems: Linux (Ubuntu, CentOS, Red Hat Linux), Windows XP/7/8/10, OS X 10.11
Version Control: GitHub, SVN.
ANALYTICAL SKILLS: Machine Learning, Data Mining, Sentiment Analysis, Predictive Analytics, Statistical Data Analysis, Optimization, Decision Trees, Sensitivity Analysis, Data Modeling, Data Wrangling, Data Visualization, Cluster Analysis.
PROFESSIONAL EXPERIENCE
Big Data Developer
Confidential, Virginia
Responsibilities:
- Received the Above and Confidential award for successfully implementing Big Data/Hadoop.
- Documented the Hadoop implementation process, including Kerberos authentication, policy-level Ranger authorization, monitoring setup, and backups.
- Actively involved in data ingestion from DB2 into Hadoop.
- Actively involved in the Hadoop upgrade project.
- Played a key role in the ParAccel/Matrix implementation, including troubleshooting, leader-node HA, report automation, backup automation, and boot-from-SAN conversion.
- Applied machine learning using Spark ML.
- Used Oozie to define and schedule jobs.
- Implemented Spark SQL jobs to read and analyze data from Hive and write results back to HDFS/Hive (see the sketch below).
- Used Hue for storage and processing tasks across Hadoop ecosystem components.
- Basic experience with Tableau reporting tools.
- Involved in all stages of the Software Development Life Cycle.
Environment: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, Spark, Storm, Impala, Kafka, Python, and SQL.
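A minimal PySpark sketch of the kind of Spark SQL job described above: it reads a Hive table, aggregates it, and writes the result to HDFS and back into Hive. The database, table, and column names (sales_db.transactions, region, amount) and the output path are hypothetical placeholders, not the actual project objects.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Spark session with Hive support so spark.table()/spark.sql() see the Hive metastore
spark = (SparkSession.builder
         .appName("hive-read-analyze-write")
         .enableHiveSupport()
         .getOrCreate())

# Read an existing Hive table (hypothetical database/table names)
txns = spark.table("sales_db.transactions")

# Simple analysis: total and average amount per region
summary = (txns.groupBy("region")
               .agg(F.sum("amount").alias("total_amount"),
                    F.avg("amount").alias("avg_amount")))

# Write results to HDFS as Parquet and back into Hive
summary.write.mode("overwrite").parquet("hdfs:///data/reports/region_summary")
summary.write.mode("overwrite").saveAsTable("sales_db.region_summary")

spark.stop()
```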
Hadoop Data Analyst/Developer
Confidential, Columbus, GA
Responsibilities:
- Involved in end-to-end data processing, including ingestion, transformation, quality checks, and splitting.
- Brought data into the big data lake using Pig, Sqoop, and Hive.
- Wrote a MapReduce job for change data capture (CDC) on HBase.
- Created Hive ORC and external tables.
- Refined terabytes of data from different sources and created Hive tables.
- Developed MapReduce jobs for data cleaning and preprocessing (see the streaming sketch below).
- Imported and exported data between HDFS/Hive and Oracle and Teradata databases using Sqoop.
- Responsible for managing data coming from different sources.
- Monitored running MapReduce jobs on the cluster using Oozie.
- Responsible for loading data from UNIX file systems into HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Wrote Pig scripts to process unstructured data and produce structured data for use with Hive.
- Wrote Oozie workflows to coordinate the Hadoop jobs.
Environment: Sqoop, Pig, Hive, MapReduce, Java, Oozie, Eclipse, Linux, Oracle, Teradata.
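The cleaning and preprocessing jobs in this role were written in Java MapReduce; purely as an illustration, the sketch below shows the same kind of record cleaning as a Python Hadoop Streaming mapper. The pipe delimiter and the four assumed fields (customer_id, name, amount, date) are hypothetical. Such a script is typically passed to the Hadoop Streaming jar via -mapper, with a separate reducer or the identity reducer.

```python
#!/usr/bin/env python
"""Hadoop Streaming mapper: drop malformed records and normalize fields.

Illustrative only -- the expected input is a pipe-delimited line with
(customer_id, name, amount, date); these field positions are assumptions.
"""
import sys

EXPECTED_FIELDS = 4  # assumed record width

for line in sys.stdin:
    fields = line.rstrip("\n").split("|")

    # Skip records that do not have the expected number of fields
    if len(fields) != EXPECTED_FIELDS:
        continue

    customer_id, name, amount, date = fields

    # Skip records whose amount is not numeric
    try:
        amount = float(amount)
    except ValueError:
        continue

    # Normalize: trim whitespace, upper-case the name
    name = name.strip().upper()

    # Emit a clean tab-separated record keyed by customer_id
    print("\t".join([customer_id.strip(), name, "%.2f" % amount, date.strip()]))
```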
Hadoop Data Analyst
Confidential, NC
Responsibilities:
- Implemented solutions for ingesting data from various sources and processing it using big data technologies such as Hive, Pig, Sqoop, HBase, and MapReduce.
- Designed and developed a daily process for incremental imports of raw data from Oracle into Hive tables using Sqoop.
- Queried data from HBase for lookups, grouping, and sorting.
- Extensively used HiveQL to query data in Hive tables and to load data into them.
- Worked extensively with partitioned and bucketed Hive tables, designed both managed and external tables, and optimized Hive queries (see the partitioning sketch below).
- Assisted the analytics team in writing Pig scripts for further detailed analysis of the data.
- Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, and DataFrames.
Environment: Cloudera CDH 5, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, HBase, SparkContext, Spark SQL, DataFrames.
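A condensed PySpark sketch of the partitioned, external Hive table pattern referenced above, combined with the daily incremental load: the table raw_db.orders, its columns, the load_dt partition key, and the HDFS paths are all hypothetical placeholders, and exact overwrite semantics depend on the Spark/Hive versions in use.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("partitioned-hive-load")
         .enableHiveSupport()
         .getOrCreate())

# Allow dynamic partition inserts into the Hive table
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# External ORC table partitioned by load date (names and location are placeholders)
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_db.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (load_dt STRING)
    STORED AS ORC
    LOCATION 'hdfs:///data/raw/orders'
""")

# One day's incremental slice, e.g. landed by the daily Sqoop import (path is a placeholder);
# assumed to carry the columns order_id, customer_id, amount
daily = spark.read.parquet("hdfs:///data/staging/orders/2017-06-01")

# Append the partition column last (insertInto matches columns by position)
# and overwrite that day's partition
(daily.withColumn("load_dt", F.lit("2017-06-01"))
      .write.mode("overwrite")
      .insertInto("raw_db.orders"))
```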
Software Systems Analyst
Confidential
Responsibilities:
- Conducted strategic IT chain management assessments based on statistical analysis, thereby improving the workflow and efficiency of the client's finance applications.
- Automated the client's batch recovery process, improving recovery time by 30%.
- Proposed and implemented several robust workaround techniques, resulting in an overall 15% decline in customer incidents.
- Accurately recovered 120K customer records within 2 days during an application malfunction.
- Coordinated with onshore and offshore teams and organized weekly team meetings.
- Performed root cause analyses of various recurring enterprise application issues.
Environment: Linux (Ubuntu, CentOS, Red Hat Linux), Windows XP/7/8/10, OS X 10.11