Job seekers, please send resumes to resumes@hireitpeople.com
Must-have skills:
- Hadoop
- Python
- Spark or PySpark
Detailed Job Description:
- 3+ years of experience using the Hadoop platform and performing analysis; familiarity with Hadoop cluster environments and resource-management configurations for analysis work.
- Advanced SQL experience.
- 3+ years of programming experience in Python, PySpark, and Spark for data processing, ingestion, and analysis.
- 5+ years of experience processing large volumes and varieties of data (structured and unstructured data, writing code for parallel processing, XMLs, JSONs, PDFs).
- Be detail oriented.
- Have excellent communication skills (verbal and written).
- Be able to manage multiple priorities and meet deadlines.
- Have a degree in Statistics, Economics, Business, Mathematics, Computer Science, or a related field.
- Cleanse, manipulate, and analyze large datasets (structured and unstructured data: XMLs, JSONs, PDFs) using the Hadoop platform.
- Develop Python, PySpark, and Spark scripts to filter, cleanse, map, and aggregate data.
- Manage and implement data processes (data quality reports).
- Develop data profiling, deduplication, and matching logic for analysis.
- Present ideas and recommendations to management on the best use of Hadoop and other technologies.
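The day-to-day work described above (filtering, cleansing, deduplicating, and aggregating semi-structured records) might look roughly like the following. This is a minimal sketch in plain standard-library Python rather than PySpark, and the record fields (`id`, `region`, `amount`) are purely illustrative, not taken from the posting.

```python
import json

# Hypothetical raw JSON records, as might be ingested into the platform
# (field names are illustrative only).
raw = """
[{"id": 1, "region": "east", "amount": "100"},
 {"id": 1, "region": "east", "amount": "100"},
 {"id": 2, "region": "WEST ", "amount": "250"},
 {"id": 3, "region": null, "amount": "75"}]
"""

records = json.loads(raw)

# Cleanse: normalize region strings, cast amounts to numbers, drop null regions.
cleansed = [
    {"id": r["id"], "region": r["region"].strip().lower(), "amount": float(r["amount"])}
    for r in records
    if r["region"] is not None
]

# Dedupe: keep the first record seen for each id.
seen, deduped = set(), []
for r in cleansed:
    if r["id"] not in seen:
        seen.add(r["id"])
        deduped.append(r)

# Aggregate: total amount per region.
totals = {}
for r in deduped:
    totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]

print(totals)
```

In a real PySpark job the same cleanse/dedupe/aggregate steps would typically be expressed as DataFrame transformations (`filter`, `dropDuplicates`, `groupBy().agg()`) running in parallel across the cluster.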
Experience required: 5 Years