Job ID: 29875
Company: Internal Postings
Location: McLean, VA
Type: Contract
Duration: 6 Months
Salary: DOE
Status: Active
Openings: 1
Posted: 25 Feb 2021
Job Seekers, please send resumes to resumes@hireitpeople.com

Detailed Job Description:

  • 3 years of experience with each of Python, PySpark, and SQL (strong SQL skills).
  • Ability to work in a UNIX environment.
  • 5+ years of experience processing large volumes and varieties of data (structured and unstructured data such as XMLs, JSONs, and PDFs; writing code for parallel processing).
  • 3+ years of experience using the Hadoop platform and performing analysis, including familiarity with Hadoop cluster environments and resource management configurations for analytical work.
  • Detail oriented, with excellent verbal and written communication skills; this person may have a few hours of meetings per day.
  • Must be able to manage multiple priorities and meet deadlines.
  • Degree in Computer Science, Statistics, Economics, Business, Mathematics or related field.

Job Responsibilities:

  • Cleanse, manipulate, and analyze large datasets (structured and unstructured data such as XMLs, JSONs, and PDFs) using the Hadoop platform.
  • Develop Python, PySpark, and Spark scripts to filter, cleanse, map, and aggregate data (an illustrative sketch follows this list).
  • Build dashboards in R/Shiny for end-user consumption.
  • Manage and implement data processes (data quality reports).
  • Develop data profiling, deduplication, and matching logic for analysis.
  • Apply programming experience in Python, PySpark, and Spark for data ingestion.
  • Program on a big data platform using Hadoop.
  • Present ideas and recommendations to management on the best use of Hadoop and other technologies.
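
The scripting and deduplication items above lend themselves to a brief illustration. Below is a minimal, hypothetical PySpark sketch of the kind of filter/cleanse/dedupe/aggregate work described; the column names, sample records, and application name are assumptions made for illustration only and are not part of this posting.

    # Minimal PySpark sketch: filter, cleanse, deduplicate, and aggregate a
    # small hypothetical dataset. All names and values here are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("cleanse-aggregate-demo").getOrCreate()

    # Hypothetical raw records with inconsistent casing, stray whitespace,
    # a missing amount, and a duplicate that appears after cleansing.
    raw = spark.createDataFrame(
        [
            ("  Alice ", "VA", 120.0),
            ("alice", "VA", 120.0),
            ("Bob", "MD", None),
            ("Carol", "VA", 75.5),
        ],
        ["customer", "state", "amount"],
    )

    cleansed = (
        raw
        .filter(F.col("amount").isNotNull())                         # drop rows with no amount
        .withColumn("customer", F.lower(F.trim(F.col("customer"))))  # normalize text fields
        .dropDuplicates(["customer", "state", "amount"])             # simple dedupe
    )

    # Aggregate: total and average amount per state.
    summary = (
        cleansed
        .groupBy("state")
        .agg(
            F.sum("amount").alias("total_amount"),
            F.avg("amount").alias("avg_amount"),
        )
    )

    summary.show()
    spark.stop()

The sketch runs against a local Spark installation (for example, via a pyspark shell or spark-submit); at cluster scale the same transformations would read from distributed storage rather than an in-memory sample.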

Experience required: 5 Years