Job Seekers: please send resumes to resumes@hireitpeople.com
Responsibilities:
- Design and implement distributed data processing pipelines using Spark, Hive, Python, and other tools and languages prevalent in the Hadoop ecosystem; design and implement end-to-end solutions (a minimal pipeline sketch follows this list).
- Build utilities, user-defined functions, and frameworks to better enable data flow patterns.
- Research, evaluate and utilize new technologies/tools/frameworks centered around Hadoop and other elements in the Big Data space.
- Build and incorporate automated unit tests, participate in integration testing efforts.
- Work with teams to resolve operational and performance issues.
- Work with architecture/engineering leads and other teams to ensure quality solutions are implemented and that engineering best practices are defined and adhered to.
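For illustration, a minimal sketch of the kind of Spark/Hive batch pipeline this role covers: read a Hive table, aggregate it, and write Parquet to S3. The table name, columns, and S3 path below are hypothetical.

```python
# Minimal PySpark sketch of an end-to-end batch pipeline (illustrative only;
# the Hive table, columns, and S3 bucket are hypothetical placeholders).
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("daily-events-aggregation")
    .enableHiveSupport()          # read from / write to the Hive metastore
    .getOrCreate()
)

# Read a raw events table registered in Hive.
events = spark.table("raw.events")

# Aggregate events per user per day.
daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("user_id", "event_date")
    .agg(F.count("*").alias("event_count"))
)

# Write the result to S3 as Parquet, partitioned by date.
daily.write.mode("overwrite").partitionBy("event_date") \
    .parquet("s3://example-bucket/curated/daily_events/")
```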
Qualifications:
- MS/BS degree in computer science or a related discipline
- 6+ years’ experience in large-scale software development
- 2+ years’ experience in Hadoop/Big Data
- Strong development skills around Hadoop, Spark, Hive
- 1+ years’ experience in workflow orchestration tools like Airflow
- Strong skills in Python, shell scripting, and SQL
- Experience with AWS components and services, particularly EMR, S3, and Lambda
- Good understanding of file formats, including Parquet, Avro, and JSON
- Good understanding of R, TensorFlow, SAS, or similar tools
- Experience with performance/scalability tuning, algorithms and computational complexity
- Experience (at least familiarity) with data warehousing, dimensional modeling and ETL development
- Proven ability to work with cross-functional teams to deliver appropriate resolutions
Nice to have:
- Front-end UI development experience, specifically with Node.js or AngularJS
- Experience with machine learning frameworks