Job ID: 28066
Company: Internal Postings
Location: Tampa, FL
Type: Contract
Duration: 6 Months
Salary: DOE
Status: Active
Openings: 1
Posted: 24 Aug 2020
Job Seekers, please send resumes to resumes@hireitpeople.com

Detailed Job Description:

  • 5+ years of total development experience with Big Data technologies: Hive and Hadoop, Spark, Scala, Python/PySpark, AWS, and other cloud-related technologies.
  • Independent/lead developer who can work with minimal supervision.
  • Solid understanding of distributed system fundamentals.
  • Solid understanding of Hadoop security, familiarity with Kerberos/keytabs, and hands-on experience working with Spark/Hive/Oozie/Kafka on a Kerberized cluster.
  • Experience developing, troubleshooting, diagnosing, and performance-tuning distributed batch and real-time data pipelines using Spark/PySpark at scale.
  • Develop scalable and reliable data solutions to move data across systems from multiple sources in real time (NiFi, Kafka) as well as in batch mode (Sqoop).
  • Demonstrated professional experience working with various components of the Big Data ecosystem (Spark/Spark Streaming, Hive, Kafka/KSQL, Hadoop or a similar NoSQL ecosystem) and orchestrating these pipelines in a production system using Oozie or similar tools.
  • Construct data staging layers and fast real-time systems to feed BI applications and machine learning algorithms.
  • Strong software engineering skills with Python or Scala/Java.
  • Knowledge of some flavor of SQL (MySQL, Oracle, Hive, Impala), including the fundamentals of data modeling and performance.
  • Skills in building real-time streaming applications.
  • Experienced in data engineering, with a good understanding of data warehouses, data lakes, data modeling, parsing, data wrangling, cleansing and transformation, and sanitization.
  • Agile work experience; ability to build CI/CD pipelines using Jenkins, Git, Artifactory, Ansible, etc.
  • Hands-on development experience with Scala and Python using Spark 2.0, Spark internals, and Spark job performance improvement.
  • Good understanding of YARN, the Spark UI, Spark and Hadoop resource management, and efficient Hadoop storage mechanisms.
  • Good understanding of and experience with performance tuning in cloud environments for complex software projects, mainly around large scale and low latency.
  • AWS knowledge is essential, with good working experience in AWS technologies such as EMR, S3, cluster management, and Airflow automation; Snowflake knowledge is a plus.
  • AWS developer certification and/or Spark certification is an advantage.
  • Expert in data analysis in Python (NumPy, SciPy, scikit-learn, Pandas, etc.).
  • Strong UNIX Shell scripting experience to support data warehousing solutions.
  • Process-oriented, focused on standardization, streamlining, and implementation of delivery best practices.
  • Excellent problem-solving and analytical skills; excellent verbal and written communication skills.
  • Proven teamwork in multi-site/multi-geography organizations.
  • Ability to multi-task and function efficiently in a fast-paced environment.
  • Strong background in Scala or Java; experience with streaming technologies such as Flink, Kafka, Kinesis, and Firehose; experience with EMR, Spark, Parquet, and Airflow.
  • Excellent interpersonal skills, ability to handle ambiguity and learn quickly.
  • Exposure to data architecture & governance is helpful.
  • A degree in Computer Science or a related technical field, or equivalent work experience.

Minimum years of experience*: 5+