We require candidates with at least 4 to 5 years of experience as a Data Engineer and a proven track record of managing data-intensive projects and environments. Expertise in Retrieval-Augmented Generation (RAG) methodologies is essential, as you will lead the development of innovative data pipelines.
Technical proficiency must include deep experience with at least one tool or technology from each of the following categories:
- Vector Databases: Experience with Pinecone, Milvus, Qdrant, Weaviate, or equivalent solutions to manage embedding data effectively.
- Databases: Proficiency in SQL databases such as PostgreSQL and MySQL, plus NoSQL databases including MongoDB and Cassandra.
- Data Warehousing: Knowledge of warehouse platforms such as Redshift, BigQuery, or Snowflake to support analytical processing.
- ETL/ELT Tools: Skilled in dbt, Apache NiFi, Talend, or Informatica for robust data transformation workflows.
- Big Data Frameworks: Experience with Apache Spark and Apache Kafka for processing and streaming data at scale.
- Search Engines: Familiarity with Elasticsearch, Solr, or OpenSearch to implement advanced search capabilities.
- Cloud Platforms: Working knowledge of AWS, Google Cloud Platform, or Azure for scalable cloud infrastructure.
- Containers: Proficiency with Docker to containerize and deploy applications reliably.
- Programming: Advanced skills in Python for scripting, automation, and integration.
Strong analytical and problem-solving skills are vital for addressing complex data challenges. We seek candidates who thrive in agile, cross-functional teams and communicate and collaborate effectively across roles and disciplines.