We develop a query engine for unstructured documents. Our vision is to accept natural language
queries and execute them against arbitrary document repositories. Unlike Retrieval Augmented Generation (RAG),
ReDD (pronounced "ready") takes a fundamentally different approach and serves a very different purpose.
Given a natural language query, we extract a schema from the data tailored to that query, and then populate the schema with the data
needed to answer the query correctly. We translate the question to SQL and execute the SQL query against the collection of
extracted tables. Because the answer is computed over these tables, analytical queries are supported in full. Think of it as DeepResearch or DeepSearch, but for answering relational queries over documents with error guarantees.
We are working to address many challenges, including:
- Natural language to SQL generation
- Ad-hoc schema extraction given a query (the schema may include multiple tables with constraints)
- Query optimization in this framework (e.g., data extraction versus code generation)
- Ability to abstain from providing an answer and ask for help
- Conversation frameworks to inject the right knowledge into the query during execution and steer it towards correctness
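
To make the flow described above concrete (derive a query-specific schema, populate it from documents, then run SQL over the extracted table), here is a toy sketch. It is not ReDD's implementation: the documents, the `revenue` schema, the regular-expression extractor, and the hand-written SQL are all hypothetical stand-ins for components that, in a real system, would presumably be driven by a language model. Documents the extractor cannot parse are returned separately rather than guessed at, as a very rough nod to the abstention item in the list.

```python
import re
import sqlite3

# Hypothetical toy documents; a real repository would be far less regular.
DOCUMENTS = [
    "Acme Corp reported revenue of 120 million USD in 2023.",
    "Globex reported revenue of 75 million USD in 2023.",
    "Initech reported revenue of 40 million USD in 2022.",
]

# Hypothetical schema tailored to the question
# "What was the total reported revenue per year?"
SCHEMA = "CREATE TABLE revenue (company TEXT, amount_musd REAL, year INTEGER)"

# Stand-in extractor: a regex plays the role of the extraction component.
PATTERN = re.compile(
    r"^(?P<company>.+?) reported revenue of "
    r"(?P<amount>\d+(?:\.\d+)?) million USD in (?P<year>\d{4})\.$"
)


def extract_rows(documents):
    """Return (rows, skipped): parsed tuples and documents that did not match."""
    rows, skipped = [], []
    for doc in documents:
        match = PATTERN.match(doc)
        if match is None:
            skipped.append(doc)  # do not guess values for unparseable text
            continue
        rows.append(
            (match["company"], float(match["amount"]), int(match["year"]))
        )
    return rows, skipped


def answer(documents, sql):
    """Populate the schema from the documents and run the SQL query over it."""
    rows, skipped = extract_rows(documents)
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)", rows)
        return conn.execute(sql).fetchall(), skipped
    finally:
        conn.close()


# Hand-written SQL standing in for the natural-language-to-SQL step.
SQL = "SELECT year, SUM(amount_musd) FROM revenue GROUP BY year ORDER BY year"
result, skipped = answer(DOCUMENTS, SQL)
print(result)   # one (year, total) row per year
print(skipped)  # documents the extractor could not handle
```

The point of the sketch is the division of labor: once the documents are reduced to a table, the aggregation is computed by the SQL engine rather than estimated from retrieved passages.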