-
-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature discussion: Explicit Order #114
Comments
I do not understand what you mean by "ordering". I will need to see examples. You mean "order-by" in SQL? What's wrong with ordering results afterwards? It's simple enough to add that in the query language of Datalevin. It's not a problem, just adding a But putting order in index is a bad idea, and no database does that. The reason is simple, there are unbounded number of orderings and database systems often do sorting as needed to speed up query, e.g. to allow merge joins. There's no point to store a specific order in the index. I am trying hard to minimize storage. Whatever storage budget I want to spend, I want to spend on speeding up the query. As to his blog post of Datascript 2.0, most of the concerns are either already being addressed in Datalevin, or plainly wrong from a database point of view. For example, the idea of removing query is just ridiculous. The only reason for many people to be interested in Datomic-like stores is the query language. It is at least the case for me. Without the query language, I wouldn't consider using such a thing. With a fast query engine, most of his concerns can be resolved. For example, if the query is fast, there's no problem adding an additional step to sort the results in whatever order users like. This whole business of pull API is also due to the slow query engine. If it is fast, there's no need for a pull API. Yes, it is hard to build fast query engines, but it is doable. I am making great progress here. Just finished adding two types of indices to Datalevin. The query is going to be much faster. For example, all the shown queries in the current benchmark requires no joins at all (except q5, which is too slow so we normally comment it out), as they are on the same entity type, so they should be equally fast. My goal is to be competitive to RDBMS. It is doable, just do what RDBMS does, i.e. we build virtual tables behind the scene during transaction. In any case, I have very different goals from tonsky. He's a UI guy and he built Datascript to experiment with building UI in a new way. I want to build a database in the traditional sense. I want the database to be flexible and easy to use, so that people don't have to be a relational algebra experts to use a database, i.e. to fully realize the dream of database researchers of having completely fool proof database systems. My ultimate ambition, is to realize the dream of early AI researchers, to have a powerful knowledge graph and reasoning platform, and use that as the basis of new generation of AI (no, deep learning is not going to solve AI, no matter what the hype is saying). I took Datascript as a starting point, because it is there. The most value for me is the test cases. Without that, I wouldn't have attempted to start the project. I hope this much is clear. |
@huahaiy agreeing on your points that minimizing storage and fast queries are crucial. To clarify, what I mean by ordering is an explicit order, e.g. in the EAV sense: Todo List (E) Items (A) todo-items (V) – todo items require an order. Currently order is often added as an attribute on the todo-item itself. This has at least two problems:
While the above are UI examples I can imagine explicit ordering to be useful in AI use cases as well. The main question is how does explicit ordering of nodes work in a graph where those nodes can reappear under different parent nodes? Currently, I believe, there's no straightforward solution to this in Clojure's flavor of datalog databases. |
These ordering problems will not be solved by adding order in the index. They can be solved by allowing users to supply an Pagination is a different problem. The query engine can be enhanced to hold some state to handle that, i.e. to have a transducer friendly API. We are addressing that in KV API (waiting for the PR from @ieugen), the same can be done with query API as well. Just modify the syntax a bit or use a different |
Thanks for clarifying! |
Hope you don't mind a "UI guy" jumping in here. I've been using
I've been solving this by wrapping the 'todo' in another entity for each parent. Makes sense for my use cases, since order is not the only attribute that is location-dependent. Wholeheartedly agree with @huahaiy about Tonsky's query comments. However, I will say that One thing I'd like is ref support in heterogenous tuples. Any thoughts on that? Finally, @huahaiy, following your work with great interest and excitement. Thanks! |
We will need to support tuple first, then we can think about handle ref in them. Thanks for the suggestion. |
@tonsky recently published a post of ideas for a DataScript 2.0. One of the topics is lack of ordering in DataScript which equally affects datalevin. One of the suggestions is to create an order index.
I imagine this would look like
EAVO
- Entity Attribute Value Order such that order can be set on the relation-level (EAV
) such thatref
values can reappear in other relations with different order. @ccorcos suggested fractional indexing to set and update order.@huahaiy while I'm not clear yet what the API might like and what the tradeoffs are I'm curious to know is this something you'd be open to having in datalevin?
The text was updated successfully, but these errors were encountered: