idea: support high entropy matches #10

dominictarr · 2018-04-30T21:09:39Z

in secure scuttlebutt, there are many values that are high entropy, i.e. ids (of feeds, messages, and blobs). Since these are essentially random, there is no reason to query them as ranges; usually they are retrieved with exact queries.

for example, you could request all replies in a thread like this:

[{$filter: { value: {content: { root: <thread_id> } } } }]

this is a valid query, but would unfortunately produce a full-scan (i.e. read the entire database, a very inefficient query!).

Currently, we have indexes that match a given path, but we could also have indexes that match a given value. This index would match a particular value wherever it appears in the object. So this query would first return replies and likes and backlinks, like https://github.com/ssbc/ssb-backlinks does, and then the results that don't match the rest of the filter would be filtered out (which would generally be an efficient query!). This means we could replace backlinks and do pretty much all the message queries via ssb-query.
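
To make the idea concrete, here is a rough in-memory sketch of a value index. It is only an illustration under assumptions: `isId`, `collectIds`, `buildValueIndex` and `repliesTo` are made-up names, the id pattern is a guess at what "high entropy" values look like, and a real version would presumably be a flume view rather than a `Map`.

```javascript
// Sketch of a "value index": maps each high-entropy value to the messages
// that contain it, wherever it appears. All names here are hypothetical.

// Assumption: ids look like a sigil, a base64 body, a dot and a suffix.
const isId = (v) =>
  typeof v === 'string' && /^[@%&][A-Za-z0-9+/]+=*\.[a-z0-9]+$/.test(v)

// Recursively collect every id-like string found anywhere in an object.
function collectIds (obj, found = new Set()) {
  if (isId(obj)) {
    found.add(obj)
  } else if (obj && typeof obj === 'object') {
    for (const key of Object.keys(obj)) collectIds(obj[key], found)
  }
  return found
}

// Build the index: id -> list of messages that mention that id.
function buildValueIndex (messages) {
  const index = new Map()
  for (const msg of messages) {
    for (const id of collectIds(msg.value)) {
      if (!index.has(id)) index.set(id, [])
      index.get(id).push(msg)
    }
  }
  return index
}

// Look up by value first (cheap), then apply the path filter to the
// small candidate set instead of to the whole database.
function repliesTo (index, threadId) {
  return (index.get(threadId) || [])
    .filter((msg) => msg.value.content.root === threadId)
}
```

The lookup returns everything that mentions the thread id (replies, likes, other backlinks), and the path check then narrows that down to replies, so the filter only runs over a handful of messages.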

@arj03 @mmckegg @mixmix

mixmix · 2018-04-30T21:53:00Z

Yes, and it would be great to query for "one of this set of values". E.g. getting all comments that are attached to any of my blogposts is currently hard to do. I'm doing a db scan with a filter that checks blogIds.includes(msg.value.content.root).

dominictarr · 2018-04-30T22:06:51Z

@mixmix in the meantime, a much better way to do that would be to use backlinks and http://npm.im/pull-merge to combine the results into one stream.

dominictarr · 2018-04-30T22:07:25Z

or pull-many if you don't care about the relative order of the substreams

mixmix · 2018-05-01T18:00:33Z

I read the README of pull-merge and pull-many and it looks like both of them merge in ways that aren't ideal ... or the examples are unclear.

e.g. I don't want all comments from BlogA, then all comments from BlogB... or one of one then one of the other... I want a stream as they happened in time so I can make an infinite scroller.

How hard would it be to add $in to map-filter-reduce? E.g.

{ 
  $filter: { 
    value: { 
      content: { 
        root: { 
          $in: [ '@ye4awsdas', '@mmsasdas']
        }
      }
    }
  }
}
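
For clarity, this is roughly what the proposed `$in` would mean as a predicate. It is a plain JavaScript sketch, not map-filter-reduce's actual code: `matches` is a hypothetical helper, and it only handles nested paths, exact values and `$in`.

```javascript
// Hypothetical matcher, illustrating the proposed $in semantics only.
function matches (filter, value) {
  if (filter !== null && typeof filter === 'object') {
    // Proposed operator: the value must equal one of the listed values.
    if (Array.isArray(filter.$in)) return filter.$in.includes(value)
    // Otherwise every key in the filter must match the same path in the value.
    if (value === null || typeof value !== 'object') return false
    return Object.keys(filter).every((key) => matches(filter[key], value[key]))
  }
  // Primitive filter: exact match.
  return filter === value
}
```

This is the same check as `blogIds.includes(msg.value.content.root)`, just expressed in the query so that an index could eventually serve it instead of a scan.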

dominictarr · 2018-05-05T22:24:50Z

@mixmix pull-merge takes a compare function, so you can use that to interleave the messages in order you want.

dominictarr · 2018-05-05T22:25:47Z

oh, the substreams should already be in that order to do a streaming merge; otherwise you need to buffer the streams and then sort them.
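
A minimal sketch of what a streaming merge with a compare function does, using plain iterators in place of pull-streams. This is not pull-merge's API; `mergeSorted` and `byTimestamp` are made-up names, and ordering by `value.timestamp` is just an assumption about what "in time" means here.

```javascript
// Merge already-sorted substreams into one ordered stream, pulling lazily.
// Plain iterators stand in for pull-streams; this is not pull-merge's API.
function * mergeSorted (sources, compare) {
  const iterators = sources.map((source) => source[Symbol.iterator]())
  // Hold exactly one "head" item per substream.
  const heads = iterators.map((it) => it.next())
  while (true) {
    let best = -1
    for (let i = 0; i < heads.length; i++) {
      if (heads[i].done) continue
      if (best === -1 || compare(heads[i].value, heads[best].value) < 0) best = i
    }
    if (best === -1) return // every substream is exhausted
    yield heads[best].value
    // Refill only the substream we just took from.
    heads[best] = iterators[best].next()
  }
}

// Assumption: order by the message's claimed timestamp, oldest first.
const byTimestamp = (a, b) => a.value.timestamp - b.value.timestamp
```

Only one item per substream is held at a time, which is why each substream has to arrive already sorted by the same compare function. If it doesn't, the output is not ordered and you are back to buffering and sorting.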

mixmix · 2018-05-07T20:38:53Z

One challenge is knowing the order of the substreams. I didn't know how to get the order gradients for a flumeview-level read (without doing a moving read with pull-next and sorting the chunks for example). Will have a look at backings streaming if I have time.
