Global view SafeTrace algorithm #36

cankisagun · 2020-03-30T22:44:24Z

This issue builds on issue #2

In order to achieve a certain level of privacy, we propose to run a lightweight clustering algorithm inside the SafeTrace secure DB and passing results to the maps API. This prevents any specific user data to leak to the map API.

The clustering algorithm should return << lan, lat, weight >> to the maps API for each data point to be represented on the map

Next steps:

Determine how to account for high risk symptoms in the global view algorithm (Suggested approach: overlay both positive covid19 and high risk symptoms on the same map with different colors)
Determine clustering logic based on sample data size (i.e. create 100 clusters for 10K users vs. 5 clusters for 10 users)
Assess Differential Privacy options

UPDATE: The clustering algorithm runs inside the enclave similar to individual reporting algorithm, which can be found here

theoturner · 2020-04-01T12:33:49Z

I agree on the positive vs high risk in different colors. While there is correlation, I expect that if you drew the curves, you'd see changes in high risk frequency for a given area not to be in the same time step as positive cases. Visualise it, and you may find a leading indicator relationship (I'd consider the first and second derivatives wrt time). You might even be able to develop a predictor using a regression model.
The clustering algorithm is simple: duplicate points according to weight, the run k-means (easy with cogset in Rust).
Choosing the number of clusters is not an easy problem to solve, and generally is difficult to do without a human present. Silhouetting is the most straightforward way to automate cluster number choice, though you can also run through many numbers of clusters and minimise MSE while not having too many clusters (elbow method). These approaches typically take a lot of effort to program reliably, and a human observing the data based on context tends to be the better option. I would allow the user to input the number of clusters, and maybe display a number selected by yourselves by default.
In terms of preserving privacy, it's pretty easy: output cluster centroids, rather than the data points.

lacabra · 2020-04-03T16:58:48Z

Thanks @theoturner for your outline and proposed solution. It makes sense at a high level. Given that I sense your familiarity with Rust, would you be in a position to submit a PR to introduce these improvements?

Alternatively, you could post small snippets of code as per your comments to this thread, and someone else could "glue" them together in fully functioning code and submit a PR instead.

Thoughts?

theoturner · 2020-04-06T10:11:55Z

K-means in Rust for lat/long:

We assume that lat/long is representative of a flat plane. For most cases this is fine, particularly smaller surface areas (@Akhil325 could potentialy comment further). I've personally never had problems with this abstraction and clustering, and given the ongoing dispute about 2D projections of the globe, It's likely not worth your time to consider projections.

Assume you have locations:

struct Location {
    latitude: f64, // Or whatever numerical type
    longitude: f64,
}

Cluster (assume argument num_clusters e.g user-supplied - I'd advise this before attempting silhouetting):

use cogset::{Euclid, Kmeans};
let mut eucvec: Vec<Euclid<_>> = Vec::new();
for point in &locations {
    eucvec.push(Euclid([point.latitude as f64, point.longitude as f64]));
}
let kmeans = Kmeans::new(&eucvec, num_clusters as usize);
let clusters = kmeans.clusters();

Outputting just the centroids:

let mut clustvec: Vec<(f64, f64)> = Vec::new();
for result in &clusters {
    clustvec.push(((result.0).0[0], (result.0).0[1]));
} 
format!("{:?}", clustvec)

theoturner · 2020-04-06T10:19:24Z

Drawing it on Google Maps

I also see you've linked an issue re: google maps. Cluster data is relatively easy to draw on maps, e.g. in React.

First, let's create a drawable var of our points, which you could extract from your props, e.g. say you had prop clusters imported from your Rust:

const Point = ({ text }) => <div>{text}</div>;
var points = [];
for (var i = 0; i < this.props.clusters.length; i++) {
    points.push(<Point
        lat={this.props.clusters[i][0]}
        lng={this.props.clusters[i][1]}
        text="X"
    />);
}

Then, draw a google map:

import GoogleMapReact from 'google-map-react';
<GoogleMapReact
bootstrapURLKeys={{ key: process.env.REACT_APP_API_KEY }}
defaultCenter={{ lat: 51.507221, lng: -0.127600}}
defaultZoom={11}
{points}
</GoogleMapReact>

theoturner · 2020-04-06T10:24:25Z

For more details, see my implementation of clustering + maps for Enigma, APEX. In this implementation, the contract only stores ints and uses enigma macros such as eformat!, there's some input sanitation, and the google map is drawn with a custom theme.

cankisagun mentioned this issue Mar 30, 2020

Post-MVP work #37

Closed

cankisagun mentioned this issue Apr 3, 2020

Google map API for individual reporting #45

Open

cankisagun added Enclave rust high priority help wanted Extra attention is needed labels Apr 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Global view SafeTrace algorithm #36

Global view SafeTrace algorithm #36

cankisagun commented Mar 30, 2020 •

edited

Loading

theoturner commented Apr 1, 2020

lacabra commented Apr 3, 2020

theoturner commented Apr 6, 2020

theoturner commented Apr 6, 2020

theoturner commented Apr 6, 2020

Global view SafeTrace algorithm #36

Global view SafeTrace algorithm #36

Comments

cankisagun commented Mar 30, 2020 • edited Loading

theoturner commented Apr 1, 2020

lacabra commented Apr 3, 2020

theoturner commented Apr 6, 2020

K-means in Rust for lat/long:

theoturner commented Apr 6, 2020

Drawing it on Google Maps

theoturner commented Apr 6, 2020

cankisagun commented Mar 30, 2020 •

edited

Loading