Data Validation #16

Open
ainsleys opened this issue Mar 26, 2020 · 16 comments
Labels
help wanted Extra attention is needed

Comments

@ainsleys
Contributor

ainsleys commented Mar 26, 2020

Users who download their location data from Google Takeout may submit data from the wrong dates, in the wrong format, or may attempt to submit empty data.

The client application should be able to do some measure of local data validation and prep for transmission to the SafeTrace server (prior to data leaving the user's device).
This will probably look like:
JSON file --> some validation --> data prep (throwing away irrelevant fields) --> cleaned and validated JSON file
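
As a rough illustration of the "some validation" step, here is a minimal Python sketch that rejects empty or malformed input before any data prep. The function name `validate_takeout` is hypothetical, and the top-level `timelineObjects` key is an assumption about the Takeout file layout that should be checked against a real export.

```python
import json


def validate_takeout(raw_text):
    """Return (ok, reason) for the text of a Takeout location-history file."""
    # Empty submissions are rejected up front.
    if not raw_text or not raw_text.strip():
        return False, "empty file"
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    # Assumption: entries live in a top-level "timelineObjects" list.
    objects = data.get("timelineObjects") if isinstance(data, dict) else None
    if not isinstance(objects, list):
        return False, "unexpected format"
    # Only entries carrying a "placeVisit" are of interest here.
    visits = [o for o in objects if isinstance(o, dict) and "placeVisit" in o]
    if not visits:
        return False, "no place visits"
    return True, "ok"
```

Returning a reason string alongside the boolean lets the client tell the user what went wrong before anything leaves the device.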

This is an open issue. We'd love help from anyone who wants to dig into the Google Takeout location history JSON and coordinate requirements with @lacabra. Anyone developing a separate app / mobile app may already be working on this, so get in touch!

EDIT: This does not address users intentionally submitting wrong data for any reason. That is a valid concern but is out of scope at this time.

@ainsleys added the "help wanted" label Mar 26, 2020

@m1ghtfr3e

Hi, I could help with programming the backend, but not the full app/web page.

@cankisagun
Contributor

@m1ghtfr3e
We would be happy to get your help with the web service backend (user + password management) so that users can log in and interact with the SafeTrace API.

Let us know if you want to start implementing a web page DB. Happy to jump on a call to discuss in more detail.

@cankisagun
Contributor

This is also important to keep the data size down and ensure that the TEE can handle as many users as possible.

@shenrene
Contributor

> This is also important to keep the data size down and ensure that TEE can handle as many users as possible

I agree with this. These files are massive, and we only need the last 14-21 days of data.
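
A minimal sketch of that time-window check in Python, assuming each visit carries a `duration.endTimestampMs` string in epoch milliseconds (as in Takeout's `placeVisit` entries). The helper name `within_window` and its `days` and `now_s` parameters are hypothetical; `now_s` exists only to make the check testable.

```python
import time


def within_window(place_visit, days=14, now_s=None):
    """True if the visit ended within the last `days` days (and not in the future)."""
    now_s = time.time() if now_s is None else now_s
    cutoff_ms = (now_s - days * 86400) * 1000
    # Takeout stores timestamps as strings holding epoch milliseconds.
    end_ms = int(place_visit["duration"]["endTimestampMs"])
    return cutoff_ms <= end_ms <= now_s * 1000
```

Rejecting timestamps later than "now" also catches one form of the "wrong dates" problem described in the issue.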

@shenrene
Contributor

@lacabra I have been looking at this myself. Happy to help out.

@m1ghtfr3e

@cankisagun
Sorry I couldn't be online the last few days, very disappointing.
How can I help specifically? I'd do it ASAP if possible.

@ainsleys
Contributor Author

ainsleys commented Mar 28, 2020

@shenrene take a look at the .json file that is produced in the "Semantic Location History" folder when you use Google Takeout to download your location history. AFAIK, we would want a script that takes in that .json file and outputs only the placeVisits from the last 14 days.
A placeVisit looks like this:


    "placeVisit" : {
      "location" : {
        "latitudeE7" : 392107052,
        "longitudeE7" : -769362430,
        "placeId" : "ChIJNR94nNzYt4kR1kmg98tgX8A",
        "address" : "6050 Daybreak Cir\nClarksville, MD 21029\nUSA",
        "name" : "River Hill Village Center",
        "sourceInfo" : {
          "deviceTag" : 856889161
        },
        "locationConfidence" : 99.586555
      },
      "duration" : {
        "startTimestampMs" : "1583191245000",
        "endTimestampMs" : "1583196383999"
      },

After processing, each entry in the JSON file should look like this (which is what the API expects to receive):

    {
      "lat": 40.757339,
      "lng": -73.985992,
      "startTS": 1583064000,
      "endTS": 1583067600
    },

Note: those are different locations; this is just a format example. The first is a place near me from my Google Takeout, the second is the input sample @lacabra provides here: https://github.com/enigmampc/SafeTrace/tree/master/api-server under Data Specification
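
The mapping between the two formats above could be sketched in Python roughly as follows. This is not the project's script: `to_api_record` is a hypothetical name, and the sketch assumes that the `E7` fields are degrees multiplied by 10^7 and that the API's `startTS`/`endTS` are epoch seconds (inferred from the 10-digit values in the sample).

```python
def to_api_record(place_visit):
    """Reduce one Takeout placeVisit to the four fields the API sample shows."""
    loc = place_visit["location"]
    dur = place_visit["duration"]
    return {
        # Assumption: E7 values are degrees scaled by 10**7.
        "lat": loc["latitudeE7"] / 1e7,
        "lng": loc["longitudeE7"] / 1e7,
        # Takeout timestamps are millisecond strings; drop to whole seconds.
        "startTS": int(dur["startTimestampMs"]) // 1000,
        "endTS": int(dur["endTimestampMs"]) // 1000,
    }
```

For the Takeout sample above, this gives a `startTS` of 1583191245 and an `endTS` of 1583196383, with the address, name, place ID, and device tag thrown away.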

@shenrene
Contributor

@ainsleys I have the code ready for review. Can you add me as a collaborator? I can't seem to create a branch.

@ainsleys
Contributor Author

Hey @shenrene I'm so sorry I missed this. Working on it now.

@ainsleys
Contributor Author

@shenrene issue opened here: #28
Let's take this there and figure it out, so we can leave this issue for data validation convo. Thanks!

@ainsleys
Contributor Author

Thanks to @shenrene, we now have a script that converts Google Takeout data to our API endpoint format and only uses the last 14 days of user data.

https://github.com/enigmampc/SafeTrace/blob/master/client/scripts/last_14d_visits.py

@cankisagun
Contributor

@shenrene this code needs to run on the client side, which means it needs to be in JavaScript. Can you help us with that?

@francomendoza

@cankisagun @shenrene if you need assistance with the JS version of the transformation/validation, I can help. I presume the idea is that, after the user selects the file on the client, you want to validate/reduce the data prior to sending it to the server?
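
One way to picture a final pre-send check on the reduced records is sketched below in Python (the client-side version would be JavaScript, per the comment above). `is_valid_record` is a hypothetical helper, and the coordinate ranges and ordering rule are assumptions about what the API would accept, not documented requirements.

```python
def is_valid_record(rec):
    """Sanity-check one cleaned record ("lat", "lng", "startTS", "endTS")."""
    try:
        lat, lng = float(rec["lat"]), float(rec["lng"])
        start, end = int(rec["startTS"]), int(rec["endTS"])
    except (KeyError, TypeError, ValueError):
        # Missing fields or non-numeric values fail validation.
        return False
    # Assumed rules: coordinates in valid ranges, visit starts before it ends.
    return -90 <= lat <= 90 and -180 <= lng <= 180 and 0 < start <= end
```

Records that fail could simply be dropped on the client, so only well-formed entries are transmitted.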

@shenrene
Contributor

shenrene commented Apr 3, 2020

I submitted a pull request #40

@ainsleys
Contributor Author

ainsleys commented Apr 3, 2020

@francomendoza that is correct! @shenrene has already submitted a PR addressing this, which I just merged. You can check it out at https://github.com/enigmampc/SafeTrace/blob/master/client/scripts/last_14d_visits.js

@ainsleys ainsleys mentioned this issue Apr 3, 2020
4 tasks
@ainsleys
Contributor Author

ainsleys commented Apr 3, 2020

@alaaltoros when you make the required changes to this code (intake local file), can you open a PR for the new code and move this issue from "Review" to "Done"? Thank you! cc @cankisagun
