Adapt Graphryder to the ethical consent funnel results #35

Open
albertocottica opened this issue Dec 20, 2017 · 13 comments

Comments

@albertocottica
Member

The opencare platform acquires user consent through something called the consent funnel. This malfunctioned, and in summer 2017 we set out to try and fix the problem. Recaps and results are here.

What needs to happen now:

  1. We provide you with a list of consentless users. A discussion is ongoing, but it should be fast.
  2. This list is used in the harvesting script as a filter to drop those users. I am not sure how to do it: the cleanest option is to check in the database whether consent has been given. In that case, everything needed is in the database, and no outside information is required.
  3. This also means dropping the content authored by these authors, and the annotations on that content.
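
Steps 2 and 3 could be sketched roughly as follows. This is only an illustration: the field names (`author_id`, `post_id`) and the sample records are hypothetical and do not reflect the actual harvesting script or database schema.

```python
def drop_consentless(posts, annotations, consenting_ids):
    """Keep posts by consenting authors and the annotations on those posts."""
    kept_posts = [p for p in posts if p["author_id"] in consenting_ids]
    kept_post_ids = {p["id"] for p in kept_posts}
    # Annotations are dropped whenever the content they point to was dropped.
    kept_annotations = [a for a in annotations if a["post_id"] in kept_post_ids]
    return kept_posts, kept_annotations

# Hypothetical records, for illustration only.
posts = [
    {"id": 10, "author_id": 1},
    {"id": 11, "author_id": 2},
]
annotations = [
    {"id": 100, "post_id": 10},
    {"id": 101, "post_id": 11},
]

kept_posts, kept_annotations = drop_consentless(posts, annotations, consenting_ids={1})
print([p["id"] for p in kept_posts])
print([a["id"] for a in kept_annotations])
```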

Based on this new process, we then proceed to regenerate the dashboard and produce the export.

@guywiz

guywiz commented Jan 10, 2018

Getting hold of the data is feasible through the GraphRyder API, so we can already build a set of JSON files (following the same pattern as last year: users.json, posts.json, comments.json, etc.). Once we have the list of people who oppose publication of their content, we can amend these files.
Just to make sure I get things right: we need to discard those users and the content they authored. Also, we should include a short readme file explaining what data we publish (and what we don't).

@albertocottica
Member Author

albertocottica commented Jan 10, 2018 via email

@albertocottica
Member Author

API endpoint ready.

As per what Marco says here, both the exported data and the actual Graphryder should be based only on those users who have actively given consent.
To find them, call up this endpoint (needs API authentication):
https://edgeryders.eu/administration/annotator/users.json

and select users for which "edgeryders_consent" = "1"
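
A minimal sketch of the selection step, assuming the endpoint returns a JSON list of user records that each carry an `edgeryders_consent` field. The sample payload and the `id` and `username` keys are made up for illustration. The authenticated request itself is left out.

```python
import json

def consenting_users(users):
    """Return only the user records whose consent flag equals "1".

    Records without the field are treated as not having given consent.
    The flag is compared as a string, so an integer 1 is accepted too.
    """
    return [u for u in users if str(u.get("edgeryders_consent")) == "1"]

# Hypothetical payload; the real response may be shaped differently.
sample = json.loads("""
[
  {"id": 1, "username": "alice", "edgeryders_consent": "1"},
  {"id": 2, "username": "bob", "edgeryders_consent": "0"},
  {"id": 3, "username": "carol"}
]
""")

consenting_ids = {u["id"] for u in consenting_users(sample)}
print(consenting_ids)
```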

@guywiz

guywiz commented Jan 22, 2018

(Echoing a comment from Report part B) Coming back to you on the data export we still need to do.

On user identity.

  • Anonymization is hard. A soft approach is to simply display users as IDs (numbers), at least for those who do not wish to be identified as having participated in the conversations.

On authored content.

  • Do we simply discard any content authored by these users as well? This can have a dramatic impact on the SSNA analysis, since it will break conversations (A replying to B replying to C: if B is taken out, we lose the indirect link from A to C).
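
The A, B, C case can be checked on a toy reply chain. This is a sketch with invented comment records (`id`, `author`, `reply_to`), not the SSNA code itself.

```python
def reply_edges(comments):
    """Build (replier, replied_to) author pairs from a list of comments."""
    author_of = {c["id"]: c["author"] for c in comments}
    return {
        (c["author"], author_of[c["reply_to"]])
        for c in comments
        if c.get("reply_to") in author_of
    }

def reachable(edges, start):
    """Authors reachable from `start` by following reply edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

# A replies to B, who replies to C.
comments = [
    {"id": 1, "author": "C", "reply_to": None},
    {"id": 2, "author": "B", "reply_to": 1},
    {"id": 3, "author": "A", "reply_to": 2},
]

full = reply_edges(comments)
pruned = reply_edges([c for c in comments if c["author"] != "B"])

print(reachable(full, "A"))    # A reaches B directly and C indirectly
print(reachable(pruned, "A"))  # with B's comment removed, A reaches nobody
```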

P.S. Do we have a list of user ids we need to "discard"?

@albertocottica
Member Author

albertocottica commented Jan 29, 2018

@guywiz :

On anonymization: that's a negative. See here. There should be Python libraries which do SHA-256 lying around.
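
For reference, Python's standard `hashlib` module covers SHA-256. The sketch below assumes the intent is to replace each username with its hex digest. The optional `salt` parameter is an addition of this sketch, not something requested in the thread: without a secret salt, digests of known usernames can be matched by hashing candidate names.

```python
import hashlib

def pseudonymize(username, salt=""):
    """Return the SHA-256 hex digest of salt + username.

    The same input always yields the same 64-character pseudonym,
    so links between a user's posts and comments are preserved.
    """
    return hashlib.sha256((salt + username).encode("utf-8")).hexdigest()

# Hypothetical usernames and salt, for illustration only.
for name in ("alice", "bob"):
    print(name, "->", pseudonymize(name, salt="example-salt"))
```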

On authored content. Correct, but nothing we can do. Unless you want to keep the social network but discard the semantics, which would be more work.

On "a list of users". We do it in a cleaner way, with calls to the API. See here. The database needs updating (from our end), as Noemi and I have managed to get a few more people to give consent. She has been out of the loop, but I think she is back at the house today. You can of course start writing the export code; I will let you know when the new information is incorporated into the dataset. At that point you run your script and it is done.

@guywiz

guywiz commented Jan 29, 2018

I followed the instructions on the Edgeryders API page but didn't see any "Generate key" button under Permissions ... (I do not have admin privileges, but do I need them?)

Also, reading the page, it seems I will be able to access all content, even content published by users who didn't give consent, so it's up to me to filter out those users and their content. That means GraphRyder will also need to filter content, which for now it does not ... Asking @jason-vallet

@albertocottica
Member Author

albertocottica commented Jan 29, 2018 via email

@guywiz

guywiz commented Jan 29, 2018

Yep, precisely. My question to @jason-vallet was whether this is what GraphRyder does already, or whether we need to adapt the underlying script.

The script can access everything, but it can then copy onto Neo4j only the content that (1) relates to the ethno-opencare Discourse tag and (2) was authored by users for which "edgeryders_consent" = "1".
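
The two conditions amount to a single predicate applied before anything is written to Neo4j. The sketch below uses hypothetical item fields (`tags`, `author_id`) and does not show the actual import script or any database call.

```python
def should_copy(item, consenting_ids, tag="ethno-opencare"):
    """True only if the item carries the tag AND its author gave consent."""
    return tag in item.get("tags", ()) and item.get("author_id") in consenting_ids

# Hypothetical items, for illustration only.
items = [
    {"id": 1, "tags": ["ethno-opencare"], "author_id": 7},  # copied
    {"id": 2, "tags": ["ethno-opencare"], "author_id": 8},  # no consent
    {"id": 3, "tags": ["other"], "author_id": 7},           # wrong tag
]

to_copy = [item["id"] for item in items if should_copy(item, consenting_ids={7})]
print(to_copy)
```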

@albertocottica
Member Author

albertocottica commented Jan 29, 2018 via email

@jason-vallet

Hey guys, I pushed all the necessary modifications last week but forgot to update the database (silly me!).

Basically, I had three possible solutions to address this whole issue:

  1. Remove from GR the content for which we do not have an explicit consent.
    Plain and simple, sure, but a disaster for the whole network, as complete posts (those whose authors did not give consent) and comments were removed, along with their corresponding annotations and codes.
    We ended up with: 230 users, 577 posts, 2571 comments, 4621 annotations and 1165 codes.

  2. The second solution, a little less extreme, was to keep all the pieces of content written by people who did not give their consent, but obfuscate them.
    Basically, all these authors are replaced by a single anonymous user who is considered the creator of the content. The titles and bodies of the posts and comments are also obfuscated, as well as the specific pieces of text annotated. The codes are still attached to the posts/comments, but we do not know exactly how. This has several advantages: the code-to-code relations are preserved as we know them, and the obfuscated content, while not readable on the GR platform, can still be accessed using the hyperlinks referring to EdgeRyders (access to the content is authorised on the ER website). On the bad side, this distorts the social network, as all the authors of obfuscated content are treated as a single anonymous user.
    Result: 245 users, 659 posts, 3248 comments, 5625 annotations, and 1282 codes.

  3. The last solution is, in retrospect, the most straightforward, and it is the one currently being deployed.
    Authors who did not give their consent are anonymous (their username is not displayed), but we keep track of which obfuscated content each of them has authored. The pros of the previous solution are still valid, while the issue concerning the social network is removed.
    Result: 337 users, 659 posts, 3248 comments, 5625 annotations, and 1282 codes.
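
The difference between solutions 2 and 3 can be illustrated on a toy example. This is a sketch only: the `anonymous-N` labels and the sample posts are invented, and the way GraphRyder actually labels users may differ.

```python
from itertools import count

def display_authors(posts, consenting, per_author=True):
    """Return the author label shown for each post.

    per_author=False mimics solution 2: one shared anonymous user.
    per_author=True mimics solution 3: one stable label per consentless author.
    """
    labels = {}
    counter = count(1)
    shown = []
    for post in posts:
        author = post["author"]
        if author in consenting:
            shown.append(author)
        elif not per_author:
            shown.append("anonymous")
        else:
            if author not in labels:
                labels[author] = f"anonymous-{next(counter)}"
            shown.append(labels[author])
    return shown

# Hypothetical posts; only alice has given consent.
posts = [{"author": a} for a in ("alice", "bob", "carol", "bob")]

solution_2 = display_authors(posts, {"alice"}, per_author=False)
solution_3 = display_authors(posts, {"alice"}, per_author=True)
print(solution_2, len(set(solution_2)))
print(solution_3, len(set(solution_3)))
```

In this toy case the amount of content is identical under both solutions, but solution 3 keeps more distinct users, which matches the pattern in the counts above (same posts and comments, 245 versus 337 users).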

So a user will see content which has not been cleared as follows:

image

and the social network still presents a logical structure:

image

Ultimately, it is still possible to find out what anybody has written on a subject, but this knowledge is only available by going through ER, which complies with the TOS, thus freeing us from the consent issue.

@guywiz

guywiz commented Feb 3, 2018

Please @albertocottica validate solution 3.
Also, this means the data we need to publish on Zenodo can be downloaded from GR (no need to write an extraction script). Please @jason-vallet confirm (and indicate whether I need special privileges to download the data).

@albertocottica
Member Author

albertocottica commented Feb 3, 2018

It makes sense to me, but we should clear it with Marco, who is in charge of ethics. Also, we should still pseudonymize everyone, not just the consentless users.

Matthias tells me he will update the database of the consent funnel today. We have acquired consent from 6 more users.

@albertocottica
Member Author

Marco has asked for time. Meanwhile, Matt has updated the database. The data are now complete, at least on the Edgeryders database.
