Document automated checks under o!TR/Data Processing/Automated Checks #30

Open: wants to merge 18 commits into master

Changes from 8 commits

5 changes: 0 additions & 5 deletions docs/redirection-rules.xml
@@ -18,9 +18,4 @@
<description>Created after removal of "Setup" from osu! Tournament Rating</description>
<accepts>Setup.html</accepts>
</rule>
<rule id="4f4b2c10">
<description>
<![CDATA[Created after removal of "Related Tools & Services" from osu! Tournament Rating]]></description>
<accepts>DataWorkerService.html</accepts>
</rule>
</rules>
70 changes: 9 additions & 61 deletions docs/topics/Automated-Checks.md
@@ -1,58 +1,6 @@
# Automated Checks

The [](DataWorkerService.md) has numerous responsibilities, one of them being a data processing step known as automated
checks. These checks are responsible for processing various portions of data depending on the current processing step
for a particular piece of data.

## Core Principals

When designing this system, we did so with the following principles in mind:

1. Human reviewers have authority over whether an entity is `Verified` or `Rejected`. As such, the system will never automatically assign these designations.
2. The automatic application of the `PreRejected` status must be as accurate as possible, based on concrete rules.
3. The process must be as transparent as possible. As such, the system tracks all changes to entities in the `audit` tables. Additionally, all entities have a `RejectionReason` enum which defines a combination of reasons why it was marked as rejected by either the system or human reviewer.
4. Do not include entities which are not `Verified` in the tournament rating algorithm.
* This provides an added benefit of ensuring all generated statistics are valid. Even with manually submitted data, humans make mistakes. If unverified data is introduced into the rating & statistics systems, users will notice invalid statistics and the rating ladder itself will not be completely accurate.

## Entities

The following entities are part of this processing pipeline:

* `Tournaments`
* `Matches`
* `Games`
* `GameScores`

## Statuses

Each entity has `VerificationStatus`, `ProcessingStatus`, and `RejectionReason` fields. These fields are referenced and changed by the DataWorkerService as they move through the processing flow.

### `VerificationStatus`

Each entity shares the same `VerificationStatus` type. This type contains the following statuses:

* `None`: The entity has yet to be processed automatically.
* `PreRejected`: Based on the system's rules, this entity should be rejected.
* `PreVerified`: The system did not find anything wrong, awaiting human review.
* `Rejected`: A human marked this entity as rejected.
* `Verified`: A human marked this entity as verified.

### `ProcessingStatus`

Each entity has a unique `ProcessingStatus` type associated with it. This flag is self-explanatory: it indicates how far along an entity is in the processing pipeline.

For example, consider `TournamentProcessingStatus`:

1. `NeedsApproval`: The tournament is submitted but waiting approval from a verifier.
2. `NeedsMatchData`: Match data needs to be fetched via the osu! API.
3. `NeedsAutomationChecks`: The tournament, and all of its children, are awaiting automation checks.
4. `NeedsVerification`: Awaiting human review
5. `NeedsStatCalculation`: After human review, process statistics (must be complete before it is eligible for inclusion in the rating system).
6. `Done`: Processing is completed. `Verified` tournaments with this status are eligible for inclusion in the rating system.

### `RejectionReason`

Each entity has a custom `RejectionReason` type with various flags which may cause it to be marked as `PreRejected`. Flags can be combined with each other to form a set of reasons. For example, a `Game` could be marked as `PreRejected` by the system due to `NoScores` and `BeatmapNotPooled`.
The [](DataWorkerService.md) has numerous responsibilities, one of which is a data processing step known as automated checks. These checks identify, flag, and in some cases fix various discrepancies found in our data.

## Flow

@@ -63,14 +11,14 @@ flowchart LR;
GameScore --> Game --> Match --> Tournament
```

> This allows the parent entities to have the context of how their children faired during the automated checks process.
> This allows the parent entities to have the context of how their children fared during the automated checks process.
>
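A minimal sketch of this bottom-up order, assuming a hypothetical `Entity` tree and `check` callback (neither is the DataWorkerService's actual API). Each entity's children are checked before the entity itself (a post-order traversal), so a parent's check can read its children's results.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Entity:
    """Hypothetical stand-in for a Tournament, Match, Game, or GameScore."""
    kind: str
    children: list["Entity"] = field(default_factory=list)


def run_checks(entity: Entity, check: Callable[[Entity], None]) -> None:
    """Run `check` on every descendant first (post-order), then on `entity`.

    Because children are handled first, a parent's check can rely on its
    children's results, matching GameScore -> Game -> Match -> Tournament.
    """
    for child in entity.children:
        run_checks(child, check)
    check(entity)


# Usage: record the order in which entities are checked.
tournament = Entity("Tournament", [
    Entity("Match", [Entity("Game", [Entity("GameScore"), Entity("GameScore")])]),
])
order: list[str] = []
run_checks(tournament, lambda e: order.append(e.kind))
print(order)  # ['GameScore', 'GameScore', 'Game', 'Match', 'Tournament']
```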

### Tournament

```Mermaid
flowchart TD;
A[Is the count of PreVerified and/or Verified matches >= 0?]
A[Is the count of PreVerified and/or Verified matches > 0?]
B[Apply NoVerifiedMatches flag to RejectionReason]
C[Is this count >= 80% of the total match count?]
D[Apply NotEnoughVerifiedMatches flag to RejectionReason]
@@ -108,14 +56,14 @@ flowchart TD;
conversion to TeamVS?]
N[Attempt to convert a full set of Head to Head games to TeamVS]
O[Apply FailedTeamVsConversion flag to RejectionReason, repeat
for all child games]
for all games]
P[Convert all games to TeamVS, mark all games as PreVerified]

F[Is the count of games equal to 0?]
G[Apply NoGames flag to RejectionReason]
Q[What is the count of PreVerified and/or Verified games?]
Q1[0]
Q2[1 or 2]
Q2[1 to 3]
Q3[4 or 5]
Q4[&gt;5]
Q_A[Apply NoValidGames flag to RejectionReason]
@@ -180,7 +128,7 @@ flowchart TD;
O[Is the count of PreVerified and/or Verified scores 0?]
P[Apply NoValidScores flag to RejectionReason]
Q[Is the count of PreVerified and/or Verified scores
half that of the tournament's LobbySize?]
twice that of the tournament's LobbySize?]
R[Apply LobbySizeMismatch flag to RejectionReason]
S[Is the ScoringType ScoreV2?]
T[Apply InvalidScoringType flag to RejectionReason]
@@ -246,11 +194,11 @@ flowchart TD;
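The Tournament decisions and the score checks in the flowcharts above can be sketched as plain functions. This is a hypothetical illustration, not the DataWorkerService's code. The flowchart edges are not visible in this diff, so it assumes that answering "No" to a yes/no question (or "Yes" to a zero-count question) applies the flag listed beside it. It also assumes the score checks belong to `Game`, and it models only the checks shown here. The game-count outcomes for 1 to 3, 4 or 5, and more than 5 valid games are left out because they are not visible.

```python
from enum import Flag, auto


# Hypothetical flag types: member names mirror the flowchart labels,
# but branch directions and check order are assumptions.
class TournamentRejectionReason(Flag):
    NoVerifiedMatches = auto()
    NotEnoughVerifiedMatches = auto()


class GameRejectionReason(Flag):
    NoValidScores = auto()
    LobbySizeMismatch = auto()
    InvalidScoringType = auto()


def tournament_flags(valid_matches: int, total_matches: int) -> TournamentRejectionReason:
    """valid_matches = count of PreVerified and/or Verified matches."""
    if valid_matches <= 0:  # "Is the count ... > 0?" answered No
        return TournamentRejectionReason.NoVerifiedMatches
    # ">= 80% of the total match count", compared in integers to avoid float rounding
    if valid_matches * 5 < total_matches * 4:
        return TournamentRejectionReason.NotEnoughVerifiedMatches
    return TournamentRejectionReason(0)  # no flags


def game_score_flags(valid_scores: int, lobby_size: int, scoring_type: str) -> GameRejectionReason:
    """Only the score checks visible in the diff; flags accumulate."""
    flags = GameRejectionReason(0)
    if valid_scores == 0:
        flags |= GameRejectionReason.NoValidScores
    elif valid_scores != 2 * lobby_size:  # assumed: zero scores skips this comparison
        flags |= GameRejectionReason.LobbySizeMismatch
    if scoring_type != "ScoreV2":
        flags |= GameRejectionReason.InvalidScoringType
    return flags


print(tournament_flags(7, 10) == TournamentRejectionReason.NotEnoughVerifiedMatches)  # True
print(game_score_flags(8, 4, "ScoreV2") == GameRejectionReason(0))  # True
```

The 80% threshold is compared as `valid * 5 < total * 4` so that exactly 80% (for example 8 of 10 matches) passes without depending on floating-point division.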

### How can a human manually mark all entities as `Verified`?

Most of the issues which require manual intervention are at the `Match` and `Game` levels. For example, if a `Match` has too many invalid games, it will be marked as `PreRejected` and require manual intervention. The same is true for `Game`s.
Most of the manual review process occurs at the `Match` and `Game` levels. For example, if a `Match` has too many invalid games, it will be marked as `PreRejected` and require manual intervention. The same is true for `Game`s.

For `GameScore` entities, there are very concrete rules which can easily determine whether it should be `Rejected`, for example if the `Score` value is below the minimum.
For `GameScore` entities, concrete rules determine whether it should be `PreRejected`. For example, a `Score` value below the minimum very likely comes from a referee in the lobby or some other anomaly. A `GameScore`'s `PreRejected` status is rarely overturned manually.

We also have a web interface which allows reviewers to mark an entity - and all of its children - as `Verified` or `Rejected`. Generally speaking, if at a glance everything is marked as `PreVerified`, very little effort is required to manually approve these submissions. If the opposite is true, it's likely that the submission contains invalid data.
Additionally, a web interface allows reviewers to mark an entity, along with all of its children, as `Verified` or `Rejected`. Generally speaking, if everything is marked as `PreVerified` at a glance, approving a submission takes little effort because it can be approved in one click.
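Because a reviewer's decision applies to an entity and all of its children, the operation is naturally recursive. A minimal sketch, assuming a hypothetical in-memory tree (`Entity`, `mark_with_children`). The actual web interface and API may work quite differently, for example by updating database rows directly. The docs do not say whether children already reviewed individually are overwritten; this sketch overwrites them.

```python
from dataclasses import dataclass, field
from enum import Enum


class VerificationStatus(Enum):
    # Subset of the documented statuses used in this example.
    PreRejected = "PreRejected"
    PreVerified = "PreVerified"
    Rejected = "Rejected"
    Verified = "Verified"


@dataclass
class Entity:
    """Hypothetical Tournament/Match/Game/GameScore node."""
    name: str
    status: VerificationStatus = VerificationStatus.PreVerified
    children: list["Entity"] = field(default_factory=list)


def mark_with_children(entity: Entity, status: VerificationStatus) -> int:
    """Apply a reviewer's decision to `entity` and every descendant.

    Returns the number of entities updated.
    """
    if status not in (VerificationStatus.Verified, VerificationStatus.Rejected):
        raise ValueError("reviewers assign only Verified or Rejected")
    entity.status = status
    return 1 + sum(mark_with_children(child, status) for child in entity.children)


# Usage: one match containing one game with two scores.
game = Entity("game", children=[Entity("score 1"), Entity("score 2")])
tournament = Entity("tournament", children=[Entity("match", children=[game])])
print(mark_with_children(tournament, VerificationStatus.Verified))  # 5
```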

### In what cases should a human reviewer override a `PreRejected` status?

2 changes: 1 addition & 1 deletion docs/topics/Contributions.md
@@ -2,4 +2,4 @@

## Contents

<toc depth="10" />
<toc depth="1" />
58 changes: 56 additions & 2 deletions docs/topics/DataWorkerService.md
@@ -1,5 +1,59 @@
# DataWorkerService

These docs are currently under construction!
The [DataWorkerService](https://github.com/osu-tournament-rating/otr-api/tree/master/DataWorkerService) is a program that is part of
the [](o-TR-API.md) repository. It is a service that continuously polls the database for items that need processing, including:

<toc depth="10" />
* Fetching match data from the [osu! API](https://osu.ppy.sh/docs/index.html)
* Fetching historical player data from the [osu!track API](https://github.com/Ameobea/osutrack-api)
* Running [automated checks](Automated-Checks.md) against tournament data
* Performing stat calculations, such as match cost and placements (required by the [](o-TR-Processor.md)).
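One way to picture the polling described above is a pass over pending items that routes each one to a handler by its processing status, repeated on a timer. Everything here (`poll_once`, `run_forever`, the handler mapping, the 5-second interval) is a hypothetical sketch. The actual DataWorkerService is a .NET program and may schedule its work differently, and the osu!track fetching is not modelled.

```python
import time
from typing import Callable, Iterable

# Hypothetical sketch: a handler receives the id of the item to process.
Handler = Callable[[int], None]


def poll_once(fetch_pending: Callable[[], Iterable[tuple[int, str]]],
              handlers: dict[str, Handler]) -> int:
    """Run one polling pass and return how many items were dispatched.

    `fetch_pending` yields (tournament id, processing status) pairs.
    """
    dispatched = 0
    for item_id, status in fetch_pending():
        handler = handlers.get(status)
        if handler is None:
            # e.g. NeedsApproval / NeedsVerification wait on a human instead
            continue
        handler(item_id)
        dispatched += 1
    return dispatched


def run_forever(fetch_pending: Callable[[], Iterable[tuple[int, str]]],
                handlers: dict[str, Handler],
                interval_seconds: float = 5.0) -> None:
    """Poll continuously, sleeping between passes (the interval is arbitrary)."""
    while True:
        poll_once(fetch_pending, handlers)
        time.sleep(interval_seconds)


# Usage with in-memory stand-ins for the database and the real work.
log: list[tuple[str, int]] = []
handlers: dict[str, Handler] = {
    "NeedsMatchData": lambda i: log.append(("fetch match data", i)),
    "NeedsAutomationChecks": lambda i: log.append(("run automated checks", i)),
    "NeedsStatCalculation": lambda i: log.append(("calculate stats", i)),
}
pending = [(1, "NeedsMatchData"), (2, "NeedsVerification"), (3, "NeedsAutomationChecks")]
print(poll_once(lambda: pending, handlers))  # 2
```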

## Core Principles

This system is designed with the following principles in mind:

1. Human reviewers have authority over whether an entity is `Verified` or `Rejected`. As such, the system will never automatically assign these designations.
2. The automatic application of the `PreRejected` status must be as accurate as possible, based on concrete rules.
3. The process must be as transparent as possible. As such, the system tracks all changes to entities in the `audit` tables. Additionally, every entity has a `RejectionReason` enum that records the combination of reasons why it was marked as rejected by either the system or a human reviewer.
4. Do not include entities which are not `Verified` in the tournament rating algorithm.
* Even with manually submitted data, humans make mistakes. If unverified data is introduced, users may notice invalid stats or inaccurate rating calculations, and it may be easier for bad actors to influence the algorithm.

## Entities

The following entities are part of this processing pipeline:

* `Tournaments`
* `Matches`
* `Games`
* `GameScores`

## Statuses

Each entity has `VerificationStatus`, `ProcessingStatus`, and `RejectionReason` fields. The DataWorkerService reads and updates these fields as entities move through the processing flow.

### `VerificationStatus`

Each entity shares the same `VerificationStatus` type. This type contains the following statuses:

* `None`: The entity has yet to be processed automatically.
* `PreRejected`: Based on the system's rules, this entity should be rejected.
* `PreVerified`: The system found no issues; the entity is awaiting human review.
* `Rejected`: A human marked this entity as rejected.
* `Verified`: A human marked this entity as verified.
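The rule that only humans assign `Verified` or `Rejected` can be enforced by construction. A minimal sketch, assuming (1) the statuses are modelled as a Python enum, and (2) any non-empty set of `RejectionReason` flags, passed here as a plain integer bitmask, leads to `PreRejected`. The function name `automated_status` is hypothetical.

```python
from enum import Enum


class VerificationStatus(Enum):
    None_ = "None"  # trailing underscore because `None` is reserved in Python
    PreRejected = "PreRejected"
    PreVerified = "PreVerified"
    Rejected = "Rejected"
    Verified = "Verified"


# Statuses only a human reviewer may assign.
HUMAN_ONLY = frozenset({VerificationStatus.Rejected, VerificationStatus.Verified})


def automated_status(rejection_flags: int) -> VerificationStatus:
    """Hypothetical: status the automated checks assign for a RejectionReason bitmask.

    By construction this can only return PreRejected or PreVerified.
    """
    return VerificationStatus.PreRejected if rejection_flags else VerificationStatus.PreVerified


print(automated_status(0).name)       # PreVerified
print(automated_status(0b0011).name)  # PreRejected
```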

### `ProcessingStatus`

Each entity has its own `ProcessingStatus` type, which indicates how far along the entity is in the processing pipeline.

For example, consider `TournamentProcessingStatus`:

1. `NeedsApproval`: The tournament is submitted but awaiting approval from a verifier.
2. `NeedsMatchData`: Match data needs to be fetched via the osu! API.
3. `NeedsAutomationChecks`: The tournament and all of its children are awaiting automated checks.
4. `NeedsVerification`: The tournament has been checked and is awaiting human review.
5. `NeedsStatCalculation`: Human review is complete and statistics must be calculated. This step must finish before the tournament is eligible for inclusion in the rating system.
6. `Done`: Processing is completed. `Verified` tournaments with this status are eligible for inclusion in the rating system.
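Because the stages form an ordered sequence, an ordered enum captures both how far along a tournament is and the eligibility rule in the `Done` entry. This is a sketch under assumptions: the integer values follow the list's numbering rather than the project's actual definitions, `is_rating_eligible` is a hypothetical helper, and the verification status is passed as a plain string to keep the example self-contained.

```python
from enum import IntEnum


class TournamentProcessingStatus(IntEnum):
    # Numbered in the order listed in the docs; the real underlying values may differ.
    NeedsApproval = 1
    NeedsMatchData = 2
    NeedsAutomationChecks = 3
    NeedsVerification = 4
    NeedsStatCalculation = 5
    Done = 6


def is_rating_eligible(processing: TournamentProcessingStatus, verification: str) -> bool:
    """Hypothetical: only Verified tournaments that finished processing count toward ratings."""
    return processing is TournamentProcessingStatus.Done and verification == "Verified"


# Ordering lets callers ask "has this tournament reached stage X yet?"
status = TournamentProcessingStatus.NeedsVerification
print(status >= TournamentProcessingStatus.NeedsAutomationChecks)  # True
print(is_rating_eligible(status, "Verified"))                      # False
```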

### `RejectionReason`

Each entity has a custom `RejectionReason` type with various flags which may cause it to be marked as `PreRejected`. Flags can be combined with each other to form a set of reasons. For example, a `Game` could be marked as `PreRejected` by the system due to `NoScores` and `BeatmapNotPooled`.
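Combinable reasons behave like a bit-flag enum, similar in spirit to a .NET `[Flags]` enum. A minimal Python sketch, assuming a hypothetical `GameRejectionReason` type that contains only the two flags named in the example:

```python
from enum import Flag, auto


class GameRejectionReason(Flag):
    # Hypothetical: only the two flags named in the example are modelled.
    NoScores = auto()
    BeatmapNotPooled = auto()


reasons = GameRejectionReason.NoScores | GameRejectionReason.BeatmapNotPooled

print(GameRejectionReason.NoScores in reasons)          # True
print(GameRejectionReason.BeatmapNotPooled in reasons)  # True
print(reasons.value)                                    # 3 (bits 1 and 2 set)
print(bool(GameRejectionReason(0)))                     # False: no reasons
```

Each flag occupies its own bit, so any combination is stored as a single value, and membership checks reduce to a bitwise AND.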
2 changes: 1 addition & 1 deletion docs/topics/Development.md
@@ -2,4 +2,4 @@

## Contents

<toc depth="10" />
<toc depth="1" />
2 changes: 1 addition & 1 deletion docs/topics/Overview.md
@@ -42,7 +42,7 @@ In contrast to both ETX and SIP, o!TR intentionally does not take map difficulty

### Open-source code

o!TR is open source, compliant with osu! tournament [filtering rules](https://osu.ppy.sh/wiki/en/Tournaments/Official_support#registrant-filtering-and-seeding), and communicates all algorithm changes publicly. This means it can be used for [filtering](https://osu.ppy.sh/wiki/en/Tournaments/Official_support#registrant-filtering-and-seeding) in badged tournaments. We aim to have an open, compliant tool that meets the osu! community's high transparency standards.
o!TR is open source, compliant with osu! tournament [filtering rules](https://osu.ppy.sh/wiki/en/Tournaments/Official_support#registrant-filtering-and-seeding), and communicates all algorithm changes publicly. This means it can be used for [filtering and seeding](https://osu.ppy.sh/wiki/en/Tournaments/Official_support#registrant-filtering-and-seeding) in badged tournaments. We aim to have an open, compliant tool that meets the osu! community's high transparency standards.

### Support for other rulesets

2 changes: 1 addition & 1 deletion docs/topics/Related-Services.md
@@ -1,4 +1,4 @@

# Related Services

Start typing here...
<toc depth="1" />
7 changes: 4 additions & 3 deletions docs/topics/o-TR-API.md
@@ -1,10 +1,11 @@
# o!TR API

The [o!TR API](https://github.com/osu-tournament-rating/otr-api) is a [.NET](https://dotnet.microsoft.com/en-us/) project that is comprised of two applications:
The [o!TR API](https://github.com/osu-tournament-rating/otr-api) is a [.NET](https://dotnet.microsoft.com/en-us/) project that consists of multiple applications:

* o!TR API
* o!TR Data Processor
* [](DataWorkerService.md)
* OsuApiClient

## Contents

<toc depth="10" />
<toc depth="1" />
2 changes: 1 addition & 1 deletion docs/topics/o-TR-Database.md
@@ -2,4 +2,4 @@

## Contents

<toc depth="10" />
<toc depth="1" />
2 changes: 1 addition & 1 deletion docs/topics/o-TR-Docs.md
@@ -6,4 +6,4 @@ See the [Getting Started](Getting-Started.md) section for setup.

## Contents

<toc depth="10" />
<toc depth="1" />
8 changes: 4 additions & 4 deletions docs/topics/o-TR-Processor.md
@@ -1,14 +1,14 @@
# o!TR Processor

The [o!TR Processor](https://github.com/osu-tournament-rating/otr-processor) is a [Rust](https://www.rust-lang.org/) program that serves two purposes:
*This page is under construction!*

The [o!TR Processor](https://github.com/osu-tournament-rating/otr-processor) is a [Rust](https://www.rust-lang.org/) program that accomplishes the following:

1. Processes all match data from the [o!TR API](o-TR-API.md) for:
* Leaderboard recalculation
* Rating calculation
* Hosting the algorithm that determines how ratings change
2. (Coming soon) Host a private processing API to be used by the [o!TR API](o-TR-API.md):
* Allows internal clients to access model functions, such as match predictions.

## Contents

<toc depth="10" />
<toc depth="1" />
2 changes: 1 addition & 1 deletion docs/topics/o-TR-Web.md
@@ -6,4 +6,4 @@ The repository can be viewed [here](https://github.com/osu-tournament-rating/otr

## Contents

<toc depth="10" />
<toc depth="1" />