Document automated checks under o!TR/Data Processing/Automated Checks #30

Open: wants to merge 18 commits into master
Binary file added docs/images/co23-example.png
7 changes: 6 additions & 1 deletion docs/otr.tree
@@ -13,7 +13,9 @@
<toc-element topic="Initial-Ratings.md"/>
<toc-element topic="Match-Cost.md"/>
</toc-element>
<toc-element topic="Score-Modifications.md"/>
<toc-element topic="Score-Modifications.md">
<toc-element topic="Automated-Checks.md"/>
</toc-element>
<toc-element topic="Team.md"/>
<toc-element topic="Contact.md"/>
<toc-element topic="Contributions.md">
@@ -27,6 +29,9 @@
<toc-element topic="API-Configuration.md"/>
<toc-element topic="Code-Quality.md"/>
</toc-element>
<toc-element topic="Related-Services.md">
<toc-element topic="DataWorkerService.md"/>
</toc-element>
</toc-element>
<toc-element topic="o-TR-Database.md">
<toc-element topic="Database-Setup.md"/>
207 changes: 207 additions & 0 deletions docs/topics/Automated-Checks.md
@@ -0,0 +1,207 @@
# Automated Checks

The [](DataWorkerService.md) has numerous responsibilities, one of which is a data processing step known as automated checks. These checks identify, flag, and in some cases fix discrepancies found in our data.

## Flow

Automated checks are performed in the following order, starting with the most granular entity:

```Mermaid
flowchart LR;
GameScore --> Game --> Match --> Tournament
```

> This order allows each parent entity to take into account how its children fared during the automated checks process.
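Processing children before their parents amounts to a post-order traversal of the entity tree. The Python sketch below only illustrates that ordering (the DataWorkerService itself is a .NET program): the `Entity` class, `run_checks`, and the placeholder `all_children_pass` rule are hypothetical and far simpler than the real checks described in the flowcharts below.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Entity:
    """Hypothetical stand-in for a Tournament, Match, Game, or GameScore."""
    kind: str
    children: list["Entity"] = field(default_factory=list)
    status: str = "None"


def run_checks(entity: Entity, check: Callable[[Entity], str], order: list[str]) -> None:
    """Check every child before its parent (post-order traversal)."""
    for child in entity.children:
        run_checks(child, check, order)
    entity.status = check(entity)
    order.append(entity.kind)


def all_children_pass(entity: Entity) -> str:
    """Placeholder rule: a parent passes only if every child passed.

    This works because each child's status is already set when its parent is checked.
    """
    passed = all(child.status == "PreVerified" for child in entity.children)
    return "PreVerified" if passed else "PreRejected"


score = Entity("GameScore")
game = Entity("Game", [score])
match = Entity("Match", [game])
tournament = Entity("Tournament", [match])

order: list[str] = []
run_checks(tournament, all_children_pass, order)
print(order)  # ['GameScore', 'Game', 'Match', 'Tournament']
```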

### Tournament

```Mermaid
flowchart TD;
A[Is the count of PreVerified and/or Verified matches > 0?]
B[Apply NoVerifiedMatches flag to RejectionReason]
C[Is this count >= 80% of the total match count?]
D[Apply NotEnoughVerifiedMatches flag to RejectionReason]
PreTerm[Is the RejectionReason null?]
TermPositive[Change VerificationStatus to PreVerified]
TermNegative[Change VerificationStatus to PreRejected]

A -- No --> B --> PreTerm
A -- Yes --> C
C -- No --> D --> PreTerm
C -- Yes --> PreTerm
PreTerm -- Yes --> TermPositive
PreTerm -- No --> TermNegative
```
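As a rough illustration, the Tournament flowchart could be expressed as the Python sketch below. `check_tournament` is a hypothetical helper, `TournamentRejectionReason` is an assumed name, and its bit values are made up; only the flag names and the 80% threshold come from the flowchart.

```python
from enum import Enum, IntFlag


class VerificationStatus(Enum):
    # `None` is left out: it is a Python keyword and not needed here.
    PreRejected = "PreRejected"
    PreVerified = "PreVerified"
    Rejected = "Rejected"
    Verified = "Verified"


class TournamentRejectionReason(IntFlag):
    # Bit values are illustrative, not the real ones.
    NoVerifiedMatches = 1
    NotEnoughVerifiedMatches = 2


def check_tournament(
    match_statuses: list[VerificationStatus],
) -> tuple[VerificationStatus, TournamentRejectionReason]:
    """Apply the Tournament flowchart to the statuses of the tournament's matches."""
    valid = sum(
        1
        for s in match_statuses
        if s in (VerificationStatus.PreVerified, VerificationStatus.Verified)
    )
    reason = TournamentRejectionReason(0)
    if valid == 0:
        reason |= TournamentRejectionReason.NoVerifiedMatches
    # Fewer than 80% valid: integer form of valid / total < 0.8, avoiding float rounding
    elif valid * 5 < len(match_statuses) * 4:
        reason |= TournamentRejectionReason.NotEnoughVerifiedMatches

    # An empty RejectionReason means the tournament passes
    status = VerificationStatus.PreVerified if not reason else VerificationStatus.PreRejected
    return status, reason
```

With five matches, four valid ones meet the 80% bar exactly, while three do not.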

### Match

```Mermaid
flowchart TD;
A[Is the count of games > 2?]
B[Do any games besides the first 2 have a
RejectionReason of BeatmapNotPooled?]
C[Apply UnexpectedBeatmapsFound to WarningFlags]
D[Is the EndTime property equal to
2007-09-17-00:00:00?]
E[Apply NoEndTime flag to RejectionReason]
H[Is the match name structured in a typical
format?]
I[Apply UnexpectedNameFormat to WarningFlags]
J[Does the match name start with the tournament's
abbreviation?]
K[Apply NamePrefixMismatch flag to RejectionReason]
L[Is the tournament's lobby size equal to 1?]
M[Are the games structured in a way which supports
conversion to TeamVS?]
N[Attempt to convert a full set of Head to Head games to TeamVS]
O[Apply FailedTeamVsConversion flag to RejectionReason, repeat
for all games]
P[Convert all games to TeamVS, mark all games as PreVerified]

F[Is the count of games equal to 0?]
G[Apply NoGames flag to RejectionReason]
Q[What is the count of PreVerified and/or Verified games?]
Q1[0]
Q2[1 to 3]
Q3[4 or 5]
Q4[More than 5]
Q_A[Apply NoValidGames flag to RejectionReason]
Q_B[Apply UnexpectedGameCount flag to RejectionReason]
Q_C[Apply LowGameCount to WarningFlags]

PreTerm[Is the RejectionReason null?]
TermPositive[Change VerificationStatus to PreVerified]
TermNegative[Change VerificationStatus to PreRejected]

A -- Yes --> B
B -- Yes --> C --> D
A -- No --> D
B -- No --> D
D -- Yes --> E --> H
D -- No --> H

H -- No --> I --> J
H -- Yes --> J
J -- No --> K --> L
J -- Yes --> L
L -- No --> F
L -- Yes --> M
M -- Yes --> N
M -- No --> F
N -- Fail --> O --> F
N -- Success --> P --> F
F -- No --> Q
F -- Yes --> G --> PreTerm
Q --> Q1
Q --> Q2
Q --> Q3
Q --> Q4
Q1 --> Q_A --> PreTerm
Q2 --> Q_B --> PreTerm
Q3 --> Q_C --> PreTerm
Q4 --> PreTerm
PreTerm -- Yes --> TermPositive
PreTerm -- No --> TermNegative
```
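The game-count branch at the end of the Match flowchart is easy to misread, so here is a Python sketch of just that branch. `MatchRejectionReason` appears in the FAQ below, but its bit values here, the `MatchWarningFlags` name, and `check_game_counts` are hypothetical.

```python
from enum import IntFlag


class MatchRejectionReason(IntFlag):
    # Only the flags used by this branch; bit values are illustrative.
    NoGames = 1
    NoValidGames = 2
    UnexpectedGameCount = 4


class MatchWarningFlags(IntFlag):
    LowGameCount = 1


def check_game_counts(
    total_games: int, valid_games: int
) -> tuple[MatchRejectionReason, MatchWarningFlags]:
    """valid_games counts games that are PreVerified or Verified."""
    reason = MatchRejectionReason(0)
    warnings = MatchWarningFlags(0)
    if total_games == 0:
        reason |= MatchRejectionReason.NoGames
    elif valid_games == 0:
        reason |= MatchRejectionReason.NoValidGames
    elif valid_games <= 3:
        reason |= MatchRejectionReason.UnexpectedGameCount
    elif valid_games <= 5:
        # 4 or 5 valid games: worth a reviewer's attention, not a rejection
        warnings |= MatchWarningFlags.LowGameCount
    # More than 5 valid games: nothing to flag
    return reason, warnings
```

Only rejection flags decide the final status; per the flowchart, a warning such as `LowGameCount` is kept for reviewers but does not make the match `PreRejected`.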

### Game

```Mermaid
flowchart TD;
A[Is the beatmap null?]
B[Is there a known mappool for the tournament?]
C[Of all games in the tournament, is the beatmap
used exactly once?]
D[Apply BeatmapUsedOnce to WarningFlags]
E[Is the beatmap in the known mappool for the tournament?]
F[Apply BeatmapNotPooled flag to RejectionReason]
G[Is the EndTime property equal to
2007-09-17-00:00:00?]
H[Apply NoEndTime flag to RejectionReason]
I[Are invalid mods present at the game level?]
J[Apply InvalidMods flag to RejectionReason]
K[Does the ruleset match the tournament's ruleset?]
L[Apply RulesetMismatch flag to RejectionReason]
M[Is the count of scores 0?]
N[Apply NoScores flag to RejectionReason]
O[Is the count of PreVerified and/or Verified scores 0?]
P[Apply NoValidScores flag to RejectionReason]
Q[Is the count of PreVerified and/or Verified scores
twice that of the tournament's LobbySize?]
R[Apply LobbySizeMismatch flag to RejectionReason]
S[Is the ScoringType ScoreV2?]
T[Apply InvalidScoringType flag to RejectionReason]
U[Is the TeamType TeamVs?]
V[Apply InvalidTeamType flag to RejectionReason]
PreTerm[Is the RejectionReason null?]
TermPositive[Change VerificationStatus to PreVerified]
TermNegative[Change VerificationStatus to PreRejected]

A -- Yes --> G
A -- No --> B
B -- Yes --> C
B -- No --> G
C -- Yes --> D --> E
C -- No --> E
E -- Yes --> G
E -- No --> F --> G
G -- Yes --> H --> I
G -- No --> I
I -- Yes --> J --> K
I -- No --> K
K -- Yes --> M
K -- No --> L --> M
M -- Yes --> N --> S
M -- No --> O
O -- Yes --> P --> S
O -- No --> Q
Q -- Yes --> S
Q -- No --> R --> S
S -- No --> T --> U
S -- Yes --> U
U -- No --> V --> PreTerm
U -- Yes --> PreTerm
PreTerm -- Yes --> TermPositive
PreTerm -- No --> TermNegative
```
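The score-count checks in the Game flowchart short-circuit: once `NoScores` or `NoValidScores` applies, the lobby size comparison is skipped. The Python sketch below shows that chain; `GameRejectionReason`, its bit values, and `check_score_counts` are hypothetical names. The comparison against twice the `LobbySize` presumably reflects a TeamVs game with two full teams, where `LobbySize` is the number of players per team.

```python
from enum import IntFlag


class GameRejectionReason(IntFlag):
    # Only the flags used by this branch; bit values are illustrative.
    NoScores = 1
    NoValidScores = 2
    LobbySizeMismatch = 4


def check_score_counts(total_scores: int, valid_scores: int, lobby_size: int) -> GameRejectionReason:
    """valid_scores counts scores that are PreVerified or Verified."""
    if total_scores == 0:
        return GameRejectionReason.NoScores
    if valid_scores == 0:
        return GameRejectionReason.NoValidScores
    if valid_scores != 2 * lobby_size:
        return GameRejectionReason.LobbySizeMismatch
    return GameRejectionReason(0)
```

For example, a 4v4 game with nine scores, one of which was `PreRejected` (such as a referee's), still has 8 valid scores and passes this check.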

### GameScore

```Mermaid
flowchart TD;
A[Is the score value > 1,000?]
B[Apply ScoreBelowMinimum flag to RejectionReason]
C[Does the score contain invalid mods?]
D[Apply InvalidMods flag to RejectionReason]
E[Does the ruleset match the tournament's ruleset?]
F[Apply RulesetMismatch flag to RejectionReason]
PreTerm[Is the RejectionReason null?]
TermPositive[Change VerificationStatus to PreVerified]
TermNegative[Change VerificationStatus to PreRejected]

A -- No --> B --> C
A -- Yes --> C
C -- Yes --> D --> E
C -- No --> E
E -- Yes --> PreTerm
E -- No --> F --> PreTerm
PreTerm -- Yes --> TermPositive
PreTerm -- No --> TermNegative
```
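Every check in the GameScore flowchart runs regardless of the earlier results, so several flags can be set at once. A minimal Python sketch, assuming invalid mods have already been detected elsewhere (passed in as a boolean) and rulesets are plain integers; `ScoreRejectionReason`, its bit values, and `check_score` are hypothetical.

```python
from enum import IntFlag


class ScoreRejectionReason(IntFlag):
    # Bit values are illustrative, not the real ones.
    ScoreBelowMinimum = 1
    InvalidMods = 2
    RulesetMismatch = 4


MINIMUM_SCORE = 1_000  # the flowchart requires a score value above 1,000


def check_score(
    score: int, has_invalid_mods: bool, ruleset: int, tournament_ruleset: int
) -> tuple[str, ScoreRejectionReason]:
    reason = ScoreRejectionReason(0)
    if score <= MINIMUM_SCORE:
        reason |= ScoreRejectionReason.ScoreBelowMinimum
    if has_invalid_mods:
        reason |= ScoreRejectionReason.InvalidMods
    if ruleset != tournament_ruleset:
        reason |= ScoreRejectionReason.RulesetMismatch
    status = "PreVerified" if not reason else "PreRejected"
    return status, reason
```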

## FAQ

### How can a human manually mark all entities as `Verified`?

Most of the manual review process occurs at the `Match` and `Game` levels. For example, if a `Match` has too few valid games, it will be marked as `PreRejected` and require manual intervention. The same is true for `Game`s.

For `GameScore` entities, concrete rules determine whether a score should be `PreRejected`. For example, a `Score` value below the minimum most likely belongs to a referee in the lobby or reflects some other anomaly. It is rare for a `GameScore`'s `PreRejected` status to be manually overturned.

Additionally, a web interface allows reviewers to mark an entity, along with all of its children, as `Verified` or `Rejected`. Generally speaking, if everything is marked as `PreVerified` at a glance, approving the submission takes little effort because it can be done in one click.
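Marking an entity together with all of its children is a recursive walk down the hierarchy. The Python sketch below only illustrates the idea; the `Entity` class and `mark_with_children` function are hypothetical and not the web interface's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a Match, Game, or GameScore."""
    status: str
    children: list["Entity"] = field(default_factory=list)


def mark_with_children(entity: Entity, status: str) -> int:
    """Set `status` on the entity and every descendant; return how many were updated."""
    entity.status = status
    return 1 + sum(mark_with_children(child, status) for child in entity.children)


# A match with two games of two scores each, all PreVerified
match = Entity("PreVerified", [
    Entity("PreVerified", [Entity("PreVerified"), Entity("PreVerified")]),
    Entity("PreVerified", [Entity("PreVerified"), Entity("PreVerified")]),
])
updated = mark_with_children(match, "Verified")
print(updated)  # 7
```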

### In what cases should a human reviewer override a `PreRejected` status?

One example of where this should happen is [Corsace Open 2023](https://osu.ppy.sh/community/forums/topics/1794106?n=1). The system marked many of this tournament's matches as `PreRejected` because their names do not consistently use the same prefix. In this case, the human reviewer should override the system's `PreRejected` status, provided that `MatchRejectionReason.NamePrefixMismatch` is the only rejection reason present.

![Corsace Open 2023 example](../images/co23-example.png)
2 changes: 1 addition & 1 deletion docs/topics/Contributions.md
@@ -2,4 +2,4 @@

## Contents

<toc depth="10" />
<toc depth="1" />
59 changes: 59 additions & 0 deletions docs/topics/DataWorkerService.md
@@ -0,0 +1,59 @@
# DataWorkerService

The [DataWorkerService](https://github.com/osu-tournament-rating/otr-api/tree/master/DataWorkerService) is a program in the [](o-TR-API.md) repository. It runs as a service which continuously polls the database for items that need to be processed. Its responsibilities include:

* Fetching match data from the [osu! API](https://osu.ppy.sh/docs/index.html)
* Fetching historical player data from the [osu!track API](https://github.com/Ameobea/osutrack-api)
* Running [automated checks](Automated-Checks.md) against tournament data
* Performing stat calculations, such as match cost and placements (required by the [](o-TR-Processor.md))

## Core Principles

This system is designed with the following principles in mind:

1. Human reviewers have authority over whether an entity is `Verified` or `Rejected`. As such, the system will never automatically assign these designations.
2. The automatic application of the `PreRejected` status must be as accurate as possible, based on concrete rules.
3. The process must be as transparent as possible. As such, the system tracks all changes to entities in the `audit` tables. Additionally, every entity has a `RejectionReason` flags enum recording the combination of reasons why it was rejected, whether by the system or by a human reviewer.
4. Only `Verified` entities are included in the tournament rating algorithm.
* Even with manually submitted data, humans make mistakes. If unverified data is introduced, users may notice invalid stats or inaccurate rating calculations, and it may be easier for bad actors to influence the algorithm.
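Principles 1 and 4 can be expressed as two small guards. In this Python sketch, `apply_status` and `rating_eligible` are hypothetical helpers, and the `NotProcessed` member stands in for the `None` status because `None` is a Python keyword.

```python
from enum import Enum


class VerificationStatus(Enum):
    NotProcessed = "None"  # `None` is a Python keyword, so the member is renamed
    PreRejected = "PreRejected"
    PreVerified = "PreVerified"
    Rejected = "Rejected"
    Verified = "Verified"


# Principle 1: only humans may assign these statuses.
HUMAN_ONLY = {VerificationStatus.Rejected, VerificationStatus.Verified}


def apply_status(new_status: VerificationStatus, *, by_human: bool) -> VerificationStatus:
    if new_status in HUMAN_ONLY and not by_human:
        raise PermissionError(f"only a human reviewer may assign {new_status.value}")
    return new_status


def rating_eligible(entities: list[tuple[int, VerificationStatus]]) -> list[int]:
    """Principle 4: return the ids of Verified entities; PreVerified is not enough."""
    return [entity_id for entity_id, status in entities if status is VerificationStatus.Verified]
```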

## Entities

The following entities are part of this processing pipeline:

* `Tournaments`
* `Matches`
* `Games`
* `GameScores`

## Statuses

Each entity has `VerificationStatus`, `ProcessingStatus`, and `RejectionReason` fields. The DataWorkerService reads and updates these fields as each entity moves through the processing flow.

### `VerificationStatus`

Each entity shares the same `VerificationStatus` type. This type contains the following statuses:

* `None`: The entity has yet to be processed automatically.
* `PreRejected`: Based on the system's rules, this entity should be rejected.
* `PreVerified`: The system did not find anything wrong, awaiting human review.
* `Rejected`: A human marked this entity as rejected.
* `Verified`: A human marked this entity as verified.

### `ProcessingStatus`

Each entity type has its own `ProcessingStatus` type, which indicates how far along an entity is in the processing pipeline.

For example, consider `TournamentProcessingStatus`:

1. `NeedsApproval`: The tournament is submitted but awaiting approval from a verifier.
2. `NeedsMatchData`: Match data needs to be fetched via the osu! API.
3. `NeedsAutomationChecks`: The tournament and all of its children are awaiting automated checks.
4. `NeedsVerification`: The tournament has been checked and is awaiting human review.
5. `NeedsStatCalculation`: Human review is complete and statistics need to be calculated. This must finish before the tournament is eligible for inclusion in the rating system.
6. `Done`: Processing is completed. `Verified` tournaments with this status are eligible for inclusion in the rating system.
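Because the statuses form an ordered sequence, they map naturally onto an ordered enum. The Python sketch below follows the order of the list above, but the numeric values and the `next_status` and `in_rating_system` helpers are hypothetical. In practice, a tournament only leaves `NeedsApproval` and `NeedsVerification` after a human acts; the sketch does not model that.

```python
from enum import IntEnum


class TournamentProcessingStatus(IntEnum):
    # Order follows the documented pipeline; numeric values are illustrative.
    NeedsApproval = 0
    NeedsMatchData = 1
    NeedsAutomationChecks = 2
    NeedsVerification = 3
    NeedsStatCalculation = 4
    Done = 5


def next_status(status: TournamentProcessingStatus) -> TournamentProcessingStatus:
    """Advance one stage; Done is terminal."""
    if status is TournamentProcessingStatus.Done:
        return status
    return TournamentProcessingStatus(status + 1)


def in_rating_system(status: TournamentProcessingStatus, verified: bool) -> bool:
    """Only Verified tournaments that have reached Done are eligible."""
    return status is TournamentProcessingStatus.Done and verified
```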

### `RejectionReason`

Each entity type has its own `RejectionReason` type containing flags that can cause an entity to be marked as `PreRejected`. Flags can be combined with each other to form a set of reasons. For example, a `Game` could be marked as `PreRejected` by the system due to both `NoScores` and `BeatmapNotPooled`.
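Flag sets like this are typically stored as bits in a single integer. The Python sketch below uses `enum.IntFlag` to reproduce the `NoScores` and `BeatmapNotPooled` example; the `GameRejectionReason` name and its bit values are assumptions for illustration, not the API's definitions.

```python
from enum import IntFlag


class GameRejectionReason(IntFlag):
    # A few of the Game flags from the automated checks; bit values are illustrative.
    NoScores = 1
    BeatmapNotPooled = 2
    InvalidMods = 4


reason = GameRejectionReason(0)
reason |= GameRejectionReason.NoScores
reason |= GameRejectionReason.BeatmapNotPooled

# Any flag at all makes the entity PreRejected
status = "PreRejected" if reason else "PreVerified"
set_flags = [flag.name for flag in GameRejectionReason if flag in reason]
print(status, set_flags)  # PreRejected ['NoScores', 'BeatmapNotPooled']
```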
2 changes: 1 addition & 1 deletion docs/topics/Development.md
@@ -2,4 +2,4 @@

## Contents

<toc depth="10" />
<toc depth="1" />
2 changes: 1 addition & 1 deletion docs/topics/Overview.md
@@ -42,7 +42,7 @@ In contrast to both ETX and SIP, o!TR intentionally does not take map difficulty

### Open-source code

o!TR is open source, compliant with osu! tournament [filtering rules](https://osu.ppy.sh/wiki/en/Tournaments/Official_support#registrant-filtering-and-seeding), and communicates all algorithm changes publicly. This means it can be used for [filtering](https://osu.ppy.sh/wiki/en/Tournaments/Official_support#registrant-filtering-and-seeding) in badged tournaments. We aim to have an open, compliant tool that meets the osu! community's high transparency standards.
o!TR is open source, compliant with osu! tournament [filtering rules](https://osu.ppy.sh/wiki/en/Tournaments/Official_support#registrant-filtering-and-seeding), and communicates all algorithm changes publicly. This means it can be used for [filtering and seeding](https://osu.ppy.sh/wiki/en/Tournaments/Official_support#registrant-filtering-and-seeding) in badged tournaments. We aim to have an open, compliant tool that meets the osu! community's high transparency standards.

### Support for other rulesets

4 changes: 4 additions & 0 deletions docs/topics/Related-Services.md
@@ -0,0 +1,4 @@

# Related Services

<toc depth="1" />
6 changes: 1 addition & 5 deletions docs/topics/Score-Modifications.md
@@ -21,8 +21,4 @@ This data includes:

The o!TR team manipulates raw match data in the following ways:

* Multiplies all score values that have the [`EZ` modifier](https://osu.ppy.sh/wiki/en/Gameplay/Game_modifier/Easy) by **1.75x**.

## Removal of user information

At this time, the o!TR team does not provide a mechanism for removing a user's information. Users who close or [delete their osu! account](https://osu.ppy.sh/wiki/en/Help_centre/Account#account-deletion) may have their data automatically removed or anonymized from our systems without any formal request to us.
* Multiplies all score values that have the [`EZ` modifier](https://osu.ppy.sh/wiki/en/Gameplay/Game_modifier/Easy) by **1.75x**.
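As an illustration, the `EZ` adjustment could look like the following Python sketch. It assumes the osu! stable mods bitmask (where Easy is `2`) and truncates the result to an integer; `adjust_score` is a hypothetical helper, and the o!TR code may represent mods and round differently.

```python
EASY = 1 << 1  # Easy bit in the osu! stable mods bitmask (value 2)
HIDDEN = 1 << 3  # Hidden bit (value 8), used only in the usage example
EZ_MULTIPLIER = 1.75


def adjust_score(score: int, mods: int) -> int:
    """Scale scores set with EZ by 1.75x; leave all others unchanged."""
    if mods & EASY:
        return int(score * EZ_MULTIPLIER)
    return score


print(adjust_score(400_000, EASY | HIDDEN))  # 700000
print(adjust_score(400_000, HIDDEN))  # 400000
```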
7 changes: 4 additions & 3 deletions docs/topics/o-TR-API.md
@@ -1,10 +1,11 @@
# o!TR API

The [o!TR API](https://github.com/osu-tournament-rating/otr-api) is a [.NET](https://dotnet.microsoft.com/en-us/) project that is comprised of two applications:
The [o!TR API](https://github.com/osu-tournament-rating/otr-api) is a [.NET](https://dotnet.microsoft.com/en-us/) project that is comprised of multiple applications:

* o!TR API
* o!TR Data Processor
* [](DataWorkerService.md)
* OsuApiClient

## Contents

<toc depth="10" />
<toc depth="1" />
2 changes: 1 addition & 1 deletion docs/topics/o-TR-Database.md
@@ -2,4 +2,4 @@

## Contents

<toc depth="10" />
<toc depth="1" />
2 changes: 1 addition & 1 deletion docs/topics/o-TR-Docs.md
@@ -6,4 +6,4 @@ See the [Getting Started](Getting-Started.md) section for setup.

## Contents

<toc depth="10" />
<toc depth="1" />
8 changes: 4 additions & 4 deletions docs/topics/o-TR-Processor.md
@@ -1,14 +1,14 @@
# o!TR Processor

The [o!TR Processor](https://github.com/osu-tournament-rating/otr-processor) is a [Rust](https://www.rust-lang.org/) program that serves two purposes:
*This page is under construction!*

The [o!TR Processor](https://github.com/osu-tournament-rating/otr-processor) is a [Rust](https://www.rust-lang.org/) program that accomplishes the following:

1. Processes all match data from the [o!TR API](o-TR-API.md) for:
* Leaderboard recalculation
* Rating calculation
* Hosting the algorithm that determines rating changes
2. (Coming soon) Host a private processing API to be used by the [o!TR API](o-TR-API.md):
* Allows internal clients to have access to model functions, such as match predictions.

## Contents

<toc depth="10" />
<toc depth="1" />
2 changes: 1 addition & 1 deletion docs/topics/o-TR-Web.md
@@ -6,4 +6,4 @@ The repository can be viewed [here](https://github.com/osu-tournament-rating/otr

## Contents

<toc depth="10" />
<toc depth="1" />