Merge pull request #20 from atrifat/feat-topic-classification

Feat Topic Classification

atrifat authored Jul 30, 2024
2 parents a2c625b + d51781e commit b83e040
Showing 5 changed files with 127 additions and 23 deletions.
28 changes: 25 additions & 3 deletions .env.example
@@ -4,24 +4,36 @@ NODE_ENV=production
ENABLE_NSFW_CLASSIFICATION=true
NSFW_DETECTOR_ENDPOINT=http://localhost:8082/predict
NSFW_DETECTOR_TOKEN=

ENABLE_LANGUAGE_DETECTION=true
LANGUAGE_DETECTOR_ENDPOINT=http://localhost:5000/detect
LANGUAGE_DETECTOR_TOKEN=
LANGUAGE_DETECTOR_TRUNCATE_LENGTH=350

ENABLE_HATE_SPEECH_DETECTION=true
# (Required if ENABLE_HATE_SPEECH_DETECTION == true) set this to your own hate-speech-detector-api instance (https://github.com/atrifat/hate-speech-detector-api)
HATE_SPEECH_DETECTOR_ENDPOINT=http://localhost:8083/predict
# (Optional) set this to your own hate-speech-detector-api api_key if required
HATE_SPEECH_DETECTOR_TOKEN=
# (Default: 350) Set to 0 to disable truncation, or to any positive number to truncate the text to that many characters
HATE_SPEECH_DETECTOR_TRUNCATE_LENGTH=350

ENABLE_SENTIMENT_ANALYSIS=true
# (Required if ENABLE_SENTIMENT_ANALYSIS == true) set this to your own sentiment-analysis-api instance (https://github.com/atrifat/sentiment-analysis-api)
SENTIMENT_ANALYSIS_ENDPOINT=http://localhost:8084/predict
# (Optional) set this to your own sentiment-analysis-api api_key if required
SENTIMENT_ANALYSIS_TOKEN=
# (Default: 350) Set to 0 to disable truncation, or to any positive number to truncate the text to that many characters
SENTIMENT_ANALYSIS_TRUNCATE_LENGTH=350

ENABLE_TOPIC_CLASSIFICATION=true
# (Required if ENABLE_TOPIC_CLASSIFICATION == true) set this to your own topic-classification-api instance (https://github.com/atrifat/topic-classification-api)
TOPIC_CLASSIFICATION_ENDPOINT=http://localhost:8085/predict
# (Optional) set this to your own topic-classification-api api_key if required
TOPIC_CLASSIFICATION_TOKEN=
# (Default: 350) Set to 0 to disable truncation, or to any positive number to truncate the text to that many characters
TOPIC_CLASSIFICATION_TRUNCATE_LENGTH=350

# (Required for classification filtering)
NOSTR_MONITORING_BOT_PRIVATE_KEY=
RELAYS_SOURCE=wss://relay.nostr.band,wss://relay.damus.io,wss://nos.lol,wss://relay.mostr.pub
@@ -40,23 +52,33 @@ WHITELISTED_PUBKEYS=
LISTEN_PORT=7860
# (Optional) Set true to enable forwarding of request headers to upstream server, useful if relays behind reverse proxy
ENABLE_FORWARD_REQ_HEADERS=false

# (Optional. Default: sfw. Options: all, sfw, partialsfw, and nsfw) Filter content based on NSFW (SFW/NSFW) classification.
DEFAULT_FILTER_CONTENT_MODE=sfw
# (Optional. Default: 75, Options: 0-100) Default minimum probability/confidence score to determine the classification of nsfw content
DEFAULT_FILTER_NSFW_CONFIDENCE=75

# (Optional. Default: all. Multiple Options: all, or other language code)
DEFAULT_FILTER_LANGUAGE_MODE=all
# (Optional. Default: 15. Options: 0-100) Default minimum probability/confidence score to determine the classification of language
DEFAULT_FILTER_LANGUAGE_CONFIDENCE=15

# (Optional. Default: no. Options: all, no, yes) Filter hate speech (toxic comment). "all" will disable filtering, "no" will filter out any detected hate speech content, "yes" will select only detected hate speech content
DEFAULT_FILTER_HATE_SPEECH_TOXIC_MODE=no
# (Optional. Default: 75. Options: 0-100) Default minimum probability/confidence score to determine the classification of hate speech (toxic comment)
DEFAULT_FILTER_HATE_SPEECH_TOXIC_CONFIDENCE=75
# (Optional. Default: max. Options: max, sum) Method used to determine toxic content: the max value of all toxic class scores, or the sum of all toxic class scores
DEFAULT_FILTER_HATE_SPEECH_TOXIC_EVALUATION_MODE=max

# (Optional. Default: all, Multiple Options: all,negative,neutral,positive) Multiple options separated by comma (eg: neutral,positive => filter to get both neutral and positive sentiment)
DEFAULT_FILTER_SENTIMENT_MODE=all
# (Optional. Default: 35, Options: 0-100) Default minimum probability/confidence score in percentage to determine the classification of sentiment
DEFAULT_FILTER_SENTIMENT_CONFIDENCE=35

# (Default: all, Multiple Options: list of valid topics in the atrifat/nostr-filter-relay GitHub repository) Multiple options separated by comma (eg: life,music,sport,science_and_technology => filter to get life (short version of: diaries_and_life), music, sport, and science_and_technology)
DEFAULT_FILTER_TOPIC_MODE=all
# (Default: 35, Options: 0-100) Default minimum probability/confidence score in percentage to determine the classification of topic
DEFAULT_FILTER_TOPIC_CONFIDENCE=35

# (Optional. Default: all. Options: all, nostr, activitypub) Filter user type. "nostr" for native nostr users and "activitypub" for activitypub users coming from bridge
DEFAULT_FILTER_USER_MODE=all
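
Since the new `TOPIC_CLASSIFICATION_*` variables point at a separate service, it can be useful to smoke-test that endpoint before starting the relay. A minimal sketch, assuming the topic-classification-api accepts a JSON POST with a `q` field on its `/predict` route and an optional bearer token (the request and response shape here are assumptions — check the [atrifat/topic-classification-api](https://github.com/atrifat/topic-classification-api) repository for the actual contract):

```
# Hypothetical smoke test for the topic classification endpoint configured above.
# Payload shape is assumed; consult atrifat/topic-classification-api for the real API.
TOPIC_CLASSIFICATION_ENDPOINT="http://localhost:8085/predict"
TOPIC_CLASSIFICATION_TOKEN=""

curl -s -X POST "$TOPIC_CLASSIFICATION_ENDPOINT" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOPIC_CLASSIFICATION_TOKEN" \
  -d '{"q": "The championship final goes to extra time tonight"}'
# drop the Authorization header if your instance does not require a token
```

A note scored for a topic at or above `DEFAULT_FILTER_TOPIC_CONFIDENCE` (for example, a hypothetical score of 42 for `sport` against the default threshold of 35) would pass a `DEFAULT_FILTER_TOPIC_MODE=sport` filter.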
15 changes: 12 additions & 3 deletions Dockerfile
@@ -1,4 +1,4 @@
ARG DENO_VERSION=1.45.0
ARG DENO_VERSION=1.45.3
FROM debian:bookworm AS builder_strfry

WORKDIR /builder
@@ -27,7 +27,7 @@ RUN apt update -y && \
rm -rf /var/lib/apt/lists/*

# Prepare nostr-filter
ENV NOSTR_FILTER_COMMIT_HASH_VERSION=c7f955e491fa268d2abf9feb4a31f989ef76438b
ENV NOSTR_FILTER_COMMIT_HASH_VERSION=4d719299b88203754b952809e4f937e6fd66fc34
ENV NOSTR_FILTER_BRANCH=main
RUN git clone --branch $NOSTR_FILTER_BRANCH https://github.com/atrifat/nostr-filter && \
cd /builder/nostr-filter && \
@@ -36,7 +36,7 @@ RUN git clone --branch $NOSTR_FILTER_BRANCH https://github.com/atrifat/nostr-fil
npm ci --omit=dev && npx tsc

# Prepare nostr-monitoring-tool
ENV NOSTR_MONITORING_TOOL_VERSION=v0.5.0
ENV NOSTR_MONITORING_TOOL_VERSION=v0.6.0
RUN git clone --depth 1 --branch $NOSTR_MONITORING_TOOL_VERSION https://github.com/atrifat/nostr-monitoring-tool && \
cd /builder/nostr-monitoring-tool && \
npm ci --omit=dev
@@ -102,6 +102,10 @@ ENV DEFAULT_FILTER_HATE_SPEECH_TOXIC_EVALUATION_MODE=max
ENV DEFAULT_FILTER_SENTIMENT_MODE=all
# (Default: 35, Options: 0-100) Default minimum probability/confidence score in percentage to determine the classification of sentiment
ENV DEFAULT_FILTER_SENTIMENT_CONFIDENCE=35
# (Default: all, Multiple Options: list of valid topics in the atrifat/nostr-filter-relay GitHub repository) Multiple options separated by comma (eg: life,music,sport,science_and_technology => filter to get life (short version of: diaries_and_life), music, sport, and science_and_technology)
ENV DEFAULT_FILTER_TOPIC_MODE=all
# (Default: 35, Options: 0-100) Default minimum probability/confidence score in percentage to determine the classification of topic
ENV DEFAULT_FILTER_TOPIC_CONFIDENCE=35
# (Default: all, Options: all, nostr, activitypub) Filter user type. "nostr" for native nostr users and "activitypub" for activitypub users coming from bridge
ENV DEFAULT_FILTER_USER_MODE=all

@@ -125,6 +129,11 @@ ENV SENTIMENT_ANALYSIS_ENDPOINT=
ENV SENTIMENT_ANALYSIS_TOKEN=
ENV SENTIMENT_ANALYSIS_TRUNCATE_LENGTH=350

ENV ENABLE_TOPIC_CLASSIFICATION=true
ENV TOPIC_CLASSIFICATION_ENDPOINT=
ENV TOPIC_CLASSIFICATION_TOKEN=
ENV TOPIC_CLASSIFICATION_TRUNCATE_LENGTH=350

ENV NOSTR_MONITORING_BOT_PRIVATE_KEY=
ENV RELAYS_SOURCE=
ENV RELAYS_TO_PUBLISH=ws://127.0.0.1:7777
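The ENV defaults baked into the image above can also be overridden per container at start time, so the Dockerfile does not need to be edited to try different settings. A sketch using standard `docker run -e` flags (the topic values below are illustrative):

```
docker run --init -p 7860:7860 -d --name nostr-filter-relay \
  -e ENABLE_TOPIC_CLASSIFICATION=true \
  -e TOPIC_CLASSIFICATION_ENDPOINT=http://localhost:8085/predict \
  -e DEFAULT_FILTER_TOPIC_MODE=life,music,sport,science_and_technology \
  -e DEFAULT_FILTER_TOPIC_CONFIDENCE=35 \
  nostr-filter-relay
```
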
29 changes: 16 additions & 13 deletions README.md
@@ -1,18 +1,18 @@
# nostr-filter-relay

A nostr relay docker image package which filters content based on content type (SFW/NSFW), user type, language, hate speech (toxic comment), sentiment, and various rules.
A [Nostr](https://github.com/nostr-protocol/nostr) relay docker image package which filters content based on content type (SFW/NSFW), user type, language, hate speech (toxic comment), sentiment, topic, and various rules.

This docker image consists of several software packages:

- [atrifat/nostr-filter](https://github.com/atrifat/nostr-filter) (customized/fork of [imksoo/nostr-filter](https://github.com/imksoo/nostr-filter)) as frontend filter relay
- [atrifat/nostr-monitoring-tool](https://github.com/atrifat/nostr-monitoring-tool) as content classification tool
- [hoytech/strfry](https://github.com/hoytech/strfry) as backend relay

Several dependencies, including [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api), [atrifat/nsfw-detector-api](https://github.com/atrifat/nsfw-detector-api), [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api), and [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api), are required depending on which features are enabled.
Several dependencies, including [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api), [atrifat/nsfw-detector-api](https://github.com/atrifat/nsfw-detector-api), [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api), [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api), and [atrifat/topic-classification-api](https://github.com/atrifat/topic-classification-api), are required depending on which features are enabled.

## Demo

A public demo (beta/test) instance is available on [wss://nfrelay.app](wss://nfrelay.app) or [ws://nfrelay6saohkmipikquvrn6d64dzxivhmcdcj4d5i7wxis47xwsriyd.onion](ws://nfrelay6saohkmipikquvrn6d64dzxivhmcdcj4d5i7wxis47xwsriyd.onion) (TOR Onion Hidden Service).
A public demo (beta/test) relay is available on [wss://nfrelay.app](wss://nfrelay.app) or [ws://nfrelay6saohkmipikquvrn6d64dzxivhmcdcj4d5i7wxis47xwsriyd.onion](ws://nfrelay6saohkmipikquvrn6d64dzxivhmcdcj4d5i7wxis47xwsriyd.onion) (TOR Onion Hidden Service).

## Usage

@@ -27,15 +27,15 @@ A relay software package that filter note (kind: 1) contents in various category
- [x] User type filtering (Nostr user/non bridged user, activitypub bridged user)
- [x] Hate speech (Toxic comment) detection
- [x] Sentiment analysis
- [ ] (WIP) Topic classification
- [x] Topic classification
- [x] All other features included in [atrifat/nostr-filter](https://github.com/atrifat/nostr-filter) and [atrifat/nostr-monitoring-tool](https://github.com/atrifat/nostr-monitoring-tool)

## How it works

![nostr-filter-relay-flowchart](resources/flowchart-nostr-filter-relay.png)

1. **nostr-filter-relay** is a docker image that runs several programs via a launch script at startup: [atrifat/nostr-monitoring-tool](https://github.com/atrifat/nostr-monitoring-tool), [atrifat/nostr-filter](https://github.com/atrifat/nostr-filter), and the [hoytech/strfry](https://github.com/hoytech/strfry) relay.
2. **nostr-monitoring-tool** is a classification tool that subscribes to and fetches notes (kind: 1) from various relays. It processes every note it sees (image URL extraction, text preprocessing) and sends the processed content to external AI classification services. Currently, it sends the content to an NSFW Detector API instance (using [atrifat/nsfw-detector-api](https://github.com/atrifat/nsfw-detector-api)), a Language Detector API instance (using [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api) or [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate)), a Hate Speech Detector API instance (using [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api)), and a Sentiment Analysis API instance (using [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api)). All four API services return classification results (SFW/NSFW classification, language classification, toxicity classification, sentiment analysis) that are saved as **custom kind 9978** events in the local strfry relay that is already running. The data format is documented in the [atrifat/nostr-monitoring-tool](https://github.com/atrifat/nostr-monitoring-tool) repository.
2. **nostr-monitoring-tool** is a classification tool that subscribes to and fetches notes (kind: 1) from various relays. It processes every note it sees (image URL extraction, text preprocessing) and sends the processed content to external AI classification services. Currently, it sends the content to an NSFW Detector API instance (using [atrifat/nsfw-detector-api](https://github.com/atrifat/nsfw-detector-api)), a Language Detector API instance (using [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api) or [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate)), a Hate Speech Detector API instance (using [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api)), a Sentiment Analysis API instance (using [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api)), and a Topic Classification API instance (using [atrifat/topic-classification-api](https://github.com/atrifat/topic-classification-api)). All five API services return classification results (SFW/NSFW classification, language classification, toxicity classification, sentiment analysis, topic classification) that are saved as **custom kind 9978** events in the local strfry relay that is already running. The data format is documented in the [atrifat/nostr-monitoring-tool](https://github.com/atrifat/nostr-monitoring-tool) repository.

Basic Data flow:
**Source Relays (notes) -> nostr-monitoring-tool (connect to external API for classification) -> local strfry**
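
To see the filtered output end to end, a client only needs to open a websocket to the frontend filter relay and send a standard NIP-01 subscription. A minimal sketch (assumes the `websocat` CLI is installed; the demo relay address is the one listed in the Demo section above):

```
# Request a few filtered kind-1 notes from the demo relay
echo '["REQ","demo-sub",{"kinds":[1],"limit":5}]' | websocat wss://nfrelay.app
```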
@@ -56,6 +56,7 @@ The following softwares are required if you want to run your own nostr-filter-re
- Personal instance of [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api) or [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate). Check [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api) or [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate) Github repository for more instructions.
- Personal instance of [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api). Check [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api) Github repository for more instructions.
- Personal instance of [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api). Check [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api) Github repository for more instructions.
- Personal instance of [atrifat/topic-classification-api](https://github.com/atrifat/topic-classification-api). Check [atrifat/topic-classification-api](https://github.com/atrifat/topic-classification-api) Github repository for more instructions.
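
Before pointing the relay at these services, a quick reachability check of each configured endpoint can save debugging time later. A rough sketch (the endpoints mirror the defaults in `.env.example`; the exact routes and payloads each API expects are assumptions — see the individual repositories):

```
for endpoint in \
  http://localhost:8082/predict \
  http://localhost:5000/detect \
  http://localhost:8083/predict \
  http://localhost:8084/predict \
  http://localhost:8085/predict; do
  code=$(curl -s -o /dev/null -w "%{http_code}" -X POST "$endpoint" \
    -H "Content-Type: application/json" -d '{"q":"ping"}')
  echo "$endpoint -> HTTP $code"
done
```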

## Getting Started

@@ -66,7 +67,7 @@ git clone https://github.com/atrifat/nostr-filter-relay
cd nostr-filter-relay
```

Before running nostr-filter-relay, make sure you have already configured your own personal instance of [atrifat/nsfw-detector-api](https://github.com/atrifat/nsfw-detector-api), [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api) or [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate), [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api), and [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api). You don't have to run all of them if you only enable certain classification tasks (example: NSFW detection only).
Before running nostr-filter-relay, make sure you have already configured your own personal instance of [atrifat/nsfw-detector-api](https://github.com/atrifat/nsfw-detector-api), [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api) or [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate), [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api), [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api), and [atrifat/topic-classification-api](https://github.com/atrifat/topic-classification-api). You don't have to run all of them if you only enable certain classification tasks (example: NSFW detection only).

Copy `.env.example` to `.env` and adjust the configuration to your own settings.
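
For example:

```
cp .env.example .env
# edit .env: at minimum set NOSTR_MONITORING_BOT_PRIVATE_KEY and the *_ENDPOINT
# values for every classification feature left enabled
```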

@@ -106,6 +107,15 @@ or run it in the background (daemon):
docker run --init --env-file .env -p 7860:7860 -it --name nostr-filter-relay -d nostr-filter-relay
```

## Support

Development of nostr-filter-relay has been supported by [OpenSats - Fifth Nostr Grant - July 2024](https://opensats.org/blog/nostr-grants-july-2024).

You can also support this project by:

- ⭐ Starring the repo, reporting issues, or sending pull requests.
- ⚡️ Sending some sats or buying me a tea via my lightning address: [rifat@getalby.com](lightning:rifat@getalby.com)

## License

MIT License
@@ -133,10 +143,3 @@ SOFTWARE.
## Author

Rif'at Ahdi Ramadhani (atrifat)

## Support

You can support this project by:

- ⭐ Starring the repo, reporting issues, and sending pull requests.
- ⚡️ Sending some sats to my lightning address: [rifat@getalby.com](lightning:rifat@getalby.com)