Merge pull request #20 from atrifat/feat-topic-classification

Feat Topic Classification

atrifat authored Jul 30, 2024
2 parents a2c625b + d51781e commit b83e040
Showing 5 changed files with 127 additions and 23 deletions.
28 changes: 25 additions & 3 deletions .env.example
@@ -4,24 +4,36 @@ NODE_ENV=production
ENABLE_NSFW_CLASSIFICATION=true
NSFW_DETECTOR_ENDPOINT=http://localhost:8082/predict
NSFW_DETECTOR_TOKEN=

ENABLE_LANGUAGE_DETECTION=true
LANGUAGE_DETECTOR_ENDPOINT=http://localhost:5000/detect
LANGUAGE_DETECTOR_TOKEN=
LANGUAGE_DETECTOR_TRUNCATE_LENGTH=350

ENABLE_HATE_SPEECH_DETECTION=true
# (Required if ENABLE_HATE_SPEECH_DETECTION == true) set this to your own hate-speech-detector-api instance (https://github.com/atrifat/hate-speech-detector-api)
HATE_SPEECH_DETECTOR_ENDPOINT=http://localhost:8083/predict
# (Optional) set this to your own hate-speech-detector-api api_key if required
HATE_SPEECH_DETECTOR_TOKEN=
# (Default: 350) Set to 0 to disable truncation, or to any positive number to truncate the text to that many characters
HATE_SPEECH_DETECTOR_TRUNCATE_LENGTH=350

ENABLE_SENTIMENT_ANALYSIS=true
# (Required if ENABLE_SENTIMENT_ANALYSIS == true) set this to your own sentiment-analysis-api instance (https://github.com/atrifat/sentiment-analysis-api)
SENTIMENT_ANALYSIS_ENDPOINT=http://localhost:8084/predict
# (Optional) set this to your own sentiment-analysis-api api_key if required
SENTIMENT_ANALYSIS_TOKEN=
# (Default: 350) Set to 0 to disable truncation, or to any positive number to truncate the text to that many characters
SENTIMENT_ANALYSIS_TRUNCATE_LENGTH=350

ENABLE_TOPIC_CLASSIFICATION=true
# (Required if ENABLE_TOPIC_CLASSIFICATION == true) set this to your own topic-classification-api instance (https://github.com/atrifat/topic-classification-api)
TOPIC_CLASSIFICATION_ENDPOINT=http://localhost:8085/predict
# (Optional) set this to your own topic-classification-api api_key if required
TOPIC_CLASSIFICATION_TOKEN=
# (Default: 350) Set to 0 to disable truncation, or to any positive number to truncate the text to that many characters
TOPIC_CLASSIFICATION_TRUNCATE_LENGTH=350

# (Required for classification filtering)
NOSTR_MONITORING_BOT_PRIVATE_KEY=
RELAYS_SOURCE=wss://relay.nostr.band,wss://relay.damus.io,wss://nos.lol,wss://relay.mostr.pub
@@ -40,23 +52,33 @@ WHITELISTED_PUBKEYS=
LISTEN_PORT=7860
# (Optional) Set true to enable forwarding of request headers to upstream server, useful if relays behind reverse proxy
ENABLE_FORWARD_REQ_HEADERS=false

# (Optional. Default: sfw. Options: all, sfw, partialsfw, and nsfw) Filter content based on NSFW (SFW/NSFW) classification.
DEFAULT_FILTER_CONTENT_MODE=sfw
# (Optional. Default: 75, Options: 0-100) Default minimum probability/confidence score to determine the classification of nsfw content
DEFAULT_FILTER_NSFW_CONFIDENCE=75

# (Optional. Default: all. Multiple Options: all, or other language code)
DEFAULT_FILTER_LANGUAGE_MODE=all
# (Optional. Default: 15. Options: 0-100) Default minimum probability/confidence score to determine the classification of language
DEFAULT_FILTER_LANGUAGE_CONFIDENCE=15

# (Optional. Default: no. Options: all, no, yes) Filter hate speech (toxic comment). "all" will disable filtering, "no" will filter out any detected hate speech content, "yes" will select only detected hate speech content
DEFAULT_FILTER_HATE_SPEECH_TOXIC_MODE=no
# (Optional. Default: 75. Options: 0-100) Default minimum probability/confidence score to determine the classification of hate speech (toxic comment)
DEFAULT_FILTER_HATE_SPEECH_TOXIC_CONFIDENCE=75
# (Optional. Default: max. Options: max, sum) Method used to determine toxic content: the max value of all toxic class scores, or the sum of all toxic class scores
DEFAULT_FILTER_HATE_SPEECH_TOXIC_EVALUATION_MODE=max

# (Optional. Default: all, Multiple Options: all,negative,neutral,positive) Multiple options separated by comma (eg: neutral,positive => filter to get both neutral and positive sentiment)
DEFAULT_FILTER_SENTIMENT_MODE=all
# (Optional. Default: 35, Options: 0-100) Default minimum probability/confidence score in percentage to determine the classification of sentiment
DEFAULT_FILTER_SENTIMENT_CONFIDENCE=35

# (Default: all, Multiple Options: list of valid topics in the atrifat/nostr-filter-relay GitHub repository) Multiple options separated by comma (eg: life,music,sport,science_and_technology => filter to get life (short version of: diaries_and_life), music, sport, and science_and_technology)
DEFAULT_FILTER_TOPIC_MODE=all
# (Default: 35, Options: 0-100) Default minimum probability/confidence score in percentage to determine the classification of topic
DEFAULT_FILTER_TOPIC_CONFIDENCE=35

# (Optional. Default: all. Options: all, nostr, activitypub) Filter user type. "nostr" for native nostr users and "activitypub" for activitypub users coming from bridge
DEFAULT_FILTER_USER_MODE=all
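
Since the new `TOPIC_CLASSIFICATION_*` variables point at a separate service, it can be useful to smoke-test that endpoint before starting the relay. A minimal sketch, assuming the topic-classification-api accepts a JSON POST with a `q` field on its `/predict` route and an optional bearer token (the request and response shape here are assumptions — check the [atrifat/topic-classification-api](https://github.com/atrifat/topic-classification-api) repository for the actual contract):

```
# Hypothetical smoke test for the topic classification endpoint configured above.
# Payload shape is assumed; consult atrifat/topic-classification-api for the real API.
TOPIC_CLASSIFICATION_ENDPOINT="http://localhost:8085/predict"
TOPIC_CLASSIFICATION_TOKEN=""

curl -s -X POST "$TOPIC_CLASSIFICATION_ENDPOINT" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOPIC_CLASSIFICATION_TOKEN" \
  -d '{"q": "The championship final goes to extra time tonight"}'
# drop the Authorization header if your instance does not require a token
```

A note scored for a topic at or above `DEFAULT_FILTER_TOPIC_CONFIDENCE` (for example, a hypothetical score of 42 for `sport` against the default threshold of 35) would pass a `DEFAULT_FILTER_TOPIC_MODE=sport` filter.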
15 changes: 12 additions & 3 deletions Dockerfile
@@ -1,4 +1,4 @@
ARG DENO_VERSION=1.45.0
ARG DENO_VERSION=1.45.3
FROM debian:bookworm AS builder_strfry

WORKDIR /builder
@@ -27,7 +27,7 @@ RUN apt update -y && \
rm -rf /var/lib/apt/lists/*

# Prepare nostr-filter
ENV NOSTR_FILTER_COMMIT_HASH_VERSION=c7f955e491fa268d2abf9feb4a31f989ef76438b
ENV NOSTR_FILTER_COMMIT_HASH_VERSION=4d719299b88203754b952809e4f937e6fd66fc34
ENV NOSTR_FILTER_BRANCH=main
RUN git clone --branch $NOSTR_FILTER_BRANCH https://github.com/atrifat/nostr-filter && \
cd /builder/nostr-filter && \
@@ -36,7 +36,7 @@ RUN git clone --branch $NOSTR_FILTER_BRANCH https://github.com/atrifat/nostr-fil
npm ci --omit=dev && npx tsc

# Prepare nostr-monitoring-tool
ENV NOSTR_MONITORING_TOOL_VERSION=v0.5.0
ENV NOSTR_MONITORING_TOOL_VERSION=v0.6.0
RUN git clone --depth 1 --branch $NOSTR_MONITORING_TOOL_VERSION https://github.com/atrifat/nostr-monitoring-tool && \
cd /builder/nostr-monitoring-tool && \
npm ci --omit=dev
@@ -102,6 +102,10 @@ ENV DEFAULT_FILTER_HATE_SPEECH_TOXIC_EVALUATION_MODE=max
ENV DEFAULT_FILTER_SENTIMENT_MODE=all
# (Default: 35, Options: 0-100) Default minimum probability/confidence score in percentage to determine the classification of sentiment
ENV DEFAULT_FILTER_SENTIMENT_CONFIDENCE=35
# (Default: all, Multiple Options: list of valid topics in the atrifat/nostr-filter-relay GitHub repository) Multiple options separated by comma (eg: life,music,sport,science_and_technology => filter to get life (short version of: diaries_and_life), music, sport, and science_and_technology)
ENV DEFAULT_FILTER_TOPIC_MODE=all
# (Default: 35, Options: 0-100) Default minimum probability/confidence score in percentage to determine the classification of topic
ENV DEFAULT_FILTER_TOPIC_CONFIDENCE=35
# (Default: all, Options: all, nostr, activitypub) Filter user type. "nostr" for native nostr users and "activitypub" for activitypub users coming from bridge
ENV DEFAULT_FILTER_USER_MODE=all

@@ -125,6 +129,11 @@ ENV SENTIMENT_ANALYSIS_ENDPOINT=
ENV SENTIMENT_ANALYSIS_TOKEN=
ENV SENTIMENT_ANALYSIS_TRUNCATE_LENGTH=350

ENV ENABLE_TOPIC_CLASSIFICATION=true
ENV TOPIC_CLASSIFICATION_ENDPOINT=
ENV TOPIC_CLASSIFICATION_TOKEN=
ENV TOPIC_CLASSIFICATION_TRUNCATE_LENGTH=350

ENV NOSTR_MONITORING_BOT_PRIVATE_KEY=
ENV RELAYS_SOURCE=
ENV RELAYS_TO_PUBLISH=ws://127.0.0.1:7777
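The ENV defaults baked into the image above can also be overridden per container at start time, so the Dockerfile does not need to be edited to try different settings. A sketch using standard `docker run -e` flags (the topic values below are illustrative):

```
docker run --init -p 7860:7860 -d --name nostr-filter-relay \
  -e ENABLE_TOPIC_CLASSIFICATION=true \
  -e TOPIC_CLASSIFICATION_ENDPOINT=http://localhost:8085/predict \
  -e DEFAULT_FILTER_TOPIC_MODE=life,music,sport,science_and_technology \
  -e DEFAULT_FILTER_TOPIC_CONFIDENCE=35 \
  nostr-filter-relay
```
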
29 changes: 16 additions & 13 deletions README.md
@@ -1,18 +1,18 @@
# nostr-filter-relay

A nostr relay docker image package which filters content based on content type (SFW/NSFW), user type, language, hate speech (toxic comment), sentiment, and various rules.
A [Nostr](https://github.com/nostr-protocol/nostr) relay docker image package which filters content based on content type (SFW/NSFW), user type, language, hate speech (toxic comment), sentiment, topic, and various rules.

This docker image consists of several software packages:

- [atrifat/nostr-filter](https://github.com/atrifat/nostr-filter) (customized/fork of [imksoo/nostr-filter](https://github.com/imksoo/nostr-filter)) as frontend filter relay
- [atrifat/nostr-monitoring-tool](https://github.com/atrifat/nostr-monitoring-tool) as content classification tool
- [hoytech/strfry](https://github.com/hoytech/strfry) as backend relay

Several dependencies, including [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api), [atrifat/nsfw-detector-api](https://github.com/atrifat/nsfw-detector-api), [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api), and [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api), are required depending on which features are enabled.
Several dependencies, including [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api), [atrifat/nsfw-detector-api](https://github.com/atrifat/nsfw-detector-api), [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api), [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api), and [atrifat/topic-classification-api](https://github.com/atrifat/topic-classification-api), are required depending on which features are enabled.

## Demo

A public demo (beta/test) instance is available on [wss://nfrelay.app](wss://nfrelay.app) or [ws://nfrelay6saohkmipikquvrn6d64dzxivhmcdcj4d5i7wxis47xwsriyd.onion](ws://nfrelay6saohkmipikquvrn6d64dzxivhmcdcj4d5i7wxis47xwsriyd.onion) (TOR Onion Hidden Service).
A public demo (beta/test) relay is available on [wss://nfrelay.app](wss://nfrelay.app) or [ws://nfrelay6saohkmipikquvrn6d64dzxivhmcdcj4d5i7wxis47xwsriyd.onion](ws://nfrelay6saohkmipikquvrn6d64dzxivhmcdcj4d5i7wxis47xwsriyd.onion) (TOR Onion Hidden Service).

## Usage

@@ -27,15 +27,15 @@ A relay software package that filter note (kind: 1) contents in various category
- [x] User type filtering (Nostr user/non bridged user, activitypub bridged user)
- [x] Hate speech (Toxic comment) detection
- [x] Sentiment analysis
- [ ] (WIP) Topic classification
- [x] Topic classification
- [x] All other features included in [atrifat/nostr-filter](https://github.com/atrifat/nostr-filter) and [atrifat/nostr-monitoring-tool](https://github.com/atrifat/nostr-monitoring-tool)

## How it works

![nostr-filter-relay-flowchart](resources/flowchart-nostr-filter-relay.png)

1. **nostr-filter-relay** is a docker image that runs several programs via a launch script at startup: [atrifat/nostr-monitoring-tool](https://github.com/atrifat/nostr-monitoring-tool), [atrifat/nostr-filter](https://github.com/atrifat/nostr-filter), and the [hoytech/strfry](https://github.com/hoytech/strfry) relay.
2. **nostr-monitoring-tool** is a classification tool that subscribes to and fetches notes (kind: 1) from various relays. It processes every note it sees (image URL extraction, text preprocessing) and sends the processed content to external AI classification services. Currently, it sends the content to an NSFW Detector API instance (using [atrifat/nsfw-detector-api](https://github.com/atrifat/nsfw-detector-api)), a Language Detector API instance (using [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api) or [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate)), a Hate Speech Detector API instance (using [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api)), and a Sentiment Analysis API instance (using [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api)). All four API services return classification results (SFW/NSFW classification, language classification, toxicity classification, sentiment analysis) that are saved as **custom kind 9978** events in the local strfry relay that is already running. The data format is documented in the [atrifat/nostr-monitoring-tool](https://github.com/atrifat/nostr-monitoring-tool) repository.
2. **nostr-monitoring-tool** is a classification tool that subscribes to and fetches notes (kind: 1) from various relays. It processes every note it sees (image URL extraction, text preprocessing) and sends the processed content to external AI classification services. Currently, it sends the content to an NSFW Detector API instance (using [atrifat/nsfw-detector-api](https://github.com/atrifat/nsfw-detector-api)), a Language Detector API instance (using [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api) or [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate)), a Hate Speech Detector API instance (using [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api)), a Sentiment Analysis API instance (using [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api)), and a Topic Classification API instance (using [atrifat/topic-classification-api](https://github.com/atrifat/topic-classification-api)). All five API services return classification results (SFW/NSFW classification, language classification, toxicity classification, sentiment analysis, topic classification) that are saved as **custom kind 9978** events in the local strfry relay that is already running. The data format is documented in the [atrifat/nostr-monitoring-tool](https://github.com/atrifat/nostr-monitoring-tool) repository.

Basic Data flow:
**Source Relays (notes) -> nostr-monitoring-tool (connect to external API for classification) -> local strfry**
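
To see the filtered output end to end, a client only needs to open a websocket to the frontend filter relay and send a standard NIP-01 subscription. A minimal sketch (assumes the `websocat` CLI is installed; the demo relay address is the one listed in the Demo section above):

```
# Request a few filtered kind-1 notes from the demo relay
echo '["REQ","demo-sub",{"kinds":[1],"limit":5}]' | websocat wss://nfrelay.app
```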
@@ -56,6 +56,7 @@ The following softwares are required if you want to run your own nostr-filter-re
- Personal instance of [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api) or [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate). Check [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api) or [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate) Github repository for more instructions.
- Personal instance of [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api). Check [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api) Github repository for more instructions.
- Personal instance of [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api). Check [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api) Github repository for more instructions.
- Personal instance of [atrifat/topic-classification-api](https://github.com/atrifat/topic-classification-api). Check [atrifat/topic-classification-api](https://github.com/atrifat/topic-classification-api) Github repository for more instructions.
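
Before pointing the relay at these services, a quick reachability check of each configured endpoint can save debugging time later. A rough sketch (the endpoints mirror the defaults in `.env.example`; the exact routes and payloads each API expects are assumptions — see the individual repositories):

```
for endpoint in \
  http://localhost:8082/predict \
  http://localhost:5000/detect \
  http://localhost:8083/predict \
  http://localhost:8084/predict \
  http://localhost:8085/predict; do
  code=$(curl -s -o /dev/null -w "%{http_code}" -X POST "$endpoint" \
    -H "Content-Type: application/json" -d '{"q":"ping"}')
  echo "$endpoint -> HTTP $code"
done
```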

## Getting Started

@@ -66,7 +67,7 @@ git clone https://github.com/atrifat/nostr-filter-relay
cd nostr-filter-relay
```

Before running nostr-filter-relay, make sure you have already configured your own personal instance of [atrifat/nsfw-detector-api](https://github.com/atrifat/nsfw-detector-api), [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api) or [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate), [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api), and [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api). You don't have to run all of them if you only enable certain classification tasks (example: NSFW detection only).
Before running nostr-filter-relay, make sure you have already configured your own personal instance of [atrifat/nsfw-detector-api](https://github.com/atrifat/nsfw-detector-api), [atrifat/language-detector-api](https://github.com/atrifat/language-detector-api) or [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate), [atrifat/hate-speech-detector-api](https://github.com/atrifat/hate-speech-detector-api), [atrifat/sentiment-analysis-api](https://github.com/atrifat/sentiment-analysis-api), and [atrifat/topic-classification-api](https://github.com/atrifat/topic-classification-api). You don't have to run all of them if you only enable certain classification tasks (example: NSFW detection only).

Copy `.env.example` to `.env` and adjust the configuration to your own settings.
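
For example:

```
cp .env.example .env
# edit .env: at minimum set NOSTR_MONITORING_BOT_PRIVATE_KEY and the *_ENDPOINT
# values for every classification feature left enabled
```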

@@ -106,6 +107,15 @@ or run it in the background (daemon):
docker run --init --env-file .env -p 7860:7860 -it --name nostr-filter-relay -d nostr-filter-relay
```

## Support

Development of nostr-filter-relay has been supported by [OpenSats - Fifth Nostr Grant - July 2024](https://opensats.org/blog/nostr-grants-july-2024).

You can also support this project by:

- ⭐ Starring the repo, reporting issues, or sending pull requests.
- ⚡️ Sending some sats or buying me a tea via my lightning address: [rifat@getalby.com](lightning:rifat@getalby.com)

## License

MIT License
@@ -133,10 +143,3 @@ SOFTWARE.
## Author

Rif'at Ahdi Ramadhani (atrifat)

## Support

You can support this project by:

- ⭐ Starring the repo, reporting issues, and sending pull requests.
- ⚡️ Sending some sats to my lightning address: [rifat@getalby.com](lightning:rifat@getalby.com)