Diffstat (limited to 'README.md')
-rw-r--r-- README.md 37
1 files changed, 37 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..870bf34
--- /dev/null
+++ b/README.md
@@ -0,0 +1,37 @@
+# Scholscan
+
+Filters academic articles by scoring their titles with TF-IDF features and a logistic regression classifier.
+
+## Build
+```
+go build -o scholscan .
+```
+
+## Usage
+```
+# Train model from articles you like
+./scholscan train positives.jsonl --rss-feeds feeds.txt > model.json
+
+# Score new RSS feed
+./scholscan scan --url RSS_URL --model model.json > results.jsonl
+
+# Run web server
+./scholscan serve --port 8080 --model model.json --rss-world rss_world.txt
+```
+
+## Endpoints
+
+- GET `/` - redirect to live feed
+- GET `/live-feed` - filtered articles web UI
+- GET `/tools` - score individual articles
+- POST `/score` - API for scoring titles
+- POST `/scan` - API for scanning RSS
+- GET `/api/filtered/feed` - JSON feed
+- GET `/api/filtered/rss` - RSS feed
+- GET `/api/health` - health check
+
+## Model settings
+
+- TF-IDF: unigrams + bigrams, MinDF=2, MaxDF=0.8
+- Logistic regression: λ=0.001, L2 regularization
+- Class balancing: downsample majority to 1:1 ratio
\ No newline at end of file