Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 37 |
1 files changed, 37 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..870bf34
--- /dev/null
+++ b/README.md
@@ -0,0 +1,37 @@
+# Scholscan
+
+Filters academic articles using TF-IDF on titles plus logistic regression.
+
+## Build
+```
+go build -o scholscan .
+```
+
+## Usage
+```
+# Train model from articles you like
+./scholscan train positives.jsonl --rss-feeds feeds.txt > model.json
+
+# Score new RSS feed
+./scholscan scan --url RSS_URL --model model.json > results.jsonl
+
+# Run web server
+./scholscan serve --port 8080 --model model.json --rss-world rss_world.txt
+```
+
+## Endpoints
+
+- GET `/` - redirect to live feed
+- GET `/live-feed` - filtered articles web UI
+- GET `/tools` - score individual articles
+- POST `/score` - API for scoring titles
+- POST `/scan` - API for scanning RSS
+- GET `/api/filtered/feed` - JSON feed
+- GET `/api/filtered/rss` - RSS feed
+- GET `/api/health` - health check
+
+## Model settings
+
+- TF-IDF: unigrams + bigrams, MinDF=2, MaxDF=0.8
+- Logistic regression: λ=0.001, L2 regularization
+- Class balancing: downsample majority to 1:1 ratio
\ No newline at end of file
