# Scholscan
Scholscan filters academic articles by scoring their titles with TF-IDF features and a logistic regression classifier.
## Build
```
go build -o scholscan .
```
## Usage
```
# Train model from articles you like
./scholscan train positives.jsonl --rss-feeds feeds.txt > model.json
# Score new RSS feed
./scholscan scan --url RSS_URL --model model.json > results.jsonl
# Run web server
./scholscan serve --port 8080 --model model.json --rss-world rss_world.txt
```
## Endpoints
- GET `/` - redirect to live feed
- GET `/live-feed` - filtered articles web UI
- GET `/tools` - score individual articles
- POST `/score` - API for scoring titles
- POST `/scan` - API for scanning RSS
- GET `/api/filtered/feed` - JSON feed
- GET `/api/filtered/rss` - RSS feed
- GET `/api/health` - health check
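
As a rough illustration of using the health check endpoint listed above, the sketch below polls `/api/health` from Go. It assumes the server was started with `--port 8080` on localhost, as in the usage example, and that a healthy server answers with a 2xx status. The response body format is not documented here, so the code ignores it. The helper name `healthy` is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// healthy reports whether GET baseURL/api/health answers with a 2xx status.
// Treating 2xx as "healthy" is an assumption; the README does not specify it.
func healthy(baseURL string) (bool, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/api/health")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 300, nil
}

func main() {
	// Host and port are assumed from the `serve --port 8080` example.
	ok, err := healthy("http://localhost:8080")
	if err != nil {
		fmt.Println("health check failed:", err)
		return
	}
	fmt.Println("healthy:", ok)
}
```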
## Model settings
- TF-IDF: unigrams + bigrams, MinDF=2, MaxDF=0.8
- Logistic regression: λ=0.001, L2 regularization
- Class balancing: downsample majority to 1:1 ratio
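
To make the TF-IDF settings above more concrete, here is a minimal sketch of how a vocabulary of unigrams and bigrams could be filtered by document frequency. It is not taken from the scholscan source: the function names `ngrams` and `buildVocab` are hypothetical, the whitespace tokenizer is a simplification, and it assumes `MinDF=2` means "appears in at least 2 titles" and `MaxDF=0.8` means "appears in at most 80% of titles", following the common scikit-learn convention.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ngrams lowercases a title, splits it on whitespace, and returns its
// unigrams followed by its bigrams. A real tokenizer is likely stricter.
func ngrams(title string) []string {
	words := strings.Fields(strings.ToLower(title))
	out := append([]string{}, words...)
	for i := 0; i+1 < len(words); i++ {
		out = append(out, words[i]+" "+words[i+1])
	}
	return out
}

// buildVocab keeps terms that occur in at least minDF titles and in at
// most a maxDF fraction of all titles. The result is sorted for stability.
func buildVocab(titles []string, minDF int, maxDF float64) []string {
	df := map[string]int{}
	for _, t := range titles {
		seen := map[string]bool{} // count each term once per title
		for _, g := range ngrams(t) {
			if !seen[g] {
				seen[g] = true
				df[g]++
			}
		}
	}
	n := float64(len(titles))
	var vocab []string
	for term, count := range df {
		if count >= minDF && float64(count)/n <= maxDF {
			vocab = append(vocab, term)
		}
	}
	sort.Strings(vocab)
	return vocab
}

func main() {
	titles := []string{
		"Deep learning for protein folding",
		"Protein folding with graph networks",
		"Deep learning for weather",
		"Graph networks for traffic",
		"A survey of deep learning",
	}
	for _, term := range buildVocab(titles, 2, 0.8) {
		fmt.Println(term)
	}
}
```

Terms seen in only one title are dropped by the `minDF` check, while a term present in every title would be dropped by the `maxDF` check, which is the usual motivation for both thresholds.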