[Feed] Add GET /v1/feed/for-you endpoint with Twitter-style ranking algorithm#787

Open
dylanjeffers wants to merge 2 commits into main from feed/for-you-algorithm

@dylanjeffers dylanjeffers commented May 6, 2026

Summary

Adds GET /v1/feed/for-you, a personalized track feed modeled on Twitter's open-sourced 2023 algorithm (the-algorithm / the-algorithm-ml). The pipeline is candidate-retrieval → ranking → filtering+diversity, the same three-stage shape Twitter uses on top of a learned heavy ranker. Audius doesn't yet have a trained ranker, so the heavy ranker is approximated by a hand-tuned linear blend; the candidate retrieval and diversity passes carry over directly so a learned model can drop in later.

Client consumer: AudiusProject/apps#14237.

Endpoint

GET /v1/feed/for-you?user_id=<hashId>&limit=25&offset=0&max_per_artist=3

user_id is required (the handler 400s without it — "For You" without a "you" degenerates into trending+underground). limit defaults to 25 (max 100), offset to 0 (max 200), max_per_artist to 3 (max 10).
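
The defaults and bounds above can be sketched as a small parser. This is a minimal illustration, not the handler's actual code: `ForYouParams`, `intParam`, and `ParseForYouQuery` are hypothetical names, the lower bound of 1 for limit comes from the commit message, and the lower bound of 1 for max_per_artist is an assumption.

```go
package main

import (
	"errors"
	"net/url"
	"strconv"
)

// ForYouParams holds validated query parameters (hypothetical type).
type ForYouParams struct {
	UserID       string // hash ID, required
	Limit        int    // default 25, range 1-100
	Offset       int    // default 0, range 0-200
	MaxPerArtist int    // default 3, max 10 (minimum of 1 is assumed)
}

// intParam returns def when key is absent and an error when the value is
// not an integer or falls outside [lo, hi].
func intParam(q url.Values, key string, def, lo, hi int) (int, error) {
	raw := q.Get(key)
	if raw == "" {
		return def, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < lo || n > hi {
		return 0, errors.New("invalid " + key)
	}
	return n, nil
}

// ParseForYouQuery returns an error in every case where the PR says the
// handler answers 400: missing user_id or an out-of-range parameter.
func ParseForYouQuery(rawQuery string) (ForYouParams, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return ForYouParams{}, err
	}
	p := ForYouParams{UserID: q.Get("user_id")}
	if p.UserID == "" {
		return p, errors.New("user_id is required")
	}
	if p.Limit, err = intParam(q, "limit", 25, 1, 100); err != nil {
		return p, err
	}
	if p.Offset, err = intParam(q, "offset", 0, 0, 200); err != nil {
		return p, err
	}
	if p.MaxPerArtist, err = intParam(q, "max_per_artist", 3, 1, 10); err != nil {
		return p, err
	}
	return p, nil
}
```

For example, `ParseForYouQuery("user_id=abc")` yields limit 25, offset 0, and max_per_artist 3, while `ParseForYouQuery("user_id=abc&limit=101")` returns an error.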

Algorithm

1. Candidate retrieval (UNION across 4 capped sources)

| Source | What it pulls | Cap |
| --- | --- | --- |
| in_network | Tracks uploaded in the last 14 days by users I follow | 200 |
| trending | Top week-trending tracks from track_trending_scores (mirrors /tracks/trending) | 100 |
| underground | Week-trending tracks whose owner's follower count and following count are both below 1500 | 50 |
| similar | Uploads from the last 60 days by artists saved by users who also save the artists I save (1-hop collaborative filtering over the saves graph) | 100 |

DISTINCT ON (track_id) ORDER BY track_id, prio keeps the strongest source for each track, so an in-network track that's also trending keeps the in-network weight.
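
The same keep-the-strongest-source rule can be sketched in Go for readers less familiar with DISTINCT ON. The `Candidate` type and `DedupeBySource` are hypothetical, and the priority order (in_network, then trending, underground, similar) is an assumption inferred from the source weights; the PR only states that in-network wins over trending.

```go
package main

import "sort"

// Candidate is one row from the UNION of the four sources (hypothetical type).
type Candidate struct {
	TrackID int
	Source  string
}

// prio returns the assumed source priority; lower is stronger.
// Unknown sources sort last.
func prio(source string) int {
	switch source {
	case "in_network":
		return 0
	case "trending":
		return 1
	case "underground":
		return 2
	case "similar":
		return 3
	}
	return 4
}

// DedupeBySource keeps one candidate per track, preferring the
// lowest-prio source, and returns the result ordered by track ID.
func DedupeBySource(cands []Candidate) []Candidate {
	best := make(map[int]Candidate)
	for _, c := range cands {
		cur, seen := best[c.TrackID]
		if !seen || prio(c.Source) < prio(cur.Source) {
			best[c.TrackID] = c
		}
	}
	out := make([]Candidate, 0, len(best))
	for _, c := range best {
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].TrackID < out[j].TrackID })
	return out
}
```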

2. Ranking

recency_score    = exp(-ln(2) * age_hours / 48)
                   // 48h half-life: 48h → 0.5, 96h → 0.25
engagement_score = ln(1 + 3*saves + 2*reposts + plays) / 12
                   // preserves saves > reposts > plays, log-compressed
social_boost     = 1.0 + min(ln(1 + my_engagement_with_artist) / 4, 1)
                   // capped at 2x (reached at about 54 engagements) for artists I already engage with often
source_weight    = { in_network: 1.20, trending: 1.00,
                     underground: 0.95, similar: 0.90 }

final_score = (0.55 * recency_score + 0.45 * engagement_score)
              * social_boost * source_weight
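
As a worked reference, the formulas above translate directly to Go. This is a sketch for checking values by hand, not the PR's implementation (the commit message describes the ranking as SQL-side); `TrackSignals` and the function names are hypothetical.

```go
package main

import "math"

// TrackSignals carries the inputs to the score (hypothetical type).
type TrackSignals struct {
	AgeHours               float64
	Saves, Reposts, Plays  float64
	MyEngagementWithArtist float64
	Source                 string
}

// sourceWeight mirrors the weights listed in the PR description.
// A source missing from this map scores 0.
var sourceWeight = map[string]float64{
	"in_network":  1.20,
	"trending":    1.00,
	"underground": 0.95,
	"similar":     0.90,
}

// RecencyScore decays with a 48-hour half-life.
func RecencyScore(ageHours float64) float64 {
	return math.Exp(-math.Ln2 * ageHours / 48)
}

// EngagementScore log-compresses a weighted engagement count.
func EngagementScore(saves, reposts, plays float64) float64 {
	return math.Log1p(3*saves+2*reposts+plays) / 12
}

// SocialBoost ranges from 1.0 (no prior engagement) to a cap of 2.0.
func SocialBoost(myEngagement float64) float64 {
	return 1 + math.Min(math.Log1p(myEngagement)/4, 1)
}

// FinalScore blends recency and engagement, then applies the multipliers.
func FinalScore(t TrackSignals) float64 {
	base := 0.55*RecencyScore(t.AgeHours) + 0.45*EngagementScore(t.Saves, t.Reposts, t.Plays)
	return base * SocialBoost(t.MyEngagementWithArtist) * sourceWeight[t.Source]
}
```

A brand-new trending track with no engagement and no viewer affinity scores 0.55 (recency 1, engagement 0, boost 1, weight 1.00); the same track from a followed artist scores about 0.66.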

3. Filters (applied once after the union)

  • Track liveness: is_current, is_delete=false, is_unlisted=false, is_available=true, stem_of IS NULL
  • Owner liveness: is_current, is_deactivated=false, is_available=true (same shape as v1_events_remix_contests.go)
  • Access-gating: ungated, or caller's wallet is on access_authorities (matches the v1_users_feed authed-wallet pattern)
  • Excludes tracks the caller has already saved (don't resurface)
  • Excludes the caller's own uploads
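
The filter list reads as a single predicate over a track, its owner, and the caller. The sketch below assumes the filters behave as described here; the struct shapes are hypothetical stand-ins for the table columns, and the case-insensitive wallet comparison is an assumption rather than something the PR states.

```go
package main

import "strings"

// Track, Owner, and Viewer are hypothetical stand-ins for the rows involved.
type Track struct {
	TrackID, OwnerID                             int
	IsCurrent, IsDelete, IsUnlisted, IsAvailable bool
	StemOf                                       *int     // nil when the track is not a stem
	AccessAuthorities                            []string // empty means ungated
}

type Owner struct {
	IsCurrent, IsDeactivated, IsAvailable bool
}

type Viewer struct {
	UserID        int
	Wallet        string
	SavedTrackIDs map[int]bool
}

// hasAccess reports whether the track is ungated or lists the caller's wallet.
// Comparing wallets case-insensitively is an assumption.
func hasAccess(t Track, wallet string) bool {
	if len(t.AccessAuthorities) == 0 {
		return true
	}
	for _, a := range t.AccessAuthorities {
		if strings.EqualFold(a, wallet) {
			return true
		}
	}
	return false
}

// Eligible applies all five filters from the list above.
func Eligible(t Track, o Owner, v Viewer) bool {
	trackLive := t.IsCurrent && !t.IsDelete && !t.IsUnlisted && t.IsAvailable && t.StemOf == nil
	ownerLive := o.IsCurrent && !o.IsDeactivated && o.IsAvailable
	notSaved := !v.SavedTrackIDs[t.TrackID]
	notOwn := t.OwnerID != v.UserID
	return trackLive && ownerLive && hasAccess(t, v.Wallet) && notSaved && notOwn
}
```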

4. Diversity

  • SQL hard cap: ROW_NUMBER() OVER (PARTITION BY owner_id ORDER BY score DESC, track_id DESC) filtered to <= max_per_artist (default 3) — prevents a single hot artist from filling the page.
  • Go greedy pass: walks the ranked pool keeping global rank order, but if the next track shares an owner with the one just emitted, prefers the next non-same-owner candidate within a 5-position lookahead. This acts as a soft penalty on consecutive same-artist runs without running a second ranker.
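
Both diversity steps can be sketched in a few lines of Go. This is one plausible reading of the description, not the PR's code: `Ranked`, `CapPerArtist`, and `Diversify` are hypothetical names, the input is assumed to be sorted by score descending already, and "5-position lookahead" is interpreted as scanning up to five candidates after the head of the pool.

```go
package main

// Ranked is one scored track (hypothetical type).
type Ranked struct {
	TrackID, OwnerID int
	Score            float64
}

const lookahead = 5

// CapPerArtist keeps at most maxPerArtist tracks per owner, mirroring the
// ROW_NUMBER() cap. Input must already be sorted by score descending.
func CapPerArtist(ranked []Ranked, maxPerArtist int) []Ranked {
	seen := make(map[int]int)
	out := make([]Ranked, 0, len(ranked))
	for _, r := range ranked {
		if seen[r.OwnerID] < maxPerArtist {
			seen[r.OwnerID]++
			out = append(out, r)
		}
	}
	return out
}

// Diversify emits tracks in rank order, but when the head of the pool has
// the same owner as the track just emitted, it takes the first candidate
// with a different owner among the next `lookahead` positions instead.
// If none exists, the head is emitted anyway.
func Diversify(ranked []Ranked) []Ranked {
	pool := append([]Ranked(nil), ranked...) // copy so the caller's slice is untouched
	out := make([]Ranked, 0, len(pool))
	for len(pool) > 0 {
		pick := 0
		if len(out) > 0 && pool[0].OwnerID == out[len(out)-1].OwnerID {
			last := out[len(out)-1].OwnerID
			for i := 1; i < len(pool) && i <= lookahead; i++ {
				if pool[i].OwnerID != last {
					pick = i
					break
				}
			}
		}
		out = append(out, pool[pick])
		pool = append(pool[:pick], pool[pick+1:]...)
	}
	return out
}
```

With owners ranked 1, 1, 1, 2, 3, the pass emits owners in the order 1, 2, 1, 3, 1: skipped tracks stay at the front of the pool, so they are only deferred, never dropped.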

Pagination is offset/limit applied on the diversity-ordered list, so pages are stable as long as underlying scores haven't shifted.
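
A minimal sketch of that slicing step, assuming the full diversity-ordered list is available in memory; `Page` is a hypothetical helper, not a function from the PR.

```go
package main

// Page returns the window [offset, offset+limit) of items, clamped to the
// slice bounds. It returns nil when the window is empty or invalid.
func Page[T any](items []T, offset, limit int) []T {
	if offset < 0 || limit <= 0 || offset >= len(items) {
		return nil
	}
	end := offset + limit
	if end > len(items) {
		end = len(items)
	}
	return items[offset:end]
}
```

Consecutive windows over the same ordered list never overlap. If scores shift between requests, the ordering changes underneath the offsets, and a track can then appear on two pages or be skipped.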

Test plan

  • TestV1FeedForYou_Basic — in-network + trending + underground all surface; deleted/unlisted/deactivated/own/saved tracks are excluded
  • TestV1FeedForYou_RequiresUserId — 400 without user_id
  • TestV1FeedForYou_ExcludesAlreadySavedTracks — already-saved exclusion works
  • TestV1FeedForYou_MaxThreePerArtist — 3-per-artist cap enforced
  • TestV1FeedForYou_DiversityPassNoConsecutiveSameArtist — Go greedy pass interleaves artists
  • TestV1FeedForYou_PaginationDoesNotRepeat — pagination doesn't repeat ids across pages
  • TestV1FeedForYou_InvalidParams — limit/offset out-of-range → 400
  • TestV1FeedForYou_RecencyAndEngagementRanking — fresh+engaged outranks low-engagement and old (joint signal test)
  • Smoke-test against staging once deployed: hit /v1/feed/for-you?user_id=… for a real account, eyeball the mix of in-network vs trending vs underground
  • Cross-check with apps#14237 client integration

🤖 Generated with Claude Code

dylanjeffers and others added 2 commits May 5, 2026 20:56
…lgorithm

Implements a personalized For You feed modeled on Twitter's 2023
open-sourced timeline pipeline (candidate generation -> ranking ->
filtering -> diversity).

Candidate sources (4):
- in-network: recent uploads from artists the viewer follows
- weekly trending (track_trending_scores, time_range=week)
- underground trending (sub-1500 follower/following artists)
- similar-artist 1-hop CF: artists co-saved by users who saved my saved
  artists' tracks

Ranking (SQL-side):
- 48h half-life recency: EXP(-LN(2) * hours_old / 48)
- engagement: LN(1 + 3*saves + 2*reposts + plays) (saves > reposts > plays)
- social affinity: 1 + min(LN(1 + my_engagement_count) / 4, 1)
- source weight: in-network 1.20, trending 1.00, underground 0.95, similar 0.90

Filtering / diversity:
- hard filters mirror the v1_events_remix_contests.go pattern:
  is_delete=false, is_unlisted=false, is_available=true, stem_of IS NULL,
  no access_authorities, owner not deactivated
- excludes tracks the viewer has already saved
- 3-per-artist cap via ROW_NUMBER() OVER (PARTITION BY owner_id)
- Go-side greedy diversity pass with a 5-track lookahead to avoid
  consecutive same-artist tracks without disturbing global rank

Pagination: user_id (required), limit (1-100, default 25), offset (0-200).

Consumed by apps#14237.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add the OpenAPI entry for the new endpoint so it shows up in
/v1 (swagger UI) and the SDK codegen pipeline.

Documents the four query params (user_id required; limit, offset,
max_per_artist optional with min/max bounds matching the handler's
validate tags) and points the 200 response at the existing
"tracks" component schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>