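# A General-Purpose Graph-Based Recommendation System Design
## Core Idea
Use the relationship network stored in the Neo4j graph database, together with Cypher traversals and graph algorithms, to produce recommendations. The design does not depend on machine learning models; it is driven entirely by graph algorithms.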
---
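## Recommendation Types
### 1. Friend Recommendations
**Goal**: Recommend people the user may know
### 2. Server Recommendations
**Goal**: Recommend servers/communities the user may be interested in
### 3. Channel Recommendations
**Goal**: Recommend active or relevant channels within a server
### 4. Content Recommendations
**Goal**: Recommend messages, topics, or discussions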
---
## Graph Data Foundation
### Existing Relationships
```cypher
(User)-[:MEMBER_OF {role, joinedAt, roleId}]->(Server)
(User)-[:FRIEND_WITH]->(User)
(Server)-[:HAS_CHANNEL]->(Channel)
(Channel)-[:CONTAINS_MESSAGE]->(Message)
(User)-[:AUTHOR_OF]->(Message)
(Message)-[:REACTED {emoji, createdAt}]->(User)
(Message)-[:THREAD_OF]->(MessageThread)
```
### New Relationships Needed for Recommendations (Optional)
```cypher
(User)-[:VIEWED]->(Server) // browsing history
(User)-[:INTERACTED {count, lastAt}]->(Channel) // interaction history
(User)-[:SEARCHED {keyword, at}]->(?) // search history
```
---
## Recommendation Algorithms
### Algorithm 1: Collaborative Filtering - Friend-of-Friend Recommendations
**Principle**: Friends of your friends are people you may already know
**Cypher implementation**:
```cypher
// Find friends of friends (excluding existing friends and the user themselves)
MATCH (me:User {user_id: $userId})-[:FRIEND_WITH]->(friend)-[:FRIEND_WITH]->(foaf:User)
WHERE NOT (me)-[:FRIEND_WITH]->(foaf)
  AND foaf.user_id <> $userId
WITH foaf, count(DISTINCT friend) AS mutualFriends
ORDER BY mutualFriends DESC
LIMIT 10
RETURN foaf.user_id, foaf.username, foaf.nickname, foaf.avatar_url, mutualFriends
```
**Score**: Number of mutual friends
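To make the counting logic concrete, here is a minimal in-memory sketch of the same friend-of-friend ranking. It is an illustration only, not the Neo4j query: the adjacency map, the class name `FriendOfFriendSketch`, and the tie-break on user ID are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory illustration of friend-of-friend ranking (hypothetical helper, not part of the design). */
final class FriendOfFriendSketch {

    /**
     * @param friends hypothetical adjacency map: user ID -> IDs of that user's friends
     * @return candidate ID -> mutual-friend count, ordered by count descending (ties by ID ascending)
     */
    static LinkedHashMap<Long, Integer> recommend(Map<Long, Set<Long>> friends, long me, int limit) {
        Set<Long> myFriends = friends.getOrDefault(me, Set.of());
        Map<Long, Integer> mutual = new HashMap<>();
        for (Long friend : myFriends) {
            for (Long foaf : friends.getOrDefault(friend, Set.of())) {
                // Skip the user themselves and anyone who is already a friend.
                if (foaf == me || myFriends.contains(foaf)) {
                    continue;
                }
                // Each (friend, foaf) pair is visited once, so this counts distinct mutual friends.
                mutual.merge(foaf, 1, Integer::sum);
            }
        }
        List<Map.Entry<Long, Integer>> entries = new ArrayList<>(mutual.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Long.compare(a.getKey(), b.getKey());
        });
        LinkedHashMap<Long, Integer> ranked = new LinkedHashMap<>();
        for (Map.Entry<Long, Integer> e : entries.subList(0, Math.min(limit, entries.size()))) {
            ranked.put(e.getKey(), e.getValue());
        }
        return ranked;
    }
}
```

With user 1 befriending users 2 and 3, and both 2 and 3 befriending user 4, user 4 is ranked with a score of 2.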
---
### Algorithm 2: Friend Recommendations via Shared Servers
**Principle**: Users who are active in the same servers are likely to share interests
**Cypher implementation**:
```cypher
// Find users who share servers with me (membership only; activity is not measured here)
MATCH (me:User {user_id: $userId})-[:MEMBER_OF]->(server:Server)<-[:MEMBER_OF]-(other:User)
WHERE NOT (me)-[:FRIEND_WITH]->(other)
  AND other.user_id <> $userId
WITH other, count(DISTINCT server) AS commonServers
WHERE commonServers >= 2
RETURN other.user_id, other.username, other.nickname, other.avatar_url, commonServers
ORDER BY commonServers DESC
LIMIT 10
```
**Score**: Number of shared servers (threshold ≥ 2)
---
### Algorithm 3: Similar-Interest Server Recommendations
**Principle**: Servers your friends have joined are likely to match your interests
**Cypher implementation**:
```cypher
// Find servers that my friends have joined but I have not
MATCH (me:User {user_id: $userId})-[:FRIEND_WITH]->(friend)-[:MEMBER_OF]->(server:Server)
WHERE NOT (me)-[:MEMBER_OF]->(server)
  AND server.status = 'active'
WITH server, count(DISTINCT friend) AS friendCount
ORDER BY friendCount DESC
LIMIT 10
RETURN server.server_id, server.server_name, server.description, server.icon_url, friendCount
```
**Score**: Number of friends who have joined the server
---
### Algorithm 4: PageRank - Popular Server Recommendations
**Principle**: Servers with many members and high activity are more worth recommending. The query below approximates this with a weighted popularity score rather than an iterative PageRank computation.
**Cypher implementation**:
```cypher
// Count each server's members and recent activity
MATCH (server:Server)<-[:MEMBER_OF]-(user:User)
WHERE NOT EXISTS {
  MATCH (me:User {user_id: $userId})-[:MEMBER_OF]->(server)
}
WITH server, count(user) AS memberCount
// Servers with no messages in the window are dropped by this MATCH
MATCH (server)-[:HAS_CHANNEL]->(channel:Channel)-[:CONTAINS_MESSAGE]->(msg:Message)
WHERE msg.created_at > datetime() - duration('P7D') // last 7 days
WITH server, memberCount, count(msg) AS recentMessages
WITH server, memberCount, recentMessages,
     (memberCount * 0.6 + recentMessages * 0.4) AS popularity
ORDER BY popularity DESC
LIMIT 10
RETURN server.server_id, server.server_name, server.description,
       memberCount, recentMessages, popularity
```
**Score**: `memberCount × 0.6 + recentMessages × 0.4`
---
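### Algorithm 5: Server Recommendations by Tag Similarity
**Principle**: Match server tags/categories against the user's interests
**Prerequisite**: `Server` nodes need a `tags` property, e.g. `["tech", "gaming", "art"]`
**Cypher implementation**:
```cypher
// Collect the tags of the servers the user has already joined.
// UNWIND flattens the per-server tag lists so that myTags is a list of strings, not a list of lists.
MATCH (me:User {user_id: $userId})-[:MEMBER_OF]->(myServer:Server)
UNWIND myServer.tags AS myTag
WITH collect(DISTINCT myTag) AS myTags
// Find servers the user has not joined that share tags
MATCH (server:Server)
WHERE NOT EXISTS {
  MATCH (me:User {user_id: $userId})-[:MEMBER_OF]->(server)
}
WITH server,
     [tag IN server.tags WHERE tag IN myTags] AS commonTags,
     size([tag IN server.tags WHERE tag IN myTags]) AS similarity
WHERE similarity > 0
RETURN server.server_id, server.server_name, server.tags, commonTags, similarity
ORDER BY similarity DESC
LIMIT 10
```
**Score**: Number of shared tags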
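The tag comparison only works when the user's tags form one flat collection of strings rather than a list of per-server lists. The following sketch shows that flatten-then-intersect step in plain Java; the class and method names are hypothetical and the data is assumed to be already loaded in memory.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative tag-overlap scoring (hypothetical helper names). */
final class TagOverlapSketch {

    /** Flattens the tag lists of all joined servers into one set of distinct tags. */
    static Set<String> flattenTags(Collection<List<String>> joinedServerTags) {
        Set<String> all = new TreeSet<>();
        for (List<String> tags : joinedServerTags) {
            all.addAll(tags);
        }
        return all;
    }

    /** Returns the candidate's tags that the user also has; the size of the result is the similarity score. */
    static Set<String> commonTags(Set<String> myTags, List<String> candidateTags) {
        Set<String> common = new TreeSet<>(candidateTags);
        common.retainAll(myTags);
        return common;
    }
}
```

A user whose servers are tagged `["tech", "gaming"]` and `["tech", "art"]` ends up with three distinct tags, and a candidate tagged `["gaming", "music", "art"]` scores 2.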
---
### Algorithm 6: Active Channel Recommendations (Within a Server)
**Principle**: Recommend channels with many recent messages and many participants
**Cypher implementation**:
```cypher
// Find the active channels within a server
MATCH (server:Server {server_id: $serverId})-[:HAS_CHANNEL]->(channel:Channel)
MATCH (channel)-[:CONTAINS_MESSAGE]->(msg:Message)
WHERE msg.created_at > datetime() - duration('P3D') // last 3 days
WITH channel, count(msg) AS messageCount,
     count(DISTINCT msg.author_id) AS participantCount
WITH channel, messageCount, participantCount,
     (messageCount * 0.5 + participantCount * 0.5) AS activity
ORDER BY activity DESC
LIMIT 5
RETURN channel.channel_id, channel.channel_name, channel.channel_type,
       messageCount, participantCount, activity
```
**Score**: `messageCount × 0.5 + participantCount × 0.5`
---
### Algorithm 7: Collaborative Filtering - Server Recommendations from Similar Users
**Principle**: Find users with similar behavior and recommend the servers they have joined
**Cypher implementation**:
```cypher
// 1. Find users with similar interests (shared servers ≥ 2)
MATCH (me:User {user_id: $userId})-[:MEMBER_OF]->(common:Server)<-[:MEMBER_OF]-(similar:User)
WHERE similar.user_id <> $userId
WITH similar, count(DISTINCT common) AS overlap
WHERE overlap >= 2
// 2. Recommend servers that similar users have joined but I have not
MATCH (similar)-[:MEMBER_OF]->(rec:Server)
WHERE NOT EXISTS {
  MATCH (me:User {user_id: $userId})-[:MEMBER_OF]->(rec)
}
WITH rec, count(DISTINCT similar) AS similarUsers
ORDER BY similarUsers DESC
LIMIT 10
RETURN rec.server_id, rec.server_name, rec.description, similarUsers
```
**Score**: Number of similar users who belong to the server
---
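### Algorithm 8: Shortest Path - Social Distance Recommendations
**Principle**: Users at a short social distance (2-3 hops) may know each other
**Cypher implementation**:
```cypher
// Find users at a social distance of 2-3 hops.
// shortestPath() only accepts a minimum length of 0 or 1, so the pattern is bounded at 3 hops
// and direct friends (distance 1, in either direction) are excluded in WHERE.
MATCH path = shortestPath(
  (me:User {user_id: $userId})-[:FRIEND_WITH*..3]-(other:User)
)
WHERE other.user_id <> $userId
  AND NOT (me)-[:FRIEND_WITH]-(other)
WITH other, length(path) AS distance
ORDER BY distance, other.last_seen DESC
LIMIT 10
RETURN other.user_id, other.username, other.nickname, distance
```
**Score**: Path length (shorter is better), then most recent activity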
---
### Algorithm 9: Jaccard Similarity - Interest Matching
**Principle**: Compute the similarity between the sets of servers that two users have joined
**Cypher implementation**:
```cypher
// Compute the Jaccard similarity
MATCH (me:User {user_id: $userId})-[:MEMBER_OF]->(myServer:Server)
// Carry `me` forward so the friendship check below refers to the same node
WITH me, collect(DISTINCT myServer) AS myServers
MATCH (other:User)-[:MEMBER_OF]->(otherServer:Server)
WHERE other.user_id <> $userId
  AND NOT (me)-[:FRIEND_WITH]->(other)
WITH other,
     collect(DISTINCT otherServer) AS otherServers,
     myServers
// `unionSize` rather than `union`, to avoid clashing with the UNION keyword
WITH other,
     size([s IN otherServers WHERE s IN myServers]) AS intersection,
     size(otherServers + [s IN myServers WHERE NOT s IN otherServers]) AS unionSize
WHERE unionSize > 0
WITH other, toFloat(intersection) / unionSize AS jaccard
WHERE jaccard >= 0.2
RETURN other.user_id, other.username, jaccard
ORDER BY jaccard DESC
LIMIT 10
```
**Score**: Jaccard coefficient = `|A ∩ B| / |A ∪ B|`
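As a worked illustration of the coefficient, the sketch below computes it for two sets of server IDs in plain Java. The class name is hypothetical, and returning 0 for two empty sets is an assumed convention rather than something the design specifies.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative Jaccard coefficient over two sets of server IDs (hypothetical helper). */
final class JaccardSketch {

    static double jaccard(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // assumed convention: no servers on either side means no similarity
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        // |A ∪ B| = |A| + |B| - |A ∩ B|
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

For A = {1, 2, 3} and B = {2, 3, 4} the intersection has 2 elements and the union has 4, so the coefficient is 0.5, which passes the 0.2 threshold used in the query.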
---
### Algorithm 10: Trending Topic Recommendations
**Principle**: Recommend trending topics based on the number of reactions and replies a message receives
**Cypher implementation**:
```cypher
// Find trending messages/topics (last 7 days)
MATCH (channel:Channel)-[:CONTAINS_MESSAGE]->(msg:Message)
WHERE msg.created_at > datetime() - duration('P7D')
// Count reactions and replies.
// Direction follows the schema: (Message)-[:REACTED]->(User)
OPTIONAL MATCH (msg)-[r:REACTED]->(u:User)
WITH msg, channel, count(DISTINCT r) AS reactionCount
// THREAD_REPLY is not among the existing relationships listed in the schema section
OPTIONAL MATCH (msg)-[:THREAD_OF]->(thread:MessageThread)-[:THREAD_REPLY]->(reply:Message)
WITH msg, channel, reactionCount, count(DISTINCT reply) AS replyCount
WITH msg, channel, reactionCount, replyCount,
     (reactionCount * 0.4 + replyCount * 0.6) AS hotness
WHERE hotness > 5
RETURN msg.message_id, msg.content, channel.channel_name,
       reactionCount, replyCount, hotness
ORDER BY hotness DESC
LIMIT 10
```
**Score**: `reactionCount × 0.4 + replyCount × 0.6`
---
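## Recommendation Service Architecture
### Repository Layer
```java
@Repository
public interface RecommendationRepository extends Neo4jRepository<UserNode, Long> {
    // Each *_QUERY constant is assumed to hold the Cypher text of the corresponding algorithm
    @Query(FRIEND_OF_FRIEND_QUERY)
    List<UserRecommendation> recommendFriends(@Param("userId") Long userId);
    // Used by the service layer; the constant name COMMON_SERVER_QUERY is illustrative
    @Query(COMMON_SERVER_QUERY)
    List<UserRecommendation> recommendByCommonServers(@Param("userId") Long userId);
    @Query(SERVER_BY_FRIENDS_QUERY)
    List<ServerRecommendation> recommendServers(@Param("userId") Long userId);
    @Query(ACTIVE_CHANNEL_QUERY)
    List<ChannelRecommendation> recommendChannels(@Param("serverId") Long serverId);
}
```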
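### Service Layer
```java
@Service
public class RecommendationService {
    private final RecommendationRepository repo;
    public RecommendationService(RecommendationRepository repo) {
        this.repo = repo; // constructor injection
    }
    public List<UserRecommendation> getFriendRecommendations(Long userId) {
        // Combine the results of several algorithms, then deduplicate and rank them
        List<UserRecommendation> foaf = repo.recommendFriends(userId);
        List<UserRecommendation> commonServer = repo.recommendByCommonServers(userId);
        return mergeAndRank(foaf, commonServer);
    }
    public List<ServerRecommendation> getServerRecommendations(Long userId) {
        // Combine friend-based recommendations, popularity, and tag similarity
        // ...
    }
}
```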
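The `mergeAndRank` helper called in the service is not defined in this document. One possible shape is sketched below under stated assumptions: candidates are deduplicated by user ID, scores from different algorithms are simply summed, and reasons are concatenated. The `Candidate` record is a trimmed, hypothetical stand-in for `UserRecommendation`, and the `limit` parameter is an addition for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative merge-and-rank step (hypothetical; the summing strategy is an assumption). */
final class MergeAndRankSketch {

    /** Trimmed stand-in for the UserRecommendation DTO. */
    record Candidate(long userId, int score, String reason) {}

    @SafeVarargs
    static List<Candidate> mergeAndRank(int limit, List<Candidate>... sources) {
        Map<Long, Candidate> byUser = new LinkedHashMap<>();
        for (List<Candidate> source : sources) {
            for (Candidate c : source) {
                // The same user found by several algorithms: add the scores, keep both reasons.
                byUser.merge(c.userId(), c, (x, y) ->
                        new Candidate(x.userId(), x.score() + y.score(), x.reason() + "; " + y.reason()));
            }
        }
        List<Candidate> ranked = new ArrayList<>(byUser.values());
        // Highest combined score first; ties broken by user ID for a stable order.
        ranked.sort(Comparator.comparingInt(Candidate::score).reversed()
                .thenComparingLong(Candidate::userId));
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```

Summing raw counts treats one mutual friend and one shared server as equally strong signals; a weighted sum would be a natural refinement if the signals should count differently.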
### DTO
```java
public record UserRecommendation(
Long userId,
String username,
String nickname,
String avatarUrl,
    Integer score, // recommendation score
    String reason // recommendation reason, e.g. "3 mutual friends"
) {}
public record ServerRecommendation(
Long serverId,
String serverName,
String description,
String iconUrl,
Integer score,
    String reason // e.g. "5 friends have joined"
) {}
```
---
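## Caching Strategy
### Caching Popular Recommendations in Redis
```java
// Popular servers: cached for 30 minutes.
// Spring's @Cacheable has no ttl attribute; the TTL is assumed to be configured per cache name
// on the cache manager (for example with RedisCacheConfiguration.entryTtl).
@Cacheable(cacheNames = "recommendations:hot-servers")
List<ServerRecommendation> getHotServers();
// Per-user personalized recommendations: cached for 5 minutes, keyed by userId
@Cacheable(cacheNames = "recommendations:user-friends", key = "#userId")
List<UserRecommendation> getFriendRecommendations(Long userId);
```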
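### Redisson Distributed Lock to Prevent Duplicate Computation
```java
RLock lock = redisson.getLock("recommendation:compute:" + userId);
boolean acquired = false;
try {
    // Wait up to 1 second for the lock; the lease expires automatically after 10 seconds
    acquired = lock.tryLock(1, 10, TimeUnit.SECONDS);
    if (acquired) {
        // Compute the recommendation results
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag
} finally {
    // Unlock only if this thread holds the lock; otherwise unlock() throws IllegalMonitorStateException
    if (acquired && lock.isHeldByCurrentThread()) {
        lock.unlock();
    }
}
```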
---
## Score Weight Tuning
### Combined Recommendation Example
```cypher
// Final score = weighted sum of base score, freshness bonus, and activity score
WITH recommendation,
baseScore,
CASE
WHEN lastActivity > datetime() - duration('P1D') THEN 10
WHEN lastActivity > datetime() - duration('P7D') THEN 5
ELSE 0
END AS freshnessBonus,
activityScore
WITH recommendation,
baseScore * 0.5 + freshnessBonus * 0.2 + activityScore * 0.3 AS finalScore
ORDER BY finalScore DESC
```
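The same scoring rule can be checked outside the database. The sketch below mirrors the `CASE` expression and the 0.5 / 0.2 / 0.3 weights in plain Java; the class and method names are hypothetical, and passing `now` explicitly is a choice made to keep the example deterministic.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative version of the combined score (hypothetical helper names). */
final class FinalScoreSketch {

    /** 10 if active within the last day, 5 if within the last 7 days, otherwise 0. */
    static int freshnessBonus(Instant lastActivity, Instant now) {
        Duration age = Duration.between(lastActivity, now);
        if (age.compareTo(Duration.ofDays(1)) < 0) {
            return 10;
        }
        if (age.compareTo(Duration.ofDays(7)) < 0) {
            return 5;
        }
        return 0;
    }

    /** baseScore * 0.5 + freshnessBonus * 0.2 + activityScore * 0.3 */
    static double finalScore(double baseScore, double activityScore, Instant lastActivity, Instant now) {
        return baseScore * 0.5 + freshnessBonus(lastActivity, now) * 0.2 + activityScore * 0.3;
    }
}
```

For example, a base score of 8, an activity score of 4, and last activity two hours ago give 8 × 0.5 + 10 × 0.2 + 4 × 0.3 = 7.2.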
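### Configurable Weights
```java
@ConfigurationProperties(prefix = "recommendation")
public class RecommendationConfig {
    // Defaults; overridable through recommendation.weights.* properties.
    // A mutable map with a getter and setter lets property binding replace the values.
    private Map<String, Double> weights = new HashMap<>(Map.of(
        "mutualFriends", 0.6,
        "commonServers", 0.4,
        "recency", 0.2
    ));
    public Map<String, Double> getWeights() { return weights; }
    public void setWeights(Map<String, Double> weights) { this.weights = weights; }
}
```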
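How the configured weights are applied is not spelled out here. A minimal sketch, assuming each signal is looked up by the same key as its weight and missing signals count as 0, could look like this (the class name and the `signals` map are hypothetical):

```java
import java.util.Map;

/** Illustrative weighted sum over named signals (hypothetical helper). */
final class WeightedScoreSketch {

    /** Returns the sum of weight * signal over all configured weights; missing signals count as 0. */
    static double score(Map<String, Double> weights, Map<String, Double> signals) {
        double total = 0.0;
        for (Map.Entry<String, Double> w : weights.entrySet()) {
            total += w.getValue() * signals.getOrDefault(w.getKey(), 0.0);
        }
        return total;
    }
}
```

With the default weights, a candidate with 3 mutual friends and 2 shared servers scores 0.6 × 3 + 0.4 × 2 = 2.6. Note that the three default weights add up to 1.2, so the result is not a normalized average.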
---
## Performance Optimization
### 1. Index Optimization
```cypher
CREATE INDEX user_id_idx FOR (u:User) ON (u.user_id);
CREATE INDEX server_id_idx FOR (s:Server) ON (s.server_id);
CREATE INDEX message_created_at_idx FOR (m:Message) ON (m.created_at);
```
### 2. Query Optimization
- Use `LIMIT` to cap the number of returned results
- Filter early with `WITH` ... `WHERE` clauses
- Avoid deeply nested `OPTIONAL MATCH` clauses
- Use `count(DISTINCT x)` instead of `size(collect(DISTINCT x))`
### 3. Batch Computation
```java
// Background job that precomputes popular recommendations
@Scheduled(cron = "0 */30 * * * ?") // every 30 minutes
public void preComputeHotRecommendations() {
    List<ServerRecommendation> hot = computeHotServers();
    redisTemplate.opsForValue().set("hot:servers", hot, 30, TimeUnit.MINUTES);
}
```
---
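## Cold-Start Strategy
### New Users (No History)
1. **Recommend popular servers**: use the PageRank-style popularity ranking (Algorithm 4)
2. **Recommend servers by tag**: based on the interest tags selected at sign-up
3. **Recommend official servers**: a system-curated recommendation list
```cypher
// Default recommendations for new users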
MATCH (server:Server)
WHERE server.is_public = true
AND server.is_featured = true
ORDER BY server.member_count DESC
LIMIT 5
RETURN server
```
---
## Ensuring Diversity
### Avoiding Homogeneous Recommendations
```cypher
// Inject diversity into the results with a small random perturbation
WITH recommendations
UNWIND recommendations AS rec
WITH rec, rand() AS randomness
ORDER BY rec.score * 0.8 + randomness * 0.2 DESC
LIMIT 10
RETURN rec
```
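The same re-ranking idea can be sketched in plain Java. The version below assumes scores are normalized to [0, 1], so that the 20% random component is on the same scale as the 80% score component; the record and class names are hypothetical, and the `Random` is passed in so runs can be made reproducible.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Illustrative diversity re-ranking with random jitter (hypothetical helper). */
final class DiversitySketch {

    /** Hypothetical scored item; score is assumed to lie in [0, 1]. */
    record Scored(String id, double score) {}

    private record Keyed(Scored rec, double key) {}

    static List<Scored> rerank(List<Scored> recs, int limit, Random rng) {
        List<Keyed> keyed = new ArrayList<>();
        for (Scored rec : recs) {
            // 80% original score, 20% uniform noise in [0, 1), as in the Cypher fragment.
            keyed.add(new Keyed(rec, rec.score() * 0.8 + rng.nextDouble() * 0.2));
        }
        keyed.sort(Comparator.comparingDouble(Keyed::key).reversed());
        List<Scored> out = new ArrayList<>();
        for (Keyed k : keyed.subList(0, Math.min(limit, keyed.size()))) {
            out.add(k.rec());
        }
        return out;
    }
}
```

Items whose scores differ by more than 0.25 can never swap places under this rule, so the jitter only reshuffles candidates that are already close in score.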
---
## A/B Testing Support
### Algorithm Versioning
```java
public enum RecommendationStrategy {
COLLABORATIVE_FILTERING,
CONTENT_BASED,
HYBRID
}
public List<Recommendation> getRecommendations(Long userId, RecommendationStrategy strategy) {
return switch (strategy) {
case COLLABORATIVE_FILTERING -> collaborativeFiltering(userId);
case CONTENT_BASED -> contentBased(userId);
case HYBRID -> hybrid(userId);
};
}
```
---
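## Monitoring Metrics
### Key Metrics
- **Click-through rate (CTR)**: share of shown recommendations that are clicked
- **Conversion rate**: share of recommendations that lead to an actual join or friend add
- **Coverage**: share of all items that appear in recommendation results
- **Diversity**: a diversity index of the recommendation lists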
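As a rough sketch of how the first three metrics could be derived from logged events, the code below uses the action values `"shown"`, `"clicked"`, and `"joined"`. The `Event` record, the choice of impressions as the denominator, and the `catalogSize` parameter are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative metric calculations over recommendation events (hypothetical helper). */
final class RecommendationMetricsSketch {

    /** Hypothetical event shape: which item, and what happened to it. */
    record Event(String itemId, String action) {}

    private static long count(List<Event> events, String action) {
        return events.stream().filter(e -> e.action().equals(action)).count();
    }

    /** Clicks divided by impressions. */
    static double clickThroughRate(List<Event> events) {
        long shown = count(events, "shown");
        return shown == 0 ? 0.0 : (double) count(events, "clicked") / shown;
    }

    /** Joins divided by impressions (the denominator is an assumption). */
    static double conversionRate(List<Event> events) {
        long shown = count(events, "shown");
        return shown == 0 ? 0.0 : (double) count(events, "joined") / shown;
    }

    /** Distinct items shown at least once, divided by the total number of recommendable items. */
    static double coverage(List<Event> events, int catalogSize) {
        Set<String> shownItems = new HashSet<>();
        for (Event e : events) {
            if (e.action().equals("shown")) {
                shownItems.add(e.itemId());
            }
        }
        return catalogSize == 0 ? 0.0 : (double) shownItems.size() / catalogSize;
    }
}
```

Four impressions, two clicks, and one join give a CTR of 0.5 and a conversion rate of 0.25.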
### Logging
```cypher
// Record recommendation impressions and clicks
CREATE (e:RecommendationEvent {
user_id: $userId,
item_id: $itemId,
item_type: $itemType, // "server", "user", "channel"
algorithm: $algorithm,
score: $score,
action: $action, // "shown", "clicked", "joined"
timestamp: datetime()
})
```
---
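## Future Extensions
### 1. Real-Time Recommendations
- Update user behavior in real time via Redisson Stream
- Update recommendation results incrementally
### 2. Deep Learning Enhancements
- Use a Graph Neural Network (GNN) to extract features
- Combine collaborative filtering with content features
### 3. Cross-Platform Recommendations
- Aggregate user behavior across devices
- Apply a unified recommendation strategy
---
## Implementation Priorities
### P0 (Core Recommendations)
- ✅ Friend-of-friend recommendations
- ✅ Popular server recommendations
- ✅ Active channel recommendations
### P1 (Enhanced Recommendations)
- Friend recommendations via shared servers
- Similar-interest server recommendations
- Trending topic recommendations
### P2 (Advanced Algorithms)
- Jaccard similarity
- PageRank / community detection
- Real-time personalized recommendations
---
**Design principles**:
- 🚀 Graph algorithms first; no model training required
- 💾 Make full use of the Neo4j relationship network
- ⚡ Cache hot data and precompute asynchronously
- 📊 Monitorable, tunable, and A/B-testable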