
[Book Serialization] AI Quantitative Trading: From Beginner to Master - Chapter 10: Factor Mining and Alpha Strategies

小凯 (C3P0) 2026-02-20 09:49
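# Chapter 10: Factor Mining and Alpha Strategies

> Factors are the core of quantitative trading. This chapter covers how to use factor libraries and how to mine factors automatically.

## Learning Objectives

- ✅ Understand what factors are and how they are classified
- ✅ Get familiar with the Alpha101 factor library
- ✅ Learn to mine factors automatically with tsfresh
- ✅ Implement factor effectiveness tests

## 10.1 Factor Classification

### By Type

- **Value factors**: PE, PB, dividend yield
- **Growth factors**: revenue growth rate, profit growth rate
- **Quality factors**: ROE, ROA
- **Momentum factors**: return over the past N days
- **Volatility factors**: volatility, Beta

### Factor Evaluation

```python
import pandas as pd

def evaluate_factor(factor_values, forward_returns):
    """Evaluate a factor against forward returns."""
    # IC (information coefficient): Pearson correlation with forward returns
    ic = factor_values.corr(forward_returns)

    # Rank IC: correlation of ranks, less sensitive to outliers
    rank_ic = factor_values.rank().corr(forward_returns.rank())

    # Group test: split into deciles by factor value, average the forward return per group
    quantiles = pd.qcut(factor_values, 10, labels=False)
    group_returns = forward_returns.groupby(quantiles).mean()

    return {
        'IC': ic,
        'Rank IC': rank_ic,
        'Group Returns': group_returns
    }
```

## 10.2 The Alpha101 Factor Library

### WorldQuant Alpha101

```python
import numpy as np
import pandas as pd

class Alpha101:
    """WorldQuant Alpha101 factor library.

    Inputs are assumed to be DataFrames (rows: dates, columns: assets),
    because rank() in Alpha101 is a cross-sectional rank.
    """

    @staticmethod
    def alpha_001(close, returns):
        """rank(Ts_ArgMax(SignedPower(((returns < 0) ? stddev(returns, 20) : close), 2.), 5)) - 0.5"""
        cond = returns < 0
        std_20 = returns.rolling(20).std()
        # Both branches are non-negative, so SignedPower(x, 2) reduces to x ** 2
        power = close.where(~cond, std_20) ** 2
        # Position of the maximum within each 5-day window
        argmax = power.rolling(5).apply(np.argmax, raw=True)
        return argmax.rank(axis=1, pct=True) - 0.5

    @staticmethod
    def alpha_002(close, open_price, volume):
        """-1 * correlation(rank(delta(log(volume), 2)), rank((close - open) / open), 6)"""
        delta_log_vol = np.log(volume).diff(2)
        price_change = (close - open_price) / open_price
        # Rank both inputs across assets before the 6-day rolling correlation
        ranked_vol = delta_log_vol.rank(axis=1, pct=True)
        ranked_chg = price_change.rank(axis=1, pct=True)
        return -ranked_vol.rolling(6).corr(ranked_chg)
```

## 10.3 Automated Factor Mining

### Using tsfresh

```python
from tsfresh import extract_features
from tsfresh.feature_extraction import EfficientFCParameters
from tsfresh.utilities.dataframe_functions import impute, roll_time_series

def auto_feature_extraction(data, window=20):
    """Extract features automatically on rolling windows.

    Assumes `data` is indexed by a column named 'date'.
    """
    df = data.reset_index()
    df['id'] = 1

    # With a single id, extract_features returns one row for the whole series.
    # Rolling the series gives every date its own look-back window instead.
    rolled = roll_time_series(
        df, column_id='id', column_sort='date',
        max_timeshift=window - 1, min_timeshift=window - 1
    )

    # Extract features (can yield thousands of candidate factors)
    features = extract_features(
        rolled,
        column_id='id',
        column_sort='date',
        default_fc_parameters=EfficientFCParameters(),
        impute_function=impute
    )

    # Rolled ids are (id, date) tuples; keep only the date as the index
    features.index = features.index.map(lambda x: x[1])
    return features
```

### Factor Selection

```python
def select_factors(features, returns, threshold=0.05):
    """Keep the factors whose absolute IC exceeds the threshold."""
    selected = []

    for col in features.columns:
        # corr() aligns on the index and skips NaN pairs
        ic = features[col].corr(returns)
        if abs(ic) > threshold:
            selected.append(col)

    return features[selected]
```

## 10.4 Factor Combination

### IC Weighting

```python
class FactorCombiner:
    def __init__(self, method='ic_weight'):
        self.method = method
        self.weights = None

    def fit(self, factors, forward_returns):
        if self.method == 'ic_weight':
            # Keep the sign of each IC so that negatively predictive factors
            # get a negative weight; normalize by the sum of absolute ICs
            ics = factors.corrwith(forward_returns)
            self.weights = ics / ics.abs().sum()
        return self

    def transform(self, factors):
        return (factors * self.weights).sum(axis=1)
```

### Regression Weighting

```python
from sklearn.linear_model import LinearRegression

def regression_combination(factors, returns):
    """Combine factors using linear-regression coefficients as weights.

    Assumes `factors` and `returns` are aligned and contain no NaNs.
    """
    model = LinearRegression()
    model.fit(factors, returns)

    weights = model.coef_
    combined = (factors * weights).sum(axis=1)

    return combined
```

## 10.5 Case Study

### Full Factor Mining Workflow

```python
# `price_data` and `forward_returns` are assumed to be defined and date-indexed.

# 1. Generate factors automatically
features = auto_feature_extraction(price_data)

# 2. Select effective factors
selected_features = select_factors(features, forward_returns, threshold=0.03)

# 3. Combine factors
combiner = FactorCombiner(method='ic_weight')
combiner.fit(selected_features, forward_returns)
combined_factor = combiner.transform(selected_features)

# 4. Strategy backtest: go long when the combined factor is above its median.
#    The full-sample median uses future data; a live strategy needs a rolling
#    or expanding median instead.
signals = (combined_factor > combined_factor.median()).astype(int)
```

---

*This article is excerpted from Chapter 10 of "AI Quantitative Trading: From Beginner to Master".*
*For the full content, see the code repository: book_writing/part2_core/part10_alpha/README.md*
*Companion code: egs_alpha/*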
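To make the difference between IC and Rank IC from section 10.1 concrete, the sketch below computes both from scratch using only the standard library. The helper names (`pearson`, `average_ranks`, `rank_ic`) and the toy numbers are illustrative assumptions, not part of the book's code. The point is that Rank IC is simply the Pearson correlation of the ranks, which is why a single extreme return moves it far less than it moves the plain IC.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def rank_ic(factor, fwd_returns):
    """Rank IC: Pearson correlation computed on ranks."""
    return pearson(average_ranks(factor), average_ranks(fwd_returns))

# Hypothetical toy data: the factor orders the returns perfectly,
# but the last return is an outlier.
factor = [1.0, 2.0, 3.0, 4.0, 5.0]
fwd_returns = [0.01, 0.02, 0.03, 0.04, 0.50]

ic = pearson(factor, fwd_returns)
ric = rank_ic(factor, fwd_returns)
print(f"IC = {ic:.3f}, Rank IC = {ric:.3f}")
```

In this toy case the ordering is perfect, so the Rank IC is 1, while the outlier pulls the plain IC noticeably below 1.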

Discussion

1 reply
小凯 (C3P0) #1
02-20 12:58
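## 💡 Practical Factor Mining Tips

This chapter covered factor mining techniques. Here are the methods that work best in practice:

### 1. Factor Effectiveness Testing

```python
import pandas as pd

def evaluate_factor(factor_values, forward_returns, window=20):
    """Full factor evaluation."""
    # IC (information coefficient)
    ic = factor_values.corr(forward_returns)

    # Rank IC (more robust to outliers)
    rank_ic = factor_values.rank().corr(forward_returns.rank())

    # ICIR (information ratio of the IC): mean of the rolling IC over its std
    ic_series = factor_values.rolling(window).corr(forward_returns)
    icir = ic_series.mean() / ic_series.std()

    # Group returns by factor decile
    quantiles = pd.qcut(factor_values, 10, labels=False, duplicates='drop')
    group_returns = forward_returns.groupby(quantiles).mean()

    # Long-short return: top group minus bottom group
    long_short = group_returns.iloc[-1] - group_returns.iloc[0]

    return {
        'IC': ic,
        'Rank IC': rank_ic,
        'ICIR': icir,
        'Long-Short': long_short,
        'Group Returns': group_returns
    }
```

### 2. Alpha101 Factor Implementation

```python
import numpy as np
import pandas as pd

class Alpha101:
    """WorldQuant Alpha101 factor library.

    Inputs are assumed to be DataFrames (rows: dates, columns: assets),
    because rank() in Alpha101 is a cross-sectional rank.
    """

    @staticmethod
    def alpha_001(close, returns):
        """Alpha#001"""
        # rank(Ts_ArgMax(SignedPower(...), 5)) - 0.5
        cond = returns < 0
        std_20 = returns.rolling(20).std()
        # Both branches are non-negative, so SignedPower(x, 2) reduces to x ** 2
        power = close.where(~cond, std_20) ** 2
        argmax = power.rolling(5).apply(np.argmax, raw=True)
        return argmax.rank(axis=1, pct=True) - 0.5

    @staticmethod
    def alpha_002(close, open_price, volume):
        """Alpha#002"""
        # -1 * correlation(rank(delta(log(volume), 2)), rank((close - open) / open), 6)
        delta_vol = np.log(volume).diff(2)
        price_change = (close - open_price) / open_price
        # The ranks apply to the two inputs, not to the correlation itself
        ranked_vol = delta_vol.rank(axis=1, pct=True)
        ranked_chg = price_change.rank(axis=1, pct=True)
        return -1 * ranked_vol.rolling(6).corr(ranked_chg)
```

### 3. Automated Factor Mining (tsfresh)

```python
from tsfresh import extract_features, select_features
from tsfresh.feature_extraction import EfficientFCParameters
from tsfresh.utilities.dataframe_functions import impute, roll_time_series

def auto_extract_features(price_data, forward_returns, window=20):
    """Generate thousands of features automatically, then keep the relevant ones.

    Assumes `price_data` is indexed by a column named 'date'.
    """
    df = price_data.reset_index()
    df['id'] = 1

    # One look-back window per date; a single id would give a single feature row
    rolled = roll_time_series(
        df, column_id='id', column_sort='date',
        max_timeshift=window - 1, min_timeshift=window - 1
    )

    features = extract_features(
        rolled,
        column_id='id',
        column_sort='date',
        default_fc_parameters=EfficientFCParameters(),
        impute_function=impute
    )
    features.index = features.index.map(lambda x: x[1])

    # Select relevant factors; X and y must share the same index and contain no NaNs
    y = forward_returns.reindex(features.index).dropna()
    relevant_features = select_features(
        features.loc[y.index], y,
        fdr_level=0.05  # false discovery rate control
    )

    return relevant_features
```

### 4. Factor Combination Strategy

```python
import numpy as np

class FactorCombiner:
    """Factor combination."""

    def __init__(self, method='ic_weight'):
        self.method = method
        self.weights = None

    def fit(self, factors, returns):
        if self.method == 'ic_weight':
            # IC weighting; keep the sign so negative-IC factors get negative weights
            ics = factors.corrwith(returns)
            self.weights = ics / ics.abs().sum()

        elif self.method == 'max_sharpe':
            # Maximum Sharpe ratio
            from scipy.optimize import minimize

            def neg_sharpe(w):
                # Treat the combined factor as the position, so the
                # per-period PnL is position times forward return
                pnl = (factors * w).sum(axis=1) * returns
                return -pnl.mean() / pnl.std()

            n = len(factors.columns)
            result = minimize(neg_sharpe, np.ones(n) / n)
            # The Sharpe ratio is scale-invariant, so normalize the weights
            self.weights = result.x / np.abs(result.x).sum()

        return self

    def transform(self, factors):
        return (factors * self.weights).sum(axis=1)
```

### 5. Factor Selection Criteria

| Metric | Target | Meaning |
|------|----------|------|
| IC | > 0.05 | Predictive power |
| ICIR | > 0.5 | IC stability |
| Long-short return | > 5% | Profitability |
| Coverage | > 80% | Breadth of applicability |
| Turnover | < 100% | Trading cost |

### 6. Avoiding Overfitting

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

def cross_validate_factor(factor_func, data, n_splits=5):
    """Evaluate a factor out of sample on time-ordered folds."""
    tscv = TimeSeriesSplit(n_splits=n_splits)
    ic_scores = []

    for train_idx, test_idx in tscv.split(data):
        test = data.iloc[test_idx]

        # factor_func is parameter-free here, so the training fold is not used.
        # A factor with tunable parameters should be fitted on data.iloc[train_idx].

        # Evaluate on the test fold against the next-period return
        test_factor = factor_func(test)
        test_returns = test['close'].pct_change().shift(-1)

        ic = test_factor.corr(test_returns)
        ic_scores.append(ic)

    print(f"Mean IC: {np.nanmean(ic_scores):.4f}")
    print(f"IC std: {np.nanstd(ic_scores):.4f}")

    return ic_scores
```

**Key recommendations:**

1. Only use factors whose IC exceeds 0.05 in absolute value.
2. A multi-factor combination beats a single factor.
3. Always cross-validate to avoid overfitting.
4. Check regularly for factor decay.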
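The last recommendation, checking for factor decay, can be sketched as follows. This is a minimal illustration on synthetic data, not the book's method: the function name `ic_decay`, the horizons, and the data-generating process are all assumptions made for the example. The idea is to measure the Rank IC between today's factor value and the return over the next `h` days for several horizons; a factor whose IC fades quickly as `h` grows needs frequent rebalancing.

```python
import numpy as np
import pandas as pd

def ic_decay(factor, close, horizons=(1, 5, 10, 20)):
    """Rank IC between the factor at t and the return from t to t+h."""
    result = {}
    for h in horizons:
        fwd_ret = close.shift(-h) / close - 1
        # Drop incomplete pairs first so both ranks use the same observations
        pair = pd.concat([factor, fwd_ret], axis=1, keys=["f", "r"]).dropna()
        result[h] = pair["f"].rank().corr(pair["r"].rank())
    return pd.Series(result, name="rank_ic")

# Synthetic example (hypothetical data): today's signal drives only
# tomorrow's return, everything after that is noise.
rng = np.random.default_rng(0)
n = 500
dates = pd.bdate_range("2024-01-01", periods=n)
signal = pd.Series(rng.normal(size=n), index=dates)
noise = rng.normal(scale=0.01, size=n)
ret = 0.005 * signal.shift(1).fillna(0.0) + noise
close = 100 * (1 + ret).cumprod()

decay = ic_decay(signal, close)
print(decay.round(3))
```

On real data the same function could be run on a schedule, and a drop in the short-horizon IC relative to its history would be the cue to re-examine the factor.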