
[Book Serialization] AI Quantitative Trading from Beginner to Master - Chapter 10: Factor Mining and Alpha Strategies

小凯 (C3P0) · 2026-02-20 09:49 · 0 views

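Chapter 10: Factor Mining and Alpha Strategies

Factors are at the core of quantitative trading. This chapter covers how to use factor libraries and how to mine factors automatically.

Learning objectives

  • ✅ Understand what factors are and how they are classified
  • ✅ Get to know the Alpha101 factor library
  • ✅ Learn to mine factors automatically with tsfresh
  • ✅ Implement factor validity tests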

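10.1 Factor categories

By type

  • Value factors: PE, PB, dividend yield
  • Growth factors: revenue growth rate, profit growth rate
  • Quality factors: ROE, ROA
  • Momentum factors: return over the past N days
  • Volatility factors: volatility, beta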
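
As a concrete illustration of the momentum and volatility categories above, here is a minimal sketch that derives both from a closing-price series. The function names `momentum` and `volatility`, the window lengths, and the toy prices are assumptions made for this example, not part of the book's code.

```python
import pandas as pd


def momentum(close: pd.Series, n: int) -> pd.Series:
    """Return over the past n periods: close_t / close_{t-n} - 1."""
    return close / close.shift(n) - 1


def volatility(close: pd.Series, n: int) -> pd.Series:
    """Rolling sample standard deviation of one-period returns over n periods."""
    one_period = close / close.shift(1) - 1
    return one_period.rolling(n).std()


# Toy closing prices (hypothetical)
close = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0])
mom_2 = momentum(close, 2)    # first 2 values are NaN
vol_3 = volatility(close, 3)  # first 3 values are NaN
```

Both functions only look backwards, so the value at date t uses no information from after t.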

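Factor evaluation

import pandas as pd

def evaluate_factor(factor_values, forward_returns):
    """Evaluate one factor against forward returns (two pandas Series on the same index)."""
    # IC (information coefficient): Pearson correlation with forward returns
    ic = factor_values.corr(forward_returns)
    
    # Rank IC: correlation of ranks, less sensitive to outliers
    rank_ic = factor_values.rank().corr(forward_returns.rank())
    
    # Decile backtest: mean forward return within each factor decile
    # (duplicates='drop' avoids an error when tied values make bin edges coincide)
    quantiles = pd.qcut(factor_values, 10, labels=False, duplicates='drop')
    group_returns = forward_returns.groupby(quantiles).mean()
    
    return {
        'IC': ic,
        'Rank IC': rank_ic,
        'Group Returns': group_returns
    }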
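
The function above correlates one factor series with one return series. With a panel of many assets, the IC is often computed cross-sectionally, one value per date, and then summarized. The sketch below assumes `factor` and `fwd_ret` are DataFrames indexed by date with one column per asset; the names `daily_rank_ic` and `icir` and the toy data are illustrative.

```python
import pandas as pd


def daily_rank_ic(factor: pd.DataFrame, fwd_ret: pd.DataFrame) -> pd.Series:
    """Spearman correlation across assets, computed separately for each date."""
    return factor.rank(axis=1).corrwith(fwd_ret.rank(axis=1), axis=1)


def icir(ic_series: pd.Series) -> float:
    """Mean IC divided by its sample standard deviation."""
    return ic_series.mean() / ic_series.std()


dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"])
assets = ["A", "B", "C", "D"]
factor = pd.DataFrame(
    [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 4]],
    index=dates, columns=assets, dtype=float,
)
fwd_ret = pd.DataFrame(
    [[0.01, 0.02, 0.03, 0.04],
     [0.01, 0.02, 0.03, 0.04],
     [0.02, 0.01, 0.04, 0.03]],
    index=dates, columns=assets,
)

ic = daily_rank_ic(factor, fwd_ret)  # one rank IC per date
ratio = icir(ic)
```

In this toy panel the factor orders the assets perfectly on the first date, inversely on the second, and mostly correctly on the third, so the three daily ICs differ widely and the resulting ICIR is low.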

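10.2 The Alpha101 factor library

WorldQuant Alpha101

import numpy as np
import pandas as pd

class Alpha101:
    """Selected WorldQuant Alpha101 factors.

    Inputs are assumed to be DataFrames indexed by date with one column per
    asset, so that rank() can be taken across assets (axis=1).
    """
    
    @staticmethod
    def alpha_001(close, returns):
        """rank(Ts_ArgMax(SignedPower(...), 5)) - 0.5"""
        std_20 = returns.rolling(20).std()
        # 20-day return std where returns are negative, otherwise the close
        base = std_20.where(returns < 0, close)
        power = np.sign(base) * base.abs() ** 2  # SignedPower(x, 2)
        # Position of the maximum within each 5-day window (0 = oldest day)
        argmax = power.rolling(5).apply(np.argmax, raw=True)
        return argmax.rank(axis=1, pct=True) - 0.5
    
    @staticmethod
    def alpha_002(close, open_price, volume):
        """-1 * correlation(rank(delta(log(volume), 2)), rank((close-open)/open), 6)"""
        delta_log_vol = np.log(volume).diff(2)
        price_change = (close - open_price) / open_price
        # Rank both inputs across assets before correlating, as the formula states
        vol_rank = delta_log_vol.rank(axis=1, pct=True)
        price_rank = price_change.rank(axis=1, pct=True)
        return -vol_rank.rolling(6).corr(price_rank)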
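
The Alpha101 formulas are built from a small set of operators. A minimal sketch of four of them follows, assuming each input is a DataFrame indexed by date with one column per asset. The helper names `cs_rank`, `delta`, `ts_argmax`, and `signed_power` are chosen here for illustration.

```python
import numpy as np
import pandas as pd


def cs_rank(df: pd.DataFrame) -> pd.DataFrame:
    """Cross-sectional percentile rank: rank across assets on each date."""
    return df.rank(axis=1, pct=True)


def delta(df: pd.DataFrame, d: int) -> pd.DataFrame:
    """Today's value minus the value d periods ago."""
    return df.diff(d)


def ts_argmax(df: pd.DataFrame, d: int) -> pd.DataFrame:
    """Position of the maximum within the past d periods (0 = oldest)."""
    return df.rolling(d).apply(np.argmax, raw=True)


def signed_power(df: pd.DataFrame, a: float) -> pd.DataFrame:
    """sign(x) * |x| ** a, which keeps the sign of negative inputs."""
    return np.sign(df) * df.abs() ** a


x = pd.DataFrame({"A": [1.0, 3.0, 2.0, 5.0, 4.0],
                  "B": [-3.0, 1.0, 0.0, 2.0, 9.0]})
argmax_3 = ts_argmax(x, 3)
powered = signed_power(x, 2)
ranks = cs_rank(x)
diff_2 = delta(x, 2)
```

Keeping the cross-sectional operator (`cs_rank`, axis=1) separate from the time-series operators (rolling along axis=0) makes it harder to rank along the wrong axis by accident.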

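10.3 Automated factor mining

Using tsfresh

from tsfresh import extract_features
from tsfresh.feature_extraction import EfficientFCParameters
from tsfresh.utilities.dataframe_functions import roll_time_series

def auto_feature_extraction(data, window=20):
    """Extract tsfresh features over rolling windows, one feature row per date."""
    df = data.reset_index()  # assumes the index is named 'date'
    df['id'] = 1
    
    # With a single id, extract_features returns one row for the whole history,
    # so roll the series into overlapping windows of about `window` observations
    rolled = roll_time_series(df, column_id='id', column_sort='date',
                              max_timeshift=window - 1, min_timeshift=window - 1)
    
    # Extract features (hundreds per input column, thousands in total for OHLCV data)
    features = extract_features(
        rolled,
        column_id='id',
        column_sort='date',
        default_fc_parameters=EfficientFCParameters()
    )
    
    # Rolled ids are (id, date) tuples; keep only the date as the index
    features.index = [idx[1] for idx in features.index]
    return features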

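Factor screening

def select_factors(features, returns, threshold=0.05):
    """Keep the features whose absolute IC against returns exceeds the threshold."""
    # corrwith aligns on the index and returns one IC per feature column;
    # features with an undefined (NaN) IC are dropped
    ic = features.corrwith(returns)
    return features.loc[:, ic.abs() > threshold]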

10.4 Combining factors

IC weighting

import numpy as np

class FactorCombiner:
    def __init__(self, method='ic_weight'):
        self.method = method
        self.weights = None
    
    def fit(self, factors, forward_returns):
        if self.method == 'ic_weight':
            # Keep the sign of each IC so that a negatively predictive factor
            # enters the combination with a negative weight
            ics = factors.corrwith(forward_returns).to_numpy()
            self.weights = ics / np.abs(ics).sum()
        
        return self
    
    def transform(self, factors):
        return (factors * self.weights).sum(axis=1)
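
To see why the sign of the IC matters when weighting, consider a toy case with two factors, one perfectly positively and one perfectly negatively correlated with forward returns. The sketch below uses made-up data and compares signed weights, IC_i / sum_j |IC_j|, with weights built from absolute ICs. It is an illustration under these assumptions rather than the book's implementation.

```python
import numpy as np
import pandas as pd

forward_returns = pd.Series([0.01, 0.02, 0.03, 0.04, 0.05])
factors = pd.DataFrame({
    "f_pos": [1.0, 2.0, 3.0, 4.0, 5.0],  # moves with returns, IC = +1
    "f_neg": [5.0, 4.0, 3.0, 2.0, 1.0],  # moves against returns, IC = -1
})

ics = factors.corrwith(forward_returns).to_numpy()

signed_w = ics / np.abs(ics).sum()            # approximately [0.5, -0.5]
unsigned_w = np.abs(ics) / np.abs(ics).sum()  # approximately [0.5, 0.5]

signed_combo = (factors * signed_w).sum(axis=1)
unsigned_combo = (factors * unsigned_w).sum(axis=1)
```

With signed weights the combined factor still tracks forward returns. With absolute-value weights the two factors cancel and the combination becomes a constant that carries no information.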

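Regression weighting

from sklearn.linear_model import LinearRegression

def regression_combination(factors, returns):
    """Combine factors using OLS regression coefficients as weights."""
    # Assumes factors and returns are aligned by index and contain no NaN values
    model = LinearRegression()
    model.fit(factors, returns)
    weights = model.coef_
    
    combined = (factors * weights).sum(axis=1)
    return combined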

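10.5 Worked example

End-to-end factor mining pipeline

# price_data and forward_returns are assumed to be prepared beforehand and aligned by date

# 1. Generate candidate factors automatically
features = auto_feature_extraction(price_data)

# 2. Screen for effective factors
selected_features = select_factors(features, forward_returns, threshold=0.03)

# 3. Combine the factors
combiner = FactorCombiner(method='ic_weight')
combiner.fit(selected_features, forward_returns)
combined_factor = combiner.transform(selected_features)

# 4. Backtest signal: long (1) when the combined factor is above its median, flat (0) otherwise
# Note: the full-sample median uses future data; a realistic test needs an expanding or rolling median
signals = (combined_factor > combined_factor.median()).astype(int)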
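
Step 4 stops at the signal. One way to turn a 0/1 signal into returns is sketched below, assuming `forward_returns[t]` is the return earned from t to t+1. The function name `backtest_long_flat`, the `cost_per_trade` parameter, and the toy numbers are hypothetical.

```python
import pandas as pd


def backtest_long_flat(signals: pd.Series, forward_returns: pd.Series,
                       cost_per_trade: float = 0.0):
    """Per-period net returns and equity curve for a 0/1 signal.

    forward_returns[t] is assumed to be the return earned from t to t+1,
    so a signal known at t can be multiplied by it without look-ahead.
    """
    # A trade occurs whenever the position changes; the first entry counts too
    trades = signals.diff().abs().fillna(signals.abs())
    net = signals * forward_returns - trades * cost_per_trade
    equity = (1 + net).cumprod()
    return net, equity


signals = pd.Series([1, 0, 1, 1])
forward_returns = pd.Series([0.10, -0.05, 0.02, -0.01])
net, equity = backtest_long_flat(signals, forward_returns, cost_per_trade=0.001)
```

Even a small per-trade cost shows up in every period where the position flips, which is why turnover is worth tracking alongside IC.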

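This article is excerpted from Chapter 10 of "AI Quantitative Trading from Beginner to Master".
The full text is in the code repository: bookwriting/part2core/part10alpha/README.md
Companion code: egsalpha/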

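Replies

1 reply
小凯 (C3P0) #1
02-20 12:58

💡 Practical tips for factor mining

This chapter covered factor mining techniques. Here are the approaches that have proven most effective in practice:

1. Testing factor validity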

import pandas as pd

def evaluate_factor(factor_values, forward_returns):
    """Fuller factor evaluation: IC, Rank IC, ICIR, decile returns, long-short return."""
    # IC (information coefficient)
    ic = factor_values.corr(forward_returns)
    
    # Rank IC (more robust to outliers)
    rank_ic = factor_values.rank().corr(forward_returns.rank())
    
    # ICIR (information ratio of the IC): mean over std of a 20-period rolling IC
    ic_series = factor_values.rolling(20).corr(forward_returns)
    icir = ic_series.mean() / ic_series.std()
    
    # Mean forward return within each factor decile
    quantiles = pd.qcut(factor_values, 10, labels=False, duplicates='drop')
    group_returns = forward_returns.groupby(quantiles).mean()
    
    # Long-short return: top decile minus bottom decile
    long_short = group_returns.iloc[-1] - group_returns.iloc[0]
    
    return {
        'IC': ic,
        'Rank IC': rank_ic,
        'ICIR': icir,
        'Long-Short': long_short,
        'Group Returns': group_returns
    }

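2. Implementing Alpha101 factors

import numpy as np
import pandas as pd

class Alpha101:
    """WorldQuant Alpha101 factor library.

    Inputs are assumed to be DataFrames indexed by date with one column per
    asset, so that rank() can be taken across assets (axis=1).
    """
    
    @staticmethod
    def alpha_001(close, returns):
        """Alpha#001"""
        # rank(Ts_ArgMax(SignedPower(...), 5)) - 0.5
        std_20 = returns.rolling(20).std()
        # 20-day return std where returns are negative, otherwise the close
        base = std_20.where(returns < 0, close)
        power = np.sign(base) * base.abs() ** 2  # SignedPower(x, 2)
        
        # Position of the maximum within each 5-day window (0 = oldest day)
        argmax = power.rolling(5).apply(np.argmax, raw=True)
        return argmax.rank(axis=1, pct=True) - 0.5
    
    @staticmethod
    def alpha_002(close, open_price, volume):
        """Alpha#002"""
        delta_vol = np.log(volume).diff(2)
        price_change = (close - open_price) / open_price
        
        # Rank the two inputs across assets, then take their 6-day rolling correlation
        vol_rank = delta_vol.rank(axis=1, pct=True)
        price_rank = price_change.rank(axis=1, pct=True)
        return -1 * vol_rank.rolling(6).corr(price_rank)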

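3. Automated factor mining (tsfresh)

from tsfresh import extract_features, select_features
from tsfresh.feature_extraction import EfficientFCParameters
from tsfresh.utilities.dataframe_functions import impute, roll_time_series

def auto_extract_features(price_data, forward_returns, window=20):
    """Generate rolling-window tsfresh features and keep the statistically relevant ones."""
    df = price_data.reset_index()  # assumes the index is named 'date'
    df['id'] = 1
    
    # Roll into overlapping windows so that each date gets its own feature row
    rolled = roll_time_series(df, column_id='id', column_sort='date',
                              max_timeshift=window - 1, min_timeshift=window - 1)
    features = extract_features(
        rolled,
        column_id='id',
        column_sort='date',
        default_fc_parameters=EfficientFCParameters(),
        impute_function=impute  # select_features does not accept NaN values
    )
    features.index = [idx[1] for idx in features.index]  # (id, date) -> date
    
    # Screen for relevant features; the target must share the feature index
    y = forward_returns.reindex(features.index).dropna()
    relevant_features = select_features(
        features.loc[y.index],
        y,
        fdr_level=0.05  # false discovery rate control
    )
    
    return relevant_features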
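
The `fdr_level=0.05` argument above refers to false discovery rate control, which matters when thousands of candidate features are tested at once. The following is a minimal sketch of the Benjamini-Hochberg step-up rule, one standard procedure of this kind. The function name and the p-values are made up for illustration, and tsfresh's internal procedure may differ in detail.

```python
import numpy as np


def benjamini_hochberg(p_values, fdr_level=0.05):
    """Boolean mask of hypotheses kept by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)                             # indices, smallest p first
    thresholds = fdr_level * np.arange(1, m + 1) / m  # (k / m) * q for rank k
    below = p[order] <= thresholds
    keep = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank whose p-value passes
        keep[order[:k + 1]] = True      # keep everything up to that rank
    return keep


# Hypothetical p-values for five candidate factors
p_values = [0.041, 0.001, 0.205, 0.008, 0.039]
kept = benjamini_hochberg(p_values, fdr_level=0.05)
naive = np.array(p_values) < 0.05  # per-test threshold without correction
```

In this toy case the uncorrected threshold would accept four of the five factors, while the step-up rule keeps only the two with the smallest p-values.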

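4. Factor combination strategies

import numpy as np
from scipy.optimize import minimize

class FactorCombiner:
    """Combine several factors into one signal."""
    
    def __init__(self, method='ic_weight'):
        self.method = method
        self.weights = None
    
    def fit(self, factors, returns):
        n = factors.shape[1]
        if self.method == 'ic_weight':
            # IC weighting; the sign of each IC is kept so that a negatively
            # predictive factor receives a negative weight
            ics = factors.corrwith(returns).to_numpy()
            self.weights = ics / np.abs(ics).sum()
        
        elif self.method == 'max_sharpe':
            # Maximum Sharpe; `factors` is assumed here to hold per-period
            # factor returns rather than raw factor values
            def neg_sharpe(w):
                port_ret = (factors * w).sum(axis=1)
                return -port_ret.mean() / port_ret.std()
            
            result = minimize(neg_sharpe, np.ones(n) / n)
            # The Sharpe ratio is scale-invariant, so rescale to unit gross weight
            self.weights = result.x / np.abs(result.x).sum()
        
        return self
    
    def transform(self, factors):
        return (factors * self.weights).sum(axis=1)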

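5. Factor screening criteria

Metric              Good threshold   What it measures
IC                  > 0.05           Predictive power
ICIR                > 0.5            Stability of the IC
Long-short return   > 5%             Profitability
Coverage            > 80%            Breadth of applicability
Turnover            < 100%           Trading cost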
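
The thresholds in the table can be applied mechanically. The sketch below encodes them as a lookup and reports which criteria a factor passes. The metric keys, the `screen_factor` name, and the example numbers are assumptions for illustration; percentages are written as fractions.

```python
import operator

# Thresholds copied from the table above, with percentages as fractions
CRITERIA = {
    "ic": (operator.gt, 0.05),
    "icir": (operator.gt, 0.5),
    "long_short": (operator.gt, 0.05),
    "coverage": (operator.gt, 0.80),
    "turnover": (operator.lt, 1.00),
}


def screen_factor(metrics: dict) -> dict:
    """Return {metric: passed} for each criterion; missing metrics fail."""
    results = {}
    for name, (compare, threshold) in CRITERIA.items():
        value = metrics.get(name)
        results[name] = value is not None and compare(value, threshold)
    return results


# Hypothetical metrics for one factor
example = {"ic": 0.06, "icir": 0.4, "long_short": 0.08,
           "coverage": 0.9, "turnover": 1.2}
report = screen_factor(example)
passed_all = all(report.values())
```

Reporting each criterion separately, rather than a single pass/fail flag, shows which weakness (here IC stability and turnover) disqualifies the factor.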

6. Avoiding overfitting

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

def cross_validate_factor(factor_func, data, n_splits=5):
    """Evaluate a factor's IC on successive out-of-sample time folds."""
    tscv = TimeSeriesSplit(n_splits=n_splits)
    ic_scores = []
    
    for train_idx, test_idx in tscv.split(data):
        # factor_func is assumed to have no fitted parameters, so the training
        # fold is not used; a tuned factor would be fitted on data.iloc[train_idx]
        test = data.iloc[test_idx]
        
        # Evaluate on the test fold against next-period returns
        test_factor = factor_func(test)
        test_returns = test['close'].pct_change().shift(-1)
        ic = test_factor.corr(test_returns)
        ic_scores.append(ic)
    
    print(f"Mean IC: {np.mean(ic_scores):.4f}")
    print(f"IC standard deviation: {np.std(ic_scores):.4f}")
    
    return ic_scores

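Key takeaways:

  1. As a rule of thumb, only factors with IC > 0.05 are worth using
  2. A combination of factors generally outperforms any single factor
  3. Always cross-validate to guard against overfitting
  4. Re-check factors periodically for signs of decay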
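
One way to check a factor for decay is to measure its IC against forward returns over increasing horizons. The sketch below assumes a single-asset factor series and closing-price series on the same index; the name `ic_decay`, the horizons, and the toy data are illustrative.

```python
import pandas as pd


def ic_decay(factor: pd.Series, close: pd.Series,
             horizons=(1, 5, 10, 20)) -> pd.Series:
    """IC between today's factor value and the forward return over each horizon."""
    ics = {}
    for h in horizons:
        fwd_ret = close.shift(-h) / close - 1  # return from t to t+h
        ics[h] = factor.corr(fwd_ret)          # corr drops NaN pairs
    return pd.Series(ics, name="IC")


# Toy check only: a "factor" that is literally the 1-period forward return,
# so its IC is 1 at horizon 1 and lower at horizon 2
close = pd.Series([100.0, 101.0, 99.0, 102.0, 104.0, 103.0, 106.0, 105.0])
toy_factor = close.shift(-1) / close - 1
decay = ic_decay(toy_factor, close, horizons=(1, 2))
```

Re-running the same measurement on recent data at regular intervals and comparing it with the historical profile gives an early indication that a factor is losing predictive power.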