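Data is the lifeblood of quantitative trading. This chapter explains in detail how to acquire, clean, and process stock data.

| Data source | Characteristics | Best suited for |
|---|---|---|
| Tushare | Rich data; requires account points | A-share research |
| Baostock | Completely free; complete history | Long-horizon backtesting |
| Yfinance | Simple to use | US and Hong Kong stocks |
| Akshare | Open source and free | Multi-market data |
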

    import tushare as ts

    # Set the API token
    ts.set_token('your_token_here')
    pro = ts.pro_api()
    # Fetch daily bars
    df = pro.daily(ts_code='000001.SZ', start_date='20230101', end_date='20231231')

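Daily bars from Tushare usually arrive with `trade_date` as a `YYYYMMDD` string and the newest rows first; treat both as assumptions to check against your own output. The sketch below shows one way to turn such a frame into a date-indexed, oldest-first table. The helper name `normalize_tushare_daily` is hypothetical, and the rows are synthetic stand-ins rather than real quotes.

```python
import pandas as pd


def normalize_tushare_daily(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: index by date and sort oldest-first."""
    out = raw.copy()
    # Assumes trade_date holds strings such as '20230103'.
    out["trade_date"] = pd.to_datetime(out["trade_date"], format="%Y%m%d")
    return out.set_index("trade_date").sort_index()


# Synthetic rows standing in for a pro.daily response (values are made up).
raw = pd.DataFrame({
    "ts_code": ["000001.SZ"] * 3,
    "trade_date": ["20230105", "20230104", "20230103"],
    "close": [10.3, 10.2, 10.1],
})
daily = normalize_tushare_daily(raw)
```

Sorting oldest-first matters because rolling windows and `pct_change` read the rows in order.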
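
    import baostock as bs

    # Log in
    bs.login()
    # Fetch daily bars
    rs = bs.query_history_k_data_plus(
        "sh.600000",
        "date,code,open,high,low,close,volume",
        start_date='2023-01-01',
        end_date='2023-12-31',
        frequency="d",
        adjustflag="2"  # forward-adjusted prices (qfq)
    )
    # Collect the result set into a DataFrame (get_data() is available in
    # recent Baostock releases), then log out
    df = rs.get_data()
    bs.logout()
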
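Baostock result sets, as far as its documented examples show, carry every field as text, so prices and volumes need casting once the rows are in a DataFrame. The following is a minimal sketch of that step on synthetic rows shaped like the fields requested above. `coerce_numeric` and `PRICE_COLUMNS` are hypothetical names, not part of Baostock.

```python
import pandas as pd

PRICE_COLUMNS = ["open", "high", "low", "close", "volume"]


def coerce_numeric(raw: pd.DataFrame, columns=PRICE_COLUMNS) -> pd.DataFrame:
    """Hypothetical helper: parse dates and cast string columns to numbers."""
    out = raw.copy()
    out["date"] = pd.to_datetime(out["date"])
    for col in columns:
        # errors="coerce" turns empty or malformed strings into NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out.set_index("date").sort_index()


# Synthetic rows shaped like the requested fields (values are made up).
raw = pd.DataFrame({
    "date": ["2023-01-03", "2023-01-04"],
    "code": ["sh.600000", "sh.600000"],
    "open": ["7.27", "7.30"],
    "high": ["7.31", "7.36"],
    "low": ["7.23", "7.28"],
    "close": ["7.29", ""],
    "volume": ["22415136", "24513242"],
})
bars = coerce_numeric(raw)
```

Coercing bad values to NaN leaves them for the missing-value step to handle, instead of failing partway through the download.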

    class DataManager:
        """Data manager."""

        def __init__(self, source="baostock"):
            self.source = source

        def get_data(self, code, start_date, end_date):
            """Fetch data."""
            # Placeholder for a unified wrapper over the individual data sources
            pass

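One way the unified interface could be filled in is to register a fetch function per source and have `get_data` dispatch on `self.source`. This is a sketch under that assumption, not the book's implementation. `SimpleDataManager`, `register`, and `fake_fetcher` are hypothetical names, and the fetcher returns canned rows so the example runs offline.

```python
from typing import Callable, Dict, List

# A fetcher takes (code, start_date, end_date) and returns a list of row dicts.
Fetcher = Callable[[str, str, str], List[dict]]


class SimpleDataManager:
    """Hypothetical sketch: route get_data() to a per-source fetcher."""

    def __init__(self, source: str = "baostock"):
        self.source = source
        self._fetchers: Dict[str, Fetcher] = {}

    def register(self, name: str, fetcher: Fetcher) -> None:
        """Attach the function that knows how to query one source."""
        self._fetchers[name] = fetcher

    def get_data(self, code: str, start_date: str, end_date: str) -> List[dict]:
        """Fetch rows from the currently selected source."""
        try:
            fetcher = self._fetchers[self.source]
        except KeyError:
            raise ValueError(f"no fetcher registered for {self.source!r}") from None
        return fetcher(code, start_date, end_date)


def fake_fetcher(code, start_date, end_date):
    # Stand-in for a real API call so the sketch runs offline.
    return [{"code": code, "date": start_date, "close": 1.0}]


manager = SimpleDataManager(source="demo")
manager.register("demo", fake_fetcher)
rows = manager.get_data("sh.600000", "2023-01-01", "2023-12-31")
```

Because callers only see `get_data`, a data source can be swapped without touching the strategy code.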
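
    # Three alternative ways to handle missing values; pick one.
    # Forward fill (fillna(method='ffill') is deprecated in pandas 2.1+)
    df = df.ffill()
    # Interpolation (linear by default)
    df = df.interpolate()
    # Drop rows that contain missing values
    df = df.dropna()
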
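The three strategies above give different results on the same gap, which is easiest to see on a toy series. The example below is illustrative only and uses made-up prices.

```python
import numpy as np
import pandas as pd

close = pd.Series([10.0, np.nan, np.nan, 13.0], name="close")

filled = close.ffill()              # carry the last known price forward
interpolated = close.interpolate()  # straight line between known neighbors
dropped = close.dropna()            # discard the missing rows entirely
```

Forward fill repeats 10.0 across the gap, interpolation produces 11.0 and 12.0, and dropping leaves only the two observed rows. Note that interpolation uses the later value (13.0) to fill earlier rows, which can leak future information into a backtest.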

    # 3-sigma rule: keep rows within three standard deviations of the mean
    mean = df['close'].mean()
    std = df['close'].std()
    df = df[(df['close'] >= mean - 3*std) & (df['close'] <= mean + 3*std)]
    # IQR method: keep rows within 1.5 * IQR of the first and third quartiles
    Q1 = df['close'].quantile(0.25)
    Q3 = df['close'].quantile(0.75)
    IQR = Q3 - Q1
    df = df[(df['close'] >= Q1 - 1.5*IQR) & (df['close'] <= Q3 + 1.5*IQR)]

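The two filters above can disagree, especially on short samples, because an extreme value inflates the very mean and standard deviation that the 3-sigma rule relies on. The toy example below (made-up prices) illustrates this. With six points, no value can sit more than about 2.04 sample standard deviations from the mean, so the 3-sigma rule cannot flag anything here.

```python
import pandas as pd

close = pd.Series([10.0, 10.1, 9.9, 10.2, 10.0, 50.0])

# 3-sigma rule: the outlier itself inflates the mean and standard deviation.
mean, std = close.mean(), close.std()
kept_sigma = close[(close >= mean - 3 * std) & (close <= mean + 3 * std)]

# IQR rule: quartiles are barely affected by a single extreme value.
q1, q3 = close.quantile(0.25), close.quantile(0.75)
iqr = q3 - q1
kept_iqr = close[(close >= q1 - 1.5 * iqr) & (close <= q3 + 1.5 * iqr)]
```

Here `kept_sigma` still contains the 50.0 print while `kept_iqr` drops it, which suggests preferring the IQR rule (or a much longer window) when the sample is small.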
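
    # Returns (simple period-over-period change)
    df['returns'] = df['close'].pct_change()
    # Moving average (20 periods)
    df['ma20'] = df['close'].rolling(20).mean()
    # Volatility (20-period rolling standard deviation of returns)
    df['volatility'] = df['returns'].rolling(20).std()
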
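Each of these features has a warm-up period during which the value is NaN, and the rolling volatility needs one more row than the moving average because it is built on returns. The sketch below wraps the same three calculations in a hypothetical helper, `add_features`, and runs it with a 2-period window on synthetic prices so the warm-up rows are easy to see.

```python
import pandas as pd


def add_features(df: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """Hypothetical helper mirroring the three features above."""
    out = df.copy()
    out["returns"] = out["close"].pct_change()
    out[f"ma{window}"] = out["close"].rolling(window).mean()
    out["volatility"] = out["returns"].rolling(window).std()
    return out


# Tiny synthetic series with a 2-period window so the warm-up NaNs are visible.
prices = pd.DataFrame({"close": [100.0, 110.0, 121.0, 133.1]})
features = add_features(prices, window=2)
```

With `window=2`, `returns` and `ma2` each start with one NaN and `volatility` starts with two. With the default of 20, expect 19 and 20 leading NaNs respectively, and drop or mask those rows before feeding the features to a model.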
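
    # RSI (relative strength index), simple-moving-average variant
    def calculate_rsi(prices, window=14):
        delta = prices.diff()
        # clip() keeps the leading NaN from diff(), so the first window
        # is not diluted by an artificial zero
        gain = delta.clip(lower=0)
        loss = (-delta).clip(lower=0)
        avg_gain = gain.rolling(window).mean()
        avg_loss = loss.rolling(window).mean()
        rs = avg_gain / avg_loss
        rsi = 100 - (100 / (1 + rs))
        return rsi
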
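The function above averages gains and losses with a simple rolling mean. Many charting tools use Wilder-style exponential smoothing instead, so values can differ from what a trading platform shows. The variant below is a sketch of that alternative, not the book's method. `wilder_rsi` is a hypothetical name, and it seeds the average with the first price change rather than with an initial simple mean.

```python
import pandas as pd


def wilder_rsi(prices: pd.Series, window: int = 14) -> pd.Series:
    """Hypothetical variant: RSI with Wilder-style exponential smoothing."""
    delta = prices.diff()
    gain = delta.clip(lower=0)     # upward moves, zero elsewhere
    loss = (-delta).clip(lower=0)  # downward moves as positive numbers
    # alpha = 1 / window gives the recursive average
    # avg_t = avg_{t-1} * (window - 1) / window + x_t / window.
    avg_gain = gain.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    avg_loss = loss.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


# Sanity checks on synthetic, strictly monotonic price paths.
rising = wilder_rsi(pd.Series(range(1, 41), dtype=float))
falling = wilder_rsi(pd.Series(range(40, 0, -1), dtype=float))
```

A strictly rising series has no losses, so its RSI is 100, and a strictly falling one gives 0. These two edge cases are a quick way to confirm the sign conventions for `gain` and `loss`.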
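
This article is excerpted from Chapter 3 of 《AI量化交易从入门到精通》 (*AI Quantitative Trading: From Beginner to Mastery*).
For the full text, see the code repository: bookwriting/part1basics/part3_data/README.md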