使用Python代码识别股票价格图表模式

2023年 12月 6日开发运维大树

在股票市场交易的动态环境中，技术和金融的融合催生了分析市场趋势和预测未来价格走势的先进方法。本文将使用Python进行股票模式识别。

from collections import defaultdict
 
 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt
 from scipy.signal import argrelextrema
 from statsmodels.nonparametric.kernel_regression import KernelReg
 from yahoofinancials import YahooFinancials

上面的库中，有几个要点需要介绍：

collections.defaultdict:当缺少键时，返回默认值。使用它可以有效地存储和组织数据，比如键反映日期或资产符号等可识别的度量，值表示相应的变量。

argrelextrema函数是SciPy库中的一个函数，用于进行科学计算和技术计算。它有助于识别价格数据中的局部最大值和最小值，指示价格数据中的潜在转折点或支撑位和阻力位。

statsmodels.nonparametric.kernel_regression.KernelReg:这个来自statmodels的子模块提供了非参数核回归功能。交易可以使用这种方法来拟合价格数据的平滑曲线，以确定趋势，无需假设曲线具有特定的参数形式。

YahooFinancials:该模块从雅虎财经获取财务数据。我们可以访问大量的财务数据，包括股票价格，财务报表和其他市场数据，用于分析和决定如何处理投资组合。

start_date = '2017-01-01'
 end_date = '2017-12-31'
 stock_code = 'FB' # e.g. AMZN, GOOG, FB, NVDA

我们获取的股票数据是在2017-01-01至2017-12-31期间。作为stock_code变量，Facebook, Inc.被设置为FB，即股票的代码。

在指定的日期范围内，交易算法将执行该股票代码的数据分析、交易信号或实际交易等操作。此代码的目的是为交易算法建立基本参数:目标时间框架和交易的特定股票。

变量最终将在代码中用于获取历史数据、执行财务分析和回溯测试交易策略。对于任何专注于股票市场的交易系统，这些参数是评估历史表现和执行实时交易的关键输入。

def preprocess_data(start_date, end_date, stock_code):
    stock_data = YahooFinancials(stock_code).get_historical_price_data(start_date, end_date, 'daily')
    price_data = stock_data[stock_code]['prices']
    columns = ['formatted_date', 'open', 'high', 'low', 'close', 'adjclose', 'volume']
    new_columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
    df = pd.DataFrame(data=price_data)[columns] # order dataframe columns
    df = df.rename(index=str, columns=dict(zip(columns, new_columns))) # rename dataframe columns
    return df, df['Close'], df['Date']

preprocess_data有三个参数:start_date、end_date和stock_code，它们指定时间范围和股票类型。此函数的主要目标是从Financials检索给定股票的指定日期范围内的历史股票价格。

获取包括全面的金融信息，包括每日股票价格、开盘价、最高价和最低价，以及调整后的收盘价。获得数据后，将其组织到pandas DataFrame中，通过重命名列，可以实现更好的可读性和与通用财务数据标准的一致性。该函数返回处理后的DataFrame以及两个Series一维数组，其中包括收盘价和收盘价发生的日期。

df, prices, dates = preprocess_data(start_date, end_date, stock_code)
 prices.index = np.linspace(1, len(prices), len(prices))
 dates.index = np.linspace(1, len(dates), len(dates))

我们为两组数据(价格和日期)设置索引。然后就是对价格的分析和局部最大值和最小值的识别，这对交易者来说是非常宝贵的。代码采用了一个核心回归模型，消除价格的周期性波动，从而更容易发现重要的趋势。

# https://onlinelibrary.wiley.com/doi/full/10.1111/0022-1082.00265
 # reference: https://www.quantopian.com/posts/an-empirical-algorithmic-evaluation-of-technical-analysis
 def find_max_min(prices):
    model = KernelReg(prices.values, prices.index.values, var_type='c', bw='cv_ls')
    smooth_prices = pd.Series(data=model.fit([prices.index.values])[0], index=prices.index) # index also from 1
 
    # use the minima and maxima from the smoothed timeseries
    # to identify true local minima and maxima in the original timeseres
    # by taking the maximum/minimum price within a t-1, t+1 window in the smoothed timeseries
    smooth_prices_max_indices = argrelextrema(smooth_prices.values, np.greater)[0]
    smooth_prices_min_indices = argrelextrema(smooth_prices.values, np.less)[0]
 
    price_max_indices = []
    for i in smooth_prices_max_indices:
        if 1 < i < len(prices)-1:
            price_max_indices.append(prices.iloc[i-2:i+2].idxmax())
 
    price_min_indices = []
    for i in smooth_prices_min_indices:
        if 1 < i < len(prices)-1:
            price_min_indices.append(prices.iloc[i-2:i+2].idxmin())
         
    price_max = prices.loc[price_max_indices]
    price_min = prices.loc[price_min_indices]
    max_min = pd.concat([price_max, price_min]).sort_index()
    max_min = max_min[~max_min.duplicated()] # deduplicate points that are both maximum and minimum
    max_min
     
    return smooth_prices, smooth_prices_max_indices, smooth_prices_min_indices, 
            price_max_indices, price_min_indices, max_min

用一种算法来识别基于平滑价格数据的价格曲线改变方向的点，代码在这个平滑的时间序列中搜索相对最大值和最小值。代码试图在平滑数据中找到这些极值后，将这些极值映射回原始的非平滑价格数据。

它通过检查平滑数据中每个极值点周围的小窗口来实现这一点，并确定该窗口内的价格最高或最低-这些是真正的局部最大值和最小值。在平滑和窗口化处理完成之后，代码将这些点组织到一个内聚输出中，删除可能同时存在于最大值和最小值的任何重复点。

可以使用这个结果来确定交易的进入和退出点。除了在代码中使用外，该代码还可以用于更大的策略中，根据这些发现触发买入或卖出信号。

smooth_prices, smooth_prices_max_indices, smooth_prices_min_indices, 
            price_max_indices, price_min_indices, max_min = find_max_min(prices

smooth_prices包含平滑版本的价格数据，可以消除噪音，使趋势更容易识别。

有各种各样的技术可用于平滑，包括移动平均线和其他算法。变量smooth_prices_max_indices和smooth_prices_min_indices可能表示平滑价格指数在局部最大值和最小值列表中的位置。当价格达到这些水平时，在价格反转之前识别潜在的买入或卖出信号是至关重要的。与前面的变量一样，price_max_indices和price_min_indices是从原始的、未平滑的价格中计算出来的。

max_min可能是一个数组或列表，其中包含有关已识别的最大值和最小值的信息，可能结合平滑和非平滑数据，用于根据本地价格极值确定是否进入或退出头寸。可以分析金融价格数据，识别峰值和低谷，并准备数据用于算法交易。作为更大的技术分析系统的一部分，它可以用于基于历史价格模式的自动交易活动。

下面我们看看上面代码计算得到的结果：

fig, ax = plt.subplots(figsize=(20,10), dpi=200)
 
 ax.plot(dates, prices, label='Prices')
 ax.plot(dates, smooth_prices, label='Smoothed Prices', linestyle='dashed')
 ax.set_xticks(np.arange(0, len(dates), 30))
     
 smooth_prices_max = smooth_prices.loc[smooth_prices_max_indices]
 smooth_prices_min = smooth_prices.loc[smooth_prices_min_indices]
 price_max = prices.loc[price_max_indices]
 price_min = prices.loc[price_min_indices]
 
 ax.scatter(dates.loc[smooth_prices_max.index], smooth_prices_max.values, s=20, color='red', label='Smoothed Prices Maxima')
 ax.scatter(dates.loc[smooth_prices_min.index], smooth_prices_min.values, s=20, color='purple', label='Smoothed Prices Minima')
 
 ax.scatter(dates.loc[price_max.index], price_max.values, s=50, color='green', label='Prices Maxima')
 ax.scatter(dates.loc[price_min.index], price_min.values, s=50, color='blue', label='Prices Minima')
 ax.legend(loc='upper left')
 ax.grid()

代码绘制了具有不同线条风格的实际价格和平滑价格。该图还显示了实际和平滑价格数据中局部最大值和最小值的位置，可能识别交易进入和退出信号。

为了区分最大值和最小值，使用较大的符号和不同的颜色。时间轴每隔一段时间显示在x轴上，以使其更清晰。图表的图例解释了情节元素，网格有助于分析价格随时间的变化，这些都是在绘图中必不可少的工作。

下面一个函数是Plot_window，它生成一个折线图，显示实际价格和平滑价格随时间的变化。平滑可能有助于识别趋势并过滤掉噪声。在这张图上可以区分出几个关键点。颜色和大小用于识别实际和平滑价格曲线的局部最大值和最小高点和低点。交易策略通常关注这些关键点，因为它们可能预示着趋势的逆转或继续。

def plot_window(dates, prices, smooth_prices, 
                smooth_prices_max_indices, smooth_prices_min_indices,
                price_max_indices, price_min_indices, 
                start, end, ax=None):
    if ax is None: fig, ax = plt.subplots(figsize=(20,10), dpi=200)
 
    ax.plot(dates.loc[start:end], prices.loc[start:end], label='Prices')
    ax.plot(dates.loc[start:end], smooth_prices.loc[start:end], label='Smoothed Prices', linestyle='dashed')
    ax.set_xticks(np.linspace(0, len(dates.loc[start:end]), 10))
    ax.tick_params(axis='x', rotatinotallow=45)
 
    smooth_prices_max = smooth_prices.loc[smooth_prices_max_indices].loc[start:end]
    smooth_prices_min = smooth_prices.loc[smooth_prices_min_indices].loc[start:end]
    price_max = prices.loc[price_max_indices].loc[start:end]
    price_min = prices.loc[price_min_indices].loc[start:end]
 
    ax.scatter(dates.loc[smooth_prices_max.index], smooth_prices_max.values, s=20, color='red', label='Smoothed Prices Maxima')
    ax.scatter(dates.loc[smooth_prices_min.index], smooth_prices_min.values, s=20, color='purple', label='Smoothed Prices Minima')
 
    ax.scatter(dates.loc[price_max.index], price_max.values, s=50, color='green', label='Prices Maxima')
    ax.scatter(dates.loc[price_min.index], price_min.values, s=50, color='blue', label='Prices Minima')
    ax.legend(fnotallow='small')
    ax.grid()

可以在较大的数据集中指定一个从开始到结束的时间窗口，这样可以查看数据的子集。为了清晰起见，在x轴上显示日期的同时还显示了一个图例和一个网格。

plot_window(dates, prices, smooth_prices, 
            smooth_prices_max_indices, smooth_prices_min_indices,
            price_max_indices, price_min_indices, 
            start=18, end=34, ax=None)

下面我们可以寻找一些简单的模式：

def find_patterns(max_min):
    patterns = defaultdict(list)
 
    for i in range(5, len(max_min)):
        window = max_min.iloc[i-5:i]
         
        # pattern must play out in less than 36 days
        if window.index[-1] - window.index[0] > 35:
            continue
 
        # Using the notation from the paper to avoid mistakes
        e1, e2, e3, e4, e5 = window.iloc[:5]
        rtop_g1 = np.mean([e1, e3, e5])
        rtop_g2 = np.mean([e2, e4])
         
        # Head and Shoulders
        if (e1 > e2) and (e3 > e1) and (e3 > e5) and 
            (abs(e1 - e5)  e5) and (e2  e2) and (e1 > e3) and (e3 > e5) and (e2 < e4):
            patterns['TTOP'].append((window.index[0], window.index[-1]))
 
        # Triangle Bottom
        elif (e1 < e2) and (e1 < e3) and (e3  e4):
            patterns['TBOT'].append((window.index[0], window.index[-1]))
 
        # Rectangle Top
        elif (e1 > e2) and (abs(e1-rtop_g1)/rtop_g1 < 0.0075) and 
            (abs(e3-rtop_g1)/rtop_g1 < 0.0075) and (abs(e5-rtop_g1)/rtop_g1 < 0.0075) and 
            (abs(e2-rtop_g2)/rtop_g2 < 0.0075) and (abs(e4-rtop_g2)/rtop_g2  max(e2, e4)):
            patterns['RTOP'].append((window.index[0], window.index[-1]))
 
        # Rectangle Bottom
        elif (e1 < e2) and (abs(e1-rtop_g1)/rtop_g1 < 0.0075) and 
            (abs(e3-rtop_g1)/rtop_g1 < 0.0075) and (abs(e5-rtop_g1)/rtop_g1 < 0.0075) and 
            (abs(e2-rtop_g2)/rtop_g2 < 0.0075) and (abs(e4-rtop_g2)/rtop_g2  min(e2, e4)):
            patterns['RBOT'].append((window.index[0], window.index[-1]))
             
    return patterns

迭代DataFrame中的条目，同时考虑5个数据点。确定每个5点窗口的模式是否在36天内发生。如果没有，则进入到下一个窗口。我们这里有几种类型的技术分析图表模式:

Head and Shoulders（头肩顶）:

这是一种反转图表模式，通常表示股价在涨势中即将反转。它包括一个中间峰（头）和两个较低的峰（肩），形成一个上升趋势的结束信号。

Inverse Head and Shoulders（倒头肩底）:

与头肩顶相反，这是一种底部反转图表模式。它包括一个中间洼地（倒头）和两个较低的洼地（倒肩），形成一个下降趋势的结束信号。

Broadening Top（扩顶形态）:

这是一种表示不稳定市场的图表模式，由两个趋势线分散开来形成。它可能表示市场波动性增加，预示价格的不确定性。

Broadening Bottom（扩底形态）:

与扩顶形态相反，这是一种表示不稳定市场的图表模式，由两个趋势线逐渐汇聚。它可能表示市场波动性增加，预示价格的不确定性。

Triangle Top（三角形顶部）:

这是一种形成在上升趋势中的图表模式，由两个趋势线收敛形成三角形。它可能表示价格即将下降。

Triangle Bottom（三角形底部）:

与三角形顶部相反，这是一种形成在下降趋势中的图表模式，由两个趋势线收敛形成三角形。它可能表示价格即将上升。

Rectangle Top（矩形顶部）:

这是一种在上升趋势中形成的图表模式，由水平线形成一个矩形。它表示市场可能经历一段横盘整理，价格可能会下跌。

Rectangle Bottom（矩形底部）:

与矩形顶部相反，这是一种在下降趋势中形成的图表模式，由水平线形成一个矩形。它表示市场可能经历一段横盘整理，价格可能会上升。

上面的这些模式是根据这些局部最大值和最小值的相对位置和值来识别的，检测到的每个模式都存储在一个字典中，模式名称作为键，窗口的开始和结束索引作为值。这些索引元组存储在每个模式末尾的字典中。

这样的代码在算法交易中很有用，当它自动检测与某些市场行为相关的历史模式时，允许交易者根据这些模式的存在做出明智的决策。

patterns = find_patterns(max_min)
 patterns

上面这些专有名字可能不太容易理解，所以我们可以使用代码把它们进行可视化

def visualize_patterns(dates, prices, smooth_prices, 
                        smooth_prices_max_indices, smooth_prices_min_indices, 
                        price_max_indices, price_min_indices, 
                        patterns, shorthand_fullname_dict):
    for name, end_day_nums in patterns.items():
        print('Pattern Identified: {} nNumber of Observations: {}'.format(shorthand_fullname_dict[name], len(end_day_nums)))
        rows = int(np.ceil(len(end_day_nums)/2))
        fig, axes = plt.subplots(rows, 2, figsize=(20,5*rows), dpi=200)
        fig.subplots_adjust(hspace=0.5)
        axes = axes.flatten()
        i = 0
        for start_date, end_date in end_day_nums:
            plot_window(dates, prices, smooth_prices, 
                smooth_prices_max_indices, smooth_prices_min_indices,
                price_max_indices, price_min_indices, 
                start=start_date-1, end=end_date+1, ax=axes[i])
            i += 1
        plt.show()
 visualize_patterns(dates, prices, smooth_prices, 
                        smooth_prices_max_indices, smooth_prices_min_indices, 
                        price_max_indices, price_min_indices, 
                        patterns, shorthand_fullname_dict)