Machine Learning for Trading--2-7 Dealing With Data

  • How Data Is Aggregated
  • Stock Splits
  • Dividends
  • Survivor Bias

How Data Is Aggregated

The finest resolution of data is a tick. A tick represents a successful match between a buyer and a seller: in other words, a successful transaction.

Consider the following plot, which shows share price along with share volume over time.

[Figure: share price and volume of individual ticks over time]

Keep in mind that these transactions don't happen at regular times or within fixed time slices. There is no guarantee about the number of ticks during any particular minute or hour: an exchange records a tick only when a transaction actually occurs.

Each exchange provides its own data feed of these transactions, and we can subscribe to multiple feeds to see ticks across different markets. Let’s add the ticks from another exchange, shown in red, to our plot. Remember that trading happens on all of these exchanges at the same time, and prices at different exchanges aren’t guaranteed to be the same.

[Figure: ticks from a second exchange added in red]

Highly liquid stocks might experience hundreds of thousands of transactions every second. Collecting ticks across all of the exchanges over a long period of time for stocks like these would produce an enormous amount of data.

As a result, exchanges usually consolidate tick data into time slices - minute-by-minute or hour-by-hour, for example - and we can see such slices, demarcated by the dotted lines, in our plot.

[Figure: ticks grouped into time slices separated by dotted lines]

We can describe each chunk using five data points: open, high, low, close, and volume. Let’s consider the first chunk.

The open is the price of the first transaction within the chunk, which is $100.00. The high is the highest transaction price within the chunk, which is also $100.00. The low is the lowest transaction price within the chunk, which is $99.05. The close is the price of the last transaction within the chunk, which is $95.50. Finally, the volume represents the total number of shares transacted within the chunk, which is 600.
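The aggregation described above can be sketched in Python as follows. This is a minimal sketch that assumes each tick is a `(timestamp_in_seconds, price, shares)` tuple and that slices have a fixed width. The names `Tick`, `Bar`, and `aggregate_ticks` and the sample trades are hypothetical. They do not come from the plot or from any course library.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed tick format: (timestamp in seconds, trade price, shares traded).
Tick = Tuple[float, float, int]


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: int


def aggregate_ticks(ticks: List[Tick], slice_seconds: int) -> Dict[int, Bar]:
    """Group ticks into fixed-width time slices and summarize each one as OHLCV."""
    bars: Dict[int, Bar] = {}
    # Visit ticks in time order so the first and last trades become open and close.
    for ts, price, shares in sorted(ticks, key=lambda t: t[0]):
        start = int(ts // slice_seconds) * slice_seconds  # slice this tick falls in
        bar = bars.get(start)
        if bar is None:
            # The first trade in a slice sets all four prices.
            bars[start] = Bar(price, price, price, price, shares)
        else:
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price  # latest trade so far
            bar.volume += shares
    return bars


# Hypothetical trades spanning two one-minute slices.
ticks = [(0.5, 100.00, 200), (12.0, 99.50, 100), (47.3, 99.80, 300), (65.0, 99.90, 50)]
bars = aggregate_ticks(ticks, slice_seconds=60)
for start, bar in sorted(bars.items()):
    print(start, bar)
# 0 Bar(open=100.0, high=100.0, low=99.5, close=99.8, volume=600)
# 60 Bar(open=99.9, high=99.9, low=99.9, close=99.9, volume=50)
```

Processing ticks in time order makes the first and last trades in each slice its open and close. Daily data applies the same idea with one slice per trading day.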

The data that we are going to work with is daily data; in other words, we are working with chunks that aggregate ticks on a daily frequency.

Price Anomaly Quiz

Consider the following plot of IBM stock prices over time.

Notice the sudden drops in price. In one example, the price drops from $300 per share to $75 per share; in another, the price drops from $250 per share to $125 per share. These drops represent a 75% and a 50% price decline, respectively.

IBM’s value certainly did not drop that much in a single day. Which of the following reasons might explain these sudden drops in stock price?

[Figure: quiz answer choices]

What we see here is a stock split, whereby a single share becomes n shares, and the price of each share is divided by n.
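An n:1 split divides the price by n, so the size of the drop tells us n, and a split by n always looks like a decline of 1 - 1/n. Below is a small check of the two IBM drops. The helper names are hypothetical and chosen only for illustration.

```python
def split_ratio(price_before: float, price_after: float) -> float:
    """An n:1 split divides the share price by n, so n = before / after."""
    return price_before / price_after


def percent_decline(n: float) -> float:
    """Dividing a price by n looks like a decline of 1 - 1/n."""
    return 1.0 - 1.0 / n


for before, after in [(300.0, 75.0), (250.0, 125.0)]:
    n = split_ratio(before, after)
    print(f"${before:.0f} -> ${after:.0f}: {n:.0f}:1 split, {percent_decline(n):.0%} drop")
# $300 -> $75: 4:1 split, 75% drop
# $250 -> $125: 2:1 split, 50% drop
```

On real closes the ratio is only approximately an integer, because the price also moves for ordinary reasons between the two trading days.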

Stock Splits

The most common reason for a company to issue a stock split is to drive down the share price.

Naturally, that raises the question: why is a high share price a problem?

Consider a stock with a high share price, such as $500. Investors generally buy stocks in groups of one hundred shares, referred to as a lot. At $500 per share, a lot costs $50,000, and some investors might find this price prohibitively expensive. A high share price also affects derivative securities such as stock options, which typically control one lot (100 shares) per contract, making those contracts more expensive and less liquid.

As well, investors often strive to build portfolios with finely tuned proportions allocated to each stock. Because shares are typically bought in whole units, such fine resolution may be difficult to achieve if some of the stocks in the portfolio have very high share prices.

[Figure: stock splits]

If we want to trade using the actual close data, we need to account for these splits; otherwise, each split looks like a sudden crash in the stock’s value.

The solution to this problem is to use adjusted close instead of close. Adjusted close retroactively accounts for stock splits and provides a measure of split-adjusted price movement that gives investors - and computers - a less turbulent view of the share price over time.

Here’s how it works. We walk back in time, day by day, setting adjusted close equal to actual close until we encounter the most recent n:1 stock split. From that point back to the beginning of time, we set the adjusted close equal to the actual close divided by n. Each earlier split we encounter further divides the prices before it by its own ratio, so the divisors compound.
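The backward walk can be sketched in Python. This is a minimal sketch with two assumptions: closes are a list ordered oldest first, and splits are supplied as a hypothetical mapping from the index of the first trading day after an n:1 split to n. The function keeps a cumulative divisor, so a price several splits back is divided by the product of all the later split ratios.

```python
from typing import Dict, List


def split_adjusted_close(closes: List[float], splits: Dict[int, float]) -> List[float]:
    """Walk backward in time, dividing closes before each n:1 split by n.

    closes: actual closes, oldest first.
    splits: hypothetical mapping from the index of the first trading day
        after an n:1 split to n.
    """
    adjusted = [0.0] * len(closes)
    divisor = 1.0  # cumulative divisor; 1.0 means adjusted == actual
    for i in range(len(closes) - 1, -1, -1):  # most recent day first
        adjusted[i] = closes[i] / divisor
        if i in splits:
            # Every day before index i traded before this split, so divide by n too.
            divisor *= splits[i]
    return adjusted


# One 2:1 split between day 1 and day 2.
print(split_adjusted_close([100.0, 102.0, 51.0, 52.0], {2: 2.0}))
# [50.0, 51.0, 51.0, 52.0]

# Two 2:1 splits: the oldest close is divided by 2 * 2 = 4.
print(split_adjusted_close([400.0, 200.0, 210.0, 105.0], {1: 2.0, 3: 2.0}))
# [100.0, 100.0, 105.0, 105.0]
```

In the adjusted series, the 2:1 split no longer appears as a halving of the price. The most recent close is unchanged, as the procedure requires.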

By computing the adjusted close, we now have a smooth line representing the split-adjusted stock price, and we no longer have to account for stock splits explicitly. Additionally, we can look back into the past - several splits ago - and accurately understand the accumulation of value between then and now.

Split Adjustment Quiz

Consider the following plot of close prices for a particular stock over time. Notice the 2:1 stock split. For each of the three days identified below, what is the adjusted close price for this stock?

[Figure: split adjustment quiz]

Dividends

Many companies regularly pay dividends to their shareholders. For example, a company might pay a yearly dividend of 1-2% of its share price. For a stock trading at $100 per share, this dividend equals $1-2 per share.

Dividend payments can have significant effects on a stock’s price. Consider a company whose stock is currently trading at around $100 per share and for which investors have derived a fundamental value of exactly $100 per share. Let’s suppose that this stock has an upcoming dividend payment of $1.

Dividends Quiz

What share price do we expect to see the day before the dividend is paid? How about after the dividend is paid?

[Figure: dividends]

The day before the dividend is paid, we should expect to see the stock price rise to $101. A share price of $101 reflects the underlying value of $100 per share, for which there is consensus, plus the expected $1 dividend payment.

Adjusting for Dividends

Between the date that the dividend is announced and the date it is paid, we see the share price generally rise from the consensus price - $100 - to reflect the value of the dividend. After the dividend is paid, we see the share price immediately drop back to $100.

Let’s consider now how we might adjust historical prices to account for dividends. Remember that the adjusted price as of today is always the same as the actual price. As we move backward in time, these two values remain the same until we encounter a dividend.

Once we encounter a dividend, we adjust all of the preceding prices downward by the proportion of the dividend payment. In this case, with a 1% dividend - $1 paid on a $100 share - we reduce all prices before the dividend date by 1%.
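The same backward walk works for dividends. This sketch uses similar assumptions: closes are ordered oldest first, and a hypothetical mapping goes from the index of the day the price drops to the cash dividend per share. It measures the dividend's proportion against the close on the day before the drop. That is one common convention and an assumption here. With the $101 pre-dividend close from the example above, it gives a reduction of roughly 1%.

```python
from typing import Dict, List


def dividend_adjusted_close(closes: List[float], dividends: Dict[int, float]) -> List[float]:
    """Scale every price before a dividend down by that dividend's proportion.

    closes: actual closes, oldest first.
    dividends: hypothetical mapping from the index of the day the price drops
        to the cash dividend per share.
    Assumption: the proportion is dividend / close on the day before the drop.
    """
    adjusted = [0.0] * len(closes)
    factor = 1.0  # cumulative multiplier; 1.0 means adjusted == actual
    for i in range(len(closes) - 1, -1, -1):  # walk backward from today
        adjusted[i] = closes[i] * factor
        if i in dividends and i > 0:
            # All days before index i are scaled down by this dividend as well.
            factor *= 1.0 - dividends[i] / closes[i - 1]
    return adjusted


# Price drifts up toward $101 before the $1 dividend, then drops back to $100.
closes = [100.0, 100.5, 101.0, 100.0, 100.2]
adjusted = dividend_adjusted_close(closes, {3: 1.0})
print([round(p, 2) for p in adjusted])  # [99.01, 99.5, 100.0, 100.0, 100.2]
```

The adjusted series no longer shows a drop on the dividend date. Every earlier price, including the $101 run-up, is scaled by 100/101.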

[Figure: dividends]

Survivor Bias

One of the activities we do in this class is simulate strategies that we are developing. To do this, we roll back time, trade according to our strategy, and analyze the results.

Before we can simulate our strategy, however, we first have to consider the universe of candidate stocks that we can buy or sell as we execute our strategy. The most common universe of stocks is the S&P 500.

A very common mistake is to take the membership of that universe as it stands today and trade that universe from the beginning of the simulation.

We can see the strong performance of such a biased strategy below in blue.

The problem here is that, at the start of the simulation, we are selecting only from stocks that we already know will still exist at the end of it. Since all of these stocks survived the trading period, any trading strategy that considers only them is likely to show unrealistically optimistic results. In other words, there is a built-in survivorship bias in our selection.

What if we used the S&P 500 universe as it existed in 2007? The performance of our strategy now is less biased and might look like the red line below.

[Figure: survivor bias, biased universe (blue) vs. 2007 universe (red)]

If we consider the universe of stocks as it existed in 2007, we also include stocks that didn’t make it to the present day. Specifically, our strategy might advise investments in one of the sixty-eight stocks that disappeared from the S&P 500 between 2007 and 2009.

If this is the case, our strategy is likely to show much worse performance than it did on the biased universe. However, the performance of our strategy using the unbiased universe is more realistic, which is what matters.
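The difference between the two universes can be sketched as a point-in-time membership lookup. This assumes membership history is available as a hypothetical mapping from each ticker to the dates it joined and left the index. The tickers and dates are made up for illustration and are not real S&P 500 data.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical membership history: ticker -> (date added, date removed or None if still a member).
Membership = Dict[str, Tuple[date, Optional[date]]]


def universe_as_of(membership: Membership, as_of: date) -> List[str]:
    """Point-in-time universe: tickers that were members on `as_of`, even if later removed."""
    return sorted(
        ticker
        for ticker, (added, removed) in membership.items()
        if added <= as_of and (removed is None or as_of < removed)
    )


def universe_today(membership: Membership) -> List[str]:
    """Survivorship-biased universe: only tickers that are still members now."""
    return sorted(ticker for ticker, (_, removed) in membership.items() if removed is None)


# Made-up tickers and dates, for illustration only.
membership: Membership = {
    "AAA": (date(1990, 1, 1), None),               # member throughout
    "BBB": (date(1995, 6, 1), date(2008, 9, 15)),  # dropped out during the simulation
    "CCC": (date(2010, 3, 1), None),               # joined after the simulation started
}

print(universe_as_of(membership, date(2007, 1, 1)))  # ['AAA', 'BBB']
print(universe_today(membership))                    # ['AAA', 'CCC']
```

The biased choice trades `CCC`, which was not in the index when the simulation started. It also silently skips `BBB`, which was a member in 2007 but later dropped out.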
