Machine Learning for Trading 1-1: Reading and Plotting Data

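Pandas is an efficient, easy-to-use data structure and analysis library built on top of Python. Its core is a single data type, the DataFrame. A DataFrame resembles R's data frame and also an Excel sheet, but it is better suited than either to manipulating data inside Python. With this data structure, data can easily be cleaned, organized, summarized, merged, transformed, computed on, and so on.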
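
As a minimal sketch of the operations listed above, the snippet below builds a small DataFrame in memory and cleans, summarizes, and computes on it. The column names and all values are made up for illustration and do not come from any real price file.

```python
import pandas as pd

# Made-up values, used only to illustrate DataFrame operations.
df = pd.DataFrame(
    {
        "Date": ["2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06"],
        "Close": [100.0, None, 102.0, 101.0],
        "Volume": [1000, 1200, 900, 1100],
    }
)

# Cleaning: drop rows where the closing price is missing.
cleaned = df.dropna(subset=["Close"])

# Summarizing: count, mean, std, min, quartiles, and max of one column.
summary = cleaned["Close"].describe()

# Computing: derive a new column from two existing ones.
cleaned = cleaned.assign(Turnover=cleaned["Close"] * cleaned["Volume"])

print(cleaned)
print(summary)
```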

  • Reading and printing data from a CSV file
  • Selecting data in specific rows
  • Finding the maximum/mean closing value for a stock
  • Plotting the data as a line chart with matplotlib

Reading and printing data from a CSV file

  • Read the data: pd.read_csv("data.csv")
  • Print the whole DataFrame: print(df)
  • Print the first five rows: print(df.head())
  • Print the last five rows: print(df.tail())

Example:

import pandas as pd

def test_run():
    """Function called by Test Run."""
    df = pd.read_csv("./data/AAPL.csv")
    # Quiz: Print last 5 rows of the data frame
    # print(df)  # prints the entire data set (DataFrame)
    print(df.head())  # prints the first five records
    print(df.tail())  # prints the last five records

if __name__ == "__main__":
    test_run()
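
The example above needs a local data/AAPL.csv file. As a sketch that runs without any file, the same calls can be tried on CSV text held in memory. The rows below are made-up placeholder values, not real prices, and the column set is an assumption for illustration. Both head() and tail() also accept a row count.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for a real price file.
CSV_TEXT = """Date,Close,Volume
2000-01-03,10.0,100
2000-01-04,11.0,110
2000-01-05,12.0,120
2000-01-06,13.0,130
2000-01-07,14.0,140
2000-01-10,15.0,150
2000-01-11,16.0,160
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

first_two = df.head(2)  # first two rows instead of the default five
last_two = df.tail(2)   # last two rows instead of the default five

print(first_two)
print(last_two)
print(df.shape)  # (number of rows, number of columns)
```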

Selecting data in specific rows

  • Print all rows from index 10 through 20: print(df[10:21]) (the slice end, 21, is excluded)

Example:

import pandas as pd

def test_run():
    """Function called by Test Run."""
    df = pd.read_csv("data/AAPL.csv")
    print(df[10:21])  # print rows between index 10 and 20 inclusive

if __name__ == "__main__":
    test_run()
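
To make the slice bounds concrete, here is a small sketch on a made-up 30-row DataFrame. Like a Python list slice, df[10:21] selects by position and excludes the end, so it returns rows 10 through 20. df.iloc[10:21] is the explicit positional form of the same selection.

```python
import pandas as pd

# Made-up data: 30 rows with the default integer index 0..29.
df = pd.DataFrame({"Close": [float(i) for i in range(30)]})

rows = df[10:21]            # positional slice, end excluded
same_rows = df.iloc[10:21]  # explicit positional indexing, same rows

print(rows.index.tolist())
print(len(rows))
print(rows.equals(same_rows))
```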

Finding the maximum/mean closing value for a stock

  • Maximum of the Close column: df['Close'].max()
  • Mean of the Close column: df['Close'].mean()

Example:

import pandas as pd

def get_max_close(symbol):
    """Return the maximum closing value for stock indicated by symbol.

    Note: Data for a stock is stored in file: data/<symbol>.csv
    """
    df = pd.read_csv("data/{}.csv".format(symbol))  # read in data
    return df['Close'].max()  # compute and return max

def test_run():
    """Function called by Test Run."""
    for symbol in ['AAPL', 'IBM']:
        print("Max close")
        print(symbol, get_max_close(symbol))

if __name__ == "__main__":
    test_run()
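
The example above covers only the maximum. A mean can be computed the same way, as in this sketch. The helper name get_mean_close is hypothetical, and it takes a DataFrame rather than a symbol so that it runs on made-up values without a CSV file.

```python
import pandas as pd

def get_mean_close(df):
    """Return the mean of the Close column (hypothetical helper)."""
    return df["Close"].mean()

# Made-up closing values, for illustration only.
prices = pd.DataFrame({"Close": [10.0, 12.0, 14.0]})

print("Max close:", prices["Close"].max())
print("Mean close:", get_mean_close(prices))
```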

Plotting the data as a line chart with matplotlib

  • Plot: df.plot(), followed by plt.show()

Example 1 – plotting a single column:

import pandas as pd
import matplotlib.pyplot as plt

def test_run():
    """Plot a single column."""
    df = pd.read_csv("data/AAPL.csv")
    print(df['Adj Close'])
    df['Adj Close'].plot()
    plt.show()  # must be called to show plots

if __name__ == "__main__":
    test_run()

Example 2 – plotting two columns:

import pandas as pd
import matplotlib.pyplot as plt

def test_run():
    df = pd.read_csv("data/IBM.csv")
    df[['Close', 'Adj Close']].plot()
    plt.show()  # must be called to show plots

if __name__ == "__main__":
    test_run()
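
df.plot() returns a matplotlib Axes object, which can be used to label the chart. The sketch below assumes made-up prices instead of a CSV file, and plot_prices is a hypothetical helper name introduced only for this illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_prices(df, columns, title):
    """Plot the given columns as lines and label the chart (hypothetical helper)."""
    ax = df[columns].plot(title=title)  # DataFrame.plot returns a matplotlib Axes
    ax.set_xlabel("Row index")
    ax.set_ylabel("Price")
    return ax

if __name__ == "__main__":
    # Made-up values, for illustration only.
    sample = pd.DataFrame(
        {
            "Close": [10.0, 11.0, 12.5, 12.0],
            "Adj Close": [9.5, 10.4, 11.9, 11.4],
        }
    )
    plot_prices(sample, ["Close", "Adj Close"], "Sample prices")
    plt.show()  # must be called to show plots
```

Returning the Axes keeps the helper easy to extend, for example to add a grid or adjust the legend before plt.show() is called.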
