Machine Learning exercise
The most common type of machine learning is supervised learning. A supervised machine learning algorithm is trained on a dataset that pairs each input with its known, correct output, so the model can learn how inputs map to outcomes. The goal of this process is to enable the model to predict the output for inputs it has not seen before.
The process can be looked at in three parts. Training is where a large amount of data with known outcomes is fed into the model. Mapping is where the model picks up the patterns and relationships that link the inputs to the outputs. Last, Prediction is where you provide the model with new input and it predicts what the outcome should be.
Supervised learning normally delivers one of two categories of output: classification, where the model predicts a label for the input, and regression, where the model predicts a numeric value such as a stock price.
For example, classification can be used to analyze x-rays and label each one as normal or abnormal. Regression may take in weather metrics and predict tomorrow's temperature.
Linear regression
Linear Regression is a supervised learning method that predicts a continuous numerical value by fitting a linear relationship between one or more input variables and an output variable.
At its simplest, it assumes that if you plot your data on a graph, you can draw a straight line that best represents the trend of that data.
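To make the "straight line that best represents the trend" idea concrete, here is a minimal sketch of an ordinary least-squares line fit in plain Python. The numbers are made up for illustration, and the helper names fit_line and predict are hypothetical; in the real script scikit-learn's LinearRegression does this work.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training: inputs with known outputs (made-up values).
train_x = [1, 2, 3, 4, 5]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Mapping: the fitted line is the learned relationship.
slope, intercept = fit_line(train_x, train_y)


def predict(x):
    """Prediction: apply the fitted line to a new input."""
    return slope * x + intercept


print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"prediction for x=6: {predict(6):.3f}")
```

The three steps in the comments line up with the training, mapping, and prediction stages described earlier.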
In this example we will use the principles of Linear Regression together with a common stock market pricing indicator called RSI.
RSI, or Relative Strength Index, measures the speed and size of a stock's recent price changes. It indicates whether a stock is oversold, meaning the price may go up, or overbought, meaning the price may go down.
RSI is a momentum oscillator that ranges from 0 to 100. Readings above 70 are commonly treated as overbought and readings below 30 as oversold, so a cross back below 70 is a potential exit signal and a cross back above 30 is a potential entry signal.
We will use Linear Regression and RSI to identify buying and selling opportunities.
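The RSI rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the script's code: it assumes the common 14-period setting and Wilder's smoothing, and the function names rsi and rsi_signals are hypothetical.

```python
def rsi(prices, period=14):
    """Wilder-smoothed RSI, aligned with prices (None until enough data)."""
    if len(prices) <= period:
        return [None] * len(prices)
    changes = [b - a for a, b in zip(prices, prices[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    # Seed the averages with a simple mean over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period

    def to_rsi(gain, loss):
        if loss == 0:
            return 100.0  # no losses in the window (flat prices land here too)
        return 100.0 - 100.0 / (1.0 + gain / loss)

    values = [None] * period + [to_rsi(avg_gain, avg_loss)]
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        values.append(to_rsi(avg_gain, avg_loss))
    return values


def rsi_signals(rsi_values, overbought=70.0, oversold=30.0):
    """Label crosses back below 70 as 'exit' and back above 30 as 'entry'."""
    signals = [None] * len(rsi_values)
    for i in range(1, len(rsi_values)):
        prev, cur = rsi_values[i - 1], rsi_values[i]
        if prev is None or cur is None:
            continue
        if prev >= overbought and cur < overbought:
            signals[i] = "exit"
        elif prev <= oversold and cur > oversold:
            signals[i] = "entry"
    return signals


rising_rsi = rsi(list(range(1, 21)))    # prices that only go up
falling_rsi = rsi(list(range(20, 0, -1)))  # prices that only go down
demo_signals = rsi_signals([75, 72, 68, 40, 28, 25, 31])
print(demo_signals)
```

Prices that only rise pin the RSI at 100 and prices that only fall pin it at 0, which is a quick sanity check on the formula.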
====================================
This program has features that capture momentum and trends.
Moving averages are used to smooth out price noise.
Price Lag uses earlier prices (for example, yesterday's close) as inputs.
Volatility is the standard deviation of recent price changes.
Divergence occurs when price and RSI move in opposite directions.
When price makes lower lows while RSI makes higher lows, downward momentum is fading and this is bullish.
When price makes higher highs while RSI makes lower highs, upward momentum is fading and this is bearish.
Linear Regression and RSI are tools for identifying probability, not certainty.
SMA_5 (5-period Simple Moving Average) captures short-term momentum, while SMA_20 captures the medium-term trend.
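The features described above (moving average, price lag, volatility) can be built with a few pandas calls. This is a small sketch on made-up closing prices. A 3-period window is assumed only so the tiny sample produces values, and the column names SMA_3, Lag_1, and Volatility_3 are illustrative rather than taken from the script.

```python
import pandas as pd

# Made-up closing prices for illustration only.
close = pd.Series(
    [100.0, 101.0, 103.0, 102.0, 104.0, 107.0, 106.0, 108.0], name="Close"
)

features = pd.DataFrame({"Close": close})
# Moving average: smooths out day-to-day price noise.
features["SMA_3"] = close.rolling(window=3).mean()
# Price lag: yesterday's close as an input for today.
features["Lag_1"] = close.shift(1)
# Daily return: percentage change from the previous close.
features["Daily_Return"] = close.pct_change()
# Volatility: rolling standard deviation of the daily returns.
features["Volatility_3"] = features["Daily_Return"].rolling(window=3).std()

# Rolling and shifted columns start with NaN, so drop the incomplete rows.
features = features.dropna()
print(features.round(4))
```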
Linear Regression is a "line-of-best-fit" tool. Stocks rarely move in perfect lines. If you find your R-squared is still low, the next step would be using a Random Forest Regressor or an LSTM (Neural Network), which can handle the non-linear "wiggles" of the stock market much better.
The simplest version of this exercise uses Linear Regression to model how a single independent variable (Volume) influences the dependent variable (Closing Price).
Moving Average Crossovers (Trend Following)
Since your model uses the SMA_5 (Short-term) and SMA_20 (Long-term), you can interpret the relationship between these two lines:
Golden Cross (Buy): When the 5-day SMA crosses above the 20-day SMA. This indicates short-term momentum is shifting upward.
Death Cross (Sell): When the 5-day SMA crosses below the 20-day SMA. This indicates the trend is breaking down.
Implementation: Adding "Signal" Columns
You can add logic to your code to label these moments. Here is a snippet you can add after creating your results DataFrame:
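One possible version of that labeling logic is sketched below. It is an illustration, not the original snippet: the function name add_crossover_signals and the Crossover column are hypothetical, and it assumes a DataFrame that already holds SMA_5 and SMA_20 columns with no NaN rows (for example, the script's df after dropna()).

```python
import pandas as pd


def add_crossover_signals(frame, short_col="SMA_5", long_col="SMA_20"):
    """Return a copy of frame with a 'Crossover' label column."""
    out = frame.copy()
    # 1 when the short average is above the long average, else 0.
    above = (out[short_col] > out[long_col]).astype(int)
    # +1 marks a cross upward, -1 a cross downward; the first row is NaN.
    change = above.diff()
    out["Crossover"] = "None"
    out.loc[change == 1, "Crossover"] = "Golden Cross"
    out.loc[change == -1, "Crossover"] = "Death Cross"
    return out


# Made-up moving-average values for illustration only.
demo = pd.DataFrame({
    "SMA_5": [10.0, 10.5, 11.2, 11.0, 10.4, 9.8],
    "SMA_20": [10.8, 10.7, 10.9, 10.9, 10.6, 10.3],
})
labeled = add_crossover_signals(demo)
print(labeled)
```

Only the row where the lines actually cross gets a label, so a long stretch with SMA_5 above SMA_20 produces one Golden Cross rather than a repeated buy signal.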
The Python Machine Learning Stack
The project leverages the Python data science stack, with Scikit-Learn handling the "Heavy Lifting" of modeling:
Pandas - Data manipulation, loading Excel files, and cleaning.
yfinance - Downloading historical price data.
Matplotlib - Visualizing actual versus predicted prices.
train_test_split - Dividing data into a Training Set (to learn patterns) and a Test Set (to verify accuracy).
LinearRegression - The core algorithm that fits the "Line of Best Fit" to the historical data.
Evaluation Metrics
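To determine whether the model is actually useful, the script reports the R-squared score (r2_score) on the test set, a standard statistical measurement of how much of the variation in the closing price the model explains.
The overall workflow has three stages:
Preprocessing: Cleaning the dataset to ensure quality.
Modeling: Using LinearRegression from sklearn.linear_model to fit a linear combination of the features.
Verification: Testing the model on data it has never seen before to check whether it can generalize to real-world market conditions.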
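For the verification step, it can help to see what an R-squared score actually computes. The sketch below is a plain-Python version of the standard formula (1 minus residual sum of squares over total sum of squares), which is the quantity scikit-learn's r2_score reports. The function name r_squared and the sample prices are made up for illustration.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    # Squared error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always guessing the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual_prices = [100.0, 102.0, 104.0, 106.0]
model_prices = [101.0, 102.0, 103.0, 107.0]

score = r_squared(actual_prices, model_prices)
perfect = r_squared(actual_prices, actual_prices)
baseline = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
print(f"R-squared: {score:.2f}")
```

A perfect model scores 1.0, a model that always predicts the mean scores 0.0, and a model worse than the mean goes negative.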
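Scikit-learn requires Python 3. The minimum supported version rises over time (the 1.4 to 1.6 releases need Python 3.9 or later), so check the installation guide for the release you are installing.
Install scikit-learn using pip. Type the following command in a terminal and press Enter. The script also imports pandas, yfinance, matplotlib, and google-cloud-storage, which can be installed the same way.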
Code
pip install scikit-learn
If you encounter permission errors, you might need to run the command with administrator privileges or use the --user flag:
Code
pip install --user scikit-learn
=====================================================
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from google.cloud import storage
# Use your project ID from Google Cloud
PROJECT_ID = 'cloud-project-examples'
# The ID of your GCS bucket (bucket name)
bucket_name = "cloud-storage-exam"
# Pass the project explicitly so PROJECT_ID is actually used
storage_client = storage.Client(project=PROJECT_ID)
bucket = storage_client.bucket(bucket_name)
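# 1. Download Data
ticker = 'AMZN'
df = yf.download(ticker, start='2025-04-01', end='2026-04-01', auto_adjust=True)
# Recent yfinance versions may return MultiIndex columns (field, ticker) even
# for a single ticker; flatten them so df['Close'] is a plain Series.
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)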
# 2. FEATURE ENGINEERING
df['SMA_5'] = df['Close'].rolling(window=5).mean()
df['SMA_20'] = df['Close'].rolling(window=20).mean()
df['Daily_Return'] = df['Close'].pct_change()
df.dropna(inplace=True)
# 3. Define Features and Target
features = ['SMA_5', 'SMA_20', 'Volume', 'Daily_Return']
X = df[features]
y = df['Close']
# 4. Split and Train (No shuffling for time-series!)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=False
)
model = LinearRegression()
model.fit(X_train, y_train)
# 5. Predictions & Evaluation
y_pred = model.predict(X_test)
print(f"R-squared Score: {r2_score(y_test, y_pred):.4f}")
# 6. Create Results DataFrame and Signal Logic
results = pd.DataFrame({
    'Actual': y_test.values.flatten(),
    'Predicted': y_pred.flatten()
}, index=y_test.index)
# INDICATOR LOGIC:
# Buy if Predicted is more than 0.5% above Actual.
# Sell if Predicted is more than 0.5% below Actual.
threshold = 0.005
results['Action'] = 'Hold'  # default label when neither condition is met
results.loc[results['Predicted'] > results['Actual'] * (1 + threshold), 'Action'] = 'Buy'
results.loc[results['Predicted'] < results['Actual'] * (1 - threshold), 'Action'] = 'Sell'
# 7. Plotting with Buy/Sell Markers
plt.figure(figsize=(14, 7))
plt.plot(results.index, results['Actual'], label='Actual Price', color='royalblue', alpha=0.6)
plt.plot(results.index, results['Predicted'], label='Model Prediction', color='darkorange', linestyle='--', alpha=0.8)
# Add Buy markers (Green Up Arrows)
buys = results[results['Action'] == 'Buy']
plt.scatter(buys.index, buys['Actual'], marker='^', color='green', s=100, label='Buy Signal', zorder=5)
# Add Sell markers (Red Down Arrows)
sells = results[results['Action'] == 'Sell']
plt.scatter(sells.index, sells['Actual'], marker='v', color='red', s=100, label='Sell Signal', zorder=5)
plt.title(f'{ticker} Trading Signals: Predicted vs Actual', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
# Save and Show
plot_filename = 'amzn_trading_signals.png'
plt.savefig(plot_filename)
print(f"Plot with signals saved as {plot_filename}")
# Display the last few signals in the console
print("\nRecent Model Signals:")
print(results[['Actual', 'Predicted', 'Action']].tail(10))
plt.show()
# 8. Upload the saved chart to the GCS bucket under the stocks/ folder
destination_blob_name = f'stocks/{plot_filename}'
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(plot_filename)
print(f'File {plot_filename} uploaded to {destination_blob_name}.')
===================================================
This Python program is an end-to-end machine learning pipeline that moves beyond simple volume-based analysis to predict stock prices and generate trading signals. It also uploads its output chart to cloud storage.
Here is the breakdown of the program’s logic:
Data Acquisition & Cloud Setup
The script begins by initializing the Google Cloud Storage (GCS) client to interact with your specific bucket (cloud-storage-exam). It then uses the yfinance library to download exactly one year of historical price data for Amazon (AMZN).
Feature Engineering
This section addresses the "noise" of raw data by creating more predictive inputs:
SMA_5 & SMA_20: Simple Moving Averages for 5 and 20 days to identify short-term and medium-term trends.
Daily_Return: The percentage change in price from the previous day.
dropna(): This is crucial because moving averages create "NaN" (empty) values at the beginning of the dataset where there aren't enough days to calculate an average yet.
Model Training (Time-Series Protocol)
The program uses Linear Regression to find the relationship between those new features and the Closing Price.
No Shuffling: Note the shuffle=False in train_test_split. In stock market data, order matters. Shuffling would allow the model to "cheat" by seeing future data to predict the past. It trains on the first 80% of the year and tests on the most recent 20%.
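The idea behind shuffle=False can be shown with a plain-Python slice: keep the rows in date order and cut once, so everything in the test set comes after everything in the training set. This is a sketch of the concept rather than scikit-learn's implementation, and the helper name chronological_split is hypothetical.

```python
import math


def chronological_split(rows, test_size=0.2):
    """Split ordered rows into (train, test) without shuffling."""
    n_test = math.ceil(len(rows) * test_size)
    split = len(rows) - n_test
    # Earlier rows train the model; the most recent rows test it.
    return rows[:split], rows[split:]


# Ten trading days, numbered in date order (illustrative data).
days = list(range(1, 11))
train_days, test_days = chronological_split(days, test_size=0.2)

print("train:", train_days)
print("test: ", test_days)
```

Because the latest training day is earlier than the earliest test day, the model never sees future data while it is being fitted.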
Predictive Signal Logic
This is where the model turns into a trading strategy. Instead of just guessing a price, it applies a 0.5% threshold to generate signals:
Buy Signal: If the model's predicted price is more than 0.5% above the actual price.
Sell Signal: If the model's predicted price is more than 0.5% below the actual price.
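The threshold arithmetic is easy to check by hand with a small helper. This is a sketch of the same rule applied to one row at a time. The function name label_signal and the "Hold" label for the in-between case are assumptions made for illustration.

```python
def label_signal(actual, predicted, threshold=0.005):
    """Return 'Buy', 'Sell', or 'Hold' using a symmetric percentage band."""
    if predicted > actual * (1 + threshold):
        return "Buy"   # prediction is more than 0.5% above the actual price
    if predicted < actual * (1 - threshold):
        return "Sell"  # prediction is more than 0.5% below the actual price
    return "Hold"      # prediction is inside the +/- 0.5% band


# With an actual price of 100, the band runs from 99.5 to 100.5.
examples = [(100.0, 100.6), (100.0, 99.4), (100.0, 100.2)]
labels = [label_signal(actual, predicted) for actual, predicted in examples]
print(labels)
```

Widening the threshold produces fewer signals, and narrowing it produces more, including more false alarms.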
Visualization & Analysis
The program generates a chart using matplotlib:
Price Lines: Shows how closely the "Predicted" (orange dashed line) follows the "Actual" (blue line).
Action Markers: It overlays Green Up-Arrows for buy signals and Red Down-Arrows for sell signals directly onto the price chart.
Cloud Integration & Export
Finally, the script automates the reporting process:
It calculates the R-squared Score to tell you how much of the price movement the model successfully explained.
It saves the final chart as a .png file.
It uploads that file to your Google Cloud bucket under a stocks/ folder, allowing you to access the results remotely or share them via a dashboard.
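Quick Observation: By including the Moving Averages and Daily Returns, your model has more information to work with than a simple Volume-only regression, so the R-squared score in this version should be notably higher. Part of that improvement comes from the moving averages being computed from the closing price itself.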
How to Read Your Plot for Signals
When you look at your generated graph, interpret the interactions between the Blue Line (Actual) and Orange Line (Predicted):
Visual Pattern - Interpretation - Potential Action
Orange well above Blue - Model thinks price is "too low" based on recent trends. - Buy / Long
Orange well below Blue - Model thinks price is "too high" based on recent trends. - Sell / Short
Lines are hugging/overlapping - The market is in equilibrium or "choppy." - Wait / Hold
Because your model relies heavily on Moving Averages, it is a lagging indicator. This means it tells you what just happened rather than what will happen.
If the Blue line (Actual) drops sharply, the Orange line (Predicted) will likely stay high for a day or two before following it down. If you follow the signal blindly during those two days, you might buy into a falling knife. This is why traders often combine these models with a Stop Loss to manage risk.
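The lag is easy to see with numbers. In the sketch below the price drops from 100 to 90 in one day, but the 5-day average takes several days to catch up. The helper names sma and stop_loss_hit and the 5% maximum loss are assumptions chosen for illustration, not values from the script.

```python
def sma(prices, window):
    """Simple moving average aligned with prices (None until enough data)."""
    return [
        None if i + 1 < window
        else sum(prices[i + 1 - window:i + 1]) / window
        for i in range(len(prices))
    ]


def stop_loss_hit(entry_price, current_price, max_loss=0.05):
    """True once the price has fallen max_loss (5% here) below the entry."""
    return current_price <= entry_price * (1 - max_loss)


# Made-up prices: flat at 100, then a sharp drop to 90.
prices = [100, 100, 100, 100, 100, 90, 90, 90, 90, 90]
sma_5 = sma(prices, 5)

for day, (price, average) in enumerate(zip(prices, sma_5)):
    print(day, price, average)

print("Stop loss triggered:", stop_loss_hit(entry_price=100, current_price=90))
```

On the day of the drop the average is still 98 while the price is already 90. A stop loss reacts to the price itself, so it does not wait for the average to follow.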