UCONN Stamford Google Cloud Development Platform: Machine Learning Exercise

Machine Learning exercise

Supervised learning is a powerful form of machine learning. A supervised algorithm is trained on a dataset in which each input is paired with its known output, so that the model can learn how to produce those outputs from the inputs. The goal is for the trained model to predict the correct output for an input it has not seen before.

The process can be broken into three parts. The first is training, in which a large amount of data with known outcomes is fed into the model. The second is mapping: the model finds the patterns and relationships that tie the inputs to the outputs. The third is prediction, in which the model produces an output for a new input.

There are two basic types of supervised learning. The first is classification: given inputs, a classifier sorts them into categories based on their characteristics. The second is regression, which finds a numeric relationship between the input data and the output data (a straight line, in the case of linear regression). That relationship makes it possible to compute a numeric output for any new input.

Classification can be used to analyze x-rays and label them as normal or abnormal. Regression might take in weather metrics and predict tomorrow's temperature.

Linear regression

Linear Regression is a supervised learning method of machine learning which predicts a continuous numerical value by establishing a relationship between two or more variables.

At its simplest, it assumes that if you plot your data on a graph, you can draw a straight line that best represents the trend of that data.
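As a rough illustration of the "straight line through the data" idea, here is a minimal sketch using NumPy. The day numbers and prices are made-up values chosen only for this example, and the predict helper is a hypothetical name, not part of the exercise script.

```python
import numpy as np

# Hypothetical data: day number (x) and closing price (y), roughly linear.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 68.0])

# A degree-1 polynomial fit is the least-squares straight line
# y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)


def predict(new_x):
    """Use the fitted line to estimate y for a new x."""
    return slope * new_x + intercept


print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction for day 6: {predict(6.0):.2f}")
```

The slope tells you how much the output changes for each one-unit change in the input, which is the whole "model" in simple linear regression.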

In this example we will use the principles of Linear Regression together with a common stock market pricing indicator called RSI.

RSI, or Relative Strength Index, is a momentum indicator that measures the speed and size of recent price changes on a scale from 0 to 100. It indicates whether a stock is oversold, meaning the price may go up, or overbought, meaning the price may go down.

RSI indicates a stock is overbought if it goes above 70. RSI indicates a stock is oversold if it goes below 30.
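The exercise script further down does not compute RSI itself, so the following is only a sketch of how it might be calculated with pandas. It assumes a 14-period window and plain rolling averages of gains and losses (the classic formulation uses Wilder's smoothing instead, which gives slightly different values). The names compute_rsi and classify_rsi are hypothetical helpers invented for this example.

```python
import pandas as pd


def compute_rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI from simple rolling averages of gains and losses (assumed variant)."""
    delta = close.diff()
    gains = delta.clip(lower=0)        # positive price changes, else 0
    losses = (-delta).clip(lower=0)    # size of negative changes, else 0
    avg_gain = gains.rolling(window=period).mean()
    avg_loss = losses.rolling(window=period).mean()
    rs = avg_gain / avg_loss           # relative strength
    return 100 - 100 / (1 + rs)


def classify_rsi(value: float) -> str:
    """Apply the 70 / 30 thresholds described above."""
    if value > 70:
        return "overbought"
    if value < 30:
        return "oversold"
    return "neutral"


# Two artificial price series: one that only rises, one that only falls.
rising = pd.Series([float(p) for p in range(100, 120)])
falling = pd.Series([float(p) for p in range(120, 100, -1)])

rsi_up = compute_rsi(rising).iloc[-1]
rsi_down = compute_rsi(falling).iloc[-1]
print(rsi_up, classify_rsi(rsi_up))
print(rsi_down, classify_rsi(rsi_down))
```

A series with only gains pushes RSI to the top of its range and a series with only losses pushes it to the bottom, which is why the 70 and 30 levels are read as extremes.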

We will use linear regression and RSI to identify buying and selling opportunities.

====================================

This program has features that capture momentum and trends.

Moving averages are used to smooth out price noise.

Price lag features use prices or price changes from previous periods as inputs.

Volatility is the standard deviation of returns over a recent window.
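The three feature types above can be sketched with pandas as follows. The prices are made-up numbers, and the 3-period windows and the column names SMA_3, Lag_1, and Volatility_3 are assumptions chosen to keep the example short; the exercise script itself uses 5 and 20 periods.

```python
import pandas as pd

# Hypothetical closing prices for seven trading days.
prices = pd.DataFrame({"Close": [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0]})

# Moving average: smooths out day-to-day price noise.
prices["SMA_3"] = prices["Close"].rolling(window=3).mean()

# Price lag: yesterday's close, available as an input for today.
prices["Lag_1"] = prices["Close"].shift(1)

# Daily return, then volatility as the rolling standard deviation of returns.
prices["Daily_Return"] = prices["Close"].pct_change()
prices["Volatility_3"] = prices["Daily_Return"].rolling(window=3).std()

print(prices.round(4))
```

The first few rows contain NaN values because a rolling window or a lag has nothing to look back on yet, which is why the main script calls dropna() after building its features.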

Divergence occurs when price and RSI move in opposite directions.

When price makes lower lows but RSI makes higher lows, downward momentum is fading, and this is bullish.

When price makes higher highs but RSI makes lower highs, upward momentum is fading, and this is bearish.

Linear Regression and RSI are tools for identifying probability, not certainty.

SMA_5 (5-period Simple Moving Average) captures short-term momentum while SMA_20 captures the medium-term trend.

Linear Regression is a "line-of-best-fit" tool. Stocks rarely move in perfect lines. If you find your R-squared is still low, a possible next step is a Random Forest Regressor or an LSTM (a type of neural network), both of which can model non-linear "wiggles" that a straight line cannot.
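To make the difference concrete, here is a small sketch that compares LinearRegression with scikit-learn's RandomForestRegressor on synthetic, deliberately curved data. The data is generated, not market data, and the sample size, noise level, and number of trees are arbitrary assumptions for this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic non-linear relationship: y = x^2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(400, 1))
y = X[:, 0] ** 2 + rng.normal(0.0, 0.1, size=400)

# Shuffling is fine here because this toy data has no time order.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

linear_r2 = r2_score(y_test, linear.predict(X_test))
forest_r2 = r2_score(y_test, forest.predict(X_test))
print(f"Linear R-squared: {linear_r2:.3f}")
print(f"Forest R-squared: {forest_r2:.3f}")
```

One caveat when moving this idea to stock prices: tree-based models cannot predict values outside the range they saw in training, so they tend to work better on returns or other bounded features than on raw prices that keep trending.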

This program uses Linear Regression to model how the independent variables (SMA_5, SMA_20, Volume, and Daily_Return) relate to the dependent variable (Closing Price).

Moving Average Crossovers (Trend Following)

Since your model uses the SMA_5 (Short-term) and SMA_20 (Long-term), you can interpret the relationship between these two lines:

Golden Cross (Buy): When the 5-day SMA crosses above the 20-day SMA. This indicates short-term momentum is shifting upward.

Death Cross (Sell): When the 5-day SMA crosses below the 20-day SMA. This indicates the trend is breaking down.

Implementation: Adding "Signal" Columns

You can add logic to your code to label these moments. Here is a snippet you can add after creating your results DataFrame:
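The original snippet did not survive on this page, so the following is a stand-in sketch rather than the author's code. It assumes you have two aligned pandas Series of moving averages with the NaN rows already dropped; the function name crossover_signals and the sample values are hypothetical.

```python
import pandas as pd


def crossover_signals(short_sma: pd.Series, long_sma: pd.Series) -> pd.Series:
    """Label the rows where the short SMA crosses the long SMA."""
    # 1 while the short average is above the long one, otherwise 0.
    state = (short_sma > long_sma).astype(int)
    # +1 marks a cross from below to above, -1 the opposite direction.
    change = state.diff()
    signal = pd.Series("Hold", index=short_sma.index)
    signal[change == 1] = "Buy"    # Golden Cross
    signal[change == -1] = "Sell"  # Death Cross
    return signal


# Hypothetical moving-average values for seven days.
sma_5 = pd.Series([10.0, 11.0, 12.0, 13.0, 12.0, 10.0, 9.0])
sma_20 = pd.Series([11.0, 11.5, 11.8, 12.0, 12.2, 12.1, 11.5])

signals = crossover_signals(sma_5, sma_20)
print(pd.DataFrame({"SMA_5": sma_5, "SMA_20": sma_20, "Signal": signals}))
```

In the exercise script the two inputs would be df['SMA_5'] and df['SMA_20'], restricted to the same dates as the results DataFrame. Only the day of the cross is labeled; days where the short average simply stays above or below the long one remain "Hold".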

The Python Machine Learning Stack

The project uses the Python data science stack, with scikit-learn handling the "heavy lifting" of model training:

Pandas - Data manipulation, rolling-window calculations, and cleaning.

yfinance - Downloading historical price data.

Matplotlib - Plotting actual and predicted prices with buy and sell markers.

train_test_split - Dividing data into a Training Set (to learn patterns) and a Test Set (to verify accuracy).

LinearRegression - The core algorithm that fits the "Line of Best Fit" to the historical data.

Evaluation Metrics

To determine whether the model is actually useful, the project reports the R-squared score on test data. The overall workflow has three stages:

Preprocessing: Cleaning the dataset to ensure quality.

Modeling: Using linear_model to create a linear combination of features.

Verification: Testing the model on data it has never seen before to ensure it can generalize to real-world market conditions.
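For reference, the R-squared score printed by the script can be reproduced by hand. The sketch below uses the standard definition, 1 minus the ratio of the residual sum of squares to the total sum of squares; the helper name r_squared and the sample numbers are made up for this example.

```python
import numpy as np


def r_squared(actual, predicted):
    """R-squared = 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)       # error left after the model
    ss_tot = np.sum((actual - actual.mean()) ** 2)   # error of always guessing the mean
    return 1.0 - ss_res / ss_tot


actual = [100.0, 102.0, 104.0, 106.0]
close_guess = [101.0, 102.0, 103.0, 106.0]
mean_guess = [103.0, 103.0, 103.0, 103.0]

print(r_squared(actual, close_guess))
print(r_squared(actual, mean_guess))
```

A score near 1 means the predictions track the actual values closely, a score of 0 means the model does no better than always predicting the average, and a negative score means it does worse than that.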

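Scikit-learn requires Python 3. The minimum supported Python version rises with each release (3.6 is enough only for older releases), so check the scikit-learn installation guide for the version you plan to install.

Install scikit-learn using pip. Type the following command and press Enter: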

Code

pip install scikit-learn

If you encounter permission errors, you might need to run the command with administrator privileges or use the --user flag:

Code

pip install --user scikit-learn

=====================================================

import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from google.cloud import storage

# Use your project ID from Google Cloud
PROJECT_ID = 'cloud-project-examples'

# The ID of your GCS bucket (bucket name)
bucket_name = "cloud-storage-exam"

storage_client = storage.Client(project=PROJECT_ID)
bucket = storage_client.bucket(bucket_name)

# 1. Download Data
ticker = 'AMZN'
df = yf.download(ticker, start='2025-04-01', end='2026-04-01', auto_adjust=True)

# Some yfinance versions return two-level column labels (field, ticker).
# Keep only the field names so that df['Close'] is a single column.
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)

# 2. FEATURE ENGINEERING
df['SMA_5'] = df['Close'].rolling(window=5).mean()
df['SMA_20'] = df['Close'].rolling(window=20).mean()
df['Daily_Return'] = df['Close'].pct_change()
# The rolling windows leave NaN values in the first rows; drop them.
df.dropna(inplace=True)

# 3. Define Features and Target
features = ['SMA_5', 'SMA_20', 'Volume', 'Daily_Return']
X = df[features]
y = df['Close']

# 4. Split and Train (No shuffling for time-series!)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=False
)
model = LinearRegression()
model.fit(X_train, y_train)

# 5. Predictions & Evaluation
y_pred = model.predict(X_test)
print(f"R-squared Score: {r2_score(y_test, y_pred):.4f}")

# 6. Create Results DataFrame and Signal Logic
results = pd.DataFrame({
    'Actual': y_test.values.flatten(),
    'Predicted': y_pred.flatten()
}, index=y_test.index)

# INDICATOR LOGIC:
# Buy if Predicted is > 0.5% above Actual. Sell if < 0.5% below Actual.
threshold = 0.005
results['Action'] = 'Hold'  # default when neither condition is met
results.loc[results['Predicted'] > results['Actual'] * (1 + threshold), 'Action'] = 'Buy'
results.loc[results['Predicted'] < results['Actual'] * (1 - threshold), 'Action'] = 'Sell'

# 7. Plotting with Buy/Sell Markers
plt.figure(figsize=(14, 7))
plt.plot(results.index, results['Actual'], label='Actual Price', color='royalblue', alpha=0.6)
plt.plot(results.index, results['Predicted'], label='Model Prediction', color='darkorange', linestyle='--', alpha=0.8)

# Add Buy markers (Green Up Arrows)
buys = results[results['Action'] == 'Buy']
plt.scatter(buys.index, buys['Actual'], marker='^', color='green', s=100, label='Buy Signal', zorder=5)

# Add Sell markers (Red Down Arrows)
sells = results[results['Action'] == 'Sell']
plt.scatter(sells.index, sells['Actual'], marker='v', color='red', s=100, label='Sell Signal', zorder=5)

plt.title(f'{ticker} Trading Signals: Predicted vs Actual', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()

# Save and Show
plot_filename = 'amzn_trading_signals.png'
plt.savefig(plot_filename)
print(f"Plot with signals saved as {plot_filename}")

# Display the last few signals in the console
print("\nRecent Model Signals:")
print(results[['Actual', 'Predicted', 'Action']].tail(10))
plt.show()

# 8. Upload the saved chart to the GCS bucket
destination_blob_name = f'stocks/{plot_filename}'
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(plot_filename)  # upload file to specified destination
print(f'File {plot_filename} uploaded to {destination_blob_name}.')

===================================================

This Python program is an end-to-end machine learning pipeline that moves beyond simple volume-based analysis to predict stock prices and generate trading signals. It also uploads its output chart to cloud storage.

Here is the breakdown of the program’s logic:

Data Acquisition & Cloud Setup

The script begins by initializing the Google Cloud Storage (GCS) client to interact with your specific bucket (cloud-storage-exam). It then uses the yfinance library to download exactly one year of historical price data for Amazon (AMZN).

Feature Engineering

This section addresses the "noise" of raw data by creating more predictive inputs:

SMA_5 & SMA_20: Simple Moving Averages for 5 and 20 days to identify short-term and medium-term trends.

Daily_Return: The percentage change in price from the previous day.

dropna(): This is crucial because moving averages create "NaN" (empty) values at the beginning of the dataset where there aren't enough days to calculate an average yet.

Model Training (Time-Series Protocol)

The program uses Linear Regression to find the relationship between those new features and the Closing Price.

No Shuffling: Note the shuffle=False in train_test_split. In stock market data, order matters. Shuffling would allow the model to "cheat" by seeing future data to predict the past. It trains on the first 80% of the year and tests on the most recent 20%.
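The effect of an unshuffled 80/20 split can be sketched by slicing a date-indexed DataFrame by position. The ten daily rows below are invented for this example, and the manual slicing is only meant to mirror what shuffle=False achieves, not to replace train_test_split.

```python
import pandas as pd

# Hypothetical ten-day price history with a date index.
dates = pd.date_range("2025-01-01", periods=10, freq="D")
data = pd.DataFrame({"Close": range(100, 110)}, index=dates)

# Chronological split: the first 80% of rows train, the last 20% test.
split_at = int(len(data) * 0.8)
train = data.iloc[:split_at]
test = data.iloc[split_at:]

print("Train:", train.index.min().date(), "to", train.index.max().date())
print("Test: ", test.index.min().date(), "to", test.index.max().date())
```

Every training date comes before every test date, so the model is always evaluated on data that lies in its "future".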

Predictive Signal Logic

This is where the model turns into a trading strategy. Instead of just reporting a predicted price, it compares each prediction with the actual closing price for the same day and applies a 0.5% threshold to generate signals:

Buy Signal: The predicted price is more than 0.5% above the actual price.

Sell Signal: The predicted price is more than 0.5% below the actual price.
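Here is the threshold rule worked through on three made-up rows, so the arithmetic is easy to follow. The prices are hypothetical, and the "Hold" default for rows that meet neither condition is an assumption added for this sketch.

```python
import pandas as pd

threshold = 0.005  # 0.5%

# Three hypothetical days with the same actual price of 200.
results = pd.DataFrame({
    "Actual": [200.0, 200.0, 200.0],
    "Predicted": [202.0, 200.5, 198.0],
})

# With Actual = 200, the buy line is about 201 and the sell line about 199.
results["Action"] = "Hold"
results.loc[results["Predicted"] > results["Actual"] * (1 + threshold), "Action"] = "Buy"
results.loc[results["Predicted"] < results["Actual"] * (1 - threshold), "Action"] = "Sell"

print(results)
```

A prediction of 200.5 is only 0.25% above the actual price, so it falls inside the threshold band and produces no signal.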

Visualization & Analysis

The program generates a chart using matplotlib:

Price Lines: Shows how closely the "Predicted" (orange dashed line) follows the "Actual" (blue line).

Action Markers: It overlays Green Up-Arrows for buy signals and Red Down-Arrows for sell signals directly onto the price chart.

Cloud Integration & Export

Finally, the script automates the reporting process:

It calculates the R-squared Score to tell you how much of the price movement the model successfully explained.

It saves the final chart as a .png file.

It uploads that file to your Google Cloud bucket under a stocks/ folder, allowing you to access the results remotely or share them via a dashboard.

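Quick Observation: By including the Moving Averages and Daily Returns, your model has more information than a simple Volume-only regression, so the R-squared score in this version is likely to be higher. Keep in mind that SMA_5 and Daily_Return are both computed from the same day's closing price that the model is predicting, so part of that score reflects information a real forecast would not have.

RSI is not yet computed in this script; it could be added as another feature column.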


How to Read Your Plot for Signals

When you look at your generated graph, interpret the interactions between the Blue Line (Actual) and Orange Line (Predicted):

Visual Pattern - Interpretation - Potential Action

Orange well above Blue - Model thinks price is "too low" based on recent trends. - Buy / Long

Orange well below Blue - Model thinks price is "too high" based on recent trends. - Sell / Short

Lines are hugging/overlapping - The market is in equilibrium or "choppy." - Wait / Hold

Because your model relies heavily on Moving Averages, it is a lagging indicator. This means it tells you what just happened rather than what will happen.

If the Blue line (Actual) drops sharply, the Orange line (Predicted) will likely stay high for a day or two before following it down. If you follow the signal blindly during those two days, you might buy into a falling knife. This is why traders often combine these models with a Stop Loss to manage risk.
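A stop loss can be sketched as a simple check against a price floor. The function name first_stop_loss_hit, the 3% stop distance, and the price list are all assumptions made up for this illustration; real brokers handle stop orders for you and may fill at a different price.

```python
def first_stop_loss_hit(prices, entry_price, stop_pct=0.03):
    """Return the index of the first price at or below the stop level, or None."""
    stop_level = entry_price * (1 - stop_pct)
    for day, price in enumerate(prices):
        if price <= stop_level:
            return day
    return None


# Hypothetical closes after buying at 100 on a model "Buy" signal.
closes = [100.0, 99.0, 98.0, 96.5, 95.0]
hit = first_stop_loss_hit(closes, entry_price=100.0, stop_pct=0.03)
print("Stop loss triggered on day index:", hit)
```

With a 3% stop the position is closed once the price reaches 97 or lower, which limits the loss even if the lagging model is still showing a "Buy".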
