This example has been auto-generated from the examples/ folder at GitHub repository.

Predicting Bike Rental Demand

# Activate local environment, see `Project.toml`
import Pkg; Pkg.activate(".."); Pkg.instantiate();
using RxInfer

Preamble: Enabling Predictions

RxInfer.jl facilitates predictions in two primary ways.

  1. Implicit Prediction: By adding missing instances directly into the data, which are then treated as regular observations during inference. This method typically does not require model alterations.
  2. Explicit Prediction: By defining a separate data variable in the model. This approach doesn't necessitate passing missing instances as the data variable but does require specifying the predictvar argument in the inference function.

Example

Consider the following model:

@model function example_model()
    y = datavar(Float64) where {allow_missing = true}

    h ~ NormalMeanPrecision(0, 1.0)
    x ~ NormalMeanPrecision(h, 1.0)
    y ~ NormalMeanPrecision(x, 10.0)
end
# Implicit Prediction
result = infer(model = example_model(), data = (y = missing,))
Inference results:
  Posteriors       | available for (h, x)
  Predictions      | available for (y)
# Explicit Prediction
result = infer(model = example_model(), predictvars = (y = KeepLast(),))
Inference results:
  Posteriors       | available for (h, x)
  Predictions      | available for (y)

Both approaches yield the same results, but the choice depends on personal preferences and the model's structure. In scenarios with a clear distinction between observed and predicted variables, the explicit method is preferable. However, our subsequent example will not differentiate between observations and predictions, as it utilizes a state space representation.

using CSV, DataFrames, Plots

Objective

This example aims to simultaneously learn the dynamics of the feature space and predict hourly bike rental demand through reactive message passing, a signature approach of RxInfer.jl.

Dataset Source

Data for this example study is sourced from the Kaggle Bike Count Prediction Dataset. For the purpose of this example, the original dataset from Kaggle has been adapted by removing categorical variables such as season, holiday, and working day. Additionally we take only 500 entries. This modification allows us to focus on modeling the continuous variables without additional complexities of handling categorical data. Nevertheless, this extension is feasible within RxInfer.jl.

# Load the data
df = CSV.read("../data/bike_count/modified_bicycle.csv", DataFrame)
df[1:10, :]
10×6 DataFrame
 Row │ datetime            temp     atemp    humidity  windspeed  count
     │ String31            Float64  Float64  Float64   Float64    Int64
─────┼──────────────────────────────────────────────────────────────────
   1 │ 2011-01-01 0:00:00     9.84   14.395      81.0     0.0        16
   2 │ 2011-01-01 1:00:00     9.02   13.635      80.0     0.0        40
   3 │ 2011-01-01 2:00:00     9.02   13.635      80.0     0.0        32
   4 │ 2011-01-01 3:00:00     9.84   14.395      75.0     0.0        13
   5 │ 2011-01-01 4:00:00     9.84   14.395      75.0     0.0         1
   6 │ 2011-01-01 5:00:00     9.84   12.88       75.0     6.0032      1
   7 │ 2011-01-01 6:00:00     9.02   13.635      80.0     0.0         2
   8 │ 2011-01-01 7:00:00     8.2    12.88       86.0     0.0         3
   9 │ 2011-01-01 8:00:00     9.84   14.395      75.0     0.0         8
  10 │ 2011-01-01 9:00:00    13.12   17.425      76.0     0.0        14
# we reserve few samples for prediction
n_future = 24

X = Union{Vector{Float64}, Missing}[[row[i] for i in 2:(ncol(df))-1] for row in eachrow(df)][1:end-n_future]
y = Union{Float64, Missing}[df[:, "count"]...][1:end-n_future]

state_dim = length(X[1]); # dimensionality of feature space

Generative Model with Priors

We present a generative model that delineates the latent dynamics of feature evolution, represented by $\mathbf{h}_t$, and their link to the bike rental counts, $\mathbf{y}_t$.

Equations and Priors

  1. Feature Dynamics with Prior:

    • Prior: $\mathbf{a} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$
    • Dynamics: $\mathbf{h}_t \sim \mathcal{N}(\mathbf{A h}_{t-1}, \mathbf{Q})$
    • \[\mathbf{A}\]

      is the transition matrix, reshaped from the prior vector $\mathbf{a}$, and $\mathbf{Q}$ represents process noise.
  2. Noisy Observations:

    • \[\mathbf{x}_t \sim \mathcal{N}(\mathbf{h}_t, \mathbf{P})\]

    • Represents the observed noisy state of the features.
  3. Count Prediction with Prior:

    • Prior: $\boldsymbol{\theta} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$
    • Prediction: $y_t \sim \mathcal{N}(\text{softplus}(\boldsymbol{\theta}^\top\mathbf{h}_t), \sigma^2)$
    • Models the bike rental count as influenced by a non-linear transformation of the hidden state.

Interpretation

  • This framework aims to simultaneously infer the transition matrix $\mathbf{A}$ and the regression parameters $\boldsymbol{\theta}$, providing a comprehensive view of the feature space dynamics and the count prediction.
  • By employing Gaussian priors on both $\mathbf{a}$ and $\boldsymbol{\theta}$, we incorporate beliefs about their distributions.
  • The inference process aims to discover these underlying dynamics, enabling predictions of both features $\mathbf{x}_t$ and counts $y_t$.
# We augument the dataset with missing entries for 24 hours ahead
append!(X, repeat([missing], n_future))
append!(y, repeat([missing], n_future));
# Function to perform the state transition in the model.
# It reshapes vector `a` into a matrix and multiplies it with vector `x` to simulate the transition.
function transition(a, x)
    nm, n = length(a), length(x)
    m = nm ÷ n  # Calculate the number of rows for reshaping 'a' into a matrix
    A = reshape(a, (m, n))  
    return A * x
end
transition (generic function with 1 method)
# The dotsoftplus function combines a dot product and softplus transformation.
# While useful for ensuring positivity, it may not be the optimal choice for all scenarios,
# especially if the data suggests other forms of relationships or distributions.
function dotsoftplus(a, x)
    log(1 + exp(dot(a, x)))
end
dotsoftplus (generic function with 1 method)
# model definction
@model function bicycle_ssm(n, h0, θ0, a0, Q, P)

    # `h` is a sequence of hidden states
    h = randomvar(n)

    # `x` is a sequence of observed states (features)
    x = datavar(Vector{Float64}, n) where {allow_missing = true}
    # `y` is a sequence of "count" bicycles
    y = datavar(Float64, n) where {allow_missing = true}

    a ~ MvNormal(μ=a0, Σ=diageye(length(a0)))

    θ ~ MvNormal(μ=mean(θ0), Σ=cov(θ0))

    h_prior ~ MvNormal(μ=mean(h0), Σ=cov(h0))
    h_prev = h_prior
    
    for i in 1:n
        
        h[i] ~ MvNormal(μ=transition(a, h_prev), Σ=Q)
        
        x[i] ~ MvNormal(μ=h[i], Σ=P)
        
        y[i] ~ Normal(μ=dotsoftplus(θ, h[i]), σ²=1.0)
        
        h_prev = h[i]
    end

end
# In this example, we opt for a basic Linearization approach for the transition and dotsoftplus functions.
# However, alternative methods like Unscented or CVI approximations can also be considered.
transition_meta = @meta begin 
    transition() -> Linearization()
    dotsoftplus() -> Linearization()
end
Meta specification:
  transition() -> Linearization()
  dotsoftplus() -> Linearization()
Options:
  warn = true
# We define rather uninformative priors 
prior_θ = MvNormalMeanCovariance(ones(state_dim), diageye(state_dim))
prior_h = MvNormalMeanCovariance(zeros(state_dim), 1e2diageye(state_dim))
prior_a = MvNormalMeanCovariance(zeros(state_dim^2), diageye(state_dim^2))
MvNormalMeanCovariance(
μ: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0
.0, 0.0]
Σ: [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 
0.0 1.0]
)
# the deterministic relationsships (transition) and (dotsoftplus) will induce loops in the graph representation of our model, this necessiates the initialization of the messages
imessages = ( h = prior_h, a = prior_a, θ = prior_θ,)

bicycle_model = bicycle_ssm(length(y), prior_h, prior_θ, prior_a, diageye(state_dim), diageye(state_dim))

result = infer(
    model = bicycle_model,
    data  = (y = y, x=X), 
    options = (limit_stack_depth = 500, ), 
    returnvars = KeepLast(),
    initmessages = imessages,
    meta = transition_meta,
    iterations = 20,
    showprogress=true,
)
Inference results:
  Posteriors       | available for (a, h, h_prior, θ)
  Predictions      | available for (y, x)
# For a sake of this example, we extract only predictions
mean_y, cov_y = mean.(result.predictions[:y]), cov.(result.predictions[:y])
mean_x, cov_x = mean.(result.predictions[:x]), var.(result.predictions[:x])

mean_x1, cov_x1 = getindex.(mean_x, 1), getindex.(cov_x, 1)
mean_x2, cov_x2 = getindex.(mean_x, 2), getindex.(cov_x, 2)
mean_x3, cov_x3 = getindex.(mean_x, 3), getindex.(cov_x, 3)
mean_x4, cov_x4 = getindex.(mean_x, 4), getindex.(cov_x, 4);
slice = (300, length(y))
data = df[:, "count"][length(y)-n_future:length(y)]

p = scatter(y, 
            color=:darkblue, 
            markerstrokewidth=0,
            label="Observed Count", 
            alpha=0.6)

# Plotting the mean prediction with variance ribbon
plot!(mean_y, ribbon=sqrt.(cov_y), 
      color=:orange, 
      fillalpha=0.3,
      label="Predicted Mean ± Std Dev")

# Adding a vertical line to indicate the start of the future prediction
vline!([length(y)-n_future], 
       label="Prediction Start", 
       linestyle=:dash, 
       linecolor=:green)

# Future (unobserved) data
plot!(length(y)-n_future:length(y), data, label="Future Count")

# Adjusting the limits
xlims!(slice)

# Enhancing the plot with titles and labels
title!("Bike Rental Demand Prediction")
xlabel!("Time")
ylabel!("Bike Count")

# Adjust the legend
legend=:topright

# Show the final plot
display(p)

using Plots

# Define a color palette
palette = cgrad(:viridis)

# Plot the hidden states with observations
p1 = plot(mean_x1, ribbon=sqrt.(cov_x1), color=palette[1], label="Hidden State 1", legend=:topleft)
plot!(df[!, :temp], color=:grey, label="Temperature")
vline!([length(y)-n_future], linestyle=:dash, color=:red, label="Prediction Start")
xlabel!("Time")
ylabel!("Value")
title!("Temperature vs Hidden State 1")

p2 = plot(mean_x2, ribbon=sqrt.(cov_x2), color=palette[2], label="Hidden State 2", legend=:topleft)
plot!(df[!, :atemp], color=:grey, label="Feels-Like Temp")
vline!([length(y)-n_future], linestyle=:dash, color=:red, label="")
xlabel!("Time")
ylabel!("Value")
title!("Feels-Like Temp vs Hidden State 2")

p3 = plot(mean_x3, ribbon=sqrt.(cov_x3), color=palette[3], label="Hidden State 3", legend=:topleft)
plot!(df[!, :humidity], color=:grey, label="Humidity")
vline!([length(y)-n_future], linestyle=:dash, color=:red, label="Prediction Start")
xlabel!("Time")
ylabel!("Value")
title!("Humidity vs Hidden State 3")

p4 = plot(mean_x4, ribbon=sqrt.(cov_x4), color=palette[4], label="Hidden State 4", legend=:topleft)
plot!(df[!, :windspeed], color=:grey, label="Windspeed")
vline!([length(y)-n_future], linestyle=:dash, color=:red, label="Prediction Start")
xlabel!("Time")
ylabel!("Value")
title!("Windspeed vs Hidden State 4")

for p in [p1, p2, p3, p4]
    xlims!(p, first(slice), last(slice))
end

plot(p1, p2, p3, p4, layout=(2, 2), size=(800, 400))

Conclusion

While the predictions provided by our current model may not align closely with the real-world data, it's crucial to acknowledge the assumptions and simplifications that were made which could have influenced the outcomes. This model serves as a conceptual framework, demonstrating that it's possible to infer states, parameters, and predictions concurrently with a focus on the predictive aspect of the analysis.

The real power of modeling lies in iteration and refinement. We encourage the community to build upon this foundation. Through collaborative effort and shared insights, we can enhance the predictive accuracy of such models, making them not only illustrative but also practical for real-world applications.