Title: | Bayesian Modal Regression Based on the GUD Family |
---|---|
Description: | Provides probability density functions and sampling algorithms for three key distributions from the General Unimodal Distribution (GUD) family: the Flexible Gumbel (FG) distribution, the Double Two-Piece (DTP) Student-t distribution, and the Two-Piece Scale (TPSC) Student-t distribution. Additionally, this package includes a function for Bayesian linear modal regression, leveraging these three distributions for model fitting. The details of the Bayesian modal regression model based on the GUD family can be found at Liu, Huang, and Bai (2024) <doi:10.1016/j.csda.2024.108012>. |
Authors: | Qingyang Liu [aut, cre] , Xianzheng Huang [aut] , Ray Bai [aut] |
Maintainer: | Qingyang Liu <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.2 |
Built: | 2025-01-26 03:28:21 UTC |
Source: | https://github.com/rh8liuqy/bayesian_modal_regression |
This dataset is sourced from the 5th edition of "The Art and Science of Learning from Data" by Alan Agresti and Christine Franklin.
crime
A data frame with 51 rows and 9 columns:
The list of 50 states in the United States and the District of Columbia.
The annual number of murders, forcible rapes, robberies, and aggravated assaults per 100,000 people in the population.
The annual number of murders per 100,000 people in the population.
Percentage of the residents with income below the poverty level.
Percentage of the adult residents who have at least a high school education.
Percentage of the adult residents who have a college education.
Percentage of families headed by a single parent.
Percentage of the adult residents who are unemployed.
Percentage of the residents living in metropolitan areas.
The DTP-Student-t Distribution
dDTP(x, theta, sigma1, sigma2, delta1, delta2)
rDTP(n, theta, sigma1, sigma2, delta1, delta2)
x | vector of quantiles. |
theta | vector of the location parameters. |
sigma1 | vector of the scale parameters of the left skewed part. |
sigma2 | vector of the scale parameters of the right skewed part. |
delta1 | the degrees of freedom of the left skewed part. |
delta2 | the degrees of freedom of the right skewed part. |
n | number of observations. |
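The DTP-Student-t distribution has the density
$$f_{\mathrm{DTP}}(y \mid \theta, \sigma_1, \sigma_2, \delta_1, \delta_2) = w\, f_{\mathrm{LT}}(y \mid \theta, \sigma_1, \delta_1) + (1 - w)\, f_{\mathrm{RT}}(y \mid \theta, \sigma_2, \delta_2),$$
where
$$w = \frac{\sigma_1 f(0 \mid \delta_2)}{\sigma_1 f(0 \mid \delta_2) + \sigma_2 f(0 \mid \delta_1)},$$
$f_{\mathrm{LT}}(y \mid \theta, \sigma, \delta)$ represents
$$\frac{2}{\sigma}\, f\!\left(\frac{y - \theta}{\sigma} \,\Big|\, \delta\right) I(y < \theta),$$
and
$$f_{\mathrm{RT}}(y \mid \theta, \sigma, \delta) = \frac{2}{\sigma}\, f\!\left(\frac{y - \theta}{\sigma} \,\Big|\, \delta\right) I(y \ge \theta).$$
Additionally, $f(\cdot \mid \delta)$ represents the density function of the standardized Student-t distribution with $\delta$ degrees of freedom.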
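As a rough sanity check on a density of this kind, the sketch below evaluates a two-piece Student-t density in base R. It assumes the form in which the left half uses scale `sigma1` and degrees of freedom `delta1`, the right half uses `sigma2` and `delta2`, and a mixing weight joins the two halves continuously at `theta`. The helper `ddtp_sketch` is hypothetical and is not the package's `dDTP`.

```r
# Hypothetical base-R helper (an assumption, not the package's dDTP()).
ddtp_sketch <- function(x, theta, sigma1, sigma2, delta1, delta2) {
  # Weight chosen so that the two halves take the same value at theta.
  w <- sigma1 * dt(0, df = delta2) /
    (sigma1 * dt(0, df = delta2) + sigma2 * dt(0, df = delta1))
  # Half Student-t densities, each rescaled to integrate to one on its side.
  left  <- 2 / sigma1 * dt((x - theta) / sigma1, df = delta1)
  right <- 2 / sigma2 * dt((x - theta) / sigma2, df = delta2)
  ifelse(x < theta, w * left, (1 - w) * right)
}

# Fix one parameter set (the same values as the package example).
f <- function(x) {
  ddtp_sketch(x, theta = 5, sigma1 = 7, sigma2 = 3, delta1 = 5, delta2 = 6)
}

# Total probability mass, integrated separately on each side of the mode.
total_mass <- integrate(f, -Inf, 5)$value + integrate(f, 5, Inf)$value

# Size of the jump at the mode; it should be negligible if the halves join.
jump_at_mode <- f(5) - f(5 - 1e-8)

print(total_mass)
print(jump_at_mode)
```

If the assumed form is right, `total_mass` should be close to one and `jump_at_mode` close to zero, which is what makes the mode `theta` well defined even when the two sides have different scales and tail weights.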
dDTP gives the density. rDTP generates random deviates.
Liu Q, Huang X, Bai R (2024). “Bayesian Modal Regression Based on Mixture Distributions.” Computational Statistics & Data Analysis, 108012. doi:10.1016/j.csda.2024.108012.
set.seed(100)
require(graphics)

# Random number generation
X <- rDTP(n = 1e5, theta = 5, sigma1 = 7, sigma2 = 3, delta1 = 5, delta2 = 6)

# Plot the histogram
hist(X, breaks = 100, freq = FALSE)

# The red dashed line should match the underlying histogram
points(x = seq(-100, 40, length.out = 1000),
       y = dDTP(x = seq(-100, 40, length.out = 1000),
                theta = 5, sigma1 = 7, sigma2 = 3, delta1 = 5, delta2 = 6),
       type = "l", col = "red", lwd = 3, lty = 2)
The Flexible Gumbel Distribution
dFG(x, w, loc, sigma1, sigma2)
rFG(n, w, loc, sigma1, sigma2)
x | vector of quantiles. |
w | vector of weight parameters. |
loc | vector of the location parameters. |
sigma1 | vector of the scale parameters of the left skewed part. |
sigma2 | vector of the scale parameters of the right skewed part. |
n | number of observations. |
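The Gumbel distribution has the density
$$f_{\mathrm{Gumbel}}(y \mid \theta, \sigma) = \frac{1}{\sigma} \exp\left\{ -\frac{y - \theta}{\sigma} - \exp\left( -\frac{y - \theta}{\sigma} \right) \right\},$$
where $\theta$ is the mode, which serves as the location parameter, and $\sigma$ is the scale parameter.
The flexible Gumbel distribution has the density
$$f_{\mathrm{FG}}(y \mid w, \theta, \sigma_1, \sigma_2) = w\, f_{\mathrm{Gumbel}}(-y \mid -\theta, \sigma_1) + (1 - w)\, f_{\mathrm{Gumbel}}(y \mid \theta, \sigma_2),$$
where $w$ is the weight parameter, $\sigma_1$ is the scale parameter of the left skewed part, and $\sigma_2$ is the scale parameter of the right skewed part.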
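The mixture construction can be illustrated in base R. The sketch below assumes the standard Gumbel density with mode `loc` and builds the flexible Gumbel density as a weighted sum of a reflected (left skewed) component with scale `sigma1` and an ordinary (right skewed) component with scale `sigma2`, both sharing the same mode. The helpers `dgumbel_sketch` and `dfg_sketch` are hypothetical names and are not the package's `dFG`.

```r
# Hypothetical base-R helpers (assumptions, not the package's dFG()).

# Right skewed Gumbel density with mode `loc` and scale `sigma`.
dgumbel_sketch <- function(x, loc, sigma) {
  z <- (x - loc) / sigma
  exp(-z - exp(-z)) / sigma
}

# Two-component mixture: reflecting the argument and the location yields the
# left skewed component, so both components peak at `loc`.
dfg_sketch <- function(x, w, loc, sigma1, sigma2) {
  w * dgumbel_sketch(-x, -loc, sigma1) +
    (1 - w) * dgumbel_sketch(x, loc, sigma2)
}

# Same parameter values as the package example.
f <- function(x) dfg_sketch(x, w = 0.3, loc = 0, sigma1 = 1, sigma2 = 2)

# The mixture should integrate to one.
total_mass <- integrate(f, -Inf, Inf)$value

# Locate the maximum on a fine grid; it should sit at (or next to) loc = 0.
grid <- seq(-10, 20, by = 0.01)
mode_on_grid <- grid[which.max(f(grid))]

print(total_mass)
print(mode_on_grid)
```

Because both components peak at the same point, the mixture stays unimodal for any weight in (0, 1), while `w`, `sigma1`, and `sigma2` control how much mass and spread fall on each side of the mode.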
dFG gives the density. rFG generates random deviates.
Liu Q, Huang X, Bai R (2024). “Bayesian Modal Regression Based on Mixture Distributions.” Computational Statistics & Data Analysis, 108012. doi:10.1016/j.csda.2024.108012.
set.seed(100)
require(graphics)

# Random number generation
X <- rFG(n = 1e5, w = 0.3, loc = 0, sigma1 = 1, sigma2 = 2)

# Plot the histogram
hist(X, breaks = 100, freq = FALSE)

# The red dashed line should match the underlying histogram
points(x = seq(-10, 20, length.out = 1000),
       y = dFG(x = seq(-10, 20, length.out = 1000),
               w = 0.3, loc = 0, sigma1 = 1, sigma2 = 2),
       type = "l", col = "red", lwd = 3, lty = 2)
The TPSC-Student-t Distribution
dTPSC(x, w, theta, sigma, delta)
rTPSC(n, w, theta, sigma, delta)
x | vector of quantiles. |
w | vector of weight parameters. |
theta | vector of the location parameters. |
sigma | vector of the scale parameters. |
delta | the degrees of freedom. |
n | number of observations. |
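The TPSC-Student-t distribution has the density
$$f_{\mathrm{TPSC}}(y \mid w, \theta, \sigma, \delta) = w\, f_{\mathrm{LT}}\!\left(y \,\Big|\, \theta, \sigma \sqrt{\frac{w}{1 - w}}, \delta\right) + (1 - w)\, f_{\mathrm{RT}}\!\left(y \,\Big|\, \theta, \sigma \sqrt{\frac{1 - w}{w}}, \delta\right),$$
where
$$f_{\mathrm{LT}}(y \mid \theta, \sigma, \delta) = \frac{2}{\sigma}\, f\!\left(\frac{y - \theta}{\sigma} \,\Big|\, \delta\right) I(y < \theta),$$
and
$$f_{\mathrm{RT}}(y \mid \theta, \sigma, \delta) = \frac{2}{\sigma}\, f\!\left(\frac{y - \theta}{\sigma} \,\Big|\, \delta\right) I(y \ge \theta).$$
Additionally, $f(\cdot \mid \delta)$ represents the density function of the standardized Student-t distribution with $\delta$ degrees of freedom.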
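A small base-R sketch can make the role of the weight parameter concrete. It assumes a two-piece scale form in which the left half uses scale `sigma * sqrt(w / (1 - w))`, the right half uses `sigma * sqrt((1 - w) / w)`, and both halves share the degrees of freedom `delta`. The helper `dtpsc_sketch` is a hypothetical name and is not the package's `dTPSC`.

```r
# Hypothetical base-R helper (an assumption, not the package's dTPSC()).
dtpsc_sketch <- function(x, w, theta, sigma, delta) {
  # Side-specific scales implied by the weight parameter w.
  s_left  <- sigma * sqrt(w / (1 - w))
  s_right <- sigma * sqrt((1 - w) / w)
  # Half Student-t densities, each integrating to one on its own side.
  left  <- 2 / s_left  * dt((x - theta) / s_left,  df = delta)
  right <- 2 / s_right * dt((x - theta) / s_right, df = delta)
  ifelse(x < theta, w * left, (1 - w) * right)
}

# Same parameter values as the package example.
f <- function(x) dtpsc_sketch(x, w = 0.7, theta = -1, sigma = 3, delta = 5)

# Probability mass on each side of the mode theta = -1.
mass_left  <- integrate(f, -Inf, -1)$value
mass_right <- integrate(f, -1, Inf)$value

print(c(left = mass_left, right = mass_right))
```

Under these assumptions the mass to the left of the mode equals `w`, so `w` can be read directly as the probability of falling below `theta`; values above 0.5 give a left skewed density and values below 0.5 a right skewed one.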
dTPSC gives the density. rTPSC generates random deviates.
Liu Q, Huang X, Bai R (2024). “Bayesian Modal Regression Based on Mixture Distributions.” Computational Statistics & Data Analysis, 108012. doi:10.1016/j.csda.2024.108012.
set.seed(100)
require(graphics)

# Random number generation
X <- rTPSC(n = 1e5, w = 0.7, theta = -1, sigma = 3, delta = 5)

# Plot the histogram
hist(X, breaks = 100, freq = FALSE)

# The red dashed line should match the underlying histogram
points(x = seq(-70, 50, length.out = 1000),
       y = dTPSC(x = seq(-70, 50, length.out = 1000),
                 w = 0.7, theta = -1, sigma = 3, delta = 5),
       type = "l", col = "red", lwd = 3, lty = 2)
This R package encompasses the probability density functions of three key distributions: the flexible Gumbel distribution, the double two-piece Student-t distribution, and the two-piece scale Student-t distribution, all belonging to the general unimodal distribution family, along with their corresponding sampling algorithms. Additionally, the package offers a function for Bayesian linear modal regression, leveraging these three distributions for model fitting.
Maintainer: Qingyang Liu
Authors:
Xianzheng Huang
Ray Bai
Bayesian Modal Regression
modal_regression(formula, data, model, ...)
formula | a formula. |
data | a dataframe. |
model | a description of the error distribution. Can be one of "FG", "DTP" and "TPSC". |
... | Arguments passed to |
The Bayesian modal regression model based on the FG, DTP or TPSC distribution is defined as:
$$y_i = \mathbf{x}_i^\top \boldsymbol{\beta} + e_i, \quad i = 1, \ldots, n,$$
where $e_i$ follows the FG, DTP or TPSC distribution.
More details of the Bayesian modal regression model can be found in Liu, Huang, and Bai (2024), https://arxiv.org/pdf/2211.10776.
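To see why a modal regression can differ from an ordinary mean regression, the sketch below simulates data from a linear model whose errors are skewed but have mode zero. Everything here is an illustrative assumption: the coefficients `beta`, the sample size, the error parameters, and the sampler `rfg_sketch`, which draws from a two-component Gumbel mixture in base R and is not the package's `rFG`.

```r
set.seed(1)

# Hypothetical base-R sampler for a two-component Gumbel mixture whose
# components share the mode `loc` (an assumption, not the package's rFG()).
rfg_sketch <- function(n, w, loc, sigma1, sigma2) {
  g <- -log(-log(runif(n)))   # standard Gumbel draws with mode zero
  is_left <- runif(n) < w     # component membership
  # The left skewed component reflects the Gumbel draw around loc.
  ifelse(is_left, loc - sigma1 * g, loc + sigma2 * g)
}

# Illustrative design: one covariate, intercept 1 and slope 2.
n <- 1000
x <- runif(n, 0, 10)
beta <- c(1, 2)

# Skewed errors with mode zero, so beta[1] + beta[2] * x is the conditional mode.
e <- rfg_sketch(n, w = 0.3, loc = 0, sigma1 = 1, sigma2 = 2)
y <- beta[1] + beta[2] * x + e

# Least squares targets the conditional mean; with skewed errors its
# intercept is expected to drift away from the modal intercept.
ls_fit <- coef(lm(y ~ x))
print(ls_fit)
print(mean(e))
```

With these settings the errors are right skewed, so their mean lies above their mode. A mean regression absorbs that gap into the intercept, whereas a modal regression is designed to recover the line through the most likely response value.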
A draw object from the posterior package.
Liu Q, Huang X, Bai R (2024). “Bayesian Modal Regression Based on Mixture Distributions.” Computational Statistics & Data Analysis, 108012. doi:10.1016/j.csda.2024.108012.
# Save current user's options.
old <- options()

# (Optional - Running Multiple Chains in Parallel)
options(mc.cores = 2)

if (require(MASS)) { # Need Boston housing data from MASS package.
  # Fit the modal regression based on the FG distribution to the Boston housing data.
  FG_model <- modal_regression(formula = medv ~ ., data = Boston, model = "FG",
                               chains = 2, iter = 2000)
  print(summary(FG_model), n = 17)

  # Fit the modal regression based on the TPSC-Student-t distribution to the Boston housing data.
  TPSC_model <- modal_regression(formula = medv ~ ., data = Boston, model = "TPSC",
                                 chains = 2, iter = 2000)
  print(summary(TPSC_model), n = 17)

  # Fit the modal regression based on the DTP-Student-t distribution to the Boston housing data.
  DTP_model <- modal_regression(formula = medv ~ ., data = Boston, model = "DTP",
                                chains = 2, iter = 2000)
  print(summary(DTP_model), n = 17)
}

# reset (all) initial options
options(old)