R Markdown Project Instructions
Instructions:
Using the data file “Elasticity.xlsx” and R Markdown, please submit a Word document that includes:
Your answers to the questions,
The code you used, and
The output it produces.
You must submit individually via Canvas and ensure that your name appears as the first author, followed by the names of any team members you worked with. In addition to the Word document, you must also include the .Rmd file that generated it. The Word document you submit should be the one knitted from the R Markdown file, not a separate or manually created file. Please make sure your R code is clearly commented so that others (including your instructor) can understand your steps and reasoning.
To help you get started, the following YouTube video provides an overview of how to use R Markdown. Note that the video knits to an HTML document, but for this assignment, you should choose “Word” from the output format options. You may also use the provided R Markdown templates and adapt them accordingly. If you do not have Microsoft Word installed, you can download it for free using your school credentials through our university’s Office365 portal.
https://www.youtube.com/watch?v=DNS7i2m4sB0
What This Project Is About (aka the Big Picture):
We are interested in analyzing how the Demand for a product changes with respect to the Price of the product, the Brand of the product, and whether the product was advertised, as indicated by the variable Ad, which equals 1 if the product was advertised and 0 otherwise.
We begin by exploring the relationship between Demand and Price through a simple regression. If the scatter plot suggests that the relationship is not linear, we will apply log transformations to improve model fit. From there, using the preferred model only, we add the categorical predictors (Brand and Ad) and interaction terms to understand how these factors influence price elasticity, which measures how responsive demand is to changes in price. Our goal is to improve the overall fit of the model and gain insight into how the additional predictors affect price elasticity.
Question 1)
a) Create the following visualizations:
A scatter plot of Demand vs Price
A box plot of Price vs Brand, and
A stacked bar plot of Brand and Ad.
Describe and interpret the patterns you observe in these plots.
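A minimal base-R sketch of the three plots in part (a) follows. Because the contents of Elasticity.xlsx are not reproduced in these instructions, the code simulates a stand-in data frame named sim with the four variable names used in this project. The brand labels A, B, and C, the sample size, and every simulated number are assumptions for illustration only and say nothing about the real data.

```r
# Simulated stand-in for Elasticity.xlsx (all values are made up for illustration).
# With the real file, readxl::read_excel("Elasticity.xlsx") could replace this step.
set.seed(1)
n <- 120
sim <- data.frame(
  Price = runif(n, 1, 5),
  Brand = factor(sample(c("A", "B", "C"), n, replace = TRUE)),  # hypothetical labels
  Ad    = rbinom(n, 1, 0.4)                                      # 1 = advertised
)
sim$Demand <- exp(6 - 1.5 * log(sim$Price) + 0.3 * sim$Ad + rnorm(n, sd = 0.2))

# Scatter plot of Demand vs Price
plot(sim$Price, sim$Demand,
     xlab = "Price", ylab = "Demand", main = "Demand vs Price")

# Box plot of Price by Brand
boxplot(Price ~ Brand, data = sim,
        xlab = "Brand", ylab = "Price", main = "Price by Brand")

# Stacked bar plot: rows of the table are Ad (0/1), columns are Brand.
# barplot() stacks the rows within each column by default.
counts <- table(Ad = sim$Ad, Brand = sim$Brand)
barplot(counts, legend.text = c("Ad = 0", "Ad = 1"),
        xlab = "Brand", ylab = "Number of observations",
        main = "Ad status by Brand")
```

The same plots can also be produced with ggplot2 if you prefer; base graphics are used here only to keep the sketch free of extra packages.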
b) Then, run four simple linear regressions where:
The response is either Demand or log(Demand)
The predictor is either Price or log(Price)
In R, you can use log(x) to take the natural logarithm of a variable x. Use R² (from the full data) and RMSE from 4-fold cross-validation to evaluate model performance. Based on these metrics, identify the best model and explain your reasoning.
Hint: This is the simple regression version of Example 8.7.
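One way to organize the four regressions in part (b) is sketched below, again on simulated stand-in data rather than the real file. The helper cv_rmse() and the fold assignment are hypothetical names written for this sketch. For models with log(Demand) as the response, predictions are converted back to Demand using exp(prediction + se^2/2), where se is the residual standard error; that back-transformation is an assumption made here so that all four RMSE values are on the same scale, and the examples from class may handle it differently.

```r
# Simulated stand-in data (made-up values, not the real Elasticity.xlsx).
set.seed(1)
n <- 120
sim <- data.frame(Price = runif(n, 1, 5))
sim$Demand <- exp(6 - 1.5 * log(sim$Price) + rnorm(n, sd = 0.2))

# The four candidate specifications
forms <- list(
  "Demand ~ Price"           = Demand ~ Price,
  "Demand ~ log(Price)"      = Demand ~ log(Price),
  "log(Demand) ~ Price"      = log(Demand) ~ Price,
  "log(Demand) ~ log(Price)" = log(Demand) ~ log(Price)
)

# Randomly assign each row to one of 4 folds of (nearly) equal size
folds <- sample(rep(1:4, length.out = n))

# Hypothetical helper: average of the 4 hold-out RMSEs, measured in units of Demand
cv_rmse <- function(form, data, folds) {
  log_response <- grepl("^log\\(", deparse(form[[2]]))
  fold_rmse <- sapply(sort(unique(folds)), function(k) {
    fit  <- lm(form, data = data[folds != k, ])
    pred <- predict(fit, newdata = data[folds == k, ])
    # Assumed back-transformation for log-response models
    if (log_response) pred <- exp(pred + sigma(fit)^2 / 2)
    sqrt(mean((data$Demand[folds == k] - pred)^2))
  })
  mean(fold_rmse)
}

results <- data.frame(
  model   = names(forms),
  r2      = sapply(forms, function(f) summary(lm(f, data = sim))$r.squared),
  cv_rmse = sapply(forms, cv_rmse, data = sim, folds = folds),
  row.names = NULL
)
print(results)
```

Keep in mind that the R² of a model for log(Demand) describes variation in log(Demand), so it is not directly comparable with the R² of a model for Demand. The cross-validated RMSE above is computed on the Demand scale for every model, which makes that column comparable across rows.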
c) Using your preferred model, generate a scatter plot with the regression line.
Comment on how this differs from the plot in part (a).
Report the estimated slope coefficient and interpret it clearly in terms of the original variables.
If the model includes a log transformation, adjust your explanation accordingly and explain what the slope implies on the original scale.
Hint: If a log transformation was applied, the interpretation of the coefficient changes. Refer to Table 8.11 for guidance.
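As an illustration of part (c), the sketch below assumes, purely for the sake of example, that the log-log model turned out to be the preferred one; your own choice may differ. It uses simulated stand-in data, draws the fitted line on the transformed scale, and converts the slope into a percentage change in Demand for a 1% increase in Price.

```r
# Simulated stand-in data (made-up values, not the real Elasticity.xlsx).
set.seed(1)
n <- 120
sim <- data.frame(Price = runif(n, 1, 5))
sim$Demand <- exp(6 - 1.5 * log(sim$Price) + rnorm(n, sd = 0.2))

# Assumed preferred model for this illustration: log-log
fit <- lm(log(Demand) ~ log(Price), data = sim)

# Scatter plot on the transformed scale with the fitted regression line.
# abline(fit) works here because the model has a single predictor.
plot(log(sim$Price), log(sim$Demand),
     xlab = "log(Price)", ylab = "log(Demand)")
abline(fit)

# Slope of log(Demand) on log(Price): the estimated price elasticity
b1 <- unname(coef(fit)["log(Price)"])

# Usual approximation: a 1% increase in Price changes Demand by about b1 percent
pct_approx <- b1

# Percentage change implied by the fitted curve when Price rises by exactly 1%
pct_exact <- (1.01^b1 - 1) * 100

cat("Slope:", round(b1, 3), "\n")
cat("Approximate % change in Demand per 1% increase in Price:", round(pct_approx, 3), "\n")
cat("Implied % change for an exact 1% increase:", round(pct_exact, 3), "\n")
```

For small percentage changes the approximate and implied figures are close, which is why the slope of a log-log model is commonly read directly as an elasticity.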
d) Is the predictor statistically significant at the 2.5% level? Justify your answer using the regression output.
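The significance check in part (d) can be read from the coefficient table of the regression summary. The sketch below shows one way to pull out the p-value and compare it with 0.025; it uses simulated stand-in data and assumes a log-log model for illustration, so the numbers it prints are not answers to the question.

```r
# Simulated stand-in data (made-up values, not the real Elasticity.xlsx).
set.seed(1)
n <- 120
sim <- data.frame(Price = runif(n, 1, 5))
sim$Demand <- exp(6 - 1.5 * log(sim$Price) + rnorm(n, sd = 0.2))

fit <- lm(log(Demand) ~ log(Price), data = sim)

# Coefficient table: estimate, standard error, t value, two-sided p-value
coef_table <- summary(fit)$coefficients
print(coef_table)

# Compare the predictor's p-value with the 2.5% significance level
alpha <- 0.025
p_value <- coef_table["log(Price)", "Pr(>|t|)"]
is_significant <- p_value < alpha
cat("p-value:", format.pval(p_value), " significant at 2.5%:", is_significant, "\n")
```

Note that summary() reports two-sided p-values, so state in your answer which hypotheses you are testing.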
Question 2)
Now, run a multiple regression by adding Brand to your preferred regression from Question 1. Before running the regression, you may want to create the appropriate dummy variables for Brand. (Hint: Please take a look at Example 7.4 and Example 8.2 for coding categorical variables.)
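Two equivalent ways of coding Brand are sketched below on simulated stand-in data: creating 0/1 dummy variables by hand, or passing Brand as a factor and letting lm() create the dummies. The brand labels A, B, and C, the choice of A as the reference category, and the log-log form are all assumptions for illustration.

```r
# Simulated stand-in data (made-up values; brand labels are hypothetical).
set.seed(1)
n <- 120
sim <- data.frame(
  Price = runif(n, 1, 5),
  Brand = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
shift <- c(A = 0, B = 0.4, C = -0.3)  # made-up brand effects
sim$Demand <- exp(6 + unname(shift[as.character(sim$Brand)]) -
                  1.5 * log(sim$Price) + rnorm(n, sd = 0.2))

# Option 1: explicit dummy variables, with brand A as the reference category
sim$BrandB <- as.integer(sim$Brand == "B")
sim$BrandC <- as.integer(sim$Brand == "C")
fit_dummy <- lm(log(Demand) ~ log(Price) + BrandB + BrandC, data = sim)

# Option 2: let lm() build the dummies from the factor (first level is the reference)
fit_factor <- lm(log(Demand) ~ log(Price) + Brand, data = sim)

# Both approaches give the same estimates
print(cbind(dummy = coef(fit_dummy), factor = coef(fit_factor)))
```

With k brands, only k - 1 dummies enter the regression; the omitted brand is the baseline against which the other brand coefficients are interpreted.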
a) Report the estimated slope coefficients. Interpret each one in the context of the original variables. If your model includes log transformations, clearly explain what the estimates mean on the original scale.
b) Are the predictors significant at the 2.5% significance level? What statistical evidence does this provide regarding the effect of the added variable and its impact on the price/demand relationship? Explain your reasoning.
c) Has the overall model fit improved compared to the simple regression in Question 1? Use both the measures of overall fit (aka goodness of fit measures) for the whole data and RMSE from 4-fold cross-validation as we learned in class.
d) Provide a visualization of the regression that shows the scatter plot along with the regression lines. Interpret what you see based on your answer to part a).
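For part (d), a model with Brand dummies but no interaction gives each brand its own intercept and a common slope, so the fitted lines are parallel on the scale of the model. The sketch below draws them with base graphics on simulated stand-in data; the brand labels, the colors, and the log-log form are assumptions for illustration.

```r
# Simulated stand-in data (made-up values; brand labels are hypothetical).
set.seed(1)
n <- 120
sim <- data.frame(
  Price = runif(n, 1, 5),
  Brand = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
shift <- c(A = 0, B = 0.4, C = -0.3)
sim$Demand <- exp(6 + unname(shift[as.character(sim$Brand)]) -
                  1.5 * log(sim$Price) + rnorm(n, sd = 0.2))

fit <- lm(log(Demand) ~ log(Price) + Brand, data = sim)
b <- coef(fit)

# Brand-specific intercepts: baseline intercept plus each brand's dummy coefficient
intercepts <- c(
  A = b[["(Intercept)"]],
  B = b[["(Intercept)"]] + b[["BrandB"]],
  C = b[["(Intercept)"]] + b[["BrandC"]]
)

# Scatter plot colored by brand, with one line per brand sharing the same slope
cols <- c("black", "red", "blue")
plot(log(sim$Price), log(sim$Demand), col = cols[as.integer(sim$Brand)], pch = 19,
     xlab = "log(Price)", ylab = "log(Demand)")
for (i in seq_along(intercepts)) {
  abline(a = intercepts[[i]], b = b[["log(Price)"]], col = cols[i])
}
legend("topright", legend = levels(sim$Brand), col = cols, lty = 1, pch = 19)
```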
Question 3)
Now, run a multiple regression by adding the interaction between Brand and the associated Price variable on top of the variables from the regression in Question 2.
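A sketch of the interaction model follows, using simulated stand-in data in which each hypothetical brand has its own made-up price elasticity. In an R formula, log(Price) * Brand expands to both main effects plus their interaction. The log-log form is assumed here only for illustration; substitute your own preferred Price and Demand variables.

```r
# Simulated stand-in data (made-up values; brand labels are hypothetical).
set.seed(1)
n <- 120
sim <- data.frame(
  Price = runif(n, 1, 5),
  Brand = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
shift <- c(A = 0, B = 0.4, C = -0.3)     # made-up intercept shifts
slope <- c(A = -1.5, B = -1.0, C = -2.0) # made-up brand-specific elasticities
br <- as.character(sim$Brand)
sim$Demand <- exp(6 + unname(shift[br]) +
                  unname(slope[br]) * log(sim$Price) + rnorm(n, sd = 0.2))

# log(Price) * Brand = log(Price) + Brand + log(Price):Brand
fit <- lm(log(Demand) ~ log(Price) * Brand, data = sim)
print(summary(fit)$coefficients)

# Brand-specific slopes: baseline slope plus that brand's interaction coefficient
b <- coef(fit)
elasticity <- c(
  A = b[["log(Price)"]],
  B = b[["log(Price)"]] + b[["log(Price):BrandB"]],
  C = b[["log(Price)"]] + b[["log(Price):BrandC"]]
)
print(round(elasticity, 3))
```

Because both the intercept and the slope are allowed to vary by brand, each brand's slope here matches what a separate simple regression on that brand alone would give. The interaction coefficients measure how far each brand's slope departs from the baseline brand's slope.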
a) Report the slope estimates and interpret them in terms of the original variables. If your model includes log transformations, be sure to explain what the coefficients imply on the original scale.
b) Are the predictors significant at the 2.5% significance level? What statistical evidence does this provide regarding the effect of the added variable and its impact on the price/demand relationship? Explain your reasoning.
c) Has the overall fit of the regression improved compared to previous regressions? Explain your reasoning by using both the measures of overall fit (aka goodness of fit measures) for the whole data and 4-fold cross-validation as we learned in class.
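To compare the regressions from Questions 1 to 3 side by side, one option is to tabulate a goodness-of-fit measure that penalizes extra predictors (adjusted R² is used below) together with the 4-fold cross-validated RMSE, using the same folds for every model. The sketch runs on simulated stand-in data, and the helper cv_rmse_log() is a hypothetical name. Since all three models here share the response log(Demand), the RMSE is left on the log scale; that is a simplifying assumption, and you may be expected to report it on the Demand scale instead.

```r
# Simulated stand-in data (made-up values; brand labels are hypothetical).
set.seed(1)
n <- 120
sim <- data.frame(
  Price = runif(n, 1, 5),
  Brand = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
shift <- c(A = 0, B = 0.4, C = -0.3)
slope <- c(A = -1.5, B = -1.0, C = -2.0)
br <- as.character(sim$Brand)
sim$Demand <- exp(6 + unname(shift[br]) +
                  unname(slope[br]) * log(sim$Price) + rnorm(n, sd = 0.2))

# Nested specifications, assuming the log-log form was preferred
forms <- list(
  "Q1: log(Price)"             = log(Demand) ~ log(Price),
  "Q2: + Brand"                = log(Demand) ~ log(Price) + Brand,
  "Q3: + log(Price) x Brand"   = log(Demand) ~ log(Price) * Brand
)

# Same fold assignment for every model so the comparison is like for like
folds <- sample(rep(1:4, length.out = n))

# Hypothetical helper: average hold-out RMSE on the log(Demand) scale
cv_rmse_log <- function(form, data, folds) {
  mean(sapply(sort(unique(folds)), function(k) {
    fit  <- lm(form, data = data[folds != k, ])
    pred <- predict(fit, newdata = data[folds == k, ])
    sqrt(mean((log(data$Demand[folds == k]) - pred)^2))
  }))
}

comparison <- data.frame(
  model   = names(forms),
  adj_r2  = sapply(forms, function(f) summary(lm(f, data = sim))$adj.r.squared),
  cv_rmse = sapply(forms, cv_rmse_log, data = sim, folds = folds),
  row.names = NULL
)
print(comparison)
```

Plain R² never decreases when predictors are added, so an increase in R² alone does not show that a larger model is better; adjusted R² and the cross-validated RMSE are more informative for this comparison.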
d) Provide a visualization of the regression that shows the scatter plot along with the regression lines. Interpret what you see based on your answer to part a).
Question 4)
Run a multiple regression in R Markdown by adding Ad and its interactions with the predictors from Question 3. Your regression output should therefore include all original predictor variables, all two-way interactions, and the three-way interaction.
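In R's formula syntax, joining three predictors with * produces all main effects, all two-way interactions, and the three-way interaction. The sketch below confirms this on simulated stand-in data and shows how the price slope for one particular group is assembled from the coefficients. The brand labels, the simulated effects, and the log-log form are assumptions for illustration.

```r
# Simulated stand-in data (made-up values; brand labels are hypothetical).
set.seed(1)
n <- 120
sim <- data.frame(
  Price = runif(n, 1, 5),
  Brand = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  Ad    = rbinom(n, 1, 0.4)  # 1 = advertised, 0 = not advertised
)
slope <- c(A = -1.5, B = -1.0, C = -2.0)
br <- as.character(sim$Brand)
sim$Demand <- exp(6 + 0.3 * sim$Ad +
                  (unname(slope[br]) + 0.2 * sim$Ad) * log(sim$Price) +
                  rnorm(n, sd = 0.2))

# Full model: main effects, two-way interactions, and the three-way interaction
fit <- lm(log(Demand) ~ log(Price) * Brand * Ad, data = sim)

# Terms included by the * operator
term_labels <- attr(terms(fit), "term.labels")
print(term_labels)
print(summary(fit)$coefficients)

# Example: price slope for brand B when the product is advertised (Ad = 1)
# = baseline slope + brand B adjustment + Ad adjustment + brand B x Ad adjustment
b <- coef(fit)
elasticity_B_ad <- b[["log(Price)"]] + b[["log(Price):BrandB"]] +
  b[["log(Price):Ad"]] + b[["log(Price):BrandB:Ad"]]
cat("Estimated price slope for brand B with advertising:", round(elasticity_B_ad, 3), "\n")
```

The slope for any other combination of Brand and Ad is obtained the same way, by adding only the interaction coefficients whose dummies equal 1 for that group.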
a) Report the slope estimates and interpret them clearly in the context of the original variables. If the model includes log-transformed terms, explain what each coefficient means on the original scale, adjusting your interpretation appropriately.
b) Are the predictors significant at the 2.5% significance level? What statistical evidence does this provide regarding the effect of the added variable and its impact on the price/demand relationship? Explain your reasoning.
c) Has the overall fit of the regression improved compared to previous regressions? Explain your reasoning by using both the measures of overall fit (aka goodness of fit measures) for the whole data and 4-fold cross-validation as we learned in class.