The group project is worth 20% of your final grade and is compulsory: if you do not undertake the assessment, you will receive zero (0) marks for it. This is a group assessment, and you must work in a group of 3-4 students. The deadline for submitting the assignment is 4pm, 27th May 2016.
Your submission consists of both a written report and an Excel file showing all of your calculations. The written report should be concise and professional, and must apply the following page setup and formatting:
Font: 12pt Times New Roman.
Margins: 2.5cm on all sides.
Line spacing: 1.5 lines.
Length: maximum 12 pages including any tables, graphs, appendices and references.
Failure to follow any of the above formatting instructions will result in an immediate penalty of 2 marks (out of 20). Markers will not read material beyond the 12-page limit.
There are no specific formatting requirements for the Excel file, except that all calculations used for the written report must be clearly presented and labelled according to the section of the written report in which the output was used.
There are two parts to be included in the written report:
Part A: Quantitative analysis
Part B: Group organisation and contribution
Both are described in detail below.
Part A: Quantitative analysis (out of 100)
The data file “sales.xlsx” contains information on the purchase value of the last transaction for 900 customers from a sportswear retailer in Australia, along with a range of other potential explanatory variables. The variable names along with their definitions are given in the table below:
ID | Variable Name | Description |
1 | birth_year | Birth year of the customer |
2 | gender | Customer’s gender |
3 | first_transaction_year | First transaction year |
4 | online-trans? | A dummy variable representing existence of an online transaction |
5 | avg_retail_value_per_trans | Average retail value per transaction |
6 | avg_days_between_trans | Average days between two consecutive transactions |
7 | weekend-or-weekday? | Day of the week when last sales transaction was done |
8 | value-of-last-purchase | Value of the last sales transaction |
The company is examining the relative importance of variables 1 to 7 as determinants of the value of a customer’s purchase. (For all tests, use α = 5%.)
- (15 marks) Based on the data in “first_transaction_year”, calculate the lifetime of each customer and store the new variable in a column called “life-time” (i.e. “life-time” = 2016 – “first_transaction_year”). Also calculate each customer’s age from the data in “birth_year” and store the new variable in a column called “age”. Provide a descriptive analysis of the following variables and discuss the results:
age |
life-time |
gender |
avg_days_between_trans |
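For illustration only (the submitted calculations must still be in Excel), the derived columns and basic descriptive statistics could be sketched in Python as follows. The sample rows are hypothetical placeholders, and measuring age as of 2016 is an assumption that mirrors the “life-time” formula.

```python
import statistics

REFERENCE_YEAR = 2016  # reference year used in the "life-time" formula

# Hypothetical sample rows; the real values come from sales.xlsx.
customers = [
    {"birth_year": 1980, "first_transaction_year": 2010},
    {"birth_year": 1992, "first_transaction_year": 2014},
    {"birth_year": 1975, "first_transaction_year": 2008},
    {"birth_year": 1988, "first_transaction_year": 2015},
]

for row in customers:
    row["life-time"] = REFERENCE_YEAR - row["first_transaction_year"]
    # Assumption: age is measured as of the same reference year.
    row["age"] = REFERENCE_YEAR - row["birth_year"]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


age_summary = describe([row["age"] for row in customers])
lifetime_summary = describe([row["life-time"] for row in customers])
```

A categorical variable such as gender would normally be summarised with counts and proportions rather than a mean and standard deviation.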
- (5 marks) The company has recently run a survey asking customers how often they make a purchase. Results from the survey indicate that each customer makes a purchase at least twice a year. Can you test this result based on the given data?
Hint: Use a mean hypothesis test on “avg_days_between_trans”.
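One possible way to set up this test, sketched in Python for illustration only, is shown below. It assumes that “at least twice a year” translates to a mean of at most 365/2 = 182.5 days between transactions, and it tests H0: μ ≤ 182.5 against H1: μ > 182.5; other framings of the hypotheses are defensible. The function name and sample values are hypothetical, and the normal critical value is used as a large-sample stand-in for the t critical value.

```python
import math
import statistics
from statistics import NormalDist


def one_sample_upper_tail_test(sample, mu0, alpha=0.05):
    """Test H0: mean <= mu0 against H1: mean > mu0.

    The critical value comes from the standard normal distribution,
    a large-sample approximation to the t distribution.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (mean - mu0) / std_err
    critical = NormalDist().inv_cdf(1 - alpha)
    return {"t_stat": t_stat, "critical": critical, "reject_h0": t_stat > critical}


# Hypothetical values standing in for the avg_days_between_trans column.
days = [150.0, 210.0, 175.0, 190.0, 160.0, 205.0, 180.0, 170.0]
result = one_sample_upper_tail_test(days, mu0=365 / 2)
```

With only a handful of observations the exact t critical value should be used instead; with the 900 customers in the data file the two values are very close.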
- (10 marks) When devising its marketing plan, the company assumes that female customers spend more per transaction on average, and that customers spend more per transaction at the weekend. As a result, its advertising investment is aimed at attracting female customers and increasing the number of people visiting the stores over the weekend. Do you believe this investment policy is wise? Test the following two hypotheses to answer this question:
- Women spend more than men on average in each transaction (hint: use the variables “avg_retail_value_per_trans” and “gender” to conduct a difference between means test).
- The average spend per transaction is higher at the weekend than on weekdays (hint: use the variables “value-of-last-purchase” and “weekend-or-weekday?” to conduct a difference between means test).
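A difference between means test of this kind could be sketched in Python as follows, for illustration only. The sketch assumes unequal variances (Welch's form of the test statistic) and a one-sided alternative, and again uses the normal critical value as a large-sample approximation. The function name and the spend values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def welch_upper_tail_test(group_a, group_b, alpha=0.05):
    """Test H0: mean_a <= mean_b against H1: mean_a > mean_b.

    The standard error does not assume equal variances in the two groups.
    """
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    std_err = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    t_stat = mean_diff / std_err
    critical = NormalDist().inv_cdf(1 - alpha)  # large-sample approximation
    return {"t_stat": t_stat, "critical": critical, "reject_h0": t_stat > critical}


# Hypothetical spend values; in the assignment these would be
# avg_retail_value_per_trans split by gender.
female_spend = [120.0, 135.0, 128.0, 142.0, 131.0, 138.0]
male_spend = [100.0, 112.0, 95.0, 108.0, 104.0, 99.0]
result = welch_upper_tail_test(female_spend, male_spend)
```

The same function applies to the second hypothesis by splitting “value-of-last-purchase” into weekend and weekday groups.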
- (10 marks) The company has proposed a multivariable regression model to predict the customers’ purchase value. In this model, value-of-last-purchase is considered to be the dependent variable and the independent variables are:
age |
life-time |
gender |
avg_retail_value_per_trans |
weekend-or-weekday? |
Estimate the regression model and report your estimated regression equation including the standard errors and R^{2}. Interpret the regression coefficients and examine the normality of the regression residuals.
Hint: You will need to replace some variables with appropriate dummy variables for both this analysis and those below.
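As a sketch of what the hint implies, the estimated model can be written with 0/1 dummy variables in place of the two categorical variables. The dummy names D_female and D_weekend, and the choice of which category is coded as 1, are assumptions made here for illustration; the category labels in “sales.xlsx” may call for a different coding.

```latex
\widehat{\text{value-of-last-purchase}}
  = b_0
  + b_1\,\text{age}
  + b_2\,\text{life-time}
  + b_3\,D_{\text{female}}
  + b_4\,\text{avg\_retail\_value\_per\_trans}
  + b_5\,D_{\text{weekend}}
```

Under this coding, b_3 is the estimated difference in the value of the last purchase between female and male customers, and b_5 the estimated difference between weekend and weekday purchases, holding the other variables constant.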
- (10 marks) Conduct a two-sided test for the significance of each estimated coefficient at the 5% level of significance. State which coefficients are significant and give your reasons (provide the general hypothesis testing steps once only).
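The general form of the test for a single coefficient can be sketched as follows, where b_j is the estimated coefficient, se(b_j) its standard error, n the number of observations and k the number of explanatory variables. With n = 900 and k = 5 the degrees of freedom would be 894, assuming no observations are dropped and each categorical variable enters as a single dummy.

```latex
H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0,
\qquad
t_j = \frac{b_j}{\operatorname{se}(b_j)},
\qquad
\text{reject } H_0 \text{ if } |t_j| > t_{0.025,\; n-k-1}
```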
- (10 marks) Discuss the pairwise correlations between the following variables:
age |
birth_year |
gender |
life-time |
online-trans? |
avg_retail_value_per_trans |
avg_days_between_trans |
weekend-or-weekday? |
If you want to model the dependent variable value-of-last-purchase within a multivariable regression framework, are there any variables that cannot be used together as explanatory variables? Why or why not?
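For illustration only, pairwise Pearson correlations could be computed with a sketch like the one below. The column names x1 to x3 and their values are hypothetical, and categorical variables are assumed to have been recoded as 0/1 dummies first. A correlation of exactly +1 or -1 means one variable is an exact linear function of the other, in which case the two cannot both be included in the same regression.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical columns used only to demonstrate the calculation.
columns = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 4.0, 6.0, 8.0, 10.0],  # exact linear function of x1
    "x3": [5.0, 3.0, 4.0, 1.0, 2.0],
}

# One correlation per unordered pair of columns.
pairwise = {
    (a, b): pearson(columns[a], columns[b])
    for a, b in combinations(columns, 2)
}
```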
- (15 marks) The company has asked for your expertise to help determine the best regression model to explain the value-of-last-purchase using the variables provided. They are considering three alternatives. The first alternative is the regression model discussed in section d) above. The second is a regression model using the following explanatory variables:
age |
gender |
life-time |
online-trans? |
avg_retail_value_per_trans |
weekend-or-weekday? |
The third alternative is a regression model using the following explanatory variables:
age |
gender |
life-time |
online-trans? |
avg_retail_value_per_trans |
avg_days_between_trans |
weekend-or-weekday? |
Analyse these three options and choose your preferred model. Motivate your choice.
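Because the three alternatives differ in the number of explanatory variables, a criterion that penalises extra regressors is one common basis for comparison. The sketch below, for illustration only, uses adjusted R^{2}; this is an assumption about the criterion, not the only acceptable one. The R^{2} values are hypothetical placeholders, and the regressor counts (5, 6 and 7) assume each categorical variable enters as a single dummy.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


N_OBS = 900  # number of customers stated in the assignment

# Hypothetical (R^2, number of regressors) pairs for the three alternatives.
candidates = {
    "model_1": (0.400, 5),
    "model_2": (0.405, 6),
    "model_3": (0.4055, 7),
}

adjusted = {
    name: adjusted_r_squared(r2, N_OBS, k) for name, (r2, k) in candidates.items()
}
# Model with the highest adjusted R^2 under this criterion.
best = max(adjusted, key=adjusted.get)
```

The significance of the added coefficients and the behaviour of the residuals are further considerations alongside adjusted R^{2}.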
- (15 marks) Based on your analysis in sections f) and g) above, or otherwise, can you suggest any improvements to the model chosen in g)? Present and justify your chosen final model including any diagnostic testing and residual analysis you perform.
Note: A further 10 marks will be allocated to assessing your overall presentation and the Excel file containing your calculations.
Part B: Group organisation and contribution
Provide an outline of the contribution of each member of the group and a score out of 10 for their level of effort, along with supporting evidence in the form of a record of meetings and progress. If a member is not included as part of the submission, it is assumed that they have not made a contribution, and this will result in a mark of zero (0) for that student. Make sure to bring to our attention all cases of unfair contribution, preferably with supporting documentation.
Part B should be no more than one page. Material beyond one page will be ignored. A template example of how to present the required information is provided below.
Example of one-page support information
Name and SID | Brief description of contribution | Score (out of 10) |
1. C Li (123456789) | Introduction/Executive Summary | 5 |
2. A Kim (789123456) | Descriptive Statistics, graphs and formatting/editing | 8 |
3. G Schmidt (369852741) | Regression model set up and hypothesis testing | 10 |
4. J Bow (400019952) | Hypothesis tests and interpretation of regression | 10 |
Summary of Meetings (extract)
Date/time of meeting | Initials of attendees | Discussions held and decisions made |
7/04/2016 | CL, AK, GS | Initial allocation of duties – to inform JB accordingly. AK to work on data description, GS on regression, CL and JB on hypothesis test and forecasts |
12/04/2016 | CL, AK, GS, JB | Initial data description discussed, … |
Marking Criteria
You will be assessed on your ability to:
- Use appropriate descriptive methods to analyse the data.
- Correctly use hypothesis testing procedures in one- and two-sample frameworks.
- Use and correctly interpret inferential statistics in the context of regression.
- State the required statistical assumptions and the limitations of the analysis.
- Write a professional report, using proper expression, language and formatting, as well as clearly and appropriately presenting any relevant graphs and tables.
- Provide a quality Excel spreadsheet that can be used to confirm calculations.
- Use appropriate transformations and specification testing to examine the relationship between value-of-last-purchase and other explanatory variables in a regression context.
- Communicate the required information and conclusions obtained with technical terms, but also in layman’s language where appropriate, i.e. in a manner appropriate for any prospective decision makers/managers to use and understand.
- The final report has a maximum length of 12 pages, including Part A, Part B and any tables, graphs, appendices and references. Presentation will be a component of the marking criteria, in addition to the requirements listed in the assignment.