Satisfaction Factor Analysis
Project Type: Individual (Coursework)
Activities: Identification and Formulation of Research Question, Collection of Primary Data, Preliminary Data Analysis, Make Recommendations and Identify Use Cases
Mentor: Prof. Swapnajit Chakraborti
Tools: Tableau, RStudio, Microsoft Excel
Process Map:
Background:
This individual project gave me practical experience in applying machine learning and data analysis. I used these methods to assess the factors that influence student satisfaction and prospective students' decisions to accept admission to FLAME University.
The goal of this project was to identify the aspects of the student experience and the admissions process that the university can build on to improve student satisfaction and attract more students. The project included the collection of primary data from students and alumni of FLAME University and aimed to answer the following questions:
What is the main correlating factor that leads to acceptance of admission to FLAME University?
What are the main correlating factors that result in student satisfaction?
Based on the answers to these questions, which areas should FLAME University focus on improving?
Problem Statement:
Which factors contribute most to student satisfaction at FLAME University?
Methods & Tools
Application of Tools:
Tableau
RStudio
Among respondents who answered "yes" to the variable "overall satisfaction," the variables with the highest average ratings were "availability of academic services," "campus facilities," and "Student-run clubs."
To shortlist variables for the model, those with a rating of 7 or higher were chosen.
The five variables selected for the models that estimate student satisfaction are:
Technologicalintegrationrating - How well technology has been integrated into the campus
Diversityrating - How diverse is the student body
Campusfacilitiesrating - How do the students find the campus facilities
Studentrunclubsrating - Are there a high number of student-run clubs and societies
Academicservicesrating - Is there a high quality in academic services coupled with a high quantity of academic resources
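The shortlisting step described above can be sketched as follows. This is a minimal illustration in Python rather than the RStudio workflow used in the project. It assumes the "rating of 7+" refers to the average rating among respondents who answered "yes" to overall satisfaction. The survey rows, the column names, and the `shortlist` helper are all hypothetical.

```python
from statistics import mean

# Hypothetical survey rows (not the project's data): ratings on a 1-10 scale
# plus a yes/no answer to the overall satisfaction question.
responses = [
    {"overall_satisfaction": "yes", "campus_facilities": 9, "student_run_clubs": 8, "food_quality": 5},
    {"overall_satisfaction": "yes", "campus_facilities": 8, "student_run_clubs": 7, "food_quality": 6},
    {"overall_satisfaction": "no", "campus_facilities": 4, "student_run_clubs": 5, "food_quality": 3},
]


def shortlist(rows, threshold=7.0):
    """Return rating columns whose mean among satisfied respondents is >= threshold."""
    satisfied = [r for r in rows if r["overall_satisfaction"] == "yes"]
    if not satisfied:
        return {}
    rating_columns = [c for c in satisfied[0] if c != "overall_satisfaction"]
    averages = {c: mean(r[c] for r in satisfied) for c in rating_columns}
    # Keep only the variables that clear the assumed threshold.
    return {c: avg for c, avg in averages.items() if avg >= threshold}


selected = shortlist(responses)
print(selected)
```

In this toy data, only the two columns whose average among satisfied respondents reaches 7 are kept; the threshold is a parameter so that other cut-offs can be tried.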
Linear, ridge and lasso regression models were fitted to the dataset in RStudio to understand which combination of variables best explains student satisfaction.
The image on the left is the output of five linear regression models executed on the dataset:
The first model contained all 5 of the shortlisted variables
Each subsequent model had one variable fewer than the previous one, following the backward elimination method of feature selection. This was done to check whether removing a variable produced a better-fitting model of student satisfaction.
From the image on the right, we can see that the first model, which contained all 5 shortlisted variables, fits student satisfaction best: it has the lowest RMSE and the highest R-squared value (the lower the RMSE and the higher the R-squared, the better the model).
This suggests that all 5 of the shortlisted variables contribute to explaining student satisfaction.
The ridge and lasso regression models gave the same results as the linear regression models.
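The model comparison described above can be sketched as follows. This is an illustrative Python version, not the project's RStudio code. It runs on synthetic data, and the variable names, weights, and helper functions are assumptions. The elimination rule used here drops the variable whose removal raises RMSE the least, which is one common variant of backward elimination; the source does not say which criterion was used. Lasso is omitted because it needs an iterative solver.

```python
import math
import random


def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(X, y, ridge_penalty=0.0):
    """Least-squares coefficients, intercept first; ridge_penalty > 0 gives ridge."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    for i in range(1, p):  # the intercept is left unpenalised
        xtx[i][i] += ridge_penalty
    return solve(xtx, xty)


def evaluate(X, y, coefs):
    """Return (RMSE, R-squared) of the fitted model on the given data."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], x)) for x in X]
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    y_bar = sum(y) / len(y)
    ss_tot = sum((t - y_bar) ** 2 for t in y)
    return math.sqrt(ss_res / len(y)), 1 - ss_res / ss_tot


# Synthetic stand-in for the survey: five ratings and a satisfaction score.
random.seed(0)
names = ["technology", "diversity", "facilities", "clubs", "academic_services"]
weights = [0.30, 0.20, 0.25, 0.15, 0.10]  # assumed, for illustration only
X = [[random.uniform(1, 10) for _ in names] for _ in range(200)]
y = [sum(w * v for w, v in zip(weights, x)) + random.gauss(0, 0.5) for x in X]


def score(active):
    """Fit a linear model on the active columns and return (RMSE, R-squared)."""
    subset = [[x[i] for i in active] for x in X]
    return evaluate(subset, y, fit(subset, y))


# Backward elimination: start with all variables and drop one per round.
active = list(range(len(names)))
results = []
while active:
    rmse, r2 = score(active)
    results.append(([names[i] for i in active], rmse, r2))
    print(f"{len(active)} variables: RMSE={rmse:.3f}, R-squared={r2:.3f}")
    if len(active) == 1:
        break
    drop = min(active, key=lambda i: score([j for j in active if j != i])[0])
    active.remove(drop)

# Ridge regression on all five variables, with an arbitrary penalty of 1.0.
ridge_rmse, ridge_r2 = evaluate(X, y, fit(X, y, ridge_penalty=1.0))
print(f"ridge: RMSE={ridge_rmse:.3f}, R-squared={ridge_r2:.3f}")
```

One caveat when reading such a comparison: if nested linear models are scored on the same data they were fitted to, removing a variable can never lower RMSE or raise R-squared, so the full model always comes out on top. The source does not say whether held-out data was used; scoring on a held-out set or using adjusted R-squared makes the comparison more informative.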
Why Use Machine Learning & Data Analysis:
Correlation
Models
Explanation
Machine learning models such as linear regression are used to establish the relationship between variables. This was needed to identify which variables are significant, so that the analyst knows which parameters to pay attention to.
Machine learning uses the data to perform tasks such as regression, and the model's parameters are learned from the dataset each time the model is fitted. As a result, it can produce a model suited to the analyst's goal, whether that is prediction, correlation or probability.
While having access to data and running models on it are important in their own right, I find it equally valuable to draw insights from the collected data and present them in a way that is easy to understand. A combination of machine learning and visualizations seemed a good fit for this objective.
Tableau was used to create a variety of preliminary visualizations to gain a better understanding of the collected data and form inferences.
Aside from visualizations, Tableau was used for storytelling and creating dashboards to give viewers a comprehensive understanding of the dataset
RStudio was used to implement models such as linear regression, ridge regression and lasso regression on the data with the aim of answering the research questions.
Analysis
Tableau Visualization & RStudio Output: