Satisfaction Factor Analysis

Project Type: Individual (Coursework)
Activities: Identification and Formulation of Research Question, Collection of Primary Data, Preliminary Data Analysis, Recommendations and Identification of Use Cases
Mentor: Prof. Swapnajit Chakraborti
Tools: Tableau, RStudio, Microsoft Excel

Process Map:

Background:

This individual project gave me practical experience in applying machine learning to data analysis. I used it to assess the factors that influence student satisfaction and prospective students' acceptance of admission into FLAME University.

The goal of this project was to identify the aspects of student experience and admissions that the university can build on to improve the student experience and attract more students. The project involved collecting primary data from students and alumni of FLAME University and aimed to answer the following questions:

  • Which factor correlates most strongly with acceptance of admission to FLAME University?

  • Which factors correlate most strongly with student satisfaction?

  • Based on the answers to these questions, which areas can FLAME University focus on for improvement?

Problem Statement:

  • Which factors contribute most to student satisfaction at FLAME University?

Methods & Tools

Application of Tools:

Tableau

RStudio

  • Among respondents who answered "yes" for the "overall satisfaction" variable, the variables with the highest average ratings were "availability of academic services," "campus facilities," and "student-run clubs."

  • To shortlist variables for the model, those with an average rating of 7 or higher were chosen.

  • The final 5 variables selected for the models estimating student satisfaction were:

    • Technologicalintegrationrating - How well technology has been integrated into the campus

    • Diversityrating - How diverse the student body is

    • Campusfacilitiesrating - How students rate the campus facilities

    • Studentrunclubsrating - Whether there is a high number of student-run clubs and societies

    • Academicservicesrating - Whether academic services are of high quality and academic resources are plentiful

  • Linear regression, ridge regression and lasso regression models were run on the dataset in RStudio to understand which combination of variables best explains student satisfaction.

  • The image on the left is the output of five linear regression models run on the dataset:

    • The first model contained all 5 of the shortlisted variables

    • Each subsequent model had one fewer variable, using backward elimination for feature selection. This was done to observe whether removing any variable improved the model's fit to student satisfaction

    • From the image on the right we can see that the first model, which contained all 5 shortlisted variables, fits student satisfaction best: it has the lowest RMSE and the highest R-squared (a lower RMSE and a higher R-squared both indicate a better fit)

    • This suggests that all 5 of the shortlisted variables contribute to explaining student satisfaction

  • The ridge and lasso regression models produced the same results as the linear regression models
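
The backward-elimination comparison described above can be sketched as follows. This is an illustrative Python sketch, not the RStudio code used in the project: the data are synthetic, the helper names (fit_ols, score, backward_elimination) and the weights are hypothetical, and the rule for choosing which variable to drop (the one whose removal raises RMSE the least) is an assumption, since the write-up does not state it.

```python
import math
import random

# Variable names come from the write-up; the data below are synthetic.
NAMES = ["Technologicalintegrationrating", "Diversityrating",
         "Campusfacilitiesrating", "Studentrunclubsrating",
         "Academicservicesrating"]

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def score(X, y, cols):
    """Fit on the chosen columns; return (RMSE, R-squared) on the same data."""
    sub = [[row[j] for j in cols] for row in X]
    beta = fit_ols(sub, y)
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in sub]
    ss_res = sum((a - p) ** 2 for a, p in zip(y, pred))
    mean_y = sum(y) / len(y)
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return math.sqrt(ss_res / len(y)), 1 - ss_res / ss_tot

def backward_elimination(X, y, names):
    """Return (kept names, RMSE, R-squared) for models with 5, 4, ..., 1 variables."""
    active = list(range(len(names)))
    results = []
    while True:
        rmse, r2 = score(X, y, active)
        results.append(([names[j] for j in active], rmse, r2))
        if len(active) == 1:
            return results
        # Assumed rule: drop the variable whose removal raises RMSE the least.
        drop = min(active,
                   key=lambda j: score(X, y, [c for c in active if c != j])[0])
        active.remove(drop)

random.seed(0)
X = [[random.uniform(1, 10) for _ in NAMES] for _ in range(80)]
true_weights = [0.30, 0.10, 0.25, 0.15, 0.20]  # made up for illustration
y = [sum(w * v for w, v in zip(true_weights, row)) + random.gauss(0, 0.5)
     for row in X]

results = backward_elimination(X, y, NAMES)
for kept, rmse, r2 in results:
    print(f"{len(kept)} variables: RMSE={rmse:.3f}, R^2={r2:.3f}")
```

Because each smaller model is nested inside the larger one, RMSE measured on the fitting data can only stay the same or rise as variables are removed, so the full model always looks best on this criterion. Comparing the models on held-out responses, or using adjusted R-squared, would be a stricter check; the write-up does not say which was used.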

Why Use Machine Learning & Data Analysis:

Correlation

Models

Explanation

  • Machine learning models such as linear regression are used to establish the relationship between different variables. This was required to identify which variables are significant, so that the analyst can decide which parameters to pay attention to.

  • Machine learning methods such as regression fit a model's parameters to the data set. The fitted model can then support the analyst's goal, whether that is prediction, measuring correlation or estimating probability.

  • While having access to data and performing functions on the dataset have their own importance, I find it equally valuable to draw insights from the collected data and present them in a comprehensible manner. A combination of machine learning and visualizations seemed a good fit for this objective.

  • Tableau was used to create a variety of preliminary visualizations in order to gain a better understanding of the data collected and form inferences

  • Aside from visualizations, Tableau was used for storytelling and creating dashboards to give viewers a comprehensive understanding of the dataset

  • RStudio was used to implement models such as linear regression, ridge regression and lasso regression on the data with the aim of answering the research questions.
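
The ridge regression mentioned above can be illustrated with its closed-form solution. This is a minimal Python sketch, not the project's RStudio code: the data are synthetic, the function names (solve, fit_ridge) and the penalty values are hypothetical, and the predictors are centered but not standardized, which is a simplification. Lasso has no comparable closed form and is not shown.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(k):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [a - f * v for a, v in zip(M[i], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def fit_ridge(X, y, lam):
    """Ridge fit with an unpenalized intercept; lam = 0 gives ordinary least squares."""
    n, k = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(k)]
    y_mean = sum(y) / n
    # Center predictors and response so the intercept is not shrunk.
    Xc = [[r[j] - x_mean[j] for j in range(k)] for r in X]
    yc = [t - y_mean for t in y]
    # Solve (Xc'Xc + lam * I) slopes = Xc'yc
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(k)]
    slopes = solve(A, b)
    intercept = y_mean - sum(s * m for s, m in zip(slopes, x_mean))
    return intercept, slopes

random.seed(1)
X = [[random.uniform(1, 10), random.uniform(1, 10)] for _ in range(50)]
y = [2 + 0.5 * a + 0.3 * b + random.gauss(0, 0.5) for a, b in X]  # made-up weights

for lam in (0.0, 10.0, 1000.0):
    intercept, slopes = fit_ridge(X, y, lam)
    norm = math.sqrt(sum(s * s for s in slopes))
    print(f"lambda={lam}: slopes={[round(s, 3) for s in slopes]}, norm={norm:.3f}")
```

With a small penalty the ridge slopes stay close to the least-squares slopes, which is one way ridge and linear regression can end up giving the same conclusions on a small set of predictors; larger penalties shrink the slopes toward zero.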

Analysis

Tableau Visualization & RStudio Output: