Why Do Students Drop Out of University? Performance Matters More Than Demographics
Roughly half of bachelor’s students fail to complete their studies, a challenge faced by universities worldwide. For institutions, this means financial losses; for society, it means untapped potential in the labor market. But can students at risk of dropping out be identified in advance and offered timely support? Research by Petr Berka and Luboš Marek from the Faculty of Informatics and Statistics at Prague University of Economics and Business shows that at-risk students can indeed be identified early and with high accuracy.
The First Year and Lost Credits Are Crucial
When analyzing student success, factors such as age, gender, or type of secondary school are often considered. However, the research shows that these input characteristics have minimal predictive power. The strongest warning signal is the proportion of unearned (lost) credits.
If a student fails to complete required courses, they can fall into a vicious cycle: in the following semester, they must put in extra effort to catch up, which increases stress and the likelihood that they will eventually decide to leave their studies. The ideal moment for early warning comes at the end of the first year. At this stage, models can predict with about 85% accuracy who will persist, while still leaving enough time for intervention or academic counseling.
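The warning signal described above, the proportion of lost credits, comes down to simple arithmetic. The following is a minimal sketch, not the authors' code: the function names, the assumption that lost credits are measured against enrolled credits, and the 0.3 cut-off are all hypothetical and chosen only for illustration.

```python
def lost_credit_share(enrolled_credits: int, earned_credits: int) -> float:
    """Return the share of enrolled credits that the student did not earn (0.0 to 1.0)."""
    if enrolled_credits <= 0:
        raise ValueError("enrolled_credits must be positive")
    if not 0 <= earned_credits <= enrolled_credits:
        raise ValueError("earned_credits must be between 0 and enrolled_credits")
    return (enrolled_credits - earned_credits) / enrolled_credits


def flag_at_risk(share: float, threshold: float = 0.3) -> bool:
    """Flag a student whose lost-credit share reaches the cut-off.

    The default threshold is an illustrative assumption, not a value from the study.
    """
    return share >= threshold


# Example: a student enrolled in 60 credits in the first year and earned 45.
share = lost_credit_share(enrolled_credits=60, earned_credits=45)
print(f"lost-credit share: {share:.2f}, flagged: {flag_at_risk(share)}")
```

In practice the cut-off would be learned from historical data rather than fixed by hand.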
Language Barriers as a Secondary Factor
The data also revealed that nationality plays a role in programs taught in Czech. Students whose native language is neither Czech nor Slovak are more likely to drop out. The likely cause is not a lack of subject knowledge but a language barrier: these students may struggle to keep up with the pace of instruction and to adapt to the broader university environment in a foreign language.
Interpretability vs. Accuracy: Why Trees “Won”
The researchers compared different algorithms trained on historical data, focusing on two main approaches:
- Logistic Regression: A traditional statistical method that calculates the probability of an outcome based on a combination of multiple factors.
- Decision Trees: A logical structure that divides students into groups based on specific criteria (e.g., “Did the student earn more than 20 credits? If not, are they an international student?”).
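To make the first approach concrete, here is a minimal sketch of how a logistic regression model turns several factors into a single dropout probability. The feature names, weights, and intercept are invented for illustration; they are not coefficients from the study.

```python
import math


def dropout_probability(features: dict, weights: dict, intercept: float) -> float:
    """Combine weighted factors into a score and map it to a probability (0 to 1)."""
    score = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))  # logistic (sigmoid) function


# Hypothetical coefficients, chosen only to show the mechanics.
weights = {"lost_credit_share": 4.0, "non_native_speaker": 0.8}
intercept = -2.0

student = {"lost_credit_share": 0.5, "non_native_speaker": 0}
print(f"estimated dropout probability: {dropout_probability(student, weights, intercept):.2f}")
```

Every factor contributes to the score at once, which is why the result is harder to explain to an advisor than a sequence of yes/no questions.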
Although logistic regression proved slightly more accurate on historical data, decision trees ultimately prevailed. They were more stable (trees built for students from different years showed little variation) and more robust (they performed reliably on data from new cohorts). For practical use, their interpretability is also crucial—faculty leadership and academic advisors can clearly see the key factors behind academic failure and understand why a particular student has been classified as at risk.
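The interpretability argument is easy to see when a small tree is written out as plain rules. The sketch below encodes only the illustrative questions quoted earlier (more than 20 credits, international student); the risk labels and the `Student` type are hypothetical, and the actual trees in the study were learned from data, not written by hand.

```python
from dataclasses import dataclass


@dataclass
class Student:
    earned_credits: int
    is_international: bool


def classify(student: Student) -> tuple[str, list[str]]:
    """Return a risk label together with the answers that led to it."""
    path = []
    if student.earned_credits > 20:
        path.append("earned more than 20 credits")
        return "low risk", path
    path.append("earned 20 credits or fewer")
    if student.is_international:
        path.append("international student")
        return "high risk", path
    path.append("not an international student")
    return "elevated risk", path


label, reasons = classify(Student(earned_credits=12, is_international=True))
print(label, "because:", "; ".join(reasons))
```

Returning the path alongside the label is the point: an advisor sees not only that a student was flagged but which answers produced the flag.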
About the Research and Methodology
The analysis was conducted by Petr Berka and Luboš Marek from the Faculty of Informatics and Statistics at Prague University of Economics and Business. They worked with anonymized data from bachelor’s students between 2013 and 2018, examining two types of variables:
- Entry characteristics: Age, gender, type of secondary school, and citizenship
- Academic performance: Number of earned and lost credits and grade point averages across semesters
The aim was to determine which group of variables carries more weight in predicting dropout risk early on. What makes this study unique is its temporal validation: instead of testing models only on data from the same year, the researchers examined whether a model built on students from 2013 could accurately predict the outcomes of students who enrolled several years later.
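The idea of temporal validation can be sketched in a few lines: fit a model on one cohort, then score it on a later cohort it has never seen. The example below is an assumption-laden toy, not the authors' procedure. It uses a one-question "tree" (a single threshold on the lost-credit share), and both cohorts are invented data.

```python
def fit_threshold(rows):
    """Pick the lost-credit-share threshold that classifies the training cohort best."""
    candidates = sorted({share for share, _ in rows})
    best_threshold, best_correct = candidates[0], -1
    for threshold in candidates:
        correct = sum((share >= threshold) == dropped for share, dropped in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(rows, threshold):
    """Share of students whose outcome matches the prediction 'share >= threshold'."""
    return sum((share >= threshold) == dropped for share, dropped in rows) / len(rows)


# Invented toy data: (lost-credit share after year one, dropped out?)
cohort_2013 = [(0.05, False), (0.10, False), (0.20, False),
               (0.30, False), (0.45, True), (0.60, True)]
later_cohort = [(0.00, False), (0.15, False), (0.40, True),
                (0.50, True), (0.70, True)]

threshold = fit_threshold(cohort_2013)          # learned only from the 2013 cohort
print("same-cohort accuracy:", accuracy(cohort_2013, threshold))
print("later-cohort accuracy:", accuracy(later_cohort, threshold))
```

The gap between the two printed numbers is what temporal validation measures: a model can fit its own cohort perfectly and still lose accuracy on students who enrolled later.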
Looking Ahead
In conclusion, the authors note that a university is not a homogeneous entity. Individual faculties or fields of study may have their own specifics and particularly challenging courses. Future analyses should therefore focus on these finer differences to make student support even more targeted and effective.
Artificial intelligence tools were consulted in the preparation of this popular science article.