Separate the features from the target, then train the model:
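from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# Features and target; scikit-learn trees need numeric inputs, so 'gender' must already be encoded
X = df[['age', 'gender', 'study_hours_per_day', 'social_media_hours']]
y = df['performance']
# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Cap the depth at 4 to keep the tree readable; random_state makes tie-breaking reproducible
model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)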
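scikit-learn's DecisionTreeClassifier cannot fit on string columns. The following is a minimal sketch of one way to prepare the data, assuming `gender` is stored as text (the source does not show the column's dtype). The toy DataFrame and the name `X_encoded` are made up for illustration; only the column names come from the code above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the real df, which is not shown here.
df = pd.DataFrame({
    'age': [18, 19, 20, 21, 22, 23, 19, 20, 21, 22],
    'gender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
    'study_hours_per_day': [1.0, 2.0, 3.5, 4.0, 5.0, 0.5, 2.5, 3.0, 4.5, 1.5],
    'social_media_hours': [5.0, 4.0, 2.0, 1.5, 1.0, 6.0, 3.0, 2.5, 1.0, 4.5],
    'performance': ['Low', 'Low', 'Medium', 'High', 'High',
                    'Low', 'Medium', 'Medium', 'High', 'Low'],
})

X = df[['age', 'gender', 'study_hours_per_day', 'social_media_hours']]
y = df['performance']

# One-hot encode the string column; drop_first removes the redundant dummy column.
X_encoded = pd.get_dummies(X, columns=['gender'], drop_first=True)

X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=0.3, random_state=42
)

model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)
print(list(X_encoded.columns))
```

If `gender` is already numeric in the real dataset, the encoding step is unnecessary and the original code works unchanged.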
4. Model Evaluation
The model is evaluated on the held-out test set using accuracy, precision, recall, and F1-score to measure how well it predicts student performance.
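from sklearn.metrics import accuracy_score, classification_report
# Predict on the held-out test set
y_pred = model.predict(X_test)
# Overall fraction of correct predictions
print(accuracy_score(y_test, y_pred))
# Per-class precision, recall, and F1-score
print(classification_report(y_test, y_pred))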
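To make the per-class numbers in the report less opaque, the sketch below recomputes precision, recall, and F1-score from a confusion matrix and checks them against scikit-learn. The labels in `y_true` and `y_pred` are invented for illustration and are not results from the model above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Hypothetical labels, chosen only to keep the arithmetic easy to follow.
y_true = ['Low', 'Low', 'Low', 'Medium', 'Medium', 'High', 'High', 'High']
y_pred = ['Low', 'Low', 'Medium', 'Medium', 'High', 'High', 'High', 'Low']
labels = ['Low', 'Medium', 'High']

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

tp = cm.diagonal()                 # correct predictions per class
precision = tp / cm.sum(axis=0)    # TP / everything predicted as that class
recall = tp / cm.sum(axis=1)       # TP / everything that truly is that class
f1 = 2 * precision * recall / (precision + recall)

# Cross-check against scikit-learn's own per-class computation.
sk_precision, sk_recall, sk_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels
)

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Precision answers "of the students predicted as a class, how many really were", while recall answers "of the students who really were in a class, how many did the model find". F1 is the harmonic mean of the two.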
5. Decision Tree Visualization
Plotting the tree helps show which features drive the predictions most, such as study time and social media use.
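import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
plt.figure(figsize=(20, 10))
# class_names must follow the order of model.classes_ (sorted labels), so read it from the model
plot_tree(model, feature_names=list(X.columns), class_names=[str(c) for c in model.classes_], filled=True)
plt.show()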
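The plot shows which features the tree splits on, and `feature_importances_` gives the same information as numbers. The sketch below uses synthetic data with a made-up labelling rule, so the values it prints say nothing about the real dataset. Only the feature names are taken from the code above.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the real df is not available here.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    'age': rng.integers(18, 25, size=n),
    'study_hours_per_day': rng.uniform(0, 6, size=n),
    'social_media_hours': rng.uniform(0, 6, size=n),
})

# Hypothetical rule: the label depends on study hours minus social media hours.
score = X['study_hours_per_day'] - X['social_media_hours']
y = pd.cut(score, bins=[-np.inf, -1, 1, np.inf],
           labels=['Low', 'Medium', 'High']).astype(str)

model = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X, y)

# Impurity-based importances; they sum to 1 across all features.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favour features with many distinct values, so they are best read alongside the plotted tree rather than as a replacement for it.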