What the post is describing is just ANOVA. If merging two categories really gave the best fit, then fitting the two terms independently would reach the same optimum, with the two terms simply estimated as identical. In-sample MSE (residual sum of squares divided by n) never increases when you add a category.
This is why you have to reach for things that penalize adding parameters when running model comparisons.
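A quick sketch of the point, in case it helps. This is made-up data, not anything from the post: three hypothetical categories "a", "b", "c", where "a" and "b" share the same true mean. Each group is fit by its own mean (the one-way ANOVA fit), once with "a" and "b" kept separate and once with them merged. The AIC here is the usual Gaussian form up to an additive constant, picked as one example of a parameter penalty; BIC or an F-test would make the same point.

```python
import numpy as np

def sse_by_group(y, groups):
    """Residual sum of squares when each group is fit by its own mean."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    total = 0.0
    for g in np.unique(groups):
        members = y[groups == g]
        total += float(np.sum((members - members.mean()) ** 2))
    return total

def aic_gaussian(sse, n, k):
    """Gaussian AIC up to an additive constant, with k fitted group means."""
    return n * np.log(sse / n) + 2 * k

# Hypothetical data: "a" and "b" have the same true mean, "c" differs.
rng = np.random.default_rng(0)
n = 200
labels = rng.choice(["a", "b", "c"], size=n)
true_mean = {"a": 1.0, "b": 1.0, "c": 3.0}
y = np.array([true_mean[l] for l in labels]) + rng.normal(0.0, 1.0, size=n)

# Merged model: treat "a" and "b" as a single category "ab".
merged = np.where(labels == "c", "c", "ab")

sse_split = sse_by_group(y, labels)    # 3 group means
sse_merged = sse_by_group(y, merged)   # 2 group means

print("in-sample MSE, split :", sse_split / n)
print("in-sample MSE, merged:", sse_merged / n)
print("AIC, split :", aic_gaussian(sse_split, n, k=3))
print("AIC, merged:", aic_gaussian(sse_merged, n, k=2))
```

The split model's in-sample MSE can never be worse than the merged one, because the merged fit is just the split fit with two means constrained to be equal. Only the penalized criterion is able to prefer the smaller model.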