Thursday, July 27, 2023

Classifying Text using Gzip and KNN

A new paper at ACL 2023 showed that gzip compression combined with the k-nearest neighbours algorithm can achieve accuracy similar to that of state-of-the-art deep learning models such as BERT on text classification. Not only that: because the technique is non-parametric, it beats those large deep learning models on out-of-distribution data. If this intrigues you, read my article on Substack, which explains the paper's findings in simple terms.
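To give a feel for the idea, here is a rough sketch in Python. It assumes the distance between two texts is the normalized compression distance (NCD) computed from gzip-compressed lengths, followed by a plain majority vote over the k closest training texts. The function names (`compressed_len`, `ncd`, `classify`) and the toy training data are my own illustrations, not code from the paper.

```python
import gzip
from collections import Counter


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings.

    If a and b share a lot of content, compressing them together costs
    little more than compressing one of them alone, so the value is small.
    """
    ca, cb = compressed_len(a), compressed_len(b)
    cab = compressed_len(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
    """Return the majority label among the k training texts closest to query."""
    nearest = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up training set: (text, label) pairs.
train = [
    ("the football team won the match with a late goal in the final minute", "sports"),
    ("the striker scored twice and the team won the cup final", "sports"),
    ("the central bank raised interest rates to fight rising inflation", "finance"),
    ("the stock market fell as investors worried about interest rates", "finance"),
]

print(classify("the team scored a late goal to win the match", train, k=3))
```

There is nothing to train here: every prediction compresses the query against each training text, so the cost grows with the size of the training set. On a toy set this small the distances are noisy, so treat the printed label as illustrative only.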
