We propose a methodological pipeline for analyzing the links between knowledge recombination and patent collaboration using techniques from machine learning and computational linguistics. Using a dataset of more than 7 million patents granted by the US Patent and Trademark Office (USPTO) from January 1976 to June 2020, we employ an ensemble of machine learning methods to convert patent abstracts into numerical vectors, construct text-based patent metrics (i.e., novelty, usefulness and significance), and characterize how knowledge recombination differs between global collaborative and non-collaborative patents. After controlling for temporal variation and cross-technology differences, our results suggest that global collaborative patents tend to be more novel, yet less similar to follow-on innovations in the same technology field. Moreover, while team size has a negative effect on patent novelty, ceteris paribus, inventions made by larger teams tend to be more significant. Our results contribute empirical evidence to the scholarly debate surrounding knowledge diversity and team composition, and emphasize the role of language in revealing the knowledge content of inventive activity.