So something funny I learned when I was (still am?) in a Machine Learning program at university: many of the earliest large language models and chatbots were built on top of openly available documents, mostly court records. One of the biggest tranches was the entirety of Enron's emails and other internal communications. In many ways, that corpus is still part of the foundation of these models. And if you have read how they communicated internally at Enron, you have to imagine how much filtering had to occur to pull out that much wretched vocabulary. You also have to imagine that part of the data is inherently deceptive and corrupt, 'cause, you know, ENRON.
Anyways, what I am really trying to say is that I wish I had gone into bioinformatics.
