Relevant to the post I sent earlier today, https://www.howtogeek.com/is-microsoft-using-your-word-documents-to-train-ai/ says that an unnamed Microsoft spokesperson claims that “Microsoft does not use customer data from Microsoft 365 consumer and commercial applications to train large language models.”
It depends, I suppose, on what you mean by training. Maybe the lawyers define what they do as using data, not training.
Thank you. 🙏
Another rule: PR = BS!
"Facebook cares about the mental health of teens, pinky promise."
A vast number of Fortune 500 companies use OneDrive for secure storage. Microsoft is not going to invite parallel lawsuits from that customer base.
You can use 365 to run simple textual analysis against models MSFT has already trained, for tasks like sentiment analysis.
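For what it's worth, that kind of call is pure inference. A minimal sketch using the Azure Text Analytics Python SDK (the endpoint, key, and sample sentences are placeholders I made up, not anything specific to Word):

    # Scoring text against a model MSFT already trained; nothing here
    # trains on your data. Endpoint and key are made-up placeholders.
    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential

    client = TextAnalyticsClient(
        endpoint="https://<your-resource>.cognitiveservices.azure.com/",
        credential=AzureKeyCredential("<your-key>"),
    )

    docs = ["The new release is fantastic.", "Support never answered my ticket."]
    for result in client.analyze_sentiment(documents=docs):
        # Inference only: the model's weights are fixed and are not
        # updated by anything you send.
        print(result.sentiment, result.confidence_scores)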
I’m not sure how using a text-classification service devolves into widespread violation of intellectual property law. Perhaps people worrying about it should read their usage contracts? That usually clears up any question.
I had to answer similar questions on projects 5-6 years ago. The master services agreements in force with companies cover this.
So I too freaked out over this one, ended up doing my homework, and found that it looks legit. It is optional, it sends data only when asked, and it follows sound data-management practice by minimizing both the data sent and the retention time.
The one remaining issue is that many users reported being opted in by default, which could be partly explained by company policy or by users clicking away the nag box without reading it.
Still.
When both the CEO and CTO make shockingly dumb public statements about IP, and they partner with OpenAI of all people, and own LinkedIn, which pulled a silent opt-in stunt just now, and ship the piracy-trained, celebrity-deepfake engine DALL-E 3 as safe for minors with their OS, there is negative trust to build on.
m$crosoft is not to be trusted at all; see https://sneak.berlin/20200307/the-case-against-microsoft-and-github/.
Windows 11 is one of the most horrible OSes they have released, if not the worst.
Microsoft is constantly prompting me to back up my documents on their cloud server (OneDrive). Is their intention here to make my data more readily available for training their LLMs?
I doubt it; they've been bothering me about that since 2018.
I just recently yanked everything I have off of Google Drive, and I never trusted OneDrive due to its various eccentricities.
And because I'm a cheap bastard, I'm using OpenOffice now for writing and spreadsheets.
It's inconvenient as hell, but I don't trust Google or Microsoft anymore. Not just with regard to scraping all my writing into an LLM, but because I don't trust them not to abruptly lock my documents in place and charge me a fee to use them.
Google Docs and Drive have been great, but as they say, if the product is free you are the product.
Reading the original post, it looks to me like people are confusing training with runtime (inference).
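To make the distinction concrete, here is a toy scikit-learn sketch (the corpus and labels are made up, and any model would do). Training absorbs your documents into the model's weights; runtime just scores a new document against fixed weights and discards it:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    corpus = ["great product", "terrible support", "love it", "waste of money"]
    labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

    vec = CountVectorizer()
    X = vec.fit_transform(corpus)

    # Training: the documents change the fitted weights, so their content
    # is statistically baked into the model.
    model = LogisticRegression().fit(X, labels)

    # Runtime (inference): the new document is scored against those fixed
    # weights and then discarded; the model itself does not change.
    print(model.predict(vec.transform(["love this great product"])))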
But they might (and in fact do, if I understand the training they offered) use them to build the boundary API systems, which are not LLMs (because they are not ANNs); that possibility is consistent with the statement. I have tried to raise this question officially and gotten nowhere, but I will keep pressing.
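For example, a plain naive Bayes text classifier (a sketch with made-up documents and labels, purely to illustrate the loophole) can be trained on customer text yet is neither an ANN nor an LLM, so training it would not contradict the literal wording of that denial:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Made-up stand-ins for customer documents and routing labels.
    docs = ["quarterly revenue forecast", "team offsite agenda and schedule"]
    labels = ["finance", "hr"]

    # Trained on the text, but not an ANN and certainly not an LLM.
    clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(docs, labels)
    print(clf.predict(["revenue forecast for next quarter"]))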
Under GDPR this is punishable by fines. It remains to be seen whether Trump will stand with us Europeans to protect users or on the side of companies because "they are American".
Thanks for asking for clarity. So far, very little has been explained, and no permission has been requested from consumers. What about teach-ins and town halls all across the country explaining the objectives of this technology? Why are we passive partners?
Why would anyone waste cycles training AI with a bunch of random documents?
The cesspool of the internet is bad enough. But data from a bunch of Word documents, not curated or verified in any way?
That option has been around for a long time; I don't think it has anything to do with AI unless they've secretly extended its scope recently to include it. But given that the world's most obvious GDPR violation would cost them up to 4% of their global annual turnover, it's unlikely.
Keep them on their toes, Marcus! They can say what they will, but it's a fine line we're walking in "trusting" them.