Another day, another mass data leak. The one that caught my attention most recently was that the details of 2.6 million Duolingo users were for sale on hacking forums. To be fair, it looks less like a "hack" and more like the product of misusing the Duolingo API to collect data that could include people's real names, email addresses and some Duolingo specifics, such as which language the user was learning, as well as some profile settings.

Much of this, except the email address, was technically "public" in that you could see it on a user's profile or at least access it via the API. But by feeding the API a big list of email addresses, probably obtained from some other leak, one could confirm whether each address was associated with a Duolingo account. Linking profiles to email addresses gives scammers an obvious opening to send accurately personalised spam or phishing emails: "Hey, to continue your French lessons we need you to re-enter your credit card details here" type stuff.

As it happens, I wrote about one way to programmatically access your data from Duolingo in the past. But there, I promise, I was more interested in data about my own progress rather than getting everyone's data to sell.

I guess this only stood out to me because I've used Duolingo. It's far from the worst leak in recent times. Just this month we've seen the details of 40 million UK residents leak from the UK Electoral Commission, the details of 760,000 Discord.io users released (enough to shut the site down for now), and 10 million French residents exposed via a leak from their government's unemployment registration agency.

Basically, it increasingly feels like anything related to you that sits on any internet-connected computer, anywhere, in any form other than heavily encrypted, could potentially become public one day. In some cases, even if you explicitly deleted it years ago.