🌬 A dissipated storm is a mist opportunity
This issue covers a security breach at Hugging Face, extreme weather prediction, PyCon talks, privacy-preserving machine learning, climate data exploration, personal updates, and more!
Late to the Party 🎉 is about insights into real-world AI without the hype.
Hello internet,
how is it mid-March already?!
In this issue, we have the first big security incident involving models hosted on Hugging Face, extreme weather prediction (featuring some of my work!), and popular PyCon talks. We talk about privacy-preserving machine learning and exploring climate data! Also, don’t miss a cute dog video all the way at the bottom!
So let’s dive into our machine learning shenanigans!
The Latest Fashion
- Hugging Face hosted multiple backdoored machine learning models
- Can AI help us predict extreme weather? Featuring work I was involved in!
- All PyCon US/AU 2023 talks sorted by view count
Worried these links might be sponsored? Fret no more. They’re all organic, as per my ethics.
My Current Obsession
I started playing Pokémon Go again! And I just finished collecting all 151 original Pokémon! I’ve been enjoying this new motivation to go outside.
Generally, the fact that it’s getting light out earlier has massively improved my life lately. Been feeling much better!
I’m also thinking about creating more videos and maybe some short courses on Skillshare again. But after these intense work weeks, I don’t have much energy on weekends to do more creative things. I don’t know how others do it.
Thing I Like
I just bought and installed water cooling for my PC. The Arctic Freezer III was on sale, so I figured I could make it fit… It barely did, but now I have more RGB in my little gaming station!
Hot off the Press
In Case You Missed It
My Notion Task List hack for ADHD brains seems to have been shared somewhere again!
On Socials
I shared this YouTube playlist of the MIT course on Data-centric AI, which was quite popular.
ECMWF announced a new update to the AIFS forecasting system, including some scores and data announcements!
Python Deadlines
The PyCon Colombia CfP will close in two days!
I got a lovely Pull Request for the Wagtail CMS conference in June! The Wagtail Netherlands CfP already closed, but you can still attend!
Generally, some folks shared the new Series view for Python conferences, which is awesome!
Machine Learning Insights
Last week I asked, “In what ways can AI contribute to personalized user experiences without compromising privacy?”, and here’s the gist of it:
Privacy-preserving machine learning is a field with some history behind it. It’s not easy, but these technologies are absolutely fun to read up on!
Anonymization and Data Masking: The most straightforward approach is to remove or mask personal identifiers before data is used for training AI models. This process ensures that the AI can learn from patterns in the data without being able to link the data back to any specific individual. While this method may reduce the level of personalization achievable, it significantly enhances privacy.
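As a minimal sketch of that idea (the record fields and function names here are hypothetical, just for illustration), masking direct identifiers before data reaches a training pipeline could look like this:

```python
import hashlib


def mask_record(record, identifiers=("name", "email")):
    """Return a copy of the record with direct identifiers replaced
    by a truncated one-way hash, so patterns in the data survive
    but the original identities are no longer readable."""
    masked = dict(record)
    for field in identifiers:
        if field in masked:
            digest = hashlib.sha256(str(masked[field]).encode()).hexdigest()
            masked[field] = digest[:12]  # short pseudonym instead of the raw value
    return masked


user = {"name": "Ada Lovelace", "email": "ada@example.com", "age": 36}
print(mask_record(user))
```

One caveat worth flagging: hashing identifiers like this is pseudonymization rather than true anonymization, since remaining quasi-identifiers (age, location, and so on) can still allow re-identification when combined.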
Federated Learning: This is a decentralized approach to training AI models. Instead of sending user data to a central server for training, federated learning trains models directly on users' devices. The model learns from data on the device, and only the model's updates (not the raw data) are sent back to the central server to improve the model. This method allows for personalization without compromising privacy since personal data never leaves the user's device.
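The core loop can be sketched in a few lines. This is a toy stand-in, not a real framework: the “local update” just nudges weights toward each device’s data mean instead of running actual gradient descent, but the shape of the protocol (data stays local, only weights travel) is the same.

```python
def local_update(weights, data, lr=0.1):
    """Hypothetical on-device training step: nudge each weight toward
    the local data mean (a stand-in for real gradient descent)."""
    target = sum(data) / len(data)
    return [w + lr * (target - w) for w in weights]


def federated_average(client_updates):
    """Server side: average the clients' updated weights.
    Only weights are transmitted; raw data never leaves the devices."""
    n = len(client_updates)
    return [sum(ws) / n for ws in zip(*client_updates)]


global_model = [0.0, 0.0]
client_data = [[1.0, 2.0], [3.0], [2.0, 2.0]]  # stays on each "device"
updates = [local_update(global_model, d) for d in client_data]
global_model = federated_average(updates)
```

This averaging step is essentially the FedAvg idea: the server never sees `client_data`, only the three update vectors.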
Differential Privacy: This technique adds carefully calibrated noise to data, query results, or model updates, making it difficult to identify any individual’s contribution. This means that AI can learn from patterns in the data without compromising the privacy of the individuals whose data contributed to those patterns. Differential privacy ensures that AI-driven personalization does not expose or misuse personal information.
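To make the “calibrated noise” part concrete, here’s a sketch of the classic Laplace mechanism applied to a mean query. The function names are mine, and a production system would use a vetted library rather than hand-rolled noise, but the scale-by-sensitivity-over-epsilon logic is the standard recipe:

```python
import math
import random


def dp_mean(values, lower, upper, epsilon, rng=random):
    """Release the mean of values clipped to [lower, upper] with
    Laplace noise scaled to the query's sensitivity."""
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / n
    sensitivity = (upper - lower) / n   # max change from one person's data
    scale = sensitivity / epsilon       # smaller epsilon -> more noise
    # Draw Laplace noise via the inverse-CDF trick
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_mean + noise
```

A smaller `epsilon` means stronger privacy and noisier answers; repeated queries also consume the privacy budget, which is why real deployments track cumulative epsilon.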
Edge AI: Similar to federated learning, on-device processing keeps the user's data on their own device. AI algorithms run directly on the user's device, using the data available on the device to personalize the experience. Since the data does not leave the device, user privacy is maintained. This method is particularly useful for personalizing mobile apps and smart home devices.
Homomorphic Encryption: This advanced encryption method allows data to be encrypted in such a way that AI algorithms can still process and learn from it without ever decrypting it. This means that sensitive data can be used to train AI models without exposing the actual data, providing a powerful tool for privacy-preserving personalization. But it’s honestly still very experimental.
Minimal Data Collection: AI can be designed to use minimal data or to focus on collecting only non-sensitive data for personalization. By carefully selecting the data that is necessary for personalization, AI can reduce privacy risks. For instance, a weather app might personalize forecasts based on a user's general location without needing to know their exact address.
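The weather-app example above fits in one tiny function. This is just an illustrative sketch (names are mine): rounding coordinates to one decimal place snaps them onto a grid of roughly 11 km, which is plenty for a local forecast but useless for finding a street address.

```python
def coarse_location(lat, lon, decimals=1):
    """Round coordinates so the service only sees a ~10 km grid cell,
    not an exact position."""
    return (round(lat, decimals), round(lon, decimals))


print(coarse_location(52.52001, 13.40495))  # roughly central Berlin -> (52.5, 13.4)
```

The same data-minimization principle applies beyond location: collect the coarsest signal that still supports the personalization you actually need.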
User Consent and Control: Finally, empowering users with control over their data is crucial. AI systems can be designed to offer users choices about what data they share and how it is used for personalization. Transparent policies and easy-to-use privacy settings ensure that users are informed and in control of their data.
Got this from a friend? Subscribe here!
Data Stories
We always hear about our changing climate.
But we rarely get to see it…
Of course, we can look at the climate stripes.
But what if we could actually look at the data?
The Copernicus Interactive Climate Atlas enables exactly that! 30 different climate variables from 8 state-of-the-art datasets.
Super interesting to click and zoom around!
Source: Copernicus
Question of the Week
- What methods do you recommend for ensuring fairness in AI algorithms, especially in high-stakes scenarios?
Post them on Mastodon and tag me. I'd love to see what you come up with. Then I can include them in the next issue!
Tidbits from the Web
- This video is probably stolen, but this dog knows how to live!
- Wildlife photographer vents his frustration about obvious AI-generated fakes
- Halfway through this video, I forgot it was a joke and was just impressed
Jesper Dramsch is the creator of PythonDeadlin.es, ML.recipes, data-science-gui.de and the Latent Space Community.
I laid out my ethics, including my stance on sponsorships, in case you're interested!