🧰 The shovel was a ground-breaking invention
This issue contains a fantastic transformer explainer from 3blue1brown, Mojo🔥, and mandatory AI use in university classes. Today we talk about unsupervised learning in ocean modelling!
Late to the Party 🎉 is about insights into real-world AI without the hype.
Hello internet,
Spring has officially arrived and the cherry trees are blossoming in Bonn!
In this issue, we’ll look at a fantastic transformer explainer from 3blue1brown, Mojo🔥, and mandatory AI use in university classes. I found some new PyCons announced and I talk about unsupervised learning in ocean modelling. Finally, I have some fun videos from around the internet! Oh, and my ChatGPT class is coming to Spotify UK!
Now let’s dive into some machine learning!
The Latest Fashion
- Probably the best transformer explainer just released by 3Blue1Brown
- The Mojo🔥 programming language is now in open-source development
- This professor made AI use mandatory in their classes and shares the learnings.
Worried these links might be sponsored? Fret no more. They’re all organic, as per my ethics.
My Current Obsession
I’ve focused on recovering from my burnout this week and started a small meditation challenge for myself. So far it’s been really nice.
I also picked up the keys for my new apartment! Very exciting!
Hot off the Press
In Case You Missed It
It looks like my class on ChatGPT will be available on Spotify UK in the near future!
On Socials
My post about Gaussian Processes from Scratch was quite popular.
Python Deadlines
I emailed PyGrunn about their CfP and it’s closing this weekend!
Also PyData Paris 2024 and PyCon Taiwan 2024 are closing soon. North Bay Python 2024 closes later this week.
I found updated conference dates for PyCon Portugal, PyCon Japan, and PyCon Malaysia!
Machine Learning Insights
Last week I asked, "How can unsupervised learning contribute to the exploration of oceanic data?" Here’s the gist of it:
Unsupervised learning is well-suited for advancing our understanding of data-sparse environments like the ocean. The deep sea, despite being one of the largest habitats on Earth, remains a frontier of discovery due to its inaccessible nature and the logistical challenges of deep-sea research. Here’s how unsupervised learning can play a role in light of the limited data available from these depths:
Maximizing Insights from Sparse Data
Given the high cost and difficulty of collecting data from the deep ocean, every bit of data is precious. But even in shallower waters, our data is intermittent and often sparse. Unsupervised learning can analyze this sparse data to identify patterns or anomalies without needing vast datasets. For instance, clustering algorithms can organize sparse samples of chemical compositions or biological entities into meaningful groups, revealing ecological zones or communities in the deep sea that are not apparent through direct observation.
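To make the clustering idea concrete, here is a minimal sketch, assuming each sample is just a (temperature, salinity) pair. The values are made up for illustration, and `kmeans` is a bare-bones helper written for this example, not a library function.

```python
import math
import random


def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means on a list of tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen samples
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the samples assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical (temperature in °C, salinity in PSU) samples, invented for this sketch.
samples = [
    (2.1, 34.9), (2.4, 34.8), (1.9, 35.0),     # cold and salty
    (18.2, 33.1), (17.5, 33.4), (18.9, 33.0),  # warm and fresher
]

centroids, labels = kmeans(samples, k=2)
print(labels)
```

With real measurements you would usually standardize each variable first, so that the one with the largest numeric range does not dominate the distance.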
Inferring Environmental Conditions
In regions where data is limited, unsupervised learning can help infer environmental conditions or behaviours. For example, by analyzing the distribution and types of bioluminescent organisms captured in sparse underwater images, algorithms can make inferences about the light, pressure, and nutrient conditions of different deep-sea layers.
Mapping Similar Unexplored Territories
Unsupervised learning can assist in creating maps or models of the ocean floor’s topography and habitat distributions by analyzing data from sonar and other remote sensing technologies. Techniques like clustering can categorize similar features, helping to identify underwater mountains, valleys, or hydrothermal vents without extensive labelled datasets. Here, semi-supervised learning can also be beneficial.
Discovering Unknown Patterns and Anomaly Detection
Unsupervised learning algorithms can identify previously unknown patterns within oceanic data. For instance, clustering algorithms can reveal natural groupings of water temperature, salinity levels, or marine life distributions that were not previously understood. This can lead to new hypotheses about ocean currents, climate change effects, or habitats.
Oceanic data often contains outliers or anomalies that could indicate significant events, such as oil spills, unusual marine animal behaviours, or abrupt changes in water chemistry. Unsupervised learning algorithms are adept at detecting these anomalies, which may not be evident through traditional analysis methods. This can facilitate early warning systems for ecological crises or possibly lead to discoveries of new oceanic processes.
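One simple way to flag such anomalies without any labels is a robust z-score based on the median. The sketch below assumes a single sensor channel; the pH readings are invented, and the threshold of 3.5 is a common rule of thumb rather than anything specific to ocean data.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


# Hypothetical pH readings from one sensor, invented for this sketch.
ph_readings = [8.10, 8.08, 8.11, 8.09, 8.10, 7.62, 8.12, 8.09]
anomalies = mad_outliers(ph_readings)
print(anomalies)
```

Because the median and the MAD are barely moved by a single extreme reading, the sudden drop stands out clearly instead of inflating the spread estimate it is compared against.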
Dimensionality Reduction and Feature Learning
Techniques like Principal Component Analysis (PCA) can reduce the dimensionality of complex oceanic datasets, making it easier to visualize and interpret the data. By focusing on the most critical variables, researchers can better understand the factors driving phenomena like El Niño events, marine biodiversity, or pollution dispersion patterns.
Moreover, unsupervised learning can automatically discover the features or characteristics that best represent the underlying structure of oceanic data. This is particularly useful for complex datasets where it is hard to decide which features matter most for the analysis, for example when studying phytoplankton diversity or the interactions between various oceanic layers.
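As a rough sketch of the PCA step, the example below finds the first principal component of a small table of profiles. It assumes three variables per sample (temperature, salinity, oxygen) with invented values, and it uses power iteration on the correlation matrix instead of a full eigendecomposition to stay dependency-free. The function name is hypothetical.

```python
import math


def first_principal_component(rows, n_iter=100):
    """Return (direction, explained_variance_ratio) of the leading component."""
    n, d = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Assumes no column is constant, otherwise its standard deviation is zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
        for c, m in zip(cols, means)
    ]
    # Standardize so that units (°C, PSU, ml/l) do not decide the result.
    z = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    cov = [
        [sum(r[a] * r[b] for r in z) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # provided the starting vector is not orthogonal to it.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_variance


# Hypothetical (temperature, salinity, oxygen) samples from surface to depth.
profiles = [
    (18.0, 33.0, 6.5),
    (15.0, 33.4, 6.0),
    (10.0, 34.0, 5.0),
    (6.0, 34.5, 4.2),
    (3.0, 34.8, 3.5),
    (2.0, 34.9, 3.4),
]

direction, ratio = first_principal_component(profiles)
print(direction, ratio)
```

In this toy data all three variables change together with depth, so a single component carries almost all of the variance. Real datasets are rarely this tidy, and you would typically inspect several components.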
Self-Supervised Learning for Observation Data
Self-supervised learning, a subset of unsupervised learning where the model generates its own supervision from the input data, is particularly adept at revealing hidden patterns and dynamics in sparse time-varying data. Ocean observations are often intermittent and full of gaps, and self-supervised learning can help fill them. By predicting parts of the data from other parts, it can learn the underlying temporal dynamics without needing densely sampled datasets. For example, a self-supervised model might predict missing environmental sensor data points based on the available surrounding measurements, effectively learning the typical progression of events or changes over time. In this way, sparse time-varying datasets become a rich source of insight, supporting more accurate and comprehensive analyses of temporal phenomena where conventional ML methods might falter due to incomplete data.
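Here is a toy sketch of "predicting parts of the data from other parts". It assumes a noise-free, evenly sampled, tide-like signal with two missing readings, and a deliberately tiny model with two weights. Both function names are hypothetical, and real sensor data would need a richer model and proper validation.

```python
import math


def fit_neighbor_model(series):
    """Learn weights (a, b) so that x[t] is approximated by a*x[t-1] + b*x[t+1]."""
    # Self-supervision: every fully observed triple is a training example,
    # with the middle value hidden from the inputs and used as the target.
    triples = [
        (series[t - 1], series[t], series[t + 1])
        for t in range(1, len(series) - 1)
        if None not in (series[t - 1], series[t], series[t + 1])
    ]
    # Normal equations of least squares for the two weights.
    spp = sum(p * p for p, _, _ in triples)
    snn = sum(n * n for _, _, n in triples)
    spn = sum(p * n for p, _, n in triples)
    spy = sum(p * y for p, y, _ in triples)
    sny = sum(n * y for _, y, n in triples)
    det = spp * snn - spn * spn
    a = (spy * snn - sny * spn) / det
    b = (sny * spp - spy * spn) / det
    return a, b


def fill_gaps(series, a, b):
    """Fill isolated gaps (None) using the learned neighbor weights."""
    filled = list(series)
    for t in range(1, len(series) - 1):
        if series[t] is None and series[t - 1] is not None and series[t + 1] is not None:
            filled[t] = a * series[t - 1] + b * series[t + 1]
    return filled


# Hypothetical tide-like signal, generated for this sketch.
truth = [math.sin(0.5 * t) for t in range(20)]
observed = list(truth)
observed[5] = None   # simulate two missing sensor readings
observed[12] = None

a, b = fit_neighbor_model(observed)
filled = fill_gaps(observed, a, b)
print(filled[5], truth[5])
```

The model never sees a label from outside the series. It only learns how a value relates to its neighbours from the complete stretches, then applies that relation to the gaps.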
Got this from a friend? Subscribe here!
Question of the Week
- How would you integrate domain-specific knowledge into a machine learning model in fields like geology or meteorology?
Post your answers on Mastodon and tag me. I'd love to see what you come up with. Then I can include them in the next issue!
Tidbits from the Web
- I watched “how I fixed my attention span” and decided to do a 30-day meditation challenge for myself.
- 2 YouTubers expose scam by $4.92 billion company?
- And this TikTok sent me
Jesper Dramsch is the creator of PythonDeadlin.es, ML.recipes, data-science-gui.de and the Latent Space Community.
I laid out my ethics including my stance on sponsorships, in case you're interested!