🥤 Dr. Pepper was a Fizz-ician
From PyTorch in VR to a backdoor vulnerability in Linux and a massive PythonDeadlin.es upgrade, all alongside my personal move!
Late to the Party 🎉 is about insights into real-world AI without the hype.
Hello internet,
where has the time gone? Happy egg holiday to all of you who partake!
In this issue, we’ll examine PyTorch in VR, type-hinting in Python, a huge backdoor vulnerability in Linux, a massive upgrade to PythonDeadlin.es, and integrating diverse datasets in machine learning models. Also, I’m moving house!
I unfortunately haven’t been doing so well, but let’s enjoy some machine learning together:
The Latest Fashion
- Run PyTorch models on mobile or even VR with ExecuTorch
- You can upgrade your CV step-by-step with ChatGPT
- Why type-hinting (sometimes) sucks in Python
Worried these links might be sponsored? Fret no more. They’re all organic, as per my ethics.
My Current Obsession
I massively burnt out last week. I was really taking care of myself after organising the machine learning training for member states at ECMWF, but then another emergency happened, and I started waking up in pain and with breathing problems. It’s better today, thanks to a sick day and the long weekend! So, apologies for being on more of a biweekly schedule currently.
On that note, we finished the ECMWF machine learning training that I organised. I think it went very well. We went from “this is machine learning” to graph neural networks and data-driven weather forecasting. Many of my colleagues were on for lessons, and the excitement around the topic was palpable. I am thrilled to have been involved in this despite the massive workload.
Made a massive upgrade to PythonDeadlin.es again; more below!
And I found a new place to live closer to work! Moving is terrifying, but I’m also excited for new things to come! Personally, I hope it’ll make it easier for me to socialise, but we’ll see how that goes. After all, wherever you go, there you are. It’ll still be me in that new place.
Thing I Like
Using my foam roller to relieve some of the worst tension in my shoulder probably helped a lot with my recovery over the last few days. I love that thing.
Hot off the Press
In Case You Missed It
I’ve been asked if I wanted to update my advice on buying a laptop for machine learning. I think the advice still stands, but there might be some updates regarding LLMs?
Maybe someday.
On Socials
I shared about the XZ backdoor in Linux, which took off on Mastodon. (Check if you’re affected!)
Parsr was quite popular on LinkedIn for information extraction from PDF documents.
People are still trying to figure out how to write tests for machine learning. Of course, there’s also the testing section for ml.recipes.
Python Deadlines
I translated PythonDeadlin.es to German and Spanish! It was a huge effort, and unfortunately, the compile time went up significantly. But I think it’s worth it. My Spanish is already quite dodgy, and I’m only learning Portuguese on Duolingo, so I’m not sure I should take a stab at adding Portuguese as well.
Let me know what you think!
We have EARL 2024 closing today and PyData Paris and PyCon Taiwan later next week.
I added 24 new PythonDeadlin.es, so I will skip a complete list here and refer you to the actual page!
Additionally, I dove into the history of Python conferences step by step. This has been a monumental effort, but I think it's pretty awesome to be able to see the complete history of PyCon US, for example!
I still have about 300 conferences to go. So, this effort will be ongoing!
Machine Learning Insights
Last week, I asked, “How do you address the challenge of integrating diverse data types (like satellite imagery and ground sensor data) in ML models?” and here’s the gist of it:
Integrating diverse data types, such as satellite imagery and ground sensor data, into machine learning (ML) models presents a unique set of challenges. These challenges stem from the heterogeneity of the data in terms of format, scale, and the information they contain.
However, addressing these challenges is essential for applications in fields like meteorology, where combining different data sources can significantly improve the forecast skill of numerical weather prediction (NWP) models.
Here are several strategies to effectively integrate these diverse data types into ML models:
- Data Preprocessing: The first step involves preprocessing data to a common format or scale. In conventional weather modelling, this is usually called Observation Processing and Data Assimilation. For instance, observations from satellite data might need to be processed to match the spatial resolution of ground sensor data or vice versa. Additionally, different data types require normalisation in machine learning to ensure that they are on a similar scale, making them easier to combine and process by the ML model.
- Feature Engineering: Extracting meaningful features across multiple data types is crucial. For satellite imagery, this might involve extracting common indices like the Normalised Difference Vegetation Index (NDVI) that add information to the data. Feature engineering allows the model to find patterns across disparate data sources.
- Fusion Techniques: Data fusion techniques can be applied to effectively combine data from multiple sources. Fusion can occur at different levels:
  - Data-level fusion involves combining the raw data before input into the ML model. This could mean stacking satellite images and sensor readings into a single multi-dimensional input vector.
  - Feature-level fusion combines features extracted from different data sources into a comprehensive feature set from which the model can learn.
  - Decision-level fusion involves running separate models on each data type and then combining their predictions to reach a final decision. This method, related to ensemble models, leverages the strengths of each data source independently before making a comprehensive prediction.
- Deep Learning Architectures: Certain deep learning models can be combined into hybrid models, such as Convolutional Neural Networks (CNNs) for image data and Recurrent Neural Networks (RNNs) for sequential sensor data. These hybrid models can effectively process and learn from image and sequential data. For example, a CNN could process satellite imagery, while an RNN handles time-series data from sensors, with both models’ outputs being integrated into a final prediction layer.
- Graph Neural Networks: Sometimes, our data sources aren’t available in a nice stacked matrix. In meteorology, we have a diverse data set from different sources that are only sometimes available and sometimes at different locations. This is where dynamic graph neural networks can leverage their unique structure and combine varying data sources.
- Transfer Learning: Transfer learning involves using a model trained on one task as the starting point for a model on a second task. This can be particularly useful when integrating satellite imagery with sensor data, as pre-trained image processing models can be fine-tuned with sensor data.
- Multi-task Learning: Multi-task learning, on the other hand, trains a model on multiple tasks simultaneously, allowing it to learn shared representations that are beneficial for processing both data types.
- Self-supervised Learning: This technique is a form of unsupervised learning. The model usually learns to fill in masked “missing data” from the original training data. This leads to a model that can be robust to missing values and can fill them in as a first processing step towards a prediction.
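To make the masking idea from the self-supervised bullet a bit more concrete, here is a minimal sketch of how one training example could be built. The function name `mask_series`, the mask ratio, and the sensor readings are all made up for illustration, and a real pipeline would feed these pairs into an actual model rather than stopping here.

```python
import random


def mask_series(series, mask_ratio=0.25, seed=0):
    """Hide a random subset of values and return (inputs, targets).

    `inputs` is the series with masked positions set to None.
    `targets` maps each masked index to the original value that a model
    would be trained to reconstruct.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_masked = max(1, round(len(series) * mask_ratio))
    masked_idx = set(rng.sample(range(len(series)), n_masked))
    inputs = [None if i in masked_idx else v for i, v in enumerate(series)]
    targets = {i: series[i] for i in masked_idx}
    return inputs, targets


# Made-up ground sensor readings (for example, hourly temperatures)
readings = [12.1, 12.4, 12.9, 13.3, 13.0, 12.6, 12.2, 11.9]
inputs, targets = mask_series(readings)
```

The training signal comes from the data itself: the model only sees `inputs` and is scored on how well it recovers `targets`, so no manual labels are needed.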
By leveraging these strategies, ML models can effectively integrate diverse data types like satellite imagery and ground sensor data, providing comprehensive insights that are particularly valuable in fields such as meteorology with real-world data.
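For those who prefer code, the preprocessing, feature engineering, and three fusion levels from the list above could be sketched roughly like this. It uses only the Python standard library, all numbers are invented toy values, and the two "model predictions" at the end are hypothetical placeholders rather than outputs of real models.

```python
from statistics import mean, pstdev


def zscore(values):
    """Normalise values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def ndvi(nir, red):
    """Normalised Difference Vegetation Index; assumes nir + red is not zero."""
    return (nir - red) / (nir + red)


# Made-up toy values for four locations
red = [0.10, 0.08, 0.20, 0.25]           # satellite red-band reflectance
nir = [0.50, 0.60, 0.30, 0.27]           # satellite near-infrared reflectance
station_temp = [14.0, 15.5, 19.0, 21.5]  # ground sensor temperature

# Data-level fusion: normalise each raw source, then stack per location
data_level = [
    list(row) for row in zip(zscore(red), zscore(nir), zscore(station_temp))
]

# Feature-level fusion: engineer a feature per source, then combine them
vegetation = [ndvi(n, r) for n, r in zip(nir, red)]
feature_level = [list(row) for row in zip(vegetation, zscore(station_temp))]


# Decision-level fusion: weighted average of separate model predictions
def fuse_decisions(predictions, weights):
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)


satellite_model_pred = 0.8  # hypothetical output of an image-only model
sensor_model_pred = 0.6     # hypothetical output of a sensor-only model
fused = fuse_decisions([satellite_model_pred, sensor_model_pred], [1.0, 1.0])
```

Which level works best depends on the data: data-level fusion needs sources that line up in space and time, while decision-level fusion tolerates sources that are sometimes missing.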
Got this from a friend? Subscribe here!
Question of the Week
- How can unsupervised learning contribute to the exploration of oceanic data?
Post your answers on Mastodon and tag me. I’d love to see what you come up with. Then, I can include them in the next issue!
Tidbits from the Web
- I’ve been closely following the XZ vulnerability; this timeline was quite interesting.
- I discovered the streamer “Nicole Belafonte”, who acts in character as a time traveller out of the 60s. Super fun streams to chill out to.
- Also remember to be extra skeptical on the internet tomorrow.
Jesper Dramsch is the creator of PythonDeadlin.es, ML.recipes, data-science-gui.de and the Latent Space Community.
I laid out my ethics, including my stance on sponsorships, in case you're interested!