When you write a slow Rust program, it's EsCargo
Late to the Party is about insights into real-world AI without the hype.
Hello internet,
Happy weekend! I took a lovely bike tour yesterday to soak up the last sun rays. Let's enjoy some machine learning this morning!
The Latest Fashion
- Have you tried the Potato Prompt yet?
- Scammers have now taken to fake AI-generated Travel Guidebooks
- AI models are transforming weather forecasting!
Got this from a friend? Subscribe here!
My Current Obsession
Unfortunately, I was a bit unwell earlier this week. The heat and the pressure culminated in me checking out for a few days.
But I dug myself out by setting up a Proxmox server! It's usually an enterprise-grade virtualisation layer, but it works just as well on a nice tiny 100€ office PC. I started with the classic of setting up a Pi-hole, a little home DNS solution that can block any traffic you don't want on your local network. So far, I have about 2 million ad and tracking domains blocked, and they make up about 15% of traffic, depending on which app I open on my PC or phone. Some are definitely greedier than others!
I'm now working on finding a replacement for good old Google Reader, which I still mourn. I'm trying out Tiny Tiny RSS now and am happy to take recommendations!
Thing I Like
I've been playing around with setting up a little home server for myself to run a few different apps locally and have some storage for my music collection that is readily available. I thought this would be really expensive, but I picked up a refurbished office computer that is perfect for this! I've been loving the little challenges to work with this and make it exactly how I want it to work!
Did you know there's an open-source Google Photos alternative?!
Hot off the Press
I've been a bit quieter this week. But there are still some things you may have missed!
On Socials
I wrote a nice educational post about how we learned about gradient descent in school, but machine learning evolved into AdamW optimisers.
My post about improving LLM prompts didn't take, though. Pretty sure I was being too wordy, which is fair enough.
Google now bakes spyware into Chrome to generate a usage profile of you as you browse. There was some informative conversation under that post!
Python Deadlines
No new Python deadlines. Time to take a breather!
If you find new conferences or CfPs, please feel free to submit a pull request!
Machine Learning Insights
Last week, I asked, "What are the big problems applying machine learning to weather forecasts?", and here's the gist of it:
Applying machine learning to weather forecasting is a complex and challenging task.
About a year ago, experts in the field still considered it near-impossible to build a purely data-driven weather forecasting system, simply because the field comes with several significant problems and limitations.
1. Data Quality and Quantity:
- Problem: Weather forecasting relies heavily on historical and real-time data. Ensuring the quality and quantity of this data is a big challenge. Missing or inaccurate data can lead to less reliable forecasts.
- Example: Imagine a weather prediction model trying to forecast a hurricane's path. The forecast may be less accurate if it doesn't have access to accurate and timely data on factors like sea surface temperatures and wind patterns.
- Solution: The ERA5 dataset prepared at the ECMWF has been essential in the process of training all current open-source data-driven weather forecasting systems. It provides a consistent global record of the historical weather state over the last 80 years. And we're, of course, all waiting for ERA6 to drop in 2024.
2. Complex Atmospheric Processes:
- Problem: Weather systems are governed by complex physical and atmospheric processes. Machine learning models often simplify these processes, which can lead to inaccurate predictions, especially for extreme weather events.
- Example: Predicting the formation and behaviour of thunderstorms requires understanding intricate interactions between temperature, humidity, and wind patterns. Machine learning models may struggle to capture these complexities.
- Solution: Build extensive evaluation pipelines, and build sophisticated models that can capture multi-scale processes accurately.
3. Limited Spatial and Temporal Resolution:
- Problem: Machine learning models have limitations in handling high-resolution spatial and temporal data. Weather phenomena can vary greatly over short distances and time periods, making it challenging to capture fine-grained details.
- Example: Forecasting localised events like flash floods or tornadoes requires very high-resolution data. If the model's spatial or temporal resolution is too coarse, it may miss these events.
- Solution: Generate high-resolution training data that captures the complexity of such events.
4. Enormous Computational Requirements:
- Problem: Weather models require vast computational resources due to the sheer volume of data and complex simulations. Running machine learning models for high-resolution, real-time forecasts demands significant computational power.
- Example: Simulating a global weather model at high resolution can require supercomputers. This limits the accessibility of advanced forecasting methods to smaller institutions or regions with limited resources.
- Solution: Make generated datasets such as ERA5 available, and enable access to computational resources through cloud infrastructure. Since inference is much cheaper to run than training, it's also crucial to make model weights available, so researchers can run these models for anything from verification to fine-tuning.
5. Rapid Model Degradation:
- Problem: Weather forecasting models degrade in accuracy over time, especially for longer-term forecasts. This is because small errors in initial data can magnify as forecasts extend into the future.
- Example: A one-week weather forecast may be reasonably accurate for the first few days but less reliable for days four, five, and beyond due to the accumulation of errors.
- Solution: Train on "rollouts" that evaluate multiple model steps and propagate the gradients through the model multiple times.
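To make the rollout idea a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the "model" is a toy one-parameter update x_{t+1} = a * x_t, the targets are synthetic, and the gradient is tracked by hand with the product rule instead of a deep learning framework. The point is only that the loss is summed over several chained model steps, so the gradient passes through the model once per step.

```python
def rollout_loss_and_grad(a, x0, targets):
    """Squared-error loss over a multi-step rollout of the toy model x_{t+1} = a * x_t.

    Returns the summed loss and its derivative with respect to `a`.
    """
    state, dstate_da = x0, 0.0
    loss, grad = 0.0, 0.0
    for target in targets:
        # Product rule: d(a * state)/da = state + a * d(state)/da.
        # This is where the gradient flows through every earlier step.
        dstate_da = state + a * dstate_da
        state = a * state
        err = state - target
        loss += err ** 2
        grad += 2.0 * err * dstate_da
    return loss, grad


def fit(x0, targets, a=0.5, lr=0.01, steps=500):
    """Plain gradient descent on the rollout loss (toy settings, not tuned)."""
    for _ in range(steps):
        _, grad = rollout_loss_and_grad(a, x0, targets)
        a -= lr * grad
    return a


# Synthetic "truth": four steps of the same toy dynamics with a = 0.9.
true_a = 0.9
x0 = 1.0
targets = [x0 * true_a ** k for k in range(1, 5)]

fitted_a = fit(x0, targets)
final_loss, _ = rollout_loss_and_grad(fitted_a, x0, targets)
print(f"fitted a = {fitted_a:.4f}, rollout loss = {final_loss:.2e}")
```

In a real system, the toy update would be a neural network and the hand-written derivative would be handled by automatic differentiation, but the structure of the loop stays the same.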
6. Incorporating Non-Meteorological Data:
- Problem: While machine learning can help incorporate non-meteorological data (like social and economic factors) into weather models for more holistic predictions, integrating these diverse datasets accurately remains a challenge.
- Example: Predicting the impact of a severe storm on a city requires considering not only meteorological factors but also the city's infrastructure, population density, and emergency response capabilities.
- Solution: Interact and consult with local communities and decision-makers, and help build resilient infrastructure.
7. Communication of Uncertainty:
- Problem: Weather forecasts inherently involve uncertainty. Effectively communicating this uncertainty to the public and decision-makers is a significant challenge to ensure people make informed decisions.
- Example: When a weather model predicts a 60% chance of rain, it's crucial for the public to understand what this means and how to interpret the uncertainty.
- Solution: Use ensemble forecasts to capture different weather scenarios and communicate their meaning adequately.
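As a rough sketch of how an ensemble turns into a number like "60% chance of rain": run the same model many times from slightly perturbed starting points and count how many members produce the event. The code below assumes a stand-in "forecast model" (the chaotic logistic map), a made-up perturbation size, and a made-up event threshold. None of this is how an operational centre does it; it only shows the mechanics.

```python
import random


def toy_forecast(x0, steps, r=3.9):
    """Stand-in for a forecast model: iterate the chaotic logistic map."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def ensemble_forecast(x0, steps, n_members=50, spread=1e-3, seed=0):
    """Run the toy model from many slightly perturbed initial conditions."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        # Perturb the initial state and keep it inside the valid range [0, 1].
        perturbed = min(max(x0 + rng.gauss(0.0, spread), 0.0), 1.0)
        members.append(toy_forecast(perturbed, steps))
    return members


def event_probability(members, threshold):
    """Fraction of ensemble members in which the event (value > threshold) occurs."""
    return sum(m > threshold for m in members) / len(members)


members = ensemble_forecast(x0=0.4, steps=30)
rain_probability = event_probability(members, threshold=0.5)
print(f"{rain_probability:.0%} of members exceed the threshold")
```

Because tiny differences in the initial state grow quickly in a chaotic system, the members spread out, and the fraction above the threshold is one simple way to express the forecast uncertainty.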
8. Climate Change:
- Problem: Climate change is both changing the baseline of the atmospheric variables (such as temperature) and making extreme weather events more likely.
- Example: The model has only seen a specific set of historic weather states, and the news constantly reports new record global temperature highs that are outside of the training distribution.
- Solution: Automatic retraining, continuous monitoring of model performance, and possibly building models that capture the underlying physical connections, which would make the model resilient to some changes.
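One small piece of the continuous monitoring could be a check that flags inputs the model has never seen anything like. The sketch below is an assumption-heavy illustration: the temperature values are invented, and the combination of a range check with a z-score threshold of 3 is just one simple heuristic, not an established standard for weather models.

```python
from statistics import mean, stdev


def fit_reference(training_values):
    """Summarise the training data so new inputs can be compared against it."""
    return {
        "mean": mean(training_values),
        "std": stdev(training_values),
        "min": min(training_values),
        "max": max(training_values),
    }


def is_out_of_distribution(value, ref, z_threshold=3.0):
    """Flag values outside the training range or far from the training mean."""
    outside_range = value < ref["min"] or value > ref["max"]
    z_score = abs(value - ref["mean"]) / ref["std"]
    return outside_range or z_score > z_threshold


# Invented global mean temperatures in degrees Celsius, for illustration only.
training_temperatures = [14.1, 14.3, 14.2, 14.5, 14.4, 14.6]
reference = fit_reference(training_temperatures)

for new_value in (14.35, 15.2):
    flagged = is_out_of_distribution(new_value, reference)
    print(f"{new_value} °C out of distribution: {flagged}")
```

A flag like this would not fix the forecast, but it could trigger closer evaluation or the automatic retraining mentioned above.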
Despite these challenges, machine learning continues to advance weather forecasting capabilities. Researchers and meteorologists are working to address these issues through improved data collection, more sophisticated modelling techniques, and enhanced computing resources. While perfect weather prediction may remain elusive, ongoing progress in machine learning offers the promise of increasingly accurate and timely forecasts.
This week, CrowJane mentioned the difficulties climate change poses for machine learning weather forecasting models. Philipp Birken made a great point about the importance of proper evaluation to make sure these models generalise from their training data, especially since weather data is security-critical.
Data Stories
Back in middle school, when we learned about different cities in the US, one thing stuck out.
Many US cities have a grid layout.
It permeates culture: US movies and sitcoms will often reference the "corner of 5th and 16th Street". The only named street seems to be Wall Street.
That's not surprising since many movies are set in New York, Los Angeles, Seattle or Washington. These cities are laid out at 90° anywhere you go.
But it's refreshing to see the utter chaos that is Boston.
Those polar plots show how often the streets in each area are aligned with each compass direction, and the contrast between the grid cities and Boston is easy to see.
The organic shape of Boston is also what we see in many old European cities. (The reason I got my first smartphone. I would always get lost...)
What does your city look like?
Check the publication in the source to see the biggest cities worldwide!
Source: Geoff Boeing (check out figures 4 and 5 too!)
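If you want a feel for how such a polar plot comes together, here is a rough sketch of the counting step. It is not the method from the publication: it assumes streets are given as straight segments on a flat x/y plane (x pointing east, y pointing north), uses made-up coordinates, and bins the compass bearings into 36 bins of 10° centred on the compass directions. Each street is counted in both directions, since you can travel it either way.

```python
import math


def bearing_degrees(start, end):
    """Compass bearing from start to end on a flat plane (0 = north, 90 = east)."""
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def orientation_histogram(segments, n_bins=36):
    """Count street bearings per bin, with bins centred on 0°, 10°, 20°, ..."""
    counts = [0] * n_bins
    bin_width = 360.0 / n_bins
    for start, end in segments:
        forward = bearing_degrees(start, end)
        # Count the street in both directions of travel.
        for bearing in (forward, (forward + 180.0) % 360.0):
            shifted = (bearing + bin_width / 2.0) % 360.0
            counts[int(shifted // bin_width) % n_bins] += 1
    return counts


# A made-up single city block: two north-south and two east-west streets.
grid_city = [
    ((0, 0), (0, 1)),
    ((0, 0), (1, 0)),
    ((1, 0), (1, 1)),
    ((0, 1), (1, 1)),
]
counts = orientation_histogram(grid_city)
busy_bins = {i * 10: c for i, c in enumerate(counts) if c > 0}
print(busy_bins)
```

A perfect grid only fills the four bins at 0°, 90°, 180° and 270°, which is the cross shape you see for the grid cities. A city like Boston would spread its counts over many bins.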
Question of the Week
- What are different ways to build ensemble weather forecasts with ML?
Post them on Mastodon and tag me. I'd love to see what you come up with. Then, I can include them in the next issue!
Job Corner
The deadline is tomorrow!
The ECMWF is hiring 5 people who touch machine learning right now!
Four positions in the core team to develop a data-driven machine learning weather forecasting system:
- Machine Learning Engineer: Focus on model optimisation and parallel implementations to train large machine learning models on vast datasets. Prior experience with deep learning frameworks, model optimisation, and memory footprint improvements is essential. Background in earth-system modelling is welcomed.
- Observations and Data Assimilation Expert: Interface observations with machine learning algorithms and play a vital role in data assimilation. Exceptional interpersonal skills and expertise in using earth-system observations are highly valuable for this role.
- Machine Learning Scientist for Learning from Observations: Contribute to making future earth system predictions from observation data using deep learning frameworks. Experience in earth-system observation data and data assimilation approaches is desirable.
- Machine Learning Scientist for Precipitation: Specialise in accurate precipitation predictions with generative machine learning models. Experience with GANs, VAEs, or Diffusion approaches is advantageous, along with expertise in using neural networks for precipitation prediction.
And one on the EU project Destination Earth:
You will leverage cutting-edge machine learning techniques and statistical methods to support uncertainty quantification for weather-induced extremes in the revolutionary Destination Earth (DestinE) Digital Twin. Your work will contribute to more accurate and reliable predictions, shaping the future of weather forecasting and its impact on climate understanding and resilience. If you're a proactive and talented individual with a passion for Earth System Science and a flair for machine learning, apply now and make a meaningful difference in tackling climate challenges.
(I like to stress that these positions are, as always, written by a committee, so if you identify as part of an under-represented minority, please consider applying, even if you don't hit every single bullet point.)
Tidbits from the Web
- A complete history of the YouTube algorithm
- How oddly satisfying golf ball pyramids are stacked
- How to start many computers in a single $100 computer
Jesper Dramsch is the creator of PythonDeadlin.es, ML.recipes, data-science-gui.de and the Latent Space Community.
I laid out my ethics including my stance on sponsorships, in case you're interested!