🍝 Do Noodles ever get Impasta-syndrome?
Exploring the math of deep learning, new GitHub code search, Python deadlines, and integrating domain-specific knowledge in ML.
Late to the Party 🎉 is about insights into real-world AI without the hype.
Hello internet,
What a nice weekend! I had the first ice cream cup of the season yesterday!
In this issue, we’ll look at the math of deep learning, the new GitHub code search, and vector retrieval. We also have a bunch of new Python deadlines. And finally, we’ll talk about domain knowledge in ML.
Let’s read about some machine learning!
The Latest Fashion
- The mathematical engineering of deep learning book
- This is the technology behind GitHub’s code search
- Here’s a paper on the foundations of vector retrieval
Worried these links might be sponsored? Fret no more. They’re all organic, as per my ethics.
Hot off the Press
In Case You Missed It
My task list sequence in Notion was probably linked somewhere, as it has seen some extra attention!
On Socials
This week, I shared torchdim, which still seems like a neat idea to me! I think it has made its way into vanilla PyTorch here.
Python Deadlines
I found new deadlines for the Swiss Python Summit, PyData Amsterdam, Python Sul (Brazil), and PyData Vermont. Also, FlaskCon is open!
PyCon Estonia and BelPy are closing soon.
Machine Learning Insights
Last week, I asked, “How would you integrate domain-specific knowledge into a machine learning model in fields like geology or meteorology?” Here’s the gist of it:
Integrating domain-specific knowledge into a machine learning model, especially in geology or meteorology, can significantly improve the model’s performance and its ability to generalize from the data. In some cases, you can’t make the model work at all without it! Here are some strategies:
Feature Engineering
Domain knowledge can be used to create more informative features that a machine learning model might not be able to learn on its own. For instance:
- Geology: Features might include specific rock formations, mineral content, or erosion patterns identified through domain expertise.
- Meteorology: Features could include weather patterns, historical climate data, or specific atmospheric conditions, such as weather regimes or micro-climates.
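To make the meteorology bullet concrete, here is a minimal sketch that turns raw station readings into physically meaningful features: dew point (using one common parameterization of the Magnus formula), the temperature/dew-point spread, wind split into u/v components, and a cyclical day-of-year encoding. The input field names (`temp_c`, `rh_pct`, `wind_speed`, `wind_dir_deg`, `day_of_year`) are hypothetical, so adapt them to your own data.

```python
import math

# Magnus-formula coefficients (one commonly quoted parameterization, roughly -45 to 60 °C)
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # °C


def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Approximate dew point (°C) from air temperature (°C) and relative humidity (%)."""
    gamma = math.log(rel_humidity_pct / 100.0) + MAGNUS_B * temp_c / (MAGNUS_C + temp_c)
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def wind_components(speed: float, direction_deg: float) -> tuple[float, float]:
    """Split wind into eastward (u) and northward (v) components.

    Meteorological convention: the direction is where the wind blows FROM,
    so a westerly wind (270°) has a positive u component.
    """
    theta = math.radians(direction_deg)
    return -speed * math.sin(theta), -speed * math.cos(theta)


def seasonal_encoding(day_of_year: int, days_in_year: float = 365.25) -> tuple[float, float]:
    """Place the day of year on a circle so 31 December and 1 January end up close together."""
    angle = 2.0 * math.pi * day_of_year / days_in_year
    return math.sin(angle), math.cos(angle)


def engineer_features(obs: dict) -> dict:
    """Turn one raw station record (hypothetical field names) into model features."""
    dew_point = dew_point_c(obs["temp_c"], obs["rh_pct"])
    u, v = wind_components(obs["wind_speed"], obs["wind_dir_deg"])
    doy_sin, doy_cos = seasonal_encoding(obs["day_of_year"])
    return {
        "temp_c": obs["temp_c"],
        "dew_point_c": dew_point,
        # A small temperature/dew-point spread indicates nearly saturated air (fog, low cloud)
        "dewpoint_depression": obs["temp_c"] - dew_point,
        "wind_u": u,
        "wind_v": v,
        "doy_sin": doy_sin,
        "doy_cos": doy_cos,
    }


features = engineer_features(
    {"temp_c": 20.0, "rh_pct": 50.0, "wind_speed": 10.0, "wind_dir_deg": 270.0, "day_of_year": 172}
)
print(features)
```

None of these features is something a model couldn’t learn in principle, but handing them over directly saves it from rediscovering thermodynamics and circular geometry from limited data.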
Data Augmentation
Incorporating domain-specific transformations or synthetic data generation based on expert knowledge can enhance the diversity and volume of training data:
- Geology: Simulating various geological processes under different conditions to create more training data.
- Meteorology: Generating weather patterns under unobserved but possible scenarios based on climate models.
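As one way to read the geology bullet in code, the sketch below generates labeled synthetic seismic data with the classic 1D convolutional model: a random layered impedance log is turned into reflection coefficients and convolved with a Ricker wavelet. The impedance range, wavelet frequency, and sampling interval are illustrative assumptions, not values calibrated to any real survey.

```python
import math
import random


def ricker(freq_hz: float, dt_s: float, length_s: float = 0.128) -> list[float]:
    """Zero-phase Ricker wavelet, a standard source pulse in seismic modeling."""
    half = round(length_s / (2.0 * dt_s))
    wavelet = []
    for i in range(-half, half + 1):
        a = (math.pi * freq_hz * i * dt_s) ** 2
        wavelet.append((1.0 - 2.0 * a) * math.exp(-a))
    return wavelet


def reflectivity(impedance: list[float]) -> list[float]:
    """Normal-incidence reflection coefficients between consecutive samples."""
    return [(z2 - z1) / (z2 + z1) for z1, z2 in zip(impedance[:-1], impedance[1:])]


def convolve_same(signal: list[float], kernel: list[float]) -> list[float]:
    """Convolution with an odd-length, centered kernel, trimmed to len(signal)."""
    half = len(kernel) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = n - (k - half)
            if 0 <= idx < len(signal):
                acc += signal[idx] * weight
        out.append(acc)
    return out


def synthetic_trace(n_layers: int, samples_per_layer: int, freq_hz: float,
                    dt_s: float, rng: random.Random) -> tuple[list[float], list[float]]:
    """Random layered earth -> (impedance log, synthetic trace) via the convolutional model.

    The impedance range is an illustrative assumption, not calibrated to any real basin.
    """
    layer_values = [rng.uniform(3e6, 1.2e7) for _ in range(n_layers)]
    impedance = [z for z in layer_values for _ in range(samples_per_layer)]
    trace = convolve_same(reflectivity(impedance), ricker(freq_hz, dt_s))
    return impedance, trace


rng = random.Random(42)
# 50 labeled training pairs: the trace is the input, the impedance log the target
dataset = [synthetic_trace(8, 25, 30.0, 0.002, rng) for _ in range(50)]
```

Because every synthetic trace comes with its true impedance, this kind of data can pre-train or augment a model before it ever sees scarce, expensively labeled field data.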
Pre-trained Models & Transfer Learning
Using models pre-trained on domain-relevant data can provide a good starting point. For example:
- Geology: Using models pre-trained on similar types of rock or terrain data.
- Meteorology: Employing models initially trained on large climatological datasets.
Models from broader earth science applications can also be transferred to specific tasks in geology or meteorology by adjusting the final layers or parameters to fit the specific domain.
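The “adjust only the final layers” idea can be sketched without any framework. Below, `FrozenBackbone` plays the role of a hypothetical pre-trained network (its weights are random placeholders, not a real pre-trained model) and `LinearHead` is a new output layer fitted with plain SGD on a small, synthetic target-domain dataset. In practice you would do the same with a deep learning framework by freezing the backbone’s parameters.

```python
import math
import random


class FrozenBackbone:
    """Stand-in for a network pre-trained on a broader earth-science dataset.

    The weights are random placeholders for a hypothetical pre-trained model;
    what matters is that fine-tuning never updates them.
    """

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random):
        scale = 1.0 / math.sqrt(n_in)
        self.weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_hidden)]

    def features(self, x: list[float]) -> list[float]:
        # One tanh hidden layer acting as a fixed feature extractor
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.weights]


class LinearHead:
    """New task-specific output layer; the only trainable part."""

    def __init__(self, n_hidden: int):
        self.w = [0.0] * n_hidden
        self.b = 0.0

    def predict(self, h: list[float]) -> float:
        return sum(wi * hi for wi, hi in zip(self.w, h)) + self.b


def mse(backbone: FrozenBackbone, head: LinearHead, data) -> float:
    return sum((head.predict(backbone.features(x)) - y) ** 2 for x, y in data) / len(data)


def fine_tune(backbone: FrozenBackbone, head: LinearHead, data,
              lr: float = 0.05, epochs: int = 200) -> LinearHead:
    """Fit only the head with plain SGD on squared error; the backbone stays frozen."""
    cached = [(backbone.features(x), y) for x, y in data]  # backbone is fixed, so compute once
    for _ in range(epochs):
        for h, y in cached:
            err = head.predict(h) - y
            head.w = [wi - lr * err * hi for wi, hi in zip(head.w, h)]
            head.b -= lr * err
    return head


rng = random.Random(0)
backbone = FrozenBackbone(n_in=3, n_hidden=16, rng=rng)
# Small target-domain dataset (synthetic, for illustration only)
xs = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(40)]
data = [(x, 0.5 * sum(x)) for x in xs]
before = mse(backbone, LinearHead(16), data)
head = fine_tune(backbone, LinearHead(16), data)
after = mse(backbone, head, data)
print(f"MSE before fine-tuning: {before:.4f}, after: {after:.4f}")
```

Training only the head keeps the number of free parameters small, which is the main reason this works with the modest labeled datasets typical of specialized geoscience tasks.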
Physics-informed Machine Learning
Integrating physical laws directly into the learning algorithms:
- Geology: Ensuring that model predictions for seismic activity adhere to known geological constraints.
- Meteorology: Incorporating atmospheric dynamics and fluid mechanics into weather prediction models to respect conservation laws and physical dynamics.
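A toy, linear-in-parameters version of the physics-informed loss used by physics-informed neural networks (PINNs) shows the mechanism: the loss combines a data-misfit term with the residual of a governing equation evaluated at “collocation” points where there are no observations. Here the simple decay law du/dt = -k u stands in for the far richer atmospheric or geological equations, and a polynomial replaces the neural network so the minimizer can be found by least squares. The degree, weight `lam`, and collocation grid are illustrative assumptions.

```python
import math


def basis(t: float, degree: int) -> list[float]:
    """Monomials t^0 .. t^degree (the model is u(t) = sum_j c_j t^j)."""
    return [t ** j for j in range(degree + 1)]


def dbasis(t: float, degree: int) -> list[float]:
    """Time derivatives of the monomials: d/dt t^j = j t^(j-1)."""
    return [j * t ** (j - 1) if j > 0 else 0.0 for j in range(degree + 1)]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_physics_informed(t_obs, y_obs, k, t_colloc, lam=1.0, degree=6):
    """Minimize  sum (u(t_i) - y_i)^2  +  lam * sum (u'(tau) + k u(tau))^2.

    The second term is the residual of du/dt = -k u at the collocation points.
    """
    rows, rhs = [], []
    for t, y in zip(t_obs, y_obs):  # data-misfit rows
        rows.append(basis(t, degree))
        rhs.append(y)
    w = math.sqrt(lam)
    for tau in t_colloc:  # physics-residual rows, target value 0
        rows.append([w * (d + k * p) for d, p in zip(dbasis(tau, degree), basis(tau, degree))])
        rhs.append(0.0)
    n = degree + 1
    # Normal equations (A^T A) c = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(n)]
    coeffs = solve(ata, atb)
    return lambda t: sum(c * p for c, p in zip(coeffs, basis(t, degree)))


k = 1.0
t_obs = [0.0, 0.1, 0.2, 0.3]  # observations only at early times
y_obs = [math.exp(-k * t) for t in t_obs]
t_colloc = [0.1 * i for i in range(21)]  # physics enforced on [0, 2]
u = fit_physics_informed(t_obs, y_obs, k, t_colloc)
print(u(1.5), math.exp(-1.5))
```

With data only up to t = 0.3, the physics term is what keeps the model sensible out to t = 2; a data-only fit of the same polynomial would be underdetermined there.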
Hybrid Models
Combining traditional numerical models with machine learning:
- Geology: Using machine learning techniques to parameterize components of geological models, such as sediment transport in hydrology models.
- Meteorology: Machine learning models could predict parameters in weather forecasting models, improving the overall predictions of complex systems like hurricanes.
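A hybrid model keeps the numerical core and lets a data-driven component supply only the term the physics can’t pin down. The sketch below assumes a hypothetical hydrology-style “bucket” model: an explicit-Euler water balance dS/dt = P(t) - Q(S), where the outflow closure Q(S) is learned from observed storage/outflow pairs. A log-log least-squares fit stands in for whatever ML regressor you would actually use, and `true_outflow` is invented purely to generate illustrative observations.

```python
import math


def true_outflow(storage: float) -> float:
    # "Unknown" physics, used only to generate illustrative observations
    return 0.1 * storage ** 1.5


def fit_power_law(storage: list[float], outflow: list[float]):
    """Least-squares fit of log Q = log a + n log S; a stand-in for any ML regressor."""
    xs = [math.log(s) for s in storage]
    ys = [math.log(q) for q in outflow]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    n = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    a = math.exp(y_mean - n * x_mean)
    return lambda s: a * s ** n


def run_bucket_model(s0: float, rainfall: list[float], outflow_fn, dt: float = 1.0) -> list[float]:
    """Numerical core: explicit-Euler water balance dS/dt = P(t) - Q(S).

    Only the closure Q(S) is learned; the balance itself comes from physics.
    """
    storage = [s0]
    for p in rainfall:
        s = storage[-1]
        # Clamp at zero: a bucket cannot hold negative water (this would break
        # exact conservation in any step where it triggers)
        storage.append(max(s + dt * (p - outflow_fn(s)), 0.0))
    return storage


# "Observed" storage/outflow pairs from a gauged period
obs_s = [float(s) for s in range(1, 11)]
learned_q = fit_power_law(obs_s, [true_outflow(s) for s in obs_s])

rainfall = [2.0] * 10 + [0.0] * 20
hybrid = run_bucket_model(5.0, rainfall, learned_q)
reference = run_bucket_model(5.0, rainfall, true_outflow)
```

The design choice is that mass conservation lives in the time-stepping scheme, not in the learned part, so even an imperfect closure cannot create or destroy water outside the clamped steps.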
Got this from a friend? Subscribe here!
Question of the Week
- What’s the potential of deep learning in enhancing real-time response to natural disasters?
Post your answers on Mastodon and tag me. I’d love to see what you come up with. Then, I can include them in the next issue!
Jesper Dramsch is the creator of PythonDeadlin.es, ML.recipes, data-science-gui.de and the Latent Space Community.
I laid out my ethics including my stance on sponsorships, in case you're interested!