Data Science and Machine Learning Resources

A collection of useful links to resources and tools.
data science
Author

Allen Kamp

Published

September 25, 2022

This is a living document and will continue to be updated. Last updated 25-October-2022.

Free Online Classes & Learning Resources

One of the great things about the fields of data science and machine learning is the number of free courses from industry leaders.

fastai


Books, Authors & Publishers

  • Green Tea Press - Allen Downey provides a collection of his books for free. Titles include Elements of Data Science, Think Stats, Think Bayes, Think Python and more. Many examples in Python with supporting code.

  • An Introduction to Statistical Learning - A foundational text that covers many methods for statistical learning with computers, such as Linear Regression, Classification, Sampling, Tree-Based methods, Unsupervised Learning/Clustering & Deep Learning. There is also a supporting course with videos.

  • Handbook of Regression Modeling in People Analytics by Keith McNulty - Examples in R & Python. The web version is free, and the book can also be purchased.

  • Packt - Packt produce a wide variety of accessible books on technical topics. Like anything, the quality can vary by author. There is a 7-day free trial that gives access to all their books. If you generally buy more than two Packt books a year, consider a yearly subscription: it is better value, you can access all the books, and you can also select some to keep in electronic form.

  • Manning MEAP - The Manning MEAP program gives early access to books as they are being written, and you can purchase them at a discount price. This is an excellent way to learn new technologies and software, often before there are any mainstream resources available.

  • James D. McCaffrey - Blog. A researcher/developer at Microsoft and regular contributor of machine learning columns to MSDN and Visual Studio Magazine. Many great ideas & snippets across classical machine learning, statistics, and deep learning. Writes in C#, JavaScript and Python. Like a box of chocolates.

  • Weights & Biases - Articles - Weights & Biases (wandb) is a fantastic service for tracking your machine learning project runs and generating reports (free for individuals). They are also one of the best education sources for machine learning. See also Weights & Biases - Videos.

  • R for Data Science - The starting point for learning R and Data Science. Co-authored by Hadley Wickham.


ML News & Interviews

  • ML News. Regular news style updates. A simple way to keep up with current happenings in ML.

  • Machine Learning Street Talk. In-depth topic interviews.

  • Lex Fridman Interviews - Lex interviews the great minds of machine learning and artificial intelligence on his video podcast.

  • Two Minute Papers - Catch the latest papers in video, summarised in two minutes.

  • Microsoft Research Podcast - Interviews with Microsoft researchers across diverse research areas. What is their current focus area, and what problems are they trying to solve?


Tools & Libraries

The choices have varied over the years, but they currently fall into two camps: TensorFlow and PyTorch. In the end you will probably learn both, but currently more new papers use PyTorch than TensorFlow. It is not too hard to move from one to the other.

  • TensorFlow - Ease of use has much improved since V2, which uses Keras as the primary API. If you want to run ML in the browser via JavaScript, TensorFlow with TensorFlow.js is a better option than PyTorch. The Keras API can be used with multiple backends, not just TensorFlow, so there may be future systems that will use it.

  • PyTorch - Whilst Google was still inflicting dreadful programming interfaces with TensorFlow V1, much of the research world moved to PyTorch. It is flexible and extensible, and well supported with blogs and books.

  • FastAI - The fastai library extends PyTorch with a layered API, providing SOTA defaults and a best-practice training loop for increased productivity.

  • ML.Net - ML.NET provides AutoML tools and an ML pipeline for use on .NET platforms. It has promise, but documentation & examples are limited and often broken.

  • FLAX - Flax is a high-performance neural network library and ecosystem for JAX that is designed for flexibility. The new cool kid on the block.

  • PyTorch Image Models (timm) - timm is a deep-learning library created by Ross Wightman. It is a collection of SOTA computer vision models, layers, utilities, optimizers, schedulers, data-loaders and augmentations, plus training/validation scripts with the ability to reproduce ImageNet training results.

  • Hugging Face Spaces - Share your (Python) ML model applications in a few minutes. Spaces are a simple way to host ML demo apps directly on your profile or your organization’s profile. This allows you to create your ML portfolio, showcase your projects at conferences or to stakeholders, and work collaboratively with other people in the ML ecosystem.

  • Shiny - Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage or embed them in R Markdown documents or build dashboards. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions.

  • TorchSharp - TorchSharp is a .NET wrapper library that provides access to the library that powers PyTorch, allowing C# and F# to use PyTorch more easily on Windows & Linux. It is part of the .NET Foundation. See also the Examples repo.


Papers

Finding and reading papers is a key skill to develop, particularly in Deep Learning, where you may have to implement a technique yourself because it may not make it into your favourite tools for a while.

  • Papers With Code - I’m a firm believer that results in papers should be replicable. Papers With Code is an excellent resource for machine learning papers that have provided code. It also shows the SOTA (state of the art) results for papers and related benchmark datasets.

  • Semantic Scholar - An excellent free AI-powered research tool for scientific literature. Results include citations (and the egotistical H-Index).

  • W&B Paper Reading Group - Aman Arora hosts this Deep Learning paper reading group for beginners. Step-by-step video walk-throughs of key papers covering ML architectures such as ResNets, DETR, Squeeze & Excitation Nets etc. A great place to begin learning through reading papers.

  • Two Minute Papers - Catch the latest papers in video, summarised in two minutes.

  • NLP Progress - Tracks the state of the art (SOTA) for numerous Natural Language Processing tasks.

  • Trending Papers - Displays trending ML papers in a recent, weekly or monthly feed view. Also has search.

  • Annotated Papers - A collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations, and the website renders these as side-by-side formatted notes. We believe these would help you understand these algorithms better.
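
The H-Index mentioned in the Semantic Scholar entry above is simple to compute by hand: an author has an index of h if h of their papers have at least h citations each. A minimal Python sketch of that definition follows. The function name `h_index` and the citation counts are made up for illustration and are not part of any Semantic Scholar API.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for one author's five papers.
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))
```

Because the score is capped by the number of papers, one very highly cited paper does not raise it on its own, which is worth keeping in mind when comparing authors.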


Blogging Tools

  • Jekyll Cheatsheet - Options for use with Jekyll static sites. This cheat sheet serves as a quick reference of everything Jekyll can do.
  • FastPages - Deprecated; use Quarto instead (see the FastPages to Quarto Migration Guide). FastPages turns your Jupyter Notebooks, Word and Markdown documents into blog posts. It automates the process of creating blog posts via GitHub Actions, so you don’t have to fuss with conversion scripts.
  • Quarto - Quarto® is an open-source scientific and technical publishing system built on Pandoc
    • Create dynamic content with Python, R, Julia, and Observable.
    • Author documents as plain text markdown or Jupyter notebooks.
    • Publish high-quality articles, reports, presentations, websites, blogs, and books in HTML, PDF, MS Word, ePub, and more.
    • Author with scientific markdown, including equations, citations, crossrefs, figure panels, callouts, advanced layout, and more.
  • Markdown Guide for Jupyter - A guide to the Markdown format, so you can get the most from your Jupyter Notebook writing.