Since the inception of modern web browsers, saving and finding interesting pages again has always been a challenge. There are local and online bookmark managers, "Read Later" applications, personal notebooks, or simply your browser history. With the advent of the social web, it has become even more complex: back when Twitter was a thing, you could star interesting tweets; on Mastodon (or Bluesky, or Threads), you can like or bookmark your favorites.
I created star-collector, which takes the recent bookmarks from my bookmark manager (I use Linkding) and the favorites from my Mastodon account and creates an RSS feed that can be consumed by any RSS client. I take the feed and render it on the Favorites page on this site.
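Conceptually, the script does little more than call the two APIs and hand the merged results to a feed library. Below is a minimal sketch of that idea - not the actual star-collector code; the endpoints and field names follow the public Mastodon and Linkding APIs as I understand them, and feedgen stands in for whatever feed library the project really uses:

import requests
from feedgen.feed import FeedGenerator

MASTODON_URL = "https://mastodon.example"   # assumption: your Mastodon instance
LINKDING_URL = "https://linkding.example"   # assumption: your Linkding instance


def mastodon_favourites(token):
    # statuses you favourited, newest first
    r = requests.get(f"{MASTODON_URL}/api/v1/favourites",
                     headers={"Authorization": f"Bearer {token}"})
    r.raise_for_status()
    # crude placeholder title; the title generation described below replaces this
    return [{"title": s["content"][:80], "link": s["url"]} for s in r.json()]


def linkding_bookmarks(token):
    # your saved bookmarks, newest first
    r = requests.get(f"{LINKDING_URL}/api/bookmarks/",
                     headers={"Authorization": f"Token {token}"})
    r.raise_for_status()
    return [{"title": b["title"] or b["url"], "link": b["url"]}
            for b in r.json()["results"]]


def write_feed(items, path="feed.xml"):
    fg = FeedGenerator()
    fg.title("My favorites")
    fg.link(href="https://example.com/favorites/", rel="alternate")
    fg.description("Recent bookmarks and Mastodon favourites")
    for item in items:
        entry = fg.add_entry()
        entry.title(item["title"])
        entry.link(href=item["link"])
    fg.rss_file(path)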
Behind the Scenes: Automatically Publish with GitHub Actions
The source code is available on GitHub: star-collector. The feed generation process runs automatically every day at midnight using GitHub Actions. The workflow checks out the repository, sets up Python, and runs the feed generator with the configured settings. The generated RSS feed is then pushed to my own web host via FTP (not GitHub Pages, because I already have a host and thought, why not). This automation ensures that the feed stays current with minimal maintenance and that any new favorites from my bookmark manager or my Mastodon favorites appear the next day.
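I won't reproduce the exact workflow file here, but a scheduled workflow of roughly this shape would do the job (the step names, the generate_feed.py entry point, the secret names, and the curl-based FTP upload are my assumptions for illustration, not the actual configuration):

name: Generate favorites feed
on:
  schedule:
    - cron: "0 0 * * *"    # every day at midnight (UTC)
  workflow_dispatch:        # allow manual runs

jobs:
  build-and-publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Generate the RSS feed
        run: |
          pip install -r requirements.txt
          python generate_feed.py    # assumption: the real entry point may differ
      - name: Publish via FTP
        run: |
          curl -T feed.xml --user "${{ secrets.FTP_USER }}:${{ secrets.FTP_PASSWORD }}" \
            "ftp://${{ secrets.FTP_HOST }}/"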
Title Generation with AI
One interesting challenge was handling favorites from Mastodon, where the text is more than a few words: What title should I use? A simple answer would be to trim the message and append "[...]" at the end of the title text. But aren't transformers (introduced in 2017 by this famous paper) ideal for this task?
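For reference, that naive fallback is essentially a one-liner, something along these lines:

def naive_title(text, limit=60):
    # crude baseline: cut the post after `limit` characters and mark the truncation
    return text if len(text) <= limit else text[:limit].rstrip() + " [...]"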
Nowadays, it is surprisingly easy to take advantage of this technology and run it locally: I use Hugging Face with its pre-trained models and the transformers library to automatically generate titles from post content. Running and testing models takes only a few lines of code: save the following to extract.py:
import sys
from transformers import pipeline, AutoTokenizer


def extract_title(text):
    # set up model and pipeline
    MODEL = "Ateeqq/news-title-generator"
    pipe = pipeline("summarization", model=MODEL)

    # generate the summary and use it as the title
    result = pipe(text, max_length=40)
    title = result[0]['summary_text'].replace("\n", " ")
    return title


if __name__ == "__main__":
    print(extract_title(" ".join(sys.stdin)))
If you have uv installed, you can run the following one-liner to obtain a title from your input text, no other steps needed:
echo "Your long text goes here [....]" | \
uv run --with 'git+https://github.com/huggingface/transformers.git' \
--with torch --no-project extract.py
Note that uv takes care of installing the necessary Python dependencies (including the latest git version of transformers), and transformers handles downloading the 857 MB model when you run the command for the first time - hat tip to Simon Willison for this trick.
And here is a tongue-in-cheek, "sloppy" example - using llm to produce the input text and then running it through the summarization transformer above:
$ llm "Give me an example of an 80 word short message about Switzerland, no hashtags" | tee example.txt
Switzerland is a breathtaking blend of stunning landscapes, charming villages, and vibrant cities. Nestled in the heart of the Alps, it offers majestic mountains, pristine lakes, and lush valleys, making it a paradise for nature lovers and outdoor enthusiasts. Famous for its rich history of neutrality and diplomacy, Switzerland boasts a unique cultural diversity with four official languages. Don't miss the delicious Swiss chocolate and cheese, along with the opportunity to explore picturesque towns like Lucerne and Zermatt.
$ cat example.txt | uv run --with 'git+https://github.com/huggingface/transformers.git' --with torch --no-project extract.py
Updated https://github.com/huggingface/transformers.git (4adc415b6)
Device set to use mps:0
Switzerland is a breathtaking blend of stunning landscapes, villages, cities & towns & villages & c'ships & hotels
I haven't taken a lot of time to search for appropriate models on Hugging Face, but Ateeqq/news-title-generator seems adequate. It is based on T5 and trained on a dataset of news articles. It has a tendency to over-use acronyms and might not be the best option. While testing different models, I also tried a title generator for titles of scientific papers, and it produced more "scienc-y" titles. As a next step I'd like to train my own model, based on the recently introduced ModernBERT. At the moment, this is foolishly/blindly applying a model without any evals or quality assurance.
This is what I worry about: even if I validated and optimized the model, I still cannot guarantee that nothing stupid, awful, or offensive comes out of the transformer. But my stakes are not as high as Apple's, which recently received serious complaints from the BBC, so I guess I'm fine for now.
Feedback?
How are you managing your web favorites? Any feedback on the use of AI? Happy to get in touch with you! Also, feel free to star the repo¹.
¹ Including GitHub as a source for favorites is under consideration. The challenge is that the public API doesn't store the timestamp of when you star a repo, which would be ideal so that a newly starred repo shows up at the top of the feed.