Not every website has an RSS feed. Some never did. Some had one years ago and quietly removed it. And some sites have content that updates regularly but was never structured as a feed in the first place: job boards, product listings, event calendars, changelog pages. Until now, if a site didn’t offer RSS, you were out of luck.
Web Feeds is a new feature that creates RSS feeds from any website. Point it at a URL, and NewsBlur analyzes the page structure, identifies the repeating content patterns, and generates extraction rules that turn the page into a live feed. It works on news sites, blogs, job boards, product pages, or really anything with a list of items that changes over time.
This is a huge feature and has been requested for years. I’m so thrilled to finally be able to offer it in a way that I feel comfortable with. Other solutions including having you select story titles on a re-hosted version of the page, but it was clumsy and error-prone. This way, we use LLMs to figure out what the story titles are likely to be, present the variations to you, and then let you decide what’s right. So much better!

Open the Add + Discover Sites page and click the Web Feed tab. Paste a URL and click Analyze. NewsBlur fetches the page, strips out navigation and boilerplate, and analyzes the HTML structure. Within a few seconds, you’ll see multiple extraction variants, each representing a different content pattern found on the page.
Progress updates stream in real-time while the analysis runs. NewsBlur typically finds 3-5 different extraction patterns on a page. The first variant is usually the main content (article list, blog posts, product grid), but sometimes the page has multiple distinct sections worth subscribing to. Each variant shows a label, a description of what it captures, and a preview of 3 extracted stories so you can see exactly what you’d get.

Select the variant that matches what you want to follow, pick a folder, and subscribe. NewsBlur will re-fetch and re-extract the page on a regular schedule, just like any other feed.
Sometimes the initial best guess isn’t what you’re looking for. Maybe the page has a blog section and a job listings section, and you want the jobs. Click the Refine button and type a hint like “I’m looking for the job postings.” NewsBlur re-analyzes the page with your hint in mind and reorders the variants to prioritize what you described.
For each story, NewsBlur extracts whatever it can find: title, link, content snippet, image, author, and date. Not every field will be available on every site, and that’s fine. At minimum you’ll get titles and links. The extraction uses XPath expressions, which means it’s precise and consistent across page refreshes as long as the site’s HTML structure stays the same.
Websites redesign. HTML structures shift. When NewsBlur detects that the extraction rules have stopped working (after 3 consecutive failures), the feed is flagged as needing re-analysis. You’ll see a feed exception indicator, and you can re-analyze the page with one click to generate updated extraction rules.
Some examples of sites that work well with Web Feeds:
Web Feeds are available to Premium Archive and Premium Pro subscribers. The ongoing feed fetching and extraction runs on NewsBlur’s servers like any other feed.
If you have feedback or ideas for improvements, please share them on the NewsBlur forum.
Les Orchard:
I started programming in 1982. Every language I’ve learned since then has been a means to an end — a new way to make computers do things I wanted them to do. AI-assisted coding feels like the latest in that progression. Not a rupture, just another rung on the ladder.
But I’m trying to hold that lightly. Because the ladder itself is changing, the building it’s leaning against is changing, and I’d be lying if I said I knew exactly where it’s going.
What I do know is this: I still get the same hit of satisfaction when something I thought up and built actually works. The code got there differently than it used to, but the moment it runs and does the thing? That hasn’t changed in my over 40 years at it.
I’ve been thinking about a different divide than the one Orchard writes about here. (The obvious truth is that the AI code generation revolution is creating multiple divisions, along multiple axes.)
The divide I’m seeing is that the developers who are craftspeople are elated because their productivity is skyrocketing while their craftsmanship remains unchanged — or perhaps even improved. They’re achieving much more, much faster, than ever before. It’s a step change as great, or greater than, the transition from assembly code to higher-level programming languages. The developers who are hacks are elated because it’s like they’ve been provided an autopilot switch for a task they never enjoyed or really even understood properly in the first place. The industry is riddled with hack developers, because in the last 15-20 years, as the demand for software far outstripped the supply of programmers who wanted to write code because they love writing code and creating software, the jobs have been filled by people who got into the racket simply because they were high-paying jobs in high demand. Good programmers create software for fun, outside their jobs. Hack programmers are no more likely to write software for fun than a garbage man is to collect trash on his days off.
Orchard’s fine essay examines a philosophical divide within the ranks of talented, considerate craftsperson developers. The divide that I’m talking about has been present ever since the demand for programmers exploded, but AI code generation tooling is turning it into an expansive gulf. The best programmers are more clearly the best than ever before. The worst programmers have gone from laying a few turds a day to spewing veritable mountains of hot steaming stinky shit, while beaming with pride at their increased productivity.
This week’s big headline is: “Silicon Valley is buzzing about this new idea: AI compute as compensation“. Uh huh. [Business Insider]
The idea is that instead of getting paid dollars to work for an AI company … you get paid in AI tokens. The units that the AI vendor charges API access in. You have to use these tokens in your job, too.
This is not in any way a “new” idea. It’s company scrip — a company’s own made-up money that you can only spend in the company store. Company scrip was always just a scam, and paying workers in company scrip has been illegal in the US since 1938. But you know these guys don’t care.
Other companies would love to pay workers in AI tokens too! And not in, y’know, money.
Tech-illiterate CEOs and venture capitalists keep talking about AI tokens like they’re a commodity you can pile up. Even though tokens aren’t commensurable at all between different models.
The key point is that they love the idea of printing their own money. It’s the word “token.” They want AI tokens to be treated like crypto tokens. Something you can print out of thin air, then exchange like it’s money.
This idea was most recently floated by Thibault Sottiaux at OpenAI: [Twitter, archive]
I am increasingly asked during candidate interviews how much dedicated inference compute they will have to build with Codex.
So firstly, I don’t believe anyone’s asking that. But OpenAI president Greg Brockman retweeted Sottiaux. So this idea is the OpenAI corporate line. [Twitter, archive]
AI bros have previously promoted Universal Basic Income once the AI singularity comes — and not a moment before. Even though we could do this tomorrow — the main barrier to a reasonable welfare system is whiny billionaires who hate being taxed. Specifically, these guys.
Remember that Sam Altman is still a crypto bro, with his proof-of-eyeballs magic bean Worldcoin. Altman’s been pushing the idea of a universal basic income — or universal basic compute — made of AI tokens for a few years now. This is Altman on the All-In Podcast in May 2024, talking to his fellow billionaires: [YouTube]
I wonder if the future looks something more like Universal Basic Compute than Universal basic income, and everybody gets a slice of GPT7’s compute and they can use it, they can resell it, they can donate it to somebody to use for cancer research, but what you get is not dollars but this productivity slice, you own part of the productivity.
This wasn’t a one-off. Here’s Altman again last May, on the Theo Von podcast: [YouTube]
I mean a crazy idea, but in the spirit of crazy ideas is, if the world, there’s like eight roughly eight billion people in the world. If the world can generate eight quintillion tokens per year, if that’s the world, actually let’s say the world can generate 20 quintillion tokens per year. Tokens are like each word generated by an AI. Okay, just making up a huge number here. We’ll say 12 of those go to the normal capitalistic system, but eight of those eight quintillion tokens are going to get divided up equally among eight billion people. So everybody gets one trillion tokens and that’s your universal basic wealth globally.
Altman really likes the idea of made-up credit at OpenAI being the money now. Because he’s a crypto bro.
This token as money talk leaves me wondering if the investment in the AI companies is getting shaky. Nvidia’s just said this latest OpenAI investment round might be the last: [Reuters]
Nvidia CEO Jensen Huang said the latest investments in OpenAI and Anthropic might be the chipmaker’s last in those companies, as the AI companies prepare to go public this year.
Nvidia is about to spend $26 billion building its own open weight AI model too. [Wired]
I’m also wondering if the AI vendors are running a bit low on actual cash dollars, and not just promises and letters of intent.
The good news is that even though these bozos are all sociopaths, AI is not so useful, and more tokens for the AI aren’t so useful either. Unless you’re a terminal vibe coder and probably working at an AI vendor.
I don’t think a lot of people will accept a Copilot allowance in place of actual money. I owe my SOUL.md to the company store.
Every AI agent that fetches web content is playing Russian roulette with prompt injection. I’ve been researching this problem since early March, and I think most people building autonomous agents (like OpenClaw instances) haven’t fully internalized how bad it is. When your AI agent fetches a web page, every piece of that content flows directly into the model’s context window, and attackers can embed instructions in that content. They use hidden HTML divs, zero-width Unicode characters, fake LLM delimiters, and social engineering disguised as helpful advice. The agent can’t tell the difference between your instructions and the attacker’s, because to the model, it’s all just text. If you want a concrete example of how this plays out in the real world, the Clinejection attack earlier this year compromised roughly 4,000 developer machines through a prompt injection embedded in a GitHub issue title, which eventually led to credential theft and a malicious package publish. The AI triage bot had tool access and processed untrusted input in the same context, which is exactly the mistake this project tries to prevent.
I spent weeks digging through the academic literature, evaluating existing tools, and testing real attack patterns before writing a single line of code. Google DeepMind’s CaMeL paper proposed a rigorous P-Agent/Q-Agent architecture that’s theoretically sound but practically unusable. Their reference implementation is an abandoned research artifact that Google explicitly says they won’t fix bugs on. Lasso Security’s MCP Gateway looked promising until I discovered their prompt injection detection requires a commercial API key. Simon Willison has been writing about the Dual LLM pattern for years, and he’s right about the core insight: you need to keep untrusted content away from your privileged tools. But nobody had shipped a production-ready MCP server that actually does this, so I figured I might as well start building one myself, even if it meant getting a lot of things wrong at first.
MCP-Airlock is an open source MCP server that extracts web content through a three-layer defense system, where each layer uses a fundamentally different detection approach. I didn’t plan it as three layers from the start. It evolved that way as I kept finding attack patterns that bypassed whatever I’d already built, and I think that iterative process is actually what makes the architecture sound. Let me walk through how that evolution happened.
The first thing I built was a deterministic seven-stage sanitization pipeline, which became Layer 1. It strips hidden HTML elements, invisible Unicode characters, encoded payloads, data exfiltration URLs, and fake LLM delimiters. Since there’s no LLM involved at this stage, the whole thing runs fast and doesn’t cost anything, and for the specific structural attacks I designed it to catch, I haven’t seen it produce a single false negative yet. If someone hides instructions in a display:none div or fragments keywords with zero-width characters, this layer catches it every time.
Then I added Meta’s Llama Prompt Guard 2 as Layer 2, a 22-million parameter classifier running locally via ONNX Runtime. I almost cut this layer entirely after initial testing showed it only caught attacks that Layer 1 already handles. But after I ran systematic threshold tuning against 50 samples, I found that Prompt Guard catches five attack categories that completely bypass Layer 1’s regex patterns, things like “forget your training,” “override safety protocols,” and roleplay-based jailbreaks that use imperative behavioral commands instead of the explicit override language that regex can match. The classifier runs in about 30-80 milliseconds on CPU and costs nothing, and I’m glad I didn’t rip it out when I was tempted to.
I thought I had a pretty decent setup with the first two layers, but then I started throwing some of the more subtle, social-engineering style attacks at it and realized that both a regex parser and a trained classifier are basically blind to someone writing a convincing-sounding paragraph that tricks the agent into exfiltrating data. That’s what led to Layer 3, a quarantined Gemini Flash-Lite instance that acts as a Q-Agent. I set it up so it has no tool access at all, and on top of that it has no memory, which means there are fewer ways for an attacker to actually hijack anything important. This layer catches attacks like DAN (“Do Anything Now”) persona assignment, where the injected text defines a new AI character that has no ethical restrictions and instructs the model to respond as that character instead of following its original instructions. It also catches fake state transitions claiming policies have already been updated, and the really subtle ones where social engineering is disguised as article content that frames data exfiltration as a helpful action. Prompt Guard 2 scores these below 0.05, and Layer 1 doesn’t see them at all.
MCP-Airlock ships with tools for both cautious and paranoid use cases. The safe_fetch and safe_read tools run Layer 1 only and reject content when injection is detected, which works well for trusted domains where you want speed. The quarantine_fetch and quarantine_read tools run all three layers and warn but proceed, passing only the Q-Agent’s sanitized extraction to your primary agent. There’s also quarantine_scan for pre-flight threat assessment without returning any content, and I recently added safe_search and quarantine_search tools that pipe web searches through the same three-layer defense.
Now, I’m pretty sure there are still holes in this approach, but I think it’s a meaningful step forward. If you’re running OpenClaw instances that make web requests, I’d strongly encourage you to route them through MCP-Airlock instead of letting your agents fetch web content directly. The interface is MCP only, and it’s designed to run in a separate container from your autonomous agent, which means the trust boundary is enforced at the container level, not just by hoping the model follows instructions. Even Layer 1 by itself strips a significant amount of attack surface, and the full three-layer pipeline is a much better position to be in than raw, unfiltered web content flowing straight into your agent’s context window. It’s a small configuration change that eliminates entire categories of prompt injection.
For security researchers, the project is AGPL-licensed and the code is on GitHub. It takes about two minutes to install via pip install mcp-airlock-crunchtools or podman run quay.io/crunchtools/mcp-airlock. If you find an attack pattern that gets past all three layers, file an issue. I want to know about it and I want to fix it.