Meta (Facebook) bot user agents

Bill Statler

Mon, 09 Sep 2024 02:15:47 +0200 from Bunny of Doom

I think this is a current list, if anyone wants to block access by user agent:

FacebookExternalHit
The primary purpose of FacebookExternalHit is to crawl the content of an app or website that was shared on one of Meta’s family of apps, such as Facebook, Instagram, or Messenger.
Note that the FacebookExternalHit crawler might bypass robots.txt when performing security or integrity checks, such as checking for malware or malicious content.

Meta-ExternalAgent
The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.

Meta-ExternalFetcher
The Meta-ExternalFetcher crawler performs user-initiated fetches of individual links to support specific product functions. Because the fetch was initiated by a user, this crawler may bypass robots.txt rules.

FacebookBot
FacebookBot crawls public web pages to improve language models for Meta's speech recognition technology.

Sources:
https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/
https://developers.facebook.com/docs/sharing/bot/
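
Since two of these crawlers may bypass robots.txt, blocking on the server side by User-Agent is probably the more reliable option. A minimal Python sketch of that idea follows; the names is_meta_bot and block_meta_bots are made up for illustration, the substring matching is my assumption about how the user agent strings look in practice, and the WSGI wrapper is just one possible place to hook it in:

```python
# User agent tokens from the list above, lowercased for case-insensitive matching.
META_BOT_TOKENS = (
    "facebookexternalhit",
    "meta-externalagent",
    "meta-externalfetcher",
    "facebookbot",
)


def is_meta_bot(user_agent):
    """Return True if the User-Agent header contains one of the listed tokens."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in META_BOT_TOKENS)


def block_meta_bots(app):
    """Hypothetical WSGI wrapper: answer 403 to the listed bots, pass others through."""
    def middleware(environ, start_response):
        if is_meta_bot(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

Note that blocking FacebookExternalHit will most likely also stop link previews from being generated when someone shares your pages on Facebook, Instagram, or Messenger, so you may want to drop that token from the tuple.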

#Facebook #Meta #WebCrawling

Bill Statler

Mon, 09 Sep 2024 02:21:20 +0200 from Bunny of Doom

In addition to the above,

facebookexternalua
is the user agent used by Threads.

Haakon Meland Eriksen (Els Mussols)

Mon, 09 Sep 2024 07:37:13 +0200 from Streams at els Mussols

Thanks for the list.