暗无天日

=============>DarkSun的个人博客

如何更改url package访问HTTP时的user-agent header

有些网站会根据 http request 中的 user-agent header 的值返回不同的response,例如 http://wttr.in 会根据就会根据 user-agent 是否为 curl 来决定是返回带图片的HTML,还是字符拼接图案的文本。

一开始我以为修改 url package 中的 user-agent 就是直接把相应的 header 内容加到 url-request-extra-headers 中就行了,事实证明我还是太天真了,这样做的后果是会产生两个 user-agent header...

(let ((url-debug t)
      (url-request-extra-headers '(("User-Agent" . "curl/7.78.0"))))
  (kill-buffer (url-retrieve-synchronously "http://wttr.in"))
  (with-current-buffer "*URL-DEBUG*"
    (keep-lines "^User-Agent" (point-min) (point-max))
    (buffer-substring-no-properties (point-min) (point-max))))
User-Agent: URL/Emacs Emacs/28.0.50 (X11; x86_64-pc-linux-gnu)
User-Agent: curl/7.78.0

在翻阅了 url manual 之后才知道,原来 url 专门有个变量用来控制 user-agent:

url-user-agent is a variable defined in ‘url-vars.el’.

Its value is ‘default’

  You can customize this variable.
  This variable was introduced, or its default value was changed, in
  version 26.1 of Emacs.
  Probably introduced at or before Emacs version 25.1.

User Agent used by the URL package for HTTP/HTTPS requests.
Should be one of:
* A string (not including the "User-Agent:" prefix)
* A function of no arguments, returning a string
* ‘default’ (to compute a value according to ‘url-privacy-level’)
* nil (to omit the User-Agent header entirely)

所以修改 user-agent header 的正确方法是修改 url-user-agent 这个变量的值:

(let ((url-debug t)
      (url-user-agent "curl/7.78.0"))
  (kill-buffer (url-retrieve-synchronously "http://wttr.in"))
  (with-current-buffer "*URL-DEBUG*"
    (keep-lines "^User-Agent" (point-min) (point-max))
    (buffer-substring-no-properties (point-min) (point-max))))
User-Agent: curl/7.78.0