Wget is a command-line utility for downloading files.
The man page on my Linux distribution describes it as a “free utility for non-interactive download of files from the Web” that “supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.”
It comes in especially handy when I need to download an image from another website. I’d never had a problem with it until today, when I got a 403 Forbidden error. The edited output is shown in this code block:
wget http://www.example-site.com/image.png
--2015-01-12 19:23:14-- http://www.example-site.com/image.png
Resolving www.example-site.com (www.example-site.com)... 555.111.111.111, 555.111.113.112
Connecting to www.example-site.com (www.example-site.com)|555.111.111.111|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2015-01-12 19:23:14 ERROR 403: Forbidden.
Years ago I read somewhere that this usually indicates an Apache server that has been configured via .htaccess to deny file downloads, or something of that nature. Since the image file was something I really needed for an article, I decided to poke around the documentation and search the Internet for clues. That effort drew my attention to one of the tool’s options: the --user-agent or -U option.
Here’s what the man page has to say about it:
The HTTP protocol allows the clients to identify themselves using a “User-Agent” header field. This enables distinguishing the WWW software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as Wget/version, version being the current version number of Wget.
However, some sites have been known to impose the policy of tailoring the output according to the “User-Agent”-supplied information. While this is not such a bad idea in theory, it has been abused by servers denying information to clients other than (historically) Netscape or, more frequently, Microsoft Internet Explorer. This option allows you to change the “User-Agent” line issued by Wget. Use of this option is discouraged, unless you really know what you are doing.
Specifying empty user agent with --user-agent="" instructs Wget not to send the “User-Agent” header in HTTP requests.
With that, I decided to retry the download by specifying an empty user agent as shown in this code block:
wget --user-agent="" http://www.example-site.com/image.png
--2015-01-12 19:23:14-- http://www.example-site.com/image.png
Resolving www.example-site.com (www.example-site.com)... 555.111.111.111, 555.111.113.112
Connecting to www.example-site.com (www.example-site.com)|555.111.111.111|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39946 (39K) [image/png]
Saving to: ‘image.png’

image.png 100%[=========================>] 39.01K --.-KB/s in 0.07s

2015-01-12 19:37:58 (567 KB/s) - ‘image.png’ saved [39946/39946]
That was all it took. There may be other situations where the --user-agent or -U option fails, but in this specific case it was just what the doctor ordered.
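For anyone who runs into this often, the retry can be wrapped in a small shell function. What follows is only a sketch under a few assumptions: the function name fetch_with_fallback is made up for illustration, the browser-like string in the last step is an arbitrary example rather than anything the site in question is known to require, and the --quiet flag is there just to keep the output short.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of Wget):
# try a plain download first, then retry with progressively different
# User-Agent settings if the server refuses the request.
fetch_with_fallback() {
    url=$1

    # 1. Default request: Wget identifies itself as "Wget/version".
    wget --quiet "$url" && return 0

    # 2. Empty value: Wget sends no User-Agent header at all.
    wget --quiet --user-agent="" "$url" && return 0

    # 3. Browser-like value. This string is an illustrative assumption;
    #    some servers may expect something else entirely.
    wget --quiet --user-agent="Mozilla/5.0 (X11; Linux x86_64)" "$url"
}
```

Calling fetch_with_fallback http://www.example-site.com/image.png would then stop at the first attempt that succeeds and return Wget's own exit status if all three fail. Since the man page discourages changing the User-Agent, it seems reasonable to try the default request first and only fall back when that is refused.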