Downloading whole websites

Ever needed to download an entire website for offline viewing? I didn’t need one for myself, but a client wanted an offline version of his website. wget to the rescue, I thought!

Showstopper

wget is not included on OS X… bummer! You can, however, install it from source (or from MacPorts). Just download the newest package from http://ftp.gnu.org/gnu/wget/ (wget-1.13.tar.gz at the time of writing this post).
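Since wget isn't there yet to fetch its own tarball, curl (which does ship with OS X) can do it. Here is a minimal sketch; the function name fetch_wget_source is my own invention, and the default version is simply the one mentioned in this post, so adjust it to whatever is current.

```shell
# Hypothetical helper: download a wget source tarball with curl.
# The default version (1.13) is the one named in this post.
fetch_wget_source() {
    version="${1:-1.13}"
    # -O saves the file under its remote name in the current directory
    curl -O "http://ftp.gnu.org/gnu/wget/wget-${version}.tar.gz"
}

# Usage: fetch_wget_source        (fetches wget-1.13.tar.gz)
#        fetch_wget_source 1.14   (fetches another version, if it exists)
```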

Shell
# Unpack the source and build with OpenSSL support
tar xzf wget-1.13.tar.gz
cd wget-1.13
./configure --with-ssl=openssl
make
# Install system-wide (needs root)
sudo make install
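Before moving on, it can be worth checking that the freshly built wget is on your PATH and was compiled with SSL. The sketch below assumes that `wget --version` lists compile-time features such as `+ssl/openssl`, which is what the builds I have seen print; the function name check_wget_ssl is made up for this example.

```shell
# Hypothetical helper: report whether the installed wget lists SSL support.
# Assumes `wget --version` prints a feature list containing "+ssl".
check_wget_ssl() {
    if ! command -v wget >/dev/null 2>&1; then
        echo "wget not found on PATH"
        return 1
    fi
    # -F matches the literal string "+ssl", so a "-ssl" entry does not count
    if wget --version | grep -qF -- '+ssl'; then
        echo "wget reports SSL support"
    else
        echo "no SSL feature listed"
        return 1
    fi
}

# Usage: check_wget_ssl
```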

Now let’s try again!

wget to the rescue!

The following command downloads the entire website and converts any internal links and paths to local ones.

Shell
wget --recursive \
--no-clobber \
--page-requisites \
--convert-links \
--domains yourdomain.com \
www.yourdomain.com

What the switches do:

  • --recursive - follows links, so wget downloads more than just the one URL you specify
  • --no-clobber - doesn’t overwrite files that already exist locally
  • --page-requisites - downloads all the elements that make up a page (images, CSS, scripts, etc.)
  • --convert-links - converts links and paths so that the site can be browsed locally
  • --domains yourdomain.com - tells wget not to follow links outside the specified domain
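If you get tired of typing the long switches, each of them has a short form: -r, -nc, -p, -k and -D. Here is a small sketch that wraps the same command in a shell function; the name mirror_site is just something I picked for illustration, and it assumes (like the command above) that the site lives at www.<domain>.

```shell
# Hypothetical wrapper around the command above, using the short switches:
#   -r  = --recursive        -nc = --no-clobber
#   -p  = --page-requisites  -k  = --convert-links
#   -D  = --domains
mirror_site() {
    domain="$1"
    wget -r -nc -p -k -D "$domain" "www.$domain"
}

# Usage: mirror_site yourdomain.com
```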

If you only need a particular subdirectory, you can limit wget to that directory with the switch --no-parent. If you work on a Windows machine (why?) you may need the --restrict-file-names=windows switch. With that switch wget escapes characters that are not allowed in Windows filenames, so the downloaded site works there as well.
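Putting those two switches together might look like the sketch below. The function name mirror_subdir and the /docs/ path are placeholders of mine, not anything from a real site. Mind the trailing slash on the URL: without it, wget treats the last path component as a file, and --no-parent then allows everything in the directory above it.

```shell
# Hypothetical helper: mirror a single subdirectory with Windows-safe filenames.
mirror_subdir() {
    url="$1"   # e.g. www.yourdomain.com/docs/ (trailing slash matters)
    wget --recursive --no-parent --page-requisites --convert-links \
         --restrict-file-names=windows "$url"
}

# Usage: mirror_subdir www.yourdomain.com/docs/
```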

One last thing

There may be some resources that wget does not fetch because it obeys the rules specified in robots.txt. You can disable that behaviour with the switch -e robots=off.

Note: It is considered evil to circumvent that behaviour, so please be careful and only use this switch on your own site (emphasis on your own)!
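For your own site, the full command with robots.txt ignored could be sketched like this. The function name mirror_own_site is made up, and the --wait=1 pause between requests is my own addition to go easy on the server, not something the original command needs.

```shell
# Hypothetical wrapper: mirror YOUR OWN site while ignoring robots.txt.
mirror_own_site() {
    domain="$1"
    # --wait=1 pauses one second between requests (an optional courtesy)
    wget --recursive --no-clobber --page-requisites --convert-links \
         --domains "$domain" -e robots=off --wait=1 "www.$domain"
}

# Usage: mirror_own_site yourdomain.com
```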

That should do it. Now you are all set to create local backups of websites. But beware – with great wget-fu comes great responsibility!
