Download A Website Using wget

Use the wget command line utility to download an entire website.

Be careful with recursive retrieval - you might download the entire internet!

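wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --domains example.com \
     --no-parent \
     https://example.com/docs/

(example.com and the /docs/ path are placeholders; substitute the domain and starting URL of the site you want to download.)

The options are: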
  • --recursive: follow links and download the pages they lead to (by default up to five levels deep)
  • --no-clobber: doesn’t overwrite existing files, useful for resuming interrupted downloads
  • --page-requisites: download all the files required to display each page (CSS, images, etc.)
  • --html-extension: save HTML files with an .html extension (newer wget versions call this option --adjust-extension)
  • --convert-links: rewrite links in the downloaded pages to point at the local copies so they work offline
  • --domains: comma-separated list of domains to be followed
  • --no-parent: don’t ascend to the parent directory when retrieving recursively; this guarantees that only files below the starting directory are downloaded
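
Given the warning about runaway recursion, it can help to cap the depth explicitly. The sketch below is one way to do that, not part of the original command: it adds wget's --level option and uses --adjust-extension, the newer name for --html-extension. The URL, domain, depth of 3, the DRY_RUN switch and the bounded_mirror function name are all illustrative assumptions.

```shell
#!/bin/sh
# Sketch: the same download with the recursion depth capped.
# START_URL, DOMAIN and DEPTH are placeholders; adjust them for your site.
START_URL="${START_URL:-https://example.com/docs/}"
DOMAIN="${DOMAIN:-example.com}"
DEPTH="${DEPTH:-3}"
# Dry run by default: print the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

bounded_mirror() {
    if [ "$DRY_RUN" = "1" ]; then
        runner="echo"
    else
        runner=""
    fi
    # --level caps how many links deep wget follows;
    # --adjust-extension is the newer name for --html-extension.
    $runner wget \
        --recursive \
        --level="$DEPTH" \
        --no-clobber \
        --page-requisites \
        --adjust-extension \
        --convert-links \
        --domains "$DOMAIN" \
        --no-parent \
        "$START_URL"
}

bounded_mirror
```

As written, the script only prints the command so it can be checked first; setting DRY_RUN=0 runs the download.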

Alternative Method

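This can sometimes work better:

wget --wait=20 --limit-rate=20K -r -p -U Mozilla https://example.com/

(https://example.com/ is a placeholder for the site’s URL.) Waiting 20 seconds between requests and capping the download rate at 20 KB/s is friendlier to the target website and makes it less likely that you get blocked.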
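
For reference, here is a commented sketch of the rate-limited form. It is an illustration under assumptions rather than the exact command above: TARGET_URL is a placeholder, the polite_fetch name and DRY_RUN switch are hypothetical, and --random-wait is an extra wget option that varies the pause between requests around the --wait value.

```shell
#!/bin/sh
# Sketch: rate-limited recursive download. TARGET_URL is a placeholder.
TARGET_URL="${TARGET_URL:-https://example.com/}"
# Dry run by default: print the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

polite_fetch() {
    if [ "$DRY_RUN" = "1" ]; then
        runner="echo"
    else
        runner=""
    fi
    # --wait=20        pause 20 seconds between retrievals
    # --random-wait    vary that pause so requests are less regular
    # --limit-rate=20K cap the download speed at 20 KB/s
    # -r               recursive retrieval (short for --recursive)
    # -p               fetch page requisites (short for --page-requisites)
    # -U Mozilla       send "Mozilla" as the User-Agent header
    $runner wget --wait=20 --random-wait --limit-rate=20K -r -p -U Mozilla "$TARGET_URL"
}

polite_fetch
```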

Note: We use this primarily for downloading our own or our clients’ CMS-based sites as “flat” HTML, so we’re either hitting only our own site resources or downloading with permission.

If you’re using this method to download other people’s websites, be responsible.

See the wget man page for more details.

