Download Whole Website in Linux the Smart Way !!

Posted Under: Linux

Have you ever searched the Internet for software that downloads a complete website, only to find Windows programs (or maybe a Linux one too)? Did you know that your Linux box already has a nifty command that makes all those troubles go away and downloads a full website in a single step? Yes! wget does it. Here is the command: just copy and paste it into the shell and edit the website details (the --domains value and the URL on the last line).

$ wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains techstroke.com \
--no-parent \
www.techstroke.com/Windows/


This command downloads everything under www.techstroke.com/Windows/ (not the whole of techstroke.com, because of --no-parent).
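If you mirror different sites, wrapping the command in a small shell function saves editing it each time: the domain and the start URL become arguments. This is a minimal sketch; the function name `mirror_site` and its two parameters are hypothetical, and the wget options are exactly the ones used above.

```shell
# Hypothetical helper: mirror_site DOMAIN START_URL
# Runs the same wget crawl with the site details passed as arguments.
# (Variables are not declared 'local' so the function stays plain POSIX sh.)
mirror_site() {
    domain=$1      # links outside this domain are not followed
    start_url=$2   # the crawl starts here and never ascends above it
    if [ -z "$domain" ] || [ -z "$start_url" ]; then
        echo "usage: mirror_site DOMAIN START_URL" >&2
        return 1
    fi
    wget \
        --recursive \
        --no-clobber \
        --page-requisites \
        --html-extension \
        --convert-links \
        --restrict-file-names=windows \
        --domains "$domain" \
        --no-parent \
        "$start_url"
}
```

Paste it into your shell (or source it from a file), then run, for example, `mirror_site techstroke.com www.techstroke.com/Windows/`.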

The options are:

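  • --recursive: follow links and download pages recursively (by default up to 5 levels deep; add --level=inf to remove the limit).
  • --domains techstroke.com: don't follow links outside techstroke.com.
  • --no-parent: don't ascend above the starting directory /Windows/.
  • --page-requisites: get all the elements needed to display each page (images, CSS and so on).
  • --html-extension: append .html to HTML pages whose URL doesn't already end in .html or .htm (for example .php or .asp pages). In wget 1.12 and later this option is called --adjust-extension; the old name still works.
  • --convert-links: after the download finishes, convert links so that they work locally, offline.
  • --restrict-file-names=windows: escape characters that Windows doesn't allow in filenames (such as ? and :), so the copy works on Windows as well.
  • --no-clobber: don't overwrite existing files, so an interrupted download can be restarted without fetching the same files again.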
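Every option above except --restrict-file-names also has a short form, which is quicker to type. Here is a compact sketch of the same crawl; the function name `mirror_site_short` is hypothetical, and the comment shows which short flag stands for which long option.

```shell
# Hypothetical helper: the same crawl as the long-option command, using short flags.
#   -r  --recursive          -nc --no-clobber
#   -p  --page-requisites    -E  --html-extension (--adjust-extension in wget >= 1.12)
#   -k  --convert-links      -D  --domains
#   -np --no-parent          (--restrict-file-names has no short form)
# Usage: mirror_site_short DOMAIN START_URL
mirror_site_short() {
    wget -r -nc -p -E -k --restrict-file-names=windows -D "$1" -np "$2"
}
```

For example: `mirror_site_short techstroke.com www.techstroke.com/Windows/`.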

All these options are uber cool: together they download a browsable offline copy with the images, JavaScript and CSS in place!!
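How much of the CSS side gets copied depends on your wget version. GNU Wget 1.12 was the first release to parse CSS files (including @import rules and url() references), so older versions such as 1.11 save the main stylesheet but not the files it imports. Here is a minimal sketch of a version check. It assumes that the first line of `wget --version` has the form "GNU Wget X.Y..."; the function name `wget_parses_css` is hypothetical.

```shell
# Hypothetical helper: succeeds if the installed wget is 1.12 or newer,
# the first release that follows CSS @import and url() references.
wget_parses_css() {
    # Assumed first line format: "GNU Wget 1.21.3 built on linux-gnu."
    version=$(wget --version 2>/dev/null | head -n 1 | awk '{print $3}')
    [ -n "$version" ] || return 1
    major=${version%%.*}   # "1" from "1.21.3"
    rest=${version#*.}     # "21.3"
    minor=${rest%%.*}      # "21"
    # Compare numerically: 1.9 < 1.12, so a plain string comparison would be wrong.
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 12 ]; }
}
```

For example: `wget_parses_css && echo "CSS imports will be fetched" || echo "consider upgrading wget"`.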

via [Linux Journal]


This entry was posted on Sunday, September 7th, 2008 and is filed under Linux.

4 Responses to “Download Whole Website in Linux the Smart Way !!”

  1. Rava

    Hi, I tried that with a web page. The site uses HTTPS, but everything was OK when using --no-check-certificate.

    I only used "-r --no-check-certificate --page-requisites". The problem: the main CSS file is saved, but the other CSS files it links to with
    @import url(bla.css);
    @import url(blubb.css);

    are not saved at all. Is there a wget tweak that makes it save these files too?

    --page-requisites by itself should be enough to make wget do it, but it seems it just won't. I'm using wget 1.11.1.

  2. Vivek

    @Rava I don't think we have such a tweak: wget parses HTML files but not CSS ones. I went through the wget manual and found nothing about this, so you could google for a website copy tool for Linux if you need this done.

  3. Chrissy

    Wow…

  4. Swapnil

    Thanks for sharing this … the best
