Question

I'm trying to use the following command to mirror a Drupal 6 site that needs to be archived as a static site. Unfortunately, I've only been able to find a static site generator for Drupal 7, not 6. I'm running into two issues:

  1. Logging in as different roles makes the wget command pull down completely different directory structures. That much is expected, but in some cases the administrator mirror is missing a lot of content that does get pulled down when I log in with lower-permission accounts.
  2. Some links in separate blocks on the main page point to directories and pages that never get downloaded, no matter which role I log in with, including administrator.

So far I've been able to mirror the site (with the above limitations), but I'm not sure whether the command is actually mirroring the entire site. Here is the wget command I am using:

wget --mirror -w 2 -p --convert-links --load-cookies cookies.txt -e robots=off https://url.org/user

where the cookies file is created like this:

wget --save-cookies cookies.txt --post-data 'name=MY_USERNAME&pass=MY_PASSWORD&form_build_id=FORM_ID&form_id=user_login&op=Log+in' https://url.org/user
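Rather than copying FORM_ID by hand, the login step can fetch the form first and read the hidden form_build_id field out of it. The following is a minimal sketch, assuming Drupal 6 style markup in which the hidden input's name attribute comes before its value on the same line; the helper names extract_form_build_id and drupal_login are hypothetical, and the POST fields are the ones from the command above. It also passes --keep-session-cookies, because wget's --save-cookies skips session cookies (cookies without an expiry date) by default, and depending on the server's PHP session settings the Drupal session cookie may be one of those.

```shell
#!/bin/sh
# Sketch only: extract_form_build_id and drupal_login are hypothetical helper
# names, not part of wget or Drupal.

# Print the value of the hidden form_build_id input found in HTML on stdin.
# Assumes name="form_build_id" appears before value="..." on the same line,
# as in typical Drupal 6 markup; adjust the pattern for other themes.
extract_form_build_id() {
  sed -n 's/.*name="form_build_id"[^>]*value="\([^"]*\)".*/\1/p' | head -n 1
}

# Usage: drupal_login BASE_URL USERNAME PASSWORD
# Fetches the login form, reads a fresh form_build_id, then POSTs the
# credentials and writes the session cookie to cookies.txt.
# USERNAME and PASSWORD must already be URL-encoded if they contain
# characters such as & or =.
drupal_login() {
  login_url="$1/user"
  build_id=$(wget -q -O - "$login_url" | extract_form_build_id)
  if [ -z "$build_id" ]; then
    echo "form_build_id not found at $login_url" >&2
    return 1
  fi
  wget -q -O /dev/null \
    --save-cookies cookies.txt --keep-session-cookies \
    --post-data "name=$2&pass=$3&form_build_id=$build_id&form_id=user_login&op=Log+in" \
    "$login_url"
}
```

For example, `drupal_login https://url.org MY_USERNAME MY_PASSWORD` followed by the mirror command with `--load-cookies cookies.txt` as before.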

I am very new to Drupal, so I'm not sure how roles and permissions, or the structure of the content, might affect a wget mirror. Any suggestions would be appreciated!


Solution

It appears that the issue was caused by a "Logout" block in the header of the main site. While crawling, wget followed the logout link, which ended the session, so the remaining pages either showed a login screen or weren't downloaded at all. Disabling the logout block OR adding --reject logout to my wget command seems to have fixed the issue, and the full directory structure is now being downloaded. The command I ended up using was:

wget --mirror -w 2 -p --convert-links --load-cookies cookies.txt -e robots=off --reject logout https://url.org/user
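A minimal sketch of the fixed mirror step, wrapped in a hypothetical mirror_site function with the short options spelled out, is below. It also includes a hypothetical find_logged_out_pages check that lists mirrored files containing the Drupal login form, which may indicate pages fetched after the session was lost; it assumes that form is rendered with id="user-login", so adjust the pattern for your theme. On wget 1.14 or later, --reject-regex, which matches against the complete URL rather than only the file name, is an alternative to --reject.

```shell
#!/bin/sh
# Sketch only: mirror_site and find_logged_out_pages are hypothetical names.

# Usage: mirror_site URL
# Same options as the command above: wait 2 s between requests, fetch page
# requisites, rewrite links for local browsing, reuse the login cookie,
# ignore robots.txt, and skip URLs whose file name ends in "logout".
mirror_site() {
  wget --mirror --wait=2 --page-requisites --convert-links \
    --load-cookies cookies.txt --execute robots=off \
    --reject logout \
    "$1"
}

# Usage: find_logged_out_pages DIR
# Lists mirrored files that contain the Drupal login form, which may mean the
# session ended partway through the crawl. Assumes the form is rendered with
# id="user-login"; adjust the pattern for your theme.
find_logged_out_pages() {
  grep -rl 'id="user-login"' "$1" | sort
}
```

Running `find_logged_out_pages url.org` after the mirror finishes should print nothing if the session survived the whole crawl.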

Other tips

wget -mnH -k --html-extension https://example.com

I'm not setting any cookies, but this is what I usually run to mirror a website, and it works well. You can drop the last option (--html-extension) if you don't want .html extensions added to the saved files.
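For reference, here is the same command spelled out with long option names, wrapped in a hypothetical mirror_static function with an optional switch for the extension behaviour. In wget 1.12 and later, --html-extension was renamed --adjust-extension (short form -E); the old name is still accepted.

```shell
#!/bin/sh
# Sketch only: mirror_static is a hypothetical helper name.

# Usage: mirror_static URL [no-ext]
# Long-form equivalent of: wget -mnH -k --html-extension URL
#   --mirror              (-m)  recursive, infinite depth, timestamping
#   --no-host-directories (-nH) save into ./path instead of ./example.com/path
#   --convert-links       (-k)  rewrite links so the copy browses offline
#   --adjust-extension    (-E)  append .html to HTML pages saved without it
# Pass "no-ext" as the second argument to keep the original file names.
mirror_static() {
  if [ "${2:-}" = "no-ext" ]; then
    wget --mirror --no-host-directories --convert-links "$1"
  else
    wget --mirror --no-host-directories --convert-links --adjust-extension "$1"
  fi
}
```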

Licensed under: CC-BY-SA with attribution
Not affiliated with drupal.stackexchange