Below is a simple script I use almost daily, which I thought I would share with you all. The script fetches one or more URLs, extracts all of the hyperlinks from the fetched pages, cleans them up a bit, and prints the resulting URLs to standard output, one per line.
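#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtractor;
use LWP::Simple;

if (!@ARGV) {
    print "Usage: $0 URL URL URL ...\n";
    exit;
}

# Fetch each URL and print every <a href> it contains.
for my $link (@ARGV) {
    my $page = get($link);
    if (!defined $page) {
        warn "Could not fetch $link\n";
        next;
    }

    my $LX = HTML::LinkExtractor->new;
    $LX->parse(\$page);

    # Work out the URL base (scheme and host).
    my $base = '';
    if ($link =~ /^(https?:\/{2}[^\/]+)\/?/i) {
        $base = $1;
    }

    # If the URL does not end in a slash, drop its last path component.
    # A URL that does end in a slash is kept as-is, so the relative
    # links below are printed with a doubled slash.
    my $dir = $link;
    if ($dir !~ /\/$/) {
        my @parts = split(/\//, $dir);
        pop @parts;
        $dir = join('/', @parts);
    }

    for (@{ $LX->links }) {
        next unless lc $_->{tag} eq 'a' && defined $_->{href};
        my $url;
        if ($_->{href} =~ /^[a-z][a-z0-9+.-]*:/i) {
            $url = $_->{href};               # already an absolute URL
        } elsif ($_->{href} =~ /^\//) {
            $url = $base . $_->{href};       # relative to the site root
        } else {
            $url = $dir . '/' . $_->{href};  # relative to the fetched page
        }
        print "$url\n";
    }
}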
I call it linkext. Below is an example usage to fetch all of the links available on the Intel Lustre download page:
$ linkext https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64/
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//?C=N;O=D
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//?C=M;O=A
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//?C=S;O=A
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//?C=D;O=A
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//kernel-2.6.32-431.20.3.el6_lustre.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//kernel-debuginfo-2.6.32-431.20.3.el6_lustre.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//kernel-debuginfo-common-x86_64-2.6.32-431.20.3.el6_lustre.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//kernel-devel-2.6.32-431.20.3.el6_lustre.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//kernel-firmware-2.6.32-431.20.3.el6_lustre.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//kernel-headers-2.6.32-431.20.3.el6_lustre.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-debuginfo-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-dkms-2.6.0-1.el6.noarch.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-iokit-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-modules-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-osd-ldiskfs-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-osd-zfs-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-source-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-tests-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//perf-2.6.32-431.20.3.el6_lustre.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//perf-debuginfo-2.6.32-431.20.3.el6_lustre.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//python-perf-2.6.32-431.20.3.el6_lustre.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//python-perf-debuginfo-2.6.32-431.20.3.el6_lustre.x86_64.rpm
https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//sha256sum
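The doubled slash before each file name shows up because the URL given on the command line already ends in a slash and the script joins the link on with another one. Many web servers accept either form, but if you would rather have clean URLs, one option is a small sed stage after linkext. The sketch below is only an illustration: the collapse_slashes helper name and the example.com URLs are made up, and it assumes a sed that supports -E (GNU and BSD sed both do).

```shell
#!/bin/sh
# Hypothetical helper: collapse runs of slashes in the path while leaving
# the "//" that follows the scheme (for example "https:") untouched.
collapse_slashes() {
    sed -E 's#([^:/])/{2,}#\1/#g'
}

# Made-up sample input standing in for real linkext output.
printf '%s\n' \
    'https://example.com/RPMS/x86_64//?C=N;O=D' \
    'https://example.com/RPMS/x86_64//lustre-dkms-2.6.0-1.el6.noarch.rpm' \
    | collapse_slashes
```

With the two sample lines this prints the same URLs with a single slash after x86_64. In real use the helper would sit between linkext and whatever comes next in the pipeline.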
To make this a bit more useful, let's filter the output with egrep and pipe the matching URLs to xargs, which runs wget to fetch the files:
linkext https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64/ | egrep "\.rpm$" | xargs wget
This starts downloading the files. Alternatively, you can have xargs run echo instead, which displays the full wget command line without executing it:
$ linkext https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64/ | egrep "\.rpm$" | xargs echo wget
wget https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//kernel-2.6.32-431.20.3.el6_lustre.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//kernel-debuginfo-2.6.32-431.20.3.el6_lustre.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//kernel-debuginfo-common-x86_64-2.6.32-431.20.3.el6_lustre.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//kernel-devel-2.6.32-431.20.3.el6_lustre.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//kernel-firmware-2.6.32-431.20.3.el6_lustre.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//kernel-headers-2.6.32-431.20.3.el6_lustre.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-debuginfo-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-dkms-2.6.0-1.el6.noarch.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-iokit-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-modules-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-osd-ldiskfs-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-osd-zfs-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-source-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//lustre-tests-2.6.0-2.6.32_431.20.3.el6_lustre.x86_64.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//perf-2.6.32-431.20.3.el6_lustre.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//perf-debuginfo-2.6.32-431.20.3.el6_lustre.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//python-perf-2.6.32-431.20.3.el6_lustre.x86_64.rpm https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64//python-perf-debuginfo-2.6.32-431.20.3.el6_lustre.x86_64.rpm
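The same echo trick is handy for checking how xargs will group its arguments before anything is downloaded. As a rough sketch, the pipeline below uses made-up example.com URLs from a hypothetical sample_links function in place of real linkext output, so it runs offline. It uses grep -E, which is equivalent to egrep, and adds -n 1 so that xargs would run one wget per URL rather than a single wget with every URL:

```shell
#!/bin/sh
# Hypothetical stand-in for linkext output, so the sketch runs offline.
sample_links() {
    printf '%s\n' \
        'https://example.com/RPMS/?C=N;O=D' \
        'https://example.com/RPMS/perf-1.0.rpm' \
        'https://example.com/RPMS/sha256sum' \
        'https://example.com/RPMS/lustre-1.0.rpm'
}

# Keep only the .rpm links, then print (not run) one wget command per URL.
sample_links | grep -E '\.rpm$' | xargs -n 1 echo wget
```

This prints two lines, one wget command for each of the two sample .rpm URLs. Dropping the echo turns the dry run into the real download.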
I hope you all find this useful.