WebCopy v0.97b 95/05/31
Copyright 1994, 1995 by
Copy files (recursively) via HTTP protocol.
WebCopy is a perl program that retrieves the URL specified in a
unix-like command line.
It can also retrieve recursively any file that an HTML file references,
i.e. inlined images and/or anchors, if specified with an option.
It can be used as a "mirror" program to retrieve
a tree of documents from a remote site, and put them on-line immediately
through the local server.
By default, only the document pointed to by the URL in the command line
is retrieved.
Many switches can be specified in the command line, and each option enables
one type of reference to follow.
To avoid endless recursion,
only files at one site can be retrieved with one command.
WebCopy never follows links to files that are not on the same host, port
number and protocol (only HTTP is supported) as the first document retrieved.
A list of discarded URLs is logged for future reference.
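
The same-site rule can be sketched in a few lines. This is a minimal illustration in Python, not WebCopy's own perl code: the names site_key and split_links are hypothetical, and the sketch assumes every link has already been resolved to an absolute URL.

```python
from urllib.parse import urlsplit

# Default port assumed when a URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80}


def site_key(url):
    """Return the (protocol, host, port) triple that identifies a site."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def split_links(start_url, links):
    """Split links into those on the start URL's site and those discarded."""
    start = site_key(start_url)
    accepted, discarded = [], []
    for link in links:
        # Only HTTP is followed, and only on the same host and port.
        if start[0] == "http" and site_key(link) == start:
            accepted.append(link)
        else:
            discarded.append(link)  # a real tool would write these to a log
    return accepted, discarded
```

A link to http://www.host:80/b.html found in http://www.host/a.html would be accepted here, while links to another host, another port or an ftp: URL would end up in the discarded list.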
This program does not comply with the
since it retrieves almost what the user specifies in the command line.
The user must know what kind of server and documents he wants to access.
WebCopy does everything it knows to stop at CGI-generated files
- Slightly improved code :-)
- More restrictive, to avoid endless recursion.
- Added PROXY support. An HTTP proxy must be specified in the command line
or through the http_proxy environment variable.
- It can POST data to CGI scripts (but cannot recurse on output).
- Added delay time between connections, to avoid overloading the server.
Duration can be specified by an option in the command line.
Defaults to 30 seconds.
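
As a rough illustration of the delay between connections, the loop below waits before each request. It is a Python sketch under stated assumptions, not WebCopy's implementation: fetch_all and its fetch and sleep parameters are hypothetical, and only the 30-second default comes from the text above.

```python
import time


def fetch_all(urls, fetch, delay=30, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing `delay` seconds before each one."""
    results = []
    for url in urls:
        sleep(delay)  # pause before every connection to spare the server
        results.append(fetch(url))
    return results
```

Passing the sleep function as a parameter keeps the sketch easy to try out with a stand-in that does not really wait.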
webcopy [options] http://host:port/path/file [http://proxy:port]
Options (can be combined):
- output through stdout.
You can redirect the output to another filename or pipe it to a program.
Use it with
You cannot recurse HTML files or use
options in this mode.
- operates in verbose mode.
Displays every URL to fetch.
-vv is "very verbose" and
outputs every header line the server sends with the file.
- query each URL to transfer.
Use it to select the files to transfer.
'n' to skip file, 'y' to transfer, 'a' to transfer all the remaining files,
'q' to quit immediately.
If you don't say 'y' to the first file (the one specified
in the command line), no recursion is made.
- do not log in '
This file is stored in the root of the working directory.
It can be parsed to get a list of every file (NOT) transferred.
- set 'path' seconds between transfers.
Use this option to change the default delay of 30 seconds before
every connection. The delay is there to avoid overloading the server.
- set working directory to 'path'.
WebCopy stores files in the current working directory. Use this
option to force WebCopy to use another directory.
- set default index to 'file'.
When a directory index is required, this is the filename that is used
to store the output. Defaults to
- post 'file' or query string if omitted.
You can send some URL-encoded form data using the POST method to
a CGI script. If the filename is omitted, the data is taken from the
query string specified in the URL after a "
The data in the file must be in URL-encoded format, and spaces are
- recurse HTML documents.
- include inlined images.
Retrieves the files referenced in
- follow hypertext links.
Recurse through hypertext references in
Warning: Never leave WebCopy unattended if you don't
know what you are recursively retrieving.
- allow imagemaps.
NOT available yet.
- allow links to CGI scripts.
By default, WebCopy discards references that seem to be a CGI script
(/cgi-bin/ in the path).
Use this option if you want to retrieve the output of a CGI script.
If the base path is other than current, you'll also require options
- allow absolute references to the same host.
References such as /path/to/file.html, where the path is the
current one (the one that was specified in the command line), are not
rejected when this option is specified.
If other paths are required, also use option
- allow full URL references to the same host.
http: URLs are accepted only if this option is
specified in the command line and the host and port remain the same as
the current ones, but they are still rejected unless option
-p is also specified.
- allow paths other than current.
References such as /images/some.gif, where the path is not the
current one, are accepted. Use this option to allow references to CGI scripts.
To keep the same document structure of the server and to avoid document name
-d is recommended.
Warning: This option can cause WebCopy to retrieve
all the data from a server if it finds a reference to the server
root in some document while using recursion.
Never leave WebCopy unattended if you don't know what you are recursively
retrieving.
- keep directory path in URL for local file.
The default behaviour of WebCopy is to treat the working directory as the
equivalent of the document directory specified in the command line's URL.
With this option, WebCopy treats the working directory as the equivalent of
the root directory of the server, so directories in the path are also
created in the working directory.
If you start using this option after some documents have already been
transferred, you will have to create the subdirectories yourself and move the
retrieved files from the working directory into them, or you will get
duplicated files.
- use local copy of file if exists.
Before making a request to a server, WebCopy checks for the file in the
working directory and sends file information to the server. The new version
is retrieved only if the file has changed since the last access.
This option forces WebCopy to use the local copy of the file if it exists,
without checking whether the file has changed on the server.
- don't use defined PROXY.
If the http_proxy environment variable is defined, this option
makes WebCopy ignore it. It also ignores a PROXY specified in the command
line.
WebCopy displays brief help, ignores any other options specified and exits.
Note: Some options conflict with others.
For example, you cannot use
-o at the
same time because both require STDOUT.
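
The difference between the default file placement and the option that keeps the directory path in the URL can be illustrated with a small mapping function. This is a Python sketch based only on the descriptions above: local_path, index_name and keep_dirs are hypothetical names, the index filename must be supplied by the caller, and raising an error for paths outside the starting directory is a choice made for this sketch, since the text does not say what WebCopy does in that case.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(start_url, url, index_name, keep_dirs=False):
    """Map a URL to a file path relative to the working directory."""
    start_dir = posixpath.dirname(urlsplit(start_url).path or "/")
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += index_name  # a directory index is stored under this name
    if keep_dirs:
        # Working directory stands for the server root: keep the whole path.
        return path.lstrip("/")
    # Working directory stands for the directory of the starting document.
    rel = posixpath.relpath(path, start_dir)
    if rel == ".." or rel.startswith("../"):
        raise ValueError("path is outside the starting directory: " + path)
    return rel
```

With a starting URL of http://www.host/path/page.html, the file /path/img/a.gif maps to img/a.gif by default and to path/img/a.gif when keep_dirs is set.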
- To retrieve a single file and store it with some name in current directory:
webcopy -so http://www.host/images/icon.gif > logo.gif
- To retrieve a page and some of the inlined images without delay:
webcopy -vsiqt0 http://www.host/page.html
and press RETURN on each file NOT to transfer.
- To mirror a group of files in some other directory:
webcopy -rwpub/mirror/name http://www.host/intro.html
- To retrieve the output of a form:
If you are smart enough, you can write your own files of data and just do
step 6, or use the following:
- Get the form:
webcopy -so http://www.host/form.html > form.html
- Using an editor, change:
<FORM METHOD=POST ACTION="http://www.host/cgi-bin/proc">
to:
<FORM METHOD=POST ACTION="mailto:yourself@yourdomain">
- Using a WWW browser, read the modified file, fill the form and
press "OK" button.
- Wait for your own mail to arrive. It should contain the posted
URL-encoded data in the body.
- Save the mail in a file (
the mail headings.
- Post the data:
webcopy -so -zpost.dat http://www.host/cgi-bin/proc > result.html
webcopy -so -z http://www.host/cgi-bin/proc?postdata > result.html
- To verbosely retrieve html documents and icons that are not in the same
directory of the server:
webcopy -vvrpafd http://www.host/path/page.html
- To retrieve a file using a PROXY, overriding the default
http_proxy environment variable:
webcopy http://www.host/path/page.html http://otherproxy
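
For the form example above, the URL-encoded data can also be produced programmatically instead of through the mail round trip. The following Python sketch is only an illustration: post_body and the field names are hypothetical, and it mirrors the two cases described for the post option (data prepared for a file, or data taken from the query string of the URL).

```python
from urllib.parse import urlencode, urlsplit


def post_body(url, fields=None):
    """Return URL-encoded POST data for a request.

    If fields is given, encode it; otherwise fall back to the query
    string that follows the path in the URL.
    """
    if fields is not None:
        return urlencode(fields)  # spaces become '+', '&' becomes %26
    return urlsplit(url).query


# Data that could be written to a file such as post.dat:
body = post_body("http://www.host/cgi-bin/proc", {"name": "Ana Maria", "q": "a&b"})
print(body)
```

The string returned for the dictionary above is name=Ana+Maria&q=a%26b, which is the kind of content the data file is expected to hold.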
License Agreement and Lack of Warranty:
- The author of this program is Victor Parada <email@example.com>.
- This program is "Freeware", not "Public Domain".
- This program must be distributed for free, and cannot be included in
commercial packages without prior written permission from the author.
- This program cannot be distributed if modified in any way.
- This program can be used by anyone if the copyright and this notice
remain intact in every file.
- If you modify this program, please e-mail patches to the author.
- This is a Beta version of the program. You have been warned!
- This program is provided ``AS IS'', without any warranty.
- This program can cause huge file transfers and all the related effects.
- This program can fill data disks without notice.
- Neither the author nor UTFSM is responsible for the use of this program.
- Bug reports, comments, questions and suggestions are welcome! But
please check first that you have the latest version!
If you (want to) use this program, please send
e-mail to the author.
He will try to notify you of any updates made to it.
perl interpreter (either 4.036 or 5.000 or later) with
perl library (
hostname program or script to get current host's name.
- TCP/IP connection and Sockets.
- Space on disk.
- A machine with all the above available.
Down-loading and Setting-Up:
- Make sure you meet the System Requirements above.
- Get the latest version of WebCopy from its home FTP server:
This is a
- Untar the file with the command:
tar -xzvf webcopy.tgz
(GNU version of
- Make sure you have the following files in a subdirectory called
- Read the License Agreement and Lack of Warranty
webcopy.txt, or in
using an HTML browser.
- Edit the first line of
webcopy if your
perl interpreter is not located at
webcopy to a suitable directory.
- Use it at your own risk!
- Register yourself (it's free)
and send feed-back!
If you cannot do
send e-mail to the author.
He will try to send you a
shar'ed copy of it :-)
Document last modified on 1995/06/14 by Víctor Parada