BW Whois -- A whois client by Bill Weinman
whois [options] request[@host[:port]] [ ... ]
This documents BW Whois version 5.5.2
BW Whois was originally designed to work with the new "Shared Registration System" whois introduced 1 December 1999. This new system has proved to be remarkably disorganized and inconsistent, resulting in tremendous confusion for those of us who need to find the ownership of a domain now and then.
This program mitigates most of that confusion by referring to a table of TLDs (Top-Level Domains) and their associated whois servers in the tld.conf file.
Over the past few years this program has evolved into the most full-featured whois client available, providing features such as a self-detecting CGI mode and SQL database caching for those who need them, while still maintaining a simple command-line interface for those who don't.
The CGI mode can be secured against abuse by a number of different methods including "Referer:" headers, IP addresses, and a system of 128-bit hashed cookies. These security options can be tailored to suit the demands of a given installation using the whois.conf configuration file.
There are features to support a web-based whois service, including support for Apache-style server-side includes, and support for a distinct initial page and a "domain not found" page.
An optional caching capability is provided using an SQL database (currently MySQL, PostgreSQL and SQLite are supported). When configured for caching, requests are forwarded to the corresponding whois server only if the cache does not contain a result for the given request/server combination. Cached values are expired after a configurable amount of time.
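The caching rule can be sketched as follows. This is a minimal Python illustration, not BW Whois's own code: it uses an in-memory SQLite database with a hypothetical table named whois_cache keyed on the request/server pair, and a caller-supplied query(server, request) function stands in for the network lookup. The 432000-second (five-day) expiry matches the documented default, and refresh=True mimics forcing a new lookup even when a cached result exists.

```python
import sqlite3
import time
from typing import Callable, Optional

CACHE_EXPIRE = 432000  # seconds (five days), the documented default

def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # Hypothetical schema: one row per request/server combination.
    db.execute("CREATE TABLE IF NOT EXISTS whois_cache ("
               "request TEXT, server TEXT, result TEXT, stored REAL, "
               "PRIMARY KEY (request, server))")
    return db

def cached_query(db: sqlite3.Connection, request: str, server: str,
                 query: Callable[[str, str], str], now: Optional[float] = None,
                 expire: int = CACHE_EXPIRE, refresh: bool = False) -> str:
    now = time.time() if now is None else now
    if not refresh:
        row = db.execute("SELECT result, stored FROM whois_cache "
                         "WHERE request = ? AND server = ?",
                         (request, server)).fetchone()
        if row is not None and now - row[1] <= expire:
            return row[0]  # fresh hit: the whois server is not contacted
    result = query(server, request)  # miss, stale row, or forced refresh
    db.execute("INSERT OR REPLACE INTO whois_cache VALUES (?, ?, ?, ?)",
               (request, server, result, now))
    db.commit()
    return result
```

A stale row is simply overwritten by the fresh answer, so the table never holds more than one result per request/server pair.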
When given a request, the program first checks the requested domain against the tld.conf file for an associated whois server. If not found the program will then submit the request to the "root" whois server (currently whois.crsnic.net) and wait for a referral to a registrar’s whois server.
If given a referral, the program will then submit the request a second time to the referred whois server.
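The two-step lookup described above might look roughly like this in Python. It is a sketch under stated assumptions, not the program's source: the referral is assumed to appear on a line such as "Whois Server: whois.example.net" (real root-server formats vary), and tld_server is assumed to be whatever the tld.conf search returned, or None. whois_query() sends one request line ending in CR LF over TCP port 43, as RFC 954 describes, and reads until the server closes the connection.

```python
import re
import socket
from typing import Callable, Optional

ROOT_SERVER = "whois.crsnic.net"  # the "root" server named above

# Assumption: the referral appears on a line like "Whois Server: whois.example.net".
REFERRAL = re.compile(r"^\s*Whois Server:\s*(\S+)", re.IGNORECASE | re.MULTILINE)

def whois_query(server: str, request: str, port: int = 43, timeout: float = 60) -> str:
    """Send one request line (ending in CR LF) and read until the server closes."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(request.encode("utf-8") + b"\r\n")
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def lookup(domain: str, tld_server: Optional[str],
           query: Callable[[str, str], str] = whois_query) -> str:
    """tld_server is the tld.conf match for domain, or None if there was none."""
    if tld_server is not None:
        return query(tld_server, domain)
    answer = query(ROOT_SERVER, domain)
    referral = REFERRAL.search(answer)
    if referral:
        # Submit the same request a second time, to the referred server.
        return query(referral.group(1), domain)
    return answer  # no referral found: return the root server's answer
```

Calling lookup("bw.org", None) performs real network requests; passing a different query function makes the flow easy to exercise offline.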
The request can be a domain name (e.g. whois bw.org) or any other entity that the given host can resolve (e.g. whois !firstname.lastname@example.org).
If request is an IP address (or part thereof), the ARIN whois server will be used as a root server (whois.arin.net).
If host is specified, the request will be sent literally to the specified host.
If both host and port are specified, the request will be sent to that host using the specified port instead of the normal whois port (43).
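One way to split a request[@host[:port]] argument is shown below. Splitting on the last "@" and falling back to port 43 are assumptions of this sketch, not necessarily how BW Whois parses its arguments.

```python
from typing import NamedTuple, Optional

WHOIS_PORT = 43  # the normal whois port

class Request(NamedTuple):
    query: str
    host: Optional[str]  # None means "use tld.conf or the root server"
    port: int

def parse_request(arg: str) -> Request:
    """Split 'request[@host[:port]]' into its parts."""
    query, at, target = arg.rpartition("@")
    if not at:
        # No '@': rpartition puts the whole string in the last slot.
        return Request(target, None, WHOIS_PORT)
    host, colon, port = target.partition(":")
    return Request(query, host, int(port) if colon else WHOIS_PORT)
```

For example, parse_request("bw.org@whois.example.net:4343") yields Request(query='bw.org', host='whois.example.net', port=4343).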
Multiple requests on a single command line are supported.
Self-detecting CGI Support
BW Whois detects CGI operation by looking for the standard "SCRIPT_NAME" environment variable. This behavior can be overridden by using the --nocgi switch.
In CGI mode the program attempts to make intelligent links out of IP addresses, domain names, and handles. It doesn’t always get it right, but it tries real hard!
You can also specify an optional whois.html file to create your own look. The HTML file will need a few simple "placeholders" in it. The placeholders are replaced at runtime with the various values which make this work. These placeholders are represented by text enclosed in "$" signs, like this: "$PLACEHOLDER$".
Separate HTML files may be specified for an initial page and a "not found" page, if desired.
The following placeholders are available:
The URI path of the program on your web server, taken from the value of the "SCRIPT_NAME" environment variable.
The domain that was last looked up, if any.
The result of the whois query from BW Whois.
You can get an example file from the program with:
whois --makehtml > whois.html
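Placeholder substitution of this kind takes only a few lines. In this Python sketch the names SELF, DOMAIN, and RESULT are hypothetical stand-ins for the three values described above; check the file produced by whois --makehtml for the names the program actually uses.

```python
import re

PLACEHOLDER = re.compile(r"\$([A-Z_]+)\$")

def fill_template(html: str, values: dict[str, str]) -> str:
    """Replace each $NAME$ with values[NAME]; unknown placeholders are left as-is."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), html)
```

In a real CGI page the whois result should be HTML-escaped (for example with html.escape) before it is substituted.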
Optional Apache SSI Support
If you need to include other files into your HTML file dynamically, experimental support for Apache-style SSI (server-side includes) is provided with the bwInclude.pm module. This currently works only for "include virtual" and "echo var" directives.
Simply place the bwInclude.pm file with your other Perl module files, or specify the directory that contains the module in the "use lib" line in the source code.
Optional TLD Table Support
Because of the unfortunate design of the Shared Registration System, only the .COM, .NET, and .ORG Top-Level Domains (TLDs) are referred by the "root" domain servers at whois.crsnic.net and whois.internic.net. If you want results for other TLDs you must know where to find them, and there is no central repository for current whois server referrals.
The optional tld.conf file includes whois servers for all known TLDs, and some second-level domains that are registered separately (e.g. .net.au, .uk.com, etc.).
The format of the tld.conf file is as follows:
Lines that begin with "#" are ignored.
Token lines are like:
token token optional comments
The first token is the TLD; the leading dot (".") is required.
The second token is the fully-qualified domain name of the whois server that responds to requests for the given TLD.
The two tokens can be separated by spaces and/or tabs.
Anything on the line after the second token is ignored.
A leading "#" for in-line comments is not required, but may be in the future.
The file is searched sequentially, so it’s important to list second-level domains earlier in the file than the corresponding top-level domains (e.g. .net.au before .au).
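A reader for this format, and the sequential first-match search, could be sketched in Python as follows. Raising an error on a line without a leading dot is a choice made for this illustration; the server names in any test data are made up.

```python
from typing import Optional

def parse_tld_conf(text: str) -> list[tuple[str, str]]:
    """Return (tld, server) pairs in file order."""
    entries = []
    for line in text.splitlines():
        tokens = line.split()  # tokens may be separated by spaces and/or tabs
        if not tokens or tokens[0].startswith("#"):
            continue  # blank line or comment line
        if len(tokens) < 2 or not tokens[0].startswith("."):
            raise ValueError(f"bad tld.conf line: {line!r}")
        entries.append((tokens[0].lower(), tokens[1]))  # anything after token 2 is ignored
    return entries

def server_for(domain: str, entries: list[tuple[str, str]]) -> Optional[str]:
    """Sequential search: the first TLD that is a suffix of the domain wins."""
    name = "." + domain.lower().strip(".")
    for tld, server in entries:
        if name.endswith(tld):
            return server
    return None
```

Prefixing the domain with a dot keeps ".au" from matching a name like "example.tau". If a ".au" line came before ".net.au", every .net.au domain would be sent to the .au server, which is why the order matters.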
Optional Support for Stripping Disclaimers
Most whois servers deliver a disclaimer along with their whois results. The disclaimer generally says something like "By submitting the request that you already submitted before you saw this agreement you have agreed to this binding contract. Haha!"
Many people who are not otherwise lawyers are annoyed by this. The stripdisclaimer option will remove the disclaimers before you see them.
This feature requires the sd.conf file.
The format of the sd.conf file is:
server "first line" "last line"
server is the DNS name of the whois server
"first line" and "last line" are regular expressions that match the first and last line (respectively) of the disclaimer to be stripped. The quotes are required.
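The stripping rule can be read as: drop everything from the first line matching the first pattern through the next line matching the last pattern, inclusive. This minimal Python sketch assumes that lines which do not fit the server "first line" "last line" shape are comments or blanks, and it uses re.search(), so patterns should be anchored with ^ when they must match at the start of a line.

```python
import re

# server "first line" "last line"  (the quotes are required)
SD_LINE = re.compile(r'^(\S+)\s+"(.*?)"\s+"(.*)"\s*$')

def parse_sd_conf(text: str) -> dict[str, list[tuple[re.Pattern, re.Pattern]]]:
    rules: dict[str, list[tuple[re.Pattern, re.Pattern]]] = {}
    for line in text.splitlines():
        match = SD_LINE.match(line.strip())
        if match:  # assumption: anything else is a comment or blank line
            server, first, last = match.groups()
            rules.setdefault(server.lower(), []).append(
                (re.compile(first), re.compile(last)))
    return rules

def strip_disclaimer(result: str, first: re.Pattern, last: re.Pattern) -> str:
    """Remove lines from a 'first' match through the next 'last' match, inclusive."""
    kept, skipping = [], False
    for line in result.splitlines():
        if not skipping and first.search(line):
            skipping = True
        if not skipping:
            kept.append(line)
        elif last.search(line):
            skipping = False  # this closing line is dropped as well
    return "\n".join(kept)
```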
This program attempts to detect netblock requests. If a request is entirely numeric (e.g. 123.234), the program first checks with whois.arin.net (ARIN). If an ARIN record contains a referral to another whois system (e.g. RIPE or APNIC), the program will attempt to detect that and snatch the record from the referenced whois system. Note: ARIN’s records are very inconsistent in their formatting, so this may not always do something intelligent.
Packed IP addresses
If the request is a string of numbers without any other characters, the program will treat it as a 32-bit (packed) IP address. It will first unpack it into dotted-quad notation and then submit it to the ARIN whois server.
Packed IP addresses are often used by spammers in an attempt to confuse those who might try to report their abuse. This feature makes it easy for you to decipher those addresses and find the owner of the netblock all in one step.
IP addresses are actually 32-bit integers (until we get IPv6 -- but that’s another story). The common notation represents the address as four separate 8-bit integers, like this: 192.149.252.21 (actually one of ARIN’s servers). That’s called "dotted-quad" notation. If you were to represent that address as one big 32-bit integer it would look like this: 3231054869. I call that a "packed" IP address.
Sometimes a spammer will use a packed IP address in a URL like this:
http://3231054869/
That address will work in a web browser, but it’s hard to look up. This program will accept a packed IP address like this:
whois 3231054869
The program will unpack it into dotted-quad notation, and submit it to the ARIN whois server just like a normal IP address.
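The arithmetic is ordinary base-256 place value: 192 × 2^24 + 149 × 2^16 + 252 × 2^8 + 21 = 3231054869. Unpacking reverses it with shifts and masks, as in this Python sketch (the all-digits and 32-bit range checks are choices made for the illustration).

```python
import re

def unpack_ip(request: str) -> str:
    """Convert a packed 32-bit IPv4 address (all digits) to dotted-quad notation."""
    if not re.fullmatch(r"[0-9]+", request):
        raise ValueError(f"not a packed IP address: {request!r}")
    value = int(request)
    if value > 0xFFFFFFFF:
        raise ValueError(f"does not fit in 32 bits: {request}")
    # Take the top byte first: bits 31-24, 23-16, 15-8, 7-0.
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

unpack_ip("3231054869") returns "192.149.252.21". Python's standard library gives the same answer with str(ipaddress.IPv4Address(3231054869)).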
Print a usage message.
Print the version information and exit.
Full path to the configuration file. Default: /etc/whois/whois.conf
Refresh the cache for this query. Forces the request to go to the whois server even if the result is cached. (Only valid if caching is configured.)
Full path/file name for tld.conf file. Default: /etc/whois/tld.conf
--host=host, -h host
Specify a specific host.
--port=port, -p port
Specify an alternate port.
Set the timeout to a number of seconds. The default is 60 seconds if this is not specified.
Be wery, wery quiet. I’m hunting wabbits. (--quiet overrides --verbose)
Show details of every step. (--quiet overrides --verbose)
Sets the stripdisclaimer mode. The program makes an attempt to strip off those inane disclaimers that so many registries are starting to include with their whois records. This feature requires the sd.conf file.
Writes a sample HTML file (for CGI mode use) to standard out.
Prevent CGI mode. This is useful if you have a script that was written for a legacy character-mode whois program.
Create HTML links of handles, IP addresses, and domains without using HTML in the rest of the output. Useful with --nocgi for using an external wrapper CGI program.
Allow Japanese output from nic.ad.jp.
A sample whois.conf file is included with the BW Whois distribution. It is not necessary to use the whois.conf file to use the program.
If you want to use advanced features, such as caching or optional CGI security features, you will need to install the whois.conf file and configure it to reflect your preferences.
The standard location for whois.conf is in the /etc/whois directory. If you do not have access to that directory, or are running on a non-UNIX operating system that does not use the /etc directory, you may specify another location by setting the "WHOIS_CONF" environment variable or by editing the source code.
If you need to edit the source code, be sure you are using a plain text editor (not a word processor!) and that you save the file with appropriate line-endings for your system. If you do not understand those distinctions I highly recommend that you find a friend or hire a consultant who knows about such things. (The author is occasionally available for such small consulting tasks -- feel free to contact him if you need help.)
Format of the Config File
The config file format is very simple.
Lines that begin with "#" are considered comments and are ignored.
Anything after a "#" to the end of a line is considered a comment and ignored.
The format of each non-comment line is:
For logical values, "1" or "true" (without the quotes) are considered true. Anything else is considered false.
For options that take a list of values, the list is separated by colons (":") without spaces. Spaces are not currently supported in any value.
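A reader for these rules could look like the following Python sketch. Because the exact line format is not spelled out here, the sketch assumes each line is an option name followed by whitespace and a value; the option names used when trying it out are illustrative only.

```python
def parse_whois_conf(text: str) -> dict[str, str]:
    """Assumes each non-comment line is 'option value', separated by whitespace."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # everything after '#' is a comment
        if not line:
            continue
        parts = line.split(None, 1)
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options

def as_bool(value: str) -> bool:
    return value in ("1", "true")  # anything else is false

def as_list(value: str) -> list[str]:
    return value.split(":") if value else []  # colon-separated, no spaces
```

For example, as_list("www.example.com:example.com") returns ['www.example.com', 'example.com'], and as_bool("yes") is False because only "1" and "true" count as true.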
See the SECURITY section of this man page for more information about security features.
The following options are supported:
Strip off the disclaimer/header from the results returned by many registrars. This feature requires the sd.conf file.
Alternate location for the tld.conf file. Default: /etc/whois/tld.conf
Alternate location for the sd.conf file. Default: /etc/whois/sd.conf
The number of seconds to timeout if a result is not returned by a whois server. Default: 60 seconds.
A hostname to use as a default whois server if the TLD is not found in the tld.conf file. Default: whois.crsnic.net
An HTML file to use for queries and results. Default: internal
An HTML file to use for the initial page. This is the page displayed when no query is submitted. Default: htmlfile or internal
An HTML file to use for results that are not found. This is the page displayed when a query returns a negative response. It may be used to display a page indicating that a domain may be available for registration. Default: htmlfile or internal
An HTML file to use for results that are found. This is the page displayed when a query returns a positive response. It may be used to display a page indicating that a domain is not available for registration. Default: htmlfile or internal
An HTML file to use for error 403 (Forbidden) results. Default: internal
An HTML file to use for error 408 (Expired Session) results. Default: internal
This option enables logging and provides a path and filename for the log. Log entries look like this:
2002-12-11 20:06:00  (192.168.0.30) whois.cgi: cgi domain: bw.org (1)
Items are, from left to right:
Date and time (UTC) of the log entry.
The process ID, enclosed in square brackets.
The IP address of the CGI client, enclosed in parentheses. This item only appears in CGI mode.
The process name, or the log_name (see below), followed by a colon.
The text of the log entry (in this case, "cgi domain: bw.org").
A log-level for this item. The log-level only appears if log_level (see below) is provided in the config file.
Make sure the user-ID that owns the whois process has permission to write the log file. This option is usually used when running in CGI mode. In that case, you need to ensure that the user-ID of the web server has permission to write to the log file.
level can be a number from 1-9.
This item specifies what level of logging you want. Without this item, events with log-levels higher than 1 will not be logged. For most purposes, that will be fine. The higher the number, the more events get logged.
This option provides a specific name for log entries. This will be used instead of the process-name in log entries.
This option enables database operations. The token can be mysql, pgsql, or sqlite3 corresponding to the database system you are using.
connect connect string
This option is required if database is used. It specifies the connection parameters used to access the database. The format is:
For example, if your database were named "whois" on the local machine, on the standard MySQL port (3306) and the user was "web" and the password was "foo.bar" you could use:
Note: if you are using SQLite 3 your connect string will only have the path to the database file, as in this example:
The name of the database table to use for the results cache. This also serves to enable results caching.
The number of seconds to hold a result before it is considered stale. Stale results will be refreshed when requested again. Default: 432000 seconds (five days).
The table name to use for security control records. This is required to enable security control features.
The name to use for control cookies. This also serves to enable the cookie control feature.
How many seconds a cookie is valid for. Default: 3600 seconds (one hour).
The number of hits allowed from one IP address within the ip_expire time. This also serves to enable the IP control feature.
The number of seconds required between hits from one IP address before that address is expired from the control table.
A list of valid hostnames to allow in the "Referer:" header. Use a value of * to turn off referer checking entirely. Default: The hostname in the HTTP "Host:" header.
Allow links to a whois record without a cookie or a referer. This is useful for providing a link in an email message. The number is how many seconds apart to allow linked hits from the same IP address. This requires control_table and ip_control.
A list of IP addresses that can be used for the outgoing connection. BW Whois will select an address from this list at random and bind to that for your outgoing connection. This will help with some whois servers that block based on number of connections from a given IP address. These are IP addresses ON YOUR SYSTEM. You must have these IP addresses configured in order for them to work.
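Choosing a random local address and binding the outgoing connection to it could be sketched like this in Python; the function names are hypothetical. socket.create_connection() accepts a source_address of (host, port), and port 0 lets the operating system pick the local port.

```python
import random
import socket
from typing import Sequence

def choose_source(addresses: Sequence[str], rng=random) -> tuple[str, int]:
    """Pick one configured local address; port 0 lets the OS pick the local port."""
    return (rng.choice(addresses), 0)

def open_whois_connection(server: str, addresses: Sequence[str],
                          port: int = 43, timeout: float = 60) -> socket.socket:
    # Binding fails with OSError if the chosen address is not configured on this host.
    return socket.create_connection((server, port), timeout=timeout,
                                    source_address=choose_source(addresses))
```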
The environment variable "WHOIS_CONF" may be used to specify an alternate path to the whois.conf file.
The environment variable "BW_WHOIS" is no longer supported.
This version of BW Whois contains features to help secure a web-accessible installation from abuse.
Over the past few months many users of BW Whois have sustained attacks from automated web clients (bad robots) that would rapidly request whois results, presumably for illicit purposes. My own server was attacked and queries from my server became disallowed by Verisign (née Network Solutions).
When I first detected these attacks on my own site, I quickly implemented a simple control that kept a flat-file list of IP addresses and refused connections from an IP address after it was represented more than a given number of times in that file.
A few weeks later the attack started up again from a number of IP addresses too large to control in this manner. I was amazed, to say the least. My server was blocked again by NSI. This was a coordinated attack from a large number of hosts on a large number of disparate networks.
This time I buckled down and devised a set of controls that would require a lot more sophistication to subvert. So far these controls have been very successful on my server.
Three Types of Controls
There are three distinct types of controls. They can be used separately, but personally, I use all three and I recommend you do the same.
The referer controls are enabled by default and do not require that a database be installed.
If a request is received that does not provide an HTTP "Referer:" header, or provides a referer that does not match the hostname in the "Host:" header, the request is denied and a 403 (Forbidden) result code is returned.
So far the robots do not provide an HTTP "Referer:" header, but I expect they will soon if people rely on this control without the others. It would be a trivial addition to their code.
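The referer rule, together with the list of valid hostnames from the configuration options, could be sketched in Python as follows. The function and parameter names are hypothetical; a False result corresponds to returning 403 (Forbidden).

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit

def referer_allowed(referer: Optional[str], host_header: str,
                    valid_hosts: Optional[Sequence[str]] = None) -> bool:
    """False means the request should get 403 (Forbidden)."""
    if valid_hosts and "*" in valid_hosts:
        return True  # referer checking turned off
    if not referer:
        return False  # no Referer: header at all
    hosts = valid_hosts or [host_header]  # default: the Host: header's hostname
    wanted = {h.split(":")[0].lower() for h in hosts}  # ignore any :port
    return (urlsplit(referer).hostname or "") in wanted
```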
The IP control requires an SQL database. Currently MySQL, PostgreSQL, and SQLite are supported.
Whenever a request comes in from a web client, the database is queried to see if that IP address has visited recently. If not found, the request is allowed and a record is created.
If the IP address is found in the database, a counter is updated to reflect how many hits have arrived from that address. If the count is above the limit, the request is denied and a 403 (Forbidden) result code is returned. If more than "ip_expire" seconds have passed since the last hit from that IP address, the count is reset and the request is allowed.
This control will be difficult to subvert. The problem is that the count must be high enough to permit hits from clients behind proxy servers, such as AOL and Earthlink users.
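The counting rule can be sketched with an in-memory dictionary standing in for the SQL control table, which is an assumption made for illustration. Here limit is the number of hits allowed within the window and expire corresponds to ip_expire; a False result means respond with 403 (Forbidden).

```python
import time
from typing import Optional

class IPControl:
    """In-memory stand-in for the SQL control table: ip -> (count, last hit)."""

    def __init__(self, limit: int, expire: float):
        self.limit = limit    # hits allowed within the expire window
        self.expire = expire  # corresponds to ip_expire, in seconds
        self.table: dict[str, tuple[int, float]] = {}

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record a hit; False means respond with 403 (Forbidden)."""
        now = time.time() if now is None else now
        count, last = self.table.get(ip, (0, now))
        if now - last > self.expire:
            count = 0  # quiet for longer than expire: start counting again
        count += 1
        self.table[ip] = (count, now)
        return count <= self.limit
```

Raising limit is what lets many users behind one proxy address share the service, at the cost of letting a robot make more requests before it is refused.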
The cookie controls also require an SQL database. Currently MySQL, PostgreSQL, and SQLite are supported.
When a first request comes in from a web client (e.g., a request for a web form, but not for data), a unique cookie is generated with a 128-bit pseudo-random hash, and given to the browser. The cookie is then stored in the database with a timestamp showing when it was generated.
When a web client makes a request that requires a data response, a registered cookie is required. If no cookie is provided a 403 (Forbidden) result code is returned. If an expired cookie is provided a 408 (Expired Session) result code is returned.
A new cookie is generated on each connection from each client.
In order to subvert this control, a robot would have to process and store actual cookies. So far, they don’t do that.
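Issuing and checking cookies might be sketched as follows. In place of a hash, this Python sketch draws 128 random bits from the secrets module, and a dictionary stands in for the control table; both are substitutions, not the program's actual method. check() returns the HTTP status the text describes, treating an unregistered cookie the same as a missing one.

```python
import secrets
import time
from typing import Optional

class CookieControl:
    """Dictionary stand-in for the cookie records in the control table."""

    def __init__(self, expire: float = 3600):  # documented default: one hour
        self.expire = expire
        self.issued: dict[str, float] = {}

    def issue(self, now: Optional[float] = None) -> str:
        """Create and register a fresh cookie value (128 random bits, hex)."""
        value = secrets.token_hex(16)
        self.issued[value] = time.time() if now is None else now
        return value

    def check(self, cookie: Optional[str], now: Optional[float] = None) -> int:
        """Return the HTTP status for a data request carrying this cookie."""
        now = time.time() if now is None else now
        if cookie is None or cookie not in self.issued:
            return 403  # no cookie, or one that was never registered
        if now - self.issued[cookie] > self.expire:
            return 408  # expired session
        return 200
```

Since a new cookie is generated on each connection, a caller would check the incoming cookie and then call issue() to send a fresh one with the response.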
Some users have requested a way to provide links to individual whois records to their clients in email messages. A facility is provided to allow this practice without significant compromise to the system.
When the direct_link option is set in the whois.conf file, links are allowed with neither a cookie nor a referer, but not if that IP address has been used within the number of seconds provided in the option line.
This has the same problem as the IP controls with proxy clients, but it should work under most circumstances.
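The direct-link rule reduces to a per-address time check, sketched here in Python with a dictionary standing in for the control table. The function name is hypothetical, and counting every hit (allowed or not) as "used" is an assumption of this sketch.

```python
def direct_link_allowed(last_hits: dict[str, float], ip: str,
                        min_gap: float, now: float) -> bool:
    """Allow a cookie-less, referer-less link unless ip was used < min_gap s ago."""
    last = last_hits.get(ip)
    last_hits[ip] = now  # assumption: every hit counts as "used"
    return last is None or now - last >= min_gap
```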
Not all whois servers comply with RFC 954. Unfortunately that lack of compliance is so inconsistent that the same commands can produce wildly different results from server to server.
This client deals with the situation by sending fully-qualified requests only to NSI’s servers, and the simplest form of request to other servers. This tactic is not entirely reliable.
RFC 954: NICNAME/WHOIS
An optional table of TLDs and associated whois servers.
A configuration file for optional flags and other configurable values.
A configuration file for optional stripdisclaimer feature.
The format of the tld.conf file changed in version 2.7. Please be sure your file has leading dots (e.g. .au) if you are using a current version of BW Whois.
The tld.conf file for versions 3.0 and above includes servers for the .COM, .NET, and .ORG domains. Older versions of the program did not support tld.conf file lookups for these domains.
The default location for all the configuration files was changed to /etc/whois/ in version 3.1.
The stripheader feature was changed to stripdisclaimer in version 3.1. This feature now requires the sd.conf configuration file.
The whois command first appeared in 4.3BSD. The BW Whois command first appeared 2 December 1999.
See the HISTORY file for more detail about the history of BW Whois.
Bill Weinman <http://bw.org/>
You can find the latest version of BW Whois at <http://whois.bw.org/>.
You can send email to Bill Weinman using the web form at <http://bw.org/contact/>.
Copyright 1999-2012 William E. Weinman
This program is free software. You may modify and distribute it under the same terms as Perl itself.