This library deals with the analysis and construction of a URL,
Uniform Resource Locator. URLs are the basis for communicating locations
of resources (data) on the web. A URL consists of a protocol identifier
(e.g., HTTP, FTP) and a protocol-specific syntax further defining the
location. URLs are standardized in RFC-1738.
The implementation in this library covers only a small portion of the
defined protocols. Though the initial implementation followed RFC-1738
strictly, the current implementation is more relaxed to deal with frequent violations
of the standard encountered in practical use.
- author
- - Jan Wielemaker
- - Lukas Faulstich
- deprecated
- - New code should use library(uri), provided by the clib package.
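For readers migrating, the following is a minimal sketch of the same kind of decomposition done with library(uri), assuming that library is installed. It relies on uri_components/2, uri_data/3 and uri_query_components/2 from library(uri); the helper name uri_demo/4 is hypothetical, and the URL is the illustrative one used for parse_url/2 in this document.

```prolog
:- use_module(library(uri)).

% uri_demo/4 is a hypothetical helper, not part of either library.
% It breaks a URL into generic components using library(uri).
% Unlike parse_url/2, uri_components/2 returns the query string
% undecoded; uri_query_components/2 turns it into Name=Value pairs.
uri_demo(Scheme, Host, Path, Fields) :-
    uri_components('http://www.xyz.org/hello?msg=Hello+World%21#x',
                   Components),
    uri_data(scheme, Components, Scheme),
    uri_data(authority, Components, Host),
    uri_data(path, Components, Path),
    uri_data(search, Components, Query),
    uri_query_components(Query, Fields).
```

Calling `uri_demo(S, H, P, F)` binds S to http, H to 'www.xyz.org' and P to '/hello'; F holds the decoded query fields.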
- global_url(+URL, +Base, -Global) is det
- Translate a possibly relative URL into an absolute one.
- Errors
- - syntax_error(illegal_url) if URL is not legal.
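A minimal sketch of how global_url/3 might be used to resolve the links found on a page. The helper names resolve_links/3 and resolve_link/3 and all URLs are illustrative assumptions, not part of the library.

```prolog
:- use_module(library(url)).
:- use_module(library(apply)).

% resolve_links/3 is a hypothetical helper: resolve every link in
% Links against the base URL Base using global_url/3.
resolve_links(Base, Links, Globals) :-
    maplist(resolve_link(Base), Links, Globals).

% Swap the argument order so the base can be fixed in maplist/3.
resolve_link(Base, Link, Global) :-
    global_url(Link, Base, Global).

% Example query (results for the relative links depend on the
% resolution rules of the library):
% ?- resolve_links('http://www.example.org/docs/index.html',
%                  ['intro.html', '/about', 'http://other.example.com/'],
%                  Globals).
```

A link that is already absolute and contains no %-escapes, such as the last one in the query, is returned unchanged.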
- is_absolute_url(+URL)
- True if URL is an absolute URL. That is, a URL that starts with
a protocol identifier.
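As a small illustration, the sketch below separates absolute from relative links. The helper name split_links/3 and the sample links are assumptions made for the example.

```prolog
:- use_module(library(url)).
:- use_module(library(apply)).

% split_links/3 is a hypothetical helper: Absolute gets the links
% for which is_absolute_url/1 succeeds, Relative gets the rest.
split_links(Links, Absolute, Relative) :-
    partition(is_absolute_url, Links, Absolute, Relative).

% ?- split_links(['http://www.example.org/', '/images/logo.png'], A, R).
% A = ['http://www.example.org/'],
% R = ['/images/logo.png'].
```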
- http_location(?Parts, ?Location)
- Construct or analyze an HTTP location. This is similar to
parse_url/2, but only deals with the location part of an HTTP
URL. That is, the path, search and fragment specifiers. In the
HTTP protocol, the first line of a message is
<Action> <Location> HTTP/<version>
- Arguments:
- - Location: Atom or list of character codes.
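The analysis direction can be sketched as follows, assuming the location has already been taken from the request line. The helper name location_path_and_search/3 and the sample location are illustrative only.

```prolog
:- use_module(library(url)).

% location_path_and_search/3 is a hypothetical helper: extract the
% path and the decoded search fields from an HTTP location. Search
% is the empty list if the location has no search part.
location_path_and_search(Location, Path, Search) :-
    http_location(Parts, Location),
    memberchk(path(Path), Parts),
    (   memberchk(search(Search0), Parts)
    ->  Search = Search0
    ;   Search = []
    ).

% ?- location_path_and_search('/hello?msg=Hello+World%21#x', Path, Search).
```

Using memberchk/2 avoids depending on the order in which the parts are returned.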
- csearch(+Attributes)//[private]
- cvalue(+Value)// is det[private]
- Construct a string from Value. Value is either atomic or a
code-list.
- cfragment(+Attributes)//[private]
- parse_url(?URL, ?Attributes) is det
- Construct or analyse a URL. URL is an atom holding a URL or a
variable. Attributes is a list of components. Each component is
of the format Name(Value). Defined components are:
- protocol(Protocol)
- The used protocol. This is, after the optional url:, an
identifier separated from the remainder of the URL using :.
parse_url/2 assumes the http protocol if no protocol is
specified and the URL can be parsed as a valid HTTP url. In
addition to the RFC-1738 specified protocols, the file
protocol is supported as well.
- host(Host)
- Host-name or IP-address on which the resource is located.
Supported by all network-based protocols.
- port(Port)
- Integer port-number to access on the Host. This only
appears if the port is explicitly specified in the URL.
Implicit default ports (e.g., 80 for HTTP) do not appear
in the part-list.
- path(Path)
- (File-) path addressed by the URL. This is supported for the
ftp, http and file protocols. If no path appears, the
library generates the path /.
- search(ListOfNameValue)
- Search-specification of HTTP URL. This is the part after the
?, normally used to transfer data from HTML forms that
use the HTTP GET method. In the URL it consists of a
www-form-encoded list of Name=Value pairs. This is mapped to
a list of Prolog Name=Value terms with decoded names and
values.
- fragment(Fragment)
- Fragment specification of HTTP URL. This is the part after
the # character.
The example below illustrates all of this for an HTTP URL.
?- parse_url('http://www.xyz.org/hello?msg=Hello+World%21#x',
P).
P = [ protocol(http),
host('www.xyz.org'),
fragment(x),
search([ msg = 'Hello World!'
]),
path('/hello')
]
By instantiating the parts-list this predicate can be used to
create a URL.
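The construction direction can be sketched as below. The helper name build_url/1 is hypothetical, and the parts are the same illustrative values as in the example above. The exact escaping used in the generated search part is left to the library, so no output is shown.

```prolog
:- use_module(library(url)).

% build_url/1 is a hypothetical helper: call parse_url/2 with an
% unbound URL and an instantiated parts list to create a URL atom.
build_url(URL) :-
    parse_url(URL,
              [ protocol(http),
                host('www.xyz.org'),
                path('/hello'),
                search([msg = 'Hello World!']),
                fragment(x)
              ]).

% ?- build_url(URL), parse_url(URL, Parts).
```

Parsing the generated URL again, as in the query, is a simple way to check that the parts survive the round trip.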
- parse_url(+URL, +BaseURL, -Attributes) is det
- Similar to parse_url/2 for relative URLs. If URL is relative,
it is resolved using the absolute URL BaseURL.
- globalise_path(+LocalPath, +RelativeTo, -FullPath) is det[private]
- The first clause deals with the standard URL /... global paths.
The second deals with file://drive:path on MS-Windows. This is a bit
of a kludge, but unfortunately common practice, especially on
Windows, does not always follow the standard.
- absolute_url//[private]
- True if the input describes an absolute URL. This means it
starts with a URL schema. We demand a schema of length > 1 to
avoid confusion with Windows drive letters.
- uri(-Parts)//[private]
- schema(-Atom)//[private]
- Schema is case-insensitive and the canonical version is
lowercase.
Schema ::= ALPHA *(ALPHA|DIGIT|"+"|"-"|".")
- hier_part(+Schema, -Parts, ?Tail)//[private]
- query(-Parts, ?Tail)// is det[private]
- Extract &Name=Value, ...
- search_sep// is semidet[private]
- Matches a search-parameter separator. Traditionally, this is the
&-char, but these days there are `newstyle' ;-char separators.
- See also
- - http://perldoc.perl.org/CGI.html
- To be done
- - This should be configurable
- fragment(-Fragment, ?Tail)//[private]
- Extract the fragment (after the # character).
- fragment_char(-Char)[private]
- Find a fragment character.
- pchar(-Code)//[private]
- unreserved|pct_encoded|sub_delim|":"|"@"
Performs UTF-8 decoding of percent encoded strings.
- lwalpha(-C)//[private]
- Demand alpha, return as lowercase
- sub_delim(?Code)[private]
- Sub-delimiters
- unreserved(+C)[private]
- Characters that can be represented without percent escaping
(RFC 3986, section 2.3).
- www_form_encode(+Value, -XWWWFormEncoded) is det
- www_form_encode(-Value, +XWWWFormEncoded) is det
- En/decode to/from application/x-www-form-urlencoded. Encoding
percent-encodes all characters except the RFC 3986 unreserved
characters: ASCII alnum (see code_type/2) and one of "-._~".
Newline is mapped to %0D%0A. When decoding, newlines appear as a
single newline (10) character. Note that a space is encoded as
%20 instead of +. Decoding decodes both to a space.
- deprecated
- - Use uri_encoded/3 for new code.
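Both modes can be illustrated with a short round trip. The helper name form_round_trip/3 and the sample value are assumptions made for the example.

```prolog
:- use_module(library(url)).

% form_round_trip/3 is a hypothetical helper: encode Value, then
% decode the result again using the second mode of www_form_encode/2.
form_round_trip(Value, Encoded, Decoded) :-
    www_form_encode(Value, Encoded),
    www_form_encode(Decoded, Encoded).

% ?- form_round_trip('Hello World', Encoded, Decoded).
% ?- www_form_encode(Value, 'Hello+World').
```

Following the rules described for this predicate, the first query is expected to encode the space as %20, while the second decodes the + to a space.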
- www_encode(+Codes, +ExtraUnescaped)//[private]
- www_decode(-Codes)//[private]
- set_url_encoding(?Old, +New) is semidet
- Query and set the encoding for URLs. The default is utf8.
The only other defined value is iso_latin_1.
- To be done
- - Having a global flag is highly inconvenient, but a
work-around for old sites using ISO Latin 1 encoding.
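Because the setting is global, one way to limit its effect is to change it only for the duration of a goal. The sketch below assumes that calling set_url_encoding/2 with an unbound first argument returns the previous value; the helper name with_url_encoding/2 is hypothetical.

```prolog
:- use_module(library(url)).

:- meta_predicate with_url_encoding(+, 0).

% with_url_encoding/2 is a hypothetical helper: run Goal once with
% the URL encoding set to Encoding and restore the previous value
% afterwards, also if Goal fails or raises an exception.
with_url_encoding(Encoding, Goal) :-
    setup_call_cleanup(
        set_url_encoding(Old, Encoding),
        once(Goal),
        set_url_encoding(_, Old)).

% ?- with_url_encoding(iso_latin_1,
%                      parse_url('http://www.example.org/caf%E9', Parts)).
```

This does not make the flag thread-safe; it only keeps the change from leaking into later code in the same thread of control.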
- url_iri(+Encoded, -Decoded) is det
- url_iri(-Encoded, +Decoded) is det
- Convert between a URL, encoded in US-ASCII, and an IRI. An IRI
is a fully expanded Unicode string. Unicode strings are first
encoded into UTF-8, after which %-encoding takes place.
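A short sketch of both directions, assuming the default utf8 URL encoding is in effect. The helper name iri_round_trip/3 and the URL are illustrative; %C3%A9 is the UTF-8 encoding of the character é (U+00E9).

```prolog
:- use_module(library(url)).

% iri_round_trip/3 is a hypothetical helper: decode a %-encoded URL
% into an IRI and encode that IRI back into a URL.
iri_round_trip(URL, IRI, URL2) :-
    url_iri(URL, IRI),
    url_iri(URL2, IRI).

% ?- iri_round_trip('http://www.example.org/caf%C3%A9', IRI, URL2).
```

IRI is expected to contain the literal é character. URL2 should be equivalent to the input, although the case of the hexadecimal digits in the escapes is not guaranteed by this sketch.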
- parse_url_search(?Spec, ?Fields:list(Name=Value)) is det
- Construct or analyze an HTTP search specification. This deals
with form data using the MIME-type
application/x-www-form-urlencoded as used in HTTP GET requests.
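The analysis direction might be used to look up a single form field, as in this sketch. The helper name search_field/3 and the query string are assumptions made for the example.

```prolog
:- use_module(library(url)).

% search_field/3 is a hypothetical helper: parse the query string
% Query and return the value of the first field called Name.
% Fails if there is no such field.
search_field(Query, Name, Value) :-
    parse_url_search(Query, Fields),
    memberchk(Name=Value, Fields).

% ?- search_field('q=prolog&lang=en', lang, Lang).
```

With the query shown, Lang is expected to be bound to the atom en.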
- file_name_to_url(+File, -URL) is det
- file_name_to_url(-File, +URL) is semidet
- Translate between a filename and a file:// URL.
- To be done
- - Current implementation does not deal with paths that
need special encoding.
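Both directions can be tried with a simple round trip. The helper name file_url_round_trip/3 and the file name are illustrative, and the sketch assumes a POSIX-style absolute path without characters that need escaping.

```prolog
:- use_module(library(url)).

% file_url_round_trip/3 is a hypothetical helper: translate a file
% name into a file:// URL and translate that URL back into a name.
file_url_round_trip(File, URL, File2) :-
    file_name_to_url(File, URL),
    file_name_to_url(File2, URL).

% ?- file_url_round_trip('/tmp/notes.txt', URL, File2).
```

On a Unix-like system this is expected to give the URL file:///tmp/notes.txt and the original file name back. Relative names and Windows paths may give different results.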