SWI-Prolog -- atom

LogicalCaptain said (2020-10-23T12:55:25):

Is atom_length/2 efficient?

This test:

atom_length(A,1)

tests whether the length of A is 1.

It is (should be) equivalent to

atom_length(A,L), L=1.

i.e. the length of A is determined first, then unified with 1.

This may be efficient or inefficient depending on how the length of a string or atom is stored: if the text is a NUL-terminated C string, finding its length means scanning it, which is inefficient.

Should there be a task-specific atom_length_test(A,L) predicate where L needs to be instantiated, which can fail-fast?

TODO: Write some test code that determines length of increasingly larger atoms to see what happens.
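
A minimal sketch of such a test, assuming SWI-Prolog. The names time_atom_length/3 and bench_atom_length/0 are made up for illustration, and the atom sizes and repetition count are arbitrary. If the time per call stays flat as the atom grows, the length is presumably stored rather than recomputed.

```prolog
% Hypothetical helper: CPU time (in seconds) for N calls of atom_length/2
% on an atom consisting of Len copies of the character 'a'.
time_atom_length(Len, N, Seconds) :-
    length(Codes, Len),
    maplist(=(0'a), Codes),
    atom_codes(Atom, Codes),
    statistics(cputime, T0),
    forall(between(1, N, _), atom_length(Atom, _)),
    statistics(cputime, T1),
    Seconds is T1 - T0.

% Hypothetical driver: print timings for increasingly large atoms.
bench_atom_length :-
    forall(member(Len, [10, 1000, 100000, 1000000]),
           (   time_atom_length(Len, 1000000, Seconds),
               format("atom of length ~d: ~3f s for 1000000 calls~n",
                      [Len, Seconds])
           )).
```

Run it with ?- bench_atom_length. and compare the reported times across sizes.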

Doc and design need help or surgery

In fact, atom_length/2 takes "anytext" (see Predicates that operate on strings) as first argument:

                                          +--- emptylist: the empty list, an edge case (is it charlist or codelist? we don't know!)
                                          |
                         +--- textlist ---+--- charlist : nonempty list of chars ("characters", atoms of length 1)
                         |                |
                         |                +--- codelist : nonempty list of unicode code points (integers between 0 and 0x10FFFF)
                         |
            +--- text ---+
            |            |
            |            |               +--- atom : Prolog atoms, including the empty atom
            |            |               |
anytext  ---+            +--- stringy ---+
            |                            |
            |                            +--- string : SWI-Prolog strings, including the empty string
            |
            +--- number  acceptable because a number can be
                         transformed into text (according to some
                         unspecified convention...)

We read:

The SWI-Prolog version accepts all atomic types, as well as code-lists and character-lists. New code should avoid this feature and use write_length/3 to get the number of characters that would be written if the argument was handed to write_term/3.

This is not quite right, and write_length/3 is not a real replacement: it works on "terms", not on "texty things" (anytext), so it writes out the list [a,b,c] as "[a,b,c]", yielding length 7, whereas atom_length/2 serializes it as 'abc', yielding length 3.
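
To see the difference side by side, here is a small sketch. The helper name text_vs_term_length/3 is made up for illustration; it only calls atom_length/2 and write_length/3 with an empty option list.

```prolog
% Hypothetical helper: compare the "text" length and the "written term"
% length of the same argument.
text_vs_term_length(Term, AtomLength, WriteLength) :-
    atom_length(Term, AtomLength),        % treats [a,b,c] as the text abc
    write_length(Term, WriteLength, []).  % counts the characters of [a,b,c]
```

For the list discussed above, ?- text_vs_term_length([a,b,c], A, W). should report A = 3 and W = 7.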

The fact that there is a string_length/2 and atom_length/2 that also take arbitrary "texty things" as first arguments is relatively confusing.

How it should be done:

atom_length/2 should accept atoms or strings (because of migration issues when strings were introduced)
string_length/2 shouldn't exist, it's just atom_length/2 pretending to care about strings
stringy_length/2 should exist as an alias to atom_length/2 with a conceptually clearer name, as it accepts strings and atoms
texty_length/2 also accepts text lists
anytext_length/2 also accepts numbers, and that is probably a step too far already. You cannot really pretend to sweep the formatting issue under the rug:

?- format(string(S),"~g",[pi]),atom_length(S,L).
S = "3.14159",
L = 7.

?- format(string(S),"~f",[pi]),atom_length(S,L).
S = "3.141593",
L = 8.

?- PI is pi, atom_length(PI,L).
PI = 3.141592653589793,
L = 17.

In all cases, the question of whether to throw or fail on out-of-type or out-of-domain values on first or second position stays open.

My instinct would be to

put as little cognitive load as possible on the developer (no special cases) and
make the program "brittle by default" (any fishy thing leads to an announcement by exception unless it has been explicitly okayed by the developer).

So always throw as default (throws can be caught & suppressed while fails leave you none the wiser and can never be transformed into meaningful errors), and add a "lenient version" that fails instead: atom_length_lenient/2.

Or better yet, add a Prolog extension similar to with_output_to/2. Something like with_lenient_length(SubGoal) that suppresses all exceptions thrown by length predicates in the proof tree for SubGoal.

That sounds suitably radical! I love it.
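
The lenient variant mentioned above could be sketched with catch/3. This is only an illustration: atom_length_lenient/2 is not an existing built-in, and this version turns every ISO-style error(_, _) exception from atom_length/2 into plain failure.

```prolog
% Hypothetical lenient variant: fail instead of throwing.
atom_length_lenient(Text, Length) :-
    catch(atom_length(Text, Length),
          error(_, _),      % any standard error term
          fail).
```

Note that wrapping a whole SubGoal in catch/3 the same way would not give with_lenient_length/1 as described: an exception would abandon the entire SubGoal rather than make only the offending length call fail.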

atom_length has tunable behaviour

current_prolog_flag says:

... if the system-wide flag iso is set then ...

atom_length/2 yields a type error if the first argument is a number.

Otherwise it serializes the number into an atom (according to some unspecified convention) and reports the length of that. This may be practical but seems odd, because it's no longer a purely syntactic operation:

?-  atom_length(1,N).
N = 1.

?- atom_length(1.334,N).
N = 5.

?- atom_length(1r3,N).
N = 3.

However, as the manual says: "New code should avoid this feature and use write_length/3 to get the number of characters that would be written if the argument was handed to write_term/3."
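
To try the ISO behaviour without changing the flag permanently, something like the following sketch might do. The name atom_length_iso/2 is made up, and it assumes the iso flag is consulted each time atom_length/2 is called.

```prolog
% Hypothetical helper: call atom_length/2 with the iso flag temporarily set,
% restoring the previous value afterwards (also on error).
atom_length_iso(Text, Length) :-
    current_prolog_flag(iso, Old),
    setup_call_cleanup(
        set_prolog_flag(iso, true),
        atom_length(Text, Length),
        set_prolog_flag(iso, Old)).
```

Under that assumption, ?- atom_length_iso(1, N). should raise a type error, while ?- atom_length_iso(abc, N). still answers N = 3.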

What is a 'character' aka. 'char'?

char(A) :- atom(A),atom_length(A,1).

Note this little inconsistency

?- atom_length(a,-1).
false.

So atom_length/2 is lenient regarding the presence of a negative length.

But length/2 (i.e. list length) is not:

?- length([1],-1).
ERROR: Domain error: `not_less_than_zero' expected, found `-1'

In fact, the ISO standard says that atom_length/2 should throw on negative length.

On the other hand, atom_length/2 is demanding on the first argument:

?- atom_length(fg(h),-1).
ERROR: Type error: `text' expected, found `fg(h)' (a compound)

?- atom_length([g,7],-1).
ERROR: Type error: `character' expected, found `7' (an integer)

?- atom_length([7,g],-1).
ERROR: Type error: `character_code' expected, found `g' (an atom)

And it is also demanding on the type of the second argument:

?- atom_length([7],foo).
ERROR: Type error: `integer' expected, found `foo' (an atom)

Prolog is full of subtleties but this makes it hard to think about.
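
If the leniency on negative lengths noted above is unwanted, a wrapper can add the check. This is a sketch only: atom_length_strict/2 is a made-up name, and the domain error term simply mirrors the one length/2 reports above.

```prolog
:- use_module(library(error)).   % domain_error/2

% Hypothetical strict variant: throw on a negative length, like length/2.
atom_length_strict(Text, Length) :-
    (   integer(Length),
        Length < 0
    ->  domain_error(not_less_than_zero, Length)
    ;   atom_length(Text, Length)
    ).
```

With this, ?- atom_length_strict(a, -1). throws a domain error instead of failing, and all other cases are passed through to atom_length/2 unchanged.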

A list is fine too

It also takes lists of characters:

?- atom_length([a,b,c],N).
N = 3.

?- atom_length([a,b,12.33],N).
ERROR: Type error: `character' expected, found `12.33' (a float)

?- atom_length([h,e,l,l,o],N).
N = 5.

?- atom_length([h,e,l,l,o,555],N).
ERROR: Type error: `character' expected, found `555' (an integer)

And lists of character codes (Unicode code points):

?- atom_length([0'1,0'2,0'.,0'3],N).
N = 4.

?- X=`hello`.
X = [104, 101, 108, 108, 111].

?- atom_length(`hello`,N).
N = 5.

However, it refuses to work if the list isn't ground, although in principle it could answer 3 in the following case and be at least tentatively right:

?- atom_length([a,Y,c],X).
ERROR: Arguments are not sufficiently instantiated
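
A "tentative" answer of that kind could be sketched as follows. The name tentative_text_length/2 is made up, and the sketch assumes that each unbound element will later become exactly one char or code, which nothing enforces.

```prolog
% Hypothetical variant: for a proper list with unbound elements, report the
% list length; otherwise defer to atom_length/2.
tentative_text_length(Text, Length) :-
    (   is_list(Text),
        \+ ground(Text)
    ->  length(Text, Length)
    ;   atom_length(Text, Length)
    ).
```

For example, ?- tentative_text_length([a,Y,c], X). answers X = 3 instead of raising an instantiation error.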

Prolog is full of subtleties

Unit test code

Some unit test code (including unit test code for length/2) can be found here:

test_length_against_iso_prolog_wg17.pl

This is based on

https://www.complang.tuwien.ac.at/ulrich/iso-prolog/length

Replacement

Here is simple code for a single stringy_length/2, which calls atom_length/2 or string_length/2 as needed. The built-ins already behave this way, but their names remain troublingly type-specific, which I would like to get rid of:

stringy_length/2 and friends
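
The linked file is not reproduced here. As a rough idea of what such a predicate could look like, here is a minimal sketch; it is not the linked code, and the choice to throw on anything that is neither an atom nor a string (with the made-up type name stringy) is an assumption in line with the "throw by default" preference above.

```prolog
:- use_module(library(error)).   % type_error/2, instantiation_error/1

% Hypothetical sketch: length of an atom or a string, nothing else.
stringy_length(Stringy, Length) :-
    (   atom(Stringy)
    ->  atom_length(Stringy, Length)
    ;   string(Stringy)
    ->  string_length(Stringy, Length)
    ;   var(Stringy)
    ->  instantiation_error(Stringy)
    ;   type_error(stringy, Stringy)
    ).
```

So ?- stringy_length(hello, L). and ?- stringy_length("hello", L). both answer L = 5, while a list or a number raises a type error.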

See also