.. _library_stemming:
stemming
This library provides word stemming predicates for English text, with support for different word representations: atoms, character lists, or character code lists.
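The library includes implementations of two well-known stemming algorithms:

- Porter stemmer (``porter_stemmer``), developed by Martin Porter in 1980
- Lovins stemmer (``lovins_stemmer``), developed by Julie Beth Lovins in 1968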
Open the `../../apis/library_index.html#stemming <../../apis/library_index.html#stemming>`__ link in a web browser.
To load all entities in this library, load the loader.lgt file:
::

   | ?- logtalk_load(stemming(loader)).
To test this library's predicates, load the tester.lgt file:
::

   | ?- logtalk_load(stemming(tester)).
The stemming predicates are defined in parametric objects where the parameter specifies the word representation:
- ``atom`` - words are represented as atoms
- ``chars`` - words are represented as lists of characters
- ``codes`` - words are represented as lists of character codes
The parameter must be bound when sending messages to the objects.
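As a minimal sketch of the ``codes`` representation, the following query converts an atom to a list of character codes using the standard ``atom_codes/2`` predicate, stems it, and converts the result back to an atom. The variable names are illustrative, and the sketch assumes that ``stem/2`` returns the stem in the same representation as its input.

::

   | ?- atom_codes(running, Codes),
        porter_stemmer(codes)::stem(Codes, StemCodes),
        atom_codes(Stem, StemCodes).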
Porter Stemmer
To stem a single word using atoms:

::

   | ?- porter_stemmer(atom)::stem(running, Stem).
   Stem = run
   yes

To stem a list of words:

::

   | ?- porter_stemmer(atom)::stems([running, walks, easily], Stems).
   Stems = [run, walk, easili]
   yes

Using character lists:

::

   | ?- porter_stemmer(chars)::stem([r,u,n,n,i,n,g], Stem).
   Stem = [r,u,n]
   yes

Lovins Stemmer
To stem a single word using atoms:
::

   | ?- lovins_stemmer(atom)::stem(running, Stem).
   Stem = run
   yes
To stem a list of words:
::

   | ?- lovins_stemmer(atom)::stems([running, walks, easily], Stems).
   Stems = [run, walk, eas]
   yes
.. _porter-stemmer-1:
Porter Stemmer
The Porter stemming algorithm, developed by Martin Porter in 1980, is one of the most widely used stemming algorithms for the English language. It operates through a series of steps that progressively remove suffixes from words:

1. **Step 1a**: Handle plurals (e.g., "caresses" → "caress", "ponies" → "poni")
2. **Step 1b**: Handle past tense and progressive forms (e.g., "agreed" → "agree")
3. **Step 1c**: Replace terminal "y" with "i" when the rest of the stem contains a vowel (e.g., "happy" → "happi")
4. **Steps 2-4**: Remove various suffixes based on the "measure" of the stem
5. **Step 5**: Clean up final "e" and double consonants

The algorithm uses the concept of "measure" (m), which counts vowel-consonant sequences in the stem, to determine when suffixes can be safely removed.

**Reference**: Porter, M.F. (1980). An algorithm for suffix stripping. Program, 14(3), 130-137.

.. _lovins-stemmer-1:

Lovins Stemmer
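The Lovins stemming algorithm, developed by Julie Beth Lovins in 1968, was one of the earliest stemming algorithms. It takes a different approach from Porter: instead of stripping suffixes over several steps, it removes the longest matching ending from a fixed list in a single pass, subject to a condition attached to that ending, and then applies recoding rules to adjust the resulting stem.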
The Lovins algorithm tends to be more aggressive than Porter, sometimes producing stems that are not actual words but are consistent across related word forms.
**Reference**: Lovins, J.B. (1968). Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1-2), 22-31.
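A side-by-side comparison can make the difference in aggressiveness concrete. The query below is a sketch that only reuses the ``stems/2`` calls from the usage examples; the variable names are illustrative. Going by the results listed in those examples, ``easily`` stems to ``easili`` with Porter but to the shorter ``eas`` with Lovins.

::

   | ?- Words = [running, walks, easily],
        porter_stemmer(atom)::stems(Words, Porter),
        lovins_stemmer(atom)::stems(Words, Lovins).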
Both algorithms are designed for English text only.
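To illustrate the "measure" used by the Porter algorithm, the following is a minimal sketch of how it could be computed. The ``measure_sketch`` object and its predicates are hypothetical and not part of this library, whose internal implementation may differ. The sketch assumes a lowercase atom as input and follows Porter's definition, in which "y" counts as a vowel only when it follows a consonant.

::

   :- object(measure_sketch).

       :- public(measure/2).

       % measure(+Word, -M): M is the number of vowel-consonant
       % sequences in Word, which must be a lowercase atom
       measure(Word, M) :-
           atom_chars(Word, Chars),
           classes(Chars, none, Classes),
           count_vc(Classes, 0, M).

       % classes(+Chars, +Previous, -Classes): tag each letter as
       % vowel or consonant; Previous is the tag of the preceding
       % letter (none at the start of the word)
       classes([], _, []).
       classes([Char| Chars], Previous, [Class| Classes]) :-
           class(Char, Previous, Class),
           classes(Chars, Class, Classes).

       % "y" counts as a vowel only when it follows a consonant
       class(a, _, vowel) :- !.
       class(e, _, vowel) :- !.
       class(i, _, vowel) :- !.
       class(o, _, vowel) :- !.
       class(u, _, vowel) :- !.
       class(y, consonant, vowel) :- !.
       class(_, _, consonant).

       % count_vc(+Classes, +M0, -M): count each vowel that is
       % immediately followed by a consonant
       count_vc([], M, M).
       count_vc([vowel, consonant| Classes], M0, M) :-
           !,
           M1 is M0 + 1,
           count_vc(Classes, M1, M).
       count_vc([_| Classes], M0, M) :-
           count_vc(Classes, M0, M).

   :- end_object.

After loading the object, the measure of a word can be queried directly. For example, "tree" has measure 0 and "troubles" has measure 2:

::

   | ?- measure_sketch::measure(troubles, M).
   M = 2
   yes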