This SRFI is currently in ``draft'' status.
To see an explanation of
each status that a SRFI can hold, see here.
To provide input on this SRFI, please
mail to
`<srfi minus 113 at srfi dot schemers dot org>`

. See
instructions here to
subscribe to the list. You can access previous messages via
the archive of the mailing list.

- Received: 2013/05/31
- Draft: 2013/06/01-2013/08/01

*Sets* and *bags* (also known as multisets) are unordered collections that can contain any Scheme object. Sets enforce the constraint that no two elements can be the same in the sense of the set's associated *equality predicate*; bags do not. *Integer sets* are a variant of sets that can only contain non-negative exact integers that are less than a maximum value specified when the integer set is created, and that always use numerical equality as their equality predicate.

16. What provisions should we have for controlling which element of a set is preserved when a new element equal to an existing element is added?

Sets are a standard part of the libraries of many high-level programming languages, including Smalltalk, Java, and C++. Racket provides general sets similar to those of this proposal, though with fewer procedures, and there is a Chicken egg called `sets`

(unfortunately undocumented), which provides a minimal set of procedures.

Bags are useful for counting anything from a fixed set of possibilities, e.g. the number of each type of error in a log file or the number of uses of each word in a lexicon drawn from a body of documents. Although other data structures can serve the same purpose, using bags clearly expresses the programmer's intent and can be optimized.

Although sets subsume integer sets, providing them separately allows for increased implementation efficiency. There are two Chicken eggs that provide integer sets: `iset`

, which is conceptually similar to the integer sets described here, but using bignums rather than bytevectors; and `cis`

, which uses lists of integer intervals.

Insofar as possible, the names in this SRFI are harmonized with the names used for ordered collections (lists, strings, vectors, and bytevectors) in Scheme. However, `size`

is used instead of `length`

to express the number of elements in the collection, because `length`

implies order.

Although it's possible to use the general sets of this SRFI to contain characters, the use of SRFI 14 is recommended instead. The names in this SRFI are harmonized with SRFI 14, except that SRFI 14 does not contain analogues of the `set>>?`

, `set<=?`

, `set>=?`

, `set-remove`

, or `set-partition`

procedures.

Sets do not have a lexical syntax representation. It's possible to use SRFI 108 quasi-literal constructors to create them in code, but this SRFI does not standardize how that is done.

The interface to general sets and bags depends on SRFI 114 comparators, despite that SRFI having a higher number than this one (for hysterical raisins). Comparators conveniently package the equality predicate of the set with the hash function or comparison procedure needed to implement the set efficiently.

During the discussion of this SRFI, a new expression type `enumeration-case`

similar to `case`

was proposed. It would dispatch based on a member of an enumeration set, with enumeration-specific error checking? Specifically, an error would be signaled if the cases are not exhaustive, or if any clause datum is not a member of the enumeration set. Unfortunately, due to phasing problems, `enumeration-case`

would only work with enumerations defined using `define-enum`

, and it would be awkward to implement it with `syntax-rules`

only. Therefore, it was not included.

Sets, bags, and integer sets are mutually disjoint, and disjoint from other types of Scheme objects.

It is an error for any procedure defined in this SRFI to be invoked on sets or bags with different comparators.

The procedures of this SRFI, by default, are "pure functional" — they do not alter their parameters. However, this SRFI defines a set of "linear-update" procedures, all of whose names end in `!`

. They have hybrid pure-functional/side-effecting semantics: they are allowed, but not required, to side-effect one of their parameters in order to construct their result. An implementation may legally implement these procedures as pure, side-effect-free functions, or it may implement them using side effects, depending upon the details of what is the most efficient or simple to implement in terms of the underlying representation.

It is an error to rely upon these procedures working by side effect. For example, this is not guaranteed to work:

(let* ((set1 (set 'a 'b 'c)) ; set1 = {a,b,c}. (set2 (set-adjoin! 'd))) ; Add d to {a,b,c}. set1) ; Could be either {a,b,c} or {a,b,c,d}.

However, this is well-defined:

(let ((set1 (set 'a 'b 'c))) (set-adjoin! set1 'd)) ; Add d to {a,b,c}.

So clients of these procedures write in a functional style, but must additionally be sure that, when the procedure is called, there are no other live pointers to the potentially-modified character set (hence the term "linear update").

There are two benefits to this convention:

Implementations are free to provide the most efficient possible implementation, either functional or side-effecting.

Programmers may nonetheless continue to assume that sets are purely functional data structures: they may be reliably shared without needing to be copied, uniquified, and so forth.

In practice, these procedures are most useful for efficiently constructing sets in a side-effecting manner, in some limited local context, before passing the character set outside the local construction scope to be used in a functional manner.

Scheme provides no assistance in checking the linearity of the potentially side-effected parameters passed to these functions — there's no linear type checker or run-time mechanism for detecting violations.

Note that if the implementation is entirely pure functional, it is allowed to return existing sets rather than newly allocated ones, even where this SRFI explicitly says otherwise.

Constructors:

`set`

,`set-adjoin`

,`set-delete`

,`set-unfold`

Predicates:

`set?`

,`set-contains?`

,`set-empty?`

,`set-disjoint?`

Accessor:

`set-value`

Linear update procedures:

`set-adjoin!`

,`set-delete!`

,`set-intern!`

The whole set:

`set-size`

,`set-find`

,`set-count`

,`hash-table-any?`

,`hash-table-every?`

Mapping and folding:

`set-map`

,`set-for-each`

,`set-fold`

,`hash-table-filter`

,`hash-table-filter!`

,`hash-table-remove`

,`hash-table-partition`

Copying and conversion:

`set-copy`

,`set-empty-copy`

,`set->list`

,`list->set`

,`list->set!`

Subsets:

`hash-table=?`

,`hash-table<?`

,`hash-table>?`

,`hash-table<=?`

,`hash-table>=?`

Set operations:

`hash-table-union!`

,`hash-table-intersection!`

,`hash-table-difference!`

Additional bag procedures:

`bag-element-count`

,`bag-for-each-unique`

,`bag-fold-unique`

,`bag-increment!`

,`bag-decrement!`

,`bag->set`

,`set->bag`

,`set->bag!`

Additional integer set procedures:

`integer-set`

,`make-universal-integer-set`

,`integer-set-complement`

,`integer-set-complement!`

,`integer-set-min`

,`integer-set-max`

`(set `

*comparator* *element* ... `)`

Returns a newly allocated empty set. The *comparator* argument is a SRFI 114 comparator. The *elements* are used to initialize the set.

`(set-adjoin `

*set*` `

*element* ... `)`

Returns a newly allocated set that uses the same comparator as *set* and contains all the values of *set*, and in addition each *element* unless it is already the same (in the sense of *set's* equality predicate) as an existing member. It is an error to add an element to *set* that does not return `#t`

when passed to the type test procedure of the comparator with which *set* was defined.

`(set-delete `

*set*` `

*element* ... `)`

Returns a newly allocated set containing all the values of *set* except for the *elements*. Any *element* that is not a member of the set is ignored.

`(set-unfold `

*comparator continue? mapper successor seed*`)`

Create a newly allocated set as if by `set`

using *comparator*. If the result of applying the predicate *continue?* to *seed* is `#f`

, return the set. Otherwise, apply the procedure *mapper* to *seed*. The value that *mapper* returns is added to the set. Then get a new seed by applying the procedure *successor* to *seed*, and repeat this algorithm.

`(set? `

*obj*`)`

Returns `#t`

if *obj* is a set, and `#f`

otherwise.

`(set-member? `

*set*` `

*element*`)`

Returns `#t`

if *element* is a member of *set* and `#f`

otherwise.

`(set-empty? `

*set*`)`

Returns `#t`

if *set* has no elements and `#f`

otherwise.

`(set-disjoint? `

*set _{1} set_{2}*

`)`

Returns `#t`

if *set _{1}* and

`#f`

otherwise.`(set-value `

*set*` `

*element*`)`

Returns the element of *set* that is equal, in the sense of *set's* equality predicate, to *element*. If *element* is not a member of *set*, it is returned.

`(set-adjoin! `

*set*` `

*element*`)`

Returns a set that uses the same comparator as *set* and contains all the values of *set*, and in addition each *element* unless it is already a member. It is an error to add an element to *set* that does not return `#t`

when passed to the type test procedure of the comparator with which *set* was defined.

`(set-delete! `

*set*` `

*element*`)`

Returns a set containing all the values of *set* except for the *elements*. Any *element* that is not equal (in the sense of*set's* equality predicate) to some member of the set is ignored.

`(set-intern! `

*set*` `

*element*`)`

Returns two values. If there is an element of *set* that is equal, in the sense of *set's* equality predicate, to *element*, then that element is returned as the first value and *set* is returned as the second value. If there is no such element, then the first value is *element* and the second value is the result of `(set-add! `

*set element*`)`

.

`(set-search! `

*set element failure success*`)`

The *set* is searched for *element*. If it is not found, then the *failure* procedure is tail-called with two continuation arguments, *insert* and *ignore*, and is expected to tail-call one of them. If *element* is found, then the *success* procedure is tail-called with the matching element of *set* and two continuations, *update* and *remove*, and is expected to tail-call one of them.

The effects of the continuations are as follows (where *obj* is any Scheme object):

Invoking

`(`

*insert obj*`)`

causes*element*to be inserted into*set*.Invoking

`(`

*ignore obj*`)`

causes*set*to remain unchanged.Invoking

`(`

*update new-element obj*`)`

causes*new-element*to be inserted into*set*.Invoking

`(`

*remove obj*`)`

causes the matching element of*set*to be removed from it.

In all cases, two values are returned: the possibly updated *set* and *obj*.

`(set-size `

*set*`)`

Returns the number of elements in *set* as an exact integer.

`(set-count `

*predicate*` `

*set*`)`

Returns the number of elements of *set* that satisfy *predicate* as an exact integer.

`(set-find `

*predicate set failure*`)`

Returns an arbitrarily chosen element of *set* that satisfies *predicate*, or the result of invoking *failure* with no arguments if there is none.

`(set-any? `

*predicate*` `

*set*`)`

Returns `#t`

if any element of *set* satisfies *predicate*, or `#f`

otherwise. Note that this differs from the SRFI 1 analogue because it does not return a useful value.

`(set-every? `

*predicate*` `

*set*`)`

Returns `#t`

if every element of *set* satisfies *predicate*, or `#f`

otherwise. Note that this differs from the SRFI 1 analogue because it does not return a useful value.

`(set-map `

*comparator*` `

*proc*` `

*set*`)`

Applies *proc* to each element of *set* in arbitrary order and returns a newly allocated set, created as if by `set`

, which contains the results of the applications. For example:

(set-map string-ci=? symbol->string (set eq? 'foo 'bar 'baz)) => (set string-ci=? "foo" "bar" "baz")

`(set-for-each `

*proc*` `

*set*`)`

Applies *proc* to *set* in arbitrary order, discarding the returned values. Returns an unspecified result.

`(set-fold `

*proc*` `

*nil*` `

*set*`)`

Invokes *proc* on each member of *set* in arbitrary order, passing the result of the previous invocation as a second argument. For the first invocation, *nil* is used as the second argument. Returns the result of the last invocation.

`(set-filter `

*predicate*` `

*set*`)`

Returns a newly allocated set, created as if by `set-empty-copy`

, containing just the elements of *set* that satisfy *predicate*.

`(set-filter! `

*predicate*` `

*set*`)`

A linear update procedure that returns a set containing just the elements of *set* that satisfy *predicate*.

`(set-remove `

*predicate*` `

*set*`)`

Returns a newly allocated set, created as if by `set-empty-copy`

, containing just the elements of *set* that do not satisfy *predicate*.

`(set-partition `

*predicate*` `

*set*`)`

Returns two values: a newly allocated set, created as if by `set-empty-copy`

, containing just the elements of *set* that satisfy *predicate*, and another newly allocated set, created as if by `set-empty-copy`

, containing just the elements of *set* that do not satisfy *predicate*.

`(set-copy `

*set*`)`

Returns a newly allocated set containing the elements of *set*, and using the same comparator.

`(set-empty-copy `

*set*`)`

Returns a newly allocated set containing no elements, and using the same comparator as *set*.

`(set->list `

*set*`)`

Returns a newly allocated list containing the members of *set* in unspecified order. However, repeated calls to this procedure will return a list in the same order until the set is mutated.

`(list->set `

*comparator list*`)`

Returns a newly allocated set, created as if by `set`

using *comparator*, that contains the elements of *list*.

`(list->set! `

*comparator list*`)`

Returns a set using *comparator* that contains the elements of *list*.

`(set=? `

*set* ...`)`

Returns `#t`

if each *set* contains the same elements.

`(set<? `

*set* ...`)`

Returns `#t`

if each *set* other than the last is a proper subset of the following *set*, and `#f`

otherwise.

`(set>? `

*set* ...`)`

Returns `#t`

if each *set* other than the last is a proper superset of the following *set*, and `#f`

otherwise.

`(set<=? `

*set* ...`)`

Returns `#t`

if each *set* other than the last is a subset of the following *set*, and `#f`

otherwise.

`(set>=? `

*set* ...`)`

Returns `#t`

if each *set* other than the last is a superset of the following *set*, and `#f`

otherwise.

`(set-union `

*set _{1}*

` `

`)`

`(set-intersection `

*set _{1}*

` `

`)`

`(set-difference `

*set _{1}*

` `

`)`

`(set-xor `

*set _{1}*

` `

`)`

Returns a newly allocated set that is the union, intersection, asymmetric difference, or symmetric difference of the *sets*. Asymmetric difference is extended to more than two sets by taking the difference between the first set and the union of the others. Symmetric difference is not extended beyond two sets. Elements in the result set are drawn from the first set in which they appear.

`(set-union! `

*set _{1}*

` `

`)`

`(set-intersection! `

*set _{1}*

` `

set`)`

`(set-difference! `

*set _{1}*

` `

`)`

`(set-xor! `

*set _{1}*

` `

`)`

Linear update procedures returning a set that is the union, intersection, asymmetric difference, or symmetric difference of the *sets*. Asymmetric difference is extended to more than two sets by taking the difference between the first set and the union of the others. Symmetric difference is not extended beyond two sets. Elements in the result set are drawn from the first set in which they appear.

Bags are like sets, but can contain the same object more than once. However, if two elements that are the same in the sense of the equality predicate, but not in the sense of `eqv?`

, are both included, it is not guaranteed that they will remain distinct when retrieved from the bag. It is an error for a single procedure to be invoked on bags with different comparators.

The procedures for creating and manipulating bags are the same as those for sets, except that `set`

is replaced by `bag`

in their names, and that adding an element to a bag is effective even if the bag already contains the element. If two elements in a bag are the same in the sense of the equality predicate, the implementation may in fact store just one of them. There are no procedures named `bag-xor`

or `bag-xor!`

.

`(bag-element-count `

*bag*` `

*element*`)`

Returns an exact integer representing the number of times that *element* appears in *bag*.

`(bag-for-each-unique `

*proc*` `

*bag*`)`

Applies *proc* to each unique element of *bag* in arbitrary order, passing the element and the number of times it occurs in *bag*, and discarding the returned values. Returns an unspecified result.

`(bag-fold-unique `

*proc*` `

*nil*` `

*bag*`)`

Invokes *proc* on each unique element of *bag* in arbitrary order, passing the number of occurrences as a second argument and the result of the previous invocation as a third argument. For the first invocation, *nil* is used as the third argument. Returns the result of the last invocation.

`(bag-increment! `

*bag element count*

`)`

`(bag-decrement! `

*bag element count*

`)`

Linear update procedures that return a bag with the same elements as *bag*, but with the element count of *element* in *bag* increased or decreased by the exact integer *count* (but not less than zero).

`(bag->set `

*bag*`)`

`(set->bag `

*set*`)`

The `bag->set`

procedure returns a newly allocated set containing the unique elements of *bag*. The `set->bag`

procedure returns a newly allocated bag containing the elements of *set*. The comparator of the result is the same as the comparator of the argument.

The elements of an integer set are non-negative exact integers less than the set's *limit*, which is specified when it is created. Except as noted below, the procedures for creating and manipulating integer sets are the same as those for sets, except that `set`

is replaced by `integer-set`

in their names.

Wherever a newly allocated integer set is returned, it has the same limit as the source sets. It is an error for a single procedure to be invoked on integer sets with different limits.

There are no `integer-set-value`

or `integer-set-intern!`

procedures.

`(integer-set `

*limit element* ...`)`

Returns a newly allocated integer set. The possible elements of the set are the exact integers from 0 to *limit* - 1, where *limit* is an exact non-negative integer. The set contains the *elements*.

`(universal-integer-set `

*limit*`)`

Returns a newly allocated integer set. The possible elements of the set are the exact integers from 0 to *limit* - 1, where *limit* is an exact non-negative integer. The set contains all possible elements.

`(range->-integer-set `

*limit low high*`)`

Returns a newly allocated integer set. The possible elements of the set are the exact integers from 0 to *limit* - 1, where *limit* is an exact non-negative integer. The set contains the elements from *low* (inclusive) to *high* (exclusive).

`(integer-set-adjoin-range `

*integer-set low high*`)`

Returns a newly allocated integer set with the same limit as *integer-set* and that contains all its values, and in addition each integer from *low* (inclusive) to *high* (exclusive) unless it is already equal to an existing member.

`(integer-set-adjoin-range! `

*integer-set low high*`)`

A linear update procedure that returns an integer set with the same limit as *integer-set* and that contains all its values, and in addition each integer from *low* (inclusive) to *high* (exclusive) unless it is already equal to an existing member.

`(integer-set-complement `

*integer-set*`)`

Returns a newly allocated integer set that is the complement of *integer-set*.

`(integer-set-complement! `

*integer-set*`)`

A linear update procedure that returns a set that is the complement of *integer-set*.

`(integer-set-min `

*integer-set*`)`

`(integer-set-max `

*integer-set*`)`

Returns the smallest or largest integer in *integer-set*, or `#f`

if there is none.

`(integer-set-delete-min! `

*integer-set*`)`

`(integer-set-delete-max! `

*integer-set*`)`

Linear update procedures that return two values: the smallest or largest integer in *integer-set* or `#f`

if there is none, and a set which contains all the elements of *set* except the integer being returned.

The implementation places the identifiers defined above into the `sets`

library.

Sets and bags are implemented as a thin veneer over hashtables, and integer sets over bytevectors. For clarity, the integer set implementation stores just one value into each bytevector element rather than the eight that would be possible.

The sample implementation contains the following files:

`sets-impl.scm`

– implementation of general sets`bags-impl.scm`

– implementation of bags`isets-impl.scm`

– implementation of integer sets`count.scm`

– provides macros for conveniently looping over bytevectors`srfi-4-shim.scm`

– minimal implementation of R7RS bytevectors on top of SRFI 4's u8vectors`sets.sld`

– an R7RS library named`(sets)`

`sets.scm`

– a Chicken library`sets-test.scm`

– test suite

The test suite will work with the Chicken `test`

egg, which is provided on Chibi as the `(chibi test)`

library.

Copyright (C) John Cowan 2013. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Mike Sperber