Splitting: Using Multiple Threads in FISH

Definition

FISH splitting is passing an aggregate type (a list, an array, a container of objects, etc.) as a FISH function argument using the splitting operator (::), causing the function to be repeatedly performed on all members of the aggregate type. This is a more syntactically concise alternative to loop statements.

Consider:

local a = list
loop foreach local pnt data.scalar.list
   a = list.append(a,data.scalar.pos(pnt)->x)
endloop

This builds a list (a) of \(x\)-positions of all scalars on the global data scalar list. In an equivalent split version of the above, data.scalar.pos is repeatedly called on each member of the data.scalar.list — all in one line:
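A sketch of that one-line form, assuming the same intrinsics as the loop above and the split syntax described under Implementation:

local a = data.scalar.pos(::data.scalar.list)->x

Here the container data.scalar.list is split, data.scalar.pos is called once per scalar, and the ->x component access is applied to the resulting list of positions.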

Significantly, splitting is performed on all available threads automatically. Because looping over lists of model objects and other aggregate types is so common in FISH, this can reduce run time considerably while also shortening the code.

Implementation

  • In order to make a split call, affix the split operator :: to one or more arguments of the function (or operator or library call). Arguments that are not split will be the same in every execution of the function. If multiple arguments are split, they must contain the same number of elements.

  • Any iterable FISH type may be split. FISH iterable types include: pointers to containers, strings, vectors, tensors, matrices, lists, maps, and arrays.

  • A splitting operation on an intrinsic—regardless of that function’s original return value—returns a list.

  • FISH library functions must be thread-safe to use splitting (these are identified with the notation := in function reference documentation). Otherwise the splitting operation will still be performed, just sequentially on a single thread.

Tip

Splitting intrinsics is fast but generally it is not as efficient in a multi-threaded environment as a FISH operator (defined below). An operator using a single split will provide maximal efficiency (see the Another Comparison example below).

Most often splitting will be done on lists or on pointers to containers of model objects (such as grid points, balls, or blocks). The number of threads made available to splitting can be controlled using the global.threads intrinsic, though by default the program determines the optimal number of threads for maximum performance.

Use

Splitting

Splitting, as defined above and seen in the examples below, is passing a “split” argument to a FISH function that is thread-safe (and will thereby accept such an argument).

To illustrate the concept a bit further, splitting is a kind of abbreviation of a loop statement, which itself is an abbreviation of repeated single operations. The example above, if written as individual lines, might look like:

a = list
a = list.append(a,list.at(0,data.scalar.pos)->x)
a = list.append(a,list.at(1,data.scalar.pos)->x)
a = list.append(a,list.at(2,data.scalar.pos)->x)
; and so on up to n ...

One can readily see the programmatic desirability for loop constructs to handle range-based (e.g., 0-\(n\) as here) repeated function calls. Where loops reduce the repetition of function calls, splitting eliminates the need for loop statements themselves, leaving behind the core elements of concern: the range that governs the repetition, and the function to be repeated. The comparison example below illustrates.

Effectively using splitting requires a certain change of perspective and approach from traditional sequential programming. But once the user becomes comfortable, the reward is quickly and efficiently performing operations on large quantities of data using a relatively small amount of code.

List Filtering

Splitting in combination with boolean list filtering can be used to quickly find a sub-list of objects selected by a specific criterion. For example, the following code will find the list of all data scalars that are tagged with the group ‘surface’.
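A step-by-step sketch of such a filter. It assumes a data.scalar.isgroup intrinsic that mirrors the gp.isgroup call used in the Examples section; the variable names are illustrative.

local on_surface = data.scalar.isgroup(::data.scalar.list,'surface') ; Boolean list, one entry per scalar
local surface_scalars = data.scalar.list(on_surface) ; Keep only the scalars flagged true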

This can be done as a single line of FISH as follows:
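A sketch of the single-line form, again assuming that data.scalar.isgroup behaves like the gp.isgroup call used in the Examples section:

local surface_scalars = data.scalar.list(data.scalar.isgroup(::data.scalar.list,'surface'))

The inner split call produces the boolean list, which is then used directly to filter the container.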

Splitting on Assignment

Splitting may also be performed on assignment to a library function. In this case, the user must indicate whether the right-hand-side of the assignment (after the equals sign) will be split or not. If not, the same value will be assigned to each split assignment to the function. If split, the elements of the list on the right will be assigned sequentially to each call to the function.

Right-hand splitting is indicated by appending the splitting operator :: to any of the assignment operators =, +=, -=, *=, or /=. So =::, +=::, -=::, *=::, and /=::.

For example, the following line of FISH will add a random value from 0.0 to 1.0 to the x-coordinate of every data scalar in the model.
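A sketch of such a line, assuming that math.random.uniform accepts a count and returns that many values, as it is used in the Assignment by Splitting example:

data.scalar.pos(::data.scalar.list)->x +=:: math.random.uniform(list.size(data.scalar.list))

Because the right-hand side is split (+=::), each scalar receives its own random increment. With a plain += the same single value would be added to every scalar.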

The concepts described above are further explored in the Examples section below.

Operators: Writing Functions for Multiple Threads

A FISH operator is a user-written function designed to be executed in a multi-threaded environment — that is, one that will accept splitting, as described above.

Because a function must be safe when multiple threads are running simultaneously, operators dwell in a restricted environment. A number of rules constrain operators differently from normal FISH functions:

  1. Operators must always take at least one argument.

  2. Operators cannot call normal FISH functions. They may call other operators.

  3. Operators may only call FISH library functions that are thread safe. Library functions are tagged as thread safe if the ‘=’ sign is preceded by a ‘:’ in the reference documentation.

  4. Operators may only write to global symbols when the lock statement is used.

  5. Operators cannot modify the values passed as arguments.

  6. Operators cannot call library functions on assignment (on the left hand side of an equality operator) and pass a pointer to an object, unless that object was passed in as an argument.

  7. Operators cannot call other operators and pass a pointer to an object, unless that object was passed in as an argument.

Be aware that reading or writing to global symbols must be synchronized across all executions running simultaneously, and can therefore severely affect overall performance. It is recommended that local variables be used exclusively where possible.

Operators are created using the fish operator command, with arguments following just like fish define. The FISH lines in the definition are the same as for a normal function, subject to the restrictions above.

In order to give examples of operators, we will first generate some data to operate upon. The following FISH function will generate 100,000 data scalars at random points in a 10x10x10 cube and give them random values from 0.0 to 10.0.

The following example operator determines whether the x-coordinate of a particular scalar object falls within a given range. If so, it assigns the scalar to the group inside in slot mark and returns true; otherwise it assigns the scalar to the group outside and returns false.

To execute the operator, call it as a normal FISH symbol with a split argument. Since it is an operator, the repeated calls will be executed on all available threads.
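A minimal sketch of such an operator together with its split call. The name in_range, its low and high parameters, and the bounds 2.0 and 4.0 are illustrative, and data.scalar.group is assumed to accept a slot name the same way zone.group does in the examples at the end of this section.

fish operator in_range(scalar,low,high)
    local x = data.scalar.pos(scalar)->x
    local inside = true
    if x < low then
        inside = false ; Below the range
    endif
    if x > high then
        inside = false ; Above the range
    endif
    if inside then
        data.scalar.group(scalar,'mark') = 'inside'
    else
        data.scalar.group(scalar,'mark') = 'outside'
    endif
    return inside ; Collected into a list by the split call
end
[in_range(::data.scalar.list,2.0,4.0)]

Only the scalar argument is split; low and high are the same in every execution.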

Note the use of the return statement in the operator. Since an operator is also a global symbol (just like a function), assigning a value to it would require the use of a lock statement and incur significant synchronization overhead. Instead, operators should always use the return statement to return values from the operation. As with all splitting, the return values are collected into a list.

There are occasions when reading and/or writing to a global symbol is unavoidable. In such cases, the lock statement needs to be used inside an operator to allow a global symbol to be written to. The following example uses a FISH operator to scan the scalar data for values inside a box given by a Cartesian extent, and returns the maximum, minimum, accumulated, and last value found in global symbols.

Finally, one of the most common and important uses of FISH operators (and indeed the primary reason for their creation) is execution during cycling. Without them, a single-threaded FISH function that checks or changes all objects in a model can easily dominate the run time of the system.

To assign an operator to execute during cycling, use either the fish callback command or the fish-call keyword to the model solve command. Specify the name of the operator, followed by the argument(s) that will be passed for each execution.

Generally the argument assigning the list of objects to iterate over is indicated using a FISH library function, which is not directly recognized by the command processor. This must therefore be indicated using inline fish. The :: prefix must be used to indicate that the argument is to be split, as always (see Splitting). Note that these values are evaluated when the command is processed, and stored for execution during cycling.

An example of this is below:

Examples

Basic Splitting Operations

model new
zone create brick

fish define dosplitting
    ; split a container of model objects
    local pos = gp.pos(::gp.list) ; pos is a list of positions of
                                  ; all the grid points in the model

    ; create a vector, then split it to get its largest square root
    ; then print the value
    local v = vector(1,2,3)
    local v2max = list.max(math.sqrt(::v))
    io.out(v2max)

    ; create a matrix and use list.xx functions to convert it
    ; to a list and obtain its max value
    local m = matrix(math.random.uniform(9),3,3) ; Convert 9 random values
                                                 ; to a 3x3 matrix
    local lm = list.max(list(m)) ; Find the maximum value of the matrix
                                 ; by converting into a list
end
[dosplitting]

Splitting is most often done on containers of model objects, or lists obtained from them. However, it can be done on all iterable types. In all cases these types can be converted easily to lists, which makes them amenable to operations with the list utility functions.

Find all grid points on group ‘Surface’, then sum the reaction force

fish define SurfaceForce(groupName)
    local ingroup = gp.isgroup(::gp.list,groupName) ; Boolean list, true if
                                                    ; on face group Surface
    local gpsin = gp.list(ingroup) ; Only those gridpoints that are
                                   ; part of face group 'Surface'
    local forces = gp.force.unbal(::gpsin)->z
    return list.sum(forces)

    ; Or all in one line
    return ...
     list.sum(gp.force.unbal(::gp.list(gp.isgroup(::gp.list,groupName)))->z)
end
[SurfaceForce ('Surface')]

Assignment by Splitting

model new
zone create brick
zone cmodel assign elastic
fish define splittingright
    ; Add the same value to the XX stress of every zone in the model
    zone.stress(::zone.list)->xx += 500
    ; Add a different random value from 0 to 1 to the x position
    ; of every grid point in the model
    gp.pos(::gp.list)->x +=:: math.random.uniform(list.size(gp.list))
end
[splittingright]

Comparison Example

Both functions here calculate and store 24 (\(x\),\(y\)) values of a unit circle in 15 degree increments.

Old-Style

model new
fish define circle
    global store = list.create(24)
    loop local angle (15,360, 15)
        local radian = angle * math.degrad
        local v = vector (math.cos(radian), math.sin(radian))
        store(angle//15) = v
    end_loop
end
[circle]
fish list contents [store]

Splitting Intrinsics

model new
fish define circle
    local angles = list.range(0,360,15) * math.degrad
    global store = vector(::math.cos(::angles), ::math.sin(::angles))
end
[circle]
fish list contents [store]

Another Comparison: Old-Style vs. Splitting Intrinsics vs. Operators

The following three functions achieve identical results, but are progressively faster in execution. Example runtimes from one computer are shown for a model with two million zones.

Traditional (no splitting) (51 seconds)

fish define ground_freezing
    loop foreach local zone zone.list
        local porosity = zone.fluid.prop(zone,'porosity') ; Note: A
        local expansion = porosity * 0.09 * 1.0; Porosity * water
        local bulk = zone.prop(zone,'bulk')
        local stress_inc = bulk * expansion ; Amount to increment
        local bulk_inc = (8.96/2.16) * porosity ; Ratio of ice/water
        zone.prop(zone,'bulk') = bulk + bulk_inc
        zone.stress.xx(zone) = zone.stress.xx(zone) - stress_inc
        zone.stress.yy(zone) = zone.stress.yy(zone) - stress_inc
        zone.stress.zz(zone) = zone.stress.zz(zone) - stress_inc
        zone.group(zone,'state') = 'frozen'
    endloop
end

Split Intrinsics (11.5 seconds)

fish define freeze_zone
    local porosity = zone.fluid.prop(::zone.list,'porosity') ; Note: Assumin
    local expansion = porosity * 0.09 * 1.0; Porosity* water expansion * sat
    local bulk = zone.prop(::zone.list,'bulk')
    local stress_inc = bulk * expansion ; Amount to increment stress
    local bulk_inc = porosity * (8.96/2.16) ; Ratio of ice/water bulk * porosi
    zone.prop(::zone.list,'bulk') =:: bulk + bulk_inc
    zone.stress.xx(::zone.list) =:: zone.stress.xx(::zone.list) - stress_inc
    zone.stress.yy(::zone.list) =:: zone.stress.yy(::zone.list) - stress_inc
    zone.stress.zz(::zone.list) =:: zone.stress.zz(::zone.list) - stress_inc
    zone.group(::zone.list,'state') = 'frozen'
end

Operator (6.4 seconds)

fish operator freeze_zone(zone)
    local porosity = zone.fluid.prop( zone,'porosity') ; Note:
    local expansion = porosity * 0.09 * 1.0; Porosity * water
    local bulk = zone.prop(zone,'bulk')
    local stress_inc = bulk * expansion ; Amount to increment
    local bulk_inc = (8.96/2.16) * porosity ; Ratio of ice/wat
    zone.prop(zone,'bulk') = bulk + bulk_inc
    zone.stress.xx(zone) = zone.stress.xx(zone) - stress_inc
    zone.stress.yy(zone) = zone.stress.yy(zone) - stress_inc
    zone.stress.zz(zone) = zone.stress.zz(zone) - stress_inc
    zone.group(zone,'state') = 'frozen'
end
[freeze_zone (::zone.list)]

In the version that uses split intrinsics, zone.list is the only split argument and it is split repeatedly. That pattern is a strong indicator that refactoring the function as an operator will be advantageous.