Cluster Option

The Cluster Option allows users to speed up the solving process by launching multiple connected FLAC3D instances in cluster computing environments.

Cluster Computing in FLAC3D

Introduction

Cluster computing is a type of computing where multiple computers are connected as a cluster and execute operations together as a single entity. Software on each node (i.e., a computer acting as a server in a cluster) performs the same task simultaneously and coordinates the use of many processors using parallel computing techniques. This provides enhanced computational power and allows demanding workloads to complete faster.

The Cluster option in FLAC3D implements a cluster computing mode under a leader-follower architecture (Figure 1). Each node in the cluster network runs a FLAC3D instance, and all the FLAC3D instances are connected through a Message Passing Interface (MPI) for data communication and management.

../../../../../_images/scheme.svg

Figure 1: FLAC3D cluster computing architecture.

The leader is the head node where all the MPI and FLAC3D commands are initiated. The FLAC3D instance on the leader node manages all general FLAC3D data processing, including:

  • saving and restoring the model state,

  • creating and modifying the topology (zones, gridpoints, attaches, zone joints, etc.),

  • configuring physical properties, and

  • initiating FISH calls.

The leader also handles many MPI-specific functionalities, including:

  • utilizing a domain partition algorithm to split the model for parallel computing,

  • distributing model data to followers,

  • gathering model data from followers, and

  • managing global data exchange between cluster nodes.

The followers are non-leader nodes in the cluster environment that run concomitant FLAC3D instances. The FLAC3D instances on follower nodes do not take user input. They are passive while not cycling and are activated when data is distributed from the leader.

After domain partitioning and model distribution, all the nodes (leader and followers) in the cluster environment hold one portion of the original model (Figure 2). All nodes cycle and intra-cluster data synchronization takes place between each node and every other node during cycling.

../../../../../_images/plot1.png
../../../../../_images/plot2.png

Figure 2: Original model (above) and one model portion on a node after domain partitioning (below).
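
The idea of splitting a model into one portion per node can be illustrated with a minimal Python sketch. This is not the domain partition algorithm used by FLAC3D, which is not described here. It simply sorts zone centroids along x and cuts them into slabs of near-equal zone count. The function name partition_by_x and the centroid input format are assumptions made for illustration only.

```python
def partition_by_x(centroids, n_parts):
    """Assign each zone to one of n_parts slabs of near-equal zone count along x.

    centroids: list of (x, y, z) tuples, one per zone (hypothetical input format).
    Returns a list giving the owning part index of each zone.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    # Zone indices ordered by the x-coordinate of their centroid.
    order = sorted(range(len(centroids)), key=lambda i: centroids[i][0])
    owner = [0] * len(centroids)
    base, extra = divmod(len(centroids), n_parts)
    start = 0
    for part in range(n_parts):
        # The first `extra` parts take one additional zone.
        size = base + (1 if part < extra else 0)
        for i in order[start:start + size]:
            owner[i] = part
        start += size
    return owner


if __name__ == "__main__":
    # Ten zones in a row along x, split across four nodes.
    zones = [(x + 0.5, 0.5, 0.5) for x in range(10)]
    print(partition_by_x(zones, 4))
```

A production partitioner would also try to minimize the number of gridpoints shared between portions, since those are the points that must be synchronized between nodes during cycling.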

Utilizing the above architecture, significant speedup can be achieved for the solving process. Figure 3 shows the results of a strong scalability test to solve a specific mechanical model with 8 million zones on multiple nodes using the cluster option. The model performs about 20 times faster on 32 nodes and scales from four to 32 nodes following a mostly linear pattern.

../../../../../_images/speedup.svg

Figure 3: Strong scalability of speedup: test performed on Amazon Web Services (AWS), node type m5a.4xlarge, with 16 CPU Cores and 64 GB RAM per node.
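
A speedup figure can be converted into a parallel efficiency, the fraction of ideal linear speedup that is achieved. The Python sketch below assumes the quoted speedup is measured relative to a single-node run, which the caption does not state explicitly. The function name parallel_efficiency is illustrative.

```python
def parallel_efficiency(speedup, nodes):
    """Fraction of ideal (linear) speedup achieved in a strong-scaling test.

    Assumes `speedup` is relative to a run on one node.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup / nodes


if __name__ == "__main__":
    # About 20x faster on 32 nodes, as quoted above.
    print(f"{parallel_efficiency(20, 32):.1%}")  # 62.5%
```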

The cluster option in FLAC3D does not affect the results: results are identical to FLAC3D runs without the cluster option, with the same level of accuracy and determinism. Save files are cross-compatible between FLAC3D with and without the cluster option.

Supported FLAC3D Features

Not all features in FLAC3D are supported for cluster computing at this time. Below is a list of currently supported features. Support for additional features will be added in future updates.

Supported Features

  • Continuous mechanical analysis with both small and large strain configurations.

  • Discontinuous mechanical analysis with zone joints (in small strain only at this time).

  • Open pit excavation with apply relax conditions and backfill.

  • Dynamic analysis, including Rayleigh/Maxwell damping settings and quiet/free-field boundary condition.

  • Fluid undrained pore pressure response (both model configure fluid-flow and model fluid active off are required).

  • General FISH callback functionalities. Note: nulling or deleting model topology during cycling is not yet supported.

  • All Itasca constitutive models.

  • Factor-of-safety analysis.

  • General FLAC3D functionalities: create/modify models, save/restore, pre-processing and post-processing (history, plot, etc.).

  • Local model partial visualization: the local model partials on every node can be visualized and plotted during cycling (GUI version of FLAC3D needed).

  • Result determinism: results are identical to FLAC3D runs without the cluster option.

Features planned as future updates include:

  • structural elements,

  • thermal analysis, and

  • fluid analysis with active fluid flow.

Using the Cluster Option in FLAC3D

To use FLAC3D in cluster computing environments, it is important to ensure that all FLAC3D instances can be launched correctly, and that the model has been configured for the Cluster option.

Launching FLAC3D in Cluster Mode

There are three things to check before launching FLAC3D in a cluster environment.

  1. A compatible MPI implementation must be installed on all the nodes (see List of Supported MPI Implementations below).

  2. The same version of FLAC3D must be used on all nodes.

  3. SSH connection between nodes must be available.

To use the Cluster option, launch the FLAC3D instances using the MPI job launcher mpirun/mpiexec (or a job scheduler compatible with mpirun) with the extra command-line argument mpi. The typical command to launch multiple FLAC3D console instances in a cluster environment with OpenMPI is:

$ mpirun -n <number-of-processes> --host <list-of-cluster-nodes> \
flac3d9_console mpi "datafile"

It is also possible to launch the GUI version of FLAC3D on the leader node (for visualization, etc.) while using the console version on the followers:

$ mpirun -n 1 --host <leader> flac3d9_gui mpi "datafile" : \
-n <number-of-processes-1> --host <list-of-followers> flac3d9_console mpi
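
When launches are scripted, the two OpenMPI command lines above can be assembled programmatically. The Python sketch below is one way to do so under stated assumptions: one FLAC3D instance per host, the first host acting as leader, and host names passed as a comma-separated --host list. The function name build_launch_command and the host names in the usage example are hypothetical.

```python
import shlex


def build_launch_command(hosts, datafile, gui_on_leader=False):
    """Build an OpenMPI mpirun argument list, one FLAC3D instance per host.

    hosts: list of node names; hosts[0] is treated as the leader (assumption).
    """
    if not hosts:
        raise ValueError("at least one host is required")
    if not gui_on_leader:
        return ["mpirun", "-n", str(len(hosts)), "--host", ",".join(hosts),
                "flac3d9_console", "mpi", datafile]
    leader, followers = hosts[0], hosts[1:]
    command = ["mpirun", "-n", "1", "--host", leader,
               "flac3d9_gui", "mpi", datafile]
    if followers:
        # The ':' separator starts a second application context for the followers.
        command += [":", "-n", str(len(followers)), "--host", ",".join(followers),
                    "flac3d9_console", "mpi"]
    return command


if __name__ == "__main__":
    # Hypothetical node names and datafile.
    print(shlex.join(build_launch_command(["node1", "node2", "node3"],
                                          "model.dat", gui_on_leader=True)))
```

The argument list can be passed to subprocess.run directly, which avoids shell quoting problems with datafile paths that contain spaces.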

The Cluster option can be used in cloud computing environments in the same way as local clusters. Itasca has been cooperating with Rescale to provide a pre-configured, easy-to-use cloud-based solution. A video tutorial on using Rescale is available. Users can also create their own solutions using any cloud provider of their choosing, such as Amazon Web Services (AWS) or Microsoft Azure.

Configuring the Model for Cluster Mode

To run the model in a cluster environment, it is necessary to configure the model by adding the command model configure cluster at the start of the project datafile. This command activates the FLAC3D instances for cluster mode and initiates the model compatibility checks. An example is shown below.

Note that model configure cluster can be given even if a run is not being done under MPI to allow checking a model for MPI compatibility ahead of time.

model new
model configure cluster ; necessary configuration for cluster mode
;model configure thermal ; incompatible with cluster mode

zone create brick size 100 100 100
zone cmodel assign mohr-coulomb
zone property bulk 3e8 shear 2e8 coh 1e6 fric 15 dens 1000
zone face skin
zone face apply velocity-normal 0 range group 'West'
zone face apply velocity-normal 0 range group 'South'
zone face apply velocity-normal 0 range group 'Bottom'
model gravity 10

model cycle 100
exit

If the FLAC3D instances have been launched correctly in cluster mode but the model is not configured, users receive an error message when trying to cycle (Figure 4).

../../../../../_images/configure.png

Figure 4: Error message for cluster mode configuration.

If unsupported FLAC3D features are used within cluster mode, users will receive an error message during command execution (Figure 5).

../../../../../_images/configure2.png

Figure 5: Error message for an unsupported feature in cluster mode.

No further datafile edits are necessary for cluster mode, provided the model does not use features that are unsupported in cluster mode.
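
A datafile can also be screened with a simple text scan before it is submitted to a cluster. The Python sketch below is a rough pre-check, not a substitute for the compatibility checks performed by FLAC3D itself. It assumes full (unabbreviated) keywords, treats everything after a semicolon as a comment, and only looks for thermal configuration and structure commands, two of the features listed as planned. The function name check_datafile is hypothetical.

```python
def check_datafile(text):
    """Return a list of warnings for a FLAC3D datafile given as a string."""
    warnings = []
    configured = False
    for number, raw in enumerate(text.splitlines(), start=1):
        # Drop ';' comments (naive: ignores ';' inside quoted strings).
        words = raw.split(";", 1)[0].lower().split()
        if words[:2] == ["model", "configure"]:
            options = words[2:]
            if "cluster" in options:
                configured = True
            if "thermal" in options:
                warnings.append(
                    f"line {number}: thermal analysis is not supported in cluster mode")
        elif words[:1] == ["structure"]:
            warnings.append(
                f"line {number}: structural elements are not supported in cluster mode")
    if not configured:
        warnings.append("missing 'model configure cluster'")
    return warnings


if __name__ == "__main__":
    sample = (
        "model new\n"
        "model configure cluster ; necessary configuration for cluster mode\n"
        ";model configure thermal ; incompatible with cluster mode\n"
        "zone create brick size 100 100 100\n"
    )
    print(check_datafile(sample))
```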

Cluster Option Licenses

Itasca provides three licensing tiers for the Cluster option, based on the number of FLAC3D instances used:

Cluster-Basic: up to 8 instances.

Cluster-Advanced: up to 16 instances.

Cluster-Professional: up to 32 instances.

To run more than 32 FLAC3D instances simultaneously, please contact Itasca for further assistance.
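
The tier limits above map directly onto a small lookup. The Python sketch below returns the lowest tier that covers a given number of instances; the function name required_tier and the choice to return None beyond 32 instances are illustrative assumptions.

```python
# (tier name, maximum number of FLAC3D instances), as listed above.
TIERS = [
    ("Cluster-Basic", 8),
    ("Cluster-Advanced", 16),
    ("Cluster-Professional", 32),
]


def required_tier(instances):
    """Return the lowest tier covering `instances`, or None if above 32."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    for name, limit in TIERS:
        if instances <= limit:
            return name
    return None  # More than 32 instances: contact Itasca.


if __name__ == "__main__":
    for n in (4, 12, 32, 40):
        print(n, required_tier(n))
```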

Cluster Option Examples

Open Pit Excavation in Cluster Mode

This example takes the base model from Open Pit Mine with Faults, restores the initial save file, and runs the excavation stage in cluster mode with four nodes. The results are then compared with those of the original run (without cluster mode). The only change required in the excavate-cluster datafile is the addition of the model configure cluster command.

excavate-cluster.dat

model restore 'initial'
model configure cluster ; necessary for cluster mode
zone gridpoint initialize displacement 0 0 0
zone gridpoint initialize velocity 0 0 0
zone initialize state 0

; take history of point at the edge of the pit
zone history displacement position  2465,-400,-77

zone relax excavate step 500 range group 'pit'

model solve cycles 501 and ratio 1e-5

model save 'excavate-cluster'

Vertical displacement in the normal and cluster models is shown in Figure 6. The results coincide with each other. A datafile to check result compatibility is provided below.

../../../../../_images/excavate.png
../../../../../_images/excavate-cluster.png

Figure 6: z-displacement for normal (above) and cluster (below) excavation.

model restore "excavate"
fish define get_disp
    global _vd1 = gp.disp.z(gp.near(2465,-400,-77))
end
[get_disp]
model restore "excavate-cluster" SKIP FISH
fish define check_disp(value,tol)
    global _vd2 = gp.disp.z(gp.near(2465,-400,-77))
    ; relative difference between the normal and cluster results
    global _err = math.abs(_vd1-_vd2)/math.abs(value)
    if _err > tol then
        system.error = 'Z-displacement test failed'
    end_if
end
[check_disp(_vd1, 1e-6)]

Maxwell Damping Analysis in Cluster Mode

This example takes the base model from Site Response using Maxwell Damping, restores the static stage save file, and runs the Maxwell damping stage in cluster mode with four nodes. The only necessary change made to the datafile is the model configure cluster command.

model restore "static"
model configure cluster ; necessary for cluster mode
zone gridpoint initialize displacement (0,0,0)
zone gridpoint initialize velocity (0,0,0)
zone initialize state 0
;
model dynamic active on
;
; Histories
history delete
model history name='time' dynamic time-total
zone history name='a-acc-x' acceleration-x position 600 0 10
zone history name='a-dis-x' displacement-x position 600 0 10
zone history name='b-acc-x' acceleration-x position 440 0 10
zone history name='b-dis-x' displacement-x position 440 0 10
zone history name='c-acc-x' acceleration-x position 305 0 2.5
zone history name='c-dis-x' displacement-x position 305 0 2.5
zone history name='d-acc-x' acceleration-x position 170 0 -5
zone history name='d-dis-x' displacement-x position 170 0 -5
zone history name='e-acc-x' acceleration-x position 80 0 -5
zone history name='e-dis-x' displacement-x position 80 0 -5
;
;;; Target 5% Maxwell Damping
zone dynamic damping maxwell 0.0385 0.5 0.0335 3.5 0.052 25.0
;
zone face apply-remove
zone face apply quiet range group 'Bottom'
table 'acc' import "Coyote.acc"
[table.as.list('vel') = table.integrate('acc')]
[global mf = -den1*Vs1*g] ; with g since acceleration is in g. 
zone face apply stress-xz [mf] table 'vel' time dynamic range group 'Bottom'
zone face apply stress-xy 0.0 range group 'Bottom'
zone face apply stress-zz 0.0 range group 'Bottom'
zone dynamic free-field plane-x
zone gridpoint fix velocity-z range group 'Bottom'
zone dynamic multi-step on
;
history interval 40
model dynamic timestep fix 5.0e-4 ; so that output time interval is 2e-2 (40 steps x 5.0e-4)
model solve time-total 26.83 
model save "dynamic-maxwell-cluster"

../../../../../_images/siteResponse.png

Figure 7: Maximum shear strain increment for cluster run.

Frequently Asked Questions

List of Supported MPI Implementations

The cluster option in FLAC3D relies on MPI for data communication. The Message Passing Interface (MPI) is a standardized and portable communication protocol used for programming parallel computers. It allows multiple processes to communicate with each other by sending and receiving messages, which is essential for cluster computing architectures.

As MPI is only a protocol, users also need a corresponding implementation in binary form. Among the many MPI implementations available, the FLAC3D Cluster option officially supports:

  1. OpenMPI, version 4.1.2.

  2. Microsoft MPI (MS-MPI), version 10.1.3.

Note: The FLAC3D Cluster option may also work with other MPI implementations and versions; however, only those listed above are formally supported for the current Cluster option implementation.

Supported Operating Systems

The FLAC3D Cluster option officially supports both Ubuntu Linux 22.04 LTS and Windows.

Since OpenMPI v4.1.2 and the SSH server are both system-level dependencies on Ubuntu Linux 22.04 LTS, no extra action is needed for deployment. For this reason, among others, it is generally recommended to run the FLAC3D Cluster option under Linux, if possible.

It is also possible to use the FLAC3D Cluster option on computers running Windows. This requires installing the MS-MPI and SSH server system components on all the nodes in a cluster environment. Note that the job launcher commands for MS-MPI may differ from those of OpenMPI.

The FLAC3D Cluster option requires all instances to be launched from the same operating system. It is not possible to connect a Windows FLAC3D instance with a Linux one.

Hardware Requirements for Cluster Option

There are no additional hardware requirements for the FLAC3D Cluster option. However, it is up to the user to make sure that the nodes used for the Cluster option have enough RAM to hold the model and cycle.

Itasca cannot make accurate RAM usage predictions for every use case, but some general rules apply.

  1. FLAC3D cluster mode utilizes a leader-follower architecture. Since model processing and domain partitioning are both managed on the leader node, the leader node needs enough RAM to hold the whole model, plus extra RAM for domain partitioning and data distribution/gathering.

  2. The follower nodes only hold one partition of the original model each. They do not need as much memory as the leader node.

Cluster mode also benefits from a faster inter-node connection, which reduces communication overhead and improves performance.

How to Prepare the Model for Cluster Mode

While the Cluster option and normal FLAC3D runs generate identical results, one notable difference between them is that cluster mode has a limitation on model input: it requires the input zone model to be topologically contiguous.

  1. Gaps between model elements (zones and gridpoints) are not allowed.

  2. Overlapping model elements are not acceptable.

It is possible to cycle a model with a non-contiguous mesh in FLAC3D without cluster mode, but cycling in cluster mode produces an error during domain partitioning (Figure 8).

../../../../../_images/contiguous.png

Figure 8: Non-contiguous mesh error.
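
One way to picture the contiguity requirement is as a connectivity test on the mesh. The Python sketch below treats two zones as connected when they share at least one gridpoint and uses a union-find structure to test whether all zones form a single connected group. This is an interpretation for illustration only: the check that FLAC3D performs during domain partitioning is not described here, and the sketch does not detect overlapping elements or account for attach conditions. The function name is_contiguous and the input format (one tuple of gridpoint IDs per zone) are assumptions.

```python
def is_contiguous(zones):
    """True if all zones are linked to each other through shared gridpoints.

    zones: list of tuples of gridpoint ids, one tuple per zone (hypothetical format).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Gridpoints of the same zone belong to the same group.
    for gridpoints in zones:
        first = gridpoints[0]
        find(first)
        for gp in gridpoints[1:]:
            union(first, gp)

    # Contiguous means every zone ends up in one group.
    roots = {find(gridpoints[0]) for gridpoints in zones}
    return len(roots) <= 1


if __name__ == "__main__":
    brick_a = (1, 2, 3, 4, 5, 6, 7, 8)
    brick_b = (5, 6, 7, 8, 9, 10, 11, 12)       # shares a face with brick_a
    brick_c = (13, 14, 15, 16, 17, 18, 19, 20)  # shares nothing
    print(is_contiguous([brick_a, brick_b]))
    print(is_contiguous([brick_a, brick_b, brick_c]))
```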

Use the Cluster Option Effectively

While the FLAC3D Cluster option has been optimized for performance, the ultimate speedup obtained using it depends on many factors. Ineffective usage of the cluster mode can slow down the solving process significantly. A few points are listed below to help users in their efforts to achieve best performance.

  • Create a quality mesh: while non-conformal meshes and bad quality zones can work in cluster mode, they usually introduce more model elements (attach conditions, joints, etc.), which increases memory usage and slows down cycling.

  • Avoid unnecessary cycle commands: while there are use cases for running the model in small cycle increments as validation, every cycle command triggers a complete data distribution/gathering pair in cluster mode, which introduces unnecessary overhead and degrades performance.

  • Launch FLAC3D instances in cluster mode wisely: all the examples in this documentation use one FLAC3D instance per node. However, there is no actual limitation on how many instances can be launched on a single node. While there are certain situations in which more than one instance per node could fit better, the user should always handle this strategy with caution. As every FLAC3D instance also runs multi-threaded, launching more than one instance on a single node can easily add extra communication overhead and slow down the solving process.
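
The cost of issuing many short cycle commands can be illustrated with a simple timing model in Python. Every cycle command pays one distribution/gathering pair in addition to the time spent cycling. The per-cycle and transfer times used here are hypothetical placeholders, not measurements, and the function name solve_time is illustrative.

```python
def solve_time(cycle_commands, cycles_per_command, t_cycle, t_transfer):
    """Estimated wall time in seconds.

    Each cycle command pays one distribute/gather pair (t_transfer)
    plus the time for its cycles (cycles_per_command * t_cycle).
    """
    return cycle_commands * (t_transfer + cycles_per_command * t_cycle)


if __name__ == "__main__":
    # Hypothetical timings: 0.05 s per cycle, 30 s per distribute/gather pair.
    one_command = solve_time(1, 1000, t_cycle=0.05, t_transfer=30.0)
    many_commands = solve_time(100, 10, t_cycle=0.05, t_transfer=30.0)
    print(f"1 x 1000 cycles: {one_command:.0f} s")
    print(f"100 x 10 cycles: {many_commands:.0f} s")
```

Both cases run 1000 cycles in total, but under these assumed timings the second spends most of its time moving data between the leader and the followers.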