Tutorial#

Input files#

TomoATT requires 3 input files :

  1. input parameter file (setup file for simulation parameters)

  2. source receiver file (source and receiver definitions and observation arrival times)

  3. initial model file (3d initial model)

It is recommended to use TomoATT’s python API for creating the template project directory and input files. This library is available from here.

This library is not yet available on PyPI, so you need to clone the repository and install it manually.

git clone --branch devel https://github.com/MIGG-NTU/PyTomoATT/tree/devel
cd PyTomoATT
pip install -e .

After instaling PyTomoATT, you can create a template project directory with the following command

pta init_pjt path/to/project_dir

Below is the explanation of each input file.

1. input parameter file#

All the necessary parameters for setuping a calculation are described in input parameter file in yaml format.

Below is an example of input parameter file for making a forward simulation.

version : 2

domain :
  min_max_dep : [5740.6370,5790.6370] # depth in km
  min_max_lat : [45.0,55.0]           # latitude in degree
  min_max_lon : [35.0,45.0]           # longitude in degree
  n_rtp : [40,40,40]                  # number of nodes

source :
  src_rec_file : 'src_rec_test.dat' # source receiver file

model :
  init_model_path : './test_model_init.h5' # path to initial model file

inversion :
  run_mode : 0 # 0 for forward simulation only, 1 for inversion

parallel :
  n_sims : 1         # number of simultaneous run
  ndiv_rtp : [2,2,1] # number of subdomains
  nproc_sub : 1      # number of subprocess used for each subdomain

calculation :
  convergence_tolerance : 1e-6
  max_iterations : 100
  stencil_order : 3 # 1 or 3
  sweep_type : 1   # 0: legacy, 1: cuthill-mckee with shm parallelization

output_setting :
  # output the calculated field of all sources. 1 for yes; 0 for no; default: 1
  is_output_source_field : 0
  # output internal parameters, if no, only model parameters are out. 1 for yes; 0 for no;  default: 0
  is_verbose_output : 0           
  # output model_parameters_inv_0000.dat or not. 1 for yes; 0 for no; default: 1
  is_output_model_dat : 0         

There are categories and sub-categories with setup parameters. Below is the explanation of each category and sub-category. If a category tag or sub-category tag is missing from the input parameter file, the default value will be used.

domain :#

The domain category is for setting a global domain of the simulation.

  • min_max_dep : [dep1, dep2] minimum and maximum depth of the domain in kilo meter

  • min_max_lat : [lat1, lat2] minimum and maximum latitude in degree

  • min_max_lon : [lon1, lon2] minimum and maximum longitude in degree

  • n_rtp : [nr, nt, np] number of computation nodes for r (depth),t (latitude) and p (longitude) direction

source :#

Source category is used for setting about source and receiver definitions.

  • src_rec_file : for specifying a path to a source receiver file (details of this file is in the following section). The calculated travel time at receivers will be output as a new source receiver file in OUTPUT_FILES directory.

  • swap_src_rec : 0 or 1. Set 1 for using receivers as sources, which reduces computation time when the number of sources are larger than the number of receivers. (usual when using a large dataset.)

model :#

  • init_model_path : File path for 3D initial model file. Details will be explained in the following section.

inversion :#

  • run_mode :

    • 0 for running only a forward simulation.

    • 1 for do inversion.

    • 2 run earthquake relocation. For using this mode, sources : swap_src_rec need to be set to 1. Otherwise the program will exit with an error message.

  • n_inversion_grid : the number of inversion grid.

  • n_inv_dep_lat_lon : the numbers of inversion grids for r, t and p direction.

  • min_max_dep_inv : [dep1, dep2] minimum and maximum depth of inversion grid in kilo meter.

  • min_max_lat_inv : [lat1, lat2] minimum and maximum latitude of inversion grid in degree.

  • min_max_lon_inv : [lon1, lon2] minimum and maximum longitude of inversion grid in degree.

Currently, TomoATT provide two ways for defining the inversion grid. One is a regular grid (even intervals for all axis) type and another uses used defined intervals for each axis.

For setting the user defined intervals, please use the flags below.

  • type_dep_inv : 0 for regular grid (default), 1 for user defined intervals.

  • type_lat_inv : 0 for regular grid (default), 1 for user defined intervals.

  • type_lon_inv : 0 for regular grid (default), 1 for user defined intervals.

and the coordinates where the main inversion grids are placed,

  • dep_inv : [z1,z2,z3,...] for user defined intervals.

  • lat_inv : [y1,y2,y3,...] for user defined intervals.

  • lon_inv : [x1,x2,x3,...] for user defined intervals.

other parameers for inversion seetting.

  • max_iterations_inv : The limit of iteration number for inversion.

  • step_size : Maximum step size ratio for updating model.

parallel :#

  • n_sims : number of simulutaneous run

  • ndiv_rtp : [ndiv_r, ndiv_t, ndiv_p] number of domain decomposition on r, t and p direction

  • nproc_sub : number of processes used for sweeping parallelization

The total number of mpi processes (i.e. mpirun -n NUMBER) must be n_sims*ndiv_r*ndiv_t*ndiv_p*nproc_sub, otherwise the code will stop instantly.

calculation :#

  • convergence_tolerance : convergence criterion for forward and adjoint run.

  • max_iterations : number of maximum iteration for forward and adjoint run

  • stencil_order : 1 or 3. The order of stencil for sweeping.

  • sweep_type : 0or 1. 0 is for sweeping in legacy order (threefold loops on r,t and p), 1 for cuthill-mckee node ordering with sweeping parallelization.

  • output_file_format : 0 or 1 for selecting input and output file format. 0 is for HDF5 format, 1 is for ASCII format.

output_setting :#

  • is_output_source_field : 0(no) or 1(yes). if output the calculated field of all sources.

  • is_verbose_output : 0 or 1. if output internal parameters, if no, only model parameters are out.

  • is_output_model_dat : 0 or 1. if output model_parameters_inv_0000.dat or not.

2. source receiver file#

Source receiver file is a file which defines source and receiver positions and arrival times.

Below is an example:

1   1992  1  1  2 43  56.900    1.8000     98.9000 137.00  2.80    8    305644 <- source 1
1      1      PCBI       1.8900     98.9253   1000.0000  P   18.000     <- receiver 1
1      2      MRPI       1.6125     99.3172   1100.0000  P   19.400     <- receiver 2
1      3      HUTI       2.3153     98.9711   1600.0000  P   19.200
     ....

Source line : id_src year month day hour min sec lat lon dep_km magnitude num_recs id_event (weight)
Receiver line : id_src id_rec name_rec lat lon elevation_m phase arrival_time_sec (weight)

num_recs (number of receivers for this source) need to be the same with the number of receiver lines. The last column of both source and receiver line is for put weight (on objective function). These are the optional column and set to be 1.0 if not written. name_rec need to be different for each station. lon and lat are in degree.

If the source position is out of the global domain (defined in input parameter file.), the code will flag this event as a teleseismic event and run the dedicated routine for teleseimic event. For teleseismic event, swap_src_rec will be ignored for this event (as the teleseismic case, a source is not a point but boundary surfaces).

3. initial model file#

Initial model file is used for defining parameters of input mode. Necessary parameters are vel (velocity), eta, xi, zeta.

inital model file in HDF5 format#

In HDF5 I/O mode (output_file_format: 0 in input parameter file), all the necessary parameters should be saved in one single .h5 file, with exactly the same name of dataset with parameter name written above. The dimension of dataset should be the same with n_rtp in input parameter file. Please refer the examples/inversion_small/make_test_model.py for details.

initial model file in ASCII format#

In ASCII I/O mode (output_file_format: 1 in input parameter file), all the necessary parameters should be save in one single ASCII file. The number of rows in the file need to be equivalent with the number of global nodes (i.e. n_rtp[0]*n_rtp[1]*n_rtp[2]).

The node order should be:

    # write nodes in rtp
    for ir in range(n_rtp[0]): # number of nodes on r direction
        for it in range(n_rtp[1]): # number of nodes on t direction
            for ip in range(n_rtp[2]): # number of nodes on p direction
                # write out parameters
                # eta xi zeta fun fac_a fac_b fac_c fac_f

Complete example may be found examples/inversion_small_ASCII/make_test_model.py.


Output files#

Calculated travel times at the stations will be writen in (source receiver file)_out.dat on the column for travel times.

Volumetric result data files are saved in OUTPUT_FILES directory.

As the node order in the output file is not in the global domain but each subdomains, it is necessary to do a small post-processing for extracting slices. utils/tomoatt_data_retrieval.py includes functions for this post processing. Please refer the concrete example in inversion_small/data_post_process.py for HDF5 mode, and inversion_small_ASCII/data_post_process.py for ASCII mode.

Only the final iteration result will be saved in 3D matrix form as final_model.h5 or final_model.dat, thus it is not necessary to do post-processing for the final result.

HDF5 I/O mode#

In HDF5 mode, the code will carry out collective writing from all MPI processes into one single output file, which will try to maximize the I/O bandwidth for efficient I/O.

TomoATT produces output files like below:

  • out_data_grid.h5 : grid coordinate and connectivity data

  • out_data_sim.h5 : field data

  • out_data_sim.xmf : XDMF index data for visualizing 3D data. This may be open by Paraview.

  • final_model.h5 : final model parameters

Travel time field for i-th source may be visualized by reading OUTPUT_FILES/out_data_sim_i.xmf. All the inversed parameters from all the sources and receivers are saved in out_data_sim_0.xmf.

Internal composition of .h5 data may be indicated by h5ls -r command. The composition of out_data_grid.h5 is :

/                        Group
/Mesh                    Group
/Mesh/elem_conn          Dataset {21609, 9}  # node connectivity used for visualization
/Mesh/node_coords_p      Dataset {26010}     # longitude [degree]
/Mesh/node_coords_r      Dataset {26010}     # radious [km]
/Mesh/node_coords_t      Dataset {26010}     # latiude [degree]
/Mesh/node_coords_x      Dataset {26010}     # xyz coordinate in WGS84
/Mesh/node_coords_y      Dataset {26010}
/Mesh/node_coords_z      Dataset {26010}
/Mesh/procid             Dataset {26010}     # mpi processor id

out_data_sim.h5 is :

/model                   Group
/model/eta_inv_000j      Dataset {25000} # eta at j-th inversion
/model/xi_inv_000j       Dataset {25000} # xi
/model/vel_inv_000j      Dataset {25000} # velocity field at j-th inversion

then final_model.h5 is :

eta                      Dataset {10, 50, 50}
vel                      Dataset {10, 50, 50}
xi                       Dataset {10, 50, 50}

The internal composition of the final_model.h5 is exactly the same with the initial model file. Thus it is possible to use the final model file as the initial model file for the next run by specifying initial_model_file in input parameter file.

ASCII I/O mode#

In ASCII mode, code will be do independent writing, (i.e. each MPI process do I/O process sequencially) in a single output file.

The files that TomoATT creates in OUPUT_FILES directory are :

out_grid_conn.dat                    # node connectivity
out_grid_ptr.dat                     # grid coordinate lon (degree), lat (degree), radius (km)
out_grid_xyz.dat                     # grid coordinate in WGS84
eta_inv_000j.dat                     # eta
xi_inv_000j.dat                      # xi 
vel_inv_000j.dat                     # velocity