SHDOM - Spherical Harmonic Discrete Ordinate Method for 3D Atmospheric Radiative Transfer OVERVIEW This implementation of SHDOM computes polarized monochromatic or spectral band radiative transfer in a one, two, or three-dimensional medium for either collimated solar and/or thermal emission sources of radiation. The properties of the medium can be specified completely generally, i.e. the extinction, single scattering albedo, coefficients of the scattering phase matrix, and temperature for the particular wavelength or spectral band may be specified at each input grid point. SHDOM is superior to Monte Carlo radiative transfer methods when many radiative quantities are desired, e.g. the radiance field across the domain top. Radiances in any direction, hemispheric fluxes, net fluxes, mean radiances, and net flux convergence (related to heating rates) may be output anywhere in the domain. For highly peaked phase functions the delta-M method may be chosen, in which case the radiance is computed with an untruncated phase function single scattering correction. A correlated k-distribution approach is used for the integration over a spectral band. There may be uniform or spatially variable Lambertian reflection and emission from the ground surface. Several types of bidirectional reflection distribution functions (BRDF) for the surface are implemented, and more may be added easily. SHDOM may be run on a single processor or on multiple processors (e.g. an SMP machine or a cluster) using the Message Passing Interface (MPI). SHDOM may be run efficiently for unpolarized (scalar) radiative transfer by choosing NSTOKES=1, or for polarized (vector) radiative transfer problems with NSTOKES=3 or 4 Stokes parameters. SHDOM is implemented in Fortran, and a Fortran 90 compiler is now required. METHOD IN BRIEF SHDOM uses an iterative process to compute the source function (including the scattering integral) on a grid of points in space. The angular part of the source function is represented with a spherical harmonic expansion. Solving for the source function instead of the radiance field saves memory, because there are often parts of a medium where the source function is zero or angularly very smooth (hence few spherical harmonic terms). The other reason for using spherical harmonics is that the scattering integral is more efficiently computed than in discrete ordinates. A discrete ordinate representation is used in the solution process because the streaming of radiation is more physically (and correctly) computed in this way. An adaptive grid that chooses where to put grid points is useful in atmospheric radiative transfer because the source function is usually rapidly varying in some regions and slowly varying in others. The iterative method is equivalent to a successive order approach. For each iteration 1) the source function is transformed to discrete ordinates at every grid point, 2) the integral form of the radiative transfer equation is used to compute the discrete ordinate radiance at every grid point, 3) the radiance is transformed back to spherical harmonics, and 4) the new source function is computed from the radiance in spherical harmonics. As with all order of scattering methods the number of iterations increases with the single scattering albedo and optical depth. A sequence acceleration method is used to speed up convergence, which is typically achieved in under 50 iterations. During the solution process, the grid cells with the integral of the source function difference above a certain limit are split in half, generating new grid points. The number of spherical harmonic terms kept at each grid point also changes as the iterations proceed (i.e. adaptive spherical harmonic truncation). TERMINOLOGY The medium properties (extinction, single scattering albedo, expansion coefficients of the phase function/phase matrix elements, temperature) are input at grid points from a "property file". These fields are defined spatially (X, Y, and Z spacing) by the "property grid", which is defined by the input file and is not the actual spatial grid used for computatations. The internal grid used for computation has two components: the "base grid" is regularly spaced (even in X and Y, even or not in Z) and is defined by input parameters NX, NY, NZ, and GRIDTYPE. The "adaptive grid" is all the grid cells including those defined by subdividing the base grid cells. New cells are made by "cell splitting", which is the process of figuring out which cells need to be subdivided and dividing them in half to create new grid points. The "sequence acceleration" is a method to reduce the number of iterations by extrapolating the source function to where it is hoped the ultimate solution is. The iterations proceed until the "solution criterion" is reached, when the source function is changing by sufficiently small amounts that convergence is declared. The "spherical harmonic terms" are the series expansion of the angular dependence of the source function. The angular dependence is also represented by "discrete ordinates", which are discrete directions (mu_i, phi_i) at which the source function or radiance is specified. Angles are specified by mu, which is the cosine of the zenith angle, and phi, which is the azimuth angle, counter-clockwise from the X axis. "Delta-M scaling" is a standard method of reducing the forward scattering peak in the phase function, by scaling the phase function, single scattering albedo, and extinction so the solution to the radiative transfer equation is the same. The "independent pixel" approximation is a standard method of solving for horizontal domain averaged radiative properties of an inhomogeneous medium by averaging the results of 1D radiative transfer on separate columns. POLARIZATION Polarization in SHDOM has been implemented with the standard Stokes parameters: I, Q, U, and V. I is the regular radiance or intensity, Q and U measured the amount and orientation of linear polarization, and V measures the amount and sign of circular polarization. The particular definitions of the Stokes parameters are as in Mishchenko and Travis 1997. The Q and U parameters depend on their reference plane, which rotates from the incoming meridional plane, to the scattering plane, and then to the outgoing meridional plane. This coupling between Q and U causes most of the complexity in implementing polarization in spherical harmonics. The spherical harmonics basis implemented in SHDOM is the "real generalized spherical harmonics" of Doicu et al. 2013. The key subroutines implementing this spherical harmonics basis were adapted from a research code kindly provided by Adrian Doicu. Even for unpolarized calculations the spherical harmonics terms are different in the new polarized SHDOM code from the old SHDOM code (due to a different azimuth angle basis). For polarized radiative transfer with randomly oriented particles having a plane of symmetry, there are six unique elements of the 4x4 scattering phase matrix. One aspect of using the real generalized spherical harmonics basis is that the phase matrix elements as a function of scattering angle are expressed in Wigner d-function series. For the I-I (P11) and V-V (P44) elements of the phase matrix these series are the same as Legendre series, while for the other four elements (P22, P33, P12, P34) the Wigner d-function series are not equivalent to Legendre series. There is a new polarized property file input format for SHDOM, which is based on the tabulated phase function format, but including the coefficients for the six Wigner d-function series expansions of the phase matrix elements. The old SHDOM property file formats may still be used for unpolarized (NSTOKE=1) calculations. The SHDOM output types that output polarized results are radiance (R), visualization images (V), and source function (J). The hemispheric flux (F), heating rate (H), spherical harmonics (S), and medium properties (M) output unpolarized results as before. An unpolarized radiative transfer calculation is chosen with NSTOKES=1, in which case special subroutines are used so there is little loss of efficiency compared to the old (unpolarized) SHDOM for 3D problems. Polarized radiative transfer is chosen with NSTOKES=3 for I, Q, U or NSTOKES=4 for I, Q, U, V Stokes parameters. For most atmospheric radiative transfer problems the NSTOKES=3 approximation is very good because V is very small. INPUT PARAMETERS The parameters are input in the routine USER_INPUT or NAMELIST_INPUT. USER_INPUT uses READ statements from stdin in response to written messages. Because not all parameters are relevant at once, the complete list is not input. Run shdom interactively for the actual sequence. The USER_INPUT routine echos the input values so that a useful "log" file output is produced when run non-interactively. To use the namelist style input, comment in the call to NAMELIST_INPUT in the main routine. On some computers the NAMELIST input requires a space before the parameters. All file name strings are a maximum of 80 characters. Parameter Description RUNNAME label for the run (also for multiple processor log file names) PROPFILE name of the input medium property file SFCFILE name of the input surface property file (or NONE) CKDFILE name of the input correlated k-distribution file (or NONE) INSAVEFILE name of the input binary save file (or NONE) OUTSAVEFILE name of the output binary save file (or NONE) NSTOKES number of Stokes parameters (1 for I, i.e. unpolarized; 3 for I, Q, U; or 4 for I, Q, U, V) NX, NY, NZ base grid size in X, Y and Z (NZ is one more than the number grid cells vertically; NX and NY are the number of grid points horizontally; for periodic boundaries there is actually an extra plane of grid points on the horizontal boundaries) NMU, NPHI number of discrete ordinates covering -10 is upwelling and mu must not be zero), and phi in degrees (0 is exitting in +X direction) V camera mode: 1, nbytes, scale, X,Y,Z, theta, phi, rotang, nlines, nsamps, delline, delsamp nbytes is number of bytes per pixel in PGM output images (nbytes=2 outputs most significant byte first, -2 outputs LSB first) scale is the scaling from radiance to pixel value in image X, Y, Z is the camera location theta is the zenith angle [deg] of the pointing direction phi is the azimuth angle [deg] of the pointing direction rotang is the angle [deg] camera is rotated around pointing nlines is the number of pixels in vertical nsamps is the number of pixels in horizontal delline is the angle [deg] between pixels in vertical delsamp is the angle [deg] between pixels in horizontal V cross track mode: 2, nbytes, scale, X1,Y1,Z1, X2,Y2,Z2, spacing, scan1, scan2, delscan nbytes is number of bytes per pixel in PGM output images scale is the scaling from radiance to pixel value in image X1, Y1, Z1 is the starting camera location X2, Y2, Z2 is the ending camera location spacing is the along track distance between scans scan1 is the min cross track scan angle [deg] (left side) scan2 is the max cross track scan angle [deg] (right side) delscan is the angle [deg] between pixels in the scan F Format [, Z level, X spacing, Y spacing] Format: 1 - flux at top (up) and bottom (down) of medium at horizontal base grid locations 2 - up and down flux at Z level height at locations with X and Y horizontal spacing, 3 - domain averaged vertical profile, 4 - output at every base grid point. 5 - output at every grid point. Only format 2 inputs the parameters after the format. H Format 1 - domain averaged vertical profile. 2 - output every base grid point: -Div(Fnet). 3 - output every grid point S Format 1 - output every base grid point 2 - output every grid point Output is mean radiance, X, Y, and Z net fluxes. J Format, mu, phi 1 - output every base grid point 2 - output every grid point Source function in direction mu, phi(degrees), mu may be 0. M Format 1 - output every base grid point 2 - output every grid point Output is medium properties: extinction, single scatter albedo, asymmetry parameter, temperature. OUTFILES(*) output file name (or NONE for no ascii file) OutFileNC netcdf output file name (or 'NONE' or '' for no netcdf file). The netcdf file has all NUMOUT outputs for the OUTTYPES(*) specified above. Memory parameters: MAX_TOTAL_MB approximate maximum memory to use (MB for 4 byte reals) ADAPT_GRID_FACTOR ratio of total grid points to base grid points NUM_SH_TERM_FACTOR ratio of average number of spherical harmonic terms to total possible (NLM) CELL_TO_POINT_RATIO ratio of number of grid cells to grid points CHOOSING INPUT PARAMETERS See Unix scripts included with the distribution for examples of running SHDOM. Also read propgen.txt for how to prepare optical property files. The main difficulty in choosing parameters is determining the desired tradeoff between accuracy and computation cost. The angular resolution parameters (NMU, NPHI, SHACC) and the spatial resolution parameters (NX, NY, NZ, SPLITACC) are the primary ones that determine both the accuracy and the running time. See ACCURACY ISSUES for a discussion of this. NMU must be even (rounded up internally). For NPHI greater than about 14, SHDOM will run faster if NPHI has small prime factors, due to the FFT part of the spherical harmonic transform. The base grid size is often the same size as the property grid, but can have more grid points for higher spatial resolution or even can have less grid points than the property grid. PROPFILE always needs to be specified, but INSAVEFILE and OUTSAVEFILE seldom do, and CKDFILE is only for doing a k-distribution. SFCFILE is given for a Lambertian surface with variable albedo or temperature or a different BRDF, but otherwise is NONE or NO. Set BCFLAG=0 for periodic boundary conditions, and BCFLAG=3 for open boundary conditions. See boundary condition section for explanation. Set IPFLAG=0 for normal radiative transfer. IPFLAG=3 is for the independent pixel approximation. The independent scanline approximation is 2D radiative transfer in the solar plane for a 3D medium; it can be chosen by setting the solar azimuth to 0 and IPFLAG=2 (or SOLARAZ=90 degrees and IPFLAG=1). Bit 2 of IPFLAG can be set to try an approximation where the direct beam is computed in 3D, but the diffuse transfer is done on indendent pixels or 2D slices. Note, IPFLAG=4 is the same as IPFLAG=0. Generally for solar (collimated) problems with highly peaked phase functions DELTAM should be set to True; this is because the phase function must be truncated because a limited number of discrete ordinates can be used. If you want the internal base grid to have the same height levels as the input property grid then GRIDTYPE='P'. This option allows the vertical grid spacing to be nonuniform, e.g. more in a cloud layer and less in clear sky. Another way to have the vertical base grid not be uniform is the GRIDTYPE='F' (file) option which will read the Z levels free format from a file with the name "zgrid.inp". The other alternative is to have the base grid levels evenly spaced between the bottom and top of the domain (first and last level in property grid). For an even grid GRIDTYPE='E', but keep in mind that there are NZ-1 grid cells, so that for doubling the vertical resolution use NZ=2*(NZP-1)+1 where NZP is the number of grid points vertically in the property file. The SKYRAD parameter is usually 0, but is nonzero when the problem includes diffuse radiation incident on the top boundary. This could be the cosmic background radiation for microwave wavelengths, or diffuse skylight for solar problems where the domain top is actually in the lower atmosphere. The UNITS parameter is only relevant for thermal emission without a k-distribution. The solution acceleration is usually turned on (ACCELFLAG=.TRUE.). If the number of iterations is small because the problem is optically thin or highly absorbing, then the sequence acceleration can be turned off, and the memory for the DELSOURCE array freed up (by giving the array a length of 1). The solution accuracy (SOLACC) is the rms change in the source function during an iteration normalized by the rms of the source function. There is no need to make this much, much smaller than the general level of accuracy (related to the spatial and angular resolution). Because it is the rms change in an iteration, it should be smaller than the desired accuracy. Try starting with SOLACC=1E-4. See section on ARRAY SIZES AND MEMORY MANAGEMENT for how to choose the four memory parameters (MAX_TOTAL_MB, ADAPT_GRID_FACTOR, NUM_SH_TERM_FACTOR, CELL_TO_POINT_RATIO) used in shdom.f90 or to adjust the array sizes in shdom.f to accomodate your problem size and computer memory. The four memory parameters do not affect the SHDOM solution (unless SHDOM runs out of memory); they only control the array sizes. See section on PROPERTY FILE FORMAT for how the medium properties are input. The medium properties are specified at points on a grid and are assumed to vary linearly in between. This is different from how Monte Carlo radiative transfer codes usually define the medium, which is by grid cells of uniform properties. The medium properties in the model vary trilinearly across the grid cells. The internal representation of the medium (on the base grid) is trilinearly interpolated from the input properties of the medium. So if a sharp boundary is desired in the medium (e.g. at cloud top) it is a good idea to have the two different optical properties (e.g. cloud and clear) specified at two very close grid levels. Otherwise the linear interpolation of the medium will smear the transition over the resolution of the input property file (nature is seldom perfectly sharp). SURFACE REFLECTION MODELS There are currently five types of surface reflection: Lambertian, RPV-original, Ocean, Wave-Fresnel, and Diner. The RPV and Ocean reflection models are unpolarized, so NSTOKES must be 1. Wave-Fresnel is used for polarized ocean modeling, and Diner for polarized land surface modeling. It is simple to add other surface types (see SOURCE CODE OPTIONS). A Lambertian surface reflects radiance isotropically, and is specified with the surface albedo. The RPV parameterization is used to model the general bidirectional reflectance function (BRF) characteristics of land surfaces (soils and vegetation) with three parameters (rho0, k, Theta) (Rahman, Pinty, Verstraete, 1993: J. Geophys. Res., 98, 20791). The Ocean BRDF parameterization models the glint specular reflection averaged over wave facets and the internal reflection from sea water, and is specified by the wind speed and chlorophyl-alpha pigment concentration. The Ocean reflectance model is obtained from the 6S model and modified by Norman Loeb at NASA Langley. High SHDOM angular resolution (Nmu, Nphi) is needed at low wind speeds to resolve the broadened specular reflection peak, so it is important to do a test with increasing angular resolution until the reflection is stable. The Wave-Fresnel model calculates the polarized Fresnel reflection from a dielectric interface with a Gaussian distribution of slopes, including shadowing. The complex index of refraction of the dielectric (negative imaginary part for an absorbing medium) and the wind speed are input. The mean square surface slope (s^2 or SIGMA2 in Mishchenko and Travis) is obtained from the Cox and Munk formula in the paper. (Mishchenko, M. I., and L. D. Travis, 1997: Satellite retrieval of aerosol properties over the ocean using polarization as well as intensity of reflected sunlight. J. Geophys. Res. 102, 16989-17013.) As with the Ocean model it is important to do a test with increasing angular resolution until the reflection is stable for low wind speeds, due to the nearly specular sunglint peak. The Diner et al. model is used to calculate polarized reflection from land surfaces. It has two terms: 1) completely depolarizing volumetric scattering with the modified RPV model, and 2) Fresnel reflection from randomly oriented microfacets with either a uniform or Gaussian distribution of slopes. The three parameters of the modified-RPV model are A, K, and B. The weight given to the Fresnel-microfacet term is ZETA (if ZETA=0 then only the REFLECT(1,1) element is non-zero with the modified-RPV model). SIGMA is the standard deviation of microfacet slopes for the Gaussian orientation distribution, but if SIGMA<=0 the uniform orientation distribution is used. The complex index of refraction of the Fresnel microfacets is fixed in the subroutine at m=1.5+0i. The Diner et al. (2012) model leaves out the hotspot factor of the modified RPV model (and the original RPV), but the hotspot factor is included here. The model is described in Diner, D. J., F. Xu, J. V. Martonchik, B. E. Rheingans, S. Geier, V. M. Jovanovic, A. Davis, R. A. Chipman, S. C. McClain, 2012: Exploration of a polarized surface bidirectional reflectance model using the ground-based multiangle spectropolarimetric imager. Atmosphere 2012, 3, 591-619; doi:10.3390/atmos3040591. The modified RPV model is from Martonchik et al., 1998 (IEEE-TGARS, 36, 1266). See the SURFACE FILE FORMAT section for how to specify the parameters. The temperature and reflection properties of these models may vary arbitrarily across the surface. If the surface reflection parameters in the surface file are constant, and thus the surface has horizontally uniform reflection, then the surface reflection matrix is precomputed for the discrete ordinate incident and outgoing directions, which can save considerable computation. The temperature does not need to be uniform for this. Lambertian surfaces are always computationally efficient, and so this precomputation is not performed in that case. BOUNDARY CONDITIONS The upper boundary condition is open, i.e. there is no reflection from the top of the domain. Isotropic incident radiance at the top boundary may be specified. The surface reflection models are described above. There are two options for the horizontal boundary conditions. The first is periodic and the second is open. These are specified for X and Y directions independently using the BCFLAG: bit 0 for X and bit 1 for Y. Thus BCFLAG=0 is periodic in X and Y, BCFLAG=1 is open in X and periodic in Y, and BCFLAG=3 is open in X and Y. The independent pixel flag (IPFLAG) overrides the boundary condition flag, since an independent pixel does not have a horizontal boundary condition. Periodic boundaries mean that when rays exit one side they wrap around to enter the opposite side. The periodic domain size is NX*DELX by NY*DELY; for example if NX=64 and DELX=0.050 km, then X=3.2 km is equivalent to X=0 km (left hand side). The open horizontal boundary condition means that radiation leaving the sides does not wrap around and is not reflected back in. However, there may be radiation entering the sides. For open BCs the property file grid points on the edges are used in independent pixel mode to compute the boundary radiation. For example, in 2D (X and Z) the first and last X property grid columns are computed with plane-parallel (IPA) radiative transfer. These grid columns then provide the boundary radiances for the interior of the domain. The interior domain is (NX-1)*DELX wide. In 3D the four corner columns are computed with 1D (plane-parallel) radiative transfer, while the four side slices are done with 2D radiative transfer. If you want the boundary columns to produce plane-parallel results, simply make sure the input optical properties for the columns on the boundaries are the same. The radiance output calculation can be done for locations beyond the horizontal domain; this is the purpose of the "X and Y offsets" in the radiance output specification. For the open boundary conditions the rays are traced back through the regions beyond the domain using plane-parallel radiative transfer, and then may enter the domain. This allows the lower part of the domain to be seen through the domain "sides" for slanted paths. Open boundary conditions are not allowed with MPI. GRID CELL STRUCTURE To implement the adaptive grid there is a data structure that specifies the relation between grid cells and grid points. Grid cells are rectangular volumes (e.g. cubes in 3D, rectangles in 2D). Grid points are the points at which the medium properties and the radiances are defined. Each grid cell has 8 grid points at its corners (in 1D and 2D there are still 8 pointers to grid points though they are redundant). Grid cells have pointers to neighboring cells, so that rays may be traced from cell to cell. The adaptive grid is implemented by splitting cells in half (in any direction) and using pointers to parent and child cells. See VARIABLES section for details. For periodic boundary conditions there are NX*NY*(NZ-1) grid cells and (NX+1)*(NY+1)*NZ grid points. The IX=NX+1 column is the same as the IX=1 column, because of the periodicity, but the grid points are repeated. The last grid cell on the "right" has the last column of grid points on the "left" and the repeated "virtual" set of grid points on the "right". For open boundary conditions in X and Y there are (NX+1)*(NY+1)*(NZ-1) grid cells and NX*NY*NZ grid points. Open boundary conditions are done with the lowest "left most" and "right most" X and Y columns being independent "pixels" with their own grid cells (zero width), therefore one extra grid cell is required. Independent pixel mode has NX*NY*(NZ-1) grid cells and NX*NY*NZ grid points OUTPUT FORMATS The output goes to a number of plain text files (except for the visualization and netcdf output), each containing a header listing the input parameters. The header also lists the number of iterations taken, which for a k-distribution is the total number over the distribution sum. There are seven output types: R - radiance, V - visualization, F - hemispheric flux, H - net flux convergence (heating rate), S - spherical harmonic output, J - source function, and M - medium properties. Any number of ascii files maybe output during a single run. A single netcdf output file may be specified in addition to, or instead of, the ascii output files. The netcdf file contains all the output types of the specified ascii files. If the ascii files are not desired, then use NONE for each of the ascii file names. The output in the netcdf file is arranged somewhat differently from the ascii files. For all of the output on a grid, X is the fastest moving index, and Z is the slowest. The total downward flux and downward diffuse flux are output, rather than the downward diffuse and direct flux separately, as in the ascii files. Only those output formats allowed with multiple processors are supported in the netcdf output. Specifically, the V, J, and M formats are not supported, nor is the F 2 format, nor any output on the adaptive grid (i.e. F, H, and S outputs are only on the base grid). All of the information in the header of the ascii files is included as global attributes in the netcdf file. The original unpolarized shdom_netcdf module was written by Robert Pincus (December 2008). The radiance output computes the radiance at a single vertical level for a regular grid of locations for a series of directions. There are two visualization output modes, both of which make binary (P5) PGM format images. The first one simulates a fixed position camera, while the second simulates a cross track scanning instrument flying on an airplane. For the camera mode the inputs are the position and pointing direction of the camera, the number of pixels vertically and horizontally, and the pixel spacing. For the cross track scanning mode the inputs are the starting and ending aircraft positions, the distance along track beween scans, and the range of scan angles (<0 for left side, > 0 for right side). The PGM image format has a human readable ascii header with the SHDOM input parameters listed, followed by the pixel values in straight binary format (from left to right and then top to the bottom of the image). The pixel format may be byte (values 0 to 255) or two byte integers. PGM images may be viewed with many image viewers, such as "display" from ImageMagick. For polarized output with NSTOKES>1, the NSTOKES output images are stacked in the line (Y) direction: I (radiance) on top, then degree of linear polarization (0 to 1), angle of linear polarization (-180 to 180 degrees), and (for NSTOKES=4) degree of circular polarization (-1 to +1). The scaling to the pixel values is done automatically (see the PGM header for the scaling for each polarization component). There are several ways to output hemispheric flux, including at the top and bottom of the domain, on a regular horizontal grid at any level, or for every grid point. The net flux convergence may be output for each grid point or for a domain averaged vertical profile. The spherical harmonic output is the mean radiance and net flux vector (proportional to the first 4 terms in the spherical harmonic expansion) and for solar problems includes the direct beam contribution. The source function output lists the extinction and source function computed for a specified direction at every grid point. This can be useful for use by another program such as to compute limb radiances in a spherical geometry. The medium properties output lists the (potentially delta-M scaled) extinction, single scattering albedo, asymmetry parameter, and temperature at the grid points, and is useful for making sure the medium was input correctly. There is a debugging output file format containing GLE graphics commands to draw the grid cells and arrows showing the neighboring cell relationships for an XZ plane. If this is desired, comment in the call to VISUALIZE_CELLS after the solution iteration loop. BINARY SAVE FILES The source function solution and associated state may be saved in binary files. The adaptive grid cell structure, the source function, upwelling and downwelling fluxes, and lowest order terms of the spherical harmonic radiance field are stored. These files can be used to continue a run to higher accuracy or to save the solution so more output files can be generated (e.g. radiances at more angles). For a k-distribution, the input save file is used for the first k and the output save file is from last k. PROPERTY FILE FORMAT The properties of the medium may be input in four different formats; the first three are for unpolarized radiative transfer and the fourth is for polarized calculations. 1) the "standard" format specifies the extinction, scattering albedo, temperature, and Legendre phase function expansion at every grid point; 2) the extinction only format only specifies extinction at every grid point, while the other quantities are given in the header part; 3) the tabulated phase function format specifies all properties except the phase function for each grid point, while many phase functions are tabulated in the header; and 4) the polarized tabulated phase function format tabulates the six Wigner d-function expansion coefficients for each phase matrix in the header. All file types are plain ascii and read by Fortran free format (so most "lines" may be broken across lines). With a small code change (commenting in calls to read_property_size_netcdf and read_properties_netcdf in shdom.f90) the two tabulated phase function format property files may be read in netcdf format. Note: Chi1 is 3 times the asymmetry parameter. The standard file format has two header lines: the first line has the size of the grid; the second line contains the X and Y grid spacing and the Z levels. There is then one line per grid point giving the properties. The format is as follows: Nx Ny Nz delX delY Z1 ... Zn IX IY IZ Temp Extinct Albedo NumL Chi1 . . . ChiL . . . where IX, IY, and IZ are the spatial indices (1,...,Nx, etc.), Temp is the grid point temperature in Kelvin, Extinct is the extinction in units inverse of the grid spacing units, Albedo is the single scattering albedo, NumL is the Legendre series degree, and Chi(n) are the coefficients of the Legendre expansion of the phase function. Chi1 is 3*g, where g is the asymmetry parameter. The extinction only format has five header lines: the first line must begin with 'E'; the second has the grid size; the third has the grid spacing and Z levels; the fourth has the temperatures in Kelvin of the Z levels; the fifth has the single scattering albedo, the degree of the Legendre expansion of the phase function, and then the Legendre coefficients. There is one line for every grid point containing the grid point indices and the extinction; if the NY is 1 then only the X and Z indices are given. The format is as follows: E Nx Ny Nz delX delY Z1 ... Zn T1 ... Tn Albedo NumL Chi1 ... ChiL IX IZ Extinct or IX IY IZ Extinct . . . The tabulated phase function format has 4+NUMPHASE header lines, where NUMPHASE is the number of phase functions: the first line must begin with 'T'; the second has the grid size; the third has the grid spacing and Z levels; the fourth has the number of phase functions; and the following "lines" have the phase functions containing the degree of the Legendre expansion and then the Legendre coefficients. There is one line for every grid point containing the grid point indices, the temperature (Kelvin), the extinction, scattering albedo, and phase function index (1 to NUMPHASE) to the table in the header. The format is as follows: T Nx Ny Nz delX delY Z1 ... Zn numphase NumL Chi1 ... ChiL (for each phase function) . . . IX IY IZ Temp Extinct Albedo Iphase The polarized tabulated phase function format is the same as the tabulated phase function format, but has has 4+6*NUMPHASE header lines: the first line must begin with 'P'; the second has the grid size; the third has the grid spacing and Z levels; the fourth has the number of phase functions; and the rest of the header has six "lines" for each phase matrix. These six lines start with number 1 to 6, indicating which Wigner series, then the degree of the Wigner expansion, followed by the Wigner d-function coefficients, starting with the zeroth term (i.e. Chi0, not Chi1 as for the other formats). The first Z level is the bottom surface and the last is the top surface. The Legendre coefficients are normalized so that Chi0 (which is not listed) is always 1. The phase function coefficients do not actually need to be all on one line; in fact, propgen uses multiple lines for phase functions with more than 200 terms. SURFACE FILE FORMAT There are currently five types of surface property files: Lambertian, RPV-original, Ocean, Wave-Fresnel, and Diner. The file format allows the temperature and reflection properties to vary arbitrarily across the surface. The surface properties are specified on a regular, evenly spaced grid. There should be Nx*Ny points in the surface file. The properties are bilinearly interpolated to the domain bottom grid point locations. The surface domain is also periodic in X and Y, so that the domain is delX*Nx by delY*Ny in size. Thus there are Nx cells across, and the routine that reads in the surface properties assigns the properties of the left side of the domain (IX=1) to the right side of the last cell (IX=NX+1). A uniform surface can be specified by having Nx=1 and Ny=1 and making delX and delY large enough to cover the whole domain. The Lambertian surface file format is as follows: L Nx Ny delX delY IX IY SfcTemp SfcAlbedo . . . The RPV surface is specified with three parameters (rho0, k, Theta): R Nx Ny delX delY IX IY SfcTemp rho0 k Theta . . . The Ocean surface is specified with the near surface wind speed (m/s) and the chlorophyl-alpha pigment concentration (mg/m^3): O Nx Ny delX delY IX IY SfcTemp WindSpeed PigmentConc . . . The Wave-Fresnel surface is specified with the complex index of refraction of the dielectric surface (Mre, Mim) and the near surface wind speed (m/s): W Nx Ny delX delY IX IY SfcTemp Mre Mim WindSpeed . . . The Diner surface is specified with five parameters (a, k, b, zeta, sigma) R Nx Ny delX delY IX IY SfcTemp a k b zeta sigma . . . The IX and IY are the surface grid numbers (1 to Nx, 1 to Ny). The surface temperature is specified in Kelvin. CORRELATED K-DISTRIBUTION FILE FORMAT The CKD file has the correlated k-distribution information for a series of spectral bands for a particular atmospheric profile. This file is made by another program from a profile. Each band, i.e. a range of wavenumbers, has a k-distribution associated with it. The k-distibution is performed as a discrete sum over probability intervals (the delta g in the notation of Fu and Liou, 1993). A radiative transfer calculation is done for each "k" in the sum, which represents the gaseous absorption. The distribution of gases in the atmosphere is assumed to horizontally uniform, and so the absorption coefficients depend only on height. Thus the CKD file specifies the absorption coefficient (in km^-1) for each height, "k" in the sum, and each band. The delta g's are the weights with which the separate radiative transfer results are added together. There are three sections in the CKD file. 1. The band information. 2. The atmosphere information. 3. The absorption coefficients for each band and g range. The format is: Comment line Number of bands Comment line For each band: band num, waveno1, waveno2, solar flux, number of k's, all the delta g's Number of z levels Two comment lines For each atmospheric level from the top down: height, (other profile information is not used) Comment line For each band and level: band number, level number, all the k_i's Notes: waveno1 and waveno2 are the starting and ending wave numbers in cm^-1 of the bands; solar flux is the top of the atmosphere solar irradiance in W/m^2 for overhead sun. The level numbers start with 1 at the top of the atmosphere. The levels in the CKD file are completely independent of the levels in the SHDOM run, though they should cover the SHDOM levels. The k-distributions should start with the lowest k's (they are used in the reverse order in the program). DETAILED METHOD OF OPERATION/REFERENCES See the journal papers: Evans, K. F., 1998: The spherical harmonic discrete ordinate method for three-dimensional atmospheric radiative transfer. J. Atmos. Sci., 55, 429-446. Pincus, R., and K. F. Evans, 2009: Computational cost and accuracy in calculating three-dimensional radiative transfer: Results for new implementations of Monte Carlo and SHDOM. J. Atmos. Sci., 66, 3131-3146. Doicu, A., D. Efremenko, T. Trautmann, 2013: A multi-dimensional vector spherical harmonics discrete ordinate method for atmospheric radiative transfer. J. Quant. Spectrosc. Radiat. Transfer, 118, 121-131. Mishchenko, M. I., and L. D. Travis, 1997: Satellite retrieval of aerosol properties over the ocean using polarization as well as intensity of reflected sunlight. J. Geophys. Res. 102, 16989-17013. OPERATING STRATEGY The first step is to compile the code. A makefile is provided, with options to compile with or without MPI (for multiple processors) and with or without the netcdf library. SHDOM was originally written in Fortran 77 with common extensions. Later, parts of SHDOM were written in Fortran 90, and now all of the SHDOM-specific code (shdom.f90 and shdomsub?.f) requires a Fortran 90 compiler. Fortran 90 uses allocatable arrays and frees the user from having to set most of the parameters controlling the array sizes. While a few algorithm and coding bugs may be left in the code, the most common source of errors is incorrect input. Be sure to check the input parameters against the documentation and the USER_INPUT routine. The correctness of the input property file format can be found by using the 'M' - medium property output option. There are routines that check the input parameters and property file for validity and issue warnings for common mistakes. Conservation of energy can be used as a check for conservatively scattering medium, but remember that the multiple reflections between the surface and medium will introduce "extra" flux. The collimated (solar) flux input parameter is the flux on a horizontal surface (rather than the flux perpendicular to the beam), so for conservative scattering and no surface reflection the reflected plus transmitted flux over the domain should equal this parameter. The delta-M scaling procedure is very useful for reducing the angular resolution needed. It is generally used for collimated (solar) source problems. One should remember that the delta-M scaling procedure changes the extinction, single scattering albedo, and Legendre phase function fields, as can be seen with the medium output option. This means that the direct solar flux is increased with delta-M scaling and must be added to the diffuse transmitted flux for meaningful results. For putting in the rest of the atmosphere: a coarse grid outside the cloud region works because the rest of the atmosphere does not interact much with the cloud layer (little reflection back to the cloud layer). The fluxes, which depend on the cell-to-cell discrete ordinate integration, will not be as reliable above the cloud level. The radiances, however, which are computed by integrating the source function back all the way to the boundary will be fine at any level. Output during Iterations During the iterations, some useful information is output to stdout. Six terms are output on a line each iteration: Iterations, Log10(solution criterion), Cell splitting criterion, Number of grid points, Average number of SH terms, Ratio of number of SH terms to total possible Using these parameters you can track the convergence towards the solution and the behavior of the adaptive grid process. As the current cell splitting accuracy parameter is reduced the number of grid points increases and the cell splitting criterion decreases. The cell splitting criterion is the maximum over the whole medium. The adaptive spherical harmonic truncation may also be monitored with the average number of SH terms. The solution criterion often does not converge monotonically. When the cell splitting accuracy parameter is lowered and many adaptive cells are created the solution criterion typically increases. The sequence acceleration method may cause the solution criterion to increase at a particular step, though the overall convergence is usually speeded. The adaptive spherical harmonic truncation (parameter SHACC) may also interfer with a smooth convergence. If you have trouble with convergence try adusting one of these parameters (SPLITACC, SHACC, ACCEL). ACCURACY ISSUES Like all numerical solutions to the radiative transfer equation, the operation of SHDOM entails a tradeoff between computational speed and accuracy. One difference between 1D and 3D radiative transfer is that it is very expensive to get very high accuracy (better than 1%) in 3D. This is a result of the five dimensional nature of the problem. This is also the case for Monte Carlo methods, but it is worse for explicit radiative transfer methods. The strategy issues for running SHDOM involve how to achieve a required accuracy in a minimum CPU time. The grid spacing needed is determined by some assumptions of SHDOM: 1) the variation in the extinction and the product of the source function and extinction across a grid cell is linear, 2) the source function across a grid face is accurately interpolated with the bilinear method, and 3) the radiance across a grid face is accurately interpolated with the bilinear method. The first and second assumptions should be satisfied by the source function based adaptive cell splitting algorithm, and also would be by having the optical depth across the grid cells small compared to one. The third assumption is satisfied by not using too coarse a base grid, and probably only matters as it affects the computation of the source function. It can make sense to have a coarse base grid where there is little scattering (clear sky), but then the output fluxes in those regions will be of low accuracy, though the output radiances (computed by tracing back) will be good. Because the accuracy can be compromised if the optical depth across a grid cell is too large, a warning is given if the property grid has cells with optical depth larger than 2. It is the internal grid that matters, but most people use the same grid so the check is done there. There are pathological cases in which a completely erroneous solar radiative transfer solution can occur if the optical depth across the cloud top grid cell is very large. The accuracy is also determined by the angular resolution, specified by the number of discrete ordinates (NMU and NPHI). To a lesser degree the adaptive spherical harmonic truncation parameter SHACC also determines the angular resolution. Most of the benefit from the adaptive truncation occurs for low values of SHACC, so it is recommended to keep this parameter small. A value of SHACC=0 still gives the benefit of not allocating source function spherical harmonic terms where there is no scattering. Then NMU and NPHI will determine the angular resolution. The angular resolution needed depends on the desired output: for solar beam problems, net flux convergence (heating rates) requires the least, followed by hemispheric fluxes, and then radiances. For heating rates Nmu=8, Nphi=16 is enough, for hemispheric fluxes Nmu=8, Nphi=16 is often adequate, and radiances may need Nmu=16, Nphi=32 or more depending on the phase function. Choosing the cell splitting accuracy: Remember that the adaptive parameters are in absolute units, not relative! The actual accuracy is not linearly related to the cell splitting accuracy. The cell splitting accuracy usually should be set substantially higher than the desired absolute flux accuracy. Look at the examples in the 1998 journal paper for guidance. The only sure way to know how to set the spatial and angular resolution is to test how the desired solution behaves with increasing resolution, or make selected comparisons with an independent radiative transfer method. It is not useful to increase the angular and spatial accuracy much beyond one another. Tests have shown that the accuracy is not controlled by the cell splitting accuracy alone; both the base grid (NX,NY,NZ) and the splitting accuracy (SPLITACC) have to be considered. Thus, setting the splitting accuracy so small that the number of adaptive grid points is many times the number of base grid points, will probably not achieve any higher accuracy. Unlike for plane-parallel situations, the delta-M method does not assure high accuracy for downwelling solar flux in 2D and 3D as it does for 1D, thus the distribution of downwelling solar flux may not be accurate for highly peaked phase functions unless high angular resolution is used. This is because in 3D, sharp variations in the extinction field mean that the downwelling solar flux is not averaged over all angles in the same way as for a uniform medium. This limitation appears to not affect upwelling flux, upwelling radiance, or heating rates. Because of the thresholding nature of the cell splitting and the spherical harmonic truncation, very small changes in the inputs can amplify into larger changes in the output results. For example, running the same case on different machines can lead to noticeably different results. These variations, however, should be within the overall accuracy of the results governed by the angular and spatial resolution. Upwelling hemispheric fluxes are slightly more accurately computed by using the double Gaussian quadrature discrete ordinate set. This is choosen by setting ORDINATESET=3 in SOLVE_RTE. With this option, radiances, heating rates, and net fluxes may be slightly less accurate. RUNNING SHDOM WITH MULTIPLE PROCESSORS SHDOM may be run on multiple processors, either in a shared memory system (e.g. SMP) or in a distributed memory system (e.g. a cluster), using the Message Passing Interface (MPI) version 1. All of the SHDOM code that calls MPI routines is in the shdom_mpi.f90 file. A set of equivalent (mostly) dummy routines is provided in shdom_nompi.f, which allows SHDOM to be compiled without an MPI libarary. Using the parallel processing aspects of SHDOM requires that the MPI system be installed on the target computer. Then the usual way to compile and run a program with MPI is: mpif90 -O2 -o shdom shdom.f90 shdom_mpi.f90 shdom_nonetcdf.f90 \ shdomsub1.f shdomsub2.f shdomsub3.f ocean_brdf.f fftpack.f mpirun -np 4 shdom90 < shdom.inp where shdom.inp has the user input parameters and "-np 4" specifies that four processors will be used. The MPI shared memory device (e.g. ch_shmem from the MPICH implementation) will be somewhat more efficient on an SMP machine than a device (e.g. ch_p4) that uses a network communication protocol (such as ssh). The SHDOM computation is divided up on the multiple processors with domain decomposition, meaning that each processor works on a portion of the horizontal domain. MPI routines determine the 2D Cartesian topology that maps the full domain onto the processors, but usually four processors will be mapped to two in X by two in Y. The boundaries between processors are at property grid lines, and the internal base grid points are repeated on both sides of the boundaries. Thus a specified 128x128 base grid (NX=128, NY=128) run on four processors will result in a 65x65xNZ base grid in each processor. The SHDOM output to stdout includes the position of each processor in the horizontal domain. Using SHDOM with multiple processors requires a periodic domain in X and Y; open boundary conditions will be turned off if specified. The BCFLAG parameter (user input for open boundary conditions) is also used internally to indicate if there are multiple processors in the X (BCFLAG bit 2) and Y (BCFLAG bit 3) directions. Thus, multiple processors in both directions will output BCFLAG=12. MPI calls are needed to pass data between neighboring processors for three parts of the SHDOM algorithm: 1) the direct beam solar flux calculation at each internal grid point, 2) the radiative transfer equation integrations along each discrete ordinate during the solution iterations, and 3) the integrations along the viewing directions for the radiance output. MPI calls are also made to move data between the master processor (MPI "rank" 0) and all the other processors. The master processor performs all the file input/output, except for the save/restore files. The binary save/restore files are written and read locally by each processor, using names with the processor number appended to the specified file names. Thus the slave processors do not need to share disk files with the master processor. The master processor reads the user input parameters, and the surface parameter and k-distribution files (if specified) and broadcasts the parameters to the other processors. The master processor reads the property input file and sends the appropriate subdomain of the optical properties to each processor. The master processor also gathers the output (e.g. fluxes, net flux convegence, and radiances) from the other processors and writes the output files. The user input parameters or namelist format is the same whether using multiple processors or using a single processor. The input files for multiprocessing are all the same, but the "standard" format input property files, with the phase function Legendre series specified at every point, are not allowed. The "log" output is written to stdout by the master processor. The other processors write their output to RUNNAME//"???.log" (where RUNNAME is a user input string and ??? is the processor number). The lines written each iteration during the solution procedure are for the subdomain operated on by each processor; therefore the number of grid points decreases as more processors are used. The solution criterion (represented in the "Log(Sol)" column), however, is the global solution criterion, calculated from the source function vectors in all processors (so that all the processors stop at the same iteration). For programming convenience, not all output formats are allowed for multiple processors: only base grid points (no adaptive grid points) are output for F, H, S output files; flux (F) format 2 (i.e. hemispheric fluxes interpolated to a user defined grid on one Z plane) is not allowed; and source function (J) and medium (M) outputs are not supported. The output file formats do not change when using multiple processors. The NPTS, NCELLS, and NSH parameters in the file headers are the sum of the number of grid points, grid cells, and spherical harmonic terms over all the processor subdomains. Running SHDOM on N processors does not mean that the elapsed time will be a factor of N smaller because there is overhead to the parallelization. One aspect of the parallelization overhead is the increase in the number of grid points due to the repeated boundary points. Another aspect is the time needed to pass data between processors using MPI and to wait for the data to arrive at all processors. The third overhead source is that the discrete ordinate radiances (computed during the solution iterations) at *all* the processor boundaries need to be traced back to the previous Z level; while this only occurs at one X and one Y boundary for a single processor. This latter source of overhead is larger for domains with large Z grid spacing and small horizontal grid spacing. All these sources of parallelization overhead are reduced in a fractional sense by having the processor boundary grid points be a smaller fraction of the total number of grid points. Tests on an SMP machine with boundary layer cloud domains show that the overhead is reasonable (<30%) for processor subdomains having 32x32 or more horizontal grid points. SHDOM running on multiple processors does not give exactly the same results as on a single processor due to differences in the discrete ordinate integrations in the iterative solver. The differences, however, are small compared to the overall SHDOM accuracy. Since small computational differences result in different cell splitting patterns, even the number of iterations can change slightly with the number of processors SHDOM uses. For optimal memory use on an SMP machine it is necessary to specify the four memory parameters (MAX_TOTAL_MB, ADAPT_GRID_FACTOR, NUM_SH_TERM_FACTOR, CELL_TO_POINT_RATIO) separately for each processor because a processor subdomain with more cloud will have more adaptive grid points and thus larger values of the memory parameters. When SHDOM is run on multiple processors it tries to read a file with the name RUNNAME//"_mem_params.inp". The format of this file is Comment line Number of processors One line for each processor (0 to Nproc-1) with the four parameters: MAX_TOTAL_MB ADAPT_GRID_FACTOR NUM_SH_TERM_FACTOR CELL_TO_POINT_RATIO If there is no input memory parameter file, or it is for a different number of processors, then the default memory parameters input by the user are used. At the end of the SHDOM computation the actual memory used and the three memory ratios are output to the log file for each processor. These may be used as guidance for selecting the memory parameters for another similar calculation using the same number of processors. A Unix script (run_multproc) illustrating this procedure is in the distribution. If some processors have more computation to do than the others, then there is a loss of efficiency because the other processors spend much of their time waiting. The relative load on each processor in SHDOM is nearly proportional to the total number of grid points (base plus adaptive) in each subdomain. If a substantial fraction of the total number of grid points are adaptive, and the adaptive grid points are very unevenly distributed, then the processor load will be unbalanced. This could occur, for example, if there is high cloud fraction on one side of the full domain, and low cloud fraction on the other side. SHDOM has the ability to perform processor load balancing. Normally, this feature is turned off, but may be used by setting the logical parameter "LoadBalance=.TRUE." at the beginning of shdom_mpi.f90. The load balancing procedure requires that a previous SHDOM run be made with the same property file, so that the total number of grid points can be estimated. This prior run can be done at lower angular resolution (NMU/NPHI) and only needs to iterate until the cell splitting is finished (at SOLACC=1.0E-3). If the load balancing is turned on, the master process of MPI-enabled SHDOM writes a file (in subroutine END_SHDOM_MPI), RUNNAME//"_load_balance.out", with the number of grid points in a column around each property grid point. To implement load balancing the user must copy the appropriate RUNNAME//"_load_balance.out" file to the input file RUNNAME//"_load_balance.inp", which SHDOM reads (in subroutine OPTIMIZE_PROCESSOR_DOMAINS) and uses if the file exists and is compatible with the input property file. SHDOM uses a simulated annealing algorithm to try to equalize the number of grid points in each processor subdomain by adjusting the boundary lines between the processors. If a load balance optimization is performed, each processor writes the optimum boundary lines (in the property grid) to stdout. In a heterogeneous clustered environment, the processors on different machines may have different speeds. If load balancing is being used (by setting LoadBalance=.TRUE.), then another file with the relative speeds of the processors may be input to SHDOM. This file has the name "shdom_proc_times.inp", and lists the relative times (one processor time per line) running SHDOM takes on each processor. It is important to understand the order in which MPI allocates the processors in your cluster. For example, suppose the cluster has two computers, each with two SMP processors. Then the MPI machine file might look like this: fast:2 slow:2 Typically, MPI will allocate the processors as follows: 0 on "fast", 1 on "slow", 2 on "fast", and 3 on "slow". If SHDOM is twice as fast on "fast" as on "slow", then shdom_proc_times.inp could be 1.0 2.0 1.0 2.0 When SHDOM with load balancing is run with four processors on MPI, the Y boundary line will be adjusted so that each processor on computer "fast" has twice the number of grid points as each processor on "slow". SOURCE CODE OPTIONS This is a list of options in the operation of SHDOM that one may select by making trivial code changes. Except for selecting the NAMELIST option, choosing netcdf property file input, adding a new surface type, the higher order radiance terms, or turning on processor load balancing, most of these would only be changed to experiment with the operation of the SHDOM algorithm. Namelist input: main program (shdom.f or shdom90.f90); Comment out the call to USER_INPUT and uncomment the call to NAMELIST_INPUT. NetCDF property file input: main program (shdom90.f90); Comment out calls to READ_PROPERTY_SIZE, and READ_PROPERTIES and uncomment calls to read_property_size_netcdf and read_properties_netcdf. Use the makefile to compile with the netcdf library. Adding surface reflection types: in shdomsub2.f write a BRDF Fortran function modeled on RPV_REFLECTION and add a call in SURFACE_BRDF to the new function. In shdomsub3.f modify subroutine READ_SURFACE by adding a block to read in the new surface type parameters. Discrete ordinate angle set: SOLVE_RTE (shdomsub1.f); The ORDINATESET variable before call to MAKE_ANGLE_SET is the type of angle set: 1 - full Gaussian set, 2 - reduced Gaussian set (default), 3 - reduced double Gaussian. Using the double Gaussian angular set will give more accurate hemispheric fluxes, e.g. from a Lambertian surface. Output rms of higher order radiance spherical harmonic terms: at top of main program (shdom.f); set HIGHORDERRAD=.TRUE. for outputing the rms of the higher order radiance terms (above L=1,M=1) normalized by the mean radiance (S output mode). This is useful for determining regions in the diffusion domain, where the higher order terms are zero. Since all the radiance spherical harmonic terms are kept the running time may be somewhat longer, and the memory use (RADIANCE array) can be much larger. Multiple processor load balancing: Change LoadBalance=.TRUE. at the beginning of shdom_mpi.f90 (see the last paragraph of the multiple processing section). k-distribution initialization: at top of main program (shdom.f); Change BASEOUT = .TRUE. to initialize with and output only the base grid during the iteration over k's, rather than the default of initializing with the previous k's adaptive grid structure. Cell splitting scheduling: SOLVE_RTE (shdomsub1.f); The range of solution criterion during which the adaptive grid cell splitting occurs can be chosen by changing ENDADAPTSOL and STARTADAPTSOL; other aspects of the scheduling can be changed as well. Adaptive grid smoothing: SPLIT_GRID (shdomsub1.f); The call to subroutine GRID_SMOOTH_TEST may be commented out to stop the adaptive grid smoothing. This will produce somewhat fewer adaptive cells, but make a somewhat more irregular grid. Cell splitting visualization: SOLVE_RTE (shdomsub1.f); The subroutine VISUALIZE_CELLS may be commented in after the solution loop to produce a GLE graphics file showing the adaptive cell structure. This can be changed easily for other graphics languagues. The subroutine OUTPUT_CELL_SPLIT may be commented in to output the medium extinction and adaptive cell splitting criterion for X, Y, and Z. Cutoff transmission for back tracing: PATH_INTEGRATION (shdomsub1.f); The TRANSMIN variable can be changed (e.g. from 1.0 to 0.1) to have the back ray tracing continue until the specified transmission is reached. This can eliminate the grid radiance interpolation error, but can take much more CPU time. ARRAY SIZES AND MEMORY MANAGEMENT The main program (shdom.f90) determines the sizes of most arrays and allocates memory for them. Thus the user does not usually have to recompile the program for different problems. As there is no way to determine the OUTPARM array size before calling USER_INPUT, the MAXOUT and MAXPAR constants are still defined. Since the adaptive grid grows during the solution procedure, the maximum size of many arrays can only be guessed at. The approach taken is to define the maximum adaptive grid array sizes in terms of the base grid. Thus the following three parameters must be input: ADAPT_GRID_FACTOR ratio of the max number of internal grid points to the number of base grid points NUM_SH_TERM_FACTOR ratio of average number of spherical harmonic terms to total possible (NLM) CELL_TO_POINT_RATIO ratio of number of cells to number of grid points In addition, to prevent SHDOM from grabbing all the available memory, the maximum memory is specified with MAX_TOTAL_MB, which is the maximum total megabytes SHDOM should take (per processor), where a memory word is assumed to by 4 bytes. The splitting factor is reduced to fit within the total memory limit. The appropriate value of these parameters for a particular problem can be estimated from previous SHDOM output. At the end of a run, shdom outputs the actual values needed for the four memory parameter values followed by the string "Actual MAX_TOTAL_MB, ADAPT_GRID_FACTOR, NUM_SH_TERM_FACTOR, CELL_TO_POINT_RATIO". These values can then be used for the input memory parameters for similar SHDOM runs. The memory parameters may also be estimated from a fast lower angular resolution run (smaller NMU/NPHI), though the MAX_TOTAL_MB will then have to be multiplied by the ratio (high res to low res) of the number of spherical harmonics terms (NLM), and all the memory parameters will need to be increased a little beyond those output from the low resolution run. For NSTOKES=1 and tabulated phase functions the major memory use in 4 byte words is: MEMWORD = 16.5*MAXIC + 33*MAXIG + 3*MAXIV + MAXIDO where MAXIG max number of internal grid points MAXIC max number of grid cells, between 1 and 2 times grid points MAXIV max words for source function and radiance arrays; needed size is average spherical harmonic truncation times number of grid points. MAXIDO max words for temporary discrete ordinate azimuthal array; needed size is max azimuths times number of grid points. For Nmu=8 Nphi=16 the most space needed is MEMWORD = (33+33+192+16)*MAXIG = 274*MAXIG VARIABLES The following is a list of important variables. Variables Description/Comments First the variables used in the adaptive cell structure: NPTS total number of grid points NCELLS total number of cells GRIDPOS(3,IP) x,y,z coordinate values of grid point locations GRIDPTR(8,IC) pointer to 8 grid points belonging to a cell. Order of pointers is based on binary digits zyx: IOCT = 1 + BITX + 2*BITY + 4*BITZ, where BITc is 0 for negative side and 1 for positive side. NEIGHPTR(6,IC) pointer to 6 neighboring cells of a cell. Order of pointers are -X,+X,-Y,+Y,-Z,+Z (1-6). Is 0 if there is no neighbor cell (boundary). Is positive if there is just one cell across face of current cell (unique neighbor). Is negative (-index) if there is more than one cell across the face of current cell. In this case the cell pointed to is the parent cell that has a has same or larger face adjoining current cell. TREEPTR(2,IC) pointer to parent and children cells in tree structure. First is pointer to parent cell, 0 for a base cell. Second is pointer to first of the two children cells. This pointer is 0 if there are no children cells. The children cells are in successive order, with the negative side first. See CELLFLAGS for direction of division of split. CELLFLAGS(IC) a collection of binary flags for the cells: bit For bit set 0 Do independent pixel for X 1 Do independent pixel for Y 2,3 Direction cell is split (0=no split, 1=X, 2=Y, 3=Z) Input variables: PROPFILE name of the input medium property file SFCFILE name of the input surface property file (or NONE) CKDFILE name of the input correlated k-distribution file (or NONE) INSAVEFILE name of the input binary save file (or NONE) OUTSAVEFILE name of the output binary save file (or NONE) NSTOKES number of Stokes parameters (1, 3, or 4) NX, NY, NZ base grid size in X, Y and Z NMU, NPHI number of discrete ordinates covering -10. NPHI0MAX The maximum number of azimuth angles actually used; for NCS=1 (cosine modes only) NPHI0=INT((NPHI+2)/2), otherwise NPHI0=NPHI. NPHI0(NMU) The number of azimuth angles actually used for each zenith angle (for the reduced gaussian set less azimuthal angles are used near zenith and nadir). NSTLEG The number of phase matrix elements used (1 for NSTOKES=1 or 6 for NSTOKES>1); also it is the number of unique elements of the real generalized spherical harmonics matrices. NPX,NPY,NPZ Size of the input property arrays. Need not be same as NX,NY,NZ NPXT, NPYT Size in grid cells of the full input property domain grid for multiple processors NX, NY, NZ Size in grid cells of the internal base grid. NXT, NYT Size in grid cells of the full domain grid for MPI DELX,DELY The horizontal grid spacing of the property arrays. ZLEVELS The input Z levels from the property file. XGRID,YGRID The grid locations for each dimension. To facilitate the ZGRID periodic horizontal boundaries, there are NX+1 values in XGRID, NY+1 in YGRID, NZ values in ZGRID. Currently the grid locations are evenly spaced in X and Y. The variable for the input (variable) surface properties. SFCTYPE The surface type (two characters) is set according to whether there is a surface file and its format: 'FL' fixed Lambertian; 'VL' variable Lambertian, 'VR' variable RPV, etc. NXSFC,NYSFC Size of surface arrays (internally NXSFC+1 by NYSFC+1) DELXSFC,DELYSFC Spacing of regular surface arrays SFCPARMS(NSFCPAR,*) Surface parmeters on regular sfc grid (temperature first) NSFCPAR Number of surface parameters (2 for Lambertian [temp,albedo]) SFCGRIDPARMS(NSFCPAR,*) Surface parameters interpolated to boundary grid points first one is Planck function from interped temperature. NTOPPTS,NBOTPTS Number of boundary grid points at top and bottom BCPTR(*,2) Pointers to grid points that are on top (1) and bottom (2) BCRAD(*) Boundary radiances: first NTOPPTS are isotropic top radiances; next NBOTPTS are upwelling bottom radiances; for general BRDF the next NBOTPTS*NANG/2 are downwelling bottom radiances at discrete ordinates (for doing integral over incident radiance). The arrays with the input property fields. TEMPP,EXTINCTP Temperature (K), extinction (km^-1), ALBEDOP Single scattering albedo LEGENP Phase function: Chi(1)...Chi(NLEG) are in LEGENP(1:NLEG,..). IPHASEP Pointer to phase functions in LEGENP for tabulated phase funcs EXTDIRP Extinction field (perhaps scaled) for direct beam computation The arrays with the internal/adaptive grid fields. EXTINCT Extinction field (scaled if delta-M) ALBEDO Single scattering albedo (scaled if delta-M) TEMP Temperature PLANCK Planck function times (1-ALBEDO) for thermal sources DIRFLUX For solar sources, the direct beam flux F*exp(-tau) (F=solar flux, tau=optical path to sun). LEGEN(NSTLEG,0:NLEG,*) The phase function coefficients Chi(0)...Chi(NLEG) are in LEGEN(1,0:NLEG,) = Chi(l)/(2*l+1), e.g. LEGEN(1,0,)=1. IPHASE Pointer to phase functions in LEGEN for tabulated phase funcs Correlated k-distribution arrays: DELG(NG) k-distribution weights ZCKD Heights read in from CKD file KABS Absorption coefficients (km^-1) for all k's and CKD heights GASABS Absorption coefficients for current k and CKD heights Spherical harmonic arrays (the three biggest ones): See YLMALL routine for the order of the SH terms. These three arrays have adaptive SH terms, and thus require pointers to index into them. The pointers (e.g. SHPTR) point to the element before the first SH term of a grid cell; e.g. SHPTR(I)+1 points to the first SH term of the I'th grid point in SOURCE, and SHPTR(I+1) points to the last SH term of the I'th grid point. SOURCE(NSTOKES,*) Spherical harmonic terms of the source function SHPTR(NPTS) Pointer for SOURCE DELSOURCE(NSTOKES,*) The difference between successive source functions in iteration OSHPTR(NPTS) Pointer for DELSOURCE RADIANCE(NSTOKES,*) Spherical harmonic terms of the radiance field RSHPTR(NPTS) Pointer for RADIANCE WORK Work array for path integration; radiance field, Nazimuth*Npts WORK1/SWEEPORD Work array for path integration; grid point sweeping order WORK2/GRIDRAD Work array for path integration; single angle radiance Output arrays: FLUXES(2,) The two hemispheric diffuse fluxes for each grid point computed from the discrete ordinate radiance field. SUMFLUXES Running k-distribution sum array for fluxes SUMDIRFLUX Sum array for direct beam flux SUMFLUXDIV Sum array for net flux divergence SUMSHTERMS Sum array for spherical harmonic output SUMRADOUT Sum array for radiance output (actual size needed is unknown) MU(NMU) The Gaussian quadrature cosine zenith angles (mu<0 first) WTMU(NMU) The Gaussian integration weights WTDO(NMU*NPHI0) Discrete ordinate integration weights PHI(NMU*NPHI0) Discrete azimuths, evenly spaced for each mu FFTFLAG(NMU) Logical array, true to do FFT instead of DFT with CPHIn CMU1(NSTLEG,NLM,NMU) The spherical harmonic/discrete ordinate transform CMU2(NSTLEG,NMU,NLM) coefficients. The CMUn are for the zenith angle transform, CPHI1(-16:16,32,NMU) and the CPHIn are for the azimuthal angle transforms. CPHI2(32,-16:16,NMU) The 1 coefficients are for the SH to DO transform, and the 2 coefficients (which have the integrations weights) are for the DO to SH transform. YLMSUN(NSTLEG,NLM) The spherical harmonic terms for the solar beam direction, which are for computing the single scattering source. FIXSH Logical, if true than adaptive SH truncation is fixed OUTOFMEM Logical, true if out of memory for splitting cells CURSPLITACC Splitting accuracy used at a particular iteration SPLITCRIT Splitting criterion currently achieved STARTADAPTSOL, ENDADAPTSOL Starting and ending solution criterion for splitting TRANSMIN Transmission cutoff for back tracing through cells INRADFLAG Logical variable, true if there is an input save file NEWGRIDFLAG Logical, true to make new grids, otherwise use last ones BASEOUT Logical, true to initialize/output with base grid for k-dist ITER The number of iterations done for one SOLVE_RTE TOTITER The total number of iterations done (over a k-dist) SOLCRIT The final solution criterion achieved NRAD Total number of output radiance locations/angles NVIS Total number of output visualization pixels LIST OF ROUTINES The following is a list of all the subroutines, where they are, and what they do. The first column gives the file (0 is shdom.f90, 1 is shdomsub1.f, 2 is shdomsub2.f, 3 is shdomsub3.f, M is shdom_mpi.f90, F is fftpack.f). The second column indicates the calling routine. The major solving routines are in shdomsub1.f, other routines are in shdomsub2.f, all the regular I/O routines are in shdomsub3.f, the MPI calling routines are in shdom_mpi.f90, and fftpack.f has the FFTPACK routines from NCAR. Routine Description 0 TRILIN_INTERP_PROP Interpolates medium properties to base/adaptive grid 0 DIRECT_BEAM_PROP Computes direct beam flux at a point 1 0 SOLVE_RTE (S) Main solver routine 1 CALC_SOURCE_PNT Calculates the SH source function at one grid point 1 CALC_SOURCE_PNT_UNPOL CALC_SOURCE_PNT for NSTOKES=1 1 S COMPUTE_SOURCE Computes the SH source function from the radiance 1 S RADIANCE_TRUNCATION Computes the next radiance truncation to use 1 S ACCELERATE_SOLUTION Accelerates the solution sequence 1 S PATH_INTEGRATION (P) Integrates the source function to get radiance 1 P BOUNDARY_PNTS Makes BCPTR of list of top and bottom boundary points 1 P SURFACE_PARM_INTERP Interpolates variable surface params to adaptive grid 1 S CHECK_UNIFORM_SFC Determines if the surface file reflection is uniform 1 P COMPUTE_TOP_RADIANCES Makes the top boundary radiances 1 P FIXED_LAMBERTIAN_BOUNDARY Makes the bottom boundary upwelling radiances 1 P VARIABLE_LAMBERTIAN_BOUNDARY Makes the bottom boundary upwelling radiances 1 P VARIABLE_BRDF_SURFACE Makes the upwelling radiances for general BRDF 1 S MAKE_SFC_BRDF_DO_MATRIX Makes the uniform surface BRDF function for discrete ordinates 1 P UNIFORM_BRDF_SURFACE Makes the upwelling radiances for uniform BRDFs 1 P SH_TO_DO Spherical harmonic to discrete ordinate transform 1 P SH_TO_DO_UNPOL SH_TO_DO for NSTOKES=1 1 P DO_TO_SH Discrete ordinate to SH transform 1 P DO_TO_SH_UNPOL DO_TO_SH for NSTOKES=1 1 P SWEEPING_ORDER Makes the point sweeping order for 2,4, or 8 octants 1 P BACK_INT_GRID3D (B) 3D source function integration routine for one angle 1 P BACK_INT_GRID3D_UNPOL BACK_INT_GRID3D for NSTOKES=1 1 P BACK_INT_GRID2D (B) 2D source function integration routine for one angle 1 P BACK_INT_GRID1D (B) 1D source function integration routine for one angle 1 B NEXT_CELL Gets next cell to go to adjacent to this face 1 SWEEP_BASE_CELL Returns next base grid cell in sweep 1 SWEEP_NEXT_CELL Returns next grid cell in sweep 1 S SPLIT_GRID (G) Main routine for adaptive cell splitting 1 G INTERPOLATE_POINT Interpolates medium, source for new grid points 1 G DIVIDE_CELL (D) Divides a single cell, adjusts the adaptive structure 1 D MATCH_NEIGHBOR_FACE Resolves the neighbor pointers for a new cell 1 INHERIT_NEIGHBOR Updates a cell's new neighbor cell 1 D NEW_GRID_POINTS Makes two new points and sets up the pointers 1 MATCH_GRID_POINT Find whether neighboring cells have matching gridpoints 1 G CELL_SPLIT_TEST Computes splitting criterion and direction to split 1 G GRID_SMOOTH_TEST Determines if cell will be split for a "smoother" grid 2 NEW_GRIDS Makes the internal X, Y, and Z grids 2 0 INIT_CELL_STRUCTURE Make initial grid data structure for the base grid 2 S INTERP_GRID Interpolates medium from property to internal grid 2 S MAKE_DIRECT Makes direct beam solar flux (calls DIRECT_BEAM_PROP) 2 S PREPARE_PROP Does delta-M scaling; makes PLANCK 2 S INIT_RADIANCE Does an Eddington (two-stream) initializatin 2 EDDRTF Solves Eddington radiative transfer for flux 2 TRIDIAG Tridiagonal solver used by EDDRTF 2 S MAKE_ANGLE_SET Makes angle set for the discrete representation 2 S MAKE_SH_DO_COEF Makes the SH/DO transform coefficients 2 SURFACE_BRDF Computes the reflection for the one of the BRDF types 2 WAVE_FRESNEL_REFLECTION Routine for Wave-Fresnel polarized reflection 2 DINER_REFLECTION Routine for Diner polarized reflection 2 RPV_REFLECTION Function for Rahman, Pinty, Verstraete reflection 2 0 SUM_OUTPUT Sums an output array over the k-distribution. 2 0 COMPUTE_NETFLUXDIV Computes the net flux divergence everywhere 2 0 COMPUTE_SH Computes the SH output arrays 2 0 INTERP_OUTPUT Interpolates the output data to the new grid points 2 S INTERP_RADIANCE Interpolates SH radiance array to the new points 2 0 VISUALIZE_RADIANCE Computes radiances for the two visualization modes 2 INTEGRATE_1RAY Integrates the source function for one ray 2 FIND_BOUNDARY_RADIANCE Gets the interpolated radiance at the boundary 2 COMPUTE_SOURCE_1CELL Computes source times extinction for one cell 2 COMPUTE_SOURCE_1CELL_UNPOL COMPUTE_SOURCE_1CELL for NSTOKES=1 2 PRECOMPUTE_PHASE Precomputes phase function for all tabulated functions 2 ROTATE_POL_PLANE Rotates the plane of polarization for single scattered Stokes vector 2 0 COMPUTE_RADIANCE (R) Computes the desired output radiance values 2 R COMPUTE_ONE_SOURCE Computes the source function for 1 angle 2 SOLAR_SINGSCAT_VECT Compute Stokes vector for a single scattering of sunlight 2 R INTEGRATE_SOURCE Integrates the source function backward for radiance 2 INTERPOLATE_FIELD Trilinearly interpolates field given a grid cell 2 LOCATE_GRID_CELL Locates grid cell containing the specified point 2 LEGENDRE_ALL Computes a set of Legendre polynomials for one angle 2 S YLMALL Computes a set of real generalized spherical harmonic functions 2 S YLMALL_UNPOL YLMALL for NSTOKES=1 2 WIGNERFCT02P2M_NORMALIZED Computes some Wigner d-functions 2 DMM1_N0 Computes other Wigner d-functions 2 WIGNERFCT_DM0 Computes more Wigner d-functions 2 DM_M10_N0 Computes more Wigner d-functions 2 WIGNERFCT Computes more Wigner d-functions 2 PLMALL Computes more Wigner d-functions 2 PLANCK_FUNCTION Monochromatic Planck function routine 2 INTEGRATE_PLANCK Band integrates the Planck function 2 GAUSQUADS Gaussian quadrature routine 2 DGAUSQUADS Double Gaussian quadrature routine 2 SSORT Sorting routine for adaptive cell criterion 3 0 USER_INPUT Gets the input parameters from stdin 3 0 NAMELIST_INPUT Gets the input parameters from a namelist 3 0 CHECK_INPUT_PARMETERS Checks input parameters for validity/reasonableness 3 0 READ_PROPERTY_SIZE Gets the size of the property file 3 0 READ_PROPERTIES Reads the property file 3 0 CHECK_PROPERTY_INPUT Checks the values read in from the property file 3 0 READ_SURFACE_SIZE Gets the size of the surfae file 3 0 READ_SURFACE Reads the surface file 3 0 READ_CKD_SIZE Gets the size of the k-distribution file 3 0 READ_CKD Reads the correlated k-distribution file 3 0 RESTORE_STATE Reads the binary state file 3 0 SAVE_STATE Writes the binary state file 3 S OUTPUT_CELL_SPLIT Outputs the cell splitting criterion for every cell 3 S VISUALIZE_CELLS Outputs GLE graphics of cell structure 3 0 OUTPUT_RESULTS Selects and outputs the results 3 0 OUTPUT_RESULTS_PAR Outputs results when using MPI 3 0 OUTPUT_IMAGE Outputs the visualization PGM images with labels M 0 START_MPI Initializes the MPI system M 0 MAP_SHDOM_MPI Sets the mapping of processors for the SHDOM domain M OPTIMIZE_PROCESSOR_DOMAINS Calculates the location of this processor M LOAD_OBJ_FUNC The objective function to minimize in the load balancing optimization M ran Uniform random number generator M 0 BROADCAST_USER_INPUT Broadcasts the user input parameters from the master processor M 0 BROADCAST_PROPERTY_SIZE Broadcasts the property file size information M 0 SCATTER_PROPERTIES Sends the subdomain of optical properties to each processor M 0 BROADCAST_SURFACE_SIZE Broadcasts the surface array sizes from the master processor M 0 BROADCAST_SURFACE_PARMS Broadcasts the surface file parameters from the master processor M 0 BROADCAST_KDIST_SIZE Broadcasts the k-distribution array sizes from the master processor M 0 BROADCAST_KDIST_PARMS Broadcasts the k-distribution file parameters from the master processor M 0 READ_BROADCAST_MEM_PARAMS Reads and broadcasts the *_mem_params.inp file M 0 SUM_CPU_TIME Sums the CPU time used across all processors M 0 GATHER_OUTPUT Gathers the base grid output data from all the processors M 1 TOTAL_ALBEDO_MAX Finds the max ALBMAX over all processors M S UNIFY_SPLITTING Turns on cell splitting (DOSPLIT=.TRUE.) for all processors if any want to M 1 TOTAL_SPLITCRIT_MAX Finds the max SPLITCRIT over all processors M 2 MAKE_DIRECT_PAR Calculates the grid point direct solar fluxes with multiple processors M P FIND_BOUNDARY_POINTS Makes the pointers to and positions of the boundary grid points of the neighboring processors M B CALC_BOUNDARY_RADIANCES Calculates the discrete ordinate boundary radiances M INTEGRATE_RAY Does the RTE integrations for CALC_BOUNDARY_RADIANCES M 0 COMPUTE_RADIANCE_PAR Computes the output radiances using multiple processors M 0 VISUALIZE_RADIANCE_PAR Computes the visualization output using MPI M S CALC_ACCEL_SOLCRIT Calculates full domain acceleration parameter (& solcrit) M 0 END_SHDOM_MPI Shuts down MPI, writes load balancing file M M ABORT_SHDOM_MPI Prints error message and aborts MPI F COSTI Sets up for the cosine FFT F COST Does the cosine transform FFT F RFFTI Sets up for the real FFT F RFFTB Does the real FFT to discrete azimuths F RFFTF Does the real FFT to Fourier coefficients