getf v0.1

Note: this is a complete rewrite of the getf tool for use on HPCX with the IBM Tivoli archiving system.

getf is a programme that automatically extracts a given field or fields from the HPCX archive of Unified Model (UM) output. If you are trying to extract data from CSAR UM experiments that are now archived on HPCX, use getf_old instead.

When automatic archiving is switched on in a UM job, model output is automatically archived to tape by the Tivoli archiving system at HPCX. getf automatically extracts a given set of fields from a given job in this archive and returns individual netcdf files for each field requested.

Usage

There are three ways getf can be used:

getf -diags STASH_FILE [USAGE] 
getf -list EXPT JOB MODEL [TYPE] [SEASON] [-date date]
getf -extract EXPT JOB MODEL TYPE [SEASON] "field list" OUTPUT_FILENAME_STEM [-date date]

To extract data from the archive we must specify the EXPERIMENT (eg xbaq), the JOB within that experiment (eg c), the MODEL (a=atmosphere, o=ocean) and the TYPE (m=monthly mean, s=seasonal mean, y=annual mean, or one of the a-j output streams, used for diagnostics such as daily mean PPT). We must also specify a list of FIELDS to extract (eg PPT, MSLP, U, V etc). Fields are specified by field number: the position of the given field in the model output file. The first two forms of getf enable the user to examine the archive to discover these parameters.
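As a rough illustration of how these parameters fit together, the shell sketch below assembles a getf -extract command line and prints it without running it. The function name build_getf_extract is hypothetical and is not part of getf; the checks only restate the values listed above, and the rule that a SEASON accompanies TYPE s is an assumption read from the usage lines.

```shell
# Hypothetical helper (not part of getf): print a getf -extract command line.
# Arguments: EXPT JOB MODEL TYPE SEASON "field list" OUTPUT_FILENAME_STEM
# Pass an empty string for SEASON unless TYPE is s.
build_getf_extract() {
    expt=$1; job=$2; model=$3; ftype=$4; season=$5; fields=$6; stem=$7

    case $model in
        a|o) ;;
        *) echo "MODEL must be a (atmosphere) or o (ocean)" >&2; return 1 ;;
    esac
    case $ftype in
        m|s|y|[a-j]) ;;
        *) echo "TYPE must be m, s, y or a stream letter a-j" >&2; return 1 ;;
    esac
    # Assumption: a season is supplied only for seasonal means.
    if [ "$ftype" = s ] && [ -z "$season" ]; then
        echo "TYPE s needs a SEASON (eg djf)" >&2; return 1
    fi

    if [ "$ftype" = s ]; then
        printf 'getf -extract %s %s %s %s %s "%s" %s\n' \
            "$expt" "$job" "$model" "$ftype" "$season" "$fields" "$stem"
    else
        printf 'getf -extract %s %s %s %s "%s" %s\n' \
            "$expt" "$job" "$model" "$ftype" "$fields" "$stem"
    fi
}

build_getf_extract xchc c a m ""  "0 3" output_netcdf
# prints: getf -extract xchc c a m "0 3" output_netcdf
build_getf_extract xchc c a s djf "7"   output_netcdf
# prints: getf -extract xchc c a s djf "7" output_netcdf
```

Printing the command first makes it easy to check the argument order before anything is retrieved from tape.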


getf -diags STASH_FILE [USAGE]

The -diags flag allows the user to find the field number for a given field (eg PPT or GPH) within a given experiment. This requires the original STASH table from the UMUI for that job. To create this table, go to STASH within the UMUI and then Diagnostics_Output-Table-To-File. The table will normally be written to $HOME/umui_jobs. For example, /home/dan/umui_jobs/xchcc.A.diags is the (A)tmosphere STASH table for experiment XCHC, job C. Transfer this file to HPCX (eg using scp or sftp) and then do, eg

getf -diags xchcc.A.diags
The output will look something like:
Creating Temporary Directory /hpcx/devt/n02/n02-ncas/lrdlrh/getf_tmp
______________________________________________________________________________________
 diagnostics in xchcc.A.diags. Refer to left hand column for field index.

Key:
  I: Included or excluded
  P:  Package and include status +A for include
  A: Available
  US: User or system diagnostic

Sec Itm Diagnostic Name                     Time     Domain   Usage   I P  A US
-------------------------------------------------------------------------------
 0   1 PSTAR AFTER TIMESTEP                TDAYMN   DIAG     UPMEAN   Y +  Y S
 0  10 SPECIFIC HUMIDITY AFTER TIMESTEP    TDAYMN   DA19     UPMEAN   Y +  Y S
 0  10 SPECIFIC HUMIDITY AFTER TIMESTEP    TDAYMN   DA19V    UPMEAN   Y +  Y S
 0  24 SURFACE TEMPERATURE AFTER TIMESTEP  TDAYMN   DIAG     UPMEAN   Y +  Y S
 0  31 SEA ICE FRACTION AFTER TIMESTEP     TDAYMN   DIAG     UPMEAN   Y +  Y S
 1 201 NET DOWN SURFACE SW FLUX: SW TS ONL TDAYMN   DIAG     UPMEAN   Y +  Y S
 1 207 INCOMING SW RAD FLUX (TOA): ALL TSS TDAYMN   DIAG     UPMEAN   Y +  Y S
 1 208 OUTGOING SW RAD FLUX (TOA)          TDAYMN   DIAG     UPMEAN   Y +  Y S
 5 216 TOTAL PRECIPITATION RATE     KG/M2/ TDAYMN   DIAG     UPD      Y +  Y S 
15 201 U COMPNT OF WIND ON PRESSURE LEVELS TDAYMN   DP17Z    UPD      Y +  Y S
15 201 U COMPNT OF WIND ON PRESSURE LEVELS TDAYMN   DP3      UPD      Y +  Y S
15 202 V COMPNT OF WIND ON PRESSURE LEVELS TDAYMN   DP17Z    UPD      Y +  Y S
15 202 V COMPNT OF WIND ON PRESSURE LEVELS TDAYMN   DP3      UPD      Y +  Y S
______________________________________________________________________________________
If you now specify a USAGE taken from this output (eg UPMEAN):
getf -diags xchcc.A.diags UPMEAN
you will get an indexed list of only those diagnostics:
Creating Temporary Directory /hpcx/devt/n02/n02-ncas/lrdlrh/getf_tmp
______________________________________________________________________________________
UPMEAN diagnostics in xchcc.A.diags. Refer to left hand column for field index.

0 :  0   1 PSTAR AFTER TIMESTEP                TDAYMN   DIAG     UPMEAN   Y +  Y S
1 :  0  10 SPECIFIC HUMIDITY AFTER TIMESTEP    TDAYMN   DA19     UPMEAN   Y +  Y S
2 :  0  10 SPECIFIC HUMIDITY AFTER TIMESTEP    TDAYMN   DA19V    UPMEAN   Y +  Y S
3 :  0  24 SURFACE TEMPERATURE AFTER TIMESTEP  TDAYMN   DIAG     UPMEAN   Y +  Y S
4 :  0  31 SEA ICE FRACTION AFTER TIMESTEP     TDAYMN   DIAG     UPMEAN   Y +  Y S
5 :  1 201 NET DOWN SURFACE SW FLUX: SW TS ONL TDAYMN   DIAG     UPMEAN   Y +  Y S
6 :  1 207 INCOMING SW RAD FLUX (TOA): ALL TSS TDAYMN   DIAG     UPMEAN   Y +  Y S
7 :  1 208 OUTGOING SW RAD FLUX (TOA)          TDAYMN   DIAG     UPMEAN   Y +  Y S
______________________________________________________________________________________
Now we can read off the field number from the left-hand column (eg specific humidity (00010) on the DA19V domain is '2').

In this case UPMEAN is a monthly mean (this can be discovered by examining the definition of UPMEAN within the STASH panel in the UMUI), so we can now extract monthly means using:

getf -extract xchc c a m "2" netcdf_outputfile
See below for more explanation.
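The lookup described above can also be scripted. The sketch below is only an illustration: diag_index is a hypothetical helper, not a getf option, and it assumes the indexed output format shown above (index, a colon, then the STASH section and item numbers).

```shell
# Hypothetical helper: read indexed "getf -diags STASH_FILE USAGE" output on
# stdin and print the field index of every line with the given STASH
# section and item numbers.
diag_index() {
    # $1 = STASH section, $2 = STASH item
    awk -v sec="$1" -v itm="$2" \
        '$2 == ":" && $3 == sec && $4 == itm { print $1 }'
}

# Demonstration on two lines copied from the example output above.
diag_index 0 24 <<'EOF'
3 :  0  24 SURFACE TEMPERATURE AFTER TIMESTEP  TDAYMN   DIAG     UPMEAN   Y +  Y S
4 :  0  31 SEA ICE FRACTION AFTER TIMESTEP     TDAYMN   DIAG     UPMEAN   Y +  Y S
EOF
# prints: 3
```

On HPCX the same function could be fed directly, eg getf -diags xchcc.A.diags UPMEAN | diag_index 0 24. A diagnostic requested on more than one domain (such as section 0, item 10 above) yields one index per domain, so check the Domain column before choosing.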

getf -list EXPT JOB MODEL [TYPE] [SEASON] [-date date]

If we don't have access to the STASH table then the -list option probes the archive directly.
The command:

getf -list xchc c a
(EXPT=xchc JOB=c MODEL=a (atmosphere) )
will list all the files in the archive produced by the atmosphere model in experiment XCHC, job C. The output will look like this:
Creating Temporary Directory /hpcx/devt/n02/n02-ncas/lrdlrh/getf_tmp
______________________________________________________________________________________
Files:
Feb12_2007_13.16.09/xchcca.pdg0dec 
Feb12_2007_13.16.09/xchcca.pdg1apr 
Feb12_2007_13.16.09/xchcca.pdg1feb 
Feb12_2007_13.16.09/xchcca.pdg1jan 
Feb15_2007_22.20.42/xchcca.psg4mam 
Feb15_2007_22.20.42/xchcca.psg4son 
Feb15_2007_22.20.42/xchcca.pyg4c10 
______________________________________________________________________________________
extracting /hpcx/devt/n02/n02-ncas/lrdlrh/um_archive/xchcc/Feb15_2007_22.20.42/xchcca.pyg4c10 .
 Please wait, this may take several minutes.
rdsmc retrieve -verbose -subdir=yes "/hpcx/devt/n02/n02-ncas/lrdlrh/um_archive/xchcc/Feb15_2007_22.20.42/xchcca.pyg4c10 " /hpcx/devt/n02/n02-ncas/lrdlrh/getf_tmp/xchcca.pyg4c10 
______________________________________________________________________________________
Diagnostics in xchcca.pyg4c10 .
Refer to left hand column for field index.

file /hpcx/devt/n02/n02-ncas/lrdlrh/getf_tmp/xchcca.pyg4c10 is a 64 bit ieee um file 

 0    : 96    73    1     1     Pressure
 1    : 96    73    19    1     Specific humidity q
 2    : 96    73    1     1     Specific humidity q
 3    : 96    73    1     1     Temperature T
 4    : 96    73    1     1     Fractional sea-ice cover
 5    : 96    73    1     1     Net short wave radiation flux
 6    : 96    73    1     1     Downward solar
 7    : 96    73    1     1     Upward solar
 8    : 96    73    1     1     Clear-sky flux (type II) solar up
 9    : 96    73    1     1     Clear-sky flux (type II) solar down
 10   : 96    73    1     1     Clear-sky flux (type II) solar up
______________________________________________________________________________________

The first part of the output lists all files in the archive that belong to this EXPT, JOB and MODEL. The second part lists all the fields present in the last file in that list (in this case the annual mean file xchcca.pyg4c10). If we want to extract fields from this last file then we can now read off the field numbers (eg "4" for fractional sea-ice cover). If we wish to examine monthly means only, we can specify this using the TYPE option:
getf -list xchc c a m
will list only the monthly means. The output will be:
Creating Temporary Directory /hpcx/devt/n02/n02-ncas/lrdlrh/getf_tmp
______________________________________________________________________________________
Files:
Feb12_2007_13.16.09/xchcca.pmg0dec 
Feb12_2007_13.16.09/xchcca.pmg1apr 
Feb12_2007_13.16.09/xchcca.pmg1feb 
Feb12_2007_13.16.09/xchcca.pmg1jan 
Feb12_2007_13.16.09/xchcca.pmg1mar 
Feb12_2007_13.16.09/xchcca.pmg1may 
Feb15_2007_22.20.42/xchcca.pmg4oct 
Feb15_2007_22.20.42/xchcca.pmg4sep 
______________________________________________________________________________________
extracting /hpcx/devt/n02/n02-ncas/lrdlrh/um_archive/xchcc/Feb15_2007_22.20.42/xchcca.pmg4sep .
 Please wait, this may take several minutes.
rdsmc retrieve -verbose -subdir=yes "/hpcx/devt/n02/n02-ncas/lrdlrh/um_archive/xchcc/Feb15_2007_22.20.42/xchcca.pmg4sep " /hpcx/devt/n02/n02-ncas/lrdlrh/getf_tmp/xchcca.pmg4sep 
______________________________________________________________________________________
Diagnostics in xchcca.pmg4sep .
Refer to left hand column for field index.

file /hpcx/devt/n02/n02-ncas/lrdlrh/getf_tmp/xchcca.pmg4sep is a 64 bit ieee um file 

 0    : 96    73    1     1     Pressure
 1    : 96    73    19    1     Specific humidity q
 2    : 96    73    1     1     Specific humidity q
 3    : 96    73    1     1     Temperature T
 4    : 96    73    1     1     Fractional sea-ice cover
 5    : 96    73    1     1     Net short wave radiation flux
 6    : 96    73    1     1     Downward solar
 7    : 96    73    1     1     Upward solar
 8    : 96    73    1     1     Clear-sky flux (type II) solar up
 9    : 96    73    1     1     Clear-sky flux (type II) solar down
 10   : 96    73    1     1     Clear-sky flux (type II) solar up

______________________________________________________________________________________
From this output we can read off the field numbers for a given field within the monthly mean files. If we specify SEASONAL means then we must also specify a season. eg:
getf -list xchc c a s djf
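Reading a field number off a long -list output can be done with a small filter. The sketch below is an illustration only: field_index is a hypothetical helper, and it assumes the field lines look like those shown above (index, colon, four numeric columns, then the field name).

```shell
# Hypothetical helper: read "getf -list" output on stdin and print
# "INDEX: NAME" for every field whose name contains the given text
# (case-insensitive).
field_index() {
    awk -v pat="$1" '
        $2 == ":" && NF >= 7 {
            # Assumption: the name starts at the 7th whitespace-separated column.
            name = $7
            for (i = 8; i <= NF; i++) name = name " " $i
            if (index(tolower(name), tolower(pat))) print $1 ": " name
        }'
}

# Demonstration on lines copied from the example output above.
field_index sea-ice <<'EOF'
 3    : 96    73    1     1     Temperature T
 4    : 96    73    1     1     Fractional sea-ice cover
 5    : 96    73    1     1     Net short wave radiation flux
EOF
# prints: 4: Fractional sea-ice cover
```

Several fields can share a name (eg the two "Specific humidity q" entries above), in which case every match is printed and the numeric columns have to be compared by eye.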

The -date option allows us to specify only those files output on a specific date. For example:
getf -list xchc c a m -date Feb12_2007_13.16.09
produces:
Creating Temporary Directory /hpcx/devt/n02/n02-ncas/lrdlrh/getf_tmp
______________________________________________________________________________________
Files:
Feb12_2007_13.16.09/xchcca.pmg0dec 
Feb12_2007_13.16.09/xchcca.pmg1apr 
Feb12_2007_13.16.09/xchcca.pmg1feb 
Feb12_2007_13.16.09/xchcca.pmg1jan 
Feb12_2007_13.16.09/xchcca.pmg1mar 
Feb12_2007_13.16.09/xchcca.pmg1may 
______________________________________________________________________________________
extracting /hpcx/devt/n02/n02-ncas/lrdlrh/um_archive/xchcc/Feb12_2007_13.16.09/xchcca.pmg1may .
 Please wait, this may take several minutes.
rdsmc retrieve -verbose -subdir=yes "/hpcx/devt/n02/n02-ncas/lrdlrh/um_archive/xchcc/Feb12_2007_13.16.09/xchcca.pmg1may " /hpcx/devt/n02/n02-ncas/lrdlrh/getf_tmp/xchcca.pmg1may 
______________________________________________________________________________________
Diagnostics in xchcca.pmg1may .
Refer to left hand column for field index.

file /hpcx/devt/n02/n02-ncas/lrdlrh/getf_tmp/xchcca.pmg1may is a 64 bit ieee um file 

 0    : 96    73    1     1     Pressure
 1    : 96    73    19    1     Specific humidity q
 2    : 96    73    1     1     Specific humidity q
 3    : 96    73    1     1     Temperature T
 4    : 96    73    1     1     Fractional sea-ice cover
 5    : 96    73    1     1     Net short wave radiation flux
 6    : 96    73    1     1     Downward solar
 7    : 96    73    1     1     Upward solar
 8    : 96    73    1     1     Clear-sky flux (type II) solar up
 9    : 96    73    1     1     Clear-sky flux (type II) solar down
 10   : 96    73    1     1     Clear-sky flux (type II) solar up
______________________________________________________________________________________

and hence selects only the files written into that dated subdirectory.
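The entries in the Files: list can be taken apart to recover the value to pass to -date and the parameters of each file. The sketch below is a guess at the layout, inferred from the listings above rather than documented: a dated directory, then a name made of EXPT (4 characters), JOB (1), MODEL (1), a dot, the letter p, and the TYPE letter. split_archive_entry is a hypothetical name.

```shell
# Hypothetical helper: split one entry from the "Files:" list into its parts.
# The name layout is an assumption inferred from the example listings.
split_archive_entry() {
    entry=$1
    date_dir=${entry%%/*}     # eg Feb12_2007_13.16.09 (usable with -date)
    file=${entry#*/}          # eg xchcca.pmg1may
    runid=${file%%.*}         # eg xchcca
    suffix=${file#*.}         # eg pmg1may
    expt=$(printf '%s' "$runid" | cut -c1-4)
    job=$(printf '%s' "$runid" | cut -c5)
    model=$(printf '%s' "$runid" | cut -c6)
    ftype=$(printf '%s' "$suffix" | cut -c2)
    echo "date=$date_dir expt=$expt job=$job model=$model type=$ftype"
}

split_archive_entry Feb12_2007_13.16.09/xchcca.pmg1may
# prints: date=Feb12_2007_13.16.09 expt=xchc job=c model=a type=m
```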


getf -extract EXPT JOB MODEL TYPE [SEASON] "field list" OUTPUT_FILENAME_STEM [-date date]

After discovering the relevant parameters using the -diags or -list option we are now in a position to extract the required fields as netcdf files. All the parameters used in the -extract option have been described above, except one: the output filename stem. Each netcdf file will be named OUTPUT_FILENAME_STEM_FIELD_NUMBER.nc (see the examples below).
The -extract option also accepts the -date option as described above.

Examples:

Extract monthly mean fields Pressure (0) and Temperature (3) from the atmosphere model from experiment XCHC, job C:

getf -extract xchc c a m "0 3" output_netcdf
output:
output_netcdf_0.nc
output_netcdf_3.nc

Extract the djf seasonal mean field Upward Solar (7) from the atmosphere model from experiment XCHC, job C:

getf -extract xchc c a s djf  "7" output_netcdf
output:
output_netcdf_7.nc
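
When scripting around getf it can help to know in advance which files an extraction should produce. The sketch below follows the naming seen in the examples above (stem, underscore, field number, .nc); expected_outputs is a hypothetical helper and the naming rule is taken from those examples only.

```shell
# Hypothetical helper: list the netcdf file names expected for a field list.
# Arguments: OUTPUT_FILENAME_STEM "field list"
expected_outputs() {
    stem=$1
    for n in $2; do           # unquoted on purpose: split the list on spaces
        echo "${stem}_${n}.nc"
    done
}

expected_outputs output_netcdf "0 3"
# prints:
# output_netcdf_0.nc
# output_netcdf_3.nc
```

After an extraction, a loop such as for f in $(expected_outputs output_netcdf "0 3"); do [ -f "$f" ] || echo "missing $f"; done gives a quick check that every requested field arrived.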

Location

getf is located in /hpcx/home/n02/n02/lrdlrh/bin/getf on login.hpcx.ac.uk

Submitting getf as a batch job

Sometimes a getf job can take a considerable amount of time to process, depending on the amount of data to extract and the load on the archiving system. Consequently, it may be better to submit a getf job to the batch queue on HPCX. Here's how to do this:

  1. Create a file (eg xchcd.getf) containing the getf commands. eg:
    /hpcx/home/n02/n02/lrdlrh/bin/getf -extract xchc D A s "djf" "6" xchcd -date "Feb13*"
    
  2. chmod +x xchcd.getf
  3. Create a (LoadLeveler) script (eg xchcd.getf.batch) of the form:
    #@ shell = /bin/ksh
    #
    #@ job_name = xchcd.getf
    #
    #@ job_type = serial
    #@ node_usage = shared
    #
    #@ wall_clock_limit = 12:00:00
    #@ account_no = n02-ncas
    #
    #@ output = xchcd.getf.log
    #@ error  = xchcd.getf.log
    #@ notification = complete
    #@ notify_user = me@my.email.address
    #
    #@ queue
    #
    
    ./xchcd.getf
    
    
    where me@my.email.address is your email address.
  4. chmod +x xchcd.getf.batch
  5. Submit this job to the batch queue (LoadLeveler) with:
    llsubmit xchcd.getf.batch
    
LoadLeveler will then email you when the job is complete. Output and error logs will be written to xchcd.getf.log
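
Steps 1 to 4 above can be gathered into one small script. The sketch below is an illustration under stated assumptions: make_getf_batch is a hypothetical helper, the directives are copied from the example script above, and nothing is submitted, so step 5 (llsubmit) is still run by hand on HPCX.

```shell
# Hypothetical helper: write NAME.getf and NAME.getf.batch into directory DIR.
# Arguments: NAME EMAIL DIR 'getf command line'
make_getf_batch() {
    name=$1; email=$2; dir=$3; cmd=$4

    # Steps 1-2: the command file, made executable.
    printf '%s\n' "$cmd" > "$dir/$name.getf"
    chmod +x "$dir/$name.getf"

    # Steps 3-4: the LoadLeveler script, using the directives shown above.
    cat > "$dir/$name.getf.batch" <<EOF
#@ shell = /bin/ksh
#@ job_name = $name.getf
#@ job_type = serial
#@ node_usage = shared
#@ wall_clock_limit = 12:00:00
#@ account_no = n02-ncas
#@ output = $name.getf.log
#@ error  = $name.getf.log
#@ notification = complete
#@ notify_user = $email
#@ queue

./$name.getf
EOF
    chmod +x "$dir/$name.getf.batch"
}

# Demonstration in a scratch directory.
workdir=$(mktemp -d)
make_getf_batch xchcd me@my.email.address "$workdir" \
    '/hpcx/home/n02/n02/lrdlrh/bin/getf -extract xchc D A s "djf" "6" xchcd -date "Feb13*"'
cat "$workdir/xchcd.getf.batch"
```

The getf command is passed as a single quoted string so that its own quoting (eg "Feb13*") reaches the command file unchanged.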

Author

Contact d.l.r.hodson(#AT#)reading.ac.uk to report bugs or make suggestions
Dan Hodson - 16 Feb 2007
Last updated Wednesday, 18-Apr-2007 14:45:18 BST