With the above settings the directory tree of your file system will be used to create the folder hierarchy in RAMADDA. Every file the Harvester finds will result in a file entry. You can run a Harvester any number of times and it will only add the new files that it has not seen before.
The "Active on startup" flag, when set, starts the harvester when the repository starts up. The "Run continually" flag makes the harvester run repeatedly, using the "Every" setting to determine the pause between runs. Choose "Absolute" to pause a fixed N minutes between runs. Alternatively, choose "Minutes" or "Hourly" to run at times relative to the hour or the day, e.g., "3 hourly" will run at 0Z, 3Z, 6Z, 9Z, etc.
For example, if data files arrive in real time every 30 minutes, you could set your harvester to run in "Absolute" mode every 15 minutes. If you have a Web harvester that fetches images, you might use an "Hourly" setting to get the image at fixed times (e.g., 0Z, 6Z, 12Z, 18Z).
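The difference between the two timing modes can be sketched in Python. This is only an illustration of the behavior described above, not RAMADDA's scheduler: the function names are hypothetical, and the sketch assumes that relative runs are aligned to UTC midnight.

```python
from datetime import datetime, timedelta, timezone

def next_absolute_run(last_run_end: datetime, every_minutes: int) -> datetime:
    """Absolute mode: pause a fixed number of minutes after the previous run."""
    return last_run_end + timedelta(minutes=every_minutes)

def next_hourly_run(now: datetime, every_hours: int) -> datetime:
    """Hourly mode: the next hour that is a multiple of every_hours, strictly after now."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed_hours = (now - midnight) // timedelta(hours=1)
    next_slot = (elapsed_hours // every_hours + 1) * every_hours
    return midnight + timedelta(hours=next_slot)

now = datetime(2009, 1, 15, 4, 20, tzinfo=timezone.utc)
print(next_absolute_run(now, 15))  # 15 minutes after 04:20Z
print(next_hourly_run(now, 3))     # next 3-hourly slot after 04:20Z, i.e. 06:00Z
```

In absolute mode the run times drift with the duration of each run, whereas the hourly mode always lands on the same clock times.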
The regular expressions used are extended: you can label sub-expressions and use the text they match for metadata and other information when creating the entry in the repository. A very common case is a date/time embedded in the file name. For that, your regular expression could have the form:
.*data_(fromdate:\d\d\d\d\d\d\d\d_\d\d\d\d)\.nc
This would match any file of the form:
data_yyyymmdd_hhmm.nc
The "(" and ")" delimit the sub-expression, just as in a normal regular expression. The "fromdate:" prefix is the special extension that tells the harvester to use that sub-expression for the fromdate field of the repository entry.
The date format that is used is defined in the Date Format field and follows the Java date format conventions.
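To see what such a pattern extracts, it can be approximated with an ordinary regular expression. The sketch below is Python, not RAMADDA code: it rewrites the "fromdate:" prefix as a standard named group, uses a pattern that matches files of the form data_yyyymmdd_hhmm.nc, and assumes a Date Format of yyyyMMdd_HHmm (Java syntax), whose Python equivalent is %Y%m%d_%H%M. The sample path is made up.

```python
import re
from datetime import datetime

# Harvester-style pattern with the "fromdate:" extension.
harvester_pattern = r".*data_(fromdate:\d\d\d\d\d\d\d\d_\d\d\d\d)\.nc"

# Rewrite "(name:" as a Python named group "(?P<name>" so that re can compile it.
python_pattern = re.sub(r"\((\w+):", r"(?P<\1>", harvester_pattern)

path = "/data/idd/dir1/data_20090115_0630.nc"  # made-up example path
match = re.fullmatch(python_pattern, path)
if match:
    # Assumed Date Format: yyyyMMdd_HHmm in Java, %Y%m%d_%H%M in Python.
    from_date = datetime.strptime(match.group("fromdate"), "%Y%m%d_%H%M")
    print(from_date.isoformat())  # 2009-01-15T06:30:00
```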
If you are creating entries of a certain type that has a number of attributes you can extract the attribute values using this extended regular expression technique. For example, if you had an entry with two attributes attr1 and attr2 and your files were of the format:
<attr1>_<attr2>.csv
Your regular expression would be:
(attr1:[^/]+)_(attr2:[^/]*)\.csv
This says that attr1 is one or more characters other than the slash ("/"). The slash is excluded to keep the directory part of the path out of the value, because the full file path is used when matching patterns. The value for attr2 follows the "_" and is any number of characters other than a slash.
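The attribute extraction can be approximated the same way. This Python sketch is an illustration rather than RAMADDA's implementation. The pattern used here is `(attr1:[^/]+)_(attr2:[^/]*)\.csv`, with the dot before csv escaped so that it matches literally, and the file path is hypothetical. It uses a search rather than a full match because the slash exclusions mean the pattern can only cover the file name portion of the path.

```python
import re

# Harvester-style pattern with two labeled attribute sub-expressions.
harvester_pattern = r"(attr1:[^/]+)_(attr2:[^/]*)\.csv"

# Rewrite "(name:" as a Python named group "(?P<name>".
python_pattern = re.sub(r"\((\w+):", r"(?P<\1>", harvester_pattern)

path = "/data/stations/boulder_temperature.csv"  # hypothetical file
match = re.search(python_pattern, path)
attributes = match.groupdict() if match else {}
print(attributes)  # {'attr1': 'boulder', 'attr2': 'temperature'}
```

Because `[^/]+` is greedy, a file name with several underscores is split at the last one, so attr1 receives everything before it.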
To define the folder, select an existing base folder and then optionally specify a folder template. The folder template is used to automatically create new folders as needed. For example, if your base folder was Top/Data and your folder template was Ingested/Satellite, then the resulting folder would be:
Top/Data/Ingested/Satellite
The Harvester would create the Ingested and the Satellite folders as needed.
The name, description and folder templates can all contain the following macros. Note: The date prefixes (create_, from_ and to_) refer to the create date/time, the from date/time of the data (which defaults to the create date unless specified in the pattern) and the to date/time of the data.
Macro | Description |
${filename} | The file name (not the full path) |
${fileextension} | The file extension |
${dirgroup} | See below |
${create_date}, ${from_date}, ${to_date} | The full formatted date string |
${create_day}, ${from_day}, ${to_day} | The numeric day of the month |
${create_week}, ${from_week}, ${to_week} | The numeric week of the month |
${create_weekofyear}, ${from_weekofyear}, ${to_weekofyear} | The numeric week of the year |
${create_month}, ${from_month}, ${to_month} | The numeric month of the year |
${create_monthname}, ${from_monthname}, ${to_monthname} | The month name |
${create_year}, ${from_year}, ${to_year} | The numeric year |
The dirgroup macro is the chain of parent directories of the data file, up to but not including the main directory we are searching under. For example, if you are looking under a directory called "/data/idd" and that directory held the files:
/data/idd/dir1/data1.nc
/data/idd/dir1/dir2/data2.nc
Then when ingesting the data1.nc file its dirgroup value would be:
dir1
When ingesting the data2.nc file its dirgroup value would be:
dir1/dir2
Another common way of defining the folder is to use the date macros. For example, a folder template of the form:
${from_year}/${from_monthname}/Week ${from_week}
would result in folders like:
2009/January/Week 1
2009/January/Week 2
...
2009/March/Week 1
2009/March/Week 2
You can also name the entries using the macros. Using the above date-based folder template, you could then have a Name template that incorporates the formatted date:
Gridded data - ${from_date}
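How these templates expand can be sketched as follows. This is an illustrative Python approximation, not RAMADDA's implementation: the dirgroup and expand_template helpers are hypothetical, only a few of the macros are handled, the formatted date is assumed to look like "2009-01-10 06:30", and the week of the month is assumed to be a simple (day - 1) // 7 + 1 count, which may differ from the calendar rules RAMADDA applies.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

def dirgroup(file_path: str, root_dir: str) -> str:
    """Parent directories of file_path relative to the harvested root directory."""
    relative_parent = PurePosixPath(file_path).parent.relative_to(root_dir)
    return "" if str(relative_parent) == "." else str(relative_parent)

def expand_template(template: str, file_path: str, root_dir: str, from_date: datetime) -> str:
    """Replace a small subset of the ${...} macros in a template."""
    values = {
        "filename": PurePosixPath(file_path).name,
        "dirgroup": dirgroup(file_path, root_dir),
        "from_date": from_date.strftime("%Y-%m-%d %H:%M"),  # assumed display format
        "from_year": str(from_date.year),
        "from_monthname": from_date.strftime("%B"),  # English name in the default locale
        "from_week": str((from_date.day - 1) // 7 + 1),  # assumed week-of-month rule
    }
    # Leave unknown macros untouched rather than guessing a value.
    return re.sub(r"\$\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), template)

path = "/data/idd/dir1/dir2/data2.nc"
when = datetime(2009, 1, 10, 6, 30)
print(dirgroup(path, "/data/idd"))  # dir1/dir2
print(expand_template("${from_year}/${from_monthname}/Week ${from_week}", path, "/data/idd", when))
print(expand_template("Gridded data - ${from_date}", path, "/data/idd", when))
```

Under these assumptions the folder template expands to 2009/January/Week 2 and the name template to "Gridded data - 2009-01-10 06:30".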
The Move file to storage checkbox determines whether the file is moved from its initial location to the RAMADDA storage area.
Note: If the file is not moved to the storage area, then one of the directories the file lies under needs to be added to the list of file system directories in the Admin->Access area.