regExp for matching directories

https://stackoverflow.com/questions/19385287

ncml
thredds

30-06-2022

Question

I have a somewhat complex directory structure of NetCDF files for which I want to create a THREDDS catalog:

/data/buoy/A0121/realtime/A0121.met.realtime.nc
                         /A0121.waves.realtime.nc
                         etc.
/data/buoy/A0122/realtime/A0122.met.realtime.nc
                         /A0122.sbe37.realtime.nc
                         etc.
/data/buoy/B0122/realtime/B0122.met.realtime.nc
                         /B0122.sbe37.realtime.nc
etc.

However, I have found that the regExp attribute in both the datasetScan and aggregation/scan elements does not seem to match across subdirectories. For example, this catalog entry works:

<datasetScan name="All TEST REALTIME" ID="all_test_realtime" path="/All/Realtime"
   location="/data/buoy/B0122" >
  <metadata inherited="true">
    <serviceName>all</serviceName>
  </metadata>
  <filter>
    <include regExp="realtime" atomic="false" collection="true" />
    <include wildcard="*.nc" />
    <!-- exclude directory -->
    <exclude wildcard="old" atomic="false" collection="true" />
  </filter>
</datasetScan>

But the following does not; no datasets are found:

<datasetScan name="All TEST REALTIME" ID="all_test_realtime" path="/All/Realtime" 
  location="/data/buoy" >
  <metadata inherited="true">
    <serviceName>all</serviceName>
  </metadata>
  <filter>
    <include regExp="B0122/realtime" atomic="false" collection="true" />
    <include wildcard="*.nc" />
    <!-- exclude directory -->
    <exclude wildcard="old" atomic="false" collection="true" />
  </filter>
</datasetScan>

This is a greatly simplified example, written only to confirm that regExp does not match across subdirectories, as implied at the bottom of this NcML page: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/ncml/v2.2/AnnotatedSchema4.html

My real goal is to use NcML aggregation via <scan regExp="">.

Should I be using FeatureCollections? These are pretty simple time series buoy observation files.

Solution 2

<filter>
  <include regExp="[A-Z]{1}[0-9]{4}" atomic="false" collection="true" />
  <include wildcard="realtime" atomic="false" collection="true" />
  <include wildcard="post-recovery" atomic="false" collection="true" />
  <include wildcard="*.nc" />
  <!-- exclude directory -->
  <exclude wildcard="old" atomic="false" collection="true" />
</filter>
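
Placed in the second catalog entry from the question (the one rooted at /data/buoy), this filter would look roughly like the sketch below. It assumes, as the filter above suggests, that each include is tested against a single directory or file name rather than a multi-level path, which is why the buoy directories, the realtime directories, and the files each get their own pattern. The name, ID, path, and location values are copied from the question; the post-recovery directory appears only in this answer.

```xml
<datasetScan name="All TEST REALTIME" ID="all_test_realtime" path="/All/Realtime"
  location="/data/buoy" >
  <metadata inherited="true">
    <serviceName>all</serviceName>
  </metadata>
  <filter>
    <!-- buoy directories such as A0121 or B0122 -->
    <include regExp="[A-Z]{1}[0-9]{4}" atomic="false" collection="true" />
    <!-- directories inside each buoy directory -->
    <include wildcard="realtime" atomic="false" collection="true" />
    <include wildcard="post-recovery" atomic="false" collection="true" />
    <!-- the data files themselves -->
    <include wildcard="*.nc" />
    <!-- exclude directory -->
    <exclude wildcard="old" atomic="false" collection="true" />
  </filter>
</datasetScan>
```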

OTHER TIPS

If you are scanning files for an <aggregation> and want to include subdirectories, you can add the subdirs="true" attribute to the <scan> element, for example:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="ocean_time" type="joinExisting">
        <scan location="." regExp=".*vs_his_[0-9]{4}\.nc$" subdirs="true"/>        
    </aggregation>
</netcdf>
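
Applied to the directory layout in the question, a scan of this kind might look like the sketch below. It rests on several assumptions that the question does not confirm: that the scan regExp is tested against the full path of each file, that the time dimension is named time, and that buoy A0121 has met files in more than one subdirectory.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <!-- dimName="time" is an assumption; use the time dimension of your own files -->
    <aggregation dimName="time" type="joinExisting">
        <!-- assumes regExp is matched against the full file path -->
        <scan location="/data/buoy/A0121" regExp=".*A0121\.met\..*\.nc$" subdirs="true"/>
    </aggregation>
</netcdf>
```

Pointing location at a single buoy directory keeps records from different buoys from being joined along the same time axis.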

For datasetScan datasets, the regExp filter is applied automatically in all subdirectories, so if you want the same filters to apply at every level, you could just do:

<datasetScan name="All TEST REALTIME" ID="all_test_realtime" path="/All/Realtime" 
  location="/data/buoy" >
  <metadata inherited="true">
    <serviceName>all</serviceName>
  </metadata>
  <filter>
    <include regExp="realtime" atomic="false" collection="true" />
    <include wildcard="*.nc" />
    <!-- exclude directory -->
    <exclude wildcard="old" atomic="false" collection="true" />
  </filter>
</datasetScan>

Licensed under: CC-BY-SA with attribution
