Example: Compare Ground Measurements with CAMS CRS Satellite Data#

This notebook loads one BSRN station-month via :class:~bsrn.dataset.BSRNDataset, runs the recommended pipeline (solposclear_sky REST2 / MERRA-2 HF → qc_testqc_mask), adds CAMS CRS all-sky columns from Hugging Face, time-averages with :meth:~bsrn.dataset.BSRNDataset.average, and plots a calendar comparison.

[1]:
import os

import bsrn
from bsrn.io.crs import add_crs_columns
from bsrn.visualization import plot_calendar

1. Input path#

[2]:
INPUT_FILE = "/Volumes/Macintosh Research/Data/bsrn-qc/data/QIQ/qiq0824.dat.gz"

print(f"--- Processing {INPUT_FILE} ---")

if not os.path.exists(INPUT_FILE):
    raise FileNotFoundError(f"File not found: {INPUT_FILE}")

--- Processing /Volumes/Macintosh Research/Data/bsrn-qc/data/QIQ/qiq0824.dat.gz ---

2. Load validated station-month and run the pipeline#

BSRNDataset.from_file loads LR0100 plus metadata. Then solpos, clear_sky(model='rest2'), qc_test, and qc_mask update the cached frame in place.

[3]:
ds = bsrn.BSRNDataset.from_file(INPUT_FILE)
print(f"Station {ds.station_code}: loaded {len(ds.data())} rows.")

# Pipeline: geometry, REST2 clear-sky, QC flags, then mask failures
print("solpos() …")
ds.solpos()
print("clear_sky(model='rest2') …")
ds.clear_sky(model="rest2")
print("qc_test() …")
ds.qc_test()
print("qc_mask() …")
ds.qc_mask()

Station QIQ: loaded 44640 rows.
solpos() …
clear_sky(model='rest2') …
Fetching MERRA-2 from Hugging Face: qiq/qiq0824_merra2.parquet
qc_test() …
qc_mask() …
[3]:
ghi bni dhi lwd temp rh pressure zenith apparent_zenith azimuth bni_extra ghi_extra ghi_clear bni_clear dhi_clear lwd_clear
2024-08-01 00:00:00+00:00 105.0 0.0 106.0 421.0 25.2 70.0 981.0 54.881 54.857 102.414 1320.650 759.739 507.439814 664.498840 125.169227 410.715619
2024-08-01 00:01:00+00:00 108.0 0.0 109.0 422.0 25.2 71.0 981.0 54.717 54.694 102.625 1320.650 762.828 509.994881 665.753119 125.445597 411.548727
2024-08-01 00:02:00+00:00 112.0 0.0 113.0 421.0 25.2 70.0 981.0 54.553 54.530 102.836 1320.650 765.910 512.545982 666.998615 125.720382 410.715619
2024-08-01 00:03:00+00:00 113.0 0.0 114.0 421.0 25.2 70.0 981.0 54.390 54.367 103.049 1320.650 768.968 515.077560 668.227890 125.991932 410.715619
2024-08-01 00:04:00+00:00 116.0 0.0 117.0 421.0 25.2 71.0 981.0 54.226 54.204 103.261 1320.650 772.038 517.620641 669.456101 126.263590 411.548727
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-08-31 23:55:00+00:00 123.0 0.0 123.0 371.0 21.9 45.0 988.0 61.982 61.951 110.304 1335.838 627.508 430.077633 741.828764 81.604368 359.041392
2024-08-31 23:56:00+00:00 129.0 0.0 130.0 370.0 21.9 45.0 988.0 61.825 61.794 110.521 1335.838 630.737 432.754922 743.268714 81.808573 359.041392
2024-08-31 23:57:00+00:00 135.0 0.0 136.0 370.0 21.9 46.0 988.0 61.668 61.637 110.738 1335.838 633.962 435.429929 744.697256 82.011590 360.170497
2024-08-31 23:58:00+00:00 138.0 0.0 139.0 370.0 21.9 47.0 988.0 61.511 61.481 110.956 1335.838 637.181 438.102614 746.114514 82.213429 361.278756
2024-08-31 23:59:00+00:00 141.0 0.0 142.0 371.0 21.9 46.0 988.0 61.354 61.324 111.174 1335.838 640.396 440.772936 747.520613 82.414101 360.170497

44640 rows × 16 columns

3. CRS all-sky columns and time averaging#

Add 1-minute CRS from Hugging Face before averaging. ds.average uses centered windows and replaces the cache with the averaged frame.

[4]:
print("add_crs_columns (Hugging Face) …")
add_crs_columns(ds.data(), station_code=ds.station_code)

print("average(freq='30min', alignment='center') …")
df_avg = ds.average(
    "30min",
    alignment="center",
    match_ceiling_labels=True,
)

print("\nProcessing complete. Sample averaged data:")
print(df_avg[["ghi", "ghi_clear", "ghi_crs"]].head())

add_crs_columns (Hugging Face) …
Fetching CRS from Hugging Face: qiq/qiq0824_crs.parquet
average(freq='30min', alignment='center') …

Processing complete. Sample averaged data:
                                  ghi   ghi_clear   ghi_crs
2024-08-01 00:30:00+00:00  181.666667  582.063900  200.3826
2024-08-01 01:00:00+00:00  456.650000  648.930252  420.4916
2024-08-01 01:30:00+00:00  356.782609  707.238474  414.9592
2024-08-01 02:00:00+00:00  291.666667  755.752768  392.6008
2024-08-01 02:30:00+00:00  295.133333  793.024248  375.5578

4. Calendar plot#

[5]:
# Calendar helper expects floor-aligned labels for this workflow.
# Use output_file=None to show the figure in the notebook (no PDF).
p = plot_calendar(
    df=df_avg,
    output_file=None,
    meas_col="ghi",
    clear_col="ghi_clear",
    other_cols=["ghi_crs"],
    labels=[
        "Measured GHI",
        "REST2 Clear-sky GHI",
        "CAMS CRS Satellite GHI",
    ],
)
p

/Volumes/Macintosh Research/Data/bsrn-qc/src/bsrn/visualization/calendar.py:87: UserWarning: Automatically sliced input DataFrame to target month: 2024-08
[5]:
../../_images/examples_huggingface_obtain_crs_9_1.png