Create a monitoring policy
SWE lets your team proactively monitor the performance of your AI models, which is essential for maintaining their integrity and reliability in production. This guide walks you through creating a custom monitoring policy tailored to your needs.
Using the UI
Step 1: Defining your custom metric
Every monitoring journey with SWE begins with a metric: the vital sign of your model's health. Start by considering what you need to monitor. SWE's query builder is your tool for this job; it provides a user-friendly interface for crafting custom metrics, whether you're tracking prediction accuracy, model drift, latency, or other performance indicators.

Step 2: Establish the monitoring frequency
Set how often SWE should evaluate the metric against the threshold you define.
Important notice
The monitoring frequency is based on UTC. For example, to run the policy every day at 07:00 AM EST (UTC-5), set the monitoring frequency to 12:00 PM UTC.
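The same UTC conversion applies when you schedule a policy with the SDK's cron_expression parameter. The sketch below uses only Python's standard library; the helper name daily_cron_in_utc is hypothetical, and it assumes a fixed UTC offset, so it does not follow daylight saving time (during EDT, UTC-4, 07:00 local maps to 11:00 UTC).

```python
from datetime import datetime, timedelta, timezone


def daily_cron_in_utc(hour: int, minute: int, utc_offset_hours: float) -> str:
    """Build a daily cron expression (in UTC) for a local wall-clock time.

    utc_offset_hours is the local zone's fixed offset from UTC, e.g. -5 for EST.
    A fixed offset does not account for daylight saving time.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary: with a fixed offset, only the time of day matters.
    local_time = datetime(2024, 1, 15, hour, minute, tzinfo=local_tz)
    utc_time = local_time.astimezone(timezone.utc)
    # Cron field order: minute, hour, day of month, month, day of week.
    return f"{utc_time.minute} {utc_time.hour} * * *"


print(daily_cron_in_utc(7, 0, -5))  # 07:00 AM EST -> prints "0 12 * * *"
```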
Step 3: Avoiding cold starts
To avoid a cold start, SWE can initialize the policy's data with simulated inputs from a simulation it provides. To enable this, select the checkbox.

Step 4: Setting up your monitoring condition
Once your query is defined, specify the conditions under which the policy should flag an anomaly.
Here's where you establish the rules:
Use static threshold
Use a static threshold when you have a clear, predefined idea of what constitutes bad data. For example, you might set a static threshold if you know that it's unacceptable for more than 1,000 transactions to exceed $100.
Threshold determination: Choose the numeric threshold that signals an anomaly when breached.
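To illustrate how a static threshold behaves, the sketch below flags a metric value that rises above an upper limit or drops below a lower limit. It is a hypothetical helper, not SWE's implementation; its two optional bounds mirror the SDK's condition_above_value and condition_below_value parameters, and treating "above" and "below" as strict comparisons is an assumption.

```python
from typing import Optional


def breaches_static_threshold(
    value: float,
    condition_above_value: Optional[float] = None,
    condition_below_value: Optional[float] = None,
) -> bool:
    """Return True if value falls outside the configured static bounds."""
    if condition_above_value is None and condition_below_value is None:
        raise ValueError("Set condition_above_value, condition_below_value, or both")
    # Upper bound: anything strictly above it is a violation.
    if condition_above_value is not None and value > condition_above_value:
        return True
    # Lower bound: anything strictly below it is a violation.
    if condition_below_value is not None and value < condition_below_value:
        return True
    return False


# The example above: more than 1,000 transactions over $100 is unacceptable.
print(breaches_static_threshold(1250, condition_above_value=1000))  # True
print(breaches_static_threshold(800, condition_above_value=1000))   # False
```

Setting both bounds defines an allowed range; a value is healthy only while it stays inside it.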

Use moving average threshold
If you're unsure of the exact levels that define bad data but want to detect changes or anomalies, a dynamic (moving average) threshold is more suitable. For instance, you might not know the typical percentage of missing values per model feature, but you still want to flag any deviation from the norm. A dynamic threshold adapts to your usual data patterns and identifies unusual behavior.
Threshold determination: Choose how many standard deviations from the moving average qualify as an anomaly. Also specify how many recent evaluation points (query run results) to include when calculating the standard deviation.
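The moving average rule can be sketched as follows: take the last window_size evaluation points, compute their mean and standard deviation, and flag the newest value if it lies more than violation_deviation standard deviations away. This is an illustrative sketch, not SWE's exact algorithm; using the sample standard deviation and excluding the current point from the window are assumptions. The parameter names match the SDK's MovingAverageThresholdSettings.

```python
import statistics
from typing import Sequence


def breaches_moving_average(
    history: Sequence[float],
    current: float,
    window_size: int,
    violation_deviation: float,
    is_violation_above: bool = True,
    is_violation_below: bool = True,
) -> bool:
    """Return True if current deviates from the recent mean by more than
    violation_deviation standard deviations.

    history holds earlier evaluation points (query run results), oldest first;
    only the last window_size of them are used.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    window = list(history)[-window_size:]
    if len(window) < 2:
        return False  # too few points to estimate a standard deviation yet
    mean = statistics.mean(window)
    std = statistics.stdev(window)  # sample standard deviation of the window
    if is_violation_above and current > mean + violation_deviation * std:
        return True
    if is_violation_below and current < mean - violation_deviation * std:
        return True
    return False


history = [10, 12, 11, 13, 12, 11, 10, 12, 11, 12]  # illustrative values only
print(breaches_moving_average(history, 15, window_size=10, violation_deviation=1))  # True
print(breaches_moving_average(history, 12, window_size=10, violation_deviation=1))  # False
```

The early return for short histories shows why a cold start matters: until enough points exist, a dynamic threshold has no baseline to compare against.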

Step 5: Configuring alerts
With your policy shaped up and ready, decide how alerts are handled:
- Alert triggers: Define which policy status changes trigger an alert (for example, a change from healthy to unhealthy), so the right people are informed at the right time.
- Destination channel: SWE currently sends alerts through a Slack integration, so your team is notified promptly in a familiar workspace for quick collaboration and resolution.

Using the SDK
You can create a monitoring policy with the SDK through a single function, sw.policy.create, which takes several parameters:
Policy with a static threshold
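```python
from superwise_api.models.policy.policy import DataConfigStatistics
from superwise_api.models.policy.policy import StaticThresholdSettings
from superwise_api.models.policy.policy import TimeRangeConfig

# Assumes `sw` is an initialized SDK client and `dataset` is an existing dataset.
STATIC_POLICY_NAME = "my-static-policy"  # placeholder: use your own unique name

ds = sw.dataset.get_by_id(dataset.id)
sw.policy.create(
    name=STATIC_POLICY_NAME,
    dataset_id=ds.id,
    data_config=DataConfigStatistics(
        # Replace Column_name with your column (first letter uppercase)
        query={"measures": [f"{ds.internal_id}.avgColumn_name"]},
        # Evaluate the last 2 hours of data, based on the received_ts field
        time_range_config=TimeRangeConfig(
            field_name="received_ts",
            unit="HOUR",
            value=2,
        ),
    ),
    # Run at minute 2 of every hour (UTC)
    cron_expression="2 */1 * * *",
    # Violation when the metric rises above 3.0; no lower bound
    threshold_settings=StaticThresholdSettings(
        condition_above_value=3.0,
        condition_below_value=None,
    ),
    alert_on_status="HEALTHY_TO_UNHEALTHY",
    alert_on_policy_level=True,
    destination_ids=[],  # add channel IDs to receive notifications
    initialize_with_historic_data=True,  # avoid a cold start
)
```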
Policy with a moving average threshold
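```python
from superwise_api.models.policy.policy import DataConfigStatistics
from superwise_api.models.policy.policy import MovingAverageThresholdSettings
from superwise_api.models.policy.policy import TimeRangeConfig

# Assumes `sw` is an initialized SDK client and `dataset` is an existing dataset.
DYNAMIC_POLICY_NAME = "my-dynamic-policy"  # placeholder: use your own unique name

ds = sw.dataset.get_by_id(dataset.id)
sw.policy.create(
    name=DYNAMIC_POLICY_NAME,
    dataset_id=ds.id,
    data_config=DataConfigStatistics(
        # Replace Column_name with your column (first letter uppercase)
        query={"measures": [f"{ds.internal_id}.avgColumn_name"]},
        # Evaluate the last 2 hours of data, based on the received_ts field
        time_range_config=TimeRangeConfig(
            field_name="received_ts",
            unit="HOUR",
            value=2,
        ),
    ),
    # Run at minute 2 of every hour (UTC)
    cron_expression="2 */1 * * *",
    # Violation when a value is more than 1 standard deviation from the
    # average of the last 10 evaluation points, in either direction
    threshold_settings=MovingAverageThresholdSettings(
        violation_deviation=1,
        window_size=10,
        is_violation_above=True,
        is_violation_below=True,
    ),
    alert_on_status="HEALTHY_TO_UNHEALTHY",
    alert_on_policy_level=True,
    destination_ids=[],  # add channel IDs to receive notifications
)
```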
Let's elaborate on each:
- name - Assign a descriptive and unique name to your monitoring policy for easy identification.
Defining your custom metric
- dataset_id - Specify the ID of the dataset you wish to monitor.
- query - Construct a query using a list of measures in the format {dataset_id}.{aggregation_function}{column_name}, where dataset_id is the dataset's internal ID (ds.internal_id in the examples above). Follow these guidelines when forming a measure:
  - Use one of the available aggregation functions, written exactly as shown: count (not applied to a specific field), avg, max, min, sum, median, quantile25, quantile75, quantile90.
  - Start the column name with an uppercase letter.
  - For example, to compute the average age in a dataset with the internal ID 12345, use 12345.avgAge.
- time_range_config - Use TimeRangeConfig to limit the query to a recent time window:
  - field_name - Select the field you want to apply the time range selection on.
  - value and unit - Define the time period to filter your query results by selecting a number (value) and a unit (unit) from the following: MINUTE, HOUR, DAY, WEEK, MONTH.
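The measure format can be assembled with a small helper so that aggregation names and capitalization are always correct. This is a hypothetical sketch, not part of the SDK; in particular, writing count as {dataset_id}.count with no column is an assumption based on the note that count is not applied to a specific field.

```python
AGGREGATIONS = (
    "count", "avg", "max", "min", "sum", "median",
    "quantile25", "quantile75", "quantile90",
)


def build_measure(internal_dataset_id: str, aggregation: str, column: str = "") -> str:
    """Compose a measure string in the form {dataset_id}.{aggregation}{Column}."""
    if aggregation not in AGGREGATIONS:
        raise ValueError(f"Unsupported aggregation function: {aggregation!r}")
    if aggregation == "count":
        # Assumption: count takes no column, e.g. "12345.count" (column is ignored).
        return f"{internal_dataset_id}.count"
    if not column:
        raise ValueError(f"{aggregation} requires a column name")
    # The column name must start with an uppercase letter: "age" -> "Age".
    return f"{internal_dataset_id}.{aggregation}{column[0].upper()}{column[1:]}"


query = {"measures": [build_measure("12345", "avg", "age")]}
print(query)  # prints {'measures': ['12345.avgAge']}
```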
Avoid cold start
- initialize_with_historic_data - Set to True to initialize the policy's data with simulated inputs from a simulation SWE provides, which avoids a cold start.
Setting up your monitoring condition
- cron_expression - Determine how frequently your policy runs by setting a cron expression (evaluated in UTC).
Cron expression
* * * * *
- Minute: Ranges from 0 to 59.
- Hour: Ranges from 0 to 23 (24-hour time).
- Day of the Month: Ranges from 1 to 31.
- Month: Ranges from 1 to 12, or you can use the three-letter abbreviation (Jan, Feb, Mar, etc.).
- Day of the Week: Ranges from 0 to 7, where 0 and 7 both stand for Sunday, 1 for Monday, and so on. You can also use three-letter abbreviations (Sun, Mon, Tue, etc.).
Each field can contain a specific value, a range (e.g., 2-4), a comma-separated list (e.g., 1,3,5), an asterisk (meaning "any value"), or a combination of these. Use "/" to specify increments; for example, */15 in the minute field means every 15 minutes.
Example: The cron expression 2 */1 * * * means run the policy at minute 2 of every hour, every day of every month, on every day of the week. In other words, it runs once an hour, at 2 minutes past the hour.
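To check when an expression such as the hourly 2 */1 * * * used in the SDK examples will fire, here is a minimal matcher built on the standard library. It is an illustrative sketch, not the scheduler SWE uses: it supports numbers, *, ranges, comma lists, and / steps, but not month or weekday names, and it treats the given time as UTC.

```python
from datetime import datetime

# (lowest, highest) allowed values for each of the five cron fields, in order:
# minute, hour, day of month, month, day of week (0 and 7 both mean Sunday).
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def expand_field(field: str, low: int, high: int) -> set:
    """Expand one numeric cron field such as '*/15', '2-4', or '1,3,5' into a set."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # A single value with a step (e.g. '5/10') runs up to the field maximum.
            end = high if step_text else start
        if not (low <= start <= end <= high):
            raise ValueError(f"Field {field!r} is out of range {low}-{high}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if a run is scheduled at the given (UTC) minute."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("Expected 5 fields: minute hour day-of-month month day-of-week")
    minutes, hours, days, months, weekdays = (
        expand_field(field, low, high)
        for field, (low, high) in zip(fields, FIELD_BOUNDS)
    )
    weekdays = {0 if day == 7 else day for day in weekdays}  # 7 is an alias for Sunday
    cron_weekday = (moment.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    day_match = moment.day in days
    weekday_match = cron_weekday in weekdays
    # Classic cron rule: when both day fields are restricted (not starting with '*'),
    # a run happens if either one matches; otherwise both must match.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        day_ok = day_match or weekday_match
    else:
        day_ok = day_match and weekday_match
    return (
        moment.minute in minutes
        and moment.hour in hours
        and moment.month in months
        and day_ok
    )


print(cron_matches("2 */1 * * *", datetime(2024, 5, 1, 14, 2)))  # True
print(cron_matches("2 */1 * * *", datetime(2024, 5, 1, 14, 3)))  # False
```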
- threshold_settings
- Static threshold
- condition_above_value / condition_below_value - Establish numerical limits that indicate an anomaly. Set only one parameter to define a threshold or both to establish a range. At least one of these must be set to a value other than None.
- Moving average threshold
- violation_deviation: Specifies the number of standard deviations considered a violation.
- window_size: Defines the number of evaluation points (query run results) to include when calculating the standard deviation.
- is_violation_above / is_violation_below: By default, both are set to True, meaning deviations both above and below the moving average count as violations. Set either to False to ignore deviations in that direction.
Configuring alerts
- destination_ids - Provide the IDs of the channels where you'd like to receive notifications.
- alert_on_status - Specify which state transitions should trigger an alert. Options: HEALTHY_TO_UNHEALTHY, UNHEALTHY_TO_HEALTHY, or BOTH.
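The effect of each alert_on_status option can be sketched as a check on the change between two consecutive policy statuses. This is a hypothetical illustration rather than SWE's logic; the status strings "HEALTHY" and "UNHEALTHY", and not alerting on the very first evaluation, are assumptions inferred from the option names.

```python
from typing import Optional


def should_alert(previous: Optional[str], current: str, alert_on_status: str) -> bool:
    """Decide whether a status change should send a notification.

    previous is None on the first evaluation; statuses are assumed to be
    "HEALTHY" or "UNHEALTHY".
    """
    if previous is None or previous == current:
        return False  # no transition, nothing to report
    transition = f"{previous}_TO_{current}"
    if alert_on_status == "BOTH":
        return transition in {"HEALTHY_TO_UNHEALTHY", "UNHEALTHY_TO_HEALTHY"}
    return transition == alert_on_status


print(should_alert("HEALTHY", "UNHEALTHY", "HEALTHY_TO_UNHEALTHY"))  # True
print(should_alert("UNHEALTHY", "HEALTHY", "HEALTHY_TO_UNHEALTHY"))  # False
```

Choosing BOTH also notifies the team when a policy recovers, which helps confirm that a fix worked.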
Query example:
The following query computes the average Age grouped by is_male, keeping only rows where is_male is not null, ordered by timestamp in ascending order:
```python
internal_dataset_id = "<your_internal_dataset_id>"  # replace with your dataset's internal ID

query = {
    "measures": [f"{internal_dataset_id}.avgAge"],
    "dimensions": [f"{internal_dataset_id}.is_male"],
    "filters": [
        {
            # "set" keeps only rows where is_male is not null
            "member": f"{internal_dataset_id}.is_male",
            "operator": "set",
        }
    ],
    "order": [
        {
            "id": f"{internal_dataset_id}.ts",
            "desc": False,  # ascending by timestamp
        }
    ],
}
```