Connecting
This page describes how to bring your model's data into SWE from any stage of your data pipeline. Once connected, the data is organized into datasets that you can query to derive insights from all aspects of your ML operations.
How to Connect Data to Superwise
SWE currently supports data integration through file storage services, specifically GCS and AWS. Adding a file to the connected file storage automatically triggers the platform to fetch it. Here's how to set it up:
- Create a Source: A 'Source' in SWE is the object that establishes a connection to your file storage. This step provides an entry point for your data to be uploaded into the platform. It's primarily a one-time task where you configure permissions to grant access to the bucket.
- Create a Dataset: After setting up a Source, you need to create a Dataset. This object will hold your data within the SWE platform. You can query the data source and build Dashboards or set up monitoring policies based on it.
- Connect the Dataset to the Source: Establish a connection between the Dataset and the Source by specifying the path that belongs to each dataset. This unique path ensures that any new file added to your file storage at that path is fetched into the dataset, keeping your data current.
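The relationship between the three objects above can be pictured with a small conceptual sketch. It is not SWE's API: the `Source` and `Dataset` classes, the bucket name, and the rule that a file is matched to a dataset by path prefix are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for a connection to one storage bucket."""
    bucket: str


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a dataset linked to a path in a Source."""
    name: str
    source: Source
    path: str  # folder-like prefix inside the bucket, e.g. "predictions/"


def dataset_for_file(bucket: str, object_key: str,
                     datasets: List[Dataset]) -> Optional[Dataset]:
    """Return the dataset whose bucket and path match a newly added file.

    Assumption: matching is by path prefix, and the longest prefix wins.
    """
    matches = [
        d for d in datasets
        if d.source.bucket == bucket and object_key.startswith(d.path)
    ]
    return max(matches, key=lambda d: len(d.path), default=None)


source = Source(bucket="my-ml-bucket")  # illustrative bucket name
datasets = [
    Dataset("predictions", source, "predictions/"),
    Dataset("labels", source, "labels/"),
]

hit = dataset_for_file("my-ml-bucket", "predictions/2024-05-01.csv", datasets)
print(hit.name if hit else "no dataset matches")  # prints "predictions"
```

Giving each dataset its own non-overlapping path keeps it unambiguous which dataset a new file belongs to.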
Important Considerations
- Pre-existing Files: Files that already exist in your file storage before you complete the three setup steps above are not inserted into the dataset. Only files added after the integration is complete trigger the platform to fetch data.
- Schema and Sample Data: The schema and sample data for your dataset will be created only after the first data insertion. Prior to this, the Dataset object within SWE will remain empty.
- Loading Time: Uploading a file to your file storage immediately triggers the SWE platform to fetch it, but the data may take a few seconds to appear in the dataset. If you don't see it right away, wait a few seconds or refresh the page.
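If you check for new data from a script rather than in the UI, the short delay described above can be handled with a simple polling loop. The sketch below is generic Python, not part of SWE: `wait_until` and `data_has_arrived` are hypothetical names, and the stand-in check should be replaced by whatever lookup you use to confirm that rows have arrived.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout_s: float = 30.0,
               interval_s: float = 2.0) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in check that succeeds on the third call; replace it with a real
# lookup (for example, a row-count query against your dataset).
calls = {"n": 0}


def data_has_arrived() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(data_has_arrived, timeout_s=5.0, interval_s=0.1))  # prints True
```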
CSV File Format Requirements
- Column Name Composition: Each column name may contain only letters (a-z, A-Z), numbers (0-9), and underscores (_), and must start with a letter or an underscore.
- Maximum Length: Column names cannot exceed 300 characters.
- Restricted Prefixes: Column names cannot begin with any of the following prefixes:
  - _TABLE_
  - _FILE_
  - _PARTITION
  - _ROW_TIMESTAMP
  - __ROOT__
  - _COLIDENTIFIER
- Uniqueness: Duplicate column names are not allowed, regardless of letter casing. For instance, Column1 and column1 are considered identical.
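Because a file that breaks these rules may fail to ingest, it can help to check the header row before uploading. The following is a minimal sketch of such a check using only the Python standard library. The function name `column_name_errors` is hypothetical, and treating the restricted-prefix check as case-insensitive is an assumption (the stricter reading), not something stated above.

```python
import re
from typing import Dict, List

MAX_LENGTH = 300
RESTRICTED_PREFIXES = (
    "_TABLE_", "_FILE_", "_PARTITION",
    "_ROW_TIMESTAMP", "__ROOT__", "_COLIDENTIFIER",
)
# Letters, digits, and underscores only; first character not a digit.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def column_name_errors(columns: List[str]) -> List[str]:
    """Return one message per rule violation; an empty list means valid."""
    errors: List[str] = []
    seen: Dict[str, str] = {}  # lower-cased name -> first spelling seen
    for name in columns:
        if not VALID_NAME.fullmatch(name):
            errors.append(f"{name!r}: invalid characters or first character")
        if len(name) > MAX_LENGTH:
            errors.append(f"{name!r}: longer than {MAX_LENGTH} characters")
        # Assumption: the prefix check ignores case (the stricter reading).
        if name.upper().startswith(RESTRICTED_PREFIXES):
            errors.append(f"{name!r}: starts with a restricted prefix")
        key = name.lower()
        if key in seen:
            errors.append(f"{name!r}: duplicates {seen[key]!r}")
        else:
            seen[key] = name
    return errors


header = ["Column1", "column1", "_TABLE_id", "2nd_score", "label"]
for message in column_name_errors(header):
    print(message)
```

For this sample header the check reports three problems: the case-insensitive duplicate, the restricted prefix, and the name that starts with a digit. To check a real file, pass in the first row returned by `csv.reader`.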
Steps to Follow
- Connect New Source - Understand and establish your Source in SWE.
- Create Dataset and Connect to a Source - Create your Dataset and ensure it's linked to your Source to enable seamless data updates.
- Automated Alerts for Ingestion Failures (Optional) - Set up policies on "failed data ingestion" events to receive notifications when files fail to upload.
Once these steps are complete, new files flow into SWE continuously, keeping your datasets and insights up to date.