Connecting

Connecting your model data to SWE takes only a few steps. This guide explains how to send your model's data to the platform from different stages of your data pipeline. Once connected, the data is organized into datasets that you can query to gain insight into every aspect of your operation.

How to Connect Data to Superwise

Currently, SWE supports data integration through file storage services, specifically GCS and AWS. Any file added to a connected file storage automatically triggers the platform to fetch its data. Here's how to set it up:

  1. Create a Source: A 'Source' in SWE is the object that establishes a connection to your file storage. This step provides an entry point for your data to be uploaded into the platform. It's primarily a one-time task where you configure permissions to grant access to the bucket.
  2. Create a Dataset: After setting up a Source, you need to create a Dataset. This object holds your data within the SWE platform. You can query the Dataset, build Dashboards on it, or set up monitoring policies based on it.
  3. Connect the Dataset to the Source: Establish a connection between the Dataset and the Source by specifying the path in your file storage that belongs to each dataset. This unique path ensures that any new file added under it is fetched into the dataset, keeping your data current.
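
To make the role of the per-dataset path in step 3 concrete, here is a purely conceptual sketch. It is not SWE's API, and the matching rule it uses (the longest configured path that prefixes the file's path wins) is an assumption made for illustration only; the function name `route_file` and the example paths are hypothetical.

```python
def route_file(object_path, dataset_paths):
    """Return the dataset whose configured path prefixes object_path, or None.

    Assumption (not confirmed by the SWE docs): paths are matched as
    directory-style prefixes, and the longest matching prefix wins.
    """
    best_dataset = None
    best_length = -1
    for dataset, path in dataset_paths.items():
        # Normalize so "a/b" does not accidentally match "a/bc/file.csv".
        prefix = path.rstrip("/") + "/"
        if object_path.startswith(prefix) and len(prefix) > best_length:
            best_dataset, best_length = dataset, len(prefix)
    return best_dataset


# Hypothetical bucket layout: one path per dataset.
dataset_paths = {
    "predictions": "models/churn/predictions",
    "labels": "models/churn/labels",
}

print(route_file("models/churn/predictions/2024-01-01.csv", dataset_paths))
print(route_file("models/other/file.csv", dataset_paths))
```

Under these assumptions, the first file is routed to the `predictions` dataset and the second matches no dataset. Giving each dataset its own non-overlapping path keeps this routing unambiguous.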

Important Considerations

  • Pre-existing Files: Files that were already in your file storage before you completed steps 1 to 3 will not be inserted into the dataset. Only files added after the integration is complete trigger the platform to fetch data.
  • Schema and Sample Data: The schema and sample data for your dataset are created only after the first data insertion. Until then, the Dataset object within SWE remains empty.
  • Loading Time: Uploading a file to your file storage immediately triggers the SWE platform to fetch the data, but it may take a few seconds for the data to appear in the dataset. If you do not see it right away, wait a few seconds or refresh the page.

CSV File Format Requirements:

  • Column Name Composition: Each column name may contain only letters (a-z, A-Z), digits (0-9), and underscores (_), and must start with a letter or an underscore.
  • Maximum Length: Column names cannot exceed 300 characters.
  • Restricted Prefixes: Column names cannot begin with any of the following prefixes:
    • _TABLE_
    • _FILE_
    • _PARTITION
    • _ROW_TIMESTAMP
    • __ROOT__
    • _COLIDENTIFIER
  • Uniqueness: Duplicate column names are not allowed, regardless of letter casing. For instance, Column1 and column1 are considered identical.
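
It can be useful to check a CSV header against these rules before uploading a file. The following is a minimal sketch of such a check, not an official SWE tool. The function name `validate_column_names` is hypothetical, and treating the restricted prefixes as case-insensitive is a conservative assumption, since the requirements above do not state their casing.

```python
import re

MAX_LENGTH = 300
RESTRICTED_PREFIXES = (
    "_TABLE_", "_FILE_", "_PARTITION",
    "_ROW_TIMESTAMP", "__ROOT__", "_COLIDENTIFIER",
)
# Letters, digits, underscores; first character is a letter or underscore.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_column_names(names):
    """Return a list of problems found; an empty list means the header passes."""
    problems = []
    seen = {}  # lowercased name -> first spelling encountered
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"{name!r}: invalid characters or invalid first character")
        if len(name) > MAX_LENGTH:
            problems.append(f"{name[:20]!r}...: longer than {MAX_LENGTH} characters")
        # Assumption: prefixes are compared case-insensitively to be safe.
        if name.upper().startswith(RESTRICTED_PREFIXES):
            problems.append(f"{name!r}: starts with a restricted prefix")
        key = name.lower()
        if key in seen:
            problems.append(f"{name!r}: duplicates {seen[key]!r} (case-insensitive)")
        else:
            seen[key] = name
    return problems


for problem in validate_column_names(["Column1", "column1", "1abc", "_TABLE_x"]):
    print(problem)
```

To check a real file, the header can be read with Python's standard `csv` module, for example `next(csv.reader(open(path, newline="")))`, and passed to the function.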

Steps to Follow

Once these steps are complete, new data flows into SWE continuously, keeping your insights up to date.