Streaming Deployments

JFrog ML Documentation


Streaming deployments connect Kafka streams to your models for real-time inference. This architecture simplifies processing large volumes of distributed data by eliminating the need for complex triggering and scheduling as fresh data arrives.

Streaming Use Cases

Consider using streaming deployments in the following scenarios:

  1. Event-driven applications: Respond to specific events or triggers in real-time.

  2. Real-time decision-making: Process data as it arrives for time-sensitive systems.

  3. Scalable processing: Efficiently handle large data volumes from multiple sources.

Deploying Streaming Models

To run streaming inference on JFrog ML, choose one of the following deployment options:

Option 1: Deploying via UI

  1. Build a new model.

  2. Click Deploy and choose Streaming.

  3. Choose the instance type and configure the Kafka server and topics.

    Note

    Choose the number of replicas and the instance type so that they balance against the number of partitions on your Kafka broker. The partitions are divided across all the replicas of your deployment.

  4. Click Deploy to make your model live.
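The replica/partition balance mentioned in the note above can be made concrete: Kafka delivers each partition to exactly one consumer in a group, so replicas beyond the partition count receive no work. A minimal, dependency-free Python sketch of how a range-style assignment spreads partitions (illustrative only, not the platform's actual assignor):

```python
def partitions_per_replica(num_partitions: int, num_replicas: int) -> list[int]:
    """Roughly how partitions spread across replicas in one consumer group.

    Each partition goes to exactly one consumer, so any replica beyond
    num_partitions receives zero partitions and sits idle.
    """
    base, extra = divmod(num_partitions, num_replicas)
    return [base + (1 if i < extra else 0) for i in range(num_replicas)]

# A 6-partition topic consumed by 4 replicas: two replicas do double work.
print(partitions_per_replica(6, 4))   # -> [2, 2, 1, 1]

# A 2-partition topic with 4 replicas: half the replicas are idle.
print(partitions_per_replica(2, 4))   # -> [1, 1, 0, 0]
```

In practice this means there is little benefit in setting the replica count higher than the partition count of the consumer topic.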

Option 2: Deploying via CLI

You can also deploy your streaming model using the command line interface (CLI). Here’s a general command structure:

frogml models deploy stream \
    --model-id <model-id> \
    --build-id <build-id> \
    --instance <instance-type> \
    --replicas <number-of-replicas> \
    --consumer-bootstrap-server <bootstrap-server> \
    --consumer-topic <consumer-topic-name> \
    --producer-bootstrap-server <bootstrap-server> \
    --producer-topic <producer-topic-name>

Example Command 1 (using the above structure): Deploying a FLAN-T5 streaming model with 2 replicas on 2 A10.XL on-demand GPU instances

# Deploy a FLAN-T5 streaming model on 2 A10.XL instances,
# using only on-demand instances for executors
frogml models deploy stream \
    --model-id "flan-t5" \
    --build-id "0511ffa9-986e-4fd9-ae29-515269af5bac" \
    --instance "gpu.a10.xl" \
    --replicas 2 \
    --purchase-option "on-demand" \
    --consumer-bootstrap-server <bootstrap-server> \
    --consumer-topic <consumer-topic-name> \
    --producer-bootstrap-server <bootstrap-server> \
    --producer-topic <producer-topic-name>

Example Command 2 (using the above structure): Deploying a churn streaming model on 2 small instances

# Deploy a churn streaming model on 2 small instances
frogml models deploy stream \
    --model-id "stream-churn-model" \
    --build-id "d4d56c1a-f326-40a0-88f3-f8e59c691a3e" \
    --instance "small" \
    --replicas 2 \
    --consumer-bootstrap-server "10.0.0.8" \
    --consumer-topic "model-input-topic" \
    --producer-bootstrap-server "10.0.0.9" \
    --producer-topic "model-output-topic"

Note

Before running the above examples, replace the placeholder values with your own.

Option 3: Configuring Streaming Models via YAML

You can also configure your streaming model with a YAML file.

Example

build_id: "d4d56c1a-f326-40a0-88f3-f8e59c691a3e"
model_id: "stream-churn-model"
resources:
  instance_size: "small"
  replicas: 2
stream:
  consumer_bootstrap_server: "10.0.0.8"
  consumer_topic: "model-input-topic"
  producer_bootstrap_server: "10.0.0.9"
  producer_topic: "model-output-topic"

Then run the deployment command:

frogml models deploy stream --from-file stream_deploy_config.yaml

Mixing Command Line and Configuration File

Additionally, you can mix command-line arguments with a configuration file. In this case, command-line arguments take precedence and override the values specified in the YAML file.

Example Command

frogml models deploy stream --replicas 5 --from-file stream_deploy_config.yaml
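The precedence rule is simple: any value passed on the command line wins over the same key in the YAML file. A hypothetical Python sketch of that merge (the real CLI's internals may differ):

```python
def merge_deploy_config(yaml_config: dict, cli_args: dict) -> dict:
    """Merge a YAML deployment config with CLI arguments.

    CLI arguments take precedence over the file; keys the CLI did not
    set (None) fall through to the YAML values. Illustrative sketch only.
    """
    merged = dict(yaml_config)
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged

yaml_config = {"model_id": "stream-churn-model", "replicas": 2}
cli_args = {"replicas": 5}  # e.g. --replicas 5 on the command line
print(merge_deploy_config(yaml_config, cli_args))
# -> {'model_id': 'stream-churn-model', 'replicas': 5}
```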

Monitoring Your Streaming Deployment

JFrog ML models run on Kubernetes, where we automatically install advanced production-grade monitoring tools. This setup provides a robust platform for deploying and managing your models efficiently.

Under the Model Overview tab, you can view the following metrics for your deployed build:

  • Average message throughput: Evaluates how many messages are being processed over a specific period.

  • Total consumed messages: Tracks the total number of messages your model has consumed over a given minute.

  • Total produced messages: Measures the total number of output messages.

  • Consumer lag: Measures the number of messages (requests) that have been produced but remain unconsumed in a consumer group at a given time. High consumer lag indicates that the rate at which messages are being consumed is slower than the rate at which new messages are being generated. This can signal performance issues.

  • Error rate over time: Monitors any issues encountered processing messages.

  • Memory, CPU and GPU utilization: Analyzes resource usage to optimize performance and scaling.

  • Processing lag: Shows the difference between the number of responses produced and the number of requests consumed by the model runtime in a given minute.

    • A positive value indicates that more messages are produced than consumed.

    • 0 indicates that messages are consumed faster than, or at the same rate as, they are produced.

    • A stable or decreasing chart indicates stable performance.
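Consumer lag, the most commonly watched of these metrics, is derived from broker offsets: the gap between the latest offset written to each partition and the last offset the consumer group has committed. A minimal illustration of the arithmetic (not the platform's monitoring code):

```python
def consumer_lag(log_end_offsets: dict[int, int],
                 committed_offsets: dict[int, int]) -> int:
    """Total consumer lag: messages produced but not yet consumed,
    summed across all partitions. Purely illustrative arithmetic."""
    return sum(
        log_end_offsets[p] - committed_offsets.get(p, 0)
        for p in log_end_offsets
    )

# Per partition: latest offset on the broker vs. last committed by the group.
log_end = {0: 1200, 1: 980}
committed = {0: 1150, 1: 980}
print(consumer_lag(log_end, committed))  # -> 50
```

A lag that grows steadily over time is the signal to add replicas (up to the partition count) or move to a larger instance type.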

(Figure: streaming models dashboard)

Configuring Streaming Models Overview

JFrog ML streaming deployments enable granular control through multiple consumer and producer specifications. Below is an overview of the key parameters and configurations you can set to tailor your streaming deployment.

  • Consumer Configuration: Settings for how your model consumes messages from Kafka.

  • Producer Configuration: Settings for how your model produces messages to Kafka.

  • Custom Configurations: Use environment variables to customize consumer and producer settings for advanced control.

Consumer Configuration

  • Consumer Bootstrap Servers*: A list of Kafka bootstrap servers from which to consume messages for model inference.

  • Consumer Topic*: The Kafka topic from which the model consumes messages for inference.

  • Consumer Group*: The name of the consumer group the model inference is attached to.

  • Group ID: Identifies a group of consumers that belong to the same consumer group. Each message in a partition is delivered to only one consumer within the same group, enabling load balancing and parallel processing.

  • Auto offset reset: Defines behavior when a consumer first joins a group or cannot find a valid offset (for example, when the offset does not exist or has been deleted): whether the consumer starts reading from the earliest available offset ("earliest") or from the latest offset ("latest") in the topic. Default: earliest.

  • Consumer timeout: Maximum time a consumer can be inactive before being considered dead and no longer part of the consumer group. If a consumer does not send a heartbeat to Kafka within this timeout, it may be removed from the group and its partitions reassigned to other consumers. Default: 60000 ms.

  • Max batch: Maximum number of records the consumer will attempt to fetch in a single poll request to Kafka. This can affect throughput and latency. Default: 1.

  • Max polling latency: Maximum time the consumer will wait for new records to become available in the topic during a poll request. If no new records arrive within this limit, the poll request returns empty. Default: 1000 ms.
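Max batch and max polling latency together govern the consumer's poll loop: a poll returns as soon as it has collected the batch limit, or when the latency budget expires, whichever comes first. A simplified, dependency-free sketch of those semantics (not the runtime's actual implementation):

```python
import time
from collections import deque

def poll(buffer: deque, max_batch: int = 1, max_poll_latency_s: float = 1.0) -> list:
    """Return up to max_batch records, waiting at most max_poll_latency_s
    for records to appear. Returns an empty list on timeout."""
    deadline = time.monotonic() + max_poll_latency_s
    batch = []
    while len(batch) < max_batch:
        if buffer:
            batch.append(buffer.popleft())
        elif time.monotonic() >= deadline:
            break  # latency budget exhausted; return what we have
        else:
            time.sleep(0.01)  # nothing buffered yet; wait for the deadline
    return batch

# Three records buffered, batch limit of 2: only two are returned.
print(poll(deque(["a", "b", "c"]), max_batch=2, max_poll_latency_s=0.05))
# -> ['a', 'b']

# Empty buffer: the poll times out and returns nothing.
print(poll(deque(), max_batch=1, max_poll_latency_s=0.05))  # -> []
```

Raising max batch improves throughput for high-volume topics at the cost of per-message latency; the default of 1 favors latency.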

Offset Behavior in Redeployments

When you redeploy a streaming model using the same consumer group, the new deployment resumes consumption from the last committed offset for that group.

It does not start consuming messages based on the deployment timestamp. This ensures message continuity: no data is skipped or reprocessed unnecessarily.

If the consumer group does not have a previously committed offset (for example, on the first deployment), the consumer starts from the offset defined by auto.offset.reset:

  • earliest: consumes all available messages from the beginning of the topic.

  • latest: consumes only new messages that arrive after deployment.

* Required

Producer Configuration

  • Producer Bootstrap Servers*: A list of Kafka bootstrap servers to which model inference outputs are produced.

  • Producer Topic*: The Kafka topic to which the model will produce messages.

  • Compression type: Determines how messages are compressed before being stored in Kafka. Compressing messages can significantly reduce the amount of data transmitted over the network and storage requirements in Kafka. Default: uncompressed. Options:

    • None (none): No compression applied. Best when message size isn't important, or the data is already compressed or optimized.

    • Gzip (gzip): Compresses messages with the gzip algorithm. Provides a good balance between compression ratio and CPU overhead.

    • Snappy (snappy): Compresses messages with the Snappy algorithm. Faster than gzip but with a lower compression ratio.

    • LZ4 (lz4): Compresses messages with the LZ4 algorithm. The fastest option, with a slightly lower compression ratio than gzip or Snappy.

* Required
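The size/CPU trade-off behind the compression options can be felt with Python's standard library (gzip is built in; Snappy and LZ4 would need third-party packages). Repetitive JSON payloads, typical of model inference output, compress very well:

```python
import gzip
import json

# A batch of repetitive JSON messages, similar to model inference outputs.
payload = json.dumps(
    [{"user_id": i, "churn_probability": 0.42} for i in range(500)]
).encode()

compressed = gzip.compress(payload)
ratio = len(payload) / len(compressed)
print(f"raw: {len(payload)} B, gzip: {len(compressed)} B, ratio: {ratio:.1f}x")

# Compression is lossless: decompressing restores the original bytes.
assert gzip.decompress(compressed) == payload
```

If your messages are small, infrequent, or already compressed (for example, images or binary embeddings), the CPU cost of compression may outweigh the bandwidth savings, which is why `none` is the default.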

Custom Producer and Consumer Configurations

In a streaming inference deployment, you can customize Kafka consumer and producer configurations via environment variables. Use the following prefixes:

  • kafka.consumer for consumer attributes

  • kafka.producer for producer attributes

For example, to modify the bootstrap.servers configuration for a producer, use the environment variable:

kafka.producer.bootstrap.servers
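The prefix convention means each environment variable is routed to one of two config maps, with the prefix stripped. A hypothetical sketch of that routing (the platform's actual parsing may differ):

```python
def split_kafka_env(env: dict[str, str]) -> tuple[dict, dict]:
    """Split prefixed environment variables into consumer and producer
    Kafka config dicts, dropping the routing prefix. Illustrative only."""
    consumer, producer = {}, {}
    for key, value in env.items():
        if key.startswith("kafka.consumer."):
            consumer[key[len("kafka.consumer."):]] = value
        elif key.startswith("kafka.producer."):
            producer[key[len("kafka.producer."):]] = value
    return consumer, producer

env = {
    "kafka.producer.bootstrap.servers": "10.0.0.9:9092",
    "kafka.consumer.sasl.mechanism": "PLAIN",
}
consumer_cfg, producer_cfg = split_kafka_env(env)
print(consumer_cfg)  # -> {'sasl.mechanism': 'PLAIN'}
print(producer_cfg)  # -> {'bootstrap.servers': '10.0.0.9:9092'}
```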

Example Command for SASL Authentication

If your Kafka topics require SASL authentication, you can provide the necessary username and password as environment variables when deploying the streaming model. The following example shows how to configure both the consumer and producer connections with SASL:

frogml models deploy stream --env-vars \
  kafka.consumer.sasl.mechanism=PLAIN \
  kafka.consumer.security_protocol=SASL_SSL \
  kafka.consumer.sasl_plain_username=<username> \
  kafka.consumer.sasl_plain_password=<password> \
  kafka.producer.sasl.mechanism=PLAIN \
  kafka.producer.security_protocol=SASL_SSL \
  kafka.producer.sasl_plain_username=<username> \
  kafka.producer.sasl_plain_password=<password>

Note

These environment variables are only required if your Kafka server is configured to use SASL authentication. Replace the placeholders <username> and <password> with your actual credentials.

Summary

Configuring your streaming deployments with an understanding of replicas, partitions, and their parameters will help you achieve efficient and scalable real-time inference for your applications. Tailor these settings to meet your specific needs for optimal performance and reliability.