## Custom Model Integration

### Introduction

After completing the vendor integration, the next step is to connect the vendor's models. To illustrate the whole process, we will use Xinference as an example and walk through a complete vendor integration.

Note that for custom models, every model connection requires a complete set of vendor credentials.

Unlike pre-defined models, a custom vendor integration always comes with two parameters that do not need to be defined in the vendor YAML file.

As mentioned earlier, vendors do not need to implement `validate_provider_credential`. Based on the model type and model name selected by the user, the runtime automatically calls the corresponding model layer's `validate_credentials` method to validate the credentials.

### Writing the Vendor YAML

First, we need to identify the model types supported by the vendor we are integrating.

The currently supported model types are:

- `llm` Text Generation Models
- `text_embedding` Text Embedding Models
- `rerank` Rerank Models
- `speech2text` Speech-to-Text
- `tts` Text-to-Speech
- `moderation` Moderation

Xinference supports LLM, Text Embedding, and Rerank, so we will start by writing `xinference.yaml`.

```yaml
provider: xinference # Define the vendor identifier
label: # Vendor display name, supports both en_US (English) and zh_Hans (Simplified Chinese). If zh_Hans is not set, it will use en_US by default.
  en_US: Xorbits Inference
icon_small: # Small icon, refer to other vendors' icons stored in the _assets directory within the vendor implementation directory; follows the same language policy as the label
  en_US: icon_s_en.svg
icon_large: # Large icon
  en_US: icon_l_en.svg
help: # Help information
  title:
    en_US: How to deploy Xinference
    zh_Hans: 如何部署 Xinference
  url:
    en_US: https://github.com/xorbitsai/inference
supported_model_types: # Supported model types. Xinference supports LLM, Text Embedding, and Rerank
- llm
- text-embedding
- rerank
configurate_methods: # Since Xinference is a locally deployed vendor with no pre-defined models, users deploy whatever models they need by following the Xinference documentation. Thus, it only supports custom models.
- customizable-model
provider_credential_schema:
  credential_form_schemas:
```

Then, we need to determine what credentials are required to define a model in Xinference.

- Since Xinference supports three different types of models, we need a `model_type` field to specify the model type. It can be defined as follows:

```yaml
provider_credential_schema:
  credential_form_schemas:
  - variable: model_type
    type: select
    label:
      en_US: Model type
      zh_Hans: 模型类型
    required: true
    options:
    - value: text-generation
      label:
        en_US: Language Model
        zh_Hans: 语言模型
    - value: embeddings
      label:
        en_US: Text Embedding
    - value: reranking
      label:
        en_US: Rerank
```

- Next, each model has its own `model_name`, so we need to define that as well:

```yaml
  - variable: model_name
    type: text-input
    label:
      en_US: Model name
      zh_Hans: 模型名称
    required: true
    placeholder:
      zh_Hans: 填写模型名称
      en_US: Input model name
```

- Then, specify the address of the local Xinference deployment:

```yaml
  - variable: server_url
    label:
      zh_Hans: 服务器URL
      en_US: Server url
    type: text-input
    required: true
    placeholder:
      zh_Hans: 在此输入Xinference的服务器地址,如 https://example.com/xxx
      en_US: Enter the url of your Xinference, for example https://example.com/xxx
```

- Each model has a unique `model_uid`, so we also need to define that here:

```yaml
  - variable: model_uid
    label:
      zh_Hans: 模型UID
      en_US: Model uid
    type: text-input
    required: true
    placeholder:
      zh_Hans: 在此输入您的Model UID
      en_US: Enter the model uid
```

Now, we have completed the basic definition of the vendor.

### Writing the Model Code

Next, let's take the `llm` type as an example and write `xinference.llm.llm.py`.

In `llm.py`, create a Xinference LLM class. We name it `XinferenceAILargeLanguageModel` (the name can be anything); it inherits from the `__base.large_language_model.LargeLanguageModel` base class and implements the following methods:
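
For orientation, the overall shape of the class might look roughly like the sketch below. The import paths are assumptions based on a typical `__base` package layout and may differ in your runtime; the method bodies are filled in step by step in the rest of this section.

```python
from typing import Generator, Optional, Union

# Assumed import paths; adjust them to your runtime's actual package layout.
from core.model_runtime.entities.llm_entities import LLMResult
from core.model_runtime.entities.message_entities import PromptMessage, PromptMessageTool
from core.model_runtime.model_providers.__base.large_language_model import LargeLanguageModel


class XinferenceAILargeLanguageModel(LargeLanguageModel):
    """Custom LLM integration for Xinference; each method is implemented below."""

    def _invoke(self, model: str, credentials: dict,
                prompt_messages: list[PromptMessage], model_parameters: dict,
                tools: Optional[list[PromptMessageTool]] = None, stop: Optional[list[str]] = None,
                stream: bool = True, user: Optional[str] = None) -> Union[LLMResult, Generator]:
        ...  # core invocation (streaming and synchronous)

    def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
                       tools: Optional[list[PromptMessageTool]] = None) -> int:
        ...  # input token pre-computation

    def validate_credentials(self, model: str, credentials: dict) -> None:
        ...  # per-model credential validation
```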

- LLM Invocation

Implement the core method for LLM invocation, supporting both streaming and synchronous responses.

```python
def _invoke(self, model: str, credentials: dict,
            prompt_messages: list[PromptMessage], model_parameters: dict,
            tools: Optional[list[PromptMessageTool]] = None, stop: Optional[list[str]] = None,
            stream: bool = True, user: Optional[str] = None) \
        -> Union[LLMResult, Generator]:
    """
    Invoke large language model

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param model_parameters: model parameters
    :param tools: tools for tool usage
    :param stop: stop words
    :param stream: is the response a stream
    :param user: unique user id
    :return: full response or stream response chunk generator result
    """
```

When implementing, make sure to use two separate functions to return the data for the synchronous and the streaming response. Python treats any function containing the `yield` keyword as a generator function, which is then fixed to returning a `Generator` type. Here's an example (note that the example uses simplified parameters; a real implementation should use the parameter list defined above):

```python
def _invoke(self, stream: bool, **kwargs) \
        -> Union[LLMResult, Generator]:
    if stream:
        return self._handle_stream_response(**kwargs)
    return self._handle_sync_response(**kwargs)

def _handle_stream_response(self, **kwargs) -> Generator:
    # `response` stands for the raw streaming response from the upstream API
    for chunk in response:
        yield chunk

def _handle_sync_response(self, **kwargs) -> LLMResult:
    # `response` stands for the complete response from the upstream API
    return LLMResult(**response)
```
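
To see why the split is necessary, here is a minimal, self-contained illustration in plain Python (unrelated to any runtime API): because the function body contains `yield`, calling it always produces a generator object, even when the streaming branch is not taken, so a single combined function can never hand back a plain result in the synchronous case.

```python
from typing import Generator, Union


def combined(stream: bool) -> Union[str, Generator]:
    # Because `yield` appears somewhere in the body, Python compiles the
    # whole function as a generator function.
    if stream:
        for chunk in ["a", "b", "c"]:
            yield chunk
    else:
        return "full response"  # the caller never receives this as a plain value


print(combined(stream=False))  # prints <generator object ...>, not "full response"
```

Keeping the streaming path in its own helper leaves `_invoke` as an ordinary function, so it can return either an `LLMResult` or a `Generator` depending on `stream`.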

- Pre-compute Input Tokens

If the model does not provide an interface for pre-computing tokens, you can return 0 directly.

```python
def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    """
    Get number of tokens for given prompt messages

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param tools: tools for tool usage
    :return: token count
    """
```

Sometimes you may not want to return 0 directly. In that case, you can use `self._get_num_tokens_by_gpt2(text: str)` to get an approximate token count. This method is provided by the `AIModel` base class and uses a GPT-2 tokenizer for the calculation. Keep in mind that it is only a substitute and the result may not be fully accurate.
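
As a rough sketch of that fallback, assuming each `PromptMessage` exposes a textual `content` attribute (an assumption about the entity classes, not something this guide guarantees):

```python
def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    # Approximate the token count with the GPT-2 tokenizer helper inherited from AIModel.
    # Assumes message content is plain text; multimodal content would need special handling.
    text = "\n".join(str(message.content) for message in prompt_messages)
    return self._get_num_tokens_by_gpt2(text)
```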

- Model Credentials Validation

Similar to vendor credentials validation, this method validates individual model credentials.

```python
def validate_credentials(self, model: str, credentials: dict) -> None:
    """
    Validate model credentials

    :param model: model name
    :param credentials: model credentials
    :return: None
    """
```
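
A common pattern, sketched below, is to reuse `_invoke` with a minimal prompt and translate any failure into a credential validation error. `UserPromptMessage` and `CredentialsValidateFailedError` are assumed names for the runtime's prompt message and validation error classes; substitute whatever your runtime actually provides.

```python
def validate_credentials(self, model: str, credentials: dict) -> None:
    try:
        # Issue a minimal, non-streaming request; wrong credentials or an
        # unreachable server_url are expected to raise here.
        self._invoke(
            model=model,
            credentials=credentials,
            prompt_messages=[UserPromptMessage(content="ping")],
            model_parameters={"max_tokens": 5},
            stream=False,
        )
    except Exception as ex:
        raise CredentialsValidateFailedError(f"Credentials validation failed: {ex}")
```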

- Model Parameter Schema

Unlike pre-defined models, no YAML file declares which parameters a custom model supports, so we need to generate the model parameter schema dynamically.

For instance, Xinference supports the `max_tokens`, `temperature`, and `top_p` parameters.

However, some vendors support different parameters for different models. For example, the vendor `OpenLLM` supports `top_k`, but not every model served by this vendor supports it. Let's say model A supports `top_k` while model B does not; in such cases, we need to generate the model parameter schema dynamically, as illustrated below:

```python
    def get_customizable_model_schema(self, model: str, credentials: dict) -> AIModelEntity | None:
        """
        used to define customizable model schema
        """
        rules = [
            ParameterRule(
                name='temperature', type=ParameterType.FLOAT,
                use_template='temperature',
                label=I18nObject(
                    zh_Hans='温度', en_US='Temperature'
                )
            ),
            ParameterRule(
                name='top_p', type=ParameterType.FLOAT,
                use_template='top_p',
                label=I18nObject(
                    zh_Hans='Top P', en_US='Top P'
                )
            ),
            ParameterRule(
                name='max_tokens', type=ParameterType.INT,
                use_template='max_tokens',
                min=1,
                default=512,
                label=I18nObject(
                    zh_Hans='最大生成长度', en_US='Max Tokens'
                )
            )
        ]

        # if model is A, add top_k to rules
        if model == 'A':
            rules.append(
                ParameterRule(
                    name='top_k', type=ParameterType.INT,
                    use_template='top_k',
                    min=1,
                    default=50,
                    label=I18nObject(
                        zh_Hans='Top K', en_US='Top K'
                    )
                )
            )

        """
        some NOT IMPORTANT code here
        """

        entity = AIModelEntity(
            model=model,
            label=I18nObject(
                en_US=model
            ),
            fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
            model_type=model_type,
            model_properties={
                ModelPropertyKey.MODE: ModelType.LLM,
            },
            parameter_rules=rules
        )

        return entity
```

- Exception Error Mapping

When a model invocation error occurs, it should be mapped to the runtime's specified `InvokeError` type, enabling Dify to handle different errors appropriately.

Runtime Errors:

- `InvokeConnectionError` Connection error during invocation
- `InvokeServerUnavailableError` Service provider unavailable
- `InvokeRateLimitError` Rate limit reached
- `InvokeAuthorizationError` Authorization failure
- `InvokeBadRequestError` Invalid request parameters

```python
    @property
    def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
        """
        Map model invoke error to unified error
        The key is the error type thrown to the caller
        The value is the error type thrown by the model,
        which needs to be converted into a unified error type for the caller.

        :return: Invoke error mapping
        """
```
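
As an illustration only, the mapping could group the exceptions raised by your HTTP client under the unified error types. The sketch below assumes `requests` is used and imported at module level; the right-hand exception classes are placeholders for whatever your client and the Xinference server actually raise.

```python
    @property
    def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
        return {
            InvokeConnectionError: [
                requests.exceptions.ConnectionError,  # network unreachable, DNS failure
                requests.exceptions.Timeout,          # request timed out
            ],
            InvokeServerUnavailableError: [
                requests.exceptions.HTTPError,        # e.g. a 5xx response raised via raise_for_status()
            ],
            InvokeRateLimitError: [],                 # add vendor-specific rate limit errors here
            InvokeAuthorizationError: [
                PermissionError,                      # placeholder for an authentication failure
            ],
            InvokeBadRequestError: [
                ValueError,                           # placeholder for invalid request parameters
            ],
        }
```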

For details on the interface methods, see [Interfaces](./interfaces.md). For a concrete implementation, refer to [llm.py](https://github.com/langgenius/dify-runtime/blob/main/lib/model_providers/anthropic/llm/llm.py).