Vector Database

A vector database stores and retrieves vectors and provides semantic search. It handles vector storage, similarity retrieval, metadata filtering, and collection management, supports distance metrics such as cosine similarity and Euclidean distance, and serves as the data layer for AI knowledge bases and semantic search.

The vector database element has a hierarchical structure of Meta (vectordbs.Meta) → Type (vectordbs.ChromaType) → Instance. Developers can quickly create vector database instance elements through JitAi's visual development tools.

Developers can also create their own Type elements, or modify the official vectordbs.ChromaType element provided by JitAi within their own App, to implement custom encapsulation.

Supported Type elements:

| Type Element | fullName | Vector Database | Description |
| --- | --- | --- | --- |
| ChromaType | vectordbs.ChromaType | Chroma | Lightweight vector database, supports persistent and HTTP modes, integrated by default |

Quick Start

Creating Instance Elements

The following is a complete example of creating a Chroma vector database instance element:

Directory Structure

myapp/vectordbs/MyVectorDB/
├── e.json
└── config.json

e.json File

myapp/vectordbs/MyVectorDB/e.json
{
  "title": "My Vector Database",
  "type": "vectordbs.ChromaType",
  "backendBundleEntry": "."
}

config.json File

myapp/vectordbs/MyVectorDB/config.json
{
  "mode": "persistent",
  "persist_directory": "./vector_data",
  "timeout": 30
}

Usage Example

# Get vector database element
vector_db = app.getElement("vectordbs.MyVectorDB")

# Health check
health = vector_db.health_check()
print("Database status:", health['status'])

# Create collection
vector_db.create_collection(
    name="documents",
    metadata={"description": "Document vector collection"},
    distance_metric="cosine"
)

# Add vector data
result = vector_db.add_vectors(
    collection_name="documents",
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    metadata=[{"title": "Document 1"}, {"title": "Document 2"}],
    ids=["doc1", "doc2"]
)

Element Configuration

e.json Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| title | str | Yes | Instance element display name |
| type | str | Yes | Points to Type element fullName, such as vectordbs.ChromaType |
| backendBundleEntry | str | Yes | Fixed as "." |

config.json Configuration

General Configuration

| Parameter | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| mode | str | No | "persistent" | Connection mode: persistent/http |
| timeout | int | No | 30 | Connection timeout (seconds) |

Persistent Mode Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| persist_directory | str | Yes | Persistent storage directory path |

HTTP Mode Configuration

| Parameter | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| host | str | No | "localhost" | Database server address |
| port | int | No | 8000 | Database server port |
| auth_token | str | No | - | Authentication token |
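
For comparison with the persistent-mode file in Quick Start, an HTTP-mode config.json could be generated as in the sketch below. The keys come from the configuration tables above. The host name and token value are placeholders invented for illustration, not real endpoints or credentials.

```python
import json

# Keys are taken from the configuration tables above; the values are
# placeholders chosen for illustration only.
http_config = {
    "mode": "http",
    "host": "chroma.internal.example",  # hypothetical server address
    "port": 8000,                       # documented default port
    "auth_token": "REPLACE_ME",         # placeholder; optional per the table
    "timeout": 30,                      # documented default, in seconds
}

# Serialize to the text that would go into config.json.
config_text = json.dumps(http_config, indent=2)
print(config_text)
```

Since persist_directory is listed only under persistent mode, it is left out here.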

Methods

health_check

Checks the connection status and runtime health of the vector database and returns basic database information.

Method Signature

def health_check(self) -> Dict[str, Any]

Parameter Details

No parameters required

Return Value

  • Type: Dict
  • Description: Dictionary containing database status information

Healthy Status Return Fields:

| Field | Type | Description |
| --- | --- | --- |
| status | str | Database status: healthy/unhealthy |
| db_type | str | Database type, fixed as "chroma" |
| mode | str | Connection mode: persistent/http |
| collections_count | int | Number of collections |
| heartbeat | int | Heartbeat detection result |
| client_connected | bool | Client connection status |
| version | str | ChromaDB version information |

Exception Status Return Fields:

| Field | Type | Description |
| --- | --- | --- |
| status | str | Fixed as "unhealthy" |
| error | str | Error message description |
| db_type | str | Database type, fixed as "chroma" |
| mode | str | Connection mode: persistent/http |
| client_connected | bool | Fixed as False |

Usage Example

# Get vector database element
vector_db = app.getElement("vectordbs.MyVectorDB")

# Execute health check
health = vector_db.health_check()
print("Database status:", health['status'])
print("Collections count:", health.get('collections_count', 0))
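
Because the healthy and exception results carry different fields, calling code may want to branch on status before reading anything else. The sketch below shows one way to do that. The helper summarize_health is hypothetical (not part of the element), and the sample dictionaries are made-up values shaped like the field tables above.

```python
from typing import Any, Dict


def summarize_health(health: Dict[str, Any]) -> str:
    """Turn a health_check() result into a one-line summary (hypothetical helper)."""
    prefix = f"{health.get('db_type')} ({health.get('mode')})"
    if health.get("status") == "healthy":
        return (
            f"{prefix} is healthy, "
            f"{health.get('collections_count', 0)} collection(s), "
            f"version {health.get('version', 'unknown')}"
        )
    # Unhealthy results carry an "error" field instead of counts and version.
    return f"{prefix} is unhealthy: {health.get('error', 'no error message')}"


# Sample dictionaries shaped like the field tables above (values are made up).
healthy = {
    "status": "healthy", "db_type": "chroma", "mode": "persistent",
    "collections_count": 2, "heartbeat": 1, "client_connected": True,
    "version": "0.0.0",
}
unhealthy = {
    "status": "unhealthy", "error": "connection refused",
    "db_type": "chroma", "mode": "http", "client_connected": False,
}

print(summarize_health(healthy))
print(summarize_health(unhealthy))
```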

create_collection

Create a new collection in the vector database for storing and managing related vector data.

Method Signature

def create_collection(self, name: str, metadata: Optional[Dict[str, Any]] = None, distance_metric: str = "cosine") -> Any

Parameter Details

| Parameter | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| name | str | Yes | - | Collection name, must be unique |
| metadata | Dict | No | None | Collection metadata, optional description information |
| distance_metric | str | No | "cosine" | Distance metric algorithm |

Supported distance_metric values:

  • cosine: Cosine similarity (recommended)
  • euclidean: Euclidean distance
  • l2: L2 norm distance
  • ip: Inner product distance

Return Value

  • Type: Any
  • Description: Created collection object

Usage Example

# Get vector database element
vector_db = app.getElement("vectordbs.MyVectorDB")

# Configure collection parameters
collection_name = "jitai_documents"
metadata = {"description": "JitAi framework demo collection"}

try:
    # Create collection
    vector_db.create_collection(
        name=collection_name,
        metadata=metadata,
        distance_metric="cosine"
    )
    print("Collection created successfully:", collection_name)
except Exception as e:
    # Treat an existing collection as success; re-raise anything else
    if "already exists" in str(e):
        print("Collection already exists:", collection_name)
    else:
        raise

add_vectors

Stores vector data and its metadata in the specified collection.

Method Signature

def add_vectors(self, collection_name: str, embeddings: List[List[float]], metadata: Optional[List[Dict[str, Any]]] = None, ids: Optional[List[str]] = None) -> Dict[str, Any]

Parameter Details

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| collection_name | str | Yes | Target collection name |
| embeddings | list | Yes | Vector data list, each vector is a list of floats |
| metadata | list | No | Metadata list corresponding to vectors, optional |
| ids | list | No | Vector ID list, optional, auto-generated if not provided |

Return Value

  • Type: Dict
  • Description: Dictionary containing addition result information

| Field | Type | Description |
| --- | --- | --- |
| added_count | int | Number of successfully added vectors |
| failed_count | int | Number of failed vector additions |
| collection_name | str | Target collection name |
| vector_dimension | int | Vector dimension |
| batch_size_used | int | Batch size used |

Usage Example

# Get vector database element
vector_db = app.getElement("vectordbs.MyVectorDB")

# Configure vector data
collection_name = "jitai_documents"
sample_embeddings = [
    [0.1, 0.2, 0.3, 0.4, 0.5],  # Document 1 vector
    [0.2, 0.3, 0.4, 0.5, 0.6],  # Document 2 vector
    [0.3, 0.4, 0.5, 0.6, 0.7],  # Document 3 vector
]

# Configure metadata
sample_metadata = [
    {"title": "JitAi Architecture Documentation", "category": "Technology", "source": "internal"},
    {"title": "Vector Database Usage Guide", "category": "Tutorial", "source": "docs"},
    {"title": "AI Assistant Development Practice", "category": "Technology", "source": "blog"},
]

# Configure IDs
sample_ids = ["jitai_doc_1", "jitai_doc_2", "jitai_doc_3"]

# Store vector data
result = vector_db.add_vectors(
    collection_name=collection_name,
    embeddings=sample_embeddings,
    metadata=sample_metadata,
    ids=sample_ids
)
print("Added vector count:", result['added_count'])
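
Since embeddings, metadata, and ids are parallel lists, it can help to check a batch before sending it. The following is a minimal sketch of such a check, not something the element is documented to do. The helper validate_batch is hypothetical, and the rules it enforces (equal vector dimensions, matching list lengths, unique IDs) are assumptions about what a valid batch looks like.

```python
from typing import Any, Dict, List, Optional


def validate_batch(
    embeddings: List[List[float]],
    metadata: Optional[List[Dict[str, Any]]] = None,
    ids: Optional[List[str]] = None,
) -> int:
    """Check a batch for consistency and return the vector dimension.

    Hypothetical helper; raises ValueError on the first problem found.
    """
    if not embeddings:
        raise ValueError("embeddings must not be empty")
    dimension = len(embeddings[0])
    for index, vector in enumerate(embeddings):
        if len(vector) != dimension:
            raise ValueError(
                f"vector {index} has {len(vector)} dimensions, expected {dimension}"
            )
    if metadata is not None and len(metadata) != len(embeddings):
        raise ValueError("metadata length does not match embeddings length")
    if ids is not None:
        if len(ids) != len(embeddings):
            raise ValueError("ids length does not match embeddings length")
        if len(set(ids)) != len(ids):
            raise ValueError("ids must be unique")  # assumed requirement
    return dimension


dimension = validate_batch(
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    metadata=[{"title": "Document 1"}, {"title": "Document 2"}],
    ids=["doc1", "doc2"],
)
print("Vector dimension:", dimension)
```

Running a check like this first surfaces shape mistakes as a clear ValueError instead of a partially failed insert.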

query_vectors

Searches the collection for the vectors most similar to a query vector.

Method Signature

def query_vectors(self, collection_name: str, query_embedding: List[float], n_results: int = 10, where: Optional[Dict[str, Any]] = None, include: Optional[List[str]] = None) -> Dict[str, Any]

Parameter Details

| Parameter | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| collection_name | str | Yes | - | Target collection name |
| query_embedding | list | Yes | - | Query vector, list of floats |
| n_results | int | No | 10 | Number of results to return |
| where | Dict | No | None | Metadata filter conditions, optional |
| include | list | No | ["metadatas", "distances"] | Included return fields |

Supported include values:

  • "metadatas": Include metadata
  • "distances": Include distance values
  • "embeddings": Include vector data

Return Value

  • Type: Dict
  • Description: Dictionary containing search results

| Field | Type | Description |
| --- | --- | --- |
| results | list | Search results list |

Each item in results contains:

  • id: Vector ID
  • distance: Distance value
  • similarity: Similarity (0-1, 1 is most similar)
  • metadata: Metadata (if included)

Usage Example

# Get vector database element
vector_db = app.getElement("vectordbs.MyVectorDB")

# Configure query parameters
collection_name = "jitai_documents"
query_embedding = [0.15, 0.25, 0.35, 0.45, 0.55] # Query vector
where_condition = {"category": "Technology"} # Filter condition

# Execute similarity search
search_results = vector_db.query_vectors(
    collection_name=collection_name,
    query_embedding=query_embedding,
    n_results=2,
    where=where_condition,
    include=["metadatas", "distances"]
)

# Process search results
print("Search results count:", len(search_results["results"]))
for item in search_results["results"]:
    print("Document ID:", item['id'])
    print("Similarity:", item.get('similarity', 'N/A'))
    print("Title:", item.get('metadata', {}).get('title', 'N/A'))
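
A common follow-up step is to discard weak matches before using the results. The sketch below filters on the similarity field described under Return Value. The helper filter_by_similarity and the 0.8 threshold are illustrative assumptions, and the sample data is made up to match the documented result shape.

```python
from typing import Any, Dict, List


def filter_by_similarity(
    search_results: Dict[str, Any], threshold: float
) -> List[Dict[str, Any]]:
    """Keep items whose similarity is at least `threshold`, best match first."""
    items = [
        item
        for item in search_results.get("results", [])
        if "similarity" in item and item["similarity"] >= threshold
    ]
    return sorted(items, key=lambda item: item["similarity"], reverse=True)


# Made-up data shaped like the documented return value.
sample_results = {
    "results": [
        {"id": "jitai_doc_3", "distance": 0.40, "similarity": 0.60,
         "metadata": {"title": "AI Assistant Development Practice"}},
        {"id": "jitai_doc_1", "distance": 0.02, "similarity": 0.98,
         "metadata": {"title": "JitAi Architecture Documentation"}},
    ]
}

strong_matches = filter_by_similarity(sample_results, threshold=0.8)
print([item["id"] for item in strong_matches])
```

A suitable threshold depends on the embedding model and the distance metric, so it is best tuned against real queries.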

Properties

None

Advanced Features

Adaptive Batch Processing

The system automatically adjusts the batch size based on vector dimension to optimize storage performance:

  • 384 dimensions: batch size 2000
  • 768 dimensions: batch size 1000
  • 1536 dimensions: batch size 500
  • 2048+ dimensions: batch size 200
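
The tiers above could be expressed roughly as follows. This is a sketch, not the element's actual implementation. The four batch sizes come from the list, but treating each listed dimension as the upper bound of its tier (so that, for example, 1000 dimensions falls into the 1536 tier) is an assumption, as are the helper names pick_batch_size and split_into_batches.

```python
from typing import Iterator, List


def pick_batch_size(dimension: int) -> int:
    """Map a vector dimension to a batch size.

    The four sizes come from the documented list; the tier boundaries
    between the listed dimensions are an assumption of this sketch.
    """
    if dimension <= 384:
        return 2000
    if dimension <= 768:
        return 1000
    if dimension <= 1536:
        return 500
    return 200


def split_into_batches(embeddings: List[List[float]]) -> Iterator[List[List[float]]]:
    """Yield consecutive slices sized for the vectors' dimension."""
    if not embeddings:
        return
    size = pick_batch_size(len(embeddings[0]))
    for start in range(0, len(embeddings), size):
        yield embeddings[start:start + size]


# 1200 vectors of 1536 dimensions are split into batches of at most 500.
vectors = [[0.0] * 1536 for _ in range(1200)]
batch_lengths = [len(batch) for batch in split_into_batches(vectors)]
print(batch_lengths)  # [500, 500, 200]
```

Smaller batches for larger vectors keep the amount of data per write roughly bounded, which is presumably the motivation for the tiers.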

Distance Metric Algorithms

Multiple distance metrics are supported to cover different scenarios:

  • Cosine Similarity: Suitable for text semantic analysis, default recommendation
  • Euclidean Distance: Suitable for spatial data like geographic locations
  • L2 Norm: Suitable for high-dimensional data like image features (the L2 norm of a difference vector is its Euclidean length, so this measure ranks neighbors the same way as Euclidean distance)
  • Inner Product: Suitable for recommendation systems and other scenarios
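
To make the difference between these metrics concrete, the sketch below computes the textbook definitions in plain Python. It illustrates the mathematics only. How the underlying database scales these values (for example, whether distances are squared, or how a distance is converted into the similarity field) is not specified here.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more aligned."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between the vectors; ignores magnitude."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # same direction as the query, larger magnitude
orthogonal = [0.0, 2.0]      # perpendicular to the query

print(cosine_similarity(query, same_direction))   # 1.0
print(euclidean_distance(query, same_direction))  # 2.0
print(inner_product(query, orthogonal))           # 0.0
```

The first two lines show why cosine similarity is popular for text embeddings: two vectors pointing the same way count as identical even when their lengths differ, while Euclidean distance still separates them.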

Intelligent Exception Handling

A built-in exception mapping mechanism converts underlying database errors into unified error codes, making problems easier to diagnose and handle.
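
The general pattern behind such a mapping can be sketched as follows. Everything in this example is hypothetical: the VectorDBError class, the error code strings, and the choice of built-in exceptions are invented to illustrate the idea and do not reflect the element's real codes or classes.

```python
from typing import Dict, Type


class VectorDBError(Exception):
    """Hypothetical unified error carrying a stable code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"[{code}] {message}")
        self.code = code


# Hypothetical mapping from low-level exception types to unified codes.
ERROR_CODES: Dict[Type[BaseException], str] = {
    ConnectionError: "VDB_CONNECTION_FAILED",
    TimeoutError: "VDB_TIMEOUT",
    ValueError: "VDB_INVALID_ARGUMENT",
}


def map_exception(error: BaseException) -> VectorDBError:
    """Wrap a low-level error in a VectorDBError with a unified code."""
    for error_type, code in ERROR_CODES.items():
        if isinstance(error, error_type):
            return VectorDBError(code, str(error))
    return VectorDBError("VDB_UNKNOWN", str(error))


try:
    raise TimeoutError("no response within 30 seconds")
except Exception as error:
    mapped = map_exception(error)
    print(mapped.code, "-", mapped)
```

Callers can then branch on a small, stable set of codes instead of matching backend-specific exception types or message strings.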

Flexible Connection Mode Switching

Supports persistent mode and HTTP mode to meet different deployment requirements:

  • Persistent Mode: Suitable for single-machine deployment, local data storage
  • HTTP Mode: Suitable for distributed deployment, supports remote access and authentication