Read¶
-
class
vectorai.api.read.
ViReadAPIClient
(username, api_key, url=None)¶ Read Operations
-
collection_stats
(collection_name: str)¶ Retrieves stats about a collection
Stats include: size, searches, number of documents, etc.
- Parameters
collection_name – Name of Collection
-
collection_schema
(collection_name: str)¶ Retrieves the schema of a collection
The schema of a collection can include types of: text, numeric, date, bool, etc.
- Parameters
collection_name – Name of Collection
-
id
(collection_name: str, document_id: str, include_vector: bool = True)¶ Look up a document by its id
- Parameters
document_id – ID of a document
include_vector – Include vectors in the search results
collection_name – Name of Collection
-
bulk_id
(collection_name: str, document_ids: List[str])¶ Look up multiple documents by their ids
- Parameters
document_ids – IDs of documents
include_vector – Include vectors in the search results
collection_name – Name of Collection
-
retrieve_documents
(collection_name: str, page_size: int = 20, cursor: str = None, sort: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [])¶ Retrieve some documents
A cursor is provided for retrieving the next page of documents. Pass it back in repeated calls to page through all documents in the collection.
- Parameters
include_fields – Fields to include in the document, if empty list [] then all is returned
cursor – Cursor to paginate the document retrieval
page_size – Size of each page of results
sort – Fields to sort the documents by
asc – Whether to sort results by ascending or descending order
include_vector – Include vectors in the search results
collection_name – Name of Collection
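The cursor loop described above can be sketched as follows. This is a minimal sketch, not the library's implementation. It assumes each page is a dict with a "documents" list and a "cursor" value, and that an empty page marks the end; both are assumptions about the response shape. `fetch_page` stands in for a call to retrieve_documents, and `fake_fetch_page` is a hypothetical in-memory substitute so the example runs without a server.

```python
from typing import Callable, Dict, Iterator, List, Optional


def iter_all_documents(
    fetch_page: Callable[[Optional[str]], Dict],
) -> Iterator[Dict]:
    """Yield every document by following the cursor from page to page.

    Assumption: fetch_page(cursor) returns {"documents": [...], "cursor": ...}
    and an empty "documents" list means there is nothing left to read.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        documents = page.get("documents", [])
        if not documents:
            break
        yield from documents
        cursor = page.get("cursor")
        if cursor is None:
            break


# Hypothetical in-memory stand-in for the API, used only to exercise the loop.
_FAKE_COLLECTION: List[Dict] = [{"_id": str(i)} for i in range(45)]


def fake_fetch_page(cursor: Optional[str], page_size: int = 20) -> Dict:
    start = int(cursor) if cursor is not None else 0
    end = start + page_size
    return {"documents": _FAKE_COLLECTION[start:end], "cursor": str(end)}


all_documents = list(iter_all_documents(fake_fetch_page))
print(len(all_documents))
```

With a real client, `fetch_page` could be something like `lambda cursor: vi_client.retrieve_documents(collection_name, cursor=cursor)`, provided the response carries the keys assumed here.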
-
random_documents
(collection_name: str, page_size: int = 20, seed: int = 10, include_vector: bool = True)¶ Retrieve some documents randomly
Mainly for testing purposes.
- Parameters
seed – Random Seed for retrieving random documents.
page_size – Size of each page of results
include_vector – Include vectors in the search results
collection_name – Name of Collection
-
id_lookup_joined
(join_query: dict, doc_id: str)¶ Look up a document by its id with joins
- Parameters
join_query –
.
doc_id – ID of a Document
-
aggregate
(collection_name: str, aggregation_query: Dict, page: int = 1, page_size: int = 10, asc: bool = False)¶ Aggregate a collection
Aggregation/groupby of a collection using an aggregation query. The aggregation query is a JSON body that follows this schema:
{
    "groupby" : [
        {"name": <nickname/alias>, "field": <field in the collection>, "agg": "category"},
        {"name": <another_nickname/alias>, "field": <another groupby field in the collection>, "agg": "category"}
    ],
    "metrics" : [
        {"name": <nickname/alias>, "field": <numeric field in the collection>, "agg": "avg"}
    ]
}
“groupby” lists the fields to split the data by. These are the available groupby types:
“category” : groupby a field that is a category
“metrics” lists the metrics to calculate within each group. Every aggregation includes a frequency metric. These are the available metric types:
“average”, “max”, “min”, “sum”, “cardinality”
- Parameters
collection_name – Name of Collection
aggregation_query – Aggregation query to aggregate data
page_size – Size of each page of results
page – Page of the results
asc – Whether to sort results by ascending or descending order
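An aggregation query following the schema above can be assembled programmatically. The helper below is a hypothetical convenience function, not part of the library: it uses the field name as the groupby alias and builds metric aliases as `<agg>_<field>`. Because the schema example uses "avg" while the list of metric types says "average", the sketch accepts both; which spelling the service expects is an assumption to verify.

```python
from typing import Dict, List

# Metric names taken from the text above; "avg" appears in the schema example.
METRIC_TYPES = {"avg", "average", "max", "min", "sum", "cardinality"}


def build_aggregation_query(groupby_fields: List[str], metrics: Dict[str, str]) -> Dict:
    """Build an aggregation query dict.

    groupby_fields: fields to group by (all treated as "category").
    metrics: mapping of numeric field name -> metric type.
    """
    for agg in metrics.values():
        if agg not in METRIC_TYPES:
            raise ValueError(f"Unknown metric type: {agg!r}")
    return {
        "groupby": [
            {"name": field, "field": field, "agg": "category"}
            for field in groupby_fields
        ],
        "metrics": [
            {"name": f"{agg}_{field}", "field": field, "agg": agg}
            for field, agg in metrics.items()
        ],
    }


query = build_aggregation_query(["category"], {"price": "avg"})
print(query)
```

The resulting dict can be passed as the aggregation_query argument of aggregate.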
-
facets
(collection_name: str, fields: List[str] = [], page: int = 1, page_size: int = 20, asc: bool = False)¶ Retrieve the facets of a collection
Takes a high-level aggregation of every field in a collection. This is used in advanced search to help create the filter bar for search.
- Parameters
facets_fields – Fields to include in the facets, if [] then all
date_interval – Interval for date facets
page_size – Size of facet page
page – Page of the results
asc – Whether to sort results by ascending or descending order
collection_name – Name of Collection
-
filters
(collection_name: str, filters: List, page=1, page_size=10, include_vector: bool = False)¶ Filters a collection
Filters are used to retrieve documents that match the conditions set in a filter query. This is used in advanced search to filter the documents that are searched.
The filters query is a json body that follows the schema of:
[
    {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":"==", "condition_value":"america"},
    {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":">=", "condition_value":90},
]
These are the available filter_type types:
1. "contains": for filtering documents that contain a string.
    {'field' : 'category', 'filter_type' : 'contains', "condition":"==", "condition_value": "bluetoo"}
2. "exact_match"/"category": for filtering documents that match a string or list of strings exactly.
    {'field' : 'category', 'filter_type' : 'exact_match', "condition":"==", "condition_value": "tv"}
3. "categories": for filtering documents that contain any category from a list of categories.
    {'field' : 'category', 'filter_type' : 'categories', "condition":"==", "condition_value": ["tv", "smart", "bluetooth_compatible"]}
4. "exists": for filtering documents that contain a field.
    {'field' : 'purchased', 'filter_type' : 'exists', "condition":">=", "condition_value":" "}
5. "date": for filtering by date range.
    {'field' : 'insert_date_', 'filter_type' : 'date', "condition":">=", "condition_value":"2020-01-01"}
6. "numeric": for filtering by numeric range.
    {'field' : 'price', 'filter_type' : 'numeric', "condition":">=", "condition_value":90}
These are the available conditions:
“==”, “!=”, “>=”, “>”, “<”, “<=”
- Parameters
collection_name – Name of Collection
filters – Query for filtering the search results
page – Page of the results
page_size – Size of each page of results
asc – Whether to sort results by ascending or descending order
include_vector – Include vectors in the search results
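A filter query can be built and checked before it is sent. The sketch below uses a hypothetical helper, `make_filter`, which is not part of the library. It validates the filter type and condition against the lists given above and returns one filter entry.

```python
from typing import Any, Dict, List

# Values copied from the lists of filter types and conditions above.
FILTER_TYPES = {
    "contains", "exact_match", "category", "categories",
    "exists", "date", "numeric",
}
CONDITIONS = {"==", "!=", ">=", ">", "<", "<="}


def make_filter(field: str, filter_type: str, condition: str, condition_value: Any) -> Dict:
    """Return a single filter entry, rejecting unknown types and conditions."""
    if filter_type not in FILTER_TYPES:
        raise ValueError(f"Unknown filter_type: {filter_type!r}")
    if condition not in CONDITIONS:
        raise ValueError(f"Unknown condition: {condition!r}")
    return {
        "field": field,
        "filter_type": filter_type,
        "condition": condition,
        "condition_value": condition_value,
    }


# Documents whose category is "tv" or "smart" and whose price is at least 90.
filters: List[Dict] = [
    make_filter("category", "categories", "==", ["tv", "smart"]),
    make_filter("price", "numeric", ">=", 90),
]
print(filters)
```

The list can then be passed as the filters argument of the filters method.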
-
job_status
(collection_name: str, job_id: str, job_name: str)¶ Get the status of a job: whether it is starting, running, failed or finished.
- Parameters
job_id – ID of the job
job_name – Name of the job
collection_name – Name of Collection
-
list_jobs
(collection_name: str)¶ Get history of jobs
List all jobs and get their history, including each job's job_id, parameters, start time, etc.
- Parameters
collection_name – Name of Collection
-
Read operations designed for Python
-
class
vectorai.read.
ViReadClient
(username, api_key, url=None)¶ -
random_aggregation_query
(collection_name: str, groupby: int = 1, metrics: int = 1)¶ Generates a random aggregation query.
- Parameters
collection_name – name of collection
groupby – The number of groupbys to randomly generate
metrics – The number of metrics to randomly generate
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_aggregation_query(collection_name, groupby=1, metrics=1)
-
search
(collection_name: str, vector: List, field: List, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector=False, include_count=True)¶ Vector Similarity Search. Search a vector field with a vector, a.k.a Nearest Neighbors Search
Enables machine learning search with vector search. Search with a vector for the most similar vectors.
For example, search with a person’s characteristics to find the most similar people (querying the “persons_characteristics_vector” field):
Query person's characteristics as a vector:
    [180, 40, 70] representing [height, age, weight]
Search Results:
    [
        {"name": "Adam Levine", "persons_characteristics_vector" : [180, 56, 71]},
        {"name": "Brad Pitt", "persons_characteristics_vector" : [180, 56, 65]},
        ...
    ]
- Parameters
vector – Vector, a list/array of floats that represents a piece of data.
collection_name – Name of Collection
search_fields – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores of multiple vector fields into one score or keep them separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
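To make the idea of vector similarity search concrete, the sketch below ranks the two example documents against the query vector by cosine similarity, in plain Python. It is a brute-force illustration of the concept only, not how the service computes results; `cosine_similarity` and `nearest_neighbors` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: Sequence[float], documents: List[Dict], field: str, page_size: int = 10
) -> List[Dict]:
    """Return up to page_size documents, most similar to the query first."""
    scored = [(cosine_similarity(query, doc[field]), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:page_size]]


# Vectors from the example above: [height, age, weight].
documents = [
    {"name": "Brad Pitt", "persons_characteristics_vector": [180, 56, 65]},
    {"name": "Adam Levine", "persons_characteristics_vector": [180, 56, 71]},
]
results = nearest_neighbors([180, 40, 70], documents, "persons_characteristics_vector")
print([doc["name"] for doc in results])
```

Brute force compares the query with every document, which is why vector search services typically rely on indexing or approximate search for large collections.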
-
random_filter_query
(collection_name: str, text_filters: int = 1, numeric_filters: int = 0)¶ Generates a random filter query.
- Parameters
collection_name – name of collection
text_filters – The number of text filters to randomly generate
numeric_filters – The number of numeric filters to randomly generate
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_filter_query(collection_name, text_filters=1, numeric_filters=0)
-
head
(collection_name: str, page_size: int = 5, return_as_pandas_df: bool = True)¶ Preview a collection by returning a small number of its documents.
- Parameters
collection_name – The name of your collection
page_size – The number of results to return
return_as_pandas_df – If True, return a pandas DataFrame rather than JSON.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.head(collection_name, page_size=10)
-
retrieve_all_documents
(collection_name: str, sort_by: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [])¶ Retrieve all documents in a given collection. We recommend specifying specific fields to extract as otherwise this function may take a long time to run.
- Parameters
collection_name – Name of collection.
sort_by – Fields to sort the documents by.
asc – If true, returns results in ascending order of the sort fields.
include_vector – If true, includes _vector_ fields in the returned documents.
include_fields – Adjust which fields are returned.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> all_documents = vi_client.retrieve_all_documents(collection_name)
-
wait_till_jobs_complete
(collection_name: str, job_id: str, job_name: str)¶ Wait until a specific job is complete.
- Parameters
collection_name – Name of collection.
job_id – ID of the job.
job_name – Name of the job.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> job = vi_client.dimensionality_reduction_job('nba_season_per_36_stats_demo', vector_field='season_vector_', n_components=2)
>>> vi_client.wait_till_jobs_complete('nba_season_per_36_stats_demo', **job)
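Waiting for a job amounts to polling its status until it reaches a terminal state. The sketch below shows one way such a loop could look; it is not the library's implementation. It assumes the status call returns a dict with a "status" key holding one of starting, running, failed or finished (the key name is an assumption), and `fake_job_status` is a hypothetical stand-in for a call to job_status.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    get_status: Callable[[], Dict],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> Dict:
    """Poll get_status() until the job has failed or finished, or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        # Assumption: the response exposes its state under the "status" key.
        if status.get("status") in {"failed", "finished"}:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("Job did not complete before the timeout")
        time.sleep(poll_interval)


# Hypothetical status source that walks through a fixed sequence of states.
_states = iter(["starting", "running", "running", "finished"])


def fake_job_status() -> Dict:
    return {"status": next(_states)}


result = wait_for_job(fake_job_status, poll_interval=0.01)
print(result["status"])
```

A timeout keeps the loop from running forever if a job never reaches a terminal state.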
-
check_schema
(collection_name: str)¶ Check the schema of a given collection.
- Parameters
collection_name – Name of collection.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.check_schema(collection_name)
-
list_collections
() → List[str]¶ List Collections
- Parameters
username – Username
api_key – API key; you can request it from request_api_key
- Returns
List of collections
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.list_collections()
-