Document import using the Python SDK

Document import without bulk

A short example of how to import data into a dataset. This example does not use bulk import: each call sends a single document.

import slamby_sdk
from slamby_sdk.rest import ApiException
import uuid

client = slamby_sdk.ApiClient("https://europe.slamby.com/demo/")
client.set_default_header("Authorization", "Slamby s3cr3t")
client.set_default_header("X-DataSet", "demo")

document = {
    "id": str(uuid.uuid4()),
    "title": "demo",
    "desc": "description",
    "tags": []
}

try:
    slamby_sdk.DocumentApi(client).create_document(document=document)
except ApiException as e:
    print(e)
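If you create many documents this way, a small helper keeps the document structure in one place. The sketch below assumes the same fields as the demo dataset above (id, title, desc, tags); make_document is a hypothetical helper name, and your own dataset's field names may differ.

```python
import uuid


def make_document(title, desc, tags=None):
    """Build a document dict with a fresh random id.

    The field names mirror the demo dataset used in this post (an
    assumption); adjust them to match your own dataset's schema.
    """
    return {
        "id": str(uuid.uuid4()),
        "title": title,
        "desc": desc,
        # Create a new list per document instead of sharing a mutable default
        "tags": list(tags) if tags is not None else [],
    }


document = make_document("demo", "description")
```

The returned dict can be passed to create_document(document=document) exactly as in the example above.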

Bulk import without parallel processing

This example imports several documents into the dataset at once using bulk import.

import slamby_sdk
from slamby_sdk.rest import ApiException
import uuid

client = slamby_sdk.ApiClient("https://europe.slamby.com/demo/")
client.set_default_header("Authorization", "Slamby s3cr3t")
client.set_default_header("X-DataSet", "demo")

documents = {
    "documents": [
        {
            "id": str(uuid.uuid4()),
            "title": "demo",
            "desc": "description",
            "tags": []
        },
        {
            "id": str(uuid.uuid4()),
            "title": "demo",
            "desc": "description",
            "tags": []
        },
        {
            "id": str(uuid.uuid4()),
            "title": "demo",
            "desc": "description",
            "tags": []
        }
    ]
}

try:
    slamby_sdk.DocumentApi(client).bulk_documents(settings=documents)
except ApiException as e:
    print(e)
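For larger imports you may not want to send every document in a single request. The sketch below splits a list of documents into payloads of the same {"documents": [...]} shape used above, each of which could be passed to bulk_documents(settings=payload). The helper name batch_payloads and the default batch size of 100 are illustrative assumptions, not values from the Slamby documentation.

```python
from itertools import islice


def batch_payloads(documents, batch_size=100):
    """Yield bulk payloads containing at most batch_size documents each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(documents)
    while True:
        # Take the next slice of documents; an empty slice means we are done
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        yield {"documents": batch}
```

Each payload then goes through the same try/except around bulk_documents(settings=payload), inside a loop, so a failed batch does not stop the remaining ones.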

Single import with parallel processing

This example combines the single-document import with parallel processing. The parallel part uses Parallel from the joblib library.

Parallel processing can also be combined with bulk import.

import slamby_sdk
from slamby_sdk.rest import ApiException
import csv
from joblib import Parallel, delayed

client = slamby_sdk.ApiClient("https://europe.slamby.com/demo/")
client.set_default_header("Authorization", "Slamby s3cr3t")
client.set_default_header("X-DataSet", "demo")

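def add_document(document):
    # Send one document; print API errors instead of aborting the whole run
    try:
        slamby_sdk.DocumentApi(client).create_document(document=document)
    except ApiException as e:
        print(e)

if __name__ == '__main__':
    # Each CSV row becomes one document; the column names are assumed
    # to match the field names of the dataset
    with open('ads.csv', 'r', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        num_jobs = 4
        # The work is I/O-bound (HTTP requests), so threads are enough and the
        # shared client does not have to be pickled for worker processes
        Parallel(n_jobs=num_jobs, prefer="threads")(delayed(add_document)(document) for document in reader)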
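As noted above, parallel processing can also be combined with bulk import. The sketch below swaps joblib for the standard library's concurrent.futures.ThreadPoolExecutor and takes the sending function as a parameter, so the batching logic can be tried without a server. The names chunks, import_in_parallel and send_batch, as well as the default batch size and worker count, are hypothetical; in practice send_batch would wrap slamby_sdk.DocumentApi(client).bulk_documents(settings=payload).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(documents, size):
    """Yield lists of at most `size` documents."""
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_in_parallel(documents, send_batch, batch_size=100, workers=4):
    """Send documents as bulk payloads using a pool of worker threads.

    send_batch receives one {"documents": [...]} payload per call.
    Returns a list of (payload, exception) pairs for the batches that failed.
    """
    payloads = [{"documents": batch} for batch in chunks(documents, batch_size)]
    failures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [(p, pool.submit(send_batch, p)) for p in payloads]
        for payload, future in futures:
            exc = future.exception()  # blocks until this batch has finished
            if exc is not None:
                failures.append((payload, exc))
    return failures


# Hypothetical wiring with the Slamby SDK (not executed here):
# def send_batch(payload):
#     slamby_sdk.DocumentApi(client).bulk_documents(settings=payload)
# failures = import_in_parallel(documents, send_batch)
```

Threads fit this workload because each worker spends most of its time waiting for HTTP responses. Collecting the failed payloads instead of printing errors lets you retry only the batches that did not go through.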
