D3database

The D3database module provides a python interface for users to interact with the D3TaLES database. D3TaLES uses a MongoDB No-SQL structure, so some base modules can be adapted to any MongoDB database. This module includes applications to for data insertion, schema validation, basic data generation, database queries, and Python-based interaction with the D3TaLES REST API.

Full documentation can be found here.

Basic Data Generation

The GenerateMolInfo module can be used to generate several 2D molecular descriptors from a SMILES string input. The descriptors can be accessed as attributes of the resulting class or accessed as a dictionary with the mol_info_dict attribute. These descriptors include smiles, selfies, inchi, inchi_key, iupac_name, synonyms, init_structure (coordinates for estimated 3D geometry), molecular_formula, groundState_charge, number_of_atoms, molecular_weight, d2_image (bit string for molecule image), source_group, and groundState_spin.

from d3tales_api.D3database.info_from_smiles import GenerateMolInfo

# Generate basic molecule information using the GenerateMolInfo module
smiles = "CC"
instance = GenerateMolInfo(smiles, database="frontend")
print(instance.mol_info_dict)

Data Insertion

The following example show the insertion of generated molecule information (see above) into the frontend database. Note that information will be inserted into the database specified in the DB_INFO_FILE.

from d3tales_api.D3database.d3database import FrontDB
from d3tales_api.D3database.info_from_smiles import GenerateMolInfo

# Generate basic molecule information using the GenerateMolInfo module
smiles = "CC"
instance = GenerateMolInfo(smiles, database="frontend").mol_info_dict

# Insert basic molecule information into the Frontend database
db_insertion = FrontDB(schema_layer='mol_info', instance=instance, smiles=smiles, group="Non-D3TaLES")

Schema Validation

The D3database modules automatically validate instances with the appropriate schema.

from d3tales_api.D3database.d3database import FrontDB

# Create instance with an erroneous field
instance = {"groundState_charge": "string"}

# Insert basic molecule information into the Frontend database
db_insertion = FrontDB(_id="05XICU", schema_layer='mol_info', instance=instance)

Here, the code tries to insert a string instead of an integer for the attribute groundState_charge. However, this violates the frontend database schema. Because the module automatically validates the schema when and _id is present, this should result in an error something like this:

Failed validating 'type' in schema['properties']['mol_info']['properties']['groundState_charge']:
    {'description': 'Charge of the ground state molecule', 'type': 'number'}

On instance['mol_info']['groundState_charge']:
    'string'

Alternatively, this should not produce an error:

from d3tales_api.D3database.d3database import FrontDB

# Create instance with an erroneous field
instance = {"groundState_charge": 0}

# Insert basic molecule information into the Frontend database
db_insertion = FrontDB(_id="05XICU", schema_layer='mol_info', instance=instance)

Database Queries

One useful function for database queries is the FrondDB method check_if_in_db. This will check if the SMILES string associated with the FrondDB already exists in the database. If it exists, the method will return the molecule ID, if not it will return False.

from d3tales_api.D3database.d3database import FrontDB

smiles = "CC"
FrontDB(smiles=smiles).check_if_in_db()

This should return 06PCFL if the DB_INFO_FILE contains the database information for the D3TaLES database. If not, the result will vary depending on the database.

This API can also be used for more general MongoDB queries. The following example show how a user might query the backend computational database for calculations with the mol_id of 05XICU. The query can also be filtered to return only calculation types.

from d3tales_api.D3database.d3database import BackDB

# Query all computational entries with the mol_id 05XICU
BackDB(collection_name="computation").make_query({"mol_id": "05XICU"})

# Query all computational entries with the mol_id 05XICU and return only calculation_types
BackDB(collection_name="computation").make_query({"mol_id": "05XICU"}, {"calculation_type": 1})

D3TaLES REST API

The RESTAPI class may be used to Pythonically access the D3TaLES REST API. This is useful for searching and scraping the D3TaLES database on a larger scale.

Documentation for the REST API syntax and URL interaction with the D3TaLES REST API can be found here.

The following examples show how one might (1) get 200 of the SMILES from the D3TaLES database or (2) fnd the ground state charge for the molecule with ID 05XICU.

from d3tales_api.D3database.restapi import RESTAPI

# Get 200 of the SMILES from the D<sup>3</sup>TaLES
endpoint="restapi/molecules/{}/mol_info.smiles=1/limit=200/"
response_1 = RESTAPI(method='get', endpoint=endpoint, url="https://d3tales.as.uky.edu", return_json=True).response

# Fnd the ground state charge for the molecule with ID 05XICU
endpoint="restapi/molecules/_id=05XICU/mol_info.groundState_charge=1"
response_2 = RESTAPI(method='get', endpoint=endpoint, url="https://d3tales.as.uky.edu", return_json=True).response

Alternatively, you can use the D3talesData class, which produces a pandas DataFrame.

from d3tales_api.D3database.restapi import D3talesData

# Create data collector object 
data_collector = D3talesData(username='USERNAME', password='PASSWORD')
# Get 10 oxidation potentials 
data_collector.get_prop_data('mol_characterization.oxidation_potential.0.value', limit=10)

This class will also gather all oxidation_potential properties in the database and produce a one-dimensional histogram of the resulting values. Note USERNAME and PASSWORD should be the user’s username and password.

from d3tales_api.D3database.restapi import D3talesData

# Create data collector object 
data_collector = D3talesData(username='USERNAME', password='PASSWORD')
# Plot 1 dimensional histogram
data_collector.hist_1d('mol_characterization.oxidation_potential.0.value', min_cutoff=-10, max_cutoff=10)

Likewise, this example will gather all molecular_weight and associated globular_volume properties in the database and produce a two-dimensional histogram of the resulting values.

from d3tales_api.D3database.restapi import D3talesData

# Create data collector object 
data_collector = D3talesData(username='USERNAME', password='PASSWORD')
# Plot 1 dimensional histogram
data_collector.hist_2d("species_characterization.groundState.globular_volume.0.value", "mol_info.molecular_weight")

There is also a Colab notebook with examples of how to access the D3TaLES data with these moduels: Open In Colab

Note that the D3TaLES REST API is user restricted. This means that you must have a D3TaLES website account and user permission to access this tool. When using the D3TaLES API RESTAPI class, you must include your D3TaLES website username and password as keyword arguments. Alternatively, you may establish your username and password as the environment variables UPLOAD_USER UPLOAD_USER, respectively. The RESTAPI class will automatically check these environment variables if no keyword arguments are provided.

If you do not have a D3TaLES website account, you may create one here.

If you have an account but do not have permission to access the REST API, you may request permission here.