# D3database
The D3database module provides a Python interface for users to interact with the D3TaLES
database. D3TaLES uses a [MongoDB](https://www.mongodb.com/) NoSQL structure, so some base modules can be adapted to any MongoDB database. This module includes applications
for data insertion, schema validation, basic data generation, database queries, and
Python-based interaction with the [D3TaLES REST API](https://d3tales.as.uky.edu/docs/restapi.html).
Full documentation can be found [here](d3tales_api.D3database.html).
## Basic Data Generation
The `GenerateMolInfo` module can be used to generate several 2D molecular descriptors
from a SMILES string input. The descriptors can be accessed as attributes of the resulting
class or accessed as a dictionary with the `mol_info_dict` attribute. These descriptors include
`smiles`, `selfies`, `inchi`, `inchi_key`, `iupac_name`, `synonyms`, `init_structure` (coordinates
for estimated 3D geometry), `molecular_formula`, `groundState_charge`, `number_of_atoms`,
`molecular_weight`, `d2_image` (bit string for molecule image), `source_group`, and `groundState_spin`.
```python
from d3tales_api.D3database.info_from_smiles import GenerateMolInfo
# Generate basic molecule information using the GenerateMolInfo module
smiles = "CC"
instance = GenerateMolInfo(smiles, database="frontend")
print(instance.mol_info_dict)
```
## Data Insertion
The following example shows the insertion of generated molecule information (see above)
into the frontend database. Note that information will be inserted into the database
specified in the `DB_INFO_FILE`.
```python
from d3tales_api.D3database.d3database import FrontDB
from d3tales_api.D3database.info_from_smiles import GenerateMolInfo
# Generate basic molecule information using the GenerateMolInfo module
smiles = "CC"
instance = GenerateMolInfo(smiles, database="frontend").mol_info_dict
# Insert basic molecule information into the Frontend database
db_insertion = FrontDB(schema_layer='mol_info', instance=instance, smiles=smiles, group="Non-D3TaLES")
```
## Schema Validation
The D3database modules automatically validate instances with the [appropriate schema](https://github.com/D3TaLES/schema).
```python
from d3tales_api.D3database.d3database import FrontDB
# Create instance with an erroneous field
instance = {"groundState_charge": "string"}
# Insert basic molecule information into the Frontend database
db_insertion = FrontDB(_id="05XICU", schema_layer='mol_info', instance=instance)
```
Here, the code tries to insert a `string` instead of a `number` for the attribute `groundState_charge`.
This violates the frontend database schema. Because the module automatically validates
the instance against the schema when an `_id` is present, this should result in an error similar to the following:
```
Failed validating 'type' in schema['properties']['mol_info']['properties']['groundState_charge']:
{'description': 'Charge of the ground state molecule', 'type': 'number'}
On instance['mol_info']['groundState_charge']:
'string'
```
Alternatively, this should not produce an error:
```python
from d3tales_api.D3database.d3database import FrontDB
# Create instance with a valid field
instance = {"groundState_charge": 0}
# Insert basic molecule information into the Frontend database
db_insertion = FrontDB(_id="05XICU", schema_layer='mol_info', instance=instance)
```
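The two outcomes above come down to a type check on `groundState_charge`. The following is a minimal, standard-library sketch of that idea only; it is not how D3database validates instances, which checks them against the full D3TaLES schema. The names `SCHEMA_FRAGMENT`, `JSON_TYPES`, and `validate_mol_info` are hypothetical and exist only for this illustration.

```python
# Stand-in for one piece of the schema (assumption: the real schema, kept in
# the D3TaLES schema repository, is far richer than this).
SCHEMA_FRAGMENT = {"groundState_charge": {"type": "number"}}

# Map the schema type names used here to Python types.
JSON_TYPES = {"number": (int, float), "string": (str,)}


def validate_mol_info(instance, schema=SCHEMA_FRAGMENT):
    """Raise TypeError if a field's value does not match its declared type."""
    for field, rules in schema.items():
        if field not in instance:
            continue  # this sketch treats every field as optional
        value = instance[field]
        allowed = JSON_TYPES[rules["type"]]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, allowed):
            raise TypeError(f"{field}: {value!r} is not of type {rules['type']!r}")
    return True


# A numeric charge passes the check
print(validate_mol_info({"groundState_charge": 0}))

# A string charge is rejected
try:
    validate_mol_info({"groundState_charge": "string"})
except TypeError as err:
    print("rejected:", err)
```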
## Database Queries
One useful function for database queries is the `FrontDB` method `check_if_in_db`. This
will check if the SMILES string associated with the `FrontDB` instance already exists in the database.
If it exists, the method will return the molecule ID; if not, it will return `False`.
```python
from d3tales_api.D3database.d3database import FrontDB
smiles = "CC"
FrontDB(smiles=smiles).check_if_in_db()
```
This should return `06PCFL` if the `DB_INFO_FILE` contains the database information for the
D3TaLES database. If not, the result will vary depending on the database.
This API can also be used for more general MongoDB queries. The following example shows
how a user might query the backend computational database for calculations with the `mol_id`
of `05XICU`. The query can also be filtered to return only calculation types.
```python
from d3tales_api.D3database.d3database import BackDB
# Query all computational entries with the mol_id 05XICU
BackDB(collection_name="computation").make_query({"mol_id": "05XICU"})
# Query all computational entries with the mol_id 05XICU and return only calculation_types
BackDB(collection_name="computation").make_query({"mol_id": "05XICU"}, {"calculation_type": 1})
```
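For readers less familiar with MongoDB, the two arguments passed to `make_query` above read as a filter document and a projection document. The sketch below mimics that behavior on an in-memory list, assuming equality-only matching. The `find` helper and the sample entries (including the `calculation_type` values) are made up for illustration and are not part of D3database.

```python
def find(documents, query, projection=None):
    """Mimic an equality-only MongoDB find() over a list of dicts."""
    results = []
    for doc in documents:
        # Keep documents whose fields equal every value in the filter
        if all(doc.get(key) == value for key, value in query.items()):
            if projection is None:
                results.append(dict(doc))
            else:
                # MongoDB returns _id with the projected fields unless it is excluded
                keep = {"_id"} | {k for k, flag in projection.items() if flag}
                results.append({k: v for k, v in doc.items() if k in keep})
    return results


# Hypothetical computation entries (not real D3TaLES data)
entries = [
    {"_id": "a1", "mol_id": "05XICU", "calculation_type": "type_a", "data": {}},
    {"_id": "a2", "mol_id": "05XICU", "calculation_type": "type_b", "data": {}},
    {"_id": "b1", "mol_id": "06PCFL", "calculation_type": "type_a", "data": {}},
]

# Full documents for one molecule
print(find(entries, {"mol_id": "05XICU"}))
# Only the calculation types (plus _id) for the same molecule
print(find(entries, {"mol_id": "05XICU"}, {"calculation_type": 1}))
```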
## D3TaLES REST API
The `RESTAPI` class may be used to Pythonically access the D3TaLES REST API. This is
useful for searching and scraping the D3TaLES database on a larger scale.
Documentation for the REST API syntax and URL interaction with the D3TaLES REST API can be found
[here](https://d3tales.as.uky.edu/docs/restapi.html).
The following examples show how one might (1) get 200 of the SMILES from the D3TaLES
database or (2) find the ground state charge for the molecule with ID `05XICU`.
```python
from d3tales_api.D3database.restapi import RESTAPI
# Get 200 of the SMILES from the D3TaLES database
endpoint="restapi/molecules/{}/mol_info.smiles=1/limit=200/"
response_1 = RESTAPI(method='get', endpoint=endpoint, url="https://d3tales.as.uky.edu", return_json=True).response
# Find the ground state charge for the molecule with ID 05XICU
endpoint="restapi/molecules/_id=05XICU/mol_info.groundState_charge=1"
response_2 = RESTAPI(method='get', endpoint=endpoint, url="https://d3tales.as.uky.edu", return_json=True).response
```
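Both endpoint strings above follow the same pattern: a query segment, a projection segment, and an optional limit. When many such endpoints are needed, they could be assembled with a small helper like the one sketched here. The function `build_endpoint` is hypothetical (it is not part of `d3tales_api`), and it only reproduces the two patterns shown above; the REST API documentation remains the reference for the full syntax.

```python
def build_endpoint(query="{}", projection=None, limit=None):
    """Compose a molecules endpoint string from its slash-separated parts."""
    parts = ["restapi", "molecules", query]
    if projection:
        # "<field>=1" asks for that field to be returned
        parts.append(f"{projection}=1")
    endpoint = "/".join(parts)
    if limit is not None:
        endpoint += f"/limit={limit}/"
    return endpoint


# Same strings as in the two examples above
print(build_endpoint("{}", "mol_info.smiles", limit=200))
print(build_endpoint("_id=05XICU", "mol_info.groundState_charge"))
```

The resulting strings can be passed as the `endpoint` argument of `RESTAPI` in the same way as the hand-written ones.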
Alternatively, you can use the `D3talesData` class, which produces a pandas DataFrame.
```python
from d3tales_api.D3database.restapi import D3talesData
# Create data collector object
data_collector = D3talesData(username='USERNAME', password='PASSWORD')
# Get 10 oxidation potentials
data_collector.get_prop_data('mol_characterization.oxidation_potential.0.value', limit=10)
```
This class can also gather all `oxidation_potential` properties in the database and produce a one-dimensional
histogram of the resulting values. Note that `USERNAME` and `PASSWORD` are placeholders for the user's D3TaLES username and password.
```python
from d3tales_api.D3database.restapi import D3talesData
# Create data collector object
data_collector = D3talesData(username='USERNAME', password='PASSWORD')
# Plot 1 dimensional histogram
data_collector.hist_1d('mol_characterization.oxidation_potential.0.value', min_cutoff=-10, max_cutoff=10)
```
Likewise, this example will gather all `molecular_weight` and associated `globular_volume` properties in the
database and produce a two-dimensional histogram of the resulting values.
```python
from d3tales_api.D3database.restapi import D3talesData
# Create data collector object
data_collector = D3talesData(username='USERNAME', password='PASSWORD')
# Plot 2 dimensional histogram
data_collector.hist_2d("species_characterization.groundState.globular_volume.0.value", "mol_info.molecular_weight")
```
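The property strings used in the examples above, such as `mol_characterization.oxidation_potential.0.value`, are dot-separated paths into a nested record, where a numeric segment like `0` selects a list element. The sketch below shows how such a path can be read; `get_by_path` is a hypothetical helper, not the `D3talesData` implementation, and the record values are made up.

```python
def get_by_path(document, path):
    """Follow a dot-separated path through nested dicts and lists."""
    current = document
    for key in path.split("."):
        if isinstance(current, list):
            current = current[int(key)]  # numeric segment selects a list item
        else:
            current = current[key]
    return current


# Hypothetical record with made-up values
record = {
    "mol_info": {"molecular_weight": 30.07},
    "mol_characterization": {
        "oxidation_potential": [{"value": 0.5, "unit": "V"}],
    },
}

print(get_by_path(record, "mol_characterization.oxidation_potential.0.value"))
print(get_by_path(record, "mol_info.molecular_weight"))
```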
There is also a [Colab notebook](https://colab.research.google.com/drive/1oK1hOZs0rTpc_SoSFg54qQA5U4Qekqu8?usp=sharing)
with examples of how to access the D3TaLES data with these modules.
Note that the D3TaLES REST API is user restricted. This means that you must have a
D3TaLES website account and user permission to access this tool. When using the
D3TaLES API `RESTAPI` class, you must include your D3TaLES website
username and password as keyword arguments. Alternatively, you may establish your
username and password as the environment variables `UPLOAD_USER` and `UPLOAD_PASS`, respectively.
The `RESTAPI` class will automatically check these environment variables if no keyword
arguments are provided.
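The fallback order described above (keyword arguments first, then environment variables) can be sketched as follows. This is an illustration rather than the `RESTAPI` source: `resolve_credentials` is a hypothetical function, and the password variable name `UPLOAD_PASS` is an assumption that should be checked against the package before relying on it.

```python
import os


def resolve_credentials(username=None, password=None,
                        user_var="UPLOAD_USER", pass_var="UPLOAD_PASS"):
    """Prefer explicit arguments, then fall back to environment variables."""
    # pass_var default is an assumed name, see the note above
    username = username or os.environ.get(user_var)
    password = password or os.environ.get(pass_var)
    if not (username and password):
        raise ValueError("No credentials given and environment variables are not set")
    return username, password


# Placeholders only; replace with real credentials outside of source control
os.environ["UPLOAD_USER"] = "USERNAME"
os.environ["UPLOAD_PASS"] = "PASSWORD"

print(resolve_credentials())                    # taken from the environment
print(resolve_credentials("other", "secret"))   # keyword arguments take priority
```

Setting the variables in the shell or in a local, untracked configuration file keeps the password out of scripts and notebooks.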
If you do not have a D3TaLES website account, you may create one
[here](https://d3tales.as.uky.edu/register/).
If you have an account but do not have permission to access the REST API, you
may request permission [here](https://d3tales.as.uky.edu/request-permission/).