# Processors
This module contains data parsers for processing raw computational and experimental
data files. The D3TaLES parser converts the processed data to the D3TaLES schema.
The module also contains a submodule for converting data from the backend D3TaLES
database to frontend data and pushing the new data to the frontend database.
Full documentation can be found [here](d3tales_api.Processors.html).
### Current Parsers:
* Molecular DFT
  * Gaussian logfiles
  * Psi4 logfiles (in development)
* Cyclic Voltammetry
  * Output files from `chi_660d`, `chi_1100b`, and `pine_wavenow` instruments
* UV/Vis Spectroscopy (in development)
  * Excel files
* Literature Articles
  * Generate article metadata from DOI
## Processing Molecular DFT
For this example, we first need a log file to parse. Here, we pull a [Gaussian](https://gaussian.com/)
log file from the GitHub [D3TaLES API repo](https://github.com/D3TaLES/d3tales_api).
```python
import shutil
from urllib import request
# Pull the example file from GitHub
dft_url = "https://raw.githubusercontent.com/D3TaLES/d3tales_api/main/tests/raw_data/05XICU/logs/05XICU_opt_groundState_3H0.log"
with request.urlopen(dft_url) as response, open("gaussian_ex.log", 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
```
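Before parsing, it can help to confirm that the download actually produced a file. This quick check is only a convenience sketch and assumes the file was written to the working directory as `gaussian_ex.log`.
```python
import os

# Confirm the example log file exists and is non-empty before parsing
assert os.path.getsize("gaussian_ex.log") > 0, "Download of gaussian_ex.log appears to have failed"
```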
To parse the DFT file, simply import the `d3tales_parser` module and use the `ProcessDFT`
class to parse the data file (in this case, `gaussian_ex.log`). Here we choose the `ProcessGausLog`
parsing class because our data file is in Gaussian format.
```python
from d3tales_api.Processors.d3tales_parser import *
dft_data = ProcessDFT(filepath="gaussian_ex.log", parsing_class=ProcessGausLog)
print(dft_data.data_dict)
```
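The parsed result is exposed as a dictionary (`data_dict`) in the D3TaLES schema. For a more readable view, you can pretty-print it; this is a minimal sketch that assumes the dictionary may contain values that are not natively JSON-serializable (hence `default=str`).
```python
import json

# Pretty-print the parsed record; default=str guards against values
# (e.g., datetimes) that json cannot serialize directly
print(json.dumps(dft_data.data_dict, indent=2, default=str))
```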
A user can also include `submission_info` and `metadata` in the processing.
```python
from d3tales_api.Processors.d3tales_parser import *
submission_info = {
"source" : "Risko",
"author" : "d3tales@gmail.com",
"author_email" : "d3tales@gmail.com",
"upload_time" : "2021-10-01T21:07:29.546377+00:00",
"file_type" : "zip",
"data_category" : "computation",
"data_type" : "gaussian",
"all_files_in_zip" : [
"opt_groundState.log",
"opt_groundState.fchk"
],
"approved" : True
}
metadata = {
"id" : "opt_groundState",
"calculation_type" : "tddft_cation1"
}
dft_data = ProcessDFT(filepath="gaussian_ex.log", submission_info=submission_info, metadata=metadata, parsing_class=ProcessGausLog)
print(dft_data.data_dict)
```
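If the record will be uploaded later, one simple option is to write it to disk first. The sketch below is only illustrative; the output filename is arbitrary, and `default=str` is again used in case some values are not JSON-native.
```python
import json

# Save the processed record (including submission_info and metadata) for later upload
with open("gaussian_ex_processed.json", "w") as f:
    json.dump(dft_data.data_dict, f, indent=2, default=str)
```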
## Processing Cyclic Voltammetry
For this example, we first need a CV file to parse. Here, we pull a CV text file from the GitHub
[D3TaLES API repo](https://github.com/D3TaLES/d3tales_api).
```python
import shutil
from urllib import request
# Pull the example file from GitHub
cv_url = "https://raw.githubusercontent.com/D3TaLES/d3tales_api/main/tests/raw_data/cv_test.csv"
with request.urlopen(cv_url) as response, open("cv_test.txt", 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
```
To parse the CV file, simply import the `d3tales_parser` module and use the `ProcessCV`
class to parse the data file (in this case, `cv_test.txt`). Here we choose the `ParseChiCV`
parsing class to match the format of our data file.
```python
from d3tales_api.Processors.d3tales_parser import *
cv_data = ProcessCV(filepath="cv_test.txt", _id='test', parsing_class=ParseChiCV)
print(cv_data.data_dict)
```
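To get a quick sense of what the parser extracted, you can list the top-level fields of the output; a minimal sketch, assuming `data_dict` behaves like a standard Python dictionary.
```python
# Show which top-level fields the CV parser populated
print(sorted(cv_data.data_dict.keys()))
```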
A user can also include `submission_info` and `metadata` in the processing.
```python
from d3tales_api.Processors.d3tales_parser import *
submission_info = {
"source" : "Risko",
"author" : "d3tales@gmail.com",
"author_email" : "d3tales@gmail.com",
"upload_time" : "2022-04-14T22:11:23.490652+00:00",
"file_type" : "csv",
"data_category" : "experimentation",
"data_type" : "cv",
"approved" : False
}
metadata = {
"electrode_counter" : "standard_hydrogen_electrode",
"electrode_working" : "standard_hydrogen_electrode",
"electrode_reference" : "standard_hydrogen_electrode",
"solvent" : ["acetonitrile"],
"electrolyte" : ["tetrabutylammonium hexafluorophosphate"],
"ionic_liquid" : [],
"instrument" : "chi_660d,_chi_1100b,_pine_wavenow",
"working_electrode_surface_area" : "0.05 cm^2",
"temperature" : "273 K",
"redox_mol_concentration" : "0.1 M",
"data_type" : "cv"
}
cv_data = ProcessCV(filepath="cv_test.txt", _id='test', submission_info=submission_info, metadata=metadata, parsing_class=ParseChiCV)
print(cv_data.data_dict)
```
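The same call pattern extends to batches of files. The sketch below is hypothetical: the `cv_files/` directory and ID scheme are made up for illustration, but the `ProcessCV` arguments mirror the single-file example above.
```python
from pathlib import Path
from d3tales_api.Processors.d3tales_parser import *

# Hypothetical batch run: parse every CV text file in a directory,
# reusing the submission_info and metadata defined above
cv_records = []
for i, cv_file in enumerate(sorted(Path("cv_files/").glob("*.txt"))):
    cv_data = ProcessCV(filepath=str(cv_file), _id=f"cv_{i:03d}",
                        submission_info=submission_info, metadata=metadata,
                        parsing_class=ParseChiCV)
    cv_records.append(cv_data.data_dict)
print(f"Parsed {len(cv_records)} CV files")
```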
## Processing Literature Articles
Here we generate metadata for an article based on its DOI. To do this, we
simply import the `d3tales_parser` module and use the `ProcessNlp`
class to generate the article metadata.
```python
from d3tales_api.Processors.d3tales_parser import *
doi = "10.1039/c7cs00569e"
nlp_data = ProcessNlp(doi)
print(nlp_data.data_dict)
```
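Because `ProcessNlp` only needs a DOI, collecting metadata for several articles is a short loop; a minimal sketch using the DOI from the example above as a placeholder list.
```python
from d3tales_api.Processors.d3tales_parser import *

# Replace with your own list of DOIs
dois = ["10.1039/c7cs00569e"]
article_records = [ProcessNlp(doi).data_dict for doi in dois]
print(f"Generated metadata for {len(article_records)} articles")
```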
We can also instruct the parser to download the article PDF with the `article_download` kwarg and
store it in a directory specified with the `download_dir` kwarg. The parser will record the PDF
location in the output data.
```python
from d3tales_api.Processors.d3tales_parser import *
doi = "10.1039/c7cs00569e"
nlp_data = ProcessNlp(doi, article_download=True, download_dir="temp/")
print(nlp_data.data_dict)
```
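To confirm that the PDF actually landed in the requested directory, a quick check with `pathlib` (using the same `temp/` directory passed to `download_dir` above):
```python
from pathlib import Path

# List any PDFs the parser placed in the download directory
for pdf in Path("temp/").glob("*.pdf"):
    print(pdf)
```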