Malaria parasite DNA barcode geographic classification

Section 1: Use Case Identifiers

Use Case ID: HHS-CDC-00023

Agency: HHS

Op Div/Staff Div: CDC

Use Case Topic Area: Health & Medical

Is the AI use case found in the below list of general commercial AI products and services?

None of the above.

Describe the AI system’s outputs.

The algorithm examines a sequence barcode (genotype) and assigns the malaria parasite genotype to a geographic origin, which may be a continent or another regional category such as a subregion. In this use case the barcode is a set of genetic loci from malaria parasites and the categories are geographic, but the algorithm could be trained to assign genetic barcodes from other pathogens to any number of other categories, provided an appropriate training dataset exists.
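The barcode-to-region assignment described above can be illustrated with a minimal sketch. This record does not name the algorithm actually used, so a categorical naive Bayes classifier is swapped in here purely as a stand-in. The function names, the "X" missing-call symbol, the region labels, and the toy barcodes are all hypothetical and are not drawn from CDC data.

```python
import math
from collections import Counter, defaultdict

MISSING = "X"  # hypothetical symbol for a locus with no allele call


def train(barcodes, regions):
    """Count alleles per locus within each region (naive Bayes stand-in)."""
    region_counts = Counter(regions)
    # allele_counts[region][locus_index][allele] -> number of training samples
    allele_counts = defaultdict(lambda: defaultdict(Counter))
    alleles = set()
    for barcode, region in zip(barcodes, regions):
        for i, allele in enumerate(barcode):
            if allele != MISSING:
                allele_counts[region][i][allele] += 1
                alleles.add(allele)
    return {
        "region_counts": region_counts,
        "allele_counts": allele_counts,
        "n_alleles": len(alleles),
    }


def predict(model, barcode):
    """Return the highest-scoring region and the log-score for every region."""
    total = sum(model["region_counts"].values())
    scores = {}
    for region, n in model["region_counts"].items():
        log_p = math.log(n / total)  # prior from training-set proportions
        for i, allele in enumerate(barcode):
            if allele == MISSING:
                continue  # uncalled loci contribute no evidence
            counts = model["allele_counts"][region][i]
            called = sum(counts.values())
            # Laplace smoothing so unseen alleles do not zero out a region
            log_p += math.log((counts[allele] + 1) / (called + model["n_alleles"]))
        scores[region] = log_p
    return max(scores, key=scores.get), scores


# Toy, made-up training set: four-locus barcodes labeled with placeholder regions.
toy_barcodes = ["ACGT", "ACGA", "ACTT", "TGCA", "TGCC", "TGAA"]
toy_regions = ["Region A"] * 3 + ["Region B"] * 3
model = train(toy_barcodes, toy_regions)

best_region, region_scores = predict(model, "TGXA")
print(best_region)
```

Because the categories are just labels attached to the training barcodes, the same sketch would apply to other pathogens or non-geographic categories by swapping the training set, which mirrors the generality noted above.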

Stage of Development: Implementation and Assessment

Is the AI use case rights-impacting, safety-impacting, both, or neither?

Neither

Section 2: Use Case Summary

Date Initiated: 07/2023

Date when Acquisition and/or Development began: 07/2023

Date Implemented: N/A

Date Retired: N/A

Was the AI system involved in this use case developed (or is it to be developed) under contract(s) or in-house?

Developed in-house.

Provide the Procurement Instrument Identifier(s) (PIID) of the contract(s) used.

N/A

Is this AI use case supporting a High-Impact Service Provider (HISP) public-facing service?

N/A

Does this AI use case disseminate information to the public?

No

How is the agency ensuring compliance with Information Quality Act guidelines, if applicable?

N/A

Does this AI use case involve personally identifiable information (PII) that is maintained by the agency?

No

Has the Senior Agency Official for Privacy (SAOP) assessed the privacy risks associated with this AI use case?

Ongoing

Section 3: Data and Code

Do you have access to an enterprise data catalog or agency-wide data repository that enables you to identify whether or not the necessary datasets exist and are ready to develop your use case?

No

Describe any agency-owned data used to train, fine-tune, and/or evaluate performance of the model(s) used in this use case.

The data are a mixture of data generated at CDC and publicly available data. All CDC data are accessible via the following links: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA428490/, https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1092573/, https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA1110244. Travel histories from case patients were used to assess model performance by comparison with the model's output; they are available in the following manuscript: https://journals.asm.org/doi/full/10.1128/aac.01203-24. Non-CDC-owned data are available through this link: https://apps.malariagen.net/apps/pf7/
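The comparison of model output against patient travel histories can be sketched as a simple concordance calculation. This is an illustrative assumption about how such an assessment might be scored, not the method reported in the manuscript. The function name and the use of None for a missing travel history are hypothetical.

```python
def concordance(predicted_regions, travel_regions):
    """Fraction of cases where the predicted region matches the travel history.

    Cases with no recorded travel history (None) are excluded from the denominator.
    """
    pairs = [
        (pred, travel)
        for pred, travel in zip(predicted_regions, travel_regions)
        if travel is not None
    ]
    if not pairs:
        raise ValueError("no cases with a recorded travel history")
    matches = sum(pred == travel for pred, travel in pairs)
    return matches / len(pairs)


# Placeholder labels only; the last case has no travel history and is skipped.
rate = concordance(
    ["Region A", "Region B", "Region A", "Region B"],
    ["Region A", "Region B", "Region B", None],
)
print(f"{rate:.2f}")
```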

Is there available documentation for the model training and evaluation data that demonstrates the degree to which it is appropriate to be used in analysis or for making predictions?

Documentation is widely available

Which, if any, demographic variables does the AI use case explicitly use as model features?

N/A

Does this project include custom-developed code?

Yes

Does the agency have access to the code associated with the AI use case?

If the code is open-source, provide the link for the publicly available source code.

N/A

Section 4: AI Enablement and Infrastructure

Does this AI use case have an associated Authority to Operate (ATO) for an AI system?

Yes

System Name: Advanced Molecular Detection Scientific Computing Platform

How long have you waited for the necessary developer tools to implement the AI use case?

6-12 months

For this AI use case, is the required IT infrastructure provisioned via a centralized intake form or process inside the agency?

Yes

Do you have a process in place to request access to computing resources for model training and development of the AI involved in this use case?

Yes

Has communication regarding the provisioning of your requested resources been timely?

Yes

How are existing data science tools, libraries, data products, and internally-developed AI infrastructure being re-used for the current AI use case?

Use of existing data platforms

Has information regarding the AI use case, including performance metrics and intended use of the model, been made available for review and feedback within the agency?

Documentation has been published