Annotation Guidelines

This repository contains scripts currently being used to develop the new OTAR3088 named entity recognition (NER) pipeline.


Entity Type Definitions for NLP Recognition

“tissue”

Definition: Supra-cellular anatomical entities (types of parts of the body and body substances, anatomical boundaries and spatial regions) from organisms within the Metazoa taxon (NCBITaxon:33208).

Anatomical entity references


“cell type”

Definition: In vivo cell types that constitute the anatomical entities of Metazoan organisms (NCBITaxon:33208) and can be classified as a subclass of CL:0000000 cell.

Entities in this category encompass any classification category of the physical entities that biologists call cells. Cells can be categorised by size, shape, histological staining, tissue origin, developmental stage, etc. Cell states, in the sense of cells at a particular stage of a biological process (including the expression of a specific set of genes, i.e. a transcriptional state), can also be used to categorise cells, so they are regarded as subcategories of cell types for entity recognition purposes. When labelling entities, include in the text span any adjectives and other descriptors that form part of a descriptive phrase identifying a set of cells and/or distinguishing it from other similar cells.
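A minimal sketch of the span rule above, assuming annotations are stored as character offsets. The `Span` class and `make_span` helper are hypothetical and not part of this repository's scripts, and the sentence is invented for illustration. The descriptors "CD4+" and "memory" stay inside the cell type span rather than only the head noun "T cells" being labelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str


def make_span(text: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: label the first occurrence of `phrase` in `text`."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return Span(start, start + len(phrase), label)


# Invented example sentence (not taken from any annotated corpus).
text = "We isolated CD4+ memory T cells from peripheral blood."

# The cell type span covers the whole descriptive phrase, not just "T cells".
cell = make_span(text, "CD4+ memory T cells", "cell type")
# Blood is a body substance, so it falls under "tissue".
tissue = make_span(text, "peripheral blood", "tissue")

for span in (cell, tissue):
    print(span.label, "->", text[span.start:span.end])
```

Storing offsets rather than copied strings keeps each label tied to one exact position in the source text, which matters when the same phrase occurs more than once.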

Cell type references


“cell line”

Definition: Immortalised cultured cell lines (CLO:0000019) derived from Metazoan organisms (NCBITaxon:33208). These are cultured cells (CL:0000010) that are also immortal cell line cells. When labelling entities, include in the text span any adjectives and other descriptors that form part of a descriptive phrase identifying a set of cell lines and/or distinguishing it from other similar cell lines.

Examples of identified text spans of this type include: “MCF-7”, “HeLa”, “HepG2”
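If span annotations such as the cell line examples above are used to train a token-level tagger, they are commonly converted to BIO tags. The sketch below assumes whitespace tokenization and an uppercase tag naming scheme (`CELL_LINE`, `TISSUE`); both choices, the `bio_tags` helper, and the example sentence are hypothetical and are not taken from this pipeline.

```python
import re


def bio_tags(text, spans):
    """Convert character-offset spans to (token, BIO tag) pairs.

    spans: list of (start, end, label) tuples, end exclusive.
    Tokens are runs of non-whitespace characters. A token is tagged only
    if it lies entirely inside a span; partially covered tokens stay "O".
    """
    pairs = []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for start, end, label in spans:
            if start <= tok_start and tok_end <= end:
                # Hypothetical naming scheme: "cell line" -> "CELL_LINE".
                name = label.upper().replace(" ", "_")
                prefix = "B" if tok_start == start else "I"
                tag = f"{prefix}-{name}"
                break
        pairs.append((match.group(), tag))
    return pairs


# Invented sentence using two of the cell lines listed above.
text = "HeLa and MCF-7 were compared with breast tissue samples"
phrases = [("HeLa", "cell line"), ("MCF-7", "cell line"), ("breast tissue", "tissue")]
# str.find returns the first occurrence only, which suffices for this sentence.
spans = [(text.find(p), text.find(p) + len(p), label) for p, label in phrases]

result = bio_tags(text, spans)
for token, tag in result:
    print(f"{token}\t{tag}")
```

The containment check is deliberately simple: if punctuation is attached to an entity (e.g. "HepG2,"), the token is not fully inside the span and is tagged "O", so a tokenizer that splits punctuation would be needed in practice.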

Cell line references


Vague or ambiguous entities

Terms beyond those defined above that are deemed too vague to be classified as a true “tissue” or “cell type” entity. Recording these cases helps us understand our annotation logic and allows us to produce more ML-usable data.