University College London
4 files

Giemsa Stained Thick Blood Films for Clinical Microscopy Malaria Diagnosis with Deep Neural Networks Dataset.

posted on 2020-05-01, 11:50, authored by Petru Manescu, Mike Shaw, Muna Elmi, Lydia Zajiczek, Remy Claveau, Vijay Pawar, Iason Kokkinos, Gbeminiyi Oyinloye, Christopher Bendkowski, Olajide A. Oladejo, Bolanle Oladejo, Tristan Clark, Denis Timm, John Shawe-Taylor, Mandayam Srinivasan, Ikeoluwa Lagunju, Olugbemiro Sodeinde, Biobele J Brown, Delmiro Fernandez-Reyes
By using this dataset you agree to cite:

Manescu, P., Shaw, M.J., Elmi, M., Neary‐Zajiczek, L., Claveau, R., Pawar, V., Kokkinos, I., Oyinloye, G., Bendkowski, C., Oladejo, O.A., Oladejo, B.F., Clark, T., Timm, D., Shawe‐Taylor, J., Srinivasan, M.A., Lagunju, I., Sodeinde, O., Brown, B.J. and Fernandez‐Reyes, D. (2020), Expert‐Level Automated Malaria Diagnosis on Routine Blood Films with Deep Neural Networks. Am J Hematol. Accepted Author Manuscript. doi:10.1002/ajh.25827

Dataset Licence: CC BY-NC-SA 4.0

Context: Thick blood film (TBF) microscopy remains the gold standard for diagnosing malaria in sub-Saharan Africa. It relies on a trained microscopist visually inspecting Giemsa-stained blood smears under a light microscope to identify and count P. falciparum parasites, a process that is time-consuming and prone to human error. A wrong malaria diagnosis can have negative consequences both for patients and for anti-malarial therapy resources, and over-treatment, in the long run, leads to parasite resistance.

Dataset image acquisition: Image fields from Giemsa-stained thick blood smears were captured using an upright brightfield microscope (Olympus BX63) fitted with a 100×/1.4 NA objective lens, a motorised stage (Prior Scientific) and a colour camera (Edge 5.5c, PCO). Each image field covers an area of 166 μm × 142 μm (2560 × 2160 pixels). At each position, a z-stack of 14 focal planes separated by 0.5 μm was captured with a camera exposure time of 50 ms. The z-stacks were projected onto a single plane using a wavelet-based extended depth of field (EDOF) algorithm.
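The notes above do not specify which wavelet-based EDOF algorithm was used. As a rough illustration of the general idea only, the sketch below fuses a single-channel z-stack with a one-level Haar transform: the low-frequency band is averaged across planes, and at each location the detail coefficients are taken from the plane with the largest summed detail magnitude, on the assumption that this is the plane in focus there. The function names (`haar_forward`, `haar_inverse`, `extended_depth_of_field`) are hypothetical, and a practical implementation would likely use several decomposition levels.

```python
import numpy as np


def haar_forward(img):
    """Single-level 2D Haar transform of an even-sized 2D array."""
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # approximation (low-pass) band
    lh = (a - b + c - d) / 2  # detail: left-right differences
    hl = (a + b - c - d) / 2  # detail: top-bottom differences
    hh = (a - b - c + d) / 2  # detail: diagonal differences
    return ll, lh, hl, hh


def haar_inverse(ll, lh, hl, hh):
    """Exact inverse of haar_forward."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]), dtype=ll.dtype)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


def extended_depth_of_field(stack):
    """Fuse a (planes, height, width) single-channel z-stack into one image.

    Detail coefficients come from whichever plane has the largest summed
    absolute detail at each location (assumed sharpest); the approximation
    band is averaged across planes.
    """
    stack = np.asarray(stack, dtype=np.float64)
    if stack.ndim != 3 or stack.shape[1] % 2 or stack.shape[2] % 2:
        raise ValueError("expected a (planes, even height, even width) array")
    coeffs = [haar_forward(plane) for plane in stack]
    approx = np.stack([c[0] for c in coeffs])              # (n, h/2, w/2)
    details = np.stack([np.stack(c[1:]) for c in coeffs])  # (n, 3, h/2, w/2)
    activity = np.abs(details).sum(axis=1)                 # (n, h/2, w/2)
    best = activity.argmax(axis=0)                         # sharpest plane per location
    chosen = np.take_along_axis(details, best[None, None], axis=0)[0]  # (3, h/2, w/2)
    return haar_inverse(approx.mean(axis=0), *chosen)
```

To apply it to a colour z-stack held as a (14, 2160, 2560, 3) array, one option is to fuse each channel separately, e.g. `np.stack([extended_depth_of_field(stack[..., ch]) for ch in range(3)], axis=-1)`. For scale, the stated field size corresponds to a pixel size of roughly 0.065 μm (166 μm / 2560 px ≈ 0.0648 μm; 142 μm / 2160 px ≈ 0.0657 μm).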

Contains object-level annotations. The uncompressed folder contains 13 subfolders [ooo], each containing a number of images named FieldPosXXX_EDOF_RGB.tiff and an annotation file named [ooo].json. A typical [ooo].json looks like this:
{
  "dataset_id": ooo,
  "rois": [
    {
      "image_id": to be ignored,
      "image_name": "FieldPosXXX_EDOF_RGB.tiff",
      "roi": [
        {
          "height": 2048.2000000000007,
          "shape": "rectangle",
          "shape_id": to be ignored,
          "type": CLASS TYPE,
          "user_id": to be ignored,
          "width": 1711.6000000000008,
          "x": 632.0117647058846,
          "y": 29.399999999999537
        }
      ]
    }
  ]
}
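A minimal sketch for reading one [ooo].json file, assuming the structure shown above, where each entry in "rois" describes one image and each item in its "roi" list is a rectangle whose ("x", "y") is the upper-left corner. It converts boxes to corner coordinates and counts boxes per class; "image_id", "shape_id" and "user_id" are ignored, as the notes advise. The helper names are hypothetical.

```python
import json
from collections import Counter


def load_annotation(json_path):
    """Parse one [ooo].json annotation file."""
    with open(json_path, encoding="utf-8") as fh:
        return json.load(fh)


def boxes_from_annotation(data):
    """Map each image name to a list of (class, x_min, y_min, x_max, y_max).

    Boxes are converted from (x, y, width, height), with (x, y) the
    upper-left corner, to corner coordinates in the same units.
    """
    boxes = {}
    for entry in data["rois"]:
        per_image = boxes.setdefault(entry["image_name"], [])
        for roi in entry["roi"]:
            if roi.get("shape") != "rectangle":
                continue  # skip any non-rectangular shapes, if present
            x, y = roi["x"], roi["y"]
            per_image.append((roi["type"], x, y, x + roi["width"], y + roi["height"]))
    return boxes


def class_counts(data):
    """Count annotated boxes per class ("type") in one annotation file."""
    return Counter(roi["type"] for entry in data["rois"] for roi in entry["roi"])
```

Coordinates are stored as floats, so they would need rounding before being used to crop pixel arrays. To process every subfolder, one could iterate over `pathlib.Path(root).glob("*/*.json")`, where `root` is the uncompressed folder.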
A separate field containing the annotations is created for each image. The annotations (bounding boxes) are defined by the upper-left corner (“x”, “y”) and the size of the bounding box (“width”, “height”). Each bounding box is assigned a class in the “type” field. The following classes are present in the dataset:

Contain sample-level labels. Each contains a number of subfolders [ddmmyy-nn], each holding 100 digitised TBF fields of view (FoV).
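The sketch below parses the [ddmmyy-nn] subfolder names and counts the images in a folder, which could be used to check that each sample holds 100 fields of view. It assumes two-digit years fall in the 2000s, that "nn" is a running sample number, and that images use a .tiff extension; none of these is stated in the dataset notes, and the helper names are hypothetical.

```python
import re
from datetime import date
from pathlib import Path

# Hypothetical pattern for the "ddmmyy-nn" folder names described above.
FOLDER_NAME = re.compile(r"^(\d{2})(\d{2})(\d{2})-(\d+)$")


def parse_sample_folder(name):
    """Split a ddmmyy-nn folder name into (date, sample number).

    Assumes two-digit years belong to the 2000s (an assumption).
    """
    match = FOLDER_NAME.match(name)
    if match is None:
        raise ValueError(f"not a ddmmyy-nn folder name: {name!r}")
    dd, mm, yy, nn = (int(g) for g in match.groups())
    return date(2000 + yy, mm, dd), nn


def count_fields(folder, pattern="*.tiff"):
    """Count image files in one sample folder (the extension is an assumption)."""
    return sum(1 for p in Path(folder).glob(pattern) if p.is_file())
```

For a complete sample folder, `count_fields(folder)` would be expected to return 100 if the images are indeed stored as .tiff files.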

FILE: tbf_samples_parasite_count.csv
Contains the manual malaria parasite and white blood cell (WBC) counts (performed in the field) for each sample corresponding to the images in the subfolders mentioned above. A parasite count of 0 means the sample was diagnosed as malaria-negative.
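Thick-film parasite counts made against WBCs are commonly converted to a parasite density per microlitre by assuming a fixed WBC concentration; 8000 WBC/μL is a widely used default (for example in WHO malaria microscopy guidance), but the dataset notes do not say whether this conversion applies here. The sketch below reads the CSV and flags negative samples (count of 0). The column names `sample`, `parasites` and `wbc` are hypothetical placeholders to be replaced with the real headers of tbf_samples_parasite_count.csv.

```python
import csv

# Assumed WBC concentration used by a common convention; not stated in this dataset.
ASSUMED_WBC_PER_UL = 8000


def parasite_density_per_ul(parasites, wbcs, wbc_per_ul=ASSUMED_WBC_PER_UL):
    """Estimate parasites per microlitre from a count made against WBCs."""
    if wbcs <= 0:
        raise ValueError("white blood cell count must be positive")
    return parasites / wbcs * wbc_per_ul


def read_counts(lines, sample_col="sample", parasite_col="parasites", wbc_col="wbc"):
    """Read per-sample counts from an open file or any iterable of CSV lines.

    The default column names are hypothetical placeholders.
    """
    rows = []
    for row in csv.DictReader(lines):
        parasites = float(row[parasite_col])
        wbcs = float(row[wbc_col])
        rows.append({
            "sample": row[sample_col],
            "parasites": parasites,
            "wbc": wbcs,
            "positive": parasites > 0,  # a count of 0 means malaria-negative
            "density_per_ul": parasite_density_per_ul(parasites, wbcs) if wbcs > 0 else None,
        })
    return rows
```

When reading the actual file, opening it with `open(path, newline="", encoding="utf-8")` follows the csv module's recommendation for file handles.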


This work was supported by the College of Medicine of the University of Ibadan, Ibadan, Nigeria; the UK Medical Research Council (MC_U117585869); the Department of Computer Science, Faculty of Engineering Sciences, University College London, United Kingdom; and the UK Engineering and Physical Sciences Research Council (EP/P028608/1).

