University College London
DATASET
20240806_cleandata.csv (91.88 kB)
DATASET
raw_namelist.xlsx (36.95 kB)
2 files

Jyutping Project - Raw Data and Clean Data

dataset
posted on 2024-08-19, 10:47, authored by Joseph Lam

Raw and clean data for the Jyutping project, submitted to the International Journal of Epidemiology.

All data were openly available at the time of scraping. I retained only the Chinese names and the Hong Kong Government romanised English names.

This project aims to describe the problem of non-standardised romanisation and its impact on data linkage. The included data allow researchers to replicate my process of extracting Jyutping and Pinyin from Chinese characters. A considerable amount of manual screening and reviewing was required, so the process was not fully automated. The code is stored on my personal GitHub: https://github.com/Jo-Lam/Jyutping_project/tree/main.
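As a rough illustration of what extracting Jyutping and Pinyin from Chinese characters involves, the sketch below maps each character of a name to a romanised syllable and sets aside any character it cannot resolve for manual review. It is not the project's code (see the GitHub repository for that). The function name `romanise` and the two tiny lookup tables are hypothetical, hand-written for this example, and not taken from the dataset.

```python
# Illustrative only: tiny hand-written lookups, NOT taken from the dataset.
# A real workflow would rely on a full character dictionary or a library.
JYUTPING = {"陳": "can4", "大": "daai6", "文": "man4"}
PINYIN = {"陳": "chen2", "大": "da4", "文": "wen2"}


def romanise(name, table):
    """Return (syllables, unknown_chars) for a Chinese name.

    Characters missing from the table are collected so they can be
    screened by hand rather than silently dropped.
    """
    syllables = []
    unknown = []
    for char in name:
        if char in table:
            syllables.append(table[char])
        else:
            unknown.append(char)
    return " ".join(syllables), unknown


if __name__ == "__main__":
    # The same written name yields different strings under each scheme,
    # which is why exact matching on romanised names can miss true links.
    print(romanise("陳大文", JYUTPING))
    print(romanise("陳大文", PINYIN))
```

Returning the unresolved characters separately, instead of raising an error, reflects the manual screening step described above: ambiguous or unlisted characters can be queued for a human reviewer.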

Please cite this data resource: doi:10.5522/04/26504347


Funding

Efficient and transparent methods for linking and analysing longitudinal population studies and administrative data

Wellcome Trust
