The Public Health Bioinformatics Group at BCCDC is an interdisciplinary group of researchers with backgrounds in microbiology, molecular biology, computer science, cognitive science, and genomics. We are interested in solving practical public health problems through research and in transforming public health practice using innovative technologies. Our research combines knowledge engineering techniques (e.g. ontology modeling, data curation, semantic web) and bioinformatics tools (e.g. genomic sequence analysis, phylogenetics, comparative genomics, text mining, workflow and platform development) to improve data sharing and integrated analysis in public health. We study microbes in isolation (genomics) and in communities (metagenomics) using high-throughput sequencing and other *omics techniques.

This leads to better disease diagnostics and intervention, so public health practitioners can respond to outbreaks more rapidly and effectively. Recognizing the One Health nature of many infectious diseases, we collaborate extensively with researchers and practitioners from a wide range of disciplines in the quantitative, social and biological sciences. Our research is funded through competitive grants and contracts from Genome Canada, Genome BC, CIHR, US Department of Agriculture, Wellcome Trust, and CANARIE. Overall, our research program aims to improve understanding of the pathogens that make us sick and the microbiota that keep us healthy.


Genomic Epidemiology Entity Mart

The Genomic Epidemiology Entity Mart (GEEM) aims to give system implementers faster access to the vocabulary standards provided by ontology communities like – without requiring users to have training in ontology development. More information regarding GenEpiO can be found at


Researchers often require open-source software tools for cleaning up and harmonizing free-text specimen descriptions. However, few options are currently available, because most publicly available text-mining systems focus on longer, grammatically well-formed texts. Hsiao’s group has developed a text-mining system, “LexMapr”, that mines short free-text specimen descriptions and maps the detected entities to terms from selected domain ontologies that provide the breadth of terms sought for sample description. LexMapr is an open-source tool and the source code has been made available at and we intend to provide an API platform for LexMapr in the near future.
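To illustrate the general idea of mapping short specimen descriptions to ontology terms, here is a minimal Python sketch. It is not LexMapr's actual implementation: the lexicon, the `EX:` identifiers, and the function names `normalize` and `map_description` are all hypothetical, and the matching is a simple longest-phrase dictionary lookup after basic text normalization.

```python
import re

# Hypothetical mini-lexicon: normalized label -> placeholder term ID.
# The "EX:" identifiers are invented for illustration, not real ontology IDs.
LEXICON = {
    "chicken breast": "EX:0001",
    "chicken": "EX:0002",
    "lettuce": "EX:0003",
    "ground beef": "EX:0004",
}


def normalize(text):
    """Lowercase, strip punctuation, and naively singularize tokens."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = text.split()
    # Very naive singularization: drop a trailing "s" on longer tokens.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


def map_description(description, lexicon=LEXICON, max_len=3):
    """Greedily match the longest phrase (up to max_len tokens) at each position."""
    tokens = normalize(description)
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                matches.append((phrase, lexicon[phrase]))
                i += n
                break
        else:
            # No phrase starting here is in the lexicon; skip this token.
            i += 1
    return matches


print(map_description("Chicken Breasts, frozen"))
print(map_description("Romaine lettuce; ground beef"))
```

Preferring the longest phrase means "chicken breast" is reported as one entity rather than as "chicken" alone. A real system would need far richer normalization (spelling variants, abbreviations, synonyms) than this sketch shows.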

SeqUDAS: Sequence Upload and Data Archiving System

Modern DNA sequencing machines generate several gigabytes (GB) of data per run. Organizing and archiving this data presents a challenge for small labs. Hsiao’s group has created a Sequence Upload and Data Archiving System (SeqUDAS) that aims to ease the task of maintaining a sequence data repository through process automation and an intuitive web interface.


MentaLiST is a Multi-Locus Sequence Typing (MLST) caller, based on a k-mer counting algorithm and written in the Julia language, specifically designed and implemented to handle large typing schemes. Tests on real and simulated data show that MentaLiST is faster than any other available MLST caller while providing the same or better accuracy, and is capable of dealing with MLST schemes with up to thousands of genes while requiring limited computational resources. MentaLiST includes built-in commands for downloading MLST schemes from remote servers such as and, and has been integrated into the web-based Galaxy workflow system and IRIDA.
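As a rough illustration of what k-mer counting means for allele calling, the following Python sketch indexes the k-mers of each known allele at one locus and lets the k-mers found in sequencing reads vote for the alleles that contain them. This is a deliberately simplified sketch, not MentaLiST's algorithm or code (MentaLiST is written in Julia); the function names and the toy sequences are assumptions made for the example.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return all overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(alleles, k):
    """Map each k-mer to the set of allele IDs (for one locus) that contain it."""
    index = defaultdict(set)
    for allele_id, seq in alleles.items():
        for km in set(kmers(seq, k)):
            index[km].add(allele_id)
    return index


def call_allele(reads, index, k):
    """Count one vote per read k-mer for every allele containing that k-mer."""
    votes = Counter()
    for read in reads:
        for km in kmers(read, k):
            for allele_id in index.get(km, ()):
                votes[allele_id] += 1
    if not votes:
        return None  # no read k-mer matched any known allele
    return votes.most_common(1)[0][0]


# Toy data: two alleles differing by a single base.
alleles = {"a1": "ACGTACGTAA", "a2": "ACGTTCGTAA"}
index = build_index(alleles, k=4)
print(call_allele(["GTACGT", "ACGTAA"], index, k=4))
```

Because reads are never aligned, the cost is dominated by dictionary lookups, which is one reason k-mer approaches can scale to schemes with many loci. This sketch ignores sequencing errors, reverse complements, and tie-breaking, all of which a production caller has to handle.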


GenEpiO: The Genomic Epidemiology Ontology

To better harmonize and integrate genomics data into food microbiology and public health workflows, we have developed the Genomic Epidemiology Ontology (GenEpiO), which aims to provide a single, open-source, globally accessible set of terms to use in databases and software user interfaces. GenEpiO is being tested for use in a number of different platforms and initiatives, such as Canada’s Integrated Rapid Infectious Disease Analysis (IRIDA) platform, the US FDA’s GenomeTrakr Foodborne Pathogen Surveillance Network, University of Warwick’s Enterobase sequence typing platform, and a new International Organization for Standardization (ISO) standard for the implementation of WGS for food microbiology. The use of GenEpiO has also been included as part of best practices for the application of genomic data supporting regulatory food safety.

Our current projects focus on expanding GenEpiO vocabulary with regard to animal health and agriculture in order to better fit with a One Health approach to surveillance, analyses and investigations. In addition to foodborne pathogens, GenEpiO continues to expand its scope, incorporating vocabulary for other pathogens such as influenza and tuberculosis. The Hsiao lab is also working to create ontology-driven tools and metadata specifications (fields of information) to help harmonize, exchange and integrate data across sectors. More information regarding GenEpiO can be found at (

FoodOn Food Ontology

FoodOn is a consortium-driven project to build a comprehensive and easily accessible global farm-to-fork ontology about food which accurately and consistently describes foods commonly known in cultures from around the world. Through our involvement with the project, the Hsiao lab found that a food vocabulary resource incompatibility problem existed between agencies seeking to share genomic and contextual information about foodborne pathogen biosamples, ranging from Canada’s federal departments and provincial health authorities to international partners such as the US FDA, European EFSA, and WHO FAO. Our current work focuses on developing FoodOn to handle the vast majority of food sample descriptions, and on creating the cross-referencing necessary for food sample data translation between European and North American health authorities. Partners like and agencies like the USDA ARS are exploring the use of FoodOn within their research projects and database systems, and we see an opportunity to serve commercial platforms with FoodOn’s “lingua franca” of food products. More information regarding GenEpiO can be found at (

Biobanking Ontology

A biobank contains a collection of biological samples, along with associated medical information of sample donors, which can be used for different types of studies. Given the wealth of information that can be derived from stored information and biological materials, there is a pressing need for structuring biobank data for more computer-amenable analyses. The utility of first-generation biobanks was originally evaluated simply based on the number of samples that they contained. Currently, the value of biobank data lies in how it can be linked with other molecular and clinical data (“-omics data”) to provide new insights into health and disease. Linking data has thus far, however, proven challenging due to unstructured and incompatible data types. Here, we describe the development of a Next-Generation biobanking ontology (NGBO) ( that is capable of both supporting biospecimen processing, management, storage and retrieval infrastructure, and acting as a knowledge hub for an integrated clinical and translational research ecosystem integrating –omics data. NGBO harmonizes the instrumentation and procedures used to prepare and process specimens, and also covers terminology used to describe computational biology algorithms, analytical tools, electronic-communication protocols, and in vitro assays. Laboratories, investigators, and other biobanks would also benefit from the knowledge contained in the ontology by using NGBO as a biobank data catalogue to which any existing unstructured data can be mapped.


Avian Influenza

Avian Influenza (AI) is a viral disease that can cause significant morbidity and mortality in domestic poultry. Wild waterfowl are the reservoirs for AI and the focus of AI surveillance programmes around the world. This project aims to refine the AI sediment surveillance technology and methodology, to validate the sediment surveillance approach in the field, and to identify the optimal combination of AI surveillance techniques for maximum efficiency and efficacy. The information generated in this project will be used to develop and implement a new Provincial Waterfowl AI Surveillance Program, with genomic analysis of wet sediments as the cornerstone of this program.