Educational Workshop


Big Data Phenotyping: Opportunities, Analytic Challenges, and Solutions

The IGES 2016 educational workshop is a joint initiative between IGES, the Dalla Lana School of Public Health, and the Institute for Clinical Evaluative Sciences to offer a stimulating and timely full-day event on big data phenotyping.

Date: October 24, 2016
Time: 9 am to 5 pm, followed by a reception at 7 pm
Location: Sheraton Centre Toronto, Canada

Registration fee: $50 USD (includes breakfast, lunch, break and mixer reception)

Program: draft-program

A ‘must-attend’ event if you want to learn about:

  • the wide variety of routinely collected data that can be used to derive health and clinical information;
  • the basics of computational linguistics and how these techniques can be applied to bodies of text;
  • the fundamentals of using machine learning to derive phenotypes from large complex health data sources;
  • deriving phenotypic information from unstructured Electronic Medical and Health Records (EMR/EHRs);
  • finding additional cases that match phenotypic profiles of rare and unnamed diseases;
  • existing software and platforms to get the work done!

Sign up now!

Learning objectives: Participants will learn about the fundamentals of novel computational and statistical methodologies, and key tools and resources, to best define phenotypes to answer important health research questions.

Format: Two internationally recognized keynote speakers will set the stage by presenting their views on how statistical and computational technologies can help define and derive phenotypes from big health data to contribute to breakthroughs in precision medicine and improve population health. Building on the local strength in computational and statistical sciences for health research, instructors from the University of Toronto community will give a series of short presentations on the concepts and fundamental principles underlying these methods and tools.

Program details and a complete list of speakers are coming soon! Please check the website regularly for updates!

Speaker Biographies

Robert Tibshirani is a Professor in the Departments of Statistics and Health Research and Policy at Stanford University. He received a B.Math. from the University of Waterloo, an M.Sc. from the University of Toronto and a Ph.D. from Stanford University. He was a Professor at the University of Toronto from 1985 to 1998. Professor Tibshirani is a Fellow of the American Statistical Association, the Institute of Mathematical Statistics and the Royal Society of Canada. He won the prestigious COPSS Presidents’ award in 1996, the NSERC Steacie award in 1997, the CRM-SSC Prize in Statistics in 2000, and the University of Waterloo distinguished alumni achievement award in 2006. He was elected to the U.S. National Academy of Sciences in 2012. In his work he has made important contributions to the analysis of complex datasets, most recently in genomics and proteomics. Some of his best-known contributions are the lasso, which uses L1 penalization in regression and related problems, generalized additive models and Significance Analysis of Microarrays (SAM). He also co-authored five widely used books: “Generalized Additive Models”, “An Introduction to the Bootstrap”, “The Elements of Statistical Learning”, “Introduction to Statistical Learning”, and “Sparsity in Statistics”. Professor Tibshirani also co-authored the first study that linked cell phone usage with car accidents, a widely cited article that has played a role in the introduction of legislation that restricts the use of phones while driving.

Quaid Morris is a professor in the Donnelly Centre at the University of Toronto in Canada. He is a multi-disciplinary researcher with cross-appointments in the Departments of Computer Science, Engineering, and Molecular Genetics. He founded his lab in 2005, after having received his PhD from the Massachusetts Institute of Technology (MIT) in 2003. His doctoral training was in machine learning and computational neuroscience under the supervision of Peter Dayan at MIT and the Gatsby Unit at University College London. His undergraduate training was in computer science and biology at the University of Toronto. His lab uses statistical learning to make biological discoveries and develop new methodology for analyzing large-scale biomedical datasets. His lab is currently interested in cancer genomics, post-transcriptional regulation, text mining of medical records and the automated prediction of gene function.


Dr. Michael Brudno is the Director of the Centre for Computational Medicine at SickKids, and an Associate Professor in the Department of Computer Science at the University of Toronto. After receiving his Bachelor’s Degree in Computer Science and History from the University of California, Berkeley, he completed his PhD in the Computer Science Department of Stanford University, where he worked on algorithms for whole genome alignments. He went on to complete his postdoctoral fellowship at UC Berkeley and was a Visiting Scientist at MIT before starting his position here in Toronto. Dr. Brudno’s main research interest is the development of computational methods for the analysis of clinical and genomic datasets, and the analysis of High Throughput Sequencing data, including methods for the discovery of structural and copy-number polymorphisms and other genomic variation, identification of functional variants, and visualization of genomic data. In addition, he works on comparative genomics, molecular evolution and cloud computing. PhenoTips, PhenomeCentral, Medsavant and Savant are among the genomic and clinically-relevant tools developed by his team.


Dr. Astrid Guttmann is Chief Science Officer at the Institute for Clinical Evaluative Sciences. She is a staff physician in the Division of Paediatric Medicine at the Hospital for Sick Children and an associate professor of Paediatrics with a cross appointment in the Institute for Health Policy, Management and Evaluation at the University of Toronto. She received her undergraduate degree at Harvard University and a second BA at Oxford University, where she was a Rhodes Scholar. She holds a CIHR Applied Chair in Reproductive, Child and Youth Health Services Research. She sits on the Advisory Board for CIHR’s Institute of Human Development, Child and Youth Health. Her key research interests include using large data sets to investigate access to and quality of care for children, performance measurement for child health services, and system integration for children with chronic medical and mental health conditions. At ICES she has led a program to create comprehensive data sets to enable study of child development using administrative data.


Professor Graeme Hirst is a computational scientist at the University of Toronto. His research covers a range of topics in applied computational linguistics and natural language processing, including lexical semantics, the resolution of ambiguity in text, the analysis of authors’ styles in literature and other text (including plagiarism detection and the detection of online sexual predators), and the automatic analysis of arguments and discourse (especially in political and parliamentary texts). Hirst’s present research includes detecting markers of Alzheimer’s disease in language; determining ideology in political texts; and the identification of the native language of a second-language writer of English. With colleagues in Canada, the U.K. and The Netherlands, he is a co-PI of a Digging Into Data grant on processing linked parliamentary data.

Hirst is the editor of the Synthesis series of books on Human Language Technologies. He is the author of two monographs: Anaphora in Natural Language Understanding and Semantic Interpretation and the Resolution of Ambiguity. He is the recipient of two awards for excellence in teaching. He has supervised more than 50 theses and dissertations, four of which have been published as books. He was elected Chair of the North American Chapter of the Association for Computational Linguistics for 2004–05 and Treasurer of the Association for 2008–17.

Dr. Karen Tu is a Senior Scientist at the Institute for Clinical Evaluative Sciences, a Professor in the Department of Family and Community Medicine at the University of Toronto, with a cross appointment in the Institute of Health Policy, Management and Evaluation, and a staff family physician at the University Health Network-Toronto Western Hospital Family Health Team. She received her MD from the University of Toronto and her MSc in Health Policy, Planning and Financing in a joint degree from the London School of Economics and the London School of Tropical Medicine. She is the founder of the Electronic Medical Record Administrative data Linked Database (EMRALD) and has unique insights into the analysis of electronic medical record (EMR) data. She has extensive experience and expertise in the use of primary care EMR data and administrative data, and in the validation of administrative data and EMR data algorithms for the identification of common chronic diseases. She is one of the leading Canadian primary care researchers in the areas of hypertension and primary care electronic medical records, and is also experienced in the feedback of information to family physicians and the measurement of primary care physician adherence to guidelines.

Frank Rudzicz is a scientist at the Toronto Rehabilitation Institute (University Health Network), an assistant professor of Computer Science at the University of Toronto, co-founder and President of WinterLight Labs Inc. and President of the international joint ACL/ISCA special interest group on Speech and Language Processing for Assistive Technologies. He is the recent recipient of the Young Investigator award from the Alzheimer’s Society of Canada and the Early Researcher award from the Government of Ontario. His work involves machine-learning, human-computer interaction, speech-language pathology, rehabilitation engineering, signal processing, and linguistics. Significant contributions include: i) the TORGO database of disordered speech, ii) the first speech recognition system for people with speech disorders that models physical speech articulation, iii) subsequent communication aid software that modifies hard-to-understand speech signals to be more understandable to the typical listener, iv) design of the speech interaction for hitchBOT, the hitch-hiking robot, and LUDWIG, the caregiver robot, and v) state-of-the-art machine learning software that can assess cognitive disorders, such as Alzheimer’s disease, by analyzing short samples of speech. Dr. Rudzicz is currently commercializing several of these contributions.

Dr. Peter St George-Hyslop’s research has been primarily directed toward elucidating the mechanisms causing human neurodegenerative disease, with research into the molecular basis for Alzheimer’s disease (AD) constituting the main focus of this work. His laboratory discovered that Alzheimer’s disease is etiologically heterogeneous, an observation that subsequently had a profound effect on the design of both clinical and basic research paradigms on this disease. His laboratory led the discovery of multiple genes associated with AD, including presenilin 1, presenilin 2, nicastrin, and SORL1. He collaborated on the discovery of three other AD genes: APP with J. Gusella, APOE with A. Roses and TREM2 with J. Hardy. More recently, as a member of the Alzheimer’s Disease Genetics Consortium and the International Genomics of Alzheimer’s Program (IGAP), he used Genome Wide Association Study (GWAS) methods to identify at least 20 new genes associated with late onset AD.

Organizing Committee

  • David Henry (Chair), ICES and DLSPH
  • Laura Rosella, DLSPH and ICES
  • France Gagnon, DLSPH and IGES

Workshop Sponsors

ICES