Week 6

Done

  • Have spent much of my time this week running models using pydra-ml on hbn dataset and running models using regression (L1,L2,elastic net) and classification (decision tree, adaboost, random forest) to predict CGAS Score (continuous measure of general cognitive functioning) and diagnosis, respectively. A simple demographic model (features: sex, age) is not successful at predicting either of these measures, although there is a significant relationship between diagnosis and sex. Language tasks do reasonably well at predicting whether a participant has been diagnosed with a disorder or not (compared to physiologic function for example). It also seems, tentatively, that language tasks do better at predicting certain disorders, notably autism, however, I haven’t done significant testing yet.
  • Technical challenges: I am doing the preprocessing step before I run pydra-ml, there were some challenges in passing ColumnTransformer to the pipeline separately for both numeric and categorical data. I also would like to incorporate another step in pydra-ml that allows for model comparison (features + classifiers). Currently, one filename is passed in (one set of features) + multiple classifiers and model comparison is done on one set of features.
  • Tried UMap in supervised mode to identify unique patterns in phenotypic data
  • Ironed out some bugs in SUITPy for isolation and segmentation and tested it on a small subset of the HBN (across developmental stages). Each participant takes ~ 25 minutes and adds < 1GB of storage. Have yet to extend this analysis to >10 participants of HBN, will need to talk to Dorota/Hoda about space allocation on OpenMind, perhaps will do batch analysis.
  • Interviewed UROP candidates for two positions, recruited one student from Wellesley to work on research methods, and have to decide between three great MIT candidates for a computational position.

To Do

  • questions to answer: what is the reliability in the KSADS assessment across child and parent measures? meant to get to this question during the week but didn’t have time.
  • Perform VBM to measure local changes in structural abnormalities, following methods steps implemented in this paper - correcting for brain size etc.

Questions

  • Is there anything that I should do in advance of the CHOP meeting on Monday?
  • What is the protocol for storing new analyses (SUITPy output) on >2000 participants? I was planning on analyzing n=30 from each subgroup of HBN (adhd, autism, control, mood, anxiety, intellectual etc) as a first step, but ultimately I will want to analyze the broader dataset and will need to run SUITPy on each participant.