BMI206: Statistical Methods


Free Training Courses

  • UC Berkeley DLab
  • Gladstone Workshops (also in person on campus)
  • The UC Berkeley Data Science 100 introductory course

UCSF Wynton Cluster

  • Shared public data on the UCSF Wynton computing cluster:
    /wynton/group/databases
    /wynton/group/datasets
  • If you do not already have access to the Wynton cluster, access can be arranged for students in this class. Contact Dr. Pollard to get an account.
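
Once you have an account, a quick way to see what the shared directories contain is to list them from Python. The following is a minimal sketch, not an official Wynton utility: the helper name `list_shared` is hypothetical, and the two paths are simply the ones quoted above, which exist only when the script runs on the cluster.

```python
from pathlib import Path

# Paths quoted from the list above; they exist only on the Wynton cluster.
SHARED_DIRS = [Path("/wynton/group/databases"), Path("/wynton/group/datasets")]


def list_shared(directory: Path) -> list[str]:
    """Return the sorted entry names in `directory`.

    Returns an empty list if the directory is missing or unreadable,
    for example when the script is run off the cluster.
    """
    try:
        return sorted(entry.name for entry in directory.iterdir())
    except OSError:
        return []


if __name__ == "__main__":
    for shared_dir in SHARED_DIRS:
        names = list_shared(shared_dir)
        print(f"{shared_dir}: {len(names)} entries")
        for name in names:
            print(f"  {name}")
```

Run off the cluster, the script reports zero entries for both directories rather than raising an error.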

Online Resources

    These tutorials and other websites may help you configure your laptop for the course activities and brush up on your programming and data analysis skills.

  • R resources:
    Introduction to R: resources for using the R programming language.
    RStudio is a fantastic IDE for R. Some selling points: code completion, variable explorer, graphics window, help function viewer.
    Performing statistical analyses in R.
    Tutorials for statistical genetics in R.
    Vector memory errors
  • Python resources:
    Google introductory course on Python
    Computer Science Online Python for Programmers
    Writing Python like poetry
  • General programming tips:
    LearnXinY
    UCSF Library
  • Cytoscape resources:
    Cytoscape tutorials for the user interface
    How to automate Cytoscape via R or Python
    RCy3 paper
    Presentations on how to use Cytoscape
    Ten simple rules for making network figures
    Published Cytoscape networks
  • Network/pathway enrichment tutorials:
    NEArender paper
    Pathway Commons
  • Machine-learning resources:
    Tutorial on supervised learning
    Machine learning in R
    A machine learning primer
    Pitfalls of applying machine learning in genomics.
    The problem of imbalanced classes
    Documentation for scikit-learn
    Building autoencoders with keras tutorial
    TensorFlow playground
    A quick ML introduction
    Introduction to statistical learning
    Paul G. Allen School lectures
    Feature engineering and visualization
    A primer on deep learning in genomics
    A list of deep learning implementations for biology
  • RNA-seq pipeline tutorials:
    kallisto alignment-free method
    tximport and BioC testing packages
  • ChIP-seq pipeline tutorials:
    BioC epigenomics course
  • Package for visualization of embeddings of high-dimensional data, such as single-cell genomics.
  • Tutorial on hierarchical random effects models.

Books

    These books are available free online:
    Modern Statistics for Modern Biology
    Safari books (the programming books with animals on the cover; provided free through the UC system). There are multiple books on R and Python, though some are too specialized for the purposes of this class.

General Programming Tips

  • Consider following a style guide to make your code easier to read. (R guide, Python guide)
  • If you don’t already have it, Homebrew is a great utility for installing and updating applications on macOS (formerly OS X). To install it, follow the instructions on the linked page. It requires Xcode’s Command Line Tools, which you can get either as a standalone download or by installing all of Xcode; both are available from the Apple developer site. Once you have Homebrew, installing software such as R, Git, or Mercurial is as easy as typing “brew install _program name_” on the command line.
  • Jupyter notebooks are good for writing up projects with code.
  • If you are using Windows, you might consider installing Cygwin to get some Unix functionality, or running a Linux virtual machine.
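
After working through the setup tips above, it can help to confirm which command-line tools are actually available on your machine. The sketch below is one way to do that from Python using the standard library's `shutil.which`. The helper name `missing_tools` and the list of tools checked are illustrative assumptions: `brew` is Homebrew's command, `hg` is Mercurial's, and you should adjust the list to whatever you installed.

```python
import shutil


def missing_tools(names):
    """Return the names from `names` that are not found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Illustrative list only; edit it to match the tools you expect to have.
    tools = ["brew", "R", "git", "hg"]
    missing = missing_tools(tools)
    if missing:
        for name in missing:
            print(f"{name}: not found on PATH")
    else:
        print("All listed tools were found on PATH.")
```

A tool reported as missing may still be installed but not on your PATH, so treat the output as a starting point for troubleshooting rather than a definitive check.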