{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Generation of plots for tutorial on good talks\n", "\n", "*This document was created from a Jupyter notebook. You can download the notebook [here](l04_good_plots_for_slides.ipynb).*" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import warnings\n", "warnings.simplefilter(action='ignore', category=UserWarning)\n", "\n", "import numpy as np\n", "import scipy.stats as st\n", "\n", "import seaborn as sns\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "%config InlineBackend.figure_formats = {'png', 'retina'}\n", "\n", "# Markov chain Monte Carlo package\n", "import emcee\n", "\n", "import bokeh.plotting\n", "import bokeh.io\n", "import bokeh.charts\n", "import bokeh.models\n", "bokeh.io.output_notebook()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data set and necessary calculations\n", "\n", "The data set consists of reversals of worms from three strains, wild type, Channelrhodopsin in the ASH neuron, and Channelrhodopsin in the AVA neuron. We got, respectively, 0/35, 9/35, and 33/36 reversals." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Store data set\n", "r = np.array([0, 9, 33])\n", "n = np.array([35, 35, 36])\n", "strains = ('wild type', 'ASH', 'AVA')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The posterior distribution is\n", "\n", "\\begin{align}\n", "P(p_\\mathrm{rev}) = \\frac{(n+1)!}{r!(n-r)!}\\,p_\\mathrm{rev}^r\\,(1-p_\\mathrm{rev})^{n-r}.\n", "\\end{align}\n", "\n", "We can write a function to compute this, since we'll need it for plotting." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def posterior(p_rev, r, n):\n", " \"\"\"Log posterior of reversal measurements.\"\"\"\n", " return (n + 1) * st.binom.pmf(r, n, p_rev)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Later, we will plot 95% confidence intervals that contain the highest probability density, or HPD. 
This is easily computed analytically for the case where $r = 0$, since, in this case,\n", "\n", "\\begin{align}\n", "P(p_\mathrm{rev}) = (n+1)(1-p_\mathrm{rev})^n.\n", "\\end{align}\n", "\n", "Because this decreases monotonically with $p_\mathrm{rev}$, the minimum of the HPD interval is $0$. We can find the maximum by evaluating\n", "\n", "\\begin{align}\n", "\\int_0^{p_\mathrm{rev}^\mathrm{max}} \mathrm{d}p_\mathrm{rev}\,P(p_\mathrm{rev}) = \n", "1-(1-p_\mathrm{rev}^\mathrm{max})^{n+1} = f_\mathrm{HPD},\n", "\\end{align}\n", "\n", "where $f_\mathrm{HPD}$ is the fraction of the total probability mass that the HPD covers (in our case, 0.95). Solving for the upper bound gives\n", "\n", "\\begin{align}\n", "p_\mathrm{rev}^\mathrm{max} = 1 - (1-f_\mathrm{HPD})^{1/(n+1)}.\n", "\\end{align}\n", "\n", "So, we can compute the HPD for the wild type." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Initialize HPDs, filling in the analytical wild-type value\n", "hpds = [(0, 1 - (1 - 0.95)**(1 / (n[0] + 1))), None, None]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will compute the remaining HPDs using Markov chain Monte Carlo. First, we need a function for the log posterior." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def log_posterior(p_rev, r, n):\n", " \"\"\"Unnormalized log posterior, used for MCMC calculations.\"\"\"\n", " # Get scalar probability (emcee expects array argument)\n", " p = p_rev[0]\n", "\n", " # Exclude the boundaries to avoid evaluating log(0)\n", " if p <= 0 or p >= 1:\n", " return -np.inf\n", " return r * np.log(p) + (n - r) * np.log(1 - p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll write a function to set up and run an MCMC calculation." 
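] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, though, it is worth checking the analytic wild-type bound numerically. The sketch below is ours, not part of the original analysis; it assumes the wild-type values $r = 0$ and $n = 35$ from the data set and verifies that the posterior CDF evaluated at $p_\mathrm{rev}^\mathrm{max}$ recovers $f_\mathrm{HPD}$.\n", "\n",

```python
# Wild-type values from the data set above: r = 0 reversals in n = 35 trials
n = 35
f_hpd = 0.95

# Analytic upper bound of the 95% HPD interval for r = 0
p_max = 1 - (1 - f_hpd)**(1 / (n + 1))

# The posterior CDF, 1 - (1 - p)^(n + 1), evaluated at p_max should give f_hpd
cdf_at_p_max = 1 - (1 - p_max)**(n + 1)
```

The bound evaluates to roughly $0.08$; that is, the wild-type data exclude reversal probabilities above about 8%.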
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def sample_posterior_mcmc(r, n, n_walkers=50, n_burn=5000, n_steps=5000):\n", " \"\"\"Get samples of p_rev from the posterior.\"\"\"\n", "\n", " # Specify starting points of walkers\n", " p0 = np.empty((n_walkers, 1))\n", " p0[:,0] = np.random.uniform(0, 1, n_walkers)\n", "\n", " # Instantiate sampler\n", " sampler = emcee.EnsembleSampler(n_walkers, 1, log_posterior, args=(r, n))\n", " \n", " # Do burn-in and sampling\n", " pos, prob, state = sampler.run_mcmc(p0, n_burn, storechain=False)\n", " _ = sampler.run_mcmc(pos, n_steps)\n", " \n", " return sampler" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we need a function to compute the HPD from the samples." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def hpd(sampler, mass_frac):\n", " \"\"\"\n", " Returns highest probability density region given by\n", " a set of samples.\n", " \"\"\"\n", " trace = sampler.flatchain\n", " \n", " # Get sorted list\n", " d = np.sort(np.copy(trace))\n", "\n", " # Number of total samples taken\n", " n = len(trace)\n", " \n", " # Get number of samples that should be included in HPD\n", " n_samples = np.floor(mass_frac * n).astype(int)\n", " \n", " # Get width (in units of data) of all intervals with n_samples samples\n", " int_width = d[n_samples:] - d[:n-n_samples]\n", " \n", " # Pick out minimal interval\n", " min_int = np.argmin(int_width)\n", " \n", " # Return interval\n", " return np.sort(np.array([d[min_int], d[min_int+n_samples]]).flatten())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will compute the mode and HPD for the ASH and AVA strains." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for i in (1, 2):\n", " sampler = sample_posterior_mcmc(r[i], n[i])\n", " hpds[i] = hpd(sampler, 0.95)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that all of our calculations are done, we can make our plots." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ugly plots\n", "\n", "We'll begin by making the \"ugly\" plots that are more or less defaults, but not really useful for presentations. We'll start with a bar chart." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "