what constitutes an exoplanet discovery?

Over the last few days, Megan Bedell (Flatiron) and I have been discussing the criteria that go into a detection of an exoplanet, or what things need to be true for a detection or discovery to be considered made. We are thinking of discoveries with the Terra Hunting Experiment or other radial-velocity surveys. We identified three-ish conditions:

One condition is that the amplitude be significantly different from zero; that is, the null is ruled out. Another is that the planet be characterizable: that is, the orbital parameters can be estimated to some level of precision. A third is that the planetary explanation of the signal be preferred at some confidence over other qualitatively different explanations, like stellar variabilities, stellar rotation, or signals faked by beats from other planets.

I presented these criteria at the Astronomical Data Group meeting, and Dan Foreman-Mackey (Flatiron) said that he disagreed with every aspect of them. But we didn't (yet) find out why.


how to show spectra to astronomers?

OMG I posted a paper to arXiv today! It should appear on Monday.

In the morning, I met with Villar (NYU) and Huang (NYU) to discuss Huang's paper on adversarial attacks against machine-learning regression methods in astronomical contexts. We spent a lot of the call discussing how to display the results in such a way that the astronomers will understand them and see how strange they are. Our main target is spectroscopic methods, so this involves displaying spectra.


finished a paper!

I didn't have a great day today. So I took the last few hours of the day and tried to finish the final to-do items on my paper with Price-Whelan (Flatiron) and Leistedt (Imperial) on multiplying Gaussians. Yes, multiplying together Gaussians. How can I have 13 pages of things to say about that? But I actually did finish, and we might submit to arXiv tomorrow (for appearance on Monday). This is my first first-author paper in a long time (and it isn't a research contribution, really).


almost nothing

Today was an almost-nothing research day. I did speak to Eilers about her projects, to Villar about double descent and organizing our results, and to Price-Whelan about writing. But almost nothing else today.


double descent is a thing

Yesterday in the Flatiron Astronomical Data Group weekly meeting, I showed the crew something called double descent: When you are training a model and the number of data points you have (number of training-set objects) approaches the number of features you have (number of pixels in your image, say) then regression models often blow up. That is, you get better answers with fewer training objects when the number approaches certain values. This is a highly discussed issue in machine learning (and math and statistics; it's like some kind of phase transition) but hasn't really hit the domains (like astronomy) very much yet. The crew was surprised, so today I made a tiny colab notebook to demonstrate it for a polynomial fit. It's amusing!
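Here is a minimal sketch of the effect. It is not the colab notebook mentioned above: to keep the behavior easy to state, it swaps the polynomial fit for random Gaussian features, and the feature count, noise level, and trial count are all arbitrary choices of mine. It fits minimum-norm least squares at several training-set sizes and reports how far the fitted weights land from the true ones.

```python
import numpy as np

def median_excess_risk(n_train, n_features=20, noise=0.5, n_trials=200, seed=17):
    """Median over trials of ||beta_hat - beta||^2 for a least-squares fit.

    For unit-variance Gaussian test features this squared distance equals
    the expected excess squared prediction error on new data.
    """
    rng = np.random.default_rng(seed)
    beta = np.ones(n_features) / np.sqrt(n_features)  # true weights, unit norm
    risks = np.empty(n_trials)
    for t in range(n_trials):
        X = rng.standard_normal((n_train, n_features))
        y = X @ beta + noise * rng.standard_normal(n_train)
        # lstsq returns the minimum-norm solution when n_train < n_features
        beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
        risks[t] = np.sum((beta_hat - beta) ** 2)
    return np.median(risks)

sizes = [5, 10, 15, 20, 25, 40, 100]
risks = {n: median_excess_risk(n) for n in sizes}
for n in sizes:
    print(f"n_train = {n:3d}   median excess risk = {risks[n]:.3f}")
```

In this toy the risk should peak where the number of training objects equals the number of features (20 here) and fall off on both sides, which is the "fewer data can be better" surprise.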


how to design figures and captions?

Lily Zhao (Yale) gave Megan Bedell (Flatiron) and me a tour of the figures in our nascent paper on spectrograph calibration. For each figure we asked: Does this figure tell a part of our story? And: Is that point clear in both the figure and the caption? My philosophy is that most of your readers are not going to curl up with a hot cocoa on the sofa and luxuriously read your paper; most of your readers are going to skim your paper on their phones, while waiting in line to pick up a coffee. So you want the paper to be comprehensible on a skim. That skim probably involves settling on figures and captions, so these should tell the whole story of the paper, or as best they can. Of course I don't always follow my own advice here, but I should.


homework problem

This week Megan Bedell (Flatiron) assigned me a homework problem, which is to work out the information theory (or whatever) that will tell us how sensitive a particular radial-velocity survey is to exoplanets. I started to write this down as a problem and a solution today. It might evolve into a paper if we have enough to say!


search and characterization

I had a wide-ranging conversation with Ana Bonaca (Harvard) about many things. One thing we discussed is the relationship between what astronomers call search and what astronomers call characterization. We were talking about this in the context of asteroseismology signals. But it is there for exoplanet searches, binary-star searches, and even the Higgs boson. There are certain respects in which search and characterization are very similar: Most search methods involve parameter estimation over some grid of models (periods and phases, say, estimating amplitudes, or masses, estimating coupling parameters). But there are certain respects in which they are very different too: Search involves harsh approximations and cheats for performance; characterization involves high-quality inference with baroque models. Anyway, it is interesting to think about these worlds and ideas and how much astronomy has changed since I was a wee astronomer.


page layout, the assumptions of linear models

I spent way more of my research time today than I should have working on the page layout of the paper I am currently writing with Price-Whelan and Leistedt. My goal is to make a document that works well either printed on paper, or read on a computer screen, or read on a phone. And a document that includes footnotes and figures, which the user can see without endless scrolling up and down the document. One reason I am so against typesetting two-column (as many astronomy papers do) is that it is very challenging to read on a phone, at least current-generation phones.

I also spent some time today reading in the Agresti book on linear models. Although this book is a bible for linear models and generalized linear models, it states at the very outset (yes, in Chapter 1) that every single method in every chapter of the entire book will presume that there is no noise in the features, only noise in the labels. (Think: You are predicting noisy labels y given features x.) That assumption—no noise in x—is fine, but it is violated in every important example I know in every single application of any kind of linear model (or nonlinear model of course) in every area, academic and commercial.

Other than that, it's a good assumption.
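To see why the no-noise-in-x assumption matters, here is a small sketch of the textbook attenuation effect (this example is mine, not something from the Agresti book, and all the numbers in it are made up). It fits the same ordinary-least-squares slope twice: once with clean features and once with unit-variance noise added to the features.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
true_slope = 2.0

x_true = rng.standard_normal(n)                         # noise-free features, unit variance
y = true_slope * x_true + 0.5 * rng.standard_normal(n)  # noisy labels

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (no intercept; data are zero-mean)."""
    return np.dot(x, y) / np.dot(x, x)

slope_clean = ols_slope(x_true, y)

# Now add noise to the features, with the same variance as the features themselves.
x_noisy = x_true + 1.0 * rng.standard_normal(n)
slope_noisy = ols_slope(x_noisy, y)

print(f"slope with clean features: {slope_clean:.3f}")
print(f"slope with noisy features: {slope_noisy:.3f}")
```

With equal signal and noise variance in x, the fitted slope is pulled toward zero by about a factor of var(x) / (var(x) + var(noise)) = 1/2, even with plenty of data.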


how to write the (very mathematical) method section?

Eilers (MIT) started the day with a discussion of how, specifically, to modify our paper given Friday's discussion of potential perturbations and realism in steady-state dynamical models. It is obvious that a mixture of rigidly rotating fixed patterns can mock up what we want for our potential perturbation, but then computing each of these is not simple, since resonances occur. So what to do to mock up something that is more like a continuous integral over an infinite family of such perturbations, which would have no such pathologies? We have various hacks in mind, but there are trade-offs between having more complete math and having more discursive discussion about the math. I usually prefer the latter. But if part of our audience is graduate students trying to reproduce the result, it might make sense to get more math in there.


invert, solve, and least-squares

One of the things I throw into seminars and papers these days is about numerical linear-algebra operators. My advice is: Never use the invert() function to invert a matrix! Why not? Because you almost never want the inverse, you want the inverse applied to a vector. So when you are tempted to do dot( inv(A), x ) you should instead do solve( A, x ). The first gets a machine-precision (if you are lucky) inverse of A and applies it to x. The latter gets a machine-precision (if you are lucky) estimate of A-inverse applied to x. You see the difference?

Well, all of that got complexified today when I learned (from Soledad Villar, NYU) about lstsq(). This does the same as solve() (and more) but it is zero-safe. Meaning: It gives good answers even when the A matrix has zero eigenvalues. I think my advice may be evolving.
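In NumPy terms, the three options look like the sketch below. The matrices are random toys of mine, and the rcond=1e-10 cutoff is my own choice for making the rank determination unambiguous; it is not a recommendation from anyone.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 5))
x = rng.standard_normal(5)

z_inv = np.dot(np.linalg.inv(A), x)  # forms the full inverse, then applies it
z_solve = np.linalg.solve(A, x)      # solves A z = x directly, no inverse formed
print("inv and solve agree:", np.allclose(z_inv, z_solve))

# Now a singular matrix: duplicate a column so the rank drops to 4.
B = A.copy()
B[:, 4] = B[:, 0]
x_B = B @ rng.standard_normal(5)     # a right-hand side that B can actually reach

# inv(B) and solve(B, x_B) are unsafe here; lstsq handles the zero singular
# value and returns the minimum-norm solution.
z_lstsq, residuals, rank, svals = np.linalg.lstsq(B, x_B, rcond=1e-10)
print("rank found by lstsq:", rank)
print("lstsq reproduces the right-hand side:", np.allclose(B @ z_lstsq, x_B))
```

For a well-conditioned square A all three agree; the difference shows up in cost, in precision for badly conditioned A, and in what happens when A is singular.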


true confessions of dynamicists

There was an hour-long conversation this morning among the coauthors of Eilers et al. We discussed inconsistencies in the model as described in that paper. It was a wide-ranging discussion! There are no self-consistent, steady-state, non-axisymmetric models for disks! So we are building models in which a symmetric disk is stirred or modified by a rigid potential. Our issues are all about how to make this model useful for understanding the Milky Way, which is certainly not being stirred by a rigid, rotating, m=2 potential perturbation! There was quite a range of positions on the call, but I think pragmatism won the day. We all left a bit confused.


noisy power spectra

Ana Bonaca (Harvard) and I continued today our discussion of the determination of nu-max and delta-nu in asteroseismic observations of stars. We find that there are cases in which it is obvious-ish to an asteroseismic expert which peaks in the spectrum are the forest of lines, so delta-nu can be read, but in which our probabilistic model for the spectrum sees no strong evidence for that delta-nu. The asteroseismologists know what they are doing, so I suspect either a bug or a think-o. But the spikiness of power spectra is concerning. It's hard to know what's believable. Especially to a newcomer like me. And noise in a power spectrum is famously structured.


funding! and interpolation!

Kate Storey-Fisher (NYU) got great news today. Her NASA FINESST proposal was funded! So she gets paid for the next three years. And she did it herself. Funding is one of the most challenging, confusing, disheartening, and complicated parts of this job. I don't like it, but it's reality. Congratulations, Storey-Fisher!

On my student-research call, I worked on interpolating complex numbers with Avery Simon (NYU). We have a set of phase transforms of two very similar images, which return, for components, complex amplitudes (or amplitudes and phases, if you like). Now the question is: How to interpolate between these images, or their phase transforms? We want to test the hypothesis (put forward in computer-vision research) that interpolating in complex phase is way better in many respects than interpolating naively.
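A toy version of the question, for a single complex amplitude, might look like the sketch below. The two function names and the test values are hypothetical, and this is not the transform or the code from the project; it only contrasts interpolating the complex numbers directly with interpolating amplitude and phase separately.

```python
import numpy as np

def interp_naive(a, b, t):
    """Linear interpolation of the complex numbers themselves."""
    return (1 - t) * a + t * b

def interp_polar(a, b, t):
    """Interpolate the amplitude linearly and the phase along the shorter arc."""
    dphi = np.angle(b * np.conj(a))  # phase difference, wrapped to (-pi, pi]
    amp = (1 - t) * np.abs(a) + t * np.abs(b)
    return amp * np.exp(1j * (np.angle(a) + t * dphi))

# Two components with the same amplitude but a large phase shift between them.
a = 1.0 * np.exp(1j * 0.0)
b = 1.0 * np.exp(1j * 2.5)

mid_naive = interp_naive(a, b, 0.5)
mid_polar = interp_polar(a, b, 0.5)
print(f"naive midpoint amplitude: {abs(mid_naive):.3f}")
print(f"polar midpoint amplitude: {abs(mid_polar):.3f}")
```

The naive midpoint loses most of its amplitude (the two phasors partly cancel), while the amplitude-and-phase midpoint keeps amplitude 1 and sits halfway in phase. Both functions accept NumPy arrays, so the same comparison works component by component.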


debugging copy pasta

It has been nice, during the quarantine, to be doing some coding again. Of course my coding is very amateurish these days. I am not obeying good code practices! For one, I am working in Jupyter notebooks, which are very bad for good code practice: They encourage cut-and-paste and discourage building structures and modules. Today I found many bugs in my linear-regression statistics and information-theory code. Most were from repeated code blocks. Any time I am writing the same code twice, I am almost certainly making a mistake!


finishing a paper

Adrian Price-Whelan (Flatiron) and I have nearly finished our paper on multiplying Gaussians together. I spent part of the weekend re-reading, adjusting, tweaking, and reformatting it. I also discussed it a bit with Hans-Walter Rix (MPIA), who thinks maybe we should attach our result more clearly to what the reader already knows.
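For readers who want the thing they probably already know: in one dimension, the product of two Gaussians in x is a constant times a third Gaussian in x. The sketch below checks that standard identity numerically; the notation (means a, b, c and variances A, B, C) and the test values are my choices here and are not taken from the paper.

```python
import numpy as np

def gauss(x, mean, var):
    """One-dimensional normalized Gaussian density."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

a, A = 1.0, 0.5    # mean and variance of the first Gaussian
b, B = -0.3, 2.0   # mean and variance of the second Gaussian

# Inverse variances add; the new mean is the inverse-variance-weighted mean.
C = 1.0 / (1.0 / A + 1.0 / B)
c = C * (a / A + b / B)

x = np.linspace(-5, 5, 11)
lhs = gauss(x, a, A) * gauss(x, b, B)
rhs = gauss(a, b, A + B) * gauss(x, c, C)  # the constant factor does not depend on x
print("identity holds:", np.allclose(lhs, rhs))
```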


linear regression

In our code notebook today, with repeated numerical experiments, Soledad Villar (NYU) and I demonstrated that we can, in some settings, regularly beat the Gauss–Markov estimator, which is the standard linear discriminative regression estimator. We are working in a toy world where everything is linear and Gaussian! And the Gauss–Markov estimator has very good properties, and provably, so what gives? The main thing that gives (we think) is that we are adding noise to the features, not just the labels. This case is surprisingly thorny. When there is noise only in the labels, all is cool. As soon as you add noise to the features, all h*ck breaks loose.


writing, planning, manipulating

I did many small things today. I put citations (to the literature) and comments (to the reader) into my note with Adrian Price-Whelan (Flatiron) about refactoring Gaussian products. Kristina Hayhurst (NYU) has all the ESA Planck data in a Jupyter(tm) notebook and she is ready to build a linear latent-variable model, which has been a dream of mine for years! We adjusted various visualizations to see the data. I helped Anu Raghunathan with some Pythonic index manipulation to speed up and simplify her implementation of box least squares (the exoplanet search algorithm). She is reinventing the wheel here, but it's so we can do various kinds of statistical experiments.


unearthing and re-scoping a dormant paper

I spoke to Hans-Walter Rix (MPIA) about many things today. One of these was rebooting and simplifying our selection-function paper. My loyal reader knows that Rix and I are trying to write something on this; I made progress this summer but then the school year hit, plus many other things. Now time to dust it off and re-start. Rix is arguing that we should go more introductory, more simple, more pedagogical. I agree. We left it that I would dust off what I have, and he would dust off or rebuild his mental outline for the paper.


adaptive numerical integration

During my student-research “office hours” today, Kate Storey-Fisher (NYU) told me about issues she is having doing certain kinds of numerical integrals for her project on two-point statistics. That got me fired up about adaptive sampling for integration. I wrote a code notebook that does an adaptive one-dimensional integration. As I hoped, the precision of the integral (in some limit, and in some cases) seems to improve exponentially with the number of samples, provided that they are placed very carefully.
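I can't reproduce the notebook here, but a standard stand-in for the idea is adaptive Simpson integration, sketched below. The integrand (a narrow Lorentzian), the tolerance, and the depth limit are all arbitrary choices for illustration. The routine bisects an interval only where the local error estimate is too large, so the samples pile up near the peak.

```python
import numpy as np

def adaptive_simpson(f, a, b, tol=1e-8, max_depth=50):
    """Integrate f on [a, b], refining only where the local error is large.

    Returns (estimate, number of function evaluations).
    """
    n_evals = 0

    def feval(x):
        nonlocal n_evals
        n_evals += 1
        return f(x)

    def simpson(fa, fm, fb, h):
        return h * (fa + 4 * fm + fb) / 6

    def recurse(a, fa, m, fm, b, fb, whole, tol, depth):
        lm, rm = 0.5 * (a + m), 0.5 * (m + b)
        flm, frm = feval(lm), feval(rm)
        left = simpson(fa, flm, fm, m - a)
        right = simpson(fm, frm, fb, b - m)
        err = left + right - whole
        if depth <= 0 or abs(err) <= 15 * tol:
            return left + right + err / 15  # Richardson correction
        return (recurse(a, fa, lm, flm, m, fm, left, tol / 2, depth - 1)
                + recurse(m, fm, rm, frm, b, fb, right, tol / 2, depth - 1))

    m = 0.5 * (a + b)
    fa, fm, fb = feval(a), feval(m), feval(b)
    whole = simpson(fa, fm, fb, b - a)
    estimate = recurse(a, fa, m, fm, b, fb, whole, tol, max_depth)
    return estimate, n_evals

eps = 0.1  # width of the peak
estimate, n_evals = adaptive_simpson(lambda x: 1.0 / (eps ** 2 + x ** 2), -1.0, 1.0)
exact = (2.0 / eps) * np.arctan(1.0 / eps)
print(f"estimate = {estimate:.10f}, exact = {exact:.10f}, evaluations = {n_evals}")
```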


linear operators in a linear universe

In a very enlightening call with Soledad Villar (NYU), the two of us figured out the minimum-variance unbiased estimator in our toy linear universe. This is a universe in which we get to see noisy (with Gaussian-distributed noise) features X and noisy labels Y, all generated by an unobservable Z and some linear operators. In this toy, linear, Gaussian Universe—which I think is extremely general—what is the best predictor for some new y value given some new x value? We found the minimum-variance unbiased estimator today with a little quadratic programming. This is not fair, however, because the estimator we derived is one you could only construct if you had access to unobservable things, like how the Z space is related to the X and Y spaces. The whole point of this project is going to be that you can't know those things!

Now our goal is to relate this, and other less-cheat-y estimators to what's known as the BLUE: The best linear unbiased estimator (and the subject of the Gauss–Markov theorem). I personally think we can beat the BLUE handily in many situations (Villar is reserving judgement); we are trying to figure out how and when, if I'm right.

Funny thing about this project: We are definitely reinventing wheels here: All this must be known! But we are learning a huge amount, so we are forging on. Sensible? I don't know.


pair (or triple) writing

In our weekly with Lily Zhao (Yale), Megan Bedell (Flatiron) and I got Zhao to open the text of her paper and we did some writing. Yes, we wrote together. That isn't always a good idea! But Zhao was patient and we got some things in the paper addressed and fixed.