Last week I gave a colloquium at MPIA in which I advocated the use of jackknife and bootstrap resampling to obtain empirical uncertainty estimates in a complex data analysis. Today I actually implemented the jackknife in my project on cosmic homogeneity (and isotropy). I jackknifed by sky position: I split the sky into 12 nearly-equal regions for 12-fold leave-one-out resampling. I have intuitions about when it is a good idea to jackknife on a meaningful quantity (like sky position) and when it is better to jackknife on something completely random, but I don't know exactly where those intuitions come from. In general it must be the case that jackknifing on different things answers different questions about your noise.
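For concreteness, here is a minimal numpy sketch of the leave-one-group-out jackknife; the function name, the choice of `np.mean` as the statistic, and the group labels are all illustrative, not the project's actual code. The 12 sky regions just become integer group labels.

```python
import numpy as np

def jackknife_by_group(values, groups, stat=np.mean):
    """Leave-one-group-out jackknife estimate of a statistic and its uncertainty.

    values: (N,) data values; groups: (N,) integer group labels
    (e.g., which of 12 sky regions each object falls in).
    """
    labels = np.unique(groups)
    K = len(labels)
    # recompute the statistic K times, each time dropping one group
    stats = np.array([stat(values[groups != g]) for g in labels])
    mean = stats.mean()
    # standard jackknife variance: (K - 1) / K * sum of squared deviations
    var = (K - 1) / K * np.sum((stats - mean) ** 2)
    return mean, np.sqrt(var)
```

Swapping in a different `stat` (a correlation-function estimator, say) is the whole point: the machinery doesn't care what the statistic is, only that it can be recomputed on each leave-one-out subsample.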
I had a great conversation over lunch today with Lorenzo Pino (Arcetri), who is measuring exoplanet direct spectra in systems with large, hot planets. He makes a data-driven model for the star spectrum (and its variations) in a time-domain spectroscopic campaign, and then stacks the residuals in the (computed) rest frame of the planet to get the planet spectrum. We discussed the next-order correction to this method that approximates simultaneous fitting of planet and star.
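The shift-and-stack step can be sketched in a few lines; this is my own toy version (non-relativistic Doppler, a made-up function name, and a planet-velocity ephemeris assumed given), not Pino's pipeline.

```python
import numpy as np

C = 299792.458  # speed of light in km/s

def stack_in_planet_frame(wave, residuals, v_planet):
    """Shift star-model-subtracted residual spectra into the planet rest
    frame and average them.

    wave: (n_pix,) observed wavelength grid (increasing)
    residuals: (n_epochs, n_pix) residual spectra after the stellar model
    v_planet: (n_epochs,) computed planet radial velocity in km/s
    """
    stacked = np.zeros_like(wave)
    for resid, v in zip(residuals, v_planet):
        # each observed pixel corresponds to rest wavelength lambda / (1 + v/c)
        rest_wave = wave / (1.0 + v / C)
        # resample the residual onto the common grid in the planet frame
        stacked += np.interp(wave, rest_wave, resid)
    return stacked / len(v_planet)
```

The next-order correction we discussed is roughly: this treats the star model as fixed, whereas a simultaneous fit would let the planet signal inform the star model (and vice versa).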
Today I visited the group of Saskia Hekker (HITS). We discussed many things asteroseismological! We discussed:
- the ESA Plato observing strategy
- is the asteroseismic signal a Gaussian process to any degree of accuracy?
- using asteroseismic information to improve and inform open-cluster membership
- synchronization of orbital periods with primary-star rotation periods
- are two distributions different?
Long, long ago, when I worked with Sam Roweis (deceased) and Dustin Lang (Perimeter) on locating images on the sky, we used to discuss coordinate systems: You don't actually need a long-lat or theta-phi coordinate system to describe the locations of things on the sky, right? You can just use angular relationships among sources to locate everything precisely and unambiguously! And with that approach, you don't need to make as many choices and standards and lines of code about reference frames. But, alas, this point of view is not in the ascendant.
Undeterred, I put the bright stars on my maps (from this weekend) of Kate Storey-Fisher's ESA Gaia quasar sample. Can you find the Big Dipper and Orion? And Sirius?
At the Heidelberg Tiergarten Schwimmbad I worked out the mathematics for an equal-area projection of the sphere, centered on the poles. It turns out that I reinvented Lambert's projection from the 1770s. Here's a plot of the Gaia DR3 quasar sample (censored by some dust cuts) in my new projection:
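The projection itself is tiny: a point at colatitude theta (angle from the pole) lands at planar radius r = 2 sin(theta / 2), which makes the flat area inside r, namely pi r^2, exactly equal the spherical-cap area 2 pi (1 - cos theta). A minimal sketch (my own function name and conventions):

```python
import numpy as np

def lambert_equal_area(lon, lat):
    """Lambert azimuthal equal-area projection, centered on the north pole.

    lon, lat in radians; returns planar (x, y).
    Equal-area because pi * r^2 = 4 pi sin^2(theta/2) = 2 pi (1 - cos theta),
    the area of the spherical cap at colatitude theta = pi/2 - lat.
    """
    r = 2.0 * np.sin((np.pi / 2.0 - lat) / 2.0)
    return r * np.cos(lon), r * np.sin(lon)
```

The pole maps to the origin, the equator to a circle of radius sqrt(2), and the opposite pole to the boundary circle of radius 2; hence the choice to make two pole-centered panels rather than one full-sphere map.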
I gave the Königstuhl Colloquium today, which was fun. I spoke about this paper and related matters. I got great questions. One was about when to fit a very flexible model vs just doing simple interpolation. I gave some kind of minimal answer in my talk but then I thought about it a lot more later. The key difference between fitting a very flexible model and interpolation is that the former can be made part of a bigger probabilistic model whereas the latter (without serious modifications) cannot. That's a big deal when (say) you are trying to find planet transits in the face of stellar and spacecraft variability.
After making (yesterday) all the plots that demonstrate the uniformity and large-scale homogeneity of Kate Storey-Fisher's Gaia quasar catalog (which she is writing up now), I decided (tentatively) to write a paper on cosmic homogeneity with these data: When a catalog shows beautiful homogeneity, that is both a statement about the catalog and a statement about the Universe. I wrote a title and abstract and some figure captions today.
Back around 2004 I promised myself I would never compute a fractal dimension ever again! But I did today, using Kate Storey-Fisher's (NYU) new quasar catalog from the ESA Gaia data. And it turns out that it is 3. Good! Actual measurement with uncertainty coming soon.
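For the record, here is a minimal correlation-dimension estimator in the Grassberger–Procaccia spirit (illustrative only: it brute-forces all pairs, ignores survey geometry, and edge effects bias the slope low at large scales; the real measurement needs a random catalog):

```python
import numpy as np

def correlation_dimension(points, r1, r2):
    """Estimate the correlation dimension from the scaling of pair counts.

    points: (N, 3) positions; r1 < r2 are two scales well inside the volume.
    If N(<r) scales like r^D, the log-ratio of pair counts gives D.
    """
    # all pairwise separations (O(N^2) memory; fine for small N)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d = d[np.triu_indices(len(points), k=1)]
    n1 = np.sum(d < r1)
    n2 = np.sum(d < r2)
    return np.log(n2 / n1) / np.log(r2 / r1)
```

A homogeneous point process in three dimensions gives D close to 3, which is the answer I got today; the "measurement with uncertainty" version is exactly where the jackknife machinery comes in.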
I am working with Soledad Villar (JHU) and others on making generalizations of convolutional operators (and image-based non-linear functions based on those convolutions) that can deal (correctly) with input data that contain vectors and tensors. That is, tensor convolutions of tensor images. Anyway, one of the problems is: How do you visualize a tensor field or an image of tensors? I implemented a possible solution, pictured below: You make a figure that has no symmetry, and you take that figure through the tensor! That only works for 2-tensors of course. 3- and 4-tensors? I'm at a loss.
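A sketch of the glyph idea for 2-tensors (the asymmetric template figure here is a made-up example; any figure with no rotational or reflection symmetry works):

```python
import numpy as np

def tensor_glyph(T, template):
    """Push an asymmetric template figure through a 2-tensor.

    T: (2, 2) tensor; template: (n, 2) points of an asymmetric figure.
    Each point p maps to T @ p, so the distorted figure shows the
    tensor's eigen-structure, asymmetry, and handedness at a glance.
    """
    return template @ T.T

# a hypothetical asymmetric template (an "R"-shaped polyline would also work)
template = np.array([[1.0, 0.0], [0.0, 0.5], [-0.3, -0.2], [0.2, -0.6]])
```

Because the template has no symmetry, distinct tensors produce visibly distinct glyphs, including tensors that differ only by a reflection. The trick fails for 3- and 4-tensors because they aren't linear maps on points.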
We had a great meeting today with Kate Storey-Fisher (NYU), Hans-Walter Rix (MPIA), Christina Eilers (MIT), and me to discuss KSF's progress on the ESA Gaia quasar sample. We looked at her large-scale structure results and her jackknife tests and discussed paper scope. Options range from a quasar-catalog paper to a selection-function paper to a full cosmological parameter-estimation paper. Of course we decided to do all three! But importantly we decided that this week we would focus on writing a quasar-catalog paper. That's good, and achievable.
On the weekend, Kate Storey-Fisher (NYU) and I implemented jackknife uncertainty estimation for KSF's cosmology-with-Gaia-quasars project. We jackknifed by cutting the sky into RA slices. This is standard practice but I don't love it! It assumes that you know that your main source of error is calibration or sample consistency over the sky. It might be something way more insidious. In principle I guess you should jackknife over many things, and also randomly.
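The two kinds of jackknife grouping can share one interface; this sketch (my own names, `k=12` for definiteness) is just to make the contrast concrete: RA slices probe calibration and sample-consistency errors across the sky, while purely random labels probe only shot noise.

```python
import numpy as np

def jackknife_groups(ra, k=12, random=False, seed=0):
    """Assign each object a leave-one-out group label.

    ra: (N,) right ascensions in degrees.
    random=False: contiguous RA slices (tests sky-varying systematics).
    random=True: uniformly random labels (tests pure sampling noise).
    """
    if random:
        rng = np.random.default_rng(seed)
        return rng.integers(0, k, size=len(ra))
    # integer slice index, clipped so ra = 360 lands in the last slice
    return np.minimum((ra / 360.0 * k).astype(int), k - 1)
```

If the RA-slice jackknife gives a much larger error bar than the random one, that difference is itself a measurement: it says the sky-varying systematics dominate the shot noise.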
Today Matthias Samland (Stockholm) gave a nice Königstuhl Colloquium at MPIA about direct imaging of exoplanets with high-contrast imaging. He showed some beautiful results from ESO Gravity and from NASA JWST. One of his main take-away points is that the situation is changing fast, and we might achieve very much higher contrast ratios in the near future than we've ever had, and thus get many more planets.
I spent some time late in the day looking at uncertainty propagation for neural networks: Given that you can optimize a NN, and given that it makes good predictions for held-out data, and given that you can take all derivatives of everything with respect to everything, does that mean you can propagate errors or noise from the data to the results? I think the answer is yes in a limited sense: You can see how the output depends on the input at the training step. But what you can't do—and probably will never be able to do—is propagate the uncertainties that come from your training set (the uncertainties in your weights, as it were). And these uncertainties can be very large, especially since the models tend to be enormously over-parameterized, and also contain combinatorially large exact and near-exact degeneracies. (I think maybe the near-exact degeneracies are worse than the exact ones.) I vaguely recall Tom Charnock making strong statements about all these things at Ringberg.
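The limited "yes" can be made concrete with first-order propagation of input noise through the trained network; this is a generic sketch (finite differences standing in for autodiff, names my own), and by construction it captures only data noise, not the weight uncertainties discussed above.

```python
import numpy as np

def propagate_input_errors(f, x, cov_x, eps=1e-6):
    """First-order propagation of input noise through a trained model f.

    Returns Sigma_out ~ J @ Sigma_x @ J.T, where J is the Jacobian of f
    at x. This captures how data noise maps to output noise at test time;
    it says NOTHING about the (often larger) uncertainty in the weights.
    """
    y0 = np.atleast_1d(f(x))
    J = np.zeros((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        # forward-difference estimate of column i of the Jacobian
        J[:, i] = (np.atleast_1d(f(x + dx)) - y0) / eps
    return J @ cov_x @ J.T
```

For a linear model f(x) = A x this reduces exactly to A Sigma_x A^T, the familiar error-propagation formula; for a deep network it is only the local, linearized story.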
Today I posted this tweet (below), which I think explains what happened today! I also gave a talk at MPIA Galaxy Coffee, with Adrian Price-Whelan (Flatiron), about the appearance of stellar parameters in the ESA Gaia XP spectral coefficients.
I am fully obsessed with geometry these days. In particular, I am obsessed with the point that scalars aren't just numbers, but rather numbers that don't depend on your choice of coordinate system. Similarly, vectors aren't just things with a magnitude and a direction: They are things with a magnitude and a direction which are coordinate free, or which have a stable direction and magnitude no matter what coordinate system you choose. Thus, for example, the unit vectors defining the x, y, and z directions of your coordinate system are not really vectors at all. But the acceleration due to gravity right here is a vector.
But there are pseudo- quantities too. For example, angular momentum isn't exactly a vector; it is a pseudovector: Its magnitude doesn't depend on the coordinate system at all, and its direction doesn't depend on the orientation (or translation) of the coordinate system, but its direction does depend on the handedness of the coordinate system. Thus there are pseudoscalars, pseudovectors, and pseudotensors in addition to scalars, vectors, and tensors. Today Soledad Villar (JHU) wrote definitions for these in the paper we are drafting. It isn't trivial, because we want a notation that is agnostic about the group operator and the thing it is operating on.
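A quick numerical illustration of the handedness point: for any orthogonal M, (Ma) x (Mb) = det(M) M (a x b), so a proper rotation commutes with the cross product while a reflection flips its sign. That det(M) factor is exactly what makes the cross product a pseudovector.

```python
import numpy as np

def transform_cross(M, a, b):
    """Compare the cross product of transformed vectors (lhs) with the
    transformed cross product (rhs); for orthogonal M they differ by det(M)."""
    return np.cross(M @ a, M @ b), M @ np.cross(a, b)
```
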
Kate Storey-Fisher (NYU) has made really nice random catalogs that look very very similar, in sky coordinates, to the quasars we have. However, there is obviously more exclusion of quasars from the Galactic plane region than we can explain with any reasonable model of how dust is affecting things. It's the stellar density of course: ESA Gaia selection is very sensitive to stellar density, especially (as now) when you are using the XP spectra. Today she included the stellar density in the random-catalog regression and boom: Excellent random! Our model is not mechanistic, it is effective and data-driven.
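The effective, data-driven flavor of the model can be sketched like this (entirely illustrative: a logistic form, hypothetical coefficients, and my own function names, not KSF's actual regression). Uniform sky randoms get thinned by a per-pixel selection probability that depends on dust and stellar density.

```python
import numpy as np

def selection_probability(dust, log_star_density, coeffs):
    """Effective selection model: logistic regression on per-pixel dust
    extinction and log stellar density (coefficients fit elsewhere to the
    observed quasar map; not a mechanistic model of Gaia's selection)."""
    z = coeffs[0] + coeffs[1] * dust + coeffs[2] * log_star_density
    return 1.0 / (1.0 + np.exp(-z))

def thin_randoms(pix, prob, rng):
    """Accept each uniform random point (pixel index pix) with its
    pixel's selection probability, yielding the final random catalog."""
    return pix[rng.random(len(pix)) < prob[pix]]
```

Negative coefficients on dust and stellar density reproduce exactly the behavior in the data: fewer randoms survive near the Galactic plane, matching the quasar deficit there without ever modeling the selection mechanistically.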