I asked, in the Astronomical Data Group meeting at Flatiron, about the method of spectral 2D-to-1D extraction known as flat-relative optimal extraction. It's genius, and simple, but it makes strong assumptions about the spectrograph. I asked how we might improve it. And I think I might have a plan. The idea (which was floated by Megan Bedell) is to make the spectral representation something continuous, and evaluate it individually at every pixel, not just once per column of the detector. This should improve extraction. And it is relevant to the NASA proposal I am writing with Matt Daunt.
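For concreteness, here is a minimal sketch of the standard per-column version of flat-relative extraction, which is the baseline the continuous idea would replace. It assumes that each detector column is a single wavelength, that the science frame is the flat frame times one amplitude per column, and that the per-pixel noise variances are known. The function name, array shapes, and toy data are all hypothetical; this is not the continuous-spectrum plan itself.

```python
import numpy as np

def flat_relative_extract(science, flat, variance):
    """Weighted least-squares amplitude of the flat in the science frame, per column.

    All arrays have shape (n_rows, n_columns) and cover one spectral order.
    Returns the extracted spectrum and its variance, each of shape (n_columns,).
    """
    weights = 1.0 / variance
    numerator = np.sum(weights * flat * science, axis=0)
    denominator = np.sum(weights * flat ** 2, axis=0)
    return numerator / denominator, 1.0 / denominator

# Toy data (all values are made up for illustration).
rng = np.random.default_rng(17)
n_rows, n_cols = 11, 64
rows = np.arange(n_rows)[:, None]
profile = np.exp(-0.5 * ((rows - 5.0) / 1.5) ** 2)       # cross-dispersion profile
flat = 1000.0 * profile * np.ones((1, n_cols))            # featureless flat lamp
columns = np.arange(n_cols)
true_spectrum = 1.0 - 0.5 * np.exp(-0.5 * ((columns - 32.0) / 2.0) ** 2)
variance = np.full((n_rows, n_cols), 4.0)
noise = rng.normal(0.0, 2.0, size=(n_rows, n_cols))
science = flat * true_spectrum[None, :] + noise

spectrum, spectrum_variance = flat_relative_extract(science, flat, variance)
```

The strong assumption is visible in the sums over `axis=0`: every pixel in a column shares one spectral value. Evaluating a continuous spectral model at each pixel's own wavelength would replace that per-column amplitude with a per-pixel prediction.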
2022-04-29
2022-04-28
wacky shape scalars
Kate Storey-Fisher (NYU) showed me today the results of her work predicting stellar contents of dark-matter halos in hydrodynamic n-body simulations. She is building her shape scalars from geometric properties (scalars, vectors, and tensors) of the position-space and velocity-space distributions of the n-body particles. She did a very principled feature-importance study, including one feature at a time, combinatorically, and seeing how each feature helps, differentially. The most important features are... strange! Why? Because most of the regression work is done by very simple features (halo mass, halo size, halo velocity) so the (dimensionless) shape scalars we have made are fixing up non-trivial problems. Time to write the paper!
2022-04-27
Dr Yucheng Zhang
Today Yucheng Zhang (NYU) defended his PhD. He used SDSS eBOSS large-scale structure samples to test gravity on large scales, and also made forecasts for measuring the non-Gaussianity parameter fnl and other very-large-scale-structure measurements in upcoming surveys. Beautiful work and a very nice defense. In the question period, Kate Storey-Fisher (NYU) asked Zhang about his possible forecasts for the upcoming ESA Gaia sample of 6.4 million quasars. Zhang has not considered this sample yet (almost no cosmologists have!) but he said that he does have the technology to make predictions for it. His intuition is that it would be great for measuring the baryon acoustic feature and fnl. We plan to take Zhang out to lunch to discuss in the near future!
2022-04-26
information loss
I wrote words today about how information is being lost in radial-velocity-spectrograph data-analysis pipelines at the stage of going from 2D spectra to 1D spectra. I am proposing to NASA (with Matt Daunt, NYU) to fix these problems! This is important, in my opinion, but I have to admit that it is not currently considered the tall pole in EPRV.
2022-04-25
exoplanet roadmaps, plans, and surveys
Inspired by research by Matt Daunt (NYU), I looked at the various reports, presentations, and papers that have been written by NASA panels, committees, and projects about the tall poles and engineering gaps in the exoplanet research ecosystem. Why? Writing a proposal, of course! Daunt and I are proposing to work very close to the metal in radial-velocity work, so we are looking at the critical infrastructure that's close to the metal.
2022-04-22
radio reboot
[Somehow this blog keeps failing. I will try to get back into it, but no promises! I apologize to my loyal reader.]
Today I met with Abby Shaum (IPAC) who worked with me a few years ago making a phase demodulator to find stellar companions. The idea is that if a star is broadcasting a coherent (or even incoherent) asteroseismic or pulsation mode, and if the star is orbiting a companion, the kinematics of the orbit will be imprinted on phase and frequency modulations of the carrier frequency. Like a radio! Indeed we built a signal-processing method that looks just like a radio demodulator. Today we discussed how to reboot this project and write a paper for the refereed literature.
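To make the radio analogy concrete, here is a toy quadrature (I/Q) demodulator. It is a sketch under stated assumptions, not the method Shaum and I built: a single coherent mode at a known frequency, uniform time sampling, a circular orbit, and made-up numbers for the mode frequency, orbital period, and light-travel-time delay.

```python
import numpy as np

# Toy signal (all values are hypothetical).
rng = np.random.default_rng(42)
dt = 0.01                     # sampling interval, days
t = np.arange(0.0, 200.0, dt)
f0 = 10.0                     # carrier (pulsation) frequency, cycles per day
orbital_period = 50.0         # days
delay_amplitude = 0.002       # light-travel-time delay amplitude, days

true_delay = delay_amplitude * np.sin(2.0 * np.pi * t / orbital_period)
carrier = np.cos(2.0 * np.pi * f0 * (t - true_delay))
flux = carrier + rng.normal(0.0, 0.05, size=t.size)

def demodulate(t, flux, f0, window_samples):
    """Mix the signal down to baseband, low-pass it, and convert phase to delay."""
    mixed = flux * np.exp(-2j * np.pi * f0 * t)            # shift carrier to zero frequency
    kernel = np.ones(window_samples) / window_samples      # boxcar low-pass filter
    baseband = np.convolve(mixed, kernel, mode="same")     # removes the 2 * f0 term
    phase = np.angle(baseband)                             # equals -2 pi f0 * delay
    return -phase / (2.0 * np.pi * f0)

# A 100-sample window spans a whole number of carrier cycles (10 of them).
estimated_delay = demodulate(t, flux, f0, window_samples=100)
interior = slice(200, -200)   # drop the edges, where the boxcar is incomplete
```

The recovered `estimated_delay` traces the orbit: its period is the orbital period and its amplitude is the projected light-travel time across the orbit.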
2022-04-11
sailing
I gave a seminar at lunch today (blackboard talk) about how sailboats work. I got lots of great comments and questions, especially about sailing downwind faster than the wind. I vowed to add a paragraph to my paper on sailing (with Matt Kleban) about how to sail this way. I think it is extremely hard to do, technically. So much so that some of the books on sailing say that it is impossible! It isn't, in principle.
2022-04-06
extragalactic stellar stream
Sarah Pearson (NYU) is working on modeling a stellar stream (disrupted satellite galaxy) around an external galaxy. The goal is to figure out what observables are most critical, and what properties of the host galaxy are most strongly constrained by a good model. That is, information theory. Pearson showed beautiful results today to Adrian Price-Whelan (Flatiron) and me: She can show that the mass of the galaxy's dark-matter halo is covariant with velocity gradients along the stream. Those would be hard to measure but not impossible. One high-level objective is to understand what would be the scientific merit of a big program with new imaging data and follow-up spectroscopy.
2022-04-05
simulating BpRp spectra
I had an early meeting with Maddie Lucey (UT Austin) and Adrian Price-Whelan (Flatiron) about simulating ESA Gaia BpRp spectra. Lucey has this technology and can simulate stars with any parameters. We discussed making a fake-data set that we can use to test ideas and methods we would like to use after Gaia DR3 in June. We ended with a plan to simulate matched BpRp spectra, one for each APOGEE DR17 spectrum. Lucey is on the case. Let us know if you are interested in doing preparatory science with such a data set!
2022-04-04
how do clustering results scale with survey size?
I spoke with Abby Williams (NYU) and Kate Storey-Fisher (NYU) today about Williams's forecasts for measuring cosmological-scale gradients in the large-scale structure. We came up (many moons ago) with approximate scalings with survey volume, the number of tracers, and the amplitude of the clustering. Some of these are obeyed by Williams's results and some aren't! What gives? We think it might have to do with the occupation number of the modes. If the number density of tracers is high, the clustering precision depends on volume, not galaxy number density.
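One way to see the occupation-number point is the textbook Gaussian approximation for the fractional uncertainty on a band power, in which the number of modes scales with volume and shot noise enters through the product of number density and power. This is a sketch of that standard formula, not of the scalings we derived for gradients, and the numbers are made up.

```python
import numpy as np

def fractional_power_error(volume, number_density, power, k, delta_k):
    """Gaussian-approximation fractional error on P(k) in a band of width delta_k."""
    n_modes = volume * k ** 2 * delta_k / (2.0 * np.pi ** 2)
    occupation = number_density * power        # tracers per mode, roughly
    return np.sqrt(2.0 / n_modes) * (1.0 + 1.0 / occupation)

# Hypothetical survey: volume in (Mpc/h)^3, power in (Mpc/h)^3, k in h/Mpc.
volume, power, k, delta_k = 1.0e9, 1.0e4, 0.1, 0.01

dense = fractional_power_error(volume, 1.0e-2, power, k, delta_k)
denser = fractional_power_error(volume, 2.0e-2, power, k, delta_k)
sparse = fractional_power_error(volume, 1.0e-6, power, k, delta_k)
sparser = fractional_power_error(volume, 2.0e-6, power, k, delta_k)
bigger = fractional_power_error(2.0 * volume, 1.0e-2, power, k, delta_k)

print("doubling density, high occupation:", denser / dense)
print("doubling density, low occupation: ", sparser / sparse)
print("doubling volume:                  ", bigger / dense)
```

At high occupation number, doubling the tracer density changes the precision by less than a percent, while doubling the volume always buys a factor of the square root of two. At low occupation number, doubling the density nearly halves the error.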
2022-04-01
distributions of dimensionless quantities
In finishing up our paper on dimensional analysis for machine learning, Soledad Villar and I have been discussing how to talk about out-of-distribution generalization of machine-learning methods. The space of dimensionless quantities is smaller in many ways, but I couldn't figure out how to argue that it is easier to match the test data to the training data in the dimensionless quantities than in the original, dimensional inputs. Villar pointed out that one way to see it is that many different distributions in the dimensional quantities map to the same distribution in the dimensionless quantities. For example, if you multiply all the masses by five, you haven't changed the distribution in the mass ratios, even though your mass distributions will no longer overlap. That's a good argument, and what we ended up arguing in the paper.
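The mass-ratio example is easy to check numerically. In this toy (with arbitrary mass ranges and sample sizes), the training and test masses do not overlap at all, yet the mass ratios are identical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training set: two masses per object, in arbitrary units.
train_m1 = rng.uniform(1.0, 2.0, size=1000)
train_m2 = rng.uniform(1.0, 2.0, size=1000)

# Test set: the same objects with every mass multiplied by five.
test_m1 = 5.0 * train_m1
test_m2 = 5.0 * train_m2

# Dimensional inputs: the test set is entirely out of distribution.
masses_overlap = test_m1.min() <= train_m1.max()

# Dimensionless inputs: the distributions coincide.
train_ratio = train_m1 / train_m2
test_ratio = test_m1 / test_m2
ratios_match = np.allclose(train_ratio, test_ratio)

print("masses overlap:", masses_overlap)
print("ratios match:  ", ratios_match)
```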
2022-03-31
GPRV, day 4
Today was day 4 and the last day of GPRV in Oxford. The day ended with a discussion led by Heather Cegla (Warwick) and Jennifer Burt (JPL) about EPRV and national priorities. Exoplanet science is obviously extremely important to the 2021 Decadal Survey, but in detail, the first seven chapters of that Survey (the chapters to which NASA and NSF must respond) do not actually mention radial velocity! The conversation in the room today was extremely wide-ranging; it covered hardware, software, science goals, and community-building goals. It also covered time-scales of months, years, and decades.
The highest level recommendation of the Decadal Survey was that we need to do preparatory work to design and assess feasibility of a large IR–visible–UV telescope that will discover habitable worlds. There is no doubt (I think it's uncontroversial) that this preparatory work will require lots and lots of EPRV science and observations. Of course the fact that this is obvious is separated somewhat from the question of whether there will be abundant funding!
It will come as no surprise to my loyal reader that I was a proponent, in this discussion, of building open-science communities around open data, open-source software, and open science collaborations. I think we have so much evidence now that open-science communities do science way better. What I loved is that there were absolutely no objections in the room to this idea. The only controversies were about exactly how open data should be managed and released in that utopian future. I'm optimistic about this business!
And I thank Suzanne Aigrain (Oxford) and her OC for a great meeting!
2022-03-30
GPRV, day 3
Day 3 of GPRV continued great! There were a few talks and discussions of very young stars that got everyone in the room quite excited, from Di Maio (INAF), Suárez Mascareño (IAC), and Nielsen (Oxford). The activity signals are huge, but the planets are extremely interesting, so how do we approach this? Tons of observing time? Cleverness? Give up? Of course I think it is so important to understand how planetary systems form and evolve that I would be willing to spend the telescope time.
In the morning, Luger (Flatiron) gave a seminar and then a tutorial about modeling stellar surfaces and predicting spectroscopic quantities. The tutorial was fun; his code Starry does everything an astronomer could want, and beautifully (and, of course, blazingly fast). We had fun playing with it in a group hacking session.
2022-03-29
GPRV, day 2
Today was day two of GPRV. It was a delight! Here's another highly unfair summary of the day:
Hara (Geneva) kicked it off with a discussion of a Bayesian-decision-theory-like method for deciding on the reality and correctness of exoplanet discoveries. He made clever choices to deliver really strong probabilistic results. I was about to object to all this but then he disarmed me at the end of the talk by noting that everything is extremely sensitive to noise models and that is the biggest issue. He gave some chilling examples.
Shahaf (Tel Aviv) showed some very nice results in old-school statistics that generalize the periodogram to a correlation between phase differences and distances between pairs of any quantities you like, as a function of period. This can be used to perform causal inferences for periods seen in naive periodograms. He uses a very interesting phase variable for these phase differences; this is extremely relevant to things I have been discussing with Zhao and Bedell.
Mortier (Cambridge) mesmerized the room with her work with six (yes 6) years of solar data from HARPS-N (maybe). She can show amazing relationships between pipeline RVs, activity indicators, and spectral shape measures. But she showed that often the correlations are not at zero-time-lag. Often the correlations are strongest with delays of 1/9 to 1/8 of a rotation period. When she sub-samples the data to typical kinds of long-term monitoring campaigns we are going to do on distant stars, it is a bit scary. That led to a lot of discussion over lunch and dinner.
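A lagged correlation of this kind can be sketched with toy data. Everything here is an assumption for illustration: the rotation period, the sinusoidal signals, the daily cadence, the noise level, and the choice that the RV signal trails the indicator by 1/8 of a rotation period. None of it comes from the HARPS-N solar data.

```python
import numpy as np

rng = np.random.default_rng(3)
rotation_period = 24.0               # days (hypothetical)
true_lag = rotation_period / 8.0     # days; the RV trails the indicator in this toy
cadence = 1.0                        # days between observations
t = np.arange(0.0, 20.0 * rotation_period, cadence)

indicator = np.sin(2.0 * np.pi * t / rotation_period)
indicator = indicator + rng.normal(0.0, 0.05, size=t.size)
rv = np.sin(2.0 * np.pi * (t - true_lag) / rotation_period)
rv = rv + rng.normal(0.0, 0.05, size=t.size)

def lagged_correlation(x, y, shift):
    """Pearson correlation between x(t) and y(t + shift), with shift >= 0 in samples."""
    if shift == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[:-shift], y[shift:])[0, 1]

shifts = np.arange(0, 13)            # search lags up to half a rotation period
correlations = np.array([lagged_correlation(indicator, rv, s) for s in shifts])
best_lag = shifts[np.argmax(correlations)] * cadence

print("zero-lag correlation:", correlations[0])
print("best lag in days:    ", best_lag)
```

The zero-lag correlation is noticeably weaker than the peak, which is the worry: an analysis that only regresses RVs against simultaneous indicators leaves signal on the table. With sparse, irregular sampling the lag is much harder to find than on this uniform grid.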
Zhao (Flatiron), Buchhave (DTU), and Dumusque (Geneva) led discussions on community-building, hardware, instrument calibration, and other things. The meeting is set up for lots of discussion and is itself an extremely good example of a community-building activity, around the hard challenges of EPRV. I opined in one of these sessions that EPRV now looks like cosmology around 2000, when everything was just about to go open and the world community started working together. This meeting is part of this change that we want to see.
Finally, a theme of the day was representations for spectral signals. Dumusque (inadvertently, I think) made a strong case that we should be working in the 2-d spectrograph images directly! That's music to my heart. He also emphasized that the stellar surface is a complex physical place. I agree! And Cretignier (Geneva) showed a beautiful representation of the spectral residuals to disentangle Doppler and spectral-variability variations. I think his work and Shahaf's could be combined in interesting ways; I am excited to get back to the lab.
2022-03-28
GPRV, day 1
The GPRV meeting started in Oxford today. The meeting brings together people working on data analysis in extreme-precision radial-velocity projects, united by interests in and uses of Gaussian processes. The first day ended with a very nice tutorial by Foreman-Mackey (Flatiron) on applied-math and computational tools for scalable Gaussian processes. He even live-coded and blew everyone's mind with JAX in Python.
Many talks (including those by Barragán (Oxford), Delisle (Geneva), and Tran (UT Austin), to name a few) are using Gaussian processes and their derivatives, or two Gaussian processes, to model the star's variability, with photometry, radial-velocity measurements, and activity indicators modeled as linear combinations of these latent processes. That's a really interesting theme, and connects somehow to my evil plan (with Bedell, Luger, Zhao, et al.) of modeling the whole stellar surface. It is definitely an exciting time.
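Schematically, and as a generic form rather than any one speaker's model, the latent-process idea can be written as follows. The symbols are assumptions for illustration: j indexes the observed time series (photometry, RV, each activity indicator), G is the latent Gaussian process with kernel k, and the coefficients a_j and b_j are free parameters.

```latex
\begin{aligned}
y_j(t) &= a_j\, G(t) + b_j\, \frac{\mathrm{d}G}{\mathrm{d}t}(t) + \epsilon_j(t), \\
G(t) &\sim \mathcal{GP}\big(0,\; k(t, t')\big).
\end{aligned}
```

The two-process variant replaces the derivative with a second latent process, so that each observed series is a linear combination of G_1 and G_2. Because the derivative of a Gaussian process is itself a Gaussian process, all the cross-covariances between the observed series follow from k and its derivatives.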
One issue that came up is how to judge or assess over-fitting. There was no consensus or answer, and most of the GP practitioners are very Bayesian. But Bayesian approaches aren't always sensitive to true statistical violations of the model; I want to see some cross-validation in this house.
In other news, Halverson (JPL) told us about publicly available solar data (and lots of it) from NASA NEID. I might want to play with that when I get home!