
2023-11-16

grant proposals

There is a non-wrong view of academic science that it is all about applying for funding, and evaluating the proposals of others for funding. That's all I did today (evaluated proposals for a foreign funding program; I submitted my own proposal to the NSF yesterday).

2023-07-31

raw data from Cassini

One thing we discovered this past academic year is that NASA's Cassini mission took more than 300,000 images of Saturn's rings! Today I met with Maya Nesen (NYU) and Ana Pacheco (NYU) to look at Cassini raw spacecraft data. Nesen is working on the tabulated housekeeping data, which give the position and orientation of the spacecraft and instruments in various coordinate systems (that we are still trying to work out). Pacheco is working on the raw imaging data from the imaging module. We discussed how to display the images so that an astronomer can confirm the noise level and rough noise properties in the pixels. We also discussed adjustments to our plots of the housekeeping data to aid in our interpretation of it. In particular, we looked at some of the camera-related metadata, and it looks like the camera might have a few different zoom settings. I guess we have to read some documentation!

2023-04-09

coordinate-free reading?

The world is O(3) equivariant. Meaning: The laws of physics don't depend on the orientations of things, nor do they depend on the orientation of your coordinate system. But handwriting—and printed words—are not equivariant: Writing systems have a definite orientation and parity. Indeed, it can be hard to read things when they are reversed in a mirror or at an odd angle. Pick up a paper from your desk and read it. Before you start, you have to orient it. How do you do that?

My answer is: Context. I think you try different orientations until one seems to work for the reading. You can't always tell from a single letter (like an M or a W or an O), but you can tell once a string of a few letters or numbers is visible. Inspired by all this, Villar and I are making this data set (among others) for learning and reasoning tasks.
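The "try different orientations" idea can be made concrete: a raster image of text has 8 candidate orientations, the elements of the dihedral group (4 rotations times an optional mirror flip). A minimal numpy sketch, where the tiny `glyph` array is just an invented asymmetric test pattern (not our actual data set):

```python
import numpy as np

def dihedral_orientations(img):
    """All 8 orientations of a raster image: 4 rotations x optional mirror."""
    out = []
    for flipped in (img, np.fliplr(img)):
        for k in range(4):
            out.append(np.rot90(flipped, k))
    return out

# a tiny asymmetric "glyph", chosen so that all 8 orientations are distinct;
# a reader (human or machine) would score each orientation against its model
# of text and keep the best one
glyph = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 0, 0]])
orients = dihedral_orientations(glyph)
```

A symmetric glyph (like an O) would produce fewer distinct orientations, which is exactly why single letters are ambiguous and short strings are not.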

2022-09-28

can a (say) 5-sigma result be interpreted as a p-value?

In preparing for class today (I am teaching NYU #data4physics), I worked through the relationship between a p-value (like what's used in medical research) and a physicist's n-sigma measurement. They are related in some very special cases, like in particular when the value being measured is a linear parameter (like an amplitude) and the noise is Gaussian. But those cases are special. And also: Converting n-sigma to p-value depends very critically on the noise model. So I don't like thinking of it as a p-value. That said, maybe there is no difference?
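In the special Gaussian-linear case mentioned above, the conversion is just a Gaussian tail integral; everything else about it is noise-model dependent. A sketch (note the factor of two: the famous 5-sigma "discovery" p-value of about 2.9e-7 is the one-sided tail):

```python
import math

def two_sided_p(n_sigma):
    """p-value for a deviation of |n_sigma| or more, valid ONLY when the
    estimator is Gaussian-distributed about the null value."""
    return math.erfc(n_sigma / math.sqrt(2.0))

p5 = two_sided_p(5.0)       # roughly 5.7e-7 (two-sided)
p5_one_sided = p5 / 2.0     # roughly 2.9e-7 (one-sided)
```

The point in the text stands: this conversion is only as trustworthy as the Gaussian noise model behind it.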

2022-05-09

discretized vector calculus

On Friday, Will Farr (Flatiron) suggested to me that the work I have been doing (with Soledad Villar) on image-convolution operators with good geometric and group-theoretic properties might be related somehow to discretized differential geometry. It is! I tried to read some impenetrable papers, but my main takeaway is that I have to understand this field.

2022-04-25

exoplanet roadmaps, plans, and surveys

Inspired by research by Matt Daunt (NYU), I looked at the various reports, presentations, and papers that have been written by NASA panels, committees, and projects about the tall poles and engineering gaps in the exoplanet research ecosystem. Why? Writing a proposal, of course! Daunt and I are proposing to work very close to the metal in radial-velocity work, so we are looking at the critical infrastructure that's close to the metal.

2022-03-14

me reading?

As my collaborators and friends know, if there is one thing I hate to do, it is spend all day reading the literature. I love and respect the literature! But don't make me actually read it. But today I sucked it up and read some 20-ish papers about characterizing dark-matter halo shapes, to find out if the coordinate-free shape measurements that Kate Storey-Fisher (NYU) and I are making are new. I think they are! In almost every paper I read, the word “shape” translated to eigenvalues of the positional variance tensor, or maybe ratios of those. Am I wrong?
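For concreteness, here is what that standard "shape" computation looks like, on a mock particle cloud (the triaxial Gaussian with 3:2:1 axis lengths is invented for illustration, not a real halo):

```python
import numpy as np

rng = np.random.default_rng(17)

# mock "halo": particles drawn from a triaxial Gaussian, axis lengths 3:2:1
axes = np.array([3.0, 2.0, 1.0])
xyz = rng.normal(size=(20000, 3)) * axes

# the standard "shape": eigenvalues of the second-moment (variance) tensor
tensor = np.cov(xyz, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(tensor))[::-1]
axis_ratios = np.sqrt(eigvals / eigvals[0])   # (1, b/a, c/a)
```

These eigenvalue ratios are rotation-invariant, but they compress "shape" down to a best-fit ellipsoid, which is part of why genuinely different coordinate-free measurements could be new.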

2021-07-05

physics-of-sailing literature

I sucked it up and read a bunch of the physics-of-sailing literature today (and on the weekend). Some of the books very correctly attribute the forces on sails and wings to momentum transport. Some of the books very incorrectly attribute them to differences of pressure calculable from Bernoulli effect alone. But in reading it all, I did come to the conclusion that no-one is working in precisely the space we want to work, so I do think there is a (correctly scoped) paper to write. Of course even if there weren't, I couldn't stop myself!

2020-07-27

exoplanet transit science

I have had a good day finishing up my reading of the PhD dissertation of Emily Sandford (Columbia), who has a great collection of results on transit measurements of planets and stars. She makes use of Bayesian methods and also classic optimization or signal-processing methods to make challenging inferences, and she shows the limits and capabilities of current and future data. One thing I liked was that she works on what are called “single transits” where only one transit is found in the whole survey duration; what can you infer from that? A lot! (I have also worked on this problem long ago.) In the dissertation she busts a myth that the multiplicity distribution requires that there be a mixture of two qualitatively different kinds of planetary systems. I enjoyed that, and it leads to a lot of other science. Plus some creative work on understanding the detailed properties of multi-planet systems, treating them like sequence data. It's a great thesis and I am very much looking forward to tomorrow's defense.

2020-07-19

machine learning inside a physical model

As my loyal reader knows, I love putting machine learning inside a physical model. That is, not just using machine learning, but re-purposing machine learning to play a role in modeling a nuisance we don't care about inside our physical model. It's similar to how the best methods for causal inference use machine learning to capture the possibly complex and adversarial effects of confounders. Today I had the pleasure of reading closely a new manuscript by Francois Lanusse (Paris) that describes a use of machine learning to model galaxy images, but putting that model inside a causal structure (think: directed acyclic graph) that includes telescope optics and photon noise. The method seems to work really well.
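To make that causal structure concrete, here is a toy version of the chain latent → galaxy image → optics → photon noise, with the learned galaxy model replaced by a fixed analytic stand-in. Everything here (the `galaxy_model` parameterization, the PSF width) is invented for illustration and is not Lanusse's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(42)

# stand-in for a learned galaxy-image model: latent vector -> image;
# in the real work this node would be a trained neural network
def galaxy_model(z, npix=16):
    yy, xx = np.mgrid[:npix, :npix] - (npix - 1) / 2.0
    r2 = (xx / np.exp(z[0])) ** 2 + (yy / np.exp(z[1])) ** 2
    return np.exp(z[2]) * np.exp(-0.5 * r2)   # elliptical Gaussian blob

def psf_convolve(img, sigma=1.0):
    # known telescope optics: Gaussian PSF applied via FFT
    npix = img.shape[0]
    k = np.fft.fftfreq(npix)
    mtf = np.exp(-2 * (np.pi * sigma) ** 2 * (k[:, None] ** 2 + k[None, :] ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mtf))

# the causal chain: latent -> galaxy image -> optics -> photon (Poisson) noise
z = np.array([0.5, 0.2, 4.0])
scene = galaxy_model(z)
blurred = psf_convolve(scene)
counts = rng.poisson(np.clip(blurred, 0, None))
```

The point of the structure is that only the first node is learned; the optics and noise nodes are physics, so inference about the latent galaxy respects them exactly.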

2020-06-25

reading a difficult (to me) paper

I participated in day 3 of #sdss2020 today, and even started to pitch a project that could make use of the (literally) millions of unassigned fiber–visits in SDSS-V. Yes, the SDSS-V machines are so high-throughput that, even doing multiple, huge surveys, there will be millions of unassigned fiber–visits. My pitch is with Adrian Price-Whelan; it is our project to get a spectrum of every possible “type” of star, where we have a completely algorithmic definition of “type”. More on this tomorrow, I hope.

In the afternoon, I spent time with Soledad Villar (NYU) reading this paper (Hastie et al 2019) on regression. It contains some remarkable results about what they call “risk” (and I call mean squared error) in regression. This paper is one of the key papers analyzing the double-descent phenomenon I described earlier. The idea is that when the number of free parameters of a regression becomes very nearly equal to the number of data points in the training set, the mean squared error goes completely to heck. This is interesting in its own right—I am learning about the eigenvalue properties of random matrices—but it is also avoidable with regularization. The paper explains both why and how. Villar and I are interested in avoiding it with dimensionality reduction, which is another kind of regularization, in a sense.

Related somehow to all this, I have been reading a new (new to me, anyway) book on writing, aimed at mathematicians. The Hastie et al paper is written by math-y people, and it has some great properties, like giving a clear summary of all of its findings up-front, a section giving the reader intuitions for each of them, and clear and timely reminders of key findings along the way. It's written almost like a white paper. It's refreshing, especially for a non-mathematician reader like me. As you may know, I can't read a paper that begins with the word Let!

2020-03-17

visualizing the kinematics of the disk

My only research today was reading and signing off on a paper by Jason Hunt (Flatiron) about the kinematics of stars in the Milky Way disk. His innovation was to plot the stars in something akin to action-angle coordinates (guiding-center-position coordinates). It's a good space to look at spiral structure and other velocity substructure. And to compare to simulations. The visualizations in the paper are lovely.

2020-02-16

LVM self-calibration

I read and commented on some documents today related to the calibration of the Local Volume Mapper part of the SDSS-V family of projects. It is an intensity-mapping project to observe the interstellar medium in the Milky Way and nearby galaxies, using one spectrograph but many different telescopes (with different apertures). It's clever! The question is: Does this project need calibration telescopes in addition to the science telescope? My position is that it doesn't. Well, calibration telescopes might be very useful for debugging things and understanding things! But at the end of the day, calibration will be self-calibration, I bet. I'm offering very good odds.

One point is the following: When you have an imager or a spectrographic imager, you have to calibrate so that every exposure has calibration consistent with every other exposure, and every pixel has calibration consistent with every other pixel. Good! Now imagine you introduce a calibration telescope. Now you have to do the same for the calibration system, and you have to understand the cross-calibration between the systems (science and calibration). So it greatly increases the difficulty of the task, introduces new variables, and (usually) reduces precision of the final results. The self-consistency of the science data (provided that it is properly taken) is always the strongest constraint on calibration. See, for example, Planck, WMAP, SDSS, PanSTARRS, and so on.
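The self-calibration argument can be seen in a toy linear model: if each exposure applies an unknown zero-point offset to every star's true magnitude, the overlaps alone determine all the offsets up to one overall constant (the gauge), with no calibration telescope in sight. A sketch with invented numbers:

```python
import numpy as np

# toy self-calibration: obs[i, j] = m_true[i] + z_true[j]
m_true = np.array([12.0, 13.5, 14.2, 15.1, 16.0])   # star magnitudes
z_true = np.array([0.0, 0.3, -0.2, 0.15])           # exposure zero points
obs = m_true[:, None] + z_true[None, :]             # every star in every exposure

n_star, n_exp = obs.shape
# unknowns: [m_0..m_4, z_1..z_3]; the gauge is fixed by forcing z_0 = 0
A = np.zeros((n_star * n_exp, n_star + n_exp - 1))
b = np.zeros(n_star * n_exp)
for i in range(n_star):
    for j in range(n_exp):
        row = i * n_exp + j
        A[row, i] = 1.0           # star-magnitude unknown
        if j > 0:
            A[row, n_star + j - 1] = 1.0   # zero-point unknown
        b[row] = obs[i, j]
x, *_ = np.linalg.lstsq(A, b, rcond=None)
m_fit, z_fit = x[:n_star], np.concatenate([[0.0], x[n_star:]])
```

Real self-calibration (ubercal-style, as in the surveys listed above) is this same least-squares problem at enormous scale, with partial overlaps and noise; adding a calibration telescope adds columns and cross-terms to this system rather than removing any.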

2019-08-09

selection, EPRV, spirals

[I have been on travel of various kinds, mostly non-work, for almost two weeks, hence no posts!]

While on my travels, I wrote in my project about target selection for spectroscopic surveys (with Rix) and my project about information theory and extreme-precision radial-velocity measurement (with Bedell). I also discovered this nice paper on Cepheid stars in the disk, which is a highly relevant position-space complement to what Eilers and I have been doing in velocity space.

2019-04-22

time-domain speckle models

I spent time on the long weekend and today working through the front parts of a new paper by Matthias Samland (MPIA), who is applying ideas we used for our pixel-level model for Kepler data to high-contrast (coronagraphic) imaging. Most high-performance data pipelines for coronagraph imaging model the residual speckles in the data with a data-driven model. However, most of those models are spatial models: They are models for the imaging or for small imaging patches. They don't really capture the continuous time dependence of the speckles. In Samland's work, he is building temporal models, which don't capture the spatial continuity but do capture the time structure. The best possible methods I can imagine would capture some of both. Or really the right amount of both! But Samland's method is good for working at very small “inner working angle”, where you don't have much training data for a spatial model because there just isn't that much space very near the null point.
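A cartoon of the temporal-model idea: treat the data cube as a (time × pixel) matrix and model the speckles as a few temporal components, here via a plain truncated SVD on synthetic rank-2 data. Samland's actual method is more sophisticated than this sketch:

```python
import numpy as np

rng = np.random.default_rng(8)

# synthetic cube: T frames of P pixels whose speckles are a rank-2 temporal
# pattern (two time series, each modulating its own spatial map)
T, P, rank = 100, 64, 2
time_series = rng.normal(size=(T, rank))
spatial_maps = rng.normal(size=(rank, P))
cube = time_series @ spatial_maps

# temporal low-rank model: keep the leading k components of the SVD
k = 2
U, s, Vt = np.linalg.svd(cube, full_matrices=False)
model = U[:, :k] * s[:k] @ Vt[:k]
residual = cube - model
```

A spatial model would instead factor over local pixel patches; a hybrid would constrain both axes, which is roughly the "right amount of both" wished for above.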

2019-04-08

student projects; 2-pt function estimators

Most of my research time over the weekend and today was taken up reading proposals for a funding review. That doesn't count as research, by my Rules. I don't love that part of my job. But I did get in some time with students, reading thesis chapters by Malz (NYU), planning two papers with Storey-Fisher (NYU), and discussing graduate school options with Birky (UCSB). I love these parts of my job!

In the conversation with Storey-Fisher, we set the minimal (though still very large) scope for a paper that compares and tests large-scale structure correlation-function estimators on realistic and toy data. Our issues are: We have identified biases in the standard estimators, and we (additionally) don't love the tests or arguments that say that Landy–Szalay is optimal. So we want to test them again, and also add some new estimators, from the math literature on point processes.
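For reference, the standard Landy–Szalay estimator we want to test is (DD − 2DR + RR)/RR on normalized pair counts. A brute-force sketch (the point sets and bin edges are invented, and real analyses use tree-based pair counting, not this O(N²) loop):

```python
import numpy as np

def pair_counts(a, b, edges):
    """Histogram of pairwise separations between point sets a and b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    if a is b:
        d = d[np.triu_indices(len(a), k=1)]   # unordered pairs, no self-pairs
    else:
        d = d.ravel()                         # all cross pairs
    return np.histogram(d, bins=edges)[0].astype(float)

def landy_szalay(data, rand, edges):
    nd, nr = len(data), len(rand)
    dd = pair_counts(data, data, edges) / (nd * (nd - 1) / 2)
    rr = pair_counts(rand, rand, edges) / (nr * (nr - 1) / 2)
    dr = pair_counts(data, rand, edges) / (nd * nr)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (dd - 2 * dr + rr) / rr

rng = np.random.default_rng(0)
pts = rng.random((50, 2))                 # "data" in the unit square
edges = np.linspace(0.05, 0.5, 6)         # separation bins (zero sep excluded)
xi = landy_szalay(pts, pts.copy(), edges) # "randoms" set equal to the data
```

Even this tiny example shows a normalization quirk: with the "randoms" identical to the data, the estimator returns 2/N per bin rather than exactly zero, a small-N effect of the pair-count normalizations (not necessarily one of the biases we identified).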

2018-10-10

#DSESummit2018, day 1

Today was the start of the annual Moore-Sloan Data Science Environments summit. I led an ice-breaker in which we split into small groups and discussed figures and data visualizations. It's a great community, so it was fun to get started. But as for research: I read and commented on text for Bedell (Flatiron) on the plane, and I worked with Richard Galvez (NYU) on designing a small project that brings machine learning to the Gaia data.

2018-08-21

what is a methods paper?

Dustin Lang (Toronto) and I spent time discussing this strongly worded paper about combining images. The paper makes extremely strong claims about what its method for co-adding images can do; it claims that the combined image is “optimal” for any measurement of any time-independent quantity of any kind in the original images. The word “optimal” is one I'm allergic to: For one, once you have written down your assumptions with sufficient precision, there is no optimal method, there is just one method! For two, optimal must be for a specific purpose, or set of assumptions. So while it is probably true (we are looking at it) that this paper delivers a method that is optimal for some purposes, it cannot be for any purpose whatsoever.

I guess I have a few general philosophical points to make here: Methods flow from assumptions! So if you can clearly state a complete set of assumptions, you will fully specify your method; there will be only one method that makes sense or is justifiable under those assumptions. It will therefore be (trivially) optimal for your purposes. That is, any well-specified method is optimal, by construction. And methods are defined not by what they do but by what they don't do. That is, your job when you deliver a method is to explain all the ways that it won't work and will fail and won't be appropriate in real-world situations. Because most people are trying to figure out if their problem is appropriate to your method! This means that much of the discussion of a methodological paper should be about how or why the method will fail or become wrong as the assumptions are violated. And finally, a method that relies on you knowing your PSF perfectly, and having perfectly registered images, and having no rotations of the detector relative to the sky, and having all images through exactly the same atmospheric transmission, and having all images taken with the same filter and detector, and having no photon noise beyond background noise, and having perfect flat-fielding, and having identical noise in every pixel, and having absolutely no time variability is not a method that is optimal for any measurement. That said, the paper contains some extremely good and important ideas, and we will be citing it positively.
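A tiny instance of "assumptions fully specify the method": for perfectly registered images of a static scene with independent Gaussian noise of known per-image variance, the inverse-variance-weighted mean is the minimum-variance unbiased linear combination, and there is nothing left to choose. A sketch (per-image scalar variances are a simplifying assumption of this toy, not of any real pipeline):

```python
import numpy as np

def ivar_coadd(images, variances):
    """Inverse-variance-weighted coadd; optimal ONLY under the stated
    assumptions: registered images, static scene, independent Gaussian
    noise with known (here per-image) variances."""
    ivar = 1.0 / np.asarray(variances, dtype=float)
    weights = ivar / ivar.sum()
    coadd = np.tensordot(weights, images, axes=1)   # weighted mean per pixel
    coadd_var = 1.0 / ivar.sum()                    # variance of the coadd
    return coadd, coadd_var

imgs = np.full((3, 2, 2), 5.0)   # three registered images of a constant scene
coadd, coadd_var = ivar_coadd(imgs, [1.0, 2.0, 4.0])
```

Break any assumption (a variable source, correlated noise, different PSFs per image) and this "optimal" coadd stops being optimal, which is exactly the point about stating what a method won't do.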

2018-04-12

references

If there is one thing I am worse at than anything else in writing papers, it is properly reading and citing the relevant literature. I spent all morning working through the literature relevant to my Gaia likelihood-function paper. And I know I am still missing things.