Hogg's Research

submitted!

noreply@blogger.com (Hogg) — Sat, 16 Mar 2024 20:28:00 +0000

OMG I actually just submitted an actual paper, with me as first author. I submitted to the AAS Journals, with a preference for The Astronomical Journal. I don't write all that many first-author papers, so I am stoked about this. If you want to read it: It should come out on arXiv within days, or if you want to type pdflatex a few times, it is available at this GitHub repo. It is about how to combine many shifted images into one combined, mean image.

IAIFI Symposium, day two

noreply@blogger.com (Hogg) — Sat, 16 Mar 2024 03:14:00 +0000

Today was day two of a meeting on generative AI in physics, hosted by MIT. My favorite talks today were by Song Han (MIT) and Thea Aarestad (ETH), both of whom are working on making ML systems run ultra-fast on extremely limited hardware. Themes were: Work at low precision. Even 4-bit number representations! Radical. And bandwidth is way more expensive than compute: Never move data, latents, or weights to new hardware; work as locally as you can. They both showed amazing performance on terrible, tiny hardware. In addition, Han makes really cute 3d-printed devices! A conversation at the end that didn't quite happen is about how Aarestad's work might benefit from equivariant methods: Her application area is triggers in the CMS device at the LHC; her symmetry group is the Lorentz group (and permutations and etc). The day started with me on a panel in which my co-panelists said absolutely unhinged things about the future of physics and artificial intelligence. I learned that many people think we are only years away from having independently operating, fully functional artificial physicists that are more capable than we are.
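
To make "4-bit number representations" concrete, here is a minimal sketch of symmetric integer quantization. It is only an illustration of the general idea, not the method of either speaker; the function names and the example weights are made up for this sketch.

```python
def quantize_4bit(values):
    """Map floats to signed 4-bit integer codes in [-8, 7] with one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Assumption: a single scale for the whole list, chosen so the largest
    # magnitude lands on code 7.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.70, -0.33, 0.12, 0.00, -0.70]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as one of only 16 integer codes plus a shared scale, and the round-trip error is at most half the scale.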

IAIFI Symposium, day one

noreply@blogger.com (Hogg) — Fri, 15 Mar 2024 03:07:00 +0000

Today was the first day of a two-day symposium on the impact of Generative AI in physics. It is hosted by IAIFI and A3D3, two interdisciplinary and inter-institutional entities working on things related to machine learning. I really enjoyed the content today. One example was Anna Scaife (Manchester) telling us that all the different methods they have used for uncertainty quantification in astronomy-meets-ML contexts give different and inconsistent answers. It is very hard to know your uncertainty when you are doing ML. Another example was Simon Batzner (DeepMind) explaining that equivariant methods were absolutely required for the materials-design projects at DeepMind, and that introducing the equivariance absolutely did not bork optimization (as many believe it will). Those materials-design projects have been ridiculously successful. He said the amusing thing “Machine learning is IID, science is OOD”. I couldn't agree more. In a panel at the end of the day I learned that learned ML controllers now beat hand-built controllers in some robotics applications. That's interesting and surprising.

The Cannon and El Cañon

noreply@blogger.com (Hogg) — Tue, 12 Mar 2024 21:45:00 +0000

At the end of the day I got a bit of quality time in with Danny Horta (Flatiron) and Adrian Price-Whelan (Flatiron), who have just (actually just before I met with them) created a new implementation of The Cannon (the data-driven model of stellar photospheres originally created by Melissa Ness and me back in 2014/2015). Why!? Not because the world needs another implementation. We are building a new implementation because we plan to extend out to El Cañon, which will extend the probabilistic model into the label domain: It will properly generate or treat noisy and missing labels. That will permit us to learn latent labels, and de-noise noisy labels.

black holes as the dark matter

noreply@blogger.com (Hogg) — Tue, 12 Mar 2024 03:45:00 +0000

Today Cameron Norton (NYU) gave a great brown-bag talk on the possibility that the dark matter might be asteroid-mass-scale black holes. This is allowed by all constraints at present: If the masses are much smaller, the black holes evaporate or emit observably. If the masses are much larger, the black holes would create observable microlensing or dynamical signatures.

She and Kleban (NYU) are working on methods for creating such black holes primordially, by modifying the potential at inflation, creating opportunities for bubble nucleations in inflation that would subsequently collapse into small black holes after the Universe exits inflation. It's speculative obviously, but not ruled out at present!

An argument broke out during and after the talk whether you would be injured if you were intersected by a 10²⁰ g black hole! My position is that you would be totally fine! Everyone else in the room disagreed with me, for many different reasons. Time to get calculating.
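
As a first step in "getting calculating", here is a minimal sketch of the two most basic numbers: the Schwarzschild radius of a 10²⁰ g black hole, and its Newtonian gravitational acceleration at a distance of one meter. It does not settle the argument (that would need the encounter speed, tidal effects, and any emission); the function names and the one-meter distance are choices made for this sketch.

```python
G = 6.674e-11               # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8                 # speed of light, m/s
MASS_KG = 1.0e20 * 1.0e-3   # 10^20 g expressed in kg


def schwarzschild_radius(mass_kg):
    """Horizon radius r_s = 2 G M / c^2, in meters."""
    return 2.0 * G * mass_kg / C**2


def newtonian_acceleration(mass_kg, distance_m):
    """Gravitational acceleration G M / r^2, in m/s^2 (valid far outside r_s)."""
    return G * mass_kg / distance_m**2


r_s = schwarzschild_radius(MASS_KG)
accel_1m = newtonian_acceleration(MASS_KG, 1.0)
print(f"Schwarzschild radius: {r_s:.3e} m")
print(f"Acceleration at 1 m:  {accel_1m:.3e} m/s^2")
```

The horizon comes out of order 10⁻¹⁰ m, roughly the size of an atom, while the acceleration at one meter is millions of m/s². Whether that matters depends on how fast the black hole goes by.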

Another great idea: Could we find stars that have captured low-mass black holes by looking for the radial-velocity signal? I got really interested in this one at the end.

APOGEE spectra as a training set

noreply@blogger.com (Hogg) — Sun, 10 Mar 2024 14:46:00 +0000

I spent a lot of the day building a training set for a machine-learning problem set. I am building the training set out of the SDSS-V APOGEE spectra, which are like one-dimensional images for training CNNs and other kinds of deep learning tasks. I wanted relatively raw data, so I spent a lot of time going deep in the SDSS-V data model and data directories, which are beautiful. I learned a lot, and I created a public data set. I chose stars in a temperature and log-gravity range in which I think the APOGEE pipelines work well and the learning problem should work. I didn't clean the data, because I am hoping that contemporary deep learning methods should be able to find and deal with outliers and data issues. If you want to look at my training set (or do my problem set), start here.

getting the absolutely rawest APOGEE data

noreply@blogger.com (Hogg) — Sat, 09 Mar 2024 19:04:00 +0000

I spent time today (at the bar!) understanding the data model and directory structure for the raw, uncalibrated APOGEE data. The idea is that I want to do a real-data example for my paper with Casey (Monash) on combining spectra, and I want to get back to the raw inputs. I also might use these spectra for a problem set in my machine-learning class. The code I wrote is all urllib and request and re, because I think it is necessary to read directories to understand the data dependencies in the survey. Is that bad?

Putting aside my concerns: The coolest thing about this project is that the SDSS family of projects (currently SDSS-V) puts absolutely every bit of its data on the web, in raw and reduced form, for re-analysis at any level or stage. That's truly, really, open science. If you don't believe me, check out this code that spelunks the raw data. It's all just URL requests with no authentication!
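
For readers wondering what "reading directories" with urllib and re might look like, here is a minimal sketch. It is not the code linked above. It assumes the server returns a plain HTML index page with `href` links, and the base URL and sample listing are hypothetical placeholders, not real SDSS paths.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: directory index pages list their entries as href="..." links.
HREF_PATTERN = re.compile(r'href="([^"?#]+)"')


def parse_listing(html, base_url):
    """Return absolute URLs for the entries linked from a directory index page."""
    entries = []
    for href in HREF_PATTERN.findall(html):
        if href.startswith(("/", "..")) or "://" in href:
            continue  # skip parent-directory and absolute links
        entries.append(urljoin(base_url, href))
    return entries


def list_directory(url):
    """Fetch a directory index page (no authentication) and parse its entries."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_listing(html, url)


# Hypothetical listing, used here so the example runs without network access.
SAMPLE_HTML = '''
<a href="../">Parent Directory</a>
<a href="59000/">59000/</a>
<a href="59001/">59001/</a>
<a href="README.txt">README.txt</a>
'''
BASE = "https://example.org/raw/"
entries = parse_listing(SAMPLE_HTML, BASE)
subdirs = [e for e in entries if e.endswith("/")]
print(subdirs)
```

Calling `list_directory` on each entry of `subdirs` in turn would walk the tree one level at a time, which is one way to discover the data dependencies by reading the directories themselves.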

combining spectral exposures

noreply@blogger.com (Hogg) — Fri, 08 Mar 2024 18:56:00 +0000

I wrote words! I got back to actually doing research this week, in part inspired by a conversation with my very good friend Greg McDonald (Rum & Code). I worked on the words in the paper I am finishing with Andy Casey (Monash) about how to combine individual-visit exposures into a mean spectrum. The biggest writing job I did today was the part of the paper called “implementation notes”, which talks about how to actually implement the math on a finite computer.

the transparency of the Universe and the transparency of the university

noreply@blogger.com (Hogg) — Mon, 12 Feb 2024 21:39:00 +0000

The highlight of my day was a wide-ranging conversation with Suroor Gandhi (NYU) about cosmology, career, and the world. She made a beautiful connection between a part of our conversation in which we were discussing the transparency of the Universe, and new ways to study that, and a part in which we were discussing the transparency with which the University speaks about disciplinary and rules cases, which (at NYU anyway) is not very good. Hence the title of this post. On transparency of the Universe, we discussed how the fact that distant objects (quasars, say) do not appear blurry must put some limit on cosmic transparency. On transparency of the University, we discussed the question of how much we care about the behavior of our institutions, and about changing those behaviors. I'm a big believer in open science, open government, and open institutions.

I've been privileged these years to have some very thoughtful scientists in my world. Gandhi is one of them.

Betz limit for sailboats?

noreply@blogger.com (Hogg) — Mon, 22 Jan 2024 23:12:00 +0000

In the study of sustainable energy, there is a nice result on windmills, called the Betz limit: There is a finite limit to the fraction of the kinetic energy of the wind that a windmill can absorb or exploit. The reason is often stated as: If the windmill took all of the power in the wind, the wind would stop, and then there would be no flow of energy over the windmill. I'm not sure I exactly agree with that explanation, but let's leave that here.
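
For reference, the number behind the Betz limit can be recovered in a few lines. This is a sketch of the textbook actuator-disk result, not something derived in this post: if the wind is slowed by a fraction a (the axial induction factor) at the rotor, the fraction of the upstream kinetic-energy flux that is extracted is 4a(1 − a)². The grid search is just a convenient way to find the maximum.

```python
def power_coefficient(a):
    """Fraction of the wind's kinetic-energy flux extracted, for induction factor a."""
    return 4.0 * a * (1.0 - a) ** 2


# Scan induction factors from 0 to 0.5 (beyond 0.5 the simple model breaks down).
grid = [i / 10000 for i in range(0, 5001)]
best_a = max(grid, key=power_coefficient)
best_cp = power_coefficient(best_a)
print(best_a, best_cp, 16 / 27)
```

The maximum sits at a = 1/3, where the extracted fraction is 16/27, a bit under 60 percent. Taking everything (a close to 1) drives the coefficient to zero, which is the formal version of the "wind would stop" explanation.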

On my travel home today I worked on the possibility that there is an equivalent to the Betz limit for sailboats. Is there an energetic way of looking at sailing that is useful?

One paradox is that a sailboat is sailing steadily when the net force on the boat is zero (just like when a windmill is turning at constant angular velocity). In the Betz limit, the windmill is thought of as having two different torques on it, one from the wind, and one from the turbine. Sailing has no turbine. So this problem has a conceptual component to it.

Happy birthday, Rix

noreply@blogger.com (Hogg) — Fri, 19 Jan 2024 22:53:00 +0000

Today was an all-day event at MPIA to celebrate the 60th birthday (and 25th year as Director) of Hans-Walter Rix (MPIA). There were many remarkable presentations and stories; he has left a trail of goodwill wherever he has gone! I decided to use the opportunity to talk about measurement, which is something that Rix and I have discussed for the last 18 years. My slides are here.

I've been very lucky with the opportunities I've had to work with wonderful people.

divide by your selection function, or multiply by it?

noreply@blogger.com (Hogg) — Sun, 14 Jan 2024 23:22:00 +0000

With Kate Storey-Fisher (San Sebastián), Abby Williams (Caltech) is working on a paper about large-angular-scale power, or anisotropy, in the distribution of quasars. It is a great subject; we need to estimate this power in the context of a very non-trivial all-sky selection function. The tradition in cosmology is to divide the data by this selection function. But of course you shouldn't manipulate your data. Instead, you could multiply your model by the selection function. You can guess which one I prefer! In fact you can do either, as long as you weight the data in the right way in the fit. I promised to write up a few words and equations about this for Williams.
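
Here is a toy version of the claim that you can do either, as long as the weights are right. It is a sketch under strong assumptions, not the analysis in the paper: a single amplitude multiplying a known template, Gaussian noise of known size on the raw data, and made-up numbers for the selection function, template, and noise.

```python
# Hypothetical toy inputs: selection function S, model template t, noise sigma.
selection = [1.0, 0.8, 0.5, 0.2, 0.05]
template = [1.0, 2.0, 1.5, 0.5, 3.0]
sigma = [0.1, 0.1, 0.1, 0.1, 0.1]
noise = [0.05, -0.08, 0.02, 0.11, -0.07]   # fixed draws, for reproducibility
TRUE_AMPLITUDE = 2.0
data = [TRUE_AMPLITUDE * s * t + n for s, t, n in zip(selection, template, noise)]


def fit_amplitude(y, model, weights):
    """Weighted least-squares amplitude A minimizing sum w (y - A model)^2."""
    num = sum(w * yi * mi for w, yi, mi in zip(weights, y, model))
    den = sum(w * mi * mi for w, mi in zip(weights, model))
    return num / den


# Option 1: multiply the model by the selection function; leave the data alone.
model_times_s = [s * t for s, t in zip(selection, template)]
inverse_variance = [1.0 / sg**2 for sg in sigma]
a_multiply = fit_amplitude(data, model_times_s, inverse_variance)

# Option 2: divide the data by the selection function, and propagate the
# noise: the divided data have variance sigma^2 / S^2, so weights S^2 / sigma^2.
data_over_s = [y / s for y, s in zip(data, selection)]
propagated = [s**2 / sg**2 for s, sg in zip(selection, sigma)]
a_divide_weighted = fit_amplitude(data_over_s, template, propagated)

# Option 3: divide the data but forget to re-weight.
a_divide_naive = fit_amplitude(data_over_s, template, [1.0] * len(data))

print(a_multiply, a_divide_weighted, a_divide_naive)
```

Options 1 and 2 agree to machine precision, because the re-weighting exactly undoes the division. Option 3 lets the points where the selection function is tiny (and the divided noise is huge) drag the answer around.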

why study astrophysics?

noreply@blogger.com (Hogg) — Thu, 11 Jan 2024 21:39:00 +0000

I spent the day with Neige Frankel (CITA), working on various projects. One of the things we discussed was her slides for an upcoming talk. I made the following blanket statement; is it true? There are only two ways to ultimately justify a subject of study in astrophysics. Either it will tell us something important about fundamental physics (think: dark matter, initial conditions of the Universe, or nucleosynthesis, say), or else it will tell us something about our origins (formation of our Galaxy, occurrence of rocky, habitable planets, origin of life, say). I am not entirely sure this is right, but I can't currently think of much in the way of counter-examples. I guess one other justification might be that we are developing technologies that will help people in other areas (CCDs, spacecraft attitude management, or machine learning, say).

Galactic cartography

noreply@blogger.com (Hogg) — Wed, 10 Jan 2024 01:37:00 +0000

Neige Frankel (CITA) and I discussed measurements of the age and metallicity gradients in the Milky Way today. In my machine-learning world, I am working on biases that come in when you use the outputs of regressions (label transfer) to perform population inferences (like mean age as a function of actions or radius). We are gearing up to do a fake but end-to-end simulation of how the Milky Way gets observed, to see if the observed Galaxy looks anything like (what we know in this fake world to be) the truth.

auto-encoder for calibration data

noreply@blogger.com (Hogg) — Tue, 09 Jan 2024 01:11:00 +0000

Connor Hainje (NYU) is looking at whether we could build a hierarchical or generative model of SDSS-V BOSS spectrograph calibration data, such that we could reduce the survey's per-visit calibration overheads. He started by building an auto-encoder, which is a simple, self-supervised generative model. It works really well! We discussed how to judge performance (held-out data) and how performance should depend on the size of the latent space (I predict that it won't want a large latent space). We also decided that we should announce an SDSS-V project and send out a call for collaboration.

[Note added later: Contardo (SISSA) points out that an autoencoder is not a generative model. That's right, but there are multiple definitions of generative model; only one of which is that you can sample from it. Another is that it is a parameterized model that can predict the data. Another is that it is a likelihood function for the parameters. But she's right: We are going to punk parts of the auto-encoder into a generative model in the sense of a likelihood function.]

what book am I going to write?

noreply@blogger.com (Hogg) — Fri, 05 Jan 2024 16:48:00 +0000

One possible new year's resolution this year is for me to decide: Which book am I going to write? I don't love this, because it is the hallmark of a scientist at the end of their career that they switch to writing books! I guess maybe I'm at the end of my career? But that said, I have (maybe like many scientists at the end of their careers?) a lot to say. Okay anyway, I had a long conversation this morning with Greg McDonald (Rum & Code) about all this, and he strongly encouraged me to make some content for the project code-named “The Practice of Astrophysics”.

wind power

noreply@blogger.com (Hogg) — Thu, 04 Jan 2024 01:12:00 +0000

I met up with Matt Kleban (NYU) to discuss our dormant project on the physics of sailing. Our conversation ranged around many different things related to sustainable power. In particular, we discussed whether it was possible to take an energy or power point of view on sailing, which has to do with the work that the sailboat is doing on the water and on the air. I feel like there will be some symmetries in play there. We also discussed power generation with wind farms, including the Betz limit (which is a limit on how much power you can get out of the wind). Is there an equivalent of the Betz limit for a sailboat? Finally, Kleban made a remark that is simultaneously obvious and deep: If you have a propeller turning in a fluid (like air), it might be a turbine (generating power from the wind) or a fan (using power to make wind). The question of turbine or fan has a frame-independent (relativistically scalar) answer.

informal scientific communication

noreply@blogger.com (Hogg) — Tue, 02 Jan 2024 17:52:00 +0000

I have been sending out my draft manuscript on machine learning in the natural sciences to various people I know who have opinions on this. I've been getting great feedback, and it reminds me that there is a lot of important scientific communication that is on informal channels. One thing that interests me: Is there a way to make such conversation more public and viewable and research-able?

partial differential equations

noreply@blogger.com (Hogg) — Fri, 29 Dec 2023 17:18:00 +0000

I am trying to write a proposal to fund the research I do on machine-learning theory. The proposal is to work on ocean dynamics. It's a great application for the things we have done! But it's hard to write a credible proposal in an area that's new to you. Interdisciplinarity and agility are not rewarded in the funding system at present! At least I am learning a ton as I write this.

philosophy

noreply@blogger.com (Hogg) — Thu, 28 Dec 2023 15:45:00 +0000

I've been working on two philosophical projects this month. The first has been an interaction with Jim Peebles (Princeton) around a paper he has been writing, setting down his philosophy of physics. I am pretty aligned with his position, which I expect to hit the arXiv soon. I'm not a co-author of that. But one of the interesting things about science is how much of our work is in anonymous (or quasi-anonymous) support of others.

The second philosophical project is a paper about machine learning and science: I am trying to set down my thoughts about how ML can and can't help the sciences. This is fundamentally a philosophy-of-science question, not a science question.

try bigger writing

noreply@blogger.com (Hogg) — Sat, 02 Dec 2023 17:49:00 +0000

I have been buried in job season and other people's projects. That's good! Hiring and advising are the main things we do in this job. But I decided today that I need to actually start a longer writing project that is my own baby. So I started to turn the set of talks I have been giving about machine learning and astrophysics into a paper. Maybe for the new ICML Position Paper call?

Terra Hunting Fall Science Meeting, day 4

noreply@blogger.com (Hogg) — Thu, 30 Nov 2023 23:46:00 +0000

Today we delved into even more detail about how the HARPS3 instrument works, looking at engineering drawings and discussing how charge-coupled devices (CCDs) read out. We discussed the time stability of various parts of the instrument and electronics. We are all very excited about assembly, verification, and testing in Cambridge this summer.

Terra Hunting Fall Science Meeting, day 3

noreply@blogger.com (Hogg) — Wed, 29 Nov 2023 21:59:00 +0000

Today was a delight! In a working session, Clark Baker (Cambridge) gave a beautiful, conceptual and concrete description of how an echelle spectrograph works and the blaze and the resolution and etc. My favorite moment was the aha! moment I had when he described the Littrow condition. This was followed by Alicia Anderson (Cambridge) explaining how the data reduction proceeds. Then she and Federica Rescigno (Exeter) helped us install the data-reduction software for the ESO instruments (ESPRESSO, HARPS-N, etc) and we started reducing raw echelle data.

Before all this there was a wide-ranging discussion of measuring 3-point functions of radial-velocity time series data. This was inspired by the question: Is a Gaussian process a good model for these data? I hope this turns into a project or set of projects.

Terra Hunting Fall Science Meeting, day 2

noreply@blogger.com (Hogg) — Tue, 28 Nov 2023 22:12:00 +0000

So many good things happened in the meeting today! Highlights were presentations by Niamh O'Sullivan (Oxford) and Ben Lakeland (Exeter), who showed amazing results running models of stellar variability on data from the Sun. O'Sullivan can see that the Sun goes through many different phases of spots, granulation, and super-granulation. She finds these by fitting Gaussian processes of certain forms. Related: Suzanne Aigrain (Oxford) showed that even in very gappy data, the GP fits are unbiased, whereas naive use of periodograms is biased!

Lakeland showed that super-granulation can in principle be modeled in the Solar time series. He also showed maybe the tiniest hint that, when he corrects for super-granulation well, the RV variability might be even lower than at times at which there is no super-granulation in play at all. Does super-granulation suppress other kinds of variability?

I'm very optimistic, between Liang yesterday, Zhao's work at Flatiron, and these presentations, that we will be able to mitigate many difficult sources of stellar variability. I was inspired to outline a conceptual paper on why or how this is all going to work.

Terra Hunting Fall Science Meeting, day 1

noreply@blogger.com (Hogg) — Mon, 27 Nov 2023 23:59:00 +0000

Today was the first day of the Terra Hunting annual science meeting. One highlight of the day was a presentation by Yan Liang (Princeton), who is modeling stellar spectral variability (the tiny variability) that affects extremely precise radial-velocity measurements. Her method involves a neural network, which is trained to distinguish RV variations and spectral shape variations through a self-supervised approach (with a data augmentation). Then it separates true stellar RV variations from spectral-variability-induced wrong RV variations by requiring (essentially) that the RV variations be uncorrelated with the (latent) description of the stellar spectral shape. This connects to various themes I am interested in, including wobble by Bedell, a spectral variability project by Zhao, and causal structure in machine learning.