Showing posts with label causality.

2023-11-13

radical papers I want to write (or will never write)

I have to finish my NSF proposal with Mike Blanton (NYU), so naturally I am in procrastination mode. Here are three papers I wish I could write. Maybe I should post them on my ideas blog:

Occam's Razor is wrong: This paper, co-authored with Jennifer Hill (NYU), would be about the fact that, in the real, observed world, the simplest explanation is always wrong or at least incomplete.

Causation is just causality: This paper, maybe co-authored with David Blei (Columbia) or Bernhard Schölkopf (MPI-IS) or Hill, shows that you don't need to have free will in order to have cogent causal explanations of data. That is, you don't need to phrase causality in terms of predictions for counter-factual experiments that you might have chosen to do.

You don't ever want evidence: This paper shows that any time you are computing the Bayesian evidence—what I call the fully marginalized likelihood (fml)—you are doing the wrong integral and solving the wrong problem, for both practical and theoretical (principled) reasons.

2022-03-17

causality and time ordering

I had a nice chat with David Blei (Columbia) at the end of the day about the question of whether causal inference (a subject in statistics) can be re-phrased in terms of making predictions about the time-ordering of events. He was not extremely positive about that project! But we talked about the causal-inference approaches. I don't like many of them! Because many of them somehow assume that it is possible to intervene on the situation, and how can you intervene on a unitary system (like, say, the Universe)? Does causality not exist in physics? Does the force cause the acceleration or does the acceleration cause the force? There isn't an answer to that in physics.

2021-06-02

orthogonalization in SR, continued

Soledad Villar (JHU) and I continued discussing the problem of orthogonalization of vectors—or finding orthonormal basis vectors that span a subspace—in special (and general) relativity. She proposed a set of hacks that correct the generalization of Gram–Schmidt orthogonalization that I proposed a week or so ago. It's complicated, because although the straightforward generalization of GS works with probability one, there are cases you can construct that bork completely. The problem is that the method involves division by an inner product, and if an intermediate vector becomes light-like, that inner product vanishes.
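
To make the failure mode concrete, here is a toy sketch (my own toy version, not Villar's hacks) of naive Gram–Schmidt under the Minkowski inner product, assuming signature (-,+,+,+); the division by the self-inner-product is exactly where a light-like intermediate vector breaks things:

```python
import numpy as np

# Minkowski metric, signature (-, +, +, +); this is an illustrative sketch only.
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def minkowski_dot(u, v):
    return u @ eta @ v

def gram_schmidt_sr(vectors, tol=1e-12):
    """Naive Gram-Schmidt with the Minkowski inner product: remove the
    component of each vector along the previously found basis vectors,
    then normalize by the (absolute value of the) self-inner-product."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for b in basis:
            w = w - (minkowski_dot(w, b) / minkowski_dot(b, b)) * b
        norm2 = minkowski_dot(w, w)
        if abs(norm2) < tol:
            # the problematic case: a (numerically) light-like vector
            raise ValueError("light-like intermediate vector: cannot normalize")
        basis.append(w / np.sqrt(abs(norm2)))
    return basis

rng = np.random.default_rng(17)
basis = gram_schmidt_sr(rng.normal(size=(3, 4)))  # works with probability one

# ...but a constructed light-like input fails immediately:
# gram_schmidt_sr([np.array([1.0, 1.0, 0.0, 0.0])])
```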

2021-02-04

our forwards-backwards results are fading

Gaby Contardo (Flatiron) and I have been working on predicting light-curve data points from their pasts and their futures, to see if there is a time asymmetry. And we have been finding one! But today we discussed results in which Contardo was much more aggressive in removing data at or near spacecraft issues (this is NASA Kepler data). And most of our results go away! So we have to decide where we go from here. Obviously we should publish our results even if they are negative! But how to spin it all...?

2020-12-17

forwards vs backwards modeling of light curves

My day started with a conversation with Gaby Contardo (Flatiron) about modeling light curves of stars. We have projects in which we try to predict forwards and backwards in time, and compare the results. We're trying to make a good scope for a paper, which could involve classification, regression, or causal inference. Or all three. As usual, we decided to write an abstract to help us picture the full scope of the first paper.
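
To be concrete about what "predict forwards and backwards and compare" could mean in its very simplest form, here is a toy sketch (mine, not Contardo's pipeline): fit the same linear autoregressive model to a light curve and to its time-reverse, and compare the residuals. The array `flux` is hypothetical (evenly sampled, bad cadences already removed).

```python
import numpy as np

def ar_residual_rms(y, p=8):
    """Fit a linear AR(p) model by least squares and return the root-mean-square
    of the one-step-ahead prediction residuals."""
    n = len(y)
    X = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])  # lagged values
    X = np.column_stack([np.ones(n - p), X])                        # constant term
    target = y[p:]
    coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.sqrt(np.mean((target - X @ coeffs) ** 2))

# Forwards prediction uses the past; backwards prediction is the same model on
# the time-reversed series. A time-asymmetric process gives different residuals:
# rms_forwards  = ar_residual_rms(flux)
# rms_backwards = ar_residual_rms(flux[::-1])
```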

2018-02-12

FML, and the Big Bounce

The day started with a realization by Price-Whelan (Princeton) and me that, in our project The Joker, because of how we do our sampling, we have everything we need at the end of the sampling to compute precisely the fully marginalized likelihood of the input model. That's useful, because we are not just making posteriors, we are also making decisions (about, say, what to put in a table or what to follow up). Of course (and as my loyal reader knows), I don't think it is ever a good idea to compute the FML!
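
The arithmetic behind the realization is just prior-sampling Monte Carlo: if you have (marginal) likelihood values at samples drawn from the prior (which The Joker's rejection step already requires), then the FML is their average. A minimal sketch, with `log_likelihoods` standing in for those stored values:

```python
import numpy as np
from scipy.special import logsumexp

def log_fml_from_prior_samples(log_likelihoods):
    """Monte Carlo estimate of the fully marginalized likelihood:
    FML = integral of L(theta) p(theta) dtheta, approximated by the mean of
    L over draws from the prior. Done in log space for numerical stability."""
    return logsumexp(log_likelihoods) - np.log(len(log_likelihoods))

# `log_likelihoods`: hypothetical array of marginal log-likelihoods evaluated
# at the prior samples drawn before The Joker's rejection step.
# log_fml = log_fml_from_prior_samples(log_likelihoods)
```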

At lunch, Paul Steinhardt (Princeton) gave a great black-board talk about the idea that the Universe might have started in a bounce from a previously collapsing universe. His main point (from my perspective; he also has particle-physics objectives) is that the work that inflation does with a quantum mechanism might be possible to achieve with a classical mechanism, if you could design the bounce right. I like that, of course, because I am skeptical that the original fluctuations are fundamentally quantum in nature. I have many things to say here, but I'll just say a few random thoughts: One is that the strongest argument for inflation is the causality argument, and that argument can be satisfied by other space-time histories, like a bounce. That is, the causality problem (and related problems) is fundamentally about the geometry of the space and the horizon as a function of time, and there are multiple possible universe-histories that would address the problem. So that's a good idea. Another random thought is that there is no way to make the bounce happen (people think) without violating the null-energy condition. That's bad, but so are various things about inflation! A third thought is that the pre-universe (the collapsing one) probably has to be filled with something very special, like a few scalar fields. That's odd, but so is the inflaton! And those fields could be classical. I walked into this talk full of skepticism, and ended up thinking it's a pretty good program to be pursuing.

2017-05-16

falsifying results by philosophical argument

I finally got some writing done today, in the Anderson paper on the empirical, deconvolved color-magnitude diagram. We are very explicitly structuring the paper around the assumptions, and each of the assumptions has a name. This is part of my grand plan to develop a good, repeatable, useful, and informative structure for a data-analysis paper.

I missed a talk last week by Andrew Pontzen (UCL), so I found him today and discussed matters of common interest. It was a wide-ranging conversation but two highlights were the following: We discussed causality or causal explanations in a deterministic-simulation setting. How could it be said that “mergers cause star bursts”? If everything is deterministic, isn't it equally true that star bursts cause mergers? One question is the importance of time or time ordering (or really light-cone ordering). For the statisticians who think about causality this doesn't enter explicitly. I think that some causal statements in galaxy evolution are wrong on philosophical grounds, but we decided that maybe there is a way to save causality provided that we always refer to the initial conditions (kinematic state) on a prior light cone. Oddly, in a deterministic universe, causal explanations are mixed up with free will and subjective knowledge questions.

Another thing we discussed is a very neat trick he figured out to reduce cosmic variance in simulations of the Universe: Whenever you simulate from some initial conditions, also simulate from the negative of those initial conditions (all phases rotated by 180 degrees, or all over-densities turned to under, or whatever). The average of these two simulations will cancel out some non-trivial terms in the cosmic variance!
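
Here is a cartoon of why that works (my sketch, not Pontzen's code): any contribution to an observable that is odd in the initial over-density field cancels exactly in the pair average, so those terms stop contributing to the run-to-run scatter.

```python
import numpy as np

rng = np.random.default_rng(42)

def toy_observable(delta):
    """Stand-in for "run a simulation and measure something": an even
    (quadratic) piece plus odd (linear and cubic) contamination terms."""
    return np.mean(delta ** 2) + 0.3 * np.mean(delta) + 0.1 * np.mean(delta ** 3)

delta = rng.normal(size=100_000)                                  # one Gaussian realization
single = toy_observable(delta)                                    # ordinary estimate
paired = 0.5 * (toy_observable(delta) + toy_observable(-delta))   # paired estimate
# The pair average removes the odd-in-delta terms exactly; only the even terms
# still fluctuate from realization to realization.
```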

The day ended with a long call with Megan Bedell (Chicago), going over my full list of noise sources in extreme precision radial-velocity data (think: finding and characterizing exoplanets). She confirmed everything in my list, added a few new things, and gave me keywords and references. I think a clear picture is emerging of how we should attack (what NASA engineers call) the tall poles. However, it is not clear that the picture will get set down on paper in time for the Exoplanet Research Program funding call!

2015-04-13

vary all the exposure times!

Ruth Angus showed up for a few days, and we talked through the first steps to make an argument for taking time-series data with variable exposure times. We all know that non-uniform spacing of data helps with frequency recovery in time series; our new intuition is that non-uniform exposure time will help as well, especially for very high frequencies (short periods). We are setting up tests now with Kepler data, with an eye to challenging the TESS mission to bite a big, scary bullet.
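
One way to see the intuition (a back-of-the-envelope sketch, not our actual tests): averaging a sinusoid of frequency f over a boxcar exposure of length t_exp attenuates its amplitude by |sinc(f t_exp)|, so a single fixed exposure time has hard nulls at f = k / t_exp, while a mix of exposure times never kills any frequency completely.

```python
import numpy as np

def attenuation(freq, t_exp):
    """Amplitude attenuation of a sinusoid at frequency `freq` when averaged
    over a boxcar exposure of length `t_exp`; numpy's sinc includes the pi."""
    return np.abs(np.sinc(freq * t_exp))

freqs = np.linspace(0.01, 10.0, 1000)      # cycles per (arbitrary) time unit
uniform = attenuation(freqs, 1.0)          # every exposure 1.0 time units long
varied = np.mean([attenuation(freqs, t) for t in (0.5, 0.8, 1.0, 1.3)], axis=0)
# `uniform` hits zero at integer frequencies (those signals are unrecoverable);
# `varied` never does, which is the sense in which varying the exposure time
# helps at very high frequencies.
```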

After complaining for the millionth time about PCA (and my loyal reader—who turns out to be Todd Small at The Climate Corporation—knows I love to hate on the PCA), Foreman-Mackey and I finally decided to fire up the robust PCA or PCP method from compressed sensing (not the badly-re-named "robust PCA" in the astronomy literature). The fundamental paper is Candès et al.; the method has no free parameters, and the paper includes ridiculously simple pseudo-code. It looks like it absolutely rocks, and obviates all masking or interpolation of missing or bad data!
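
For the record, here is my reading of that pseudo-code (the alternating-directions scheme in Candès et al.), sketched in numpy; treat it as a sketch under my reading, not a vetted implementation:

```python
import numpy as np

def shrink(X, tau):
    """Entrywise soft thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular-value thresholding: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def robust_pca(M, tol=1e-7, max_iter=500):
    """Principal Component Pursuit by alternating directions: decompose M into
    a low-rank part L plus a sparse part S, with no free parameters beyond the
    defaults suggested in the paper."""
    M = np.asarray(M, dtype=float)
    n1, n2 = M.shape
    lam = 1.0 / np.sqrt(max(n1, n2))
    mu = n1 * n2 / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(M):
            break
    return L, S
```

The missing-data variant constrains the fit only on the observed entries; this sketch assumes a complete data matrix.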

At lunch, Gabriele Veneziano (Paris, NYU) spoke about graviton–graviton interactions and causality constraints. A question that came up in the talk: If a particle suffers a negative time delay (like the opposite of a gravitational time delay), can you necessarily therefore build a time machine? That's something to dine out on.

2014-08-13

causal pixel modeling, day 2

In re-reading yesterday's post, I found it strange to hear myself say that the model was over-fitting stellar variability and then we decided to make the model far more flexible! Today we decided that we don't yet have the technology (well, perhaps not the patience, since we want to detect exoplanets asap) to fully separate stellar variability from spacecraft-induced issues, or at least we would have to do something that pooled much more data to do it—we wouldn't be able to work on one light-curve at a time. So we de-scoped to exoplanet science and decided that we would try to fit out everything except the exoplanet transits. This is not unlike what others are doing, except that we are trying to be extremely principled about not letting information in the data about any exoplanet transits "leak" into our modeling of the variability of the light-curve. We are doing this with a censoring or a train-and-test framework.

Because we decided to eradicate all variability—spacecraft and stellar—we had Wang work on auto-regressive models, in which the past and future of the star are used to predict its present. The first results are promising. We also had Foreman-Mackey put all the other stars into the Gaussian Process predictions we are making. This means we are doing Gaussian Process regression and prediction with thousands of ambient dimensions (features). That seems insane to me, but Schölkopf insists that it will work—being non-parametric, GPs scale in complexity with the number of data points, not the number or size of the features. I will believe it when I see it. The curse of dimensionality and all that!
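
The reason the ambient dimension is (supposedly) not scary, in code terms: the kernel matrix is n_samples by n_samples no matter how many feature columns go in, so the expensive linear algebra never sees the thousands of dimensions. A toy sketch with made-up sizes (not our Kepler pipeline):

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(XA, XB, length_scale=50.0):
    """Squared-exponential kernel; the feature dimension enters only through
    these pairwise distances, never through the size of the linear algebra."""
    return np.exp(-0.5 * cdist(XA, XB, "sqeuclidean") / length_scale ** 2)

def gp_predict(X_train, y_train, X_test, noise=1e-3):
    """GP regression mean prediction. The matrix solve is O(n_train^3),
    independent of the number of feature columns."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(X_test, X_train) @ alpha

# Hypothetical: 500 cadences of the target star, each described by the fluxes
# of 3000 other stars at that cadence (the "ambient dimensions").
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3000))
y_train = rng.normal(size=500)
X_test = rng.normal(size=(10, 3000))
prediction = gp_predict(X_train, y_train, X_test)  # kernel matrix is only 500 x 500
```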

In the afternoon, we had discussions with Krik Muandet (MPI-IS) and David Lopez-Paz (MPI-IS) about false-positive classification for exoplanet search using supervised methods and a discussion with Michael Hirsch (MPI-IS) about non-parametric models for the imaging PSF. More on the former tomorrow, I very much hope!

2014-08-12

causal pixel modeling, day 1

Today Foreman-Mackey and I arrived in Tübingen to work with Schölkopf. On arrival, we got Dun Wang on the phone, because our trip to MPI-IS is designed to make huge progress on Wang's recalibration of the Kepler satellite detector pixels, using the variations that are found in common across stars. The way Schölkopf likes to say it is that we are capitalizing on the causal structure of the problem: If stars (or, really, pixels illuminated by stars) co-vary it must be because of the telescope, since the stars are causally disconnected. The goal of our work on this is to increase the sensitivity of the satellite to exoplanet transits.

We opened the day with two questions: The first was about why, despite this causal argument, we seem to be able to over-fit or fit out stellar variability. We are being careful with the data (using a train-and-test framework) to ensure that no information about the short-term variability of the star near any putative transit is leaking into the training of the predictive model. My position is that it is because our set of prediction stars might span the full basis of anything a star can do. We are using thousands of stars as features!
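
A stripped-down sketch of that train-and-test censoring (hypothetical arrays and regularization; the real model has far more bookkeeping, and this naive loop re-fits from scratch at every cadence): to predict the target star at cadence t, fit the linear combination of other stars only on cadences outside an exclusion window around t, so nothing about a putative transit at t can leak into the fit.

```python
import numpy as np

def censored_prediction(other_stars, target, half_window=50, ridge=1e3):
    """Predict each cadence of `target` (shape (n_times,)) as a linear
    combination of `other_stars` (shape (n_times, n_stars)), training only on
    cadences more than `half_window` cadences from the one being predicted."""
    n_times = len(target)
    prediction = np.empty(n_times)
    for t in range(n_times):
        train = np.abs(np.arange(n_times) - t) > half_window  # censor a window around t
        X, y = other_stars[train], target[train]
        w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)
        prediction[t] = other_stars[t] @ w
    return prediction

# residuals = target - censored_prediction(other_stars, target) would then be
# searched for transits, with no transit information having entered the fit.
```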

The second question was about why, in our residuals, there seems to be some trace of the spacecraft variability. We don't know that for sure, but just at an intuitive visual level it looks like the fitting process is not only not removing the spacecraft, but actually increasing the calibration "noise". We started Wang on tests of various hypotheses, and put Foreman-Mackey on trying models that are far more flexible than Wang's purely linear model.

2014-07-29

DDD meeting, day 2

On the second day of the Moore Foundation meeting, I gave my talk (about flexible models for exoplanet populations, exoplanet transits, and exoplanet-discovering hardware calibration). After my talk, I had a great conversation with Emmanuel Candès (Stanford), who asked me very detailed questions about my prior beliefs. I realized in the conversation that I have been violating all my own rules: I have been setting my prior beliefs about hyper-parameters in the space of the hyper-parameters and not in the space of the data. That is, you can only assess the influence and consistency of the prior pdf (consistency with your actual beliefs) by flowing the prior through the probabilistic model and generating data from it. I bet if I did that for some of the problems I was showing, I would find that my priors are absurd. This is a great rule, which I often say to others but don't do myself: Always sample data from your prior (not just parameters). This is a rule for Bayes but also a rule for those of us who eschew realism! More generally, Candès expressed the view that priors should derive from data—prior data—a view with which I agree deeply. Unfortunately, when it comes to exoplanet populations, there really aren't any prior data to speak of.
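
The rule, in generic code form (a toy model, not the exoplanet-population hierarchy from my talk): draw the (hyper-)parameters from the prior, push them all the way through the probabilistic model to fake data, and look at the fake data.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_prior_predictive(n_draws=8, n_points=100):
    """Prior predictive check: sample parameters from the prior, then generate
    fake data through the full model. Toy model: a power-law rate with broad
    priors on slope and amplitude, observed with Poisson noise."""
    x = np.linspace(1.0, 10.0, n_points)
    datasets = []
    for _ in range(n_draws):
        slope = rng.uniform(-3.0, 3.0)            # prior on the slope
        amplitude = np.exp(rng.normal(0.0, 2.0))  # prior on the amplitude
        rate = amplitude * x ** slope             # the model's prediction
        datasets.append(rng.poisson(rate))        # data, not just parameters
    return x, datasets

# Plotting these drawn datasets is the check: if they look nothing like data
# you would ever believe, the prior is absurd, however innocuous it looks in
# parameter space.
```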

There were many excellent talks again today; again this is an incomplete set of highlights for me: Titus Brown (MSU) explained his work on developing infrastructure for biology and bioinformatics. He made a number of comments about getting customer (or user) stories right and developing with the current customer in mind. These resonated with my experiences of software development. He also said that his teaching and workshops and outreach are self-interested: They feed back deep and valuable information about the customer. Jeffrey Heer (UW) said similar things about his development of DataWrangler, d3.js, and other data visualization tools. (d3.js is GitHub's fourth most popular repository!) He showed some beautiful visualizations. Heer's demo of DataWrangler simply blew away the crowd, and there were questions about it for the rest of the day.

Carl Kingsford (CMU) caused me (and others) to gasp when he said that the Sequence Read Archive of biological sequences cannot be searched by sequence. It turns out that searching for strings in enormous corpuses of strings is actually a very hard problem (who knew?). He is using a new structure called a Bloom Filter Tree, in which k-mers (length-k substrings) are stored in the nodes and the leaves contain the data sets that contain those k-mers. It is very clever and filled with all the lovely engineering issues that the Astrometry.net data structures were filled with lo so many years ago. Kingsford focuses on writing careful code, so the combination of clever data structures and well written code gets him orders of magnitude speed-ups over the competition.
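
For my own future reference, the building block as I understand it: a plain Bloom filter over k-mers (this is just the per-node summary, not Kingsford's full tree) gives membership queries with no false negatives and a tunable false-positive rate, which is what lets a search prune whole subtrees.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: `add` k-mers, then `might_contain` answers either
    "definitely not present" or "probably present" (no false negatives)."""

    def __init__(self, n_bits=1 << 20, n_hashes=4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, kmer):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, kmer):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer))

def kmers(sequence, k=21):
    """All length-k substrings of a sequence."""
    return (sequence[i:i + k] for i in range(len(sequence) - k + 1))

# One filter per data set; a tree of such filters lets a query skip every
# subtree whose filter already rules the k-mer out.
bf = BloomFilter()
for kmer in kmers("ACGTACGTACGTACGTACGTACGT"):
    bf.add(kmer)
assert bf.might_contain("ACGTACGTACGTACGTACGTA")
```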

Causal inference was an explicit or implicit component of many of the talks today. For example, Matthew Stephens (Chicago) is using natural genetic variations as a "randomized experiment" to infer gene expression and function. Laurel Larson (Berkeley) is looking for precursor events and predictors for abrupt ecological changes; since her work is being used to trigger interventions, she requires a causal model.

Blair Sullivan (NC State) spoke about performing inferences with provable properties on graphs. She noted that most interesting problems are NP-hard on arbitrary graphs, but become easier on graphs that can be embedded (without crossing the edges) on a planar or low-genus space. This was surprising to me, but apparently the explanation is simple: Planar graphs are much more likely to have small sets of vertices that split the graph into disconnected sub-graphs. Another surprising thing to me is that "motif counting" (which I think is searching for identical subgraphs within a graph) is very hard; it can only be done exactly and in general for very small subgraphs (six-ish nodes).

The day ended with Laura Waller (Berkeley) talking about innovative imaging systems for microscopy, including light-field cameras, and then a general set of cameras that do non-degenerate illumination sequences and infer many properties beyond single-plane intensity measurements. She showed some very impressive demonstrations of light-field inferences with her systems, which are sophisticated, but built with inexpensive hardware. Her work has a lot of conceptual overlap with astronomy, in the areas of adaptive optics and imaging with non-degenerate masks.