tag:blogger.com,1999:blog-10448119Mon, 27 Apr 2015 00:44:26 +0000talkingimagingmodelbayesstatisticsseminardatawritingcodegalaxystarexoplanetsdssastrometryspectroscopycalibrationtimeMilky Waynot researchdynamicsphotometryinformationKeplerquasarcosmologygalexMCMCmeetingsubstructurepracticekinematicsoptimizationgravitational lensingproposaltelescopeGaussian processliteraturegraphical modelproper motionpoint-spread functiontravelphilosophyfundingstar formationcatalogclassificationnoiseatlasdecisioncomputingclusteringtractormathematicschemistryvisualizationblack holegravitydustgaiadark sectorspitzermeta datasupernovaCDMmerginglinear algebrapanstarrsradial velocityweb 2.0thinkingSolar SystemclustersearchLSSTLTFDFCFinterstellar mediumwhite dwarfparticle physicsHSTintergalactic mediumlifereadingcosmographyradioregressioncausationcometlarge-scale structureasteroseismologybaryon acoustic featureHMFfundamental astronomytestingtransparencyFermiPHATpulsarEuclidhardwareenvironmentthresherTheCannonbrown dwarfcitizen sciencetheoryEarthdatabaseevolutionexperimentprimusultravioletHerschelcharge-coupled devicediskhalophase spaceplanetroweiselectricity and magnetismengineeringpoliticsrobotamateurdigital cameraemailobservingparallaxwisePlanckTESScosmic raygamma-ray bursthipparcosinterferometryusno-bAPIarchetypeopen scienceosssproject managementsciencetextanthropicarchiveminor planetrelativitythermodynamicsWMAPastrobiologycausalitycoffeedeep learninginflationmusicBartLHCad hockerycompressiondrinkingeatingeditinggastrophysicsnuclear physicspost-starburstquantum mechanicssonificationstring theory2massLIGOP1640PTFWFIRSTanthropologyarchitecturecorrelationdaftdissertationfarm machineryflickrgame theorynasaweatherChandraDESIKNNLMIRcamNuSTARSVMVLAWillman 1administrationballoonconfusionethnographyfrisbeegamegeometryhandicappingintelligencelawlearningmachine learningplasmapolemicpressscatteringsemanticssignal processingsocial mediasoundukidssvirtual observatoryweaponsx-rayHogg's 
Researchgalaxies, stellar dynamics, exoplanets, and fundamental astronomyhttp://hoggresearch.blogspot.com/noreply@blogger.com (Hogg)Blogger2400125tag:blogger.com,1999:blog-10448119.post-4538568180287080299Thu, 23 Apr 2015 03:59:00 +00002015-04-26T16:47:41.553-04:00asteroseismologybayesexoplanetKeplermeetingspectroscopystarstatisticstalkingtimestellar spectra with issues, probabilistic frequency differences<p>At group meeting, Kopytova described why she wants to measure C/O ratios for very cool stars and hot planets—it is to look at where they formed in the proto-planetary disk. We discussed the (frequently arising) point that the spectra have bad continuum normalization (or, equivalently, bad calibration) and so it is hard to compare the models to the data at the precision of the data. This problem is not easily solved; many investigators "do the same thing" to the data and the models to match the continuum normalizations. However, these continuum procedures are usually signal-to-noise-dependent; models are rarely at the same signal-to-noise as the data! Anyway, we proposed a simple plan for Kopytova, very similar to Foreman-Mackey's <i>K2</i> work: We will instantiate many nuisance parameters (to cover calibration issues), infer them simultaneously, and marginalize them out. Similar stuff has crossed this space associated with the names Cushing and Czekala!</p><p>NYU CDS MSDS student Bryan Ball told us about his parameterized model of a periodogram, and his attempts to fit it using likelihood optimization. He is well on the way to having a probabilistic approach to obtaining the "large frequency difference" of great importance in asteroseismology. At the end of the meeting, Foreman-Mackey showed us an awesome demo of the method we are calling "PCP" that models a data matrix as a sum of a low-rank (PCA-like) matrix and a sparse (mostly zero) matrix. 
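For concreteness, here is my minimal transcription of that low-rank-plus-sparse decomposition (principal component pursuit), following the spirit of the augmented-Lagrangian pseudo-code in the Candès et al. paper; this is an illustrative sketch, not Foreman-Mackey's demo code:

```python
import numpy as np

def shrink(M, tau):
    """Entrywise soft-thresholding (the proximal operator of the l1 norm)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svd_shrink(M, tau):
    """Soft-threshold the singular values (prox of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def pcp(Y, max_iter=2000, tol=1e-7):
    """Principal component pursuit: decompose Y into low-rank L plus
    sparse S by an augmented-Lagrangian iteration, using the
    parameter-free choices of lambda and mu advocated by Candes et al."""
    m, n = Y.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / np.abs(Y).sum()
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    W = np.zeros_like(Y)  # dual variable enforcing Y = L + S
    for _ in range(max_iter):
        L = svd_shrink(Y - S + W / mu, 1.0 / mu)
        S = shrink(Y - L + W / mu, lam / mu)
        resid = Y - L - S
        W += mu * resid
        if np.linalg.norm(resid) <= tol * np.linalg.norm(Y):
            break
    return L, S
```

The support of S flags the outlying entries, which is what makes the method so appealing for light curves with cosmic rays and other glitches.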
The sparse matrix picks up the outliers, and handles the missing data.</p>http://hoggresearch.blogspot.com/2015/04/stellar-spectra-with-issues.htmlnoreply@blogger.com (Hogg)1tag:blogger.com,1999:blog-10448119.post-6398949820713758654Wed, 22 Apr 2015 03:59:00 +00002015-04-25T21:46:39.954-04:00asteroseismologynot researchsignal processingstatisticsTESSthinkingtimeorthogonality of signals in data<p>In a low-research day—it was the site visit from the Moore and Sloan Foundations to the Moore–Sloan Data Science Environment at NYU—I got a few minutes of thinking in over lunch on my variable exposure-time project: In the same way that I can calculate the Cramér–Rao bound on the amplitudes at different frequencies, I can also look at the orthogonality or the mutual information in different frequencies. That is, I can show (I hope) that there is far less strict aliasing in the case of the variable exposure times than there is in the case of the uniform exposure times.</p>http://hoggresearch.blogspot.com/2015/04/orthogonality-of-signals-in-data.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-8635645637283853265Tue, 21 Apr 2015 03:59:00 +00002015-04-25T21:46:04.407-04:00asteroseismologycalibrationcodeLSSTpoint-spread functionstatisticstelescopeTESStimevary all the exposure times!<p><i>[OMG I have been behind on posting. I will catch up this weekend (I hope).]</i></p><p>I have been batting around for years the idea of writing a paper about varying the exposure times in a survey. Typically, I have been thinking about such variation to test the shutter, look for systematics (like non-linearity) in the devices, extend dynamic range (that is, vary the brightness at which saturation happens), and benefit from the lucky-imaging-like variation in the point-spread function. 
For all these reasons, I think the <i>LSST</i> project would be crazy to proceed with its (current) plan of doing 15+15 sec exposures in each pointing.</p><p>Recently, in conversations with Angus and Foreman-Mackey, I got interested in the asteroseismology angle on this: Could we do much better on asteroseismology by varying exposure times? I came up with a Cramér–Rao-bound formalism for thinking about this and started to code it up. It looks like a survey with (slightly) randomized exposure times vastly outperforms a survey with uniform exposure times on many fronts. Here's a plot from the first stab at this:</p><a href="http://2.bp.blogspot.com/-ss_gOCeWJyk/VTqrlycdBkI/AAAAAAAAemU/BzvLANdVoGA/s1600/crb.png" imageanchor="1" ><img width="500px" border="0" src="http://2.bp.blogspot.com/-ss_gOCeWJyk/VTqrlycdBkI/AAAAAAAAemU/BzvLANdVoGA/s1600/crb.png" /></a><br />
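The calculation behind that plot is, in spirit, a small one. A toy version of the Cramér–Rao computation might look like the following; the cadences are made up, and the per-exposure noise is held fixed regardless of exposure width, a simplification:

```python
import numpy as np

def crb_amplitude(nu, centers, widths, sigma=1.0):
    """Cramér–Rao bound (2x2 covariance) on the cos/sin amplitudes of a
    sinusoid at frequency nu, observed through boxcar exposures: averaging
    over an exposure of width w attenuates the signal by sinc(nu * w)."""
    atten = np.sinc(nu * widths)  # np.sinc(x) = sin(pi x) / (pi x)
    X = np.stack([atten * np.cos(2 * np.pi * nu * centers),
                  atten * np.sin(2 * np.pi * nu * centers)], axis=1)
    F = X.T @ X / sigma**2        # Fisher information for a linear model
    return np.linalg.inv(F)

# uniform 30-minute exposures vs the same cadence with randomized widths
n = 2000
t_exp = 0.5 / 24.0                           # days
centers = t_exp * np.arange(n)
rng = np.random.default_rng(17)
widths = t_exp * rng.uniform(0.3, 1.0, size=n)
nu = 0.98 / t_exp                            # near the first null of the uniform survey
var_uniform = np.trace(crb_amplitude(nu, centers, t_exp * np.ones(n)))
var_jitter = np.trace(crb_amplitude(nu, centers, widths))
```

Near frequencies of order one over the exposure time, the boxcar attenuation kills the Fisher information of the uniform survey, while the randomized widths keep some exposures sensitive; at low frequencies the two schemes are comparable.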
http://hoggresearch.blogspot.com/2015/04/vary-all-exposure-times_20.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-7257547673841126456Sat, 18 Apr 2015 03:59:00 +00002015-04-20T20:40:52.751-04:00clusteringcosmologydark sectorgravitational lensinglarge-scale structurelinear algebrameetingseminartalkingtimePhil Marshall<p>Phil Marshall showed up today to give the astrophysics seminar. He also attended the CampHogg group meeting. In his seminar, he talked about finding and exploiting strong gravitational lenses in large sky surveys to make precise (and, importantly, accurate) inferences about the expansion history (or redshift—distance relation). He showed that when you are concerned that you might be affected by severe systematics, the best approach is to make your model <i>much more flexible</i> but then learn the relationships among the new nuisance parameters that make the much more flexible model nonetheless still informative. This requires hierarchical inference, which both Marshall and I have been pushing on the community for some years now.</p><p>In group meeting, he had the group members talk about the things they are most excited about. Among other things, this got Angus talking about periodograms with much better noise models under the hood and it got Foreman-Mackey talking about linear algebra tricks that might change our lives. Huppenkothen blew Marshall away with her example light-curves from GRS 1915. Marshall himself said he was excited about building a full three-dimensional model of all the mass density inside the Hubble volume, using both weak lensing and large-scale structure simultaneously. 
He has some ideas about baby steps that might make first projects tractable in the short run.</p>http://hoggresearch.blogspot.com/2015/04/phil-marshall.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-6137755911629293345Fri, 17 Apr 2015 03:59:00 +00002015-04-20T19:36:06.830-04:00not researchpracticeproject managementtalkingrecap<p>The only real research I did today was a recap of projects with Foreman-Mackey as he prepares to complete his thesis. There are a lot of projects open, and there is some decision-making about what ought to be highest priority.</p>http://hoggresearch.blogspot.com/2015/04/recap.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-2271917901042684809Thu, 16 Apr 2015 03:59:00 +00002015-04-19T14:57:52.452-04:00calibrationdataemailmeetingmodelsdssspectroscopytalkingtracking the sky, spectroscopically<p>At group meeting today, Blanton spoke at length about telluric corrections and sky subtraction in the various spectroscopic surveys that make up <i>SDSS-IV</i>. His feeling is that the number of sky and telluric-standard fibers assigned in the surveys might not be optimal given the variability of the relevant systematics. He enlisted our help in analyzing that situation. In particular, what model complexity does the data support? And, given that model complexity, what is the best sampling of the focal plane with telluric standards (and sky fibers)? I agreed to write down some ideas for the <i>SDSS-IV</i> mailing lists.</p>http://hoggresearch.blogspot.com/2015/04/tracking-sky-spectroscopically.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-7950166954041802290Wed, 15 Apr 2015 03:59:00 +00002015-04-15T09:53:17.619-04:00bayesexoplanetGaussian processgravitylinear algebraMCMCmodelphotometrypulsarseminarspectroscopystarstatisticstalkingtime#astrohackny, week N+1<p>The day started at #astrohackny with Foreman-Mackey and I arguing about convolutions of Gaussians. 
The question is: Consider a model (probability of the data given parameters) with two (linear) parameters of importance and 150 (linear) nuisance parameters. There is a very weak Gaussian prior on the nuisance parameters. How to write down the marginalized likelihood such that you only have to do a 2x2 least squares, not a 152x152 least squares? I had a very strong intuition about the answer but no solid argument. Very late at night I demonstrated that my intuition is correct, by the method of experimental coding. Not very satisfying, but my abilities to complete squares with high-dimensional linear operators are not strong!</p><p>Taisiya Kopytova (MPIA) is visiting NYU for a couple of months, to work on characterizing directly imaged extra-solar planets. We discussed the simultaneous fitting of photometry and spectroscopy, one of my favorite subjects! I, of course, recommended modeling the calibration (or, equivalently, continuum-normalization) issues simultaneously with the parameter estimation. We also discussed interpolation (of the model grid) and MCMC sampling and the likelihood function.</p><p>At Pizza Lunch at Columbia, Chiara Mingarelli (Caltech) talked about the Pulsar Timing Array and its project to detect the stochastic background of gravitational waves. The beautiful thing about the experiment is that it detects the motion of the <i>Earth</i> relative to the pulsars, not the individual motions of the pulsars, and it does so using time correlations in timing residuals as a function of angle between the pulsars. The assumption is that the ball of pulsars is far larger than the relevant wavelengths, and that different pulsars are causally unconnected in time. 
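Back to the morning's argument about Gaussians: the equivalence I verified by experimental coding can be seen in a toy linear model (all sizes, scales, and priors below are arbitrary stand-ins for the real problem):

```python
import numpy as np

rng = np.random.default_rng(3)
N, P, Q = 300, 2, 150            # data points, important parameters, nuisances
A = rng.normal(size=(N, P))      # design matrix for the parameters we care about
B = rng.normal(size=(N, Q))      # design matrix for the linear nuisances
sigma, tau = 0.5, 3.0            # noise rms; rms of the (weak) Gaussian prior
a_true = np.array([1.7, -0.4])
y = (A @ a_true + B @ rng.normal(scale=0.1, size=Q)
     + rng.normal(scale=sigma, size=N))

# route 1: the big joint fit -- a (P+Q)x(P+Q), here 152x152, penalized
# least squares for all the parameters at once; keep only the first two
Cinv = np.eye(N) / sigma**2
M = np.block([[A.T @ Cinv @ A,             A.T @ Cinv @ B],
              [B.T @ Cinv @ A, B.T @ Cinv @ B + np.eye(Q) / tau**2]])
rhs = np.concatenate([A.T @ Cinv @ y, B.T @ Cinv @ y])
a_joint = np.linalg.solve(M, rhs)[:P]

# route 2: marginalize the nuisances analytically; the prior on them just
# inflates the noise covariance, and the normal equations for a are only 2x2
K = sigma**2 * np.eye(N) + tau**2 * (B @ B.T)
a_marg = np.linalg.solve(A.T @ np.linalg.solve(K, A),
                         A.T @ np.linalg.solve(K, y))
```

The two routes agree to machine precision: the weak Gaussian prior on the nuisance parameters just inflates the data covariance, after which the least squares for the two interesting parameters really is 2x2.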
Interesting to think about the "multiple hypotheses" aspects of this pulsar-timing experiment with finite data.</p>http://hoggresearch.blogspot.com/2015/04/astrohackny-week-n1.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-4289760288949575535Tue, 14 Apr 2015 03:59:00 +00002015-04-14T23:32:16.549-04:00causalityKeplerlinear algebraparticle physicsseminarstatisticstalkingTESStimevary all the exposure times!<p>Ruth Angus showed up for a few days, and we talked out the first steps to make an argument for taking time-series data with variable exposure times. We all know that non-uniform spacing of data helps with frequency recovery in time series; our <i>new</i> intuition is that non-uniform exposure time will help as well, especially for very high frequencies (short periods). We are setting up tests now with <i>Kepler</i> data but with an eye to challenging the <i>TESS</i> mission to bite a big, scary bullet.</p><p>After complaining for the millionth time about PCA (and my loyal reader—who turns out to be Todd Small at The Climate Corporation—knows I love to hate on the PCA), Foreman-Mackey and I finally decided to fire up the robust PCA or PCP method from compressed sensing (not the badly-re-named "robust PCA" in the astronomy literature). The fundamental paper is <a href="http://statweb.stanford.edu/~candes/papers/RobustPCA.pdf">Candès et al</a>; the method has no free parameters, and the paper includes ridiculously simple pseudo-code. It looks like it absolutely rocks, and obviates all masking or interpolation of missing or bad data!</p><p>At lunch, Gabriele Veneziano (Paris, NYU) spoke about graviton–graviton interactions and causality constraints. Question that came up in the talk: If a particle suffers a negative time delay (like the opposite of a gravitational time delay), can you necessarily therefore build a time machine? 
That's something to dine out on.</p>http://hoggresearch.blogspot.com/2015/04/vary-all-exposure-times.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-6021176823599734926Fri, 10 Apr 2015 03:59:00 +00002015-04-10T14:44:18.598-04:00bayesdataexoplanetfarm machineryGaussian processseminartalkingweatherThe Climate Corporation<p>I spent a bit of today at <i>The Climate Corporation</i>, hosted by former astronomer Todd Small. He told me about things they work on, which include field-level predictions for farmers about rainfall and sunlight, precise prediction and instructions for irrigation and fertilization, and advice about crop density (distances between seeds). He said they get data feeds from Earth observing but also from soil testing and even the farm machinery itself (who knew: combine harvesters produce a data feed!). A modern piece of large-scale farm equipment is precisely located on its path by GPS, makes real-time decisions about planting or whatever it is doing, and produces valuable data. They even have a tractor simulator in the house for testing systems.</p><p>We talked for a bit about the challenging task of communicating with clients (farmers, in this case) about probabilistic information, such as measurement uncertainties and distributions over predictions. This is another motivation for developing an undergraduate data-science educational program: It doesn't matter what industry you are in, you will need to be able to understand and communicate about likelihoods and posterior probabilities. 
I gave an informal seminar to a part of the climate modeling team about our uses of Gaussian Processes in exoplanet discovery.</p>http://hoggresearch.blogspot.com/2015/04/the-climate-corporation.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-5047703017031591189Thu, 09 Apr 2015 03:59:00 +00002015-04-09T11:20:52.931-04:00exoplanetgalaxygravitational lensinghardwareimagingKeplerradial velocityseminarspectroscopystartalkingtelescopeTheCannontimeCaltech for a day<p><i>[On a research break this week; hence the lack of posts.]</i></p><p>I spent the day at Caltech Astronomy, with a dense schedule! Judy Cohen and I talked about RR Lyrae she is finding in the Milky Way halo and the challenge of getting a completeness function for statistical inferences. Pete Mao described to me the fiber positioners they are building for the <i>Prime Focus Spectrograph</i> multi-object spectrograph for Subaru. They have 50 seconds to get 2500 actuators in place (to a tolerance of 10-ish microns). Leslie Rogers and Ellen Price told me about Price's senior-thesis project to work out the composition constraints on short-period planets from tidal stress and the requirement that they not break up or mass-transfer. I was surprised to learn that there are some planets so close in that the tidal constraints rule out a range of compositions.</p><p>Nick Konidaris showed me new spectrographs being built in the high bay, and we talked about choices for spectral resolution. Adi Zitrin showed me amazing images of massive galaxy clusters he is using as telescopes to magnify very high-redshift galaxies; he has beautiful examples, and also some nice techniques for building mass models. He does a great job of explaining arc multiplicity and morphology.</p><p>At lunch, students Rebecca Jensen-Clem and Michael Bottom told me about projects in high-contrast imaging, and Allison Strom and Sirio Belli told me about measuring the physical properties of galaxies in the redshift range 2 to 3. 
I tried to pitch my data-driven approaches at them: In the former area you might think about learning the actuator commands given the wavefront data directly from an optimization (with, presumably, the quality of the science image encapsulated in the objective function). In the latter, you might think about making a graph of galaxy spectra, where galaxies are joined by edges in the graph if their spectra are similar under any (reasonable) assumption about dust extinction. The students were (rightly) suspicious about both options!</p><p>Adam Miller and I discussed correct uses of machine learning in astronomy (since he is a great practitioner), and I once again pitched at him the possibility that we <i>could</i> try to replace their random-forest classifier in the time domain with a generative model of all stellar variable types. It would be hard, but exceedingly satisfying to do that. We discussed training-set imbalance and some clever ideas he has about combating it.</p><p>I asked Heather Knutson about how to get our (new) single transits we are finding in <i>Kepler</i> followed up with radial-velocity spectrographs. She made a great point about our future population inferences: Anything we infer from single transits makes a prediction for radial-velocity surveys and we should try to encourage the radial-velocity groups to confirm or deny in parallel. I discussed cadence and exposure-time issues for the <i>Zwicky Transient Facility</i> with Eric Bellm and visualization and interface for the <i>LSST</i> data with George Helou. I gave the Astrophysics Colloquium there about <i>The Cannon</i> and data-driven models (by my definition) and the day ended with a great dinner with Tom Soifer, Chuck Steidel, George Djorgovski, and Judy Cohen. 
It was a great trip!</p>http://hoggresearch.blogspot.com/2015/04/caltech-for-day.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-1447066218504707043Fri, 03 Apr 2015 03:59:00 +00002015-04-09T11:14:34.983-04:00bayescoffeecosmologydark sectordeep learninggalaxyimaginglarge-scale structurelinear algebraMCMCMilky Wayseminartalkingcosmology, now with less approximation; RNADE<p>In the morning, Mike O'Neil (NYU), Iain Murray (Edinburgh), Malz, Vakili, and I met for coffee to talk cosmology and cosmological inference. We discussed the linear algebra of making full (non-approximate) likelihood calls for cosmic background inference, which includes determinant and solve calls for enormous (dense) matrices. A few days ago, Wandelt made the (perhaps obscure) comment to me that he did his Gibbs sampling in the CMB to avoid making determinant calls. I finally understood this: The determinant is big and expensive in part because it is like an integral or marginalization over the nuisance parameters (which are the initial conditions or phases). If you can compute the determinant, you get something you can think of as being a marginalized likelihood function. The Gibbs sampling is part of a system that does posterior sampling, so it does the marginalization but doesn't return the amplitude of the likelihood function. If you <i>need</i> a marginalized likelihood function, you can't do it with the Gibbs sampling. Murray once again made the simple point that the posterior (for the cosmological parameters) will in general be much easier to compute and represent than the likelihood function (the probability of the data given the cosmological parameters) because the former is a much smaller-dimensional thing. That's a deep point!</p><p>That deep point played also into his talk in the afternoon about his <i>RNADE</i> method (and code) to use deep learning to represent very complex densities, or estimate a density given a sampling. 
One of his application areas is a project to obtain posterior constraints on the mass of the Milky Way given (as data) the properties of the Magellanic Clouds. The theory in this case is represented by a large number of numerical simulations, but Murray wants a continuous density to do inference. <i>RNADE</i> is really impressive technology.</p>http://hoggresearch.blogspot.com/2015/04/cosmology-now-with-less-approximation.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-2926451030588366452Thu, 02 Apr 2015 03:59:00 +00002015-04-04T12:47:33.799-04:00bayescosmologyexoplanetgalaxygraphical modelinformationmeetingmodelstartalkingtimechallenging inferences<p>Group meeting featured discussion from our two distinguished visitors this week, Brendon Brewer (Auckland) and Iain Murray (Edinburgh). Brewer described how he and Foreman-Mackey intend to re-do some exoplanet populations inference with something that he calls "joint-space sampling" and in the context of what you might call "likelihood-free inference" (although Brewer objects to even that label) or "approximate Bayesian computation" (a label I despise, because aren't <i>all</i> Bayesian computations approximate?).</p><p>The idea is that we have the counts of transiting systems with 0, 1, 2, or etc transiting planets. What is the true distribution of planetary system multiplicities implied by those counts? Brewer calls it joint-space sampling because to answer this question requires sampling in the population parameters, <i>and</i> all the parameters of all the individual systems. The result of the posterior inference, of course, depends on everything we assume about the systems (radius and period distributions, detectability, and so on). One point we discussed is what is lost or gained by restricting the data: In principle we should always use <i>all</i> the data, as opposed to just the summary statistics (the counts of systems). 
That said, the approach of Brewer and Foreman-Mackey is going to be fully principled, subject to the (strange) constraint that all you get (as data) are the counts.</p><p>Murray followed this up by suggesting a change to the ABC or LFI methods we usually use. Usually you do adaptive sampling from the prior, and reject samples that don't reproduce the data (accurately enough). But since you did lots of data simulations, you could just <i>learn</i> the conditional probability of the parameters given the data, and evaluate it at the value of the data you have. In general (his point is), you can learn these conditional probabilities with deep learning (as his <a href="http://arxiv.org/abs/1306.0186">RNADE code and method</a> does routinely).</p><p>Murray also told us about a project with Huppenkothen and Brewer to make a hierarchical generalization of our <i>Magnetron</i> project <a href="http://arxiv.org/abs/1501.05251">published here</a>. In this, they hope to hierarchically infer the properties of all bursts (and the mixture components or words that make them up). The challenge is to take the individual-burst inferences and combine them subsequently. That's a common problem here at <i>CampHogg</i>; the art is deciding where to "split" the problem into separate inferences, and how to preserve enough density of samples (or whatever) to pass on to the data-combination stage.</p><p>This was followed by Malz telling us about his project to infer the redshift density of galaxies given noisy photometric-redshift measurements, only just started. We realized in the conversation that we should think about multiple representations for surveys to output probabilistic redshift information, including quantile lists, which are fixed-length representations of pdfs. 
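A toy version of the quantile-list idea, with invented numbers standing in for a real photometric-redshift posterior:

```python
import numpy as np

def quantile_summary(samples, n=32):
    """Compress a pdf, represented by posterior samples, into a fixed-length
    list of quantiles; every object in a catalog gets the same-size summary."""
    probs = (np.arange(n) + 0.5) / n
    return np.quantile(samples, probs)

def cdf_from_quantiles(q, x):
    """Approximate CDF rebuilt from the quantile list by linear interpolation."""
    probs = (np.arange(len(q)) + 0.5) / len(q)
    return np.interp(x, q, probs, left=0.0, right=1.0)

# a lumpy toy posterior: two redshift modes, compressed to one 32-vector
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0.4, 0.02, size=7000),
                          rng.normal(1.1, 0.05, size=3000)])
q = quantile_summary(samples)
```

However lumpy the posterior, the summary has the same length for every object, which is exactly the property a survey catalog wants.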
As I was saying that we need likelihoods but most surveys produce posteriors, Murray pointed out that in general posteriors are much easier to produce and represent (very true) so we should really think about how we can work with them. I agree completely.</p>http://hoggresearch.blogspot.com/2015/04/challenging-inferences.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-2429075728820465363Wed, 01 Apr 2015 03:59:00 +00002015-04-01T15:42:27.046-04:00bayescodecosmologydataimagingkinematicsmodelPlanckradial velocitysdssspectroscopystartalking#astrohackny, day N<p>At #astrohackny, Ben Weaver (NYU) showed a huge number of binary-star fits to the <i>APOGEE</i> individual-exposure heliocentric radial velocity measurements. He made his code fast, but not yet sensible, in that it treats all possible radial-velocity curves as equally likely, when some are much more easily realized physically than others. In the end, we hope that he can adjust the <i>APOGEE</i> shift-and-add methodology and make better combined spectra.</p><p>Glenn Jones (Columbia) and Malz showed some preliminary results building a linear Planck foreground model, using things that look a lot like PCA or HMF. We argued out next steps towards making it a probabilistic model with more realism (the beam and the noise model) and more flexibility (more components or nonlinear functions). Also, the model has massive degeneracies; we talked about breaking those.</p>http://hoggresearch.blogspot.com/2015/03/astrohackny-day-n.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-3515786999793110016Tue, 31 Mar 2015 03:59:00 +00002015-03-31T21:04:41.964-04:00bayesdustimagingmodelseminarstarstatisticstalkingtimelight echos<p>I hung out in the office while Tom Loredo (Cornell), Brendon Brewer (Auckland), Iain Murray (Edinburgh), and Huppenkothen all argued about using dictionary-like methods to model a variable-rate Poisson process or density. Quite an assemblage of talent in the room! 
At lunch-time, Fed Bianco talked about light echos. It is such a beautiful subject: If the light echo is a linear response to the illumination, I have this intuition that we could (in principle) infer the full three-dimensional distribution of all the dust in the Galaxy and the time-variable illumination from all the sources. In principle!</p>http://hoggresearch.blogspot.com/2015/03/light-echos.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-2483764143786192242Sun, 29 Mar 2015 03:59:00 +00002015-03-31T21:05:38.101-04:00asteroseismologycalibrationcodeinformationstarTESSthinkingtimeradical fake TESS data<p>This past week, the visit by Zach Berta-Thompson (MIT) got me thinking about possible imaging surveys with non-uniform exposure times. In principle, at fixed bandwidth, there might be far more information in a survey with jittered exposure times than in a survey with uniform exposure times. In the context of <i>LSST</i> I have been <a href="http://hoggresearch.blogspot.com/2014/08/my-very-busy-week-ended-with-seminar-on.html">thinking about this</a> in terms of saturation, calibration, systematics monitoring, dynamic range, and point-spread function. However, in the context of <i>TESS</i> the question is all about frequency content in the data: Can we do asteroseismology at frequencies way higher than the inverse mean exposure time if the exposure times are varied properly? 
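One way to set up such a simulation, sketched here with invented grouping numbers: build each co-add from 2-second sub-exposures, randomizing only how many sub-exposures go into each co-add, so the total data output is identical to the uniform survey.

```python
import numpy as np

def fake_tess(flux_fn, n_sub, rng, randomize=True, sub_exp=2.0):
    """Simulate co-added TESS-like photometry from 2-second sub-exposures.
    The total number of sub-exposures (the data bandwidth) is fixed; only
    the grouping of sub-exposures into co-adds is randomized."""
    if randomize:
        sizes, remaining = [], n_sub
        while remaining > 0:
            k = min(int(rng.integers(300, 1500)), remaining)  # 10-50 min co-adds
            sizes.append(k)
            remaining -= k
    else:
        sizes = [900] * (n_sub // 900)        # uniform 30-minute co-adds
    t = 0.0
    times, fluxes, widths = [], [], []
    for k in sizes:
        sub_t = t + sub_exp * (np.arange(k) + 0.5)
        times.append(sub_t.mean())
        fluxes.append(flux_fn(sub_t).mean())  # on-board co-add = plain mean
        widths.append(k * sub_exp)
        t += k * sub_exp
    return np.array(times), np.array(fluxes), np.array(widths)
```

Feeding both versions the same oscillating `flux_fn` and comparing the recoverable frequency content is then the experiment.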
This weekend I started writing some code to play in this sandbox, that is, to simulate <i>TESS</i> data but with randomized exposure times (though identical total data output).</p>http://hoggresearch.blogspot.com/2015/03/radical-fake-tess-data.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-5540542628833575349Fri, 27 Mar 2015 03:59:00 +00002015-03-31T20:58:52.815-04:00bayescodecosmic raycosmographycosmologyexoplanetgalaxyimaginglarge-scale structurestatisticstalkingTESSprobabilistic density inference, TESS cosmics<p>Boris Leistedt (UCL) showed up for the day; we discussed projects for the future when he is a Simons Postdoctoral Fellow at NYU. He even has a shared Google Doc with his plans, which is a very good idea (I should do that). In particular, we talked about small steps we can take towards fully probabilistic cosmology projects. One is performing local inference of large-scale structure to hierarchically infer (or shrink) posterior information about the redshift-space positions of objects with no redshift measurement (or imprecise ones).</p><p>Zach Berta-Thompson (MIT) reported on his efforts to optimize the hyper-parameters of my online robust statistics method for cosmic-ray mitigation in the <i>TESS</i> spacecraft. He found values for the two hyper-parameters such that, for some magnitude ranges, my method beats his simple and brilliant middle-eight-of-ten method. However, because my method is more complicated, and because its success seems to depend (possibly non-trivially) on his (somewhat naive) <i>TESS</i> simulation, he is inclined to stick with middle-eight-of-ten. 
I asked him for a full and complete search of the hyper-parameter space but agreed with his judgement in general.</p>http://hoggresearch.blogspot.com/2015/03/probabilistic-density-inference-tess.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-8785268442381122738Thu, 26 Mar 2015 03:59:00 +00002015-03-31T09:21:11.813-04:00asteroseismologycalibrationcodecosmic rayimagingphotometrypracticestatisticstalkingTESSonline, on-board robust statistics<p>Zach Berta-Thompson (MIT) showed up at NYU today to discuss the on-board data analysis performed by the <i>TESS</i> spacecraft. His primary concern is cosmic rays: With the thick detectors in the cameras, cosmic rays will affect a large fraction of pixels in a 30-minute exposure. Fundamentally, the spacecraft takes 2-second exposures and co-adds them on-board, so there are lots of options for cosmic-ray mitigation. The catch is that the computation all has to be done <i>on board</i> with limited access to RAM and CPU.</p><p>Berta-Thompson showed that a "middle-eight-of-ten" strategy (every 10 sub-exposures average all but the highest and the lowest) does a pretty good job. I proposed something that looks like the standard "iteratively reweighted least squares" algorithm, but operating in an "online" mode where it can only see the last few elements of the past history. Berta-Thompson, Foreman-Mackey, and I tri-coded it in the Center for Data Science studio space. The default algorithm I wrote down didn't work great (right out of the box) but there are two hyper-parameters to tune. 
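For reference, the middle-eight-of-ten baseline is tiny to write down; here is a sketch, vectorized for convenience rather than truly on-board/online:

```python
import numpy as np

def middle_eight_of_ten(subs):
    """Co-add a pixel's sub-exposures in groups of ten, averaging all but
    the highest and lowest value in each group: a cheap robust mean that
    rejects a cosmic-ray hit in any single sub-exposure."""
    groups = np.asarray(subs, dtype=float).reshape(-1, 10)  # assumes len % 10 == 0
    s = np.sort(groups, axis=1)
    return s[:, 1:-1].mean(axis=1)  # drop the min and max, average the middle eight
```

A single large cosmic-ray spike in a group lands in the discarded maximum and never reaches the co-add, which a plain mean cannot do.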
We put Berta-Thompson onto tuning.</p>http://hoggresearch.blogspot.com/2015/03/online-on-board-robust-statistics.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-3187032664081159182Wed, 25 Mar 2015 03:59:00 +00002015-03-30T07:31:21.509-04:00calibrationdissertationeatingexoplanetgalexKeplermachine learningstatisticstalkingdissertation transits<p>Schölkopf, Foreman-Mackey, and I discussed the single-transit project, in which we are using standard machine learning and a lot of signal injections into real data to find single transits in the <i>Kepler</i> light curves. This is the third chapter of Foreman-Mackey's thesis, so the scope of the project is limited by the time available! Foreman-Mackey had a breakthrough on how to split the data (for each star) into train, validate, and test such that he could just do three independent trainings for each star and still capture the full variability. False positives remain dominated by rare events in individual light curves.</p><p>With Dun Wang, we discussed the <i>GALEX</i> photon project; his job is to see what about the photons is available at <i>MAST</i>, if anything, especially anything about the focal-plane coordinates at which they were detected (as opposed to celestial-sphere coordinates). This was followed by lunch at <i>facebook</i> with Yann LeCun.</p>http://hoggresearch.blogspot.com/2015/03/dissertation-transits.htmlnoreply@blogger.com (Hogg)0tag:blogger.com,1999:blog-10448119.post-3252799486232088916Tue, 24 Mar 2015 03:59:00 +00002015-03-24T15:20:30.524-04:00cosmologydataeatingexoplanetimagingKeplerlinear algebraPlanckradioseminartalkingSimons Center for Data Analysis<p>Bernhard Schölkopf arrived for a couple of days of work. We spent the morning discussing radio interferometry, <i>Kepler</i> light-curve modeling, and various things philosophical. We headed up to the <i>Simons Foundation</i> to the Simons Center for Data Analysis for lunch. 
We had lunch with Marina Spivak (Simons) and Jim Simons (Simons). With the latter I discussed the issues of finding exoplanet rings, moons, and Trojans.</p><p>After lunch we ran into Leslie Greengard (Simons) and Alex Barnett (Dartmouth), with whom we had a long conversation about the linear algebra of non-compact kernel matrices on the sphere. This all relates to tractable non-approximate likelihood functions for the cosmic microwave background. The conversation ranged from cautiously optimistic (that we could do this for <i>Planck</i>-like data sets) to totally pessimistic, ending on an optimistic note. The day ended with a talk by Laura Haas (IBM) about infrastructure (and social science) she has been building (at IBM and in academic projects) around data-driven science and discovery. She showed a great example of drug discovery (for cancer) by automated "reading" of the literature.</p>
<p><a href="http://hoggresearch.blogspot.com/2015/03/simons-center-for-data-analysis.html">permalink</a></p>

<b>health</b> (Sat, 21 Mar 2015; labels: practice, reading, writing)

<p>I took a physical-health day today, which means I stayed at home and worked on my students' projects, including commenting on drafts, manuscripts, and plots from Malz, Vakili, and Wang.</p>
<p><a href="http://hoggresearch.blogspot.com/2015/03/health.html">permalink</a></p>

<b>robust fitting, intelligence, and stellar systems</b> (Fri, 20 Mar 2015; labels: exoplanet, graphical model, information, intelligence, kinematics, LTFDFCF, model, radial velocity, sdss, spectroscopy, star, statistics, talking)

<p>In the morning I talked to Ben Weaver (NYU) about performing robust (as in "robust statistics") fitting of binary-star radial-velocity functions to the radial-velocity measurements from the individual exposures of the <i>APOGEE</i> spectroscopy.
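In code, robust fitting of this flavor might be sketched as iteratively reweighted least squares with a MAD-based scale. Everything below, including the fixed-period sinusoid model and all names, is a simplified stand-in of my own, not Weaver's actual analysis:

```python
import numpy as np

def robust_rv_fit(t, rv, period, n_iter=10, k=3.0):
    """Iteratively reweighted least squares for a fixed-period
    circular-orbit model rv ~ A sin(wt) + B cos(wt) + C,
    down-weighting outlying exposures (toy sketch)."""
    w = 2.0 * np.pi / period
    X = np.column_stack([np.sin(w * t), np.cos(w * t), np.ones_like(t)])
    weights = np.ones_like(rv)
    for _ in range(n_iter):
        sw = np.sqrt(weights)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], rv * sw, rcond=None)
        resid = rv - X @ beta
        scale = 1.4826 * np.median(np.abs(resid)) + 1e-12  # MAD-based robust scale
        r = np.abs(resid) / (k * scale)
        weights = np.where(r < 1.0, 1.0, 1.0 / r)          # down-weight large residuals
    return beta, weights
```

The returned weights are the interesting output here: an exposure with a tiny final weight is exactly the kind of radial-velocity outlier we want to flag.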
The goal is to identify radial-velocity outliers and improve <i>APOGEE</i> data analysis, but we might make a few discoveries along the way, <i>à la</i> what's implied by <a href="http://arxiv.org/abs/1502.05035">this paper</a>.</p><p>At lunchtime I met up with Bruce Knuteson (Kn-X), who is starting a company (see <a href="https://www.kn-x.com/">here</a>) that uses a clever but simple economic model to obtain true information from untrusted and anonymous sources. He asked me about possible uses in astrophysics. He also asked me if I know anyone in US intelligence. I don't!</p><p>In the afternoon, Tim Morton (Princeton) came up to discuss things related to multiple-star and exoplanet systems. One of the things we discussed is how to parameterize or build pdfs over planetary <i>systems</i>, which can have very different numbers of elements and parameters. One option is to classify systems into classes, build a model of each (implicitly qualitatively different) class, and then model the full distribution as a mixture of classes. Another is to model the "biggest" or "most important" planet first; in this case we build a model of the pdf over the "most important planet" and then deal with the rest of the planets later. Another is to say that every single star has a <i>huge number of planets</i> (like thousands, or infinity) and that most of them are simply unobservable. Then the model is over an (effectively) infinite-dimensional vector for every system (most elements of which describe planets that are unobservable or will not be observed any time soon).</p><p>This infinite-planet descriptor sounds insane, but there are lots of tractable models like this in the world of non-parametrics. <i>And</i> the Solar System certainly suggests that most stars probably do have many thousands of planets (at least). You can guess from this discussion where we are leaning.
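As a toy of that third option (purely illustrative; the distributions, units, and detection floor below are invented, not a fitted occurrence model):

```python
import numpy as np

rng = np.random.default_rng(17)

def draw_system(n_slots=1000, detection_floor=1.0):
    """One way to write the 'every star has thousands of planets'
    parameterization: give every system a huge fixed-length planet
    vector, drawn from a steep occurrence law so that almost all
    entries fall below detectability."""
    # classic Pareto with minimum 0.1, made-up "radius" units
    radii = 0.1 * (1.0 + rng.pareto(2.0, size=n_slots))
    detectable = radii > detection_floor
    return radii, detectable
```

The point of the sketch is that the huge vector is harmless: almost all of its entries sit below the detection floor, so the likelihood only ever interacts with the handful of observable planets, which is what makes non-parametric versions of this idea tractable.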
Everything we figure out about planetary systems applies to stellar systems too.</p>
<p><a href="http://hoggresearch.blogspot.com/2015/03/robust-fitting-intelligence-and-stellar.html">permalink</a></p>

<b>Blanton-Hogg group meeting</b> (Thu, 19 Mar 2015; labels: bayes, clustering, environment, exoplanet, galaxy, graphical model, literature, meeting, primus, sdss, spectroscopy, statistics, talking)

<p>Today was the first-ever instance of the new Blanton–Hogg combined group meeting. Chang-Hoon Hahn (NYU) presented work on the environmental dependence of galaxy populations in the <i>PRIMUS</i> data set, and a referee report he is responding to. We discussed how the redshift incompleteness of the survey might depend on galaxy type. Vakili showed some preliminary results he has on machine-learning-based photometric redshifts. We encouraged him to go down the "feature selection" path to start; it would be great to know which <i>SDSS</i> catalog entries are most useful for predicting redshift! Sanderson presented issues she is having with building a hierarchical probabilistic model of the Milky Way satellite galaxies. She had issues with the completeness (omg, how many times have we had such issues at Camp Hogg!), but I hijacked the conversation onto the differences between binomial and Poisson likelihood functions.
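The binomial-versus-Poisson point is standard but worth writing down: with n true objects each detected with completeness p, the detected count k is binomial, and the Poisson form with rate np is only the large-n, small-p limit. A minimal sketch in my own notation, not Sanderson's:

```python
from math import lgamma, log

def log_binomial(k, n, p):
    """log P(k detections | n true objects, detection probability p)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1.0 - p))

def log_poisson(k, lam):
    """log P(k detections | expected count lam); the n*p -> lam limit."""
    return k * log(lam) - lam - lgamma(k + 1)
```

For a handful of satellites detected out of a modest true population, the two forms give visibly different likelihoods, which is exactly why the choice matters for a hierarchical model.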
Her problem is very, very similar to that <a href="http://arxiv.org/abs/1406.3020">solved by Foreman-Mackey for exoplanets</a>, but with different functional forms for everything.</p>
<p><a href="http://hoggresearch.blogspot.com/2015/03/blanton-hogg-group-meeting.html">permalink</a></p>

<b>#astrohackny, CMB likelihood</b> (Wed, 18 Mar 2015; labels: bayes, cosmology, graphical model, linear algebra, meeting, Planck, star, talking, testing, time, white dwarf)

<p>I spent most of #astrohackny arguing with Jeff Andrews (Columbia) about white-dwarf cooling-age differences and how to do inference given measurements of white-dwarf masses and cooling times (for white dwarfs in coeval binaries). The problem is non-trivial and is giving Andrews biased results. In the end we decided to obey the advice I usually give, which is to beat up the likelihood function before doing the full inference. Meaning: try to figure out whether the inference issues are in the likelihood function, the prior, or the MCMC sampler. Since all these things combine in a full inference, it makes sense to "unit test" (as it were) the likelihood function first.</p><p>Late in the day I discussed the CMB likelihood function with Evan Biederstedt. Our goal is to show that we can perform a non-approximate likelihood-function evaluation in real space for a non-uniformly observed CMB sky (heteroskedastic noise and a cut sky). This involves solving, and taking the determinant of, a large matrix (50 million on a side in the case of <i>Planck</i>).
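The two expensive pieces of such an evaluation are a solve against the covariance matrix and its log-determinant; in a small dense toy both come from one Cholesky factorization. The sketch below is my own dense version, where the whole challenge for <i>Planck</i> is doing the equivalent when the matrix is 50 million on a side:

```python
import numpy as np

def gaussian_loglike(data, mean, cov):
    """Exact (non-approximate) Gaussian log-likelihood via a Cholesky
    factorization; toy dense stand-in for the real-space CMB likelihood.
    For the cut, heteroskedastic sky, cov would be the signal covariance
    plus a diagonal noise term, restricted to the unmasked pixels."""
    r = data - mean
    L = np.linalg.cholesky(cov)              # cov = L L^T
    alpha = np.linalg.solve(L, r)            # whitened residual: L alpha = r
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    n = r.size
    return -0.5 * (alpha @ alpha + logdet + n * np.log(2.0 * np.pi))
```

Dense Cholesky costs O(n³), which is hopeless at n ~ 5×10⁷; structured linear algebra of the kind in the paper linked below is what would make both the solve and the log-determinant tractable.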
I, for one, think we can do this, using <a href="http://arxiv.org/abs/1403.6015">our brand-new linear algebra foo</a>.</p>
<p><a href="http://hoggresearch.blogspot.com/2015/03/astrohackny-cmb-likelihood.html">permalink</a></p>

<b>probabilistic Cannon</b> (Tue, 17 Mar 2015; labels: asteroseismology, bayes, graphical model, model, sdss, spectroscopy, star, statistics, talking, TheCannon)

<p>The biggest conceptual issue with <i>The Cannon</i> (our data-driven model of stellar spectra) is that the system is a pure optimization or frequentist or estimator system: we presume that the training-data labels are precise and accurate, and we obtain, for each test-set spectrum, best-fit labels. In reality our labels are noisy, there are stars that <i>could</i> be used for training but only have partial labels (log <i>g</i> only from asteroseismology, for example), and we don't have <i>zero</i> knowledge about the labels of the unlabeled spectra. This calls for Bayes. Foreman-Mackey drew a graphical model in the morning and suggested variational inference. Late in the afternoon, David Sontag (NYU) drew that same model and made the same suggestion! Sontag also pointed out that there are some new ideas in variational inference that might make the project an interesting one in the computer-science-meets-statistics literature too. Any takers?</p>
<p><a href="http://hoggresearch.blogspot.com/2015/03/probabilistic-cannon.html">permalink</a></p>

<b>Tufts</b> (Sat, 14 Mar 2015; labels: seminar, TheCannon, travel)

<p>I spent the day at Tufts, where I spoke about <i>The Cannon</i>. Conversation with the locals centered on galaxy evolution, about which there are many interesting projects brewing.</p>
<p><a href="http://hoggresearch.blogspot.com/2015/03/tufts.html">permalink</a></p>