RajLab: June 2013

Thursday, June 20, 2013

Marshall graduated!

Marshall successfully defended his thesis last Tuesday, June 11th. He was my first student and did a really amazing job, not just with his science, but in helping me get the lab up and running. All the best to Marshall as he starts his new job with eMolecules in San Diego!

It was also a really fun moment for the entire lab:

And yes, someone in the lab actually has a mohawk hairdo (Andrew).

Wednesday, June 19, 2013

The magical results of reviewer experiments

So, for those of you who aren't as familiar with it, the way the peer review process theoretically works is that you submit your paper, the journal (hopefully) sends your paper out for review by other scientists, and those scientists give their comments. Then you have to address those comments, and if you do so to the editor's satisfaction, then the journal will probably accept the paper. Oftentimes, the reviewer comments will go something like "In order to make the claim that XYZ, the authors must show ZYX via experiment XZY." And, amazingly, the authors seemingly ALWAYS get the results that they want (i.e., the ones that will get the paper accepted). Now doesn't this seem a little bit unlikely? Of course it does! But at that point, the pressure to complete the deal (for all parties involved) is very high and so, well... things just seem to work out. :) Does this seem shady to you? Is that really the way it goes down? Well, let me ask you: have you ever reviewed a paper, suggested an important experiment, and had the authors actually pull the claim in response to a negative result? Certainly hasn't happened to me.

I think the idea with peer review is that it raises the quality of the paper. Overblown, in my estimation. Because of this pernicious "reviewer-experiment-positive-result" effect, I don't think that the additional experiments really make a bad paper better. Good papers tend to get reviews that ask for peripheral experiments that are usually a waste of time, and bad papers tend to generate a list of crucial controls that somehow always seem to work out in their favor. Note that I'm not (necessarily) saying that authors are being underhanded. It's just that reviewer experiments tend to get much less care and attention, and since there is now a very strong vested interest in a particular result, the authors are much more likely to analyze the hell out of their data until it gives them what they want. And once they get something vaguely positive, you can't really say much as a reviewer. Here's a hypothetical example, very loosely based on experience...

Reviewer: "If the authors' model is correct, then the expression of gene A should increase upon induction".

Authors: "We thank the reviewer for their very insightful comment [gritting teeth behind fake smile]. We have now measured expression of gene A upon induction, and found it increased by 1.13 fold with a p of 0.02 (see supplementary figure 79). We believe that this result has greatly strengthened the conclusions of our manuscript."

What the reviewer says: "The authors have satisfied my concerns and I now recommend publication."

What the reviewer thinks: "Whatever..."

What is the reviewer to do? The authors have made a measurement consistent with the hypothesis and shown it to be statistically significant. Now, if the authors had encountered this untoward result in their initial experiments and the point were indeed critical, would they have continued to follow up on their claims? Probably not. But now there's a vested interest in the publication of a project that may have taken years, and so, well, time to bust out the z-score.
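To put a rough number on the "analyze until it works" effect, here is a back-of-the-envelope sketch. It assumes (unrealistically) that each alternative way of analyzing the data is an independent test at the same threshold, and that there is no real effect at all. The function name and the 0.05 threshold are my own choices for illustration, not anything from a real paper.

```python
def chance_of_spurious_hit(num_analyses: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_analyses` independent tests
    comes out "significant" at level `alpha` when there is no real effect.

    Assumption: the tests are independent, which real reanalyses of the
    same data are not, so treat this as a rough guide only.
    """
    if num_analyses < 0:
        raise ValueError("num_analyses must be non-negative")
    # Each test misses with probability (1 - alpha); all of them must miss.
    return 1.0 - (1.0 - alpha) ** num_analyses


for k in (1, 5, 10, 20):
    print(f"{k:2d} analyses -> {chance_of_spurious_hit(k):.0%} chance of p < 0.05 by luck")
```

Under those assumptions, ten tries at the analysis already gives about a 40% chance of a "significant" result from pure noise.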

What to do about it? No clue...

Tuesday, June 11, 2013

The truth

Perusing this previous post and pondering science in general got me thinking about the (apparently biblical) saying "And the truth shall set you free." Given my experiences in research, this really makes me wonder if whoever said this ever actually spent any quality time together with the truth...

Friday, June 7, 2013

The death of annotation

So by now even my mom knows that we live in the world of "Big Data". At first glance, it seems that the only way to make sense of it all is to tag it, folderize it, database it, or otherwise organize it in some manner. The big problem: we're making data too rapidly to tag, and nobody has the energy to properly tag things. Nor do we have a systematic way to tag. That's how you end up with the gigantic mess that is GO (Gene Ontology).

I think the solution is to just not worry about it. I think the solution is search, à la Google et al. Just sit back and let the computers do all the organizing for us. It's probably the only way to make sure everything is tagged uniformly and appropriately, but even more importantly, computers won't get bored of doing it. And so it might actually work. A great example is Evernote, which now has automatic tagging. Now my notes are actually organized! I found that before, the note I was searching for would never be in the category I was looking in, because I had somehow forgotten to tag that particular note.

Actually, the best thing about search is that it's sort of like the ultimate in tagging: it's like having a huge number of tags, all customized and weighted for every document. A prime example is Google Docs, where you can just store all your stuff in one huge basket and then search for it at will. This makes it so much easier to actually get to what you want, and get there fast! It's great, because Google search is so great. I really feel like Google's search prowess is a huge competitive advantage in virtually everything they do.
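Here's a toy sketch of what I mean by "a huge number of tags, all customized and weighted for every document". It uses TF-IDF, a textbook weighting scheme, and is certainly not how Google or Evernote actually do it. The note names and contents are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Treat every word as a tag and weight it per document with TF-IDF."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Number of documents that contain each word.
    doc_freq = Counter(word for c in counts.values() for word in c)
    index = {}
    for name, c in counts.items():
        total = sum(c.values())
        index[name] = {
            # Frequent in this note, rare elsewhere -> heavier "tag".
            word: (n / total) * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, n in c.items()
        }
    return index


def search(index: dict[str, dict[str, float]], query: str) -> list[str]:
    """Rank documents by the summed weights of the query words they contain."""
    words = tokenize(query)
    scores = {
        name: sum(weights.get(w, 0.0) for w in words)
        for name, weights in index.items()
    }
    hits = [name for name, score in scores.items() if score > 0]
    return sorted(hits, key=lambda name: (-scores[name], name))


# Hypothetical notes, dumped in "one huge basket" with no manual tags.
notes = {
    "fish_protocol": "RNA FISH protocol for yeast with zymolyase digestion",
    "grant_notes": "grant deadline notes and budget",
    "yeast_growth": "yeast growth curves and media notes",
}
index = build_index(notes)
print(search(index, "yeast zymolyase"))
```

Nobody tagged anything here: the weights fall out of the text itself, which is the whole appeal.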

The same sort of thoughts also apply to things in research like bioinformatics. I was lamenting to a couple of people recently about how everyone stores their data for high-throughput stuff in a different way, and it makes it so hard to compare datasets and to build a good story, etc. Common complaint, and most would agree that it's a woeful state of affairs. Two of us (including me) tried to think up some answers on how to organize things. The other person's response: "Well, another thing we could do is not care". I thought about it some more, and I'm actually convinced that's the right answer! Over time, computers will be able to help us analyze and compare these datasets without our having to write those endless little Perl scripts to convert this type of format to that type of format and such. And they will do it so much faster and better than we can. As humans, we are good at making messes, and it's a losing battle to try and have lots of people clean up after themselves. The right move is to let the computers clean up the mess for us. Like a Roomba. Which I really want to get one day. Or this robot that folds towels:

Reality-based science

Most of the published biological literature is wrong. It's perhaps a less than well-known fact, but it is a fact nonetheless: estimates from industry are that 75% or more of published results are wrong, and our own experience in the lab backs that up. I personally think the reasons for this range from black to white, with black being outright fraud (which I think is relatively rare), and white being honest mistakes or misinterpretations of data (probably reasonably common). Then there are all those things in the gray area, including selective exclusion of results that don't fit the "story". If I had to guess, I would bet that most of the reasons why results don't replicate fall into this gray zone. There are lots of potential reasons for these sorts of problems: funding, pressure for positive results, etc. Perhaps it is just our own humanity that gets in the way.

So what to do about it? I've read about people who have suggested two options, the carrot and the stick. The stick would be that somebody has to reproduce every study, and you are somehow shamed or shunned or something like that if you are caught out. Trouble is, who's going to do all this reproduction? That's a lot of time and effort. Then there's the carrot. I like this a bit better. Here, if you want, you can submit your study for verification. If it verifies, then you get some sort of seal of approval. Over time, if this idea catches on, then you would basically have to submit your work for verification for anyone to believe it, and the carrot turns into the stick.

Meanwhile, back in reality, I think it's very unlikely for any of this to ever happen. First off, who's going to pay for it? And is it worth paying for? In a way, pharma companies who actually need their stuff to really work are the ones paying for it now: they try to replicate the experiments they need, finding that the majority (if not the vast majority) are wrong, and then follow up on the ones that prove reproducible. It's perhaps not ideal to have the literature cluttered up with all these wrong results, but what's the alternative? Plus, just because something is not reproducible in someone else's lab doesn't mean that it's wrong per se. Oftentimes, the issue is that biological systems just behave differently for reasons we don't understand, giving different results in different labs on different days. I think we've all had those "must be something in the water" moments. Umm... don't have much useful to say about that... :)

I think that another option is to train our students to better sort out the good stuff from the bad stuff. This is so difficult! People commonly note that a well-trained student can quickly rip almost any paper to shreds. Which is true and counterproductive. I've found that one strategy that has worked for us is to start with a basic question and then look up the papers that help to establish a system in which to answer said question. These papers tend to be (but are not always) less high profile, but they also seem to be at least somewhat more reproducible than high-profile papers, so you can actually build off them. So many times, I feel like successful projects are really based on a well-thumbed handful of solid, detailed, technical papers.

The other thing to do is to surround yourself with zero-bullshit people who will challenge your assumptions at every step. It can be hard to deal with at times because doing science is just so generally crazy and risky, but it will make you a better scientist. And a reality-based scientist.

Lenny Teytelman is awesome...

So I have been working with a couple of folks who want to do single molecule RNA FISH in yeast, and they wanted to know if there were any updated protocols. I used to use a variant of an old protocol from the Singer lab, but I had heard rumblings about some sort of issues with cell wall digestion using zymolyase, and I got referred to Lenny Teytelman (founder of ZappyLab). Lenny then related to me a year-and-a-half saga about how he finally figured out that you need to use a lot more zymolyase than we had been using before. Main point: not only did Lenny openly share his little protocol tip, but he did so without any wish other than to help other people out. That's the scientific spirit at its best. Lenny is cool! So cool, in fact, that he let me share the tip with the world on our RNA FISH website.

I know many people who purposefully omit these little tips and tricks from their published protocols so that others can't use them. What's the point of that? Usually, I think the rationale is that "Oh, I have this new technique and I'm gonna rule the world, so I better keep it to myself". From what I've seen, this is just a good way to limit other people's use of your method and ultimately to limit your impact as a scientist. The notion that you can get a steady stream of new papers from a new technique is in my mind also pretty wrong: usually, you can get (maybe) 1-2 high-profile papers that bank primarily on the novelty of the method as applied to some particular area of biological research, but really not much beyond that. So you're best off just following your ideals and actually trying to make your methods as transparent and practicable as possible.

Anyway, Lenny is also cool because he has been thinking a lot about the publication process, open access, and disseminating scientific information. Check out ZappyLab!