How drastic would the result be if I use the FASTA (reference assembly) from UCSC and the GTF from GENCODE?

UCSC and GENCODE provide different annotation files.

If I use the reference assembly from UCSC with the GTF from GENCODE, or vice versa, would my downstream results be wrong?

asked Jun 13 at 7:20 by krushnach Chandra
edited Jun 13 at 7:49 by Devon Ryan♦

Do you want to include older UCSC/GENCODE releases or only the most recent ones? They should now be fully compatible.
– Devon Ryan♦ Jun 13 at 7:50

I used gencode.v21 as my GTF file, but I'm not sure about the reference assembly. I used hg38, but I don't know whether it came from GENCODE or UCSC. Is there a way to tell from the assembly file itself which source it came from?
– krushnach Chandra Jun 13 at 7:58

rna-seq

2 Answers

Never use genomes or annotations from UCSC: they're poorly versioned, and only recently, for mouse and human, have they even included all of the contigs. For FASTA/GTF files from early in the GRCh38 release, you can tell whether you're using UCSC or GENCODE by the presence or absence of _random contigs, which exist only in UCSC files. These were mostly split into the actual contigs later, so a recent download from UCSC should more closely match what you find at GENCODE/Ensembl. Further, that period predates UCSC's adoption of GENCODE's vastly superior annotations, so if your GTF file contains the same gene ID on multiple strands or multiple chromosomes (which is obviously biologically impossible), then you have a UCSC GTF file.

In general, with early GRCh38 releases your only real issues will be with the _random contigs, which make up a small part of the genome and don't contain many genes. But really, you should keep track of the sources of your files and ensure that they're compatible.
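The _random check described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive test: the function names and the toy FASTA records are made up for the example, and it simply looks for the substring "_random" in sequence names.

```python
import io

def contig_names(fasta_handle):
    """Yield the sequence name from each FASTA header line.

    The name is taken as the text after '>' up to the first whitespace.
    """
    for line in fasta_handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]

def random_contigs(names):
    """Return the names that contain '_random'."""
    return [name for name in names if "_random" in name]

# Toy FASTA for illustration only; these records are not real sequences.
toy_fasta = io.StringIO(
    ">chr1\nACGT\n"
    ">chr1_gl000191_random\nACGT\n"
    ">chrUn_gl000211\nACGT\n"
)

names = list(contig_names(toy_fasta))
flagged = random_contigs(names)
print(flagged)
```

For a real assembly you would pass an open file handle instead of the toy text, for example open("genome.fa") with whatever path your FASTA actually has. An empty result is only a hint about the source, not proof.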

Update: I should expand a bit on my "Never use genomes or annotations from UCSC" comment. In point of fact, the genomes themselves aren't so terrible. Early on, UCSC had the bad habit of concatenating contigs together into _random "chromosomes", but they seem to have mostly kicked that habit of late. Note, however, that their genomes carry no versions. Since reference genomes continue to receive updates over time (mostly through the addition of patches), the lack of actual versions means you have to check manually whether a recently downloaded file matches what was downloaded previously or by someone else. This has obvious consequences for reproducibility. The same issue applies to annotations from UCSC, which have the additional problem of historically using biologically incoherent concepts of genes. That is, they contain the same gene in multiple places with multiple orientations, which breaks many tools in ways that range from obvious to completely unclear. For example, DEXSeq will simply stop with an error message if given a UCSC annotation, because the annotation is biologically implausible. If you were to use these annotation files with deepTools, you wouldn't get an error message, but the output would be only partial, because the biologically impossible annotation effectively corrupts most obvious ways of storing annotation data in a data structure (i.e., you can no longer treat IDs as unique). This could have downstream ramifications for the biological interpretation of results.
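To see whether a GTF file has the problem described above (the same gene ID on more than one chromosome or strand), something like the following Python sketch could be used. It is an assumption-laden illustration: the function names are hypothetical, the toy records are invented, and it assumes standard 9-column, tab-separated GTF lines with a gene_id "..." attribute.

```python
import io
import re
from collections import defaultdict

# Matches the gene_id attribute in the 9th GTF column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')

def gene_locations(gtf_handle):
    """Map each gene_id to the set of (seqname, strand) pairs it appears on."""
    seen = defaultdict(set)
    for line in gtf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        match = GENE_ID.search(cols[8])
        if match:
            seen[match.group(1)].add((cols[0], cols[6]))
    return seen

def ambiguous_genes(seen):
    """Keep only the gene IDs found on more than one chromosome/strand pair."""
    return {gene: sorted(locs) for gene, locs in seen.items() if len(locs) > 1}

# Toy GTF for illustration only; geneB is deliberately placed in two locations.
toy_gtf = io.StringIO(
    '##provider: TOY\n'
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";\n'
    'chr1\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";\n'
    'chr2\ttoy\texon\t100\t200\t.\t-\t.\tgene_id "geneB"; transcript_id "txB1";\n'
    'chr7\ttoy\texon\t500\t600\t.\t+\t.\tgene_id "geneB"; transcript_id "txB2";\n'
)

problems = ambiguous_genes(gene_locations(toy_gtf))
print(problems)
```

An empty result means IDs can be treated as unique per location, which is the property that tools storing annotations keyed by gene ID rely on.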

answered Jun 13 at 8:08 by Devon Ryan♦ (edited Jun 13 at 9:00)

To check that I have GENCODE, I ran cat gencode.v21.annotation.gtf | grep "_random" and found no instance of that word.
– krushnach Chandra Jun 13 at 8:13

You could also just head the file. GENCODE files tend to start with a few comment lines (granted, given what the file is named, the odds of it being from GENCODE were very high to begin with).
– Devon Ryan♦ Jun 13 at 8:15

The header reads: ##description: evidence-based annotation of the human genome (GRCh38), version 21 (Ensembl 77) ##provider: GENCODE ##contact: gencode@sanger.ac.uk ##format: gtf ##date: 2014-09-29. So now I'm sure I'm on the right path.
– krushnach Chandra Jun 13 at 8:16

@krushnachChandra not really relevant, but just so you know, grep can read files. You don't need cat file | grep foo; you can always do grep foo file.
– terdon♦ Jun 13 at 8:43

Devon, could you elaborate on your advice never to use UCSC genomes? "Never" seems a bit extreme. Can you explain what they're missing in a bit more detail? Is there any reason not to use them for the human reference? Yes, they may have *random in them, but is that such a problem?
– terdon♦ Jun 13 at 8:49
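The header check suggested in the comments (looking at the first few comment lines of the GTF) can also be done programmatically. The sketch below is one possible way, assuming the leading lines have the "##key: value" shape quoted in the comment above; the function name is hypothetical and the single data record is a toy line.

```python
import io

def gtf_header(gtf_handle):
    """Collect the leading '#' comment lines of a GTF into a key/value dict."""
    header = {}
    for line in gtf_handle:
        if not line.startswith("#"):
            break  # header ends at the first data record
        text = line.lstrip("#").strip()
        key, sep, value = text.partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return header

# Header lines as quoted in the comment above, followed by one toy record.
toy_gtf = io.StringIO(
    "##description: evidence-based annotation of the human genome (GRCh38), version 21 (Ensembl 77)\n"
    "##provider: GENCODE\n"
    "##contact: gencode@sanger.ac.uk\n"
    "##format: gtf\n"
    "##date: 2014-09-29\n"
    'chr1\ttoy\tgene\t1\t100\t.\t+\t.\tgene_id "geneA";\n'
)

header = gtf_header(toy_gtf)
print(header.get("provider"), header.get("date"))
```

Recording the provider and date alongside your results is a cheap way to keep track of where a file came from. Files without such a header simply yield an empty dict.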


The main difference is in the way the chromosomes are named: UCSC uses the "chr" prefix (so chromosome 1 is "chr1"), while GENCODE doesn't (so chromosome 1 is just "1"). Depending on your use case, this can cause problems. If you try to look up a locus from the annotation (e.g. 1:1000002) in data aligned to the other naming scheme, the tool will search for "1:1000002" while the aligned data calls it "chr1:1000002", so the two won't be matched up.
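Whichever source uses which convention, a naming mismatch like this is easy to check for before running anything downstream. The following Python sketch compares the sequence names of an assembly and an annotation; the function names are hypothetical and the name lists are toy input. In practice the names would come from the FASTA header lines and the first column of the GTF.

```python
def strip_chr(name):
    """Drop a leading 'chr' prefix if present."""
    return name[3:] if name.startswith("chr") else name

def compare_seqnames(fasta_names, gtf_names):
    """Classify annotation sequence names against the assembly's names."""
    fasta_names, gtf_names = set(fasta_names), set(gtf_names)
    missing = gtf_names - fasta_names
    # Names that would match if the 'chr' prefix were ignored.
    fasta_stripped = {strip_chr(name) for name in fasta_names}
    prefix_only = {name for name in missing if strip_chr(name) in fasta_stripped}
    return {
        "shared": sorted(gtf_names & fasta_names),
        "prefix_mismatch": sorted(prefix_only),
        "absent": sorted(missing - prefix_only),
    }

# Toy names for illustration only.
report = compare_seqnames(
    fasta_names=["chr1", "chr2", "chrM"],
    gtf_names=["1", "2", "MT", "chr1"],
)
print(report)
```

Anything under "prefix_mismatch" differs only by the prefix. Anything under "absent" (here the mitochondrial name) needs a closer look, since it is named differently or is missing from the assembly altogether.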

answered Jun 13 at 7:46 by JenG

Both use the chr prefix for the most recent human and mouse releases.
– Devon Ryan♦ Jun 13 at 7:48

So my results would vary because of the annotation file, but not much because of the reference?
– krushnach Chandra Jun 13 at 7:59
