How drastic would the result be if I use the FASTA or reference assembly from UCSC and the GTF from GENCODE?

UCSC and GENCODE each provide their own annotation files.

If I use the reference assembly from UCSC with the GTF from GENCODE, or vice versa, will my downstream results be wrong?







rna-seq

edited Jun 13 at 7:49 by Devon Ryan
asked Jun 13 at 7:20 by krushnach Chandra











  • Do you want to include older UCSC/GENCODE releases or only the most recent ones? They should now be fully compatible.
    – Devon Ryan
    Jun 13 at 7:50

  • I used gencode.v21 as my GTF file, but I'm not sure about the reference assembly. I used hg38, but I don't know whether it came from GENCODE or UCSC. Is there a way to tell which source a reference assembly came from by looking at the file itself?
    – krushnach Chandra
    Jun 13 at 7:58
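One rough way to approach that last question is to look at the sequence names in the FASTA itself. The sketch below only summarises header lines. The function name and the toy sequence names are invented for illustration, and which naming pattern points to which provider should be confirmed against the provider's own documentation.

```python
def summarize_fasta_names(lines):
    """Collect sequence names from FASTA header lines and summarise them."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare ">" that carries no name
                names.append(fields[0])
    return {
        "n_sequences": len(names),
        "chr_prefixed": sum(name.startswith("chr") for name in names),
        "random_contigs": [name for name in names if name.endswith("_random")],
    }


# Toy input standing in for a real file; every name here is invented.
toy_fasta = [
    ">chr1 toy description",
    "ACGT",
    ">chr1_toy_random",
    "ACGT",
    ">scaffold7",
    "ACGT",
]
print(summarize_fasta_names(toy_fasta))
```

For a real genome, pass an open file handle instead of the list (for a gzipped file, `gzip.open(path, "rt")` yields text lines in the same way).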


























2 Answers


















The main difference is in the way the chromosomes are named: UCSC uses the "chr" prefix (so chromosome 1 is "chr1"), while GENCODE doesn't (so chromosome 1 is just "1"). Depending on your use case, this can obviously cause problems. If you're trying to match a locus between them (e.g. 1:1000002 from GENCODE), whatever tool you use is going to look in your aligned data for "1:1000002", but there it'll be named "chr1:1000002", so the tool won't match the two up.
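A naming mismatch of this kind can be checked before any alignment or counting is run. The following is a minimal sketch, assuming a plain-text FASTA and a tab-separated GTF whose first column is the sequence name; all function names are hypothetical. Whether a given pair of files really differs in this way depends on the releases involved.

```python
def fasta_names(lines):
    """Sequence names taken from FASTA header lines."""
    return {line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].strip()}


def gtf_names(lines):
    """Sequence names taken from column 1 of a GTF, skipping comment lines."""
    return {line.split("\t", 1)[0] for line in lines
            if line.strip() and not line.startswith("#")}


def compare_names(fasta, gtf):
    """Report which names are shared and which differ only by a 'chr' prefix."""
    only_gtf = gtf - fasta
    return {
        "shared": sorted(fasta & gtf),
        "only_in_gtf": sorted(only_gtf),
        "only_in_fasta": sorted(fasta - gtf),
        # Names that would match if "chr" were added or stripped.
        "prefix_mismatch": sorted(
            name for name in only_gtf
            if "chr" + name in fasta
            or (name.startswith("chr") and name[3:] in fasta)
        ),
    }
```

An empty "shared" list together with a populated "prefix_mismatch" list would suggest that the two files use different naming schemes for the same sequences.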












  • Both use the chr prefix for the most recent human and mouse releases.
    – Devon Ryan
    Jun 13 at 7:48

  • So my results would vary with the annotation file, but not much with the reference?
    – krushnach Chandra
    Jun 13 at 7:59













Never use genomes or annotations from UCSC: they're poorly versioned, and only recently, for mouse and human, have they even included all of the contigs. For FASTA/GTF files from early in the GRCh38 release, you can tell whether you're using UCSC or GENCODE by the presence or absence of _random contigs, which will only exist for UCSC. These were mostly split into the actual contigs later, so a recent download from UCSC should more closely match what you find at GENCODE/Ensembl. Further, that period predated UCSC beginning to adopt GENCODE's vastly superior annotations, so if your GTF file has instances where the same gene ID is on either multiple strands or multiple chromosomes (this is obviously biologically impossible), then you have a UCSC GTF file.
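A minimal sketch of the gene ID check described above, assuming a standard nine-column, tab-separated GTF whose attribute column carries `gene_id "..."`. The function name is hypothetical. Hits are best treated as a prompt for inspection, not as proof of a file's origin: for instance, genes in the pseudoautosomal regions may be listed on both chrX and chrY under one ID in some annotation releases.

```python
import re
from collections import defaultdict

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def find_ambiguous_gene_ids(gtf_lines):
    """Return gene IDs that occur on more than one (sequence, strand) pair."""
    placements = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID.search(fields[8])
        if match:
            # Column 1 is the sequence name, column 7 the strand.
            placements[match.group(1)].add((fields[0], fields[6]))
    return {gene: sorted(where) for gene, where in placements.items()
            if len(where) > 1}
```

Passing an open GTF file handle returns a dictionary that is empty when every gene ID sits on a single sequence and strand.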



In general, with early GRCh38 releases your only real issues will be with _random contigs, which are a minority of the genome and don't have all that many genes. But really, you should be keeping track of the sources of your files and ensuring that they're compatible.



Update: I should expand a bit on my "Never use genomes or annotations from UCSC" comment. In point of fact, the genomes themselves aren't so terrible. Early on, UCSC had the bad habit of concatenating contigs together into _random "chromosomes", but they seem to have mostly kicked that habit of late. Note, however, that their genomes have no versions. Since reference genomes continue to get updates over time (mostly through the addition of patches), the lack of actual versions means you have to manually check whether a recently downloaded file matches what was downloaded previously or by someone else. This has obvious consequences for reproducibility.

The same issue occurs for annotations from UCSC, but they have the additional problem of historically having biologically incoherent concepts of genes. That is, they will contain the same gene in multiple places with multiple orientations, which will break many tools in both obvious and completely unclear ways. For example, DEXSeq will simply break with an error message if given a UCSC annotation, since such annotations break biological plausibility. If you were to use these annotation files with deepTools, you wouldn't get an error message, but the resulting output would be only partial, because the biologically impossible annotation effectively corrupts the most obvious ways of storing annotation data in a data structure (i.e., you can no longer treat IDs as unique). This could have downstream ramifications for the biological interpretation of results.
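The manual check mentioned above, whether two downloads of an unversioned genome are the same file, can be approximated by comparing checksums. This is only a sketch: the function name is hypothetical and SHA-256 is one reasonable choice of digest among several.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recording the digest next to each downloaded file gives later users something concrete to compare against. For compressed downloads, comparing the decompressed content may be more robust, since the compressed bytes can differ even when the sequence does not.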







edited Jun 13 at 9:00
answered Jun 13 at 8:08 by Devon Ryan











  • To check whether I have GENCODE, I ran cat gencode.v21.annotation.gtf | grep "_random" and found no instance of that word.
    – krushnach Chandra
    Jun 13 at 8:13

  • You could also just head the file. GENCODE files tend to start with a few comment lines (granted, given what the file is named, the odds of it being from GENCODE were very high to begin with).
    – Devon Ryan
    Jun 13 at 8:15

  • ##description: evidence-based annotation of the human genome (GRCh38), version 21 (Ensembl 77)
    ##provider: GENCODE
    ##contact: gencode@sanger.ac.uk
    ##format: gtf
    ##date: 2014-09-29
    I'm sure now that I'm on the right path.
    – krushnach Chandra
    Jun 13 at 8:16

  • @krushnachChandra Not really relevant, but just so you know, grep can read files. You don't need cat file | grep foo; you can always do grep foo file.
    – terdon
    Jun 13 at 8:43

  • Devon, could you elaborate on your advice never to use UCSC genomes? "Never" seems a bit extreme. Can you explain what they're missing in a bit more detail? Is there any reason not to use them for human reference? Yes, they may have *random in them, but is that such a problem?
    – terdon
    Jun 13 at 8:49
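The header lines quoted in the comments can also be read programmatically. The sketch below assumes the `##key: value` layout shown there and stops at the first line that does not start with `##`; the function name is hypothetical, and files from other providers may carry no such header at all.

```python
def read_gtf_header(lines):
    """Parse leading '##key: value' comment lines of a GTF into a dict."""
    header = {}
    for line in lines:
        if not line.startswith("##"):
            break  # the header ends at the first non-'##' line
        key, sep, value = line[2:].partition(":")
        if sep:  # keep only lines that actually contain a colon
            header[key.strip()] = value.strip()
    return header
```

With the file from the comments, `read_gtf_header(open("gencode.v21.annotation.gtf"))` would be expected to include a `provider` entry, which is a quicker provenance check than searching the whole file.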
















  • $begingroup$
    to make sure if im have gencode so i used this "cat gencode.v21.annotation.gtf | grep "_random" no instance of that word .
    $endgroup$
    – krushnach Chandra
    Jun 13 at 8:13






  • 1




    $begingroup$
    You could also just head the file. Gencode files tend to start with a few comment lines (granted, given what the file is named, the odds of it being from Gencode were very high to begin with).
    $endgroup$
    – Devon Ryan
    Jun 13 at 8:15











  • $begingroup$
    ##description: evidence-based annotation of the human genome (GRCh38), version 21 (Ensembl 77) ##provider: GENCODE ##contact: gencode@sanger.ac.uk ##format: gtf ##date: 2014-09-29 im sure now im in right path
    $endgroup$
    – krushnach Chandra
    Jun 13 at 8:16







  • 2




    $begingroup$
    @krushnachChandra not really relevant, but just so you know, grep can read files. You don't need cat file | grep foo, you can always do grep foo file.
    $endgroup$
    – terdon
    Jun 13 at 8:43






  • 1




    $begingroup$
    Devon, could you elaborate on your advice never to use UCSC genomes. Never seems a bit extreme. Can you explain what they're missing in a bit more detail?Is there any reason not to use them for human reference? Yes, they may have *random in them, but is that such a problem?
    $endgroup$
    – terdon
    Jun 13 at 8:49




























1












$begingroup$

The main thing to check is how the chromosomes are named. UCSC uses the "chr" prefix (so chromosome 1 is "chr1"), whereas some annotation sources, Ensembl for example, omit it (so chromosome 1 is just "1"); recent GENCODE human and mouse releases keep the prefix. Depending on your use case, a mismatch can cause problems. If you're trying to match a locus (e.g. 1:1000002 from the annotation) against your aligned data, the tool will look for "1:1000002", but in the aligned data the locus is named "chr1:1000002", so the two won't be matched up.
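
One way to catch a naming mismatch before running anything downstream is to compare the sequence names in the FASTA headers with those in column 1 of the GTF. The sketch below is an illustration under stated assumptions, not a definitive tool: the helper names (`fasta_names`, `gtf_names`, `compare_names`) are hypothetical, and it only detects differences that come down to a leading "chr".

```python
def fasta_names(fasta_lines):
    """Collect sequence names from FASTA headers ('>' up to the first whitespace)."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_names(gtf_lines):
    """Collect sequence names from GTF column 1, skipping comments and blank lines."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def _strip_chr(name):
    """Drop a leading 'chr' if present."""
    return name[3:] if name.startswith("chr") else name


def compare_names(fasta, gtf):
    """Summarise how the two name sets relate.

    shared          : names present in both files
    gtf_only        : names found only in the GTF
    prefix_mismatch : GTF-only names that would match a FASTA name
                      if the 'chr' prefix were added or removed
    """
    fasta_stripped = {_strip_chr(n) for n in fasta}
    gtf_only = sorted(gtf - fasta)
    return {
        "shared": sorted(fasta & gtf),
        "gtf_only": gtf_only,
        "prefix_mismatch": [n for n in gtf_only if _strip_chr(n) in fasta_stripped],
    }


# Made-up inputs: a "chr"-style FASTA and a GTF without the prefix.
fa = fasta_names([">chr1 some description\n", "ACGT\n", ">chr2\n", "TTGA\n"])
gtf = gtf_names(["#!comment\n", "1\tsrc\tgene\t1\t10\t.\t+\t.\tx\n", "MT\tsrc\tgene\t1\t10\t.\t+\t.\tx\n"])
result = compare_names(fa, gtf)
print(result)
```

An empty `shared` list together with a non-empty `prefix_mismatch` list suggests the two files differ only in the "chr" convention; names left in `gtf_only` but not in `prefix_mismatch` (such as unplaced scaffolds) need a closer look.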















$endgroup$












  • $begingroup$
    Both use the chr prefix for the most recent human and mouse releases.
    $endgroup$
    – Devon Ryan
    Jun 13 at 7:48











  • $begingroup$
    So it's the annotation file that would make my results vary, and not so much the reference?
    $endgroup$
    – krushnach Chandra
    Jun 13 at 7:59

























answered Jun 13 at 7:46









JenG



























