Understanding gene size

Father’s Day gifts from David

Originally posted 14 July 2015
Updated 28 March 2021
Available in Portuguese https://pmsbrasil.org.br/entendendo-o-tamanho-de-um-gene/

David has a terminal deletion of chromosome 22 caused by a balanced translocation.  Like nearly everyone with 22q13 deletion syndrome (Phelan-McDermid syndrome), he is missing a lot more than one gene.  What, exactly, does that mean?

DNA and genes

Each gene is made up of many “bases”.  DNA has two strands (strings) that grip each other tightly. Imagine a bunch of bar magnets threaded onto a string like pearls. Now, in your mind, take two of these strings and hold them near each other.  Slowly bring them close together.  When they get near, the north poles of magnets from one string will start to find the south poles from the other string. When the magnets come together, opposite poles will grab each other. Anywhere north faces north, or south faces south, that pair will repel each other until one flips around and the opposites unite.  DNA is made of chemical strings that have opposite poles. These opposites find their mate and the two DNA strands lock together. Each time a north meets a south you get a “base pair”.

Magnets can only make one type of partnership (north attracted to south).  DNA actually has two kinds of partnerships from four chemical bases.  The bases are abbreviated T, A, G and C.  T and A attract each other. G and C attract each other. If you make a string like this: -T-A-G-G-C-A-, the matching string will always look like this: -A-T-C-C-G-T-. That is, the strings stick to each other in this way:

-T-A-G-G-C-A-
-A-T-C-C-G-T-

Voilà! You have a small strand of DNA. This miniature DNA has 6 base pairs. The order of the base pairs describes the protein that this segment of DNA makes. The lower strand is a kind of mirror of the upper strand. If you know what is on one strand you can always figure out the other strand. Thus, we now know a bunch of properties of DNA:  1) The sequence of base pairs describes how to make a protein, 2) DNA is strongly stuck to itself, 3) DNA keeps a mirror copy of itself available at all times, and 4) the length of the DNA can be measured by counting the number of base pairs. There is a lot more to learn about DNA, but this is enough to discuss gene size.
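
For readers who like to see rules written out as a recipe, here is a minimal Python sketch of the matching rule described above. The names `PAIRS`, `matching_strand` and `length_in_base_pairs` are made up for this illustration. Like the magnet picture, it ignores details such as the direction in which each strand is read.

```python
# Pairing rules from the text: T pairs with A, and G pairs with C.
PAIRS = {"T": "A", "A": "T", "G": "C", "C": "G"}


def matching_strand(strand: str) -> str:
    """Return the partner strand, base by base, for a string such as 'TAGGCA'."""
    return "".join(PAIRS[base] for base in strand)


def length_in_base_pairs(strand: str) -> int:
    """Each base on one strand has exactly one partner, so length = number of base pairs."""
    return len(strand)


upper = "TAGGCA"
lower = matching_strand(upper)
print(upper)  # the string we started with
print(lower)  # its mirror, worked out from the pairing rules alone
print(length_in_base_pairs(upper), "base pairs")
```

Applying `matching_strand` twice gives back the original string, which is one way to see that each strand carries enough information to rebuild the other.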

Big genes are easier to find

In a previous posting I explained that 95% of all people with 22q13 deletion syndrome are missing at least 1 megabase (Mb) from their chromosome (see Understanding deletion size). 1 Mb means 1,000,000 (1 million) base pairs along the two parallel strands of DNA. Genes are segments of the long strings, like chapters in a book. And, like many books, some chapters are long and some are short. There are 32 genes in the distal 1 Mb of 22q13, many of which influence brain function. Chromosome deletion syndromes are inherently difficult to study because so many genes are involved. It is hard enough to study and understand the impact of losing a single gene. It is much harder to study and understand 22q13 deletion syndrome, where many genes are missing.

This problem with studying multiple genes is not unique to 22q13 deletion syndrome. It shows up in neuropsychiatric disorders like autism and schizophrenia, each of which has hundreds of associated “risk factor” genes. Autism, for example, results from various combinations of these many genes (see review by Gratten et al., 2014). Chromosomal deletions are known to operate in a similar way (see contiguous gene syndrome). Each missing gene weakens the normal operation of the brain. No one gene needs to be “dominant” for the combined loss to be devastating, especially when so many brain-related genes are missing at once.

Not everyone thinks of 22q13 deletion syndrome this way. Much of the current thinking about the genes lost in 22q13 deletion syndrome focuses on one or two genes that code for synaptic proteins. The term “synaptopathy” has been used a lot recently, but that word originates from the study of the inner ear, where researchers are able to clearly demonstrate the relationship between synaptic function and hearing loss (Sergeyenko et al., 2013). The relationship between genes and function is not nearly as clear in 22q13 deletion syndrome. Synapses are involved, but the synapse may be only one site of dysfunction (see Is 22q13 deletion syndrome a ciliopathy?). For many years no one thought primary cilia were important. Now, ciliopathies are a recognized type of brain dysfunction, even though synapses are also involved. Science often goes off in a wrong direction; it is part of the process. The other thing to remember about 22q13 deletion syndrome is that it is a neurodevelopmental disorder. Something goes wrong during the growth and maturation of the brain. Many things can go wrong: too few or too many neurons connecting two sites in the brain, neurons connecting to the wrong places, wrong proportions of excitatory and inhibitory neurons, etc. Human neural development is one of the most complex processes in the animal kingdom. Errors in neurodevelopment are not just problems with synapses.

There is another reason that synaptic genes have taken the spotlight. The synaptic genes of 22q13 are relatively large genes. This is the theme of our blog.

In general, large defects are easier to notice than small ones. If we look at the history of 22q13 deletion syndrome, the first cases were discovered in people with very large deletions and with the most “severe” phenotype (symptoms).  As research in 22q13 deletion syndrome advanced, smaller and smaller deletions were identified and studied. The gene that gets most attention is a large gene that has a large effect when disrupted. So, why does size matter?

Pie chart: genes lost in a 1 Mb deletion of 22q13, sorted by their sizes (mRNA size).

The pie chart shows the 32 genes that are missing from about 95% of patients with 22q13 deletion syndrome. The genes are sorted in order of size. The largest gene is SBF1 and the second largest is SHANK3. The genes continue in descending order of size in a counter-clockwise direction. Although the reality is a bit more complex, it is generally true that the likelihood of a gene getting accidentally modified, or otherwise disrupted, depends on the gene’s size. This pie graph shows that the 10 largest genes account for half of the “protein-coding” DNA in the first 1 Mb. To put it another way, you are twice as likely to disrupt SHANK3 as its neighbor MAPK8IP2, simply because SHANK3 is twice as large. SHANK3 is 16 times larger than SYCE3. So, when studying gene disruption, SHANK3 can show up more often simply because it is big. Scientists are aware of this size effect. They have developed gene disruption scores that take into account the size of a gene (e.g., probability of loss-of-function intolerance, pLI).
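
The size argument can be written as simple arithmetic. The Python sketch below assumes, as a simplification, that a random disruption is equally likely to land anywhere along the DNA, so a gene's chance of being hit is proportional to its size. The sizes are relative units chosen only to match the ratios stated above (SHANK3 is twice MAPK8IP2 and 16 times SYCE3). They are not real measurements, and the function names are made up for this illustration.

```python
# Relative sizes in arbitrary units, chosen only to match the ratios in the text.
# These are NOT measured gene sizes.
RELATIVE_SIZE = {"SHANK3": 16.0, "MAPK8IP2": 8.0, "SYCE3": 1.0}


def relative_disruption_odds(gene_a, gene_b, sizes=RELATIVE_SIZE):
    """How many times more likely gene_a is to be hit than gene_b,
    assuming hits land uniformly at random along the DNA."""
    return sizes[gene_a] / sizes[gene_b]


def share_of_hits(sizes=RELATIVE_SIZE):
    """Fraction of random hits expected to land in each gene (the pie-chart slices)."""
    total = sum(sizes.values())
    return {gene: size / total for gene, size in sizes.items()}


print(relative_disruption_odds("SHANK3", "MAPK8IP2"))
print(relative_disruption_odds("SHANK3", "SYCE3"))
print(share_of_hits())
```

A size-adjusted score such as pLI is meant to remove exactly this effect, so that a gene does not look important merely because it offers a bigger target.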

As I noted above, no one has carefully studied the impact of a complete deletion of SHANK3 without disrupting other genes involved in brain development and function. It may seem surprising, but a damaged gene can actually have a more severe impact compared to deleting the gene altogether (see When missing a gene is a good thing). A pathogenic variant of SHANK3 (an atypical and harmful version of the gene often resulting from damage) can contribute to 22q13 deletion syndrome, especially when SHANK3 is the only gene affected. But SHANK3’s contribution to 22q13 deletion syndrome when many genes are missing remains poorly understood. There are other 22q13 genes that have severe neurodevelopmental consequences after deletion whether or not SHANK3 is involved. Future blogs will discuss some of these genes in detail.

The take-home message is that certain genes are more likely to come under the microscope (literally and figuratively) simply because they are larger genes. Being large makes a gene easier to study (usually), but it does not necessarily confer importance. Measures like pLI have been developed to separate size from importance. This measure was used in a study of Phelan-McDermid syndrome that is discussed in the blog Which PMS genes are most important?

When a gene gets popularized in the scientific literature, lots of papers are published on that one gene, at least for a while. Scientists will focus on genes that get them grants and publications. That is how science typically works, even if it is not necessarily the best approach to finding effective treatments that families really need. The direction of science can be influenced by patient groups, but choosing the right direction requires a deep understanding of the science (the current state of research), science (the discipline) and scientists (who do science).


Previous posts:
Gene deletions versus mutations: sometimes missing a gene is better.
Is 22q13 deletion syndrome a ciliopathy?
Understanding deletion size
Can 22q13 deletion syndrome cause ulcerative colitis?
Can 22q13 deletion syndrome cause cancer?
22q13 deletion syndrome – an introduction


6 thoughts on “Understanding gene size”

  1. Thank you, Curtis. I do my best to make genetics understandable for other parents. Parents are the real experts in 22q13 deletion syndrome, but they get overwhelmed by the science. The basic principles are not that difficult if they are presented properly. Having done both, I can honestly say that raising a child with 22q13 deletion syndrome is a lot harder than learning the science!



  2. Thanks, Andy. Very well done, as it filled in areas I didn’t understand very well. I have a question, though, driven by a question someone asked me that I couldn’t answer, or at least wasn’t confident I knew the precise answer:

    1. The mirror image you discussed above: how does that work, since only certain bases can sit next to other bases? It is confusing because there is also the part about three of four holes making a codon in this structure. Could you elaborate on this? This seems like the perfect place to show that part of the DNA “rules”, as it would help to visualize it for those of us who stumble with the visualization. Similarly, some stumble on the shape of a chromosome vs. a DNA double helix.




    • Richard,

      Any arrangement (any order) of bases can be put on a string. That is how the body codes for each of the thousands of proteins it makes. However, once you make a string, the second (mirror) string has to follow the rules about matching bases. My drawing shows the two strings in a straight line. In the nucleus, the two strings are twisted into a corkscrew (helix) shape. Since there are two strings, you get a double helix. The Wikipedia page has some nice drawings and explanations (https://en.wikipedia.org/wiki/DNA). There is a nice description of just the structure in this YouTube video (https://www.youtube.com/watch?v=qy8dk5iS1f0).

      The sequences of base pairs that make up codons are another part of the story…for a future blog.



  3. I enjoyed reading the science, with regards to the deletions. It appeared, in your descriptions, that the larger the size of the structure the more open it is to mutate, or have deletions. This seems simplistic: the part I have grasped–and I have the basics of the structure–my gut response is that the surface area is the vulnerable part which makes a difference.

    Does this depend on strictly the inheritable part of genetic science, or does environment make its impact? I was interested since they interviewed Steve Silberman, on Fresh Air, NPR radio (http://www.npr.org/sections/health-shots/2015/09/02/436742377/neurotribes-examines-the-history-and-myths-of-the-autism-spectrum), about his book on whether science is doing all that society needs done, in order to make the person with a disability welcome or able to lead a full life. This interview seemed to point out that understanding the science can be wonderfully informative, but it still questions whether the information will be used in a good way, to accomplish what families need. Granted, these two topics are not interchangeable for each concern, but I understand the need to further integrate people who are affected by these genetic deletions and mutations into the everyday world.

    I would like to say this is a helpful blog and very encouraging. My brother who experienced Down Syndrome, passed away at the age of 42. His background was that as a resident from a progressive area (Northern Virginia/ DC) he had managed a full, active life, until pancreatic cancer caused his death.

    I credit the various families who helped him, and others of his friends with challenges, by being creative and supportive of early education, and the many in the community who insisted on an adaptive environment for their children to be involved in, the passage of laws for education, and social activity groups with the rigor of regular social events, Special Olympics, and community. This steady and helpful involvement challenged and encouraged abilities to be developed: to do as much as each person was ultimately able to do.

    David’s progress with his walking at a relatively young age, reflects that encouragement and assistance to learn and grow. Thank you for the information and encouraging discourse.
    Kathy Beckwith


  4. Kathy,

    Thank you for the nice comments. You are right that I simplified the relationship between gene size and impact. For general understanding, the relationship between number of base pairs and probability of mutation is not far off. I used mRNA sizes, which approximate total exon length. There are advanced algorithms that adjust for regions of genes that are highly conserved. That is, these regions do not change much, if at all, between species. The inference is that changing these regions will adversely impact the resultant protein’s performance. I also did not discuss non-coding DNA regions, which are filled with promoters, enhancers and inhibitors. Still, I think the graphs are informative and explain some of the logic we should use to interpret the frequency of mutations.

    Environment has an impact in a number of ways. However, with syndromic autism, the primary impact seems to be from inherited genetic defects. Certainly, it is the case with 22q13 deletion syndrome. Whether or not any of this genetic information is used to the benefit of families speaks to the whole purpose of my arm22q13 blog. The blog is written with the intent to address issues that matter. I am a member of both the scientific community and the family community. All of what I do is aimed at bringing benefits to families from the science. Sadly, neither the vocal scientists nor the parent-run organization has done a good job making sure that the science benefits the greatest number of patients and the most severely affected families.

    I am sad for the loss of your brother. My son David is part of the community. Although he requires full-time attention, he is comfortable traveling on public transportation, attending events and shopping in the local stores. He does prefer being at home watching TV much of the time, but so do I!

    David benefited from his special education program, although I am quite unhappy about how much we had to fight to protect his right to a free and appropriate education. The Greater Washington, DC region is lucky to have sophisticated school systems, and talented and dedicated teachers. Through this education, David gained skills and independence that bring life-long benefits to him and his caregivers.

    David was encouraged to grow, but he also took it upon himself to work hard and learn as much as he could. David is an athlete and hero. His success was ultimately through his own hard work. He is an amazing man.


