Abstract
Hydrophobicity is a prime determinant of the structure and function of
proteins. It is the driving force behind the folding of soluble proteins, and
when exposed on the surface, it is frequently involved in recognition and
binding of ligands and other proteins. The energetic cost of exposing
hydrophobic surface is proportional to its area,
and the question arises to
what extent proteins can tolerate large hydrophobic patches on their
surfaces. This thesis is a study of such patches. Chapter 1
provides a general introduction to protein surface hydrophobicity. Chapter
2 describes a numerical algorithm for calculating the solvent accessible
surface area. It samples the protein surface in Shrake & Rupley fashion:
representing atoms as spherical distributions of points and summing the
points that are not buried by any atoms. A number of optimization strategies
are applied, yielding an exceptionally fast method. The quality of spherical
point distributions is assessed, and a novel, optimal tessellation of the
unit sphere is found. The accessible surface calculation method developed in
Chapter 2 forms the basis of the hydrophobic patch detection algorithm called
QUILT, presented in Chapter 3. The assumption is that hydrophobic surface
area is synonymous with solvent accessible carbon and sulfur
atoms. Connecting contiguous apolar atoms is not enough to delineate
hydrophobic patches, because the relatively strong hydrophobicity of the
protein surface (around 60%) results in one large hydrophobic surface. This
surface spans the entire protein, and is dotted with polar 'islands' formed
by the hydrophilic atoms, with hydrophobic connections through variously
sized 'channels' between these islands. To delineate the hydrophobic patches,
the channels are closed off by temporarily expanding the solvent-accessible
polar atoms. This way, the hydrophobic surface neatly divides into proper
patches which are subsequently identified, and adjacent surface area lost due
to the polar expansion is added back to the patches thus obtained. Only the
largest patches, having sizes exceeding expectation (based on randomizing the
protein's surface), are deemed meaningful. The method is applied to a small
number of structures to demonstrate its validity and utility.
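
The Shrake & Rupley sampling described for Chapter 2 can be illustrated with a minimal sketch. It is not the optimized method of the thesis: the golden-spiral point distribution stands in for the tessellation developed there, the neighbour search is brute force, and the function names, the 1.4 Å probe radius and the 200 points per atom are assumptions made for illustration.

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral layout).

    Assumption: this is a generic distribution, not the thesis's tessellation.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = golden_angle * i
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def accessible_areas(atoms, probe=1.4, n_points=200):
    """Per-atom solvent accessible surface area.

    atoms: list of (x, y, z, radius) tuples, coordinates and radii in Å.
    Returns a list of areas in Å², one per atom.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the probe-expanded sphere
        # Keep only atoms whose expanded spheres can intersect this one.
        neighbours = []
        for j, (xj, yj, zj, rj) in enumerate(atoms):
            if j == i:
                continue
            big_rj = rj + probe
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d2 < (big_r + big_rj) ** 2:
                neighbours.append((xj, yj, zj, big_rj * big_rj))
        # Count sample points that lie outside every neighbouring sphere.
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            if all((px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 >= rj2
                   for xj, yj, zj, rj2 in neighbours):
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas
```

For an isolated atom every sample point is exposed, so the result equals the full area of the expanded sphere. Restricting the sum to carbon and sulfur atoms would give the apolar area on which the patch detection is said to build.
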
In Chapter 4, the QUILT method is applied to a large sample of monomeric
proteins, in order to survey general trends in the distribution of patch
sizes on proteins. The largest patch on each individual protein averages
around 400 Å², but can range from 200 to 1200 Å². Interestingly, these areas
do not correlate with the sizes of the proteins, and only weakly with their
apolar surface fraction. Trends regarding patch size distribution, amino
acid composition and preference, sequential vicinity, secondary structure and
mobility are discussed as well. Chapter 5 is devoted to a survey similar to
that described in Chapter 4, but here, the interfaces of obligate oligomeric
proteins are studied. As before, trends regarding amino acid composition and
preference and patch size distribution are described. The largest or second
largest patch on the accessible surface of the entire subunit was involved in
multimeric interfaces in 90% of the cases, in agreement with interfaces being
generally more hydrophobic than the rest of the protein surface. However,
hydrophobic patches are not complementary: they are not preferentially in
contact across associating subunits. This is perhaps surprising at first
sight, but is to be expected, because the free energy of subunit association,
as far as the hydrophobic patches are concerned, is largely due to the
shielding of apolar area from the solvent, rather than to the gain of
hydrophobic contacts. To
gain insight into the dynamic behaviour of hydrophobic patches, QUILT is
applied to molecular dynamics simulations of three different protein
structures. This is the subject of Chapter 6. The analysis requires an
additional method to relate QUILT-patches across time frames of the
trajectory, which is described as well. The resulting patch runs show that
the area fluctuations are considerable, at around 25% of the patch size. The
most frequently occurring mean patch size is approximately 50 Å², but mean
sizes can reach around 400 Å². An uninterrupted patch run can last up to 150 picoseconds,
but, owing to protein mobility, is generally much shorter at around 4
ps. There is no clear relation between patch run durations and their average
size, but long-lasting patch runs have smaller fluctuations. Although the
formalism would allow this, the patches do not 'wander' over the protein
surface, indicating that they are genuine surface features. When the patch
runs are clustered, the truly persistent patches, called recurrent patches, are
obtained. Only about 25% of them have a strong 'liveness', that is, are
represented by an actual patch run most of the time. In amicyanin, the method
detects the hydrophobic patch known to be involved in the binding of
methylamine dehydrogenase. In phospholipase A2, a large persistent patch
consisting of Leu58 and Phe94 is found, the likely functional relevance of
which appears to be novel.
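
The notion of a patch run, a patch followed across consecutive frames of a trajectory, can be illustrated with a small sketch. The abstract does not say how the thesis relates patches between frames, so everything here is an assumption: patches are represented as sets of atom identifiers, consecutive patches are linked greedily by Jaccard overlap, and `link_patch_runs` and `min_overlap` are hypothetical names.

```python
def jaccard(a, b):
    """Overlap of two atom sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_patch_runs(frames, min_overlap=0.5):
    """Greedily chain patches from consecutive frames into runs.

    frames: one list per time frame, each holding patches as frozensets of
            atom identifiers (hypothetical representation).
    Returns a list of runs; each run is a list of (frame_index, patch).
    A run ends as soon as no patch in the next frame overlaps enough.
    """
    finished, active = [], []
    for t, patches in enumerate(frames):
        unused = set(range(len(patches)))
        still_active = []
        for run in active:
            last_patch = run[-1][1]
            best, best_score = None, min_overlap
            for k in sorted(unused):
                score = jaccard(last_patch, patches[k])
                if score >= best_score:
                    best, best_score = k, score
            if best is None:
                finished.append(run)  # interrupted: the run stops here
            else:
                unused.discard(best)
                run.append((t, patches[best]))
                still_active.append(run)
        # Patches that continue no existing run start new runs.
        for k in sorted(unused):
            still_active.append([(t, patches[k])])
        active = still_active
    return finished + active
```

The duration of a run is then the number of frames it spans multiplied by the frame spacing, and the area fluctuation within a run can be computed from the per-frame patch areas.
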