Here a critical piece is the line in red. Given the object header we need to be able to locate all other objects referenced by it (or are children of it in object reference graph). There are two basic ways to do this
Conservative GC
In case a GC is being plugged into an existing system in the form of a library and there is no easy way to modify the runtime then a conservative (as opposed to precise) system is used. A conservative GC considers every bit pattern on the heap to be a valid reference to another heap object. For the above case given a object header the GC needs to have the following basic information
- The size of the object
- Alignment of pointers (lets assume it to be 32bit for our case)
So once the GC gets a object pointer pObj it considers all 32bit patterns in between pObj and (pObj + sizeof(Object)) to be references to other child objects. It then uses some basic elimination like whether the value points inside the current heap for the system. If it doesn’t match any elimination logic then it is considered to be a reference.
Even though this approach seems a bit extreme, it works out well and the only flip side is that it returns false positives and hence some garbage which should’ve been collected is left out (e.g. because an int containing a value accidentally matches the address of an object on the heap).
Precise GC
In precise GC the GC precisely knows every reference held in a given object. This is possible only if the runtime generates and preserves this information. There are various ways to implement a precise GC and I'll qiuckly go over one of the ways to do it,
Every object has an object header which in turn contains a pointer to it’s type/class descriptor. This class descriptor has a special 32-bit field named ReferenceMap which is used to store reference information stored by a given type
So given an object pointer pObj the GC fetches the object header from before the pointer and then de-references the class descriptor pointer to get to the corresponding type’s Class descriptor. It then reads the ReferenceMap stored into it. To do all of the above the GC needs to know only the object header and class descriptor layout and not any other type information.
All references are 32-bit aligned and the ReferenceMap contains one bit per 32-bits of the object. If the bit is set it means that the corresponding 32-bit is a reference.
Let’s take the following class
class MyType
{
public char a;
public char b;
public AnotherType t1;
public int i;
public AnotherType t2;
}
The memory layout of the class would be something like
The first two chars take 16 bits each. After that there are 3, 32-bit entities of which t1 and t2 (1st and 3rd) are references to other objects but i is not.
So the GC needs to know that given this object the 32-bits from 32nd and 96th are actually references. This is exactly what is encoded in RefMap field. It contains the bit-pattern 00000000000000000000000000001010 where the LSB is for the first 32 bits of the object (which is a+b in this case) and so on.
Since bit 1 of refmap is set it means (pObj + 1 * 32) is a reference. If that value is non-NULL then the GC can just de-reference it to get to the t1 object reference held by pObj.
Obviously it means that the maximum size of object that RefMap can encode is 32*4 = 128bytes. No points for guessing how it handles objects larger than that :)
No comments:
Post a Comment
Your Comments/Posts are invited...