sizeof的东西会被编译器直接替换掉,即使是汇编代码都只能看到一个常量,所以下面有童鞋说看反汇编源码是不行的,因为已经在编译器内部替换掉了(更严谨的说法是,VLA是特殊情况,这是后面的代码说明中有提到)。下面以Clang对sizeof的处理来看sizeof的实现。
在Clang的实现中,在lib/AST/ExprConstant.cpp中有这样的方法:
bool IntExprEvaluator::VisitUnaryExprOrTypeTraitExpr
这个方法的实现如此:
switch(E->getKind()) { case UETT_AlignOf: { if (E->isArgumentType()) return Success(GetAlignOfType(Info, E->getArgumentType()), E); else return Success(GetAlignOfExpr(Info, E->getArgumentExpr()), E); } case UETT_VecStep: { QualType Ty = E->getTypeOfArgument(); if (Ty->isVectorType()) { unsigned n = Ty->castAs<VectorType>()->getNumElements(); // The vec_step built-in functions that take a 3-component // vector return 4. (OpenCL 1.1 spec 6.11.12) if (n == 3) n = 4; return Success(n, E); } else return Success(1, E); } case UETT_SizeOf: { QualType SrcTy = E->getTypeOfArgument(); // C++ [expr.sizeof]p2: "When applied to a reference or a reference type, // the result is the size of the referenced type." if (const ReferenceType *Ref = SrcTy->getAs<ReferenceType>()) SrcTy = Ref->getPointeeType(); CharUnits Sizeof; if (!HandleSizeof(Info, E->getExprLoc(), SrcTy, Sizeof)) return false; return Success(Sizeof, E); } } llvm_unreachable("unknown expr/type trait"); }
然后通过这个方法,我们可以顺藤摸瓜,发现sizeof的处理其实是在HandleSizeof这个方法内,结果是会存储在Sizeof这个CharUnits中,而一个CharUnits是Clang内部的一个表示,引用Clang的注释如下
/// CharUnits - This is an opaque type for sizes expressed in character units. /// Instances of this type represent a quantity as a multiple of the size /// of the standard C type, char, on the target architecture. As an opaque /// type, CharUnits protects you from accidentally combining operations on /// quantities in bit units and character units. /// /// In both C and C++, an object of type 'char', 'signed char', or 'unsigned /// char' occupies exactly one byte, so 'character unit' and 'byte' refer to /// the same quantity of storage. However, we use the term 'character unit' /// rather than 'byte' to avoid an implication that a character unit is /// exactly 8 bits. /// /// For portability, never assume that a target character is 8 bits wide. Use /// CharUnit values wherever you calculate sizes, offsets, or alignments /// in character units.
然后,我们找寻HandleSizeof方法:
/// Get the size of the given type in char units. static bool HandleSizeof(EvalInfo &Info, SourceLocation Loc, QualType Type, CharUnits &Size) { // sizeof(void), __alignof__(void), sizeof(function) = 1 as a gcc // extension. if (Type->isVoidType() || Type->isFunctionType()) { Size = CharUnits::One(); return true; } if (!Type->isConstantSizeType()) { // sizeof(vla) is not a constantexpr: C99 6.5.3.4p2. // FIXME: Better diagnostic. Info.Diag(Loc); return false; } Size = Info.Ctx.getTypeSizeInChars(Type); return true; }
走到这里,我们就知道了为什么会被替换掉了,如你这里是void或者Function type,编译器都直接替换为CharUnits::One()这个常量(即一个Char的大小),所以这就是汇编也只能看到常量的原因,毕竟汇编是后面CodeGen的事情,而这里是在CodeGen之前发生的了。而在这里也会判断Type是不是ConstantSizeType,因为需要在编译期计算出来,而注释则是针对VLA,有兴趣的同学可以按照注释的C99地方去看说的是什么。接下来则是把Type传给getTypeSizeInChars方法了。
OK,接下来我们再一步一步的走下去,看getTypeSizeInChars做了什么。
/// getTypeSizeInChars - Return the size of the specified type, in characters. /// This method does not work on incomplete types. CharUnits ASTContext::getTypeSizeInChars(QualType T) const { return getTypeInfoInChars(T).first; }
走到这里的时候,虽然我们就算不走下去都能知道这个方法是返回特定类型的大小了,但是我们还是要打破沙锅问到底,看到底是怎么实现的。于是我们继续走getTypeInfoChars()这个方法。
std::pair<CharUnits, CharUnits> ASTContext::getTypeInfoInChars(QualType T) const { return getTypeInfoInChars(T.getTypePtr()); }
走到这里,我们也知道为什么会有first了,因为这个方法返回的是一个std::pair,接下来我们可以发现调用的还是getTypeInChar方法,但是参数一个TypePointers,于是我们找这个重载方法:
std::pair<CharUnits, CharUnits> ASTContext::getTypeInfoInChars(const Type *T) const { if (const ConstantArrayType *CAT = dyn_cast<ConstantArrayType>(T)) return getConstantArrayInfoInChars(*this, CAT); TypeInfo Info = getTypeInfo(T); return std::make_pair(toCharUnitsFromBits(Info.Width), toCharUnitsFromBits(Info.Align)); }
随后,我们可以发现是getTypeInfo这个方法,然后我们找到对应的代码:
TypeInfo ASTContext::getTypeInfo(const Type *T) const { TypeInfoMap::iterator I = MemoizedTypeInfo.find(T); if (I != MemoizedTypeInfo.end()) return I->second; // This call can invalidate MemoizedTypeInfo[T], so we need a second lookup. TypeInfo TI = getTypeInfoImpl(T); MemoizedTypeInfo[T] = TI; return TI; }
然后我们找到了这个,对于MemorizedTypeInfo我们暂时不需要关心,我们也能发现需要的东西其实在getTypeInfoImpl里面
/// getTypeInfoImpl - Return the size of the specified type, in bits. This /// method does not work on incomplete types. /// /// FIXME: Pointers into different addr spaces could have different sizes and /// alignment requirements: getPointerInfo should take an AddrSpace, this /// should take a QualType, &c. TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const { uint64_t Width = 0; unsigned Align = 8; bool AlignIsRequired = false; switch (T->getTypeClass()) { #define TYPE(Class, Base) #define ABSTRACT_TYPE(Class, Base) #define NON_CANONICAL_TYPE(Class, Base) #define DEPENDENT_TYPE(Class, Base) case Type::Class: #define NON_CANONICAL_UNLESS_DEPENDENT_TYPE(Class, Base) case Type::Class: assert(!T->isDependentType() && "should not see dependent types here"); return getTypeInfo(cast<Class##Type>(T)->desugar().getTypePtr()); #include "clang/AST/TypeNodes.def" llvm_unreachable("Should not see dependent types"); case Type::FunctionNoProto: case Type::FunctionProto: // GCC extension: alignof(function) = 32 bits Width = 0; Align = 32; break; case Type::IncompleteArray: case Type::VariableArray: Width = 0; Align = getTypeAlign(cast<ArrayType>(T)->getElementType()); break; case Type::ConstantArray: { const ConstantArrayType *CAT = cast<ConstantArrayType>(T); TypeInfo EltInfo = getTypeInfo(CAT->getElementType()); uint64_t Size = CAT->getSize().getZExtValue(); assert((Size == 0 || EltInfo.Width <= (uint64_t)(-1) / Size) && "Overflow in array type bit size evaluation"); Width = EltInfo.Width * Size; Align = EltInfo.Align; if (!getTargetInfo().getCXXABI().isMicrosoft() || getTargetInfo().getPointerWidth(0) == 64) Width = llvm::RoundUpToAlignment(Width, Align); break; } case Type::ExtVector: case Type::Vector: { const VectorType *VT = cast<VectorType>(T); TypeInfo EltInfo = getTypeInfo(VT->getElementType()); Width = EltInfo.Width * VT->getNumElements(); Align = Width; // If the alignment is not a power of 2, round up to the next power of 2. // This happens for non-power-of-2 length vectors. if (Align & (Align-1)) { Align = llvm::NextPowerOf2(Align); Width = llvm::RoundUpToAlignment(Width, Align); } // Adjust the alignment based on the target max. uint64_t TargetVectorAlign = Target->getMaxVectorAlign(); if (TargetVectorAlign && TargetVectorAlign < Align) Align = TargetVectorAlign; break; } case Type::Builtin: switch (cast<BuiltinType>(T)->getKind()) { default: llvm_unreachable("Unknown builtin type!"); case BuiltinType::Void: // GCC extension: alignof(void) = 8 bits. Width = 0; Align = 8; break; case BuiltinType::Bool: Width = Target->getBoolWidth(); Align = Target->getBoolAlign(); break; case BuiltinType::Char_S: case BuiltinType::Char_U: case BuiltinType::UChar: case BuiltinType::SChar: Width = Target->getCharWidth(); Align = Target->getCharAlign(); break; case BuiltinType::WChar_S: case BuiltinType::WChar_U: Width = Target->getWCharWidth(); Align = Target->getWCharAlign(); break; case BuiltinType::Char16: Width = Target->getChar16Width(); Align = Target->getChar16Align(); break; case BuiltinType::Char32: Width = Target->getChar32Width(); Align = Target->getChar32Align(); break; case BuiltinType::UShort: case BuiltinType::Short: Width = Target->getShortWidth(); Align = Target->getShortAlign(); break; case BuiltinType::UInt: case BuiltinType::Int: Width = Target->getIntWidth(); Align = Target->getIntAlign(); break; case BuiltinType::ULong: case BuiltinType::Long: Width = Target->getLongWidth(); Align = Target->getLongAlign(); break; case BuiltinType::ULongLong: case BuiltinType::LongLong: Width = Target->getLongLongWidth(); Align = Target->getLongLongAlign(); break; case BuiltinType::Int128: case BuiltinType::UInt128: Width = 128; Align = 128; // int128_t is 128-bit aligned on all targets. break; case BuiltinType::Half: Width = Target->getHalfWidth(); Align = Target->getHalfAlign(); break; case BuiltinType::Float: Width = Target->getFloatWidth(); Align = Target->getFloatAlign(); break; case BuiltinType::Double: Width = Target->getDoubleWidth(); Align = Target->getDoubleAlign(); break; case BuiltinType::LongDouble: Width = Target->getLongDoubleWidth(); Align = Target->getLongDoubleAlign(); break; case BuiltinType::NullPtr: Width = Target->getPointerWidth(0); // C++ 3.9.1p11: sizeof(nullptr_t) Align = Target->getPointerAlign(0); // == sizeof(void*) break; case BuiltinType::ObjCId: case BuiltinType::ObjCClass: case BuiltinType::ObjCSel: Width = Target->getPointerWidth(0); Align = Target->getPointerAlign(0); break; case BuiltinType::OCLSampler: // Samplers are modeled as integers. Width = Target->getIntWidth(); Align = Target->getIntAlign(); break; case BuiltinType::OCLEvent: case BuiltinType::OCLImage1d: case BuiltinType::OCLImage1dArray: case BuiltinType::OCLImage1dBuffer: case BuiltinType::OCLImage2d: case BuiltinType::OCLImage2dArray: case BuiltinType::OCLImage3d: // Currently these types are pointers to opaque types. Width = Target->getPointerWidth(0); Align = Target->getPointerAlign(0); break; } break; case Type::ObjCObjectPointer: Width = Target->getPointerWidth(0); Align = Target->getPointerAlign(0); break; case Type::BlockPointer: { unsigned AS = getTargetAddressSpace( cast<BlockPointerType>(T)->getPointeeType()); Width = Target->getPointerWidth(AS); Align = Target->getPointerAlign(AS); break; } case Type::LValueReference: case Type::RValueReference: { // alignof and sizeof should never enter this code path here, so we go // the pointer route. unsigned AS = getTargetAddressSpace( cast<ReferenceType>(T)->getPointeeType()); Width = Target->getPointerWidth(AS); Align = Target->getPointerAlign(AS); break; } case Type::Pointer: { unsigned AS = getTargetAddressSpace(cast<PointerType>(T)->getPointeeType()); Width = Target->getPointerWidth(AS); Align = Target->getPointerAlign(AS); break; } case Type::MemberPointer: { const MemberPointerType *MPT = cast<MemberPointerType>(T); std::tie(Width, Align) = ABI->getMemberPointerWidthAndAlign(MPT); break; } case Type::Complex: { // Complex types have the same alignment as their elements, but twice the // size. TypeInfo EltInfo = getTypeInfo(cast<ComplexType>(T)->getElementType()); Width = EltInfo.Width * 2; Align = EltInfo.Align; break; } case Type::ObjCObject: return getTypeInfo(cast<ObjCObjectType>(T)->getBaseType().getTypePtr()); case Type::Adjusted: case Type::Decayed: return getTypeInfo(cast<AdjustedType>(T)->getAdjustedType().getTypePtr()); case Type::ObjCInterface: { const ObjCInterfaceType *ObjCI = cast<ObjCInterfaceType>(T); const ASTRecordLayout &Layout = getASTObjCInterfaceLayout(ObjCI->getDecl()); Width = toBits(Layout.getSize()); Align = toBits(Layout.getAlignment()); break; } case Type::Record: case Type::Enum: { const TagType *TT = cast<TagType>(T); if (TT->getDecl()->isInvalidDecl()) { Width = 8; Align = 8; break; } if (const EnumType *ET = dyn_cast<EnumType>(TT)) { const EnumDecl *ED = ET->getDecl(); TypeInfo Info = getTypeInfo(ED->getIntegerType()->getUnqualifiedDesugaredType()); if (unsigned AttrAlign = ED->getMaxAlignment()) { Info.Align = AttrAlign; Info.AlignIsRequired = true; } return Info; } const RecordType *RT = cast<RecordType>(TT); const RecordDecl *RD = RT->getDecl(); const ASTRecordLayout &Layout = getASTRecordLayout(RD); Width = toBits(Layout.getSize()); Align = toBits(Layout.getAlignment()); AlignIsRequired = RD->hasAttr<AlignedAttr>(); break; } case Type::SubstTemplateTypeParm: return getTypeInfo(cast<SubstTemplateTypeParmType>(T)-> getReplacementType().getTypePtr()); case Type::Auto: { const AutoType *A = cast<AutoType>(T); assert(!A->getDeducedType().isNull() && "cannot request the size of an undeduced or dependent auto type"); return getTypeInfo(A->getDeducedType().getTypePtr()); } case Type::Paren: return getTypeInfo(cast<ParenType>(T)->getInnerType().getTypePtr()); case Type::Typedef: { const TypedefNameDecl *Typedef = cast<TypedefType>(T)->getDecl(); TypeInfo Info = getTypeInfo(Typedef->getUnderlyingType().getTypePtr()); // If the typedef has an aligned attribute on it, it overrides any computed // alignment we have. This violates the GCC documentation (which says that // attribute(aligned) can only round up) but matches its implementation. if (unsigned AttrAlign = Typedef->getMaxAlignment()) { Align = AttrAlign; AlignIsRequired = true; } else { Align = Info.Align; AlignIsRequired = Info.AlignIsRequired; } Width = Info.Width; break; } case Type::Elaborated: return getTypeInfo(cast<ElaboratedType>(T)->getNamedType().getTypePtr()); case Type::Attributed: return getTypeInfo( cast<AttributedType>(T)->getEquivalentType().getTypePtr()); case Type::Atomic: { // Start with the base type information. TypeInfo Info = getTypeInfo(cast<AtomicType>(T)->getValueType()); Width = Info.Width; Align = Info.Align; // If the size of the type doesn't exceed the platform's max // atomic promotion width, make the size and alignment more // favorable to atomic operations: if (Width != 0 && Width <= Target->getMaxAtomicPromoteWidth()) { // Round the size up to a power of 2. if (!llvm::isPowerOf2_64(Width)) Width = llvm::NextPowerOf2(Width); // Set the alignment equal to the size. Align = static_cast<unsigned>(Width); } }
一切真相大白了,已不需要解释了 :-)