[blog] Runtime polymorphism exploited

In this post I will show dynamic polymorphism implemented in C, to take a look at what C++ implements under the hood.

The idea is to have an abstract class Polygon which has a public method to calculate the area of a given general polygon.

In C++ the implementation is straightforward.

#include <cstdint>
#include <cstdio>
class Polygon {
public:
    virtual uint32_t area() = 0;
};
class Triangle : public Polygon {
public:
    Triangle(uint32_t height, uint32_t base, const char* name)
    {
        this->height = height;
        this->base = base;
        this->name = name;
    }
  
private:
    uint32_t height;
    uint32_t base;
    const char* name;
    uint32_t area() {
        printf("Identified %s as a triangle.\n\r", name);
        return ((this->height) * (this->base) / 2);
    }
};
class Rectangle : public Polygon {
public:
    Rectangle(uint32_t height, uint32_t base, const char* name)
    {
        this->height = height;
        this->base = base;
        this->name = name;
    }
  
private:
    uint32_t height;
    uint32_t base;
    const char* name;
    uint32_t area() {
        printf("Identified %s as a rectangle.\n\r", name);
        return ((this->height) * (this->base));
    }    
};
int main()
{
    Polygon* p1;
    Polygon* p2;
    Triangle t1{ 5,6, "polygon1"};
    Rectangle r1{ 9,5, "polygon2"};
    p1 = &t1;
    p2 = &r1;
    printf("area: %u\n\r", static_cast<unsigned>(p1->area()));
    printf("area: %u\n\r", static_cast<unsigned>(p2->area()));
}

C++ does not have an interface keyword. Interfaces are defined through the same inheritance mechanism used to inherit members: to declare an interface, declare a class whose member functions are all pure virtual; to implement it, derive from that class and override them.

Output

Identified polygon1 as a triangle.
area: 15
Identified polygon2 as a rectangle.
area: 45

The same can be implemented in C. We need to know how to lay out the memory structures to get late binding. An abstract class such as “Polygon”, the base class, can be seen in C as a structure that sits at the top (the lowest address) of the memory layout of every derived class, so a pointer to a derived object is also a valid pointer to its base. This means that a given polygon object, be it a square or a hexagon, will always be bound to this interface. But we have to wire them together for this to be useful.

Note that the derived class is inheriting an i-n-t-e-r-f-a-c-e. Different polygons have a lot in common, and so does the way we manipulate them, so we can easily think of such an interface. Note that a triangle is in fact a Polygon; not every polygon is a triangle, though. That is why we loosely say abstract classes can't be instantiated.

Figure: UML class diagram and conceptual memory layout. This Polygon is composed as a rectangle.

//@File: polygon.h
#ifndef POLYGON_H 
#define POLYGON_H
#include <stdint.h>
struct VirtualTable;
typedef struct
{
	struct VirtualTable const* vptr;
} Polygon_t;
struct VirtualTable 
{
	uint32_t (*CalcArea)(Polygon_t const* const me);
};
void Polygon_Ctor(Polygon_t* const me);
//inline to save a function call
static inline uint32_t CalcArea(Polygon_t const* const me)
{
	return (*me->vptr->CalcArea)(me);
}
#endif
//@File: rectangle.h
#ifndef RECTANGLE_H
#define RECTANGLE_H
#include "polygon.h"
typedef struct
{
	Polygon_t Super; //<-Base Class
	uint32_t base;
	uint32_t height;
	const char* name;
} Rectangle_t;
void Rectangle_Ctor(Rectangle_t* this_rectangle, int base, int height, const char* name);
#endif
//@File: triangle.h
#ifndef TRIANGLE_H
#define TRIANGLE_H
#include "polygon.h"
typedef struct
{
	Polygon_t Super;
	int base;
	int height;
	const char* name;
} Triangle_t;
void Triangle_Ctor(Triangle_t* this_triangle, int base, int height, const char* name);
#endif
//@File polygon.c
#include "polygon.h"
#include <assert.h>
#include <stdlib.h>
static uint32_t CalcArea_(Polygon_t const* const me);
void Polygon_Ctor(Polygon_t* const this_polygon) 
{
	if (this_polygon != NULL) {
		static struct VirtualTable const vtbl =
		{
			&CalcArea_
		};
		this_polygon->vptr = &vtbl;
	}
}
static uint32_t CalcArea_(Polygon_t const* const me)
{
	assert(0); //should never be called
	return 0;
}

The base class (or super class) holds a pointer, vptr, to a VirtualTable structure. In this case the virtual table has a single function pointer. Which function that pointer refers to depends on the concrete type of the object: each derived class supplies its own table, so calling CalcArea on different derived objects performs a different computation. When constructing a derived object you first construct the base-class part inside it, and then rebind its vptr to the derived class’s own virtual table.

//@File: triangle.c
#include <stdio.h>
#include <stdlib.h>
#include "polygon.h"
#include "triangle.h"
static uint32_t CalcAreaTriang_(Polygon_t const* const this_polygon);
void Triangle_Ctor(Triangle_t* this_triangle, int base, int height, const char* name)
{
	static struct VirtualTable const vtbl =
	{
		&CalcAreaTriang_ 
	};
	if (this_triangle != NULL)
	{
		//constructing the base class
		Polygon_Ctor(&this_triangle->Super);
		//overriding the base-class vptr with this class's vtable
		this_triangle->Super.vptr = &vtbl;
		//initializing class members
		this_triangle->base = base;
		this_triangle->height = height;
		this_triangle->name = name;
	}
}
static uint32_t CalcAreaTriang_(Polygon_t const* const me)
{
	Triangle_t const* this_triangle = (Triangle_t const*)me; //downcast, keeping const
	printf("Identified %s as a triangle.\n\r", this_triangle->name);
	return ((this_triangle->height) * (this_triangle->base))/2;
}
//@File: rectangle.c
#include "polygon.h"
#include "rectangle.h"
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
static uint32_t CalcAreaRect_(Polygon_t const* const this_polygon);
void Rectangle_Ctor(Rectangle_t* this_rectangle, int base, int height, const char* name)
{
	static struct VirtualTable const vtbl = 
	{
		&CalcAreaRect_
	};
	if (this_rectangle != NULL) 
	{
		Polygon_Ctor(&this_rectangle->Super);
		this_rectangle->Super.vptr = &vtbl;
		this_rectangle->base = base;
		this_rectangle->height = height;
		this_rectangle->name = name;
	}
}
static uint32_t CalcAreaRect_(Polygon_t const* const me)
{
	Rectangle_t const* this_rectangle = (Rectangle_t const*)me; //downcast, keeping const
	printf("Identified %s as a rectangle.\n\r", this_rectangle->name);
	return (this_rectangle->base) * (this_rectangle->height);
}

Since the private function that calculates the area receives a pointer to Polygon_t, we perform a downcast (a cast from the base class to a derived class, e.g., Polygon_t to Triangle_t) to access the data we need.

To use the CalcArea method we either perform an upcast on the object (e.g., from Triangle_t to Polygon_t) or access the Super class instance through the object address.

#include <stdio.h>
#include <stdlib.h>
#include "polygon.h"
#include "rectangle.h"
#include "triangle.h"
int main()
{
    //declaring objects
    Rectangle_t r1;
    Triangle_t t1;
    Rectangle_t r2 ;
    //constructing objects
    Rectangle_Ctor(&r1, 10, 20, "Polygon1");
    Triangle_Ctor(&t1, 5, 10, "Polygon2");
    Rectangle_Ctor(&r2, 300, 2, "Polygon3");
    //computing areas polymorphically
    printf("area: %u\n\r", (unsigned)CalcArea(&r1.Super));
    printf("area: %u\n\r", (unsigned)CalcArea((Polygon_t*)&t1));
    printf("area: %u\n\r", (unsigned)CalcArea(&r2.Super));
}

Output

Identified Polygon1 as a rectangle.
area: 200
Identified Polygon2 as a triangle.
area: 25
Identified Polygon3 as a rectangle.
area: 600
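
The payoff of the wiring above is that calling code only needs to know the base type. The following is a condensed, single-file sketch of the same mechanism, written so that it compiles as C++. The names Shape, ShapeVtbl, Rect, Tri, make_rect, make_tri and total_area are hypothetical and are not part of the files shown in this post.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration: Shape, ShapeVtbl, Rect, Tri, total_area.
struct Shape;
struct ShapeVtbl {
    std::uint32_t (*area)(const Shape* self);  // one slot per "virtual" function
};
struct Shape {
    const ShapeVtbl* vptr;  // first member, so a Rect* or Tri* also points at a Shape
};

struct Rect { Shape super; std::uint32_t base; std::uint32_t height; };
struct Tri  { Shape super; std::uint32_t base; std::uint32_t height; };

std::uint32_t rect_area(const Shape* self) {
    const Rect* r = reinterpret_cast<const Rect*>(self);  // downcast
    return r->base * r->height;
}
std::uint32_t tri_area(const Shape* self) {
    const Tri* t = reinterpret_cast<const Tri*>(self);  // downcast
    return t->base * t->height / 2;
}

// One table per concrete type, shared by all objects of that type.
const ShapeVtbl rect_vtbl = { &rect_area };
const ShapeVtbl tri_vtbl  = { &tri_area };

Rect make_rect(std::uint32_t base, std::uint32_t height) {
    return Rect{ Shape{ &rect_vtbl }, base, height };
}
Tri make_tri(std::uint32_t base, std::uint32_t height) {
    return Tri{ Shape{ &tri_vtbl }, base, height };
}

// Late binding: the caller only knows Shape, the vtable picks the function.
std::uint32_t total_area(const Shape* const shapes[], std::size_t count) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += shapes[i]->vptr->area(shapes[i]);
    }
    return sum;
}
```

Passing the addresses of the super members of a 10 x 20 rectangle and of a triangle with base 5 and height 10 to total_area adds up the same areas computed by the program above.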

[blog] C++ for C programmers #2

Returning user-defined types

In C, suppose we want a function that gets the position of a certain device. We would usually define a Position type, and the function to retrieve the position would look like one of these:

Position* get_position(); // function 1
void get_position(Position* this_position); // function 2

In function 1 the callee allocates the memory for the Position object and returns its address to the caller. In function 2 the caller allocates the memory and passes the object's address to the function. In C++ it is idiomatic to return user-defined types directly by value (C can return a struct by value as well, but the two pointer styles above are the common idiom there).

struct Position {
   float x;
   float y;
};
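Position get_position() {
   Position p{};  // zero-initialized; a real implementation would read the device here
   return p;      // returned by value
}
int foo() {
   auto p = get_position();  // p is a Position
   // code..
   return 0;
}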

Generic Programming With Templates

The mind behind generic programming is Alexander Stepanov. Long story short, C++ takes code reuse to another level with templates:

#include <iostream>

template <class T>
void swap(T& x, T& y) 
{
	T temp = x;
	x = y;
	y = temp;
}

int main(void)
{
	int a = 6;
	int b = 10;
	float f = 3.7f;
	float g = 4.3f;
	double d = 3.14159265358979323846;
	double e = 6.00008573217894365218;
	swap(a, b);
	std::cout << "a is now " << a << "; b is now " << b << std::endl;
	swap(f, g);
	std::cout << "f is now " << f << "; g is now " << g << std::endl;
	swap(d, e);
	std::cout << "d is now " << d << "; e is now " << e << std::endl;
	return 0;
}

Output:

a is now 10; b is now 6
f is now 4.3; g is now 3.7
d is now 6.00009; e is now 3.14159
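
Templates can be parameterized on values as well as on types. As a small sketch, the hypothetical helper max_of below (not part of the original listing) lets the compiler deduce both the element type and the array length:

```cpp
#include <cstddef>

// Hypothetical helper: largest element of a fixed-size array.
// T is the element type, N the array length; both are deduced by the compiler.
template <class T, std::size_t N>
T max_of(const T (&values)[N])
{
    T best = values[0];
    for (std::size_t i = 1; i < N; ++i) {
        if (best < values[i]) {
            best = values[i];
        }
    }
    return best;
}
```

The same source works for an array of int and an array of double; the compiler generates one function per combination of T and N that is actually used.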

Object Life Cycle

In C, the storage duration of an object depends on how you declare it in your code. C++ adds constructors and destructors for user-defined types, the classes. Loosely speaking, classes are like structs that can have functions.

An object’s constructor is called just after its storage duration begins, and its destructor is called just before it ends. Neither has a return type. The constructor has the same name as the class; the destructor has the same name prefixed with ~.

The compiler makes sure the constructor and destructor are invoked automatically for objects with static, automatic (local), and thread-local storage duration. For objects with dynamic storage duration, you use the keywords new and delete instead of malloc and free.

#include <cstdio>

struct Car {
	Car(const float engine_arg) : engine(engine_arg) { // constructor
		printf("I am a car with a %.2f engine\n\r", engine);
	}
	~Car() { //destructor
		printf("I was a car with a %.2f engine\n\r", engine);
	}
	const float engine;
};

void local_car_30(void) {
	Car car_local{ 3.0 }; //memory allocated, constructor called
	return; //destructor called, memory deallocated
}
int main() {
	Car* car1 = new Car{2.0};  //memory allocated, constructor called
	local_car_30(); 
	auto car2 = new Car{1.0}; 
	delete car1; //destructor called, memory deallocated
	delete car2; 
}

The output:

I am a car with a 2.00 engine
I am a car with a 3.00 engine
I was a car with a 3.00 engine
I am a car with a 1.00 engine
I was a car with a 2.00 engine
I was a car with a 1.00 engine

There is a concept in C++ programming called "RAII", which stands for "resource acquisition is initialization", sometimes also called "constructor acquires, destructor releases".
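
A minimal sketch of the idea, assuming a made-up resource: FakeLock stands in for a real mutex or peripheral, and LockGuard and held_inside_scope are hypothetical names used only for this illustration.

```cpp
// Hypothetical stand-in for a real resource such as a mutex or a peripheral.
struct FakeLock {
    bool held = false;
};

// Constructor acquires, destructor releases.
class LockGuard {
public:
    explicit LockGuard(FakeLock& lock) : lock_(lock) { lock_.held = true; }
    ~LockGuard() { lock_.held = false; }
    LockGuard(const LockGuard&) = delete;             // a guard is not copyable
    LockGuard& operator=(const LockGuard&) = delete;
private:
    FakeLock& lock_;
};

// Reports whether the lock was held inside the scope.
// The guard releases the lock on every path out of the function.
bool held_inside_scope(FakeLock& lock) {
    LockGuard guard{lock};
    return lock.held;  // the return value is computed before guard is destroyed
}
```

After held_inside_scope returns, the lock is released again without any explicit unlock call in the function body.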
Initializing data in C++ is a mess. As a C programmer, you must learn how initializing fully featured classes (structs that have data members and methods) differs from initializing plain-old-data structures, the pure data containers you already know.
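
To make the difference concrete, here is a small sketch using braced initialization for both kinds of type. PodPoint, Celsius and the three helper functions are hypothetical names chosen for this example.

```cpp
// Plain-old-data: no constructors, behaves like a C struct.
struct PodPoint {
    int x;
    int y;
};

// Fully featured class: the constructor enforces an invariant.
struct Celsius {
    explicit Celsius(float value) : degrees(value < -273.15f ? -273.15f : value) {}
    float degrees;
};

int pod_sum_after_empty_braces() {
    PodPoint p{};        // value-initialized: x and y are zero
    return p.x + p.y;
}

int pod_sum_after_aggregate_init() {
    PodPoint p{3, 4};    // aggregate initialization, member by member
    return p.x + p.y;
}

float clamped_degrees(float requested) {
    Celsius c{requested};  // here the braces call the constructor
    return c.degrees;
}
```

The same brace syntax therefore means "fill the members" for PodPoint and "call the constructor" for Celsius.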

Smart Pointers

A raw pointer is a memory address, just that, and you have to take care of the memory management yourself. The idea of smart pointers is to wrap dynamically allocated objects so that the memory is released automatically when the wrapper is destroyed.

In the listing above, suppose I had not used the delete keyword for car1 and car2, which I allocated dynamically (with the new keyword). The destructors would never be called, and that would mean a memory leak. Using a smart pointer, in this case std::unique_ptr, ensures the memory is cleaned up.

#include <cstdio>
#include <memory>
struct Car {
    Car(const float engine_arg) : engine(engine_arg) { // constructor
        printf("I am a car with a %.2f engine\n\r", engine);
    }
    ~Car() { //destructor
        printf("I was a car with a %.2f engine\n\r", engine);
    }
    const float engine;
};
void create_cars(void) {
    std::unique_ptr<Car> car1{ new Car{2.0} };
    auto car2 = new Car{ 1.0 }; 
    Car car3{ 3.0 };
} //no delete for car2. memory leak!
int main() {
    create_cars();  
} 

Output:

I am a car with a 2.00 engine
I am a car with a 1.00 engine
I am a car with a 3.00 engine
I was a car with a 3.00 engine
I was a car with a 2.00 engine

Note the destructor of car2 was never called.
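
As a further sketch, std::make_unique (available since C++14) avoids writing new at all, and std::move hands ownership over to another function. TrackedCar, scrap and the two helper functions below are hypothetical; the counter only exists to make the destructor calls observable.

```cpp
#include <memory>
#include <utility>

// Hypothetical type that counts its live instances.
struct TrackedCar {
    explicit TrackedCar(int& live) : live_count(live) { ++live_count; }
    ~TrackedCar() { --live_count; }
    int& live_count;
};

// Takes ownership; the car is destroyed together with the parameter.
void scrap(std::unique_ptr<TrackedCar>) {}

// Returns how many cars are still alive after the inner scope ends.
int live_cars_after_scope() {
    int live = 0;
    {
        auto car = std::make_unique<TrackedCar>(live);  // no new, no delete
    }
    return live;
}

// Returns true if handing the pointer over emptied it and destroyed the car.
bool transfer_destroys_car() {
    int live = 0;
    auto car = std::make_unique<TrackedCar>(live);
    scrap(std::move(car));  // ownership moves into scrap
    return car == nullptr && live == 0;
}
```

A unique_ptr cannot be copied, only moved, so there is always exactly one owner responsible for the delete.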

[blog] C++ for C programmers #1

C is everywhere in systems programming because it allows high-level abstraction while keeping overhead low. The static type system is an example. It is used at compile time to check program correctness. After compilation the types have disappeared, and the emitted assembly code shows no trace of the type system. C++ also takes zero-overhead abstraction and direct mapping to hardware very seriously. In fact, some C++ features incur even less overhead than the corresponding C code.

C++ compilers will deal with most of your C programming habits, so you can take advantage of the tactical niceties of C++ while avoiding the language's deeper themes. Even for seasoned C programmers, this way of coding requires learning new concepts to take full advantage of it. In some resource-constrained environments (e.g., embedded software, some operating system kernels) the available toolchains have incomplete C++ support, and in such situations it’s still possible to benefit from at least some C++ idioms. (LOSPINOSO, C++ Crash Course)

Function Overloading

In C, each function must have a unique name. In C++, functions can share a name as long as their parameter lists differ. This is called function overloading.

// Common (non-standard) C library functions to convert an integral type to a C-style string

//integer to string
char* itoa(int value, char* str, int base);

//long integer to string
char* ltoa(long value, char* buffer, int base);

//unsigned long integer to string
char* ultoa(unsigned long value, char* buffer, int base);

// A hypothetical C++ library for the same purpose, using one overloaded name
//integer to string
char* toa(int value, char* str, int base);

//long integer to string
char* toa(long value, char* buffer, int base);

//unsigned long integer to string
char* toa(unsigned long value, char* buffer, int base);

// C++ Code using function overloading
int main(void) {
  char buff[10];
  int a = 1; 
  long b = 2; 
  unsigned long c = 3; 
  toa(a, buff, 10);
  toa(b, buff, 10);
  toa(c, buff, 10);
  return 0;
}
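
To see which overload the compiler selects, here is a small self-contained sketch. The which_toa overload set and the Picked enum are hypothetical and only report the choice; they do no conversion.

```cpp
// Hypothetical overload set: each version reports which one the compiler picked.
enum class Picked { Int, Long, UnsignedLong };

Picked which_toa(int)           { return Picked::Int; }
Picked which_toa(long)          { return Picked::Long; }
Picked which_toa(unsigned long) { return Picked::UnsignedLong; }
```

The choice is made at compile time from the static type of the argument: 1 selects the int version, 2L the long version and 3UL the unsigned long version. A short argument is promoted and selects the int version.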

The constexpr

Many of the benefits of C++ come at compile time. The constexpr keyword is an example. It tells the compiler to evaluate the expression at compile time whenever possible, which can save room in flash memory if you are in an embedded environment, and it replaces function-like macros.

constexpr int compute_tick_numbers(int time_in_ms) {
 /*code*/
}
constexpr int tick_numbers = compute_tick_numbers(5);
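
A complete version of the idea might look like the sketch below. The tick rate kTicksPerMs is an assumption made up for this example, as is the tick_buffer array; the original leaves the function body unspecified.

```cpp
// Hypothetical tick rate, chosen only for illustration.
constexpr int kTicksPerMs = 10;

constexpr int compute_tick_numbers(int time_in_ms) {
    return time_in_ms * kTicksPerMs;
}

// Evaluated by the compiler, not at program startup.
constexpr int tick_numbers = compute_tick_numbers(5);
static_assert(tick_numbers == 50, "computed at compile time");

// The result is a constant expression, so it can size an array.
int tick_buffer[compute_tick_numbers(2)];
```

The static_assert only compiles because the call is evaluated at compile time; with a macro the same check would be possible, but without type checking of the argument.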

References

Pointers are crucial in systems programming. They let you handle large amounts of data efficiently by passing around the address of the data instead of the data itself. C++ adds references, which protect you against null dereferences and unintentional pointer reassignments.

You declare references with & rather than * and you interact with members using “.” rather than “->”.

#include <cstdio>

struct Foo {
	bool bar;
};

typedef struct Foo Foo_t;

void make_foo_ptr(Foo_t* data_struct); //takes a pointer to 
void make_foo_ref(Foo_t& data_struct); //takes a reference

void make_foo_ptr(Foo_t* data_struct) {
	//if (data_struct == nullptr) return; 
	data_struct->bar = true;
}
void make_foo_ref(Foo_t& data_struct) {
	data_struct.bar = true;
}

int main(void) {
	Foo_t foo_var{.bar = false}; //designated initializer (C++20; C has had it since C99)
	Foo_t& fooBar_ref = foo_var; //bind the reference to foo_var
	Foo_t* fooBar_ptr = &foo_var;
	make_foo_ref(fooBar_ref);
	make_foo_ptr(fooBar_ptr);
	return 0;
}

The compiler produces the same code for both. This is part of the assembly generated on my x86-64 machine; the instructions are identical for the two functions:

; 16   : void make_foo_ref(Foo_t& data_struct) {
; 17   : 	data_struct.bar = true;
	mov	rax, QWORD PTR data_struct$[rbp]
	mov	BYTE PTR [rax], 1

; 12   : void make_foo_ptr(Foo_t* data_struct) {
; 14   : 	data_struct->bar = true;
	mov	rax, QWORD PTR data_struct$[rbp]
	mov	BYTE PTR [rax], 1

References are safer than raw pointers. When dealing with pointers in C you usually need a safety check like the commented-out nullptr test in make_foo_ptr. A reference must be bound to an object when it is initialized, so a valid reference is never null.

Not being null does not mean a reference cannot refer to garbage. Take this function that returns a reference to a local variable:

Foo_t& func1(void) {
  Foo_t foo;
  return foo;
}

Using the returned reference is undefined behavior: foo is destroyed when func1 returns, so the caller is left with a dangling reference.

Another safety feature of references is that once they are initialized, they cannot be changed to refer to another object. To appreciate this, compare the same assignment written with a pointer in C and with a reference in C++:

//C-style
int main(void) {
   int a = 20;
   int* a_ptr = &a;
   int b = 40;
   *a_ptr = b; //a=40
   return 0;
}
//C++ 
int main(void) {
   int a = 20;
   int& a_ref = a;
   int b = 40;
   a_ref = b; //a=40 !!
   return 0;
}

When you put a reference on the left side of an equal sign, you are setting the referred-to object equal to the right side; the reference itself keeps referring to the same object.
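
The difference shows up once a second write follows. In this sketch (the two function names are hypothetical), the pointer is reseated to b, while the reference keeps writing to a:

```cpp
// Returns the final value of a after writing 99 through a reseated pointer.
int a_after_pointer_reseat() {
    int a = 20;
    int b = 40;
    int* p = &a;
    p = &b;     // reseat: p now points at b
    *p = 99;    // writes b; a is untouched
    return a;
}

// Returns the final value of a after the same two statements with a reference.
int a_after_reference_assign() {
    int a = 20;
    int b = 40;
    int& r = a;
    r = b;      // not a reseat: copies the value of b into a
    r = 99;     // still writes a
    return a;
}
```

The first function returns 20 and the second returns 99, because there is no syntax that makes r refer to b after its initialization.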

auto Initialization, and implicit typedef of struct, union, and enum

With the auto keyword the compiler deduces the type of the initializer on the right side and gives the variable the same type. This is very handy with function return values when the return type changes later, during a code refactoring. Also, in C++ a struct, union, or enum name is a type name on its own, so no typedef is needed.

#include <cstdlib>

enum color { yellow, blue };
struct Car{
 color carColor;
};
Car* make_car(color carColor) {
   Car* car = new Car{ carColor };
   return car;
}
int main(void) {
   auto car = make_car(blue);
   delete car; // memory obtained with new must be released with delete, not free()
   return 0;
}

Use enum class for type safety: its values cannot be implicitly converted to other types, such as integers or another enum. Furthermore, you can explicitly specify the underlying type of the enumeration.

enum class State : unsigned char {
	IDLE = 0x00,
	RUNNING = 0x01,
	DONE = 0xFF
};
struct Machine {
	State current_state;
};
void init_machine(Machine* machine) {
	machine->current_state = State::IDLE;
    /*...*/
}
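
Both claims can be checked by the compiler itself. The sketch below repeats the State enumeration so that it is self-contained; to_raw is a hypothetical helper for the places where the raw value is really needed.

```cpp
#include <type_traits>

enum class State : unsigned char {
    IDLE = 0x00,
    RUNNING = 0x01,
    DONE = 0xFF
};

// No implicit conversion to int, unlike a plain C enum.
static_assert(!std::is_convertible<State, int>::value, "enum class needs an explicit cast");
// The storage is exactly the requested underlying type.
static_assert(sizeof(State) == sizeof(unsigned char), "one byte of storage");

// Hypothetical helper for writing the state into a register or a log.
unsigned char to_raw(State s) {
    return static_cast<unsigned char>(s);
}
```

Every conversion to an integer is now visible in the source as a static_cast, which makes accidental mixing of states and plain numbers a compile error.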

Namespaces

Namespaces let you declare separate scopes, so the same identifier can be used in each of them. Let’s say you wanted to create a Foo structure and also a Foo function (not a good idea, but if you really wish):

#include <cstdio>
namespace DataStruc {
	struct Foo {
		int bar;
		bool var;
	};
}
namespace Func {
	void Foo(DataStruc::Foo* foo_arg) {
		foo_arg->bar = 100;
		foo_arg->var = false;
		printf("bar = %d, var = %d \n", foo_arg->bar, foo_arg->var);
	}
}
// terrible identifiers usage to show off namespaces  
int main(void) { 
  DataStruc::Foo Foo;
  Func::Foo(&Foo);
  return 0;
}

The main program output:

bar = 100, var = 0

There is no overhead when using namespaces and they are really useful in large projects to separate code in different libraries.

Intermingling C and C++ Object Files

In systems programming it is very common to need to link object files emitted by a C compiler with object files emitted by a C++ compiler (and vice versa). Two issues arise when linking such files. First, the calling conventions in C and C++ could potentially be mismatched: the protocols for how the stack and registers are set up when you call a function could be different. Second, C++ compilers emit different symbols than C compilers, to accommodate function overloading, namespaces and other C++ features.

C linkers know nothing about this decoration (name mangling). The fix is to wrap the declarations that need C-style linkage in an extern "C" block.

// header.h
#ifdef __cplusplus
extern "C" {
#endif
void foo(void);

struct bar {
  int count;
};
#ifdef __cplusplus
}
#endif

This header can be shared between C and C++ code. __cplusplus is a special identifier that the C++ compiler defines (but the C compiler does not). So the C compiler sees just:

// header.h
void foo(void);
struct bar {
  int count;
};

While the C++ compiler sees:

// header.h
extern "C" {
  void foo(void);
  struct bar {
    int count;
  };
}
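
On the C++ side, the definitions can use any C++ feature internally as long as the exported functions have C linkage. In this sketch the sensor namespace and the two sensor_scale_* entry points are hypothetical names, not part of the header above.

```cpp
// Hypothetical C++ implementation detail: overloaded and inside a namespace,
// so its linker symbols are mangled.
namespace sensor {
    int scale(int raw) { return raw * 2; }
    double scale(double raw) { return raw * 2.0; }
}

// C-callable entry points: unmangled names, one distinct name per function.
extern "C" int sensor_scale_int(int raw) { return sensor::scale(raw); }
extern "C" double sensor_scale_double(double raw) { return sensor::scale(raw); }
```

Functions with C linkage cannot be overloaded, which is why each entry point needs its own name, just like in the C library example in the function overloading section.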

This text is based on Lospinoso’s book “C++ Crash Course”. The author calls this style of coding “Super C”.