================================= Data Structure for Unique Strings [TOC] ================================= ---- VIDEO ------------------------------ https://www.youtube.com/embed/yNu2gqc3r_k ----------------------------------------- Provided Material ================= # ---- SHELL (path=session17/git2, hide) ----------------------------------------- # rm -rf abc # git clone git@gitlab.com:uni-ulmulm-university-department-of-numerical-analysis/abc.git # git config --global advice.detachedHead false # -------------------------------------------------------------------------------- # # ---- SHELL (path=session17/ustr/abc, hide) ------------------------------------- # git checkout tags/ustr-quiz # -------------------------------------------------------------------------------- Here the header file and the test program for the unique strings: :import: session17/ustr/abc/ustr.h [fold] :import: session17/ustr/abc/xtest_ustr.c [fold] And here a skeleton for the source file _ustr.c_: ---- CODE (file=session17/ustr.c,fold) ----------------------------------------- #include #include #include #include "ustr.h" /* * List node for storing a unique string as element */ struct Node { struct Node *next; struct UStr ustr; }; /* * list head: list points to the first list node */ static struct Node *list; const struct UStr * UStrAdd_(const char *s, bool *added) { size_t len = strlen(s); /* * if added is the null pointer it gets ignored. Otherwise initialized with * true. */ if (added) { *added = true; } /* * Search in list if it already contains string s */ for (struct Node *n = list; n; n = n->next) { /* * TODO: * - Check if node n contains the string * - If this is the case: * - If added is not the null pointer set *added to false * - return a pointer to the unique string object stored in this node */ } /* * String needs to be added */ struct Node *n = 0; // <- TODO: Allocate a sufficiently large memory block // for a node that can store a string element // with length len. // Do not forget the null byte! if (!n) { fprintf(stderr, "makeUStr: out of memory\n"); abort(); } /* * Initialize the unique string object stored in this node: */ n->ustr.len = len; // TODO: Initialize also n->ustr.cstr by copying s into this buffer /* * TODO: Append the new node to the head of the list. I.e. the new node * will become the new head of the list. The successor of the new * node will be the old list head. */ /* * Return a pointer to the unique string object */ return &n->ustr; } const struct UStr * UStrAdd(const char *s) { return UStrAdd_(s, 0); } void UStrPrintPool(void) { for (struct Node *n = list; n; n = n->next) { printf("%s\n", n->ustr.cstr); } } -------------------------------------------------------------------------------- Allocating a List Node ====================== In the header _ustr.h_ we declare the following struct for a list element: ---- CODE (type=c) ------------------------------------------------------------- struct UStr { size_t len; char cstr[]; }; -------------------------------------------------------------------------------- Note that the member _cstr_ of this struct is an array of characters. That means it is the address of the first byte that follows the member _len_ in memory. Assume you define a variable _ustr_ of type _struct UStr_ as follows ---- CODE (type=c) ------------------------------------------------------------- struct UStr ustr; -------------------------------------------------------------------------------- then you can describe the memory layout with ---- TIKZ ---------------------------------------------------------------------- \begin{tikzpicture} \input{memory.tex} \renewcommand\MemCellWidth { 0.6} \DrawMemArrayOpen{-1}{20} \DrawQuadVariable{8}{ustr.len} \DrawMemAddress{16}{ustr.cstr} \DrawMemAddress{8}{\&ustr} \end{tikzpicture} -------------------------------------------------------------------------------- For this you also see that the size of _struct UStr_ is just the size of ~size_t~. The list node can be declared in the source file as follows: ---- CODE (type=c) ------------------------------------------------------------- struct Node { struct Node *next; struct UStr ustr; }; -------------------------------------------------------------------------------- Assume you would define ---- CODE (type=c) ------------------------------------------------------------- struct Node node; -------------------------------------------------------------------------------- then the memory layout can be described with ---- TIKZ ---------------------------------------------------------------------- \begin{tikzpicture} \input{memory.tex} \renewcommand\MemCellWidth { 0.6} \DrawMemArrayOpen{-2}{20} \DrawQuadVariable{0}{node.next} \DrawQuadVariable{8}{node.ustr.len} \DrawMemAddress{16}{node.ustr.cstr} \DrawMemAddress{0}{\&node} \end{tikzpicture} -------------------------------------------------------------------------------- If you use dynamic memory allocation like here ---- CODE (type=c) ------------------------------------------------------------- struct Node *n = malloc(sizeof(*n)); -------------------------------------------------------------------------------- we would have: ---- TIKZ ---------------------------------------------------------------------- \begin{tikzpicture} \input{memory.tex} \renewcommand\MemCellWidth { 0.6} \DrawMemArrayOpen{-2}{20} \DrawQuadVariable{0}{n->next} \DrawQuadVariable{8}{n->ustr.len} \DrawMemAddress{16}{n->ustr.cstr} \DrawPointer{0}{n} \end{tikzpicture} -------------------------------------------------------------------------------- For storing some string in such a node we simply need to allocate some additional bytes. So let's allocate memory for a list node like that: ---- CODE (type=c) ------------------------------------------------------------- struct Node *n = malloc(sizeof(*n) + 4); -------------------------------------------------------------------------------- then we have the following layout in memory: ---- TIKZ ---------------------------------------------------------------------- \begin{tikzpicture} \input{memory.tex} \renewcommand\MemCellWidth { 0.6} \DrawMemArrayOpen{-1}{20} %\DrawMemAddress{0}{42} \DrawQuadVariable{0}{n->next} \DrawQuadVariable{8}{n->ustr.len} \DrawLongVariable{16}{} \DrawMemAddress{16}{n->ustr.cstr} \DrawPointer{0}{n} \end{tikzpicture} -------------------------------------------------------------------------------- We allocated memory for _n->next_ and _n->ustr.len_ and 4 addition bytes. Of course this 4 bytes now could now be used to store a string in this list node. The length of this string should not exceed 3 characters as we also have to store a terminating zero byte. So after ---- CODE (type=c) ------------------------------------------------------------- strcpy(n->ustr.cstr, "Foo"); -------------------------------------------------------------------------------- we would have the following memory layout: ---- TIKZ ---------------------------------------------------------------------- \begin{tikzpicture} \input{memory.tex} \renewcommand\MemCellWidth { 0.6} \DrawMemArrayOpen{-1}{20} %\DrawMemAddress{0}{42} \DrawQuadVariable{0}{n->next} \DrawQuadVariable{8}{n->ustr.len} \DrawLongVariable{16}{} \DrawMemAddress{16}{n->ustr.cstr} \DrawByteVariable{16}{'f'} \DrawByteVariable{17}{'o'} \DrawByteVariable{18}{'o'} \DrawByteVariable{19}{0} \DrawPointer{0}{n} \end{tikzpicture} -------------------------------------------------------------------------------- Quiz 19: Implement the Unique String Class ========================================== Implement the unique string class. Submit your implementation with: ---- CODE (type=sh) ------------------------ submit hpc quiz19 ustr.c ustr.h xtest_ustr.c --------------------------------------------