Wednesday, June 8, 2011

NetBSD GSoC Weekly Report 1

The coding period of GSoC started on the 23rd of May and we are now in the third week, but this is my first report: during the first week I was bogged down with my semester exams, so I only picked up the work on the 1st of June. Just last night (8th June) I finally completed my first task.

The Task: I described the first task in one of my previous posts. The problem was to get the list of directories which contain the man page sources. We will need this information later, when creating a database index for searching the man pages.
The information we were looking for is in the file /etc/man.conf. Two entries in man.conf matter for us:

_default        /usr/{share,X11R6,X11R7,pkg,local}/man/

and

_subdir        cat1 man1 cat2 man2 cat4 man4... 


From these we need to build the paths of the directories containing man pages, like this:

/usr/share/man/man1
/usr/share/man/man8
/usr/pkg/man/man4
...
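
To make the combination step concrete, here is a minimal sketch in C of how one `_default` pattern and one `_subdir` entry could be joined by hand. It is not the code from my patch: the helper `build_man_dirs` and the constant `MANDIR_MAX` are hypothetical names, and the sketch assumes the pattern contains exactly one brace group with no nesting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MANDIR_MAX 256	/* hypothetical limit on the length of one path */

/*
 * Expand a pattern with exactly one brace group, for example
 * "/usr/{share,pkg}/man/", and append `subdir` to every expansion.
 * The paths are written to out[0], out[1], ... in brace order.
 * Returns the number of paths written, or -1 if the pattern has no
 * brace group, `out` is too small, or a path is too long.
 * Illustration only; this helper is not part of man(1).
 */
static int
build_man_dirs(const char *pattern, const char *subdir,
    char out[][MANDIR_MAX], int max)
{
	const char *lbrace, *rbrace, *alt, *end;
	int n, count = 0;

	assert(pattern != NULL && subdir != NULL && out != NULL);

	lbrace = strchr(pattern, '{');
	rbrace = (lbrace != NULL) ? strchr(lbrace, '}') : NULL;
	if (rbrace == NULL)
		return -1;

	for (alt = lbrace + 1; alt <= rbrace; alt = end + 1) {
		/* The current alternative ends at the next ',' or at '}'. */
		end = memchr(alt, ',', (size_t)(rbrace - alt));
		if (end == NULL)
			end = rbrace;
		if (count >= max)
			return -1;
		/* prefix + alternative + suffix + subdir */
		n = snprintf(out[count], MANDIR_MAX, "%.*s%.*s%s%s",
		    (int)(lbrace - pattern), pattern,
		    (int)(end - alt), alt, rbrace + 1, subdir);
		if (n < 0 || n >= MANDIR_MAX)
			return -1;
		count++;
	}
	return count;
}
```

Called with the pattern /usr/{share,pkg}/man/ and the subdir man1, this fills the output array with /usr/share/man/man1 and /usr/pkg/man/man1. It does not check whether the directories exist.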

I wrote a patch for man(1) adding a new option, -p, which prints this list of directories on the terminal, one per line. It took me a whole week to do this relatively simple task, mostly because of my stupid mistakes.

My initial patch took a brute-force approach. It worked, but it was too complicated for anyone's liking.

It looked something like this: 

+/**
+*      Tests if the directory at dirpath exists or not
+*/
+static int
+testdir(const char *dirpath)
+{
+       DIR *dp;
+       if ((dp = opendir(dirpath)) == NULL)
+               return 0;
+       closedir(dp);
+       return 1;
+}
+
+/**
+*      Builds a list of directories containing man pages
+*/
+void
+printmanpath(struct manstate *m)
+{
+       ENTRY *esubd, *epath;
+       char *manpath; /*it will store the actual manpath as it is built */
+       char *manpath_tokens[3]; /* stores /usr/, {share, pkg, ...}, /man/ */
+       char *defaultpath = NULL; /* stores the _default tag value obtained from man.conf */
+       char *str, *buf; /* for storing temporary values */
+       int i;
+
+       TAG *path = m->defaultpath;
+       TAG *subdirs = m->subdirs;
+       if (path == NULL ) {
+               manpath = NULL;
+               return ;
+       }
+
+       /** routine code to get the default man path from the TAG.
+       *       path is of the form /usr/{share,X11R7,X11R6,pkg,local}/man/ (see /etc/man.conf)
+       *       We will first tokenize it into 3 parts
+       *       1. /usr/
+       *       2. share,X11R7,X11R6, pkg, local
+       *       3. /man/
+       *       and store them in the array manpath_tokens.
+       */
+       TAILQ_FOREACH(epath, &path->entrylist, q) {
+               defaultpath = strdup(epath->s);
+               for (str = strtok(defaultpath, (const char *)"{}"), i = 0; str; str = strtok(NULL, (const char *)"{}"), i++) {
+                       manpath_tokens[i] = str;
+               }
+               free(str);
+       }
+               /**
+               *       1. Tokenize manpath_tokens[1] (share, X11R7, X11R6,...)
+               *       2. Traverse the tail queue subdirs and get the list of subdirs i.e.:
+               *               man1, man2, man3, ... man9, etc. (see /etc/man.conf)
+               *       3. Finally build the complete path of the directory by concatenating the
+               *               different parts
+               */
+       for (str = strtok(manpath_tokens[1], ","); str; str = strtok(NULL, ",")) {
+               TAILQ_FOREACH(esubd, &subdirs->entrylist, q) {
+                       // we need only path of the actual man pages and not the cat ones
+                       if (strncmp(esubd->s, "man", 3))
+                               continue;
+
+                       asprintf(&buf, "%s%s%s%s/", manpath_tokens[0], str, manpath_tokens[2], esubd->s);
+
+                       // we should not add non-existing directories to the man path
+                       if (!testdir(buf))
+                               continue;
+
+                       if (manpath == NULL)
+                               asprintf(&manpath, "%s", buf);
+                       else
+                               printf("%s\n", buf);
+                       free(buf);
+               }
+       }
+
+       free(str);
+       free(defaultpath);
+}


My mentors David and Joerg have shown a lot of patience with me. They went through the different versions of the patch and gave useful reviews. Joerg suggested a more intuitive and efficient algorithm that builds the paths recursively. In the end I discovered glob(3), which provides csh-style brace expansion, and I settled on it, as it was the easiest option and the least likely to go wrong.
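
For readers who have not used it, here is a minimal sketch of brace expansion with glob(3), separate from the patch itself. The helper `print_brace_matches` is a hypothetical name. GLOB_BRACE is an extension available on the BSDs and in glibc rather than part of POSIX, and the `_DEFAULT_SOURCE` define is my assumption about what glibc needs in order to expose it. Note that glob(3) only returns paths that actually exist on the system.

```c
#define _DEFAULT_SOURCE	/* assumption: lets glibc expose GLOB_BRACE */
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand a csh-style brace pattern with glob(3) and print every
 * existing path it matches, one per line. Returns the number of
 * matches, 0 if nothing matched, or -1 on any other glob(3) failure.
 * Hypothetical helper, for illustration only.
 */
static int
print_brace_matches(FILE *fp, const char *pattern)
{
	glob_t pg;
	size_t i;
	int rc, count = 0;

	assert(fp != NULL && pattern != NULL);

	/* Start from a zeroed glob_t so that globfree() is always safe. */
	memset(&pg, 0, sizeof(pg));
	rc = glob(pattern, GLOB_BRACE | GLOB_NOSORT, NULL, &pg);
	if (rc == 0) {
		for (i = 0; i < pg.gl_pathc; i++)
			fprintf(fp, "%s\n", pg.gl_pathv[i]);
		count = (int)pg.gl_pathc;
	} else if (rc != GLOB_NOMATCH) {
		count = -1;
	}
	globfree(&pg);
	return count;
}
```

Calling it with a pattern such as /usr/{share,pkg}/man/ prints only those of the two directories that are present, so the result depends on the machine it runs on.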


The final version of the patch looked something like this:

+
+/*
+ * printmanpath --
+ *    Prints a list of directories containing man pages.
+ */
+static void
+printmanpath(struct manstate *m)
+{
+    ENTRY *esubd;
+    char *defaultpath = NULL; /* _default tag value from man.conf. */
+    char *buf; /* for storing temporary values */
+    char **ap;
+    glob_t pg;
+    struct stat sb;
+    TAG *path = m->defaultpath;
+    TAG *subdirs = m->subdirs;
+    
+    /* The tail queue is empty if no _default tag is defined in man.conf. */
+    if (TAILQ_EMPTY(&path->entrylist))
+        errx(EXIT_FAILURE, "Empty manpath");
+        
+    defaultpath = TAILQ_LAST(&path->entrylist, tqh)->s;
+    
+    if (glob(defaultpath, GLOB_BRACE | GLOB_NOSORT, NULL, &pg) != 0)
+        err(EXIT_FAILURE, "glob failed");
+        
+    TAILQ_FOREACH(esubd, &subdirs->entrylist, q) {
+        /* Drop cat page directory, only sources are relevant. */
+        if (strncmp(esubd->s, "man", 3))
+            continue;
+
+        for (ap = pg.gl_pathv; *ap != NULL; ++ap) {
+            if (asprintf(&buf, "%s%s", *ap, esubd->s) == -1) 
+                err(EXIT_FAILURE, "memory allocation error");
+            /* Skip non-directories. */
+            if (stat(buf, &sb) == 0 && S_ISDIR(sb.st_mode))
+                printf("%s\n", buf);
+
+            free(buf);
+        }
+    }
+    globfree(&pg);
+} 

What did I learn in the process: It was a small and relatively simple task, although I believe brace expansion is not trivial to do by hand. Overall, it was a great learning experience for me. I learnt about the queue(3) interfaces, glob(3), and a host of string utilities from the POSIX and ISO C standards that I had been unaware of.

The most important lesson was about memory management. I tried to free memory wherever I allocated it, but I learnt about many corner cases I didn't know about that can lead to memory leaks. I hope I did learn a lesson here :)
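
One such corner case is visible in my initial patch: when testdir() fails, the `continue` skips the `free(buf)` at the bottom of the loop, so every skipped path leaks its buffer. Below is a minimal sketch of the pattern that avoids this, with a single free() that every iteration reaches. The helper `count_existing_dirs` is a hypothetical name used only for this illustration, and it uses malloc(3) and snprintf(3) instead of asprintf(3) to stay within standard C.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Count how many of the paths `base` + names[i] are existing
 * directories. The point of the sketch is that `buf` is freed on
 * every iteration, including the ones where the path is skipped.
 * Hypothetical helper, for illustration only.
 */
static size_t
count_existing_dirs(const char *base, const char *const names[], size_t n)
{
	struct stat sb;
	size_t i, len, count = 0;
	char *buf;

	assert(base != NULL && (names != NULL || n == 0));

	for (i = 0; i < n; i++) {
		len = strlen(base) + strlen(names[i]) + 1;
		if ((buf = malloc(len)) == NULL)
			return count;	/* nothing allocated, nothing to free */
		snprintf(buf, len, "%s%s", base, names[i]);

		/* Decide first, then fall through to the single free(). */
		if (stat(buf, &sb) == 0 && S_ISDIR(sb.st_mode))
			count++;

		free(buf);
	}
	return count;
}
```

Keeping one exit point per iteration means there is no early `continue` that can jump over the cleanup.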

Hopefully this will be my first patch for NetBSD. Joerg promised to commit it soon to the repository.

2 comments:

  1. Hey nice work Abhinav! Specially code snippets you have added are looking ubercool :)

  2. Hey Srishti, thanks :)

    I thought I am going to post a lot of code in future, so better get some way of highlighting the code snippets.
