For all details and references of the project one can visit darcs-GSoC-2014-History-Reordering-Performance-and-Features.

martes, 29 de julio de 2014

Other Week (21-26 July)

Good news!

I almost finish to implement the option minimize-context for the command darcs send, I say almost because making some examples I find out that somethings could go wrong. But before entering in that, I will comment a little the main the test that without minimize-context doesn't work and with the option passes(when the implementation is ready the most likely is I will upload the example and some more in ExamplesRepos), and some "tentatives options" for darcs send that I think could be useful.

So, the minimal example:

Suppose we have repositories $r_1$, $r_2$ and $r_3$; $r_2$ a direct clone of $r_1$ where
originally $r_1$ has only one patch, 

$r_1$,$r_2$ $=$ $p_0$

now, we add a patch with a new file in $r_1$ and make a clone, $r_3$ which leave the repositories like this,

$r_1$,$r_3$ $=$ $p_0$ $p_1$

Suppose that we make a modification in the file that add $p_0$ and make a patch $p'_0$ in $r_1$ and we send a bundle (with darcs send) to $r_2$ and $r_3$. What ends up happening is what one imagine,
the bundle can be applied to $r_3$ but not to $r_2$, this because the bundle has de following "shape":

Some message

patch [Some hash]
Author: ...
Date: ...
  * [The modification to $p_0$]

New patches:




Patch bundle hash: [Some hash]

and $r_2$ doesn't have the patch $p_1$ despite the fact that $p_1$ nothing has to do with $p'_0$. Using the option minimize-context the bundle is the same but the "Context" is:



and now there is no problem.

Now, I find out that reducing the context is not always the "best option" for send a bundle. Here I will expose something that could go wrong. Take for example the darcs-screened repository, in total has 11000~ patches, suppose we add a file and make a bundle (using minimize-context) with only the patch that make the add, the context then is empty. So, if we try to apply this bundle to, say the remote repository, this could never end...
What go wrong?; because the context is empty, making darcs apply [TheBundle] in some point try to merge the entire repository versus the "set" of patches to apply, this is very costly if we have 11000~ patches.

I write "best option" before because with my current solution is not always is better reduce the context, but the final idea is that always be the best option. So, I think is necessary to have some care for example in the last implementation if exist a tag, this tag is added to the context. This simple solution solves the problem and seems correct, on the other hand having more than 600~ or 1000~ patches without a tag seem a little extreme :)

Ending, I'm making some tests of two interesting options that came to mind:

. --last-deps=NUMBER
. --deps-from-tag=REGEXP

The first searches dependencies in the first N patches, and the second search since a given tag.

In conclusion, I'm still doubtful about how solve that problem. More if I have consider the two options. For the end of the week I hope have all this solved.

jueves, 24 de julio de 2014

Some Week (14-19 July)

Hi all!

I was finishing understanding and implementing the command darcs send --minimize-context using "optimize reorder" when I begin to suspect that doesn't solve the problem described in here. The thing is, despite the fact that the context in the bundle to send is reduced if before we send we make "optimize reorder", this doesn't solve the problem of dependencies. Guillaume finished of evacuate my doubts, and so after read:

[darcs-users] darcs cannot apply some patch bundles
issue1514 (which is the issue which "replace" issue2044 darcs send should do optimize --reorder)

I convince myself of what needs to be done, and it's calculate the "exact" dependencies of the patches to send so such dependencies be the context in the bundle to send. "Exact" because for big repositories can be very costly and calculate till certain tag seem appropriate.

Now, one concern is the cost of doing the search of dependencies. About this I can first comment some of the things I was doing during the week and later show, what I think are, encouraging examples. So first, maybe the most relevant thing of the week it's the implementation of the command darcs show dependencies with the following "description":

Usage: darcs show dependencies [OPTION]... 
Generate the graph of dependencies.

              --from-match=PATTERN  select changes starting with a patch matching PATTERN
              --from-patch=REGEXP   select changes starting with a patch matching REGEXP
              --from-tag=REGEXP     select changes starting with a tag matching REGEXP
              --last=NUMBER         select the last NUMBER patches
              --matches=PATTERN     select patches matching PATTERN
  -p REGEXP   --patches=REGEXP      select patches matching REGEXP
  -t REGEXP   --tags=REGEXP         select tags matching REGEXP
              --disable             disable this command
  -h          --help                shows brief description of command and its arguments

till the moment the command returns a graph described in dot language, this can eventually change. But with the current returned value one can do:

$\$$ darcs show dep | dot -Tpdf -o deps.pdf

to draw the graph in a pdf. Finally, in summary to calculate the dependencies I use more or less the idea which describes Ganesh in here.

Moving to the examples is interesting, thinking in the performance of the implementation of darcs send --minimize-context using this approach, to see the followings results:

1. Show the dependencies after the tag 2.9.9 (75 patches)
$ time darcs show dep --from-tag=2.9.9
real 0m0.397s
user 0m0.373s
sys 0m0.026s

2. Show the dependencies after the tag 2.9.8 (133 patches)
$ time darcs show dep --from-tag=2.9.8
real 0m2.951s
user 0m2.865s
sys 0m0.082s

3. Show the dependencies after the tag 2.9.7 (288 patches)
$ time darcs show dep --from-tag=2.9.7
real 0m26.654s
user 0m26.003s
sys 0m0.511s

4. Show the dependencies after the tag 2.9.6 (358 patches)
$ time darcs show dep --from-tag=2.9.6
real 0m39.019s
user 0m38.302s
sys 0m0.666s

5. Show the dependencies after the tag 2.9.5 (533 patches)
$ time darcs show dep --from-tag=2.9.5
real 1m53.730s
user 1m51.343s
sys 0m1.939s

Rushed conclusion, seems the performance is quite good even more if we think that for compute the graph dependencies we calculate the dependencies of "all the selected patches against all the selected patches" and in the case of the option for send would only require to calculate "patches to send against all the selected patches".

lunes, 14 de julio de 2014

Month of June

Here goes a little summary of what I been doing between late june (9~21) and early july (1~11).

First and easy, I have been documenting Darcs.misplacedPatches (old name chooseOrder), D.P.W.Ordered and D.P.W.Sealed. Something to comment is that the semantics of misplacedPatches, not always can clean a tag doing darcs optimize reorder. For example; Suppose we have a repository, $r_1$ with the following patches;

$r_1$ $=$ $t_{1,0}$ $p_{1,0}$ $t_{1,1}$

here all tags are clean, but if we make another repository, say $r_2$, and we pull from $r_1$ of the
following way

$\$$ darcs pull -a -p $p_{1,0}$ $r_1$ (we want to pull the patch $p_{1,0}$, we assume that the name of the patch is $p_{1,0}$ for the matching with -p option)
$\$$ darcs pull -a $r_1$

so now we have,

$r_2$ $=$ $p_{1,0}$ $t_{1,0}$ $t_{1,1}$

and we see that $t_{1,0}$ is dirty. Doing darcs optimize reorder not reorder nothing. What is going on is that to know what reorder, misplacePatches takes the first tag, in our case $t_{1,1}$, and
"search" for what patches he don't tag. But $p_{1,0}$ and $t_{1,0}$ are tagged by $t_{1,1}$ so there is nothing to reorder despite $t_{1,0}$ is dirty. Therefore there is no way of clean $t_{1,0}$ because misplacePatches always takes the first tag, so if a tag is tagging one or more dirty tags, this tags never be available to get clean.

"Second", using the implementation of "reorder" one can get almost for free the option --reorder for the commands pull, apply and rebase pull. The behavior for the case of pull (for the others commands is the same basic idea) is that our local patches remain on top after a pull from a remote repository, e.i. suppose we have the followings $l$(ocal) and $r$(emote) repositories,

$l$ $=$ $p_1$ $p_2$ $\ldots$ $p_n$ $lp_{n+1}$ $\ldots$ $lp_m$

$r$ $=$ $p_1$ $p_2$ $\ldots$ $p_n$ $rp_{n+1}$ $\ldots$ $rp_k$

where $lp$ are the local patches that don't belong to $r$, and vice versa for $rp$. Make darcs pull, leaves $l$ as follow,

$l$ $=$ $p_1$ $p_2$ $\ldots$ $p_n$ $lp_{n+1}$ $\ldots$ $lp_m$ $rp_{n+1}$ $\ldots$ $rp_k$

meanwhile make darcs pull --reorder, leaves $l$,

$l$ $=$ $p_1$ $p_2$ $\ldots$ $p_n$ $rp_{n+1}$ $\ldots$ $rp_k$ $lp_{n+1}$ $\ldots$ $lp_m$

making more easy to send the $lp$ patches later.

"Third", beginning a new task, implement option minimize-context for command darcs send. Still no much to comment, I have almost finished implementing the option but with some doubts, I hope that for the end of the week have a more "prettier" implementation as well as a better understanding.