## Thursday, September 17, 2009

### Genetic Algorithms (Thoughtful Thursday Post)

In genetic algorithms, one represent information as a Chromosome, a series of values. E. G., assume we are optimizing a gas pipeline, that has five pumping stations each with a pressure differential and we have five pipe diameters between them. Thus we would have for two possibilities. I will refer to them as Configuration A and Configuration B.
```P(1) P(2) P(3) P(4) P(5) D(1) D(2) D(3) D(4) D(5)

300  200  150  75    80   1.2  1.5  1.3  2.2  3.3

P(1) P(2) P(3) P(4) P(5) D(1) D(2) D(3) D(4) D(5)

275  300  125  140   140  1.8  1.2  1.4  2.1  3.1
```
This might represent the following real world configuration where the p(1)..p(5) in boxes are the pressure of the pumping stations and the d(1) to d(50 are the diameter of the pipes:
```                           d(3)
_____      ______    _____   _____     _____
|    |d(1) |    |d(2)|    |  |    |d(4)|    |d(5)
|P(1)|==== |p(2)|=== |p(3)|==|p(4)|=== |p(5)|===
|    |     |    |    |    |  |    |    |    |
_____      ______    _____   _____     _____
```
The most important operation is crossover, where two possible pipleine configurations "mate" and produce two offspring. The system randomly chooses a point in the chromose, and the two mates get switched at that point. Assume the random number generate says cross over after D(2). The two children in the next set would be:
```P(1) P(2) P(3) P(4) P(5) D(1) D(2) D(3) D(4) D(5)

300  200  150  75    80   1.2  1.5  1.4  2.1  3.1

P(1) P(2) P(3) P(4) P(5) D(1) D(2) D(3) D(4) D(5)

275  300  125  140   140  1.8  1.2  1.3  2.2  3.3
```
The second important operation is fitness evaluation and preferring high fitnesses. In the pipeline scenario, this fitness would probably the inverse cost. Thus, assume the two original configurations had costs of \$100.00 and \$75.00. Assume that there were two other configurations, call them Configuration C and Configuration D, with costs of \$300.00 and \$150.00.

The genetic algorithm uses random number generators to determine the chance of each configuration mating and thus contributing to the next set of chromosomes, the population in the terms of the Illigal algorithms. Then configuration B would have four times the chance of mating as C and D would have half the chance of mating as configuration C.

# Application to taxation and budgeting

In our project, the chromosome will include the various tax rates. These are the marginal tax rates for each quintile of income (with a few special ones for the richest.

```R(1) - tax rate for those in the bottom twenty percent of income
R(2) - tax  rate for those in the twenty to fourty percent of income
R(3) - tax rate for those in the fourty to sixty percent of income
R(4) - tax rate for those in the sixty to eighty percent of income
R(5) - tax rate for those making between eighty to ninety-five percent
of income
R(6) - tax rate for those making between ninety-five percent and
ninety percent of the income
R(7) - tax rate for those in the top one percent of income
```
And then, we have the budget amounts, how much to be spent on:
```R(8) - social security
R(9) - defense
R(10) - education
R(11) - medicare
R(12) - transportation
R(13) - agriculture
R(14) - aid to low income families
R(15) - training labor and unemployment
R(16) - international affairs
```

## Schemata

David Goldberg's book very nicely explains schemata (Genetic algorithms in search, optimization, and machine learning / by David E. Goldberg. Published: Reading, Mass. : Addison-Wesley Pub. Co., 1989, which is where I got the information in this article ). Schemata represent that we have chosen several fields. The other's have not been chosen yet. In our first example, the schemata
```P(1)=300 * * * * * D(1)=1.2 * * * * *
```
represents a specific value for the pressure in pumping station one and a specific value for the pipe to its right. The schemata,
```P(1)=300 P(2) = 200 * * *  * * * * *
```
represents all the possible configurations with pumping station one having pressure of 300 and pumping station two having pressure of 200.

In crossover, which schemata is most likely to be disrupted by the crossover which occurs each cycle, it is the first with about a fifty percent chance of getting disrupted. The second schemata has only a ten percent chance of getting disrupted as of the ten possible divisions only one will disrupt the schemata.

(These are approximations as most genetic algorithms use scaling to get better performance to prevent random variations that happenn to fit from dominating in the early stages and to magnify differences for fine tuning at the end of the optimization, but that doesn't concern the main idea of what we are doing in social choice.)

In taxes, schemata might represent

```R(1) = 0.1 R(2) = 0.2 * * * *  * * * * *  *
```
The tax rates forthe two lowest income quintiles thus constitute a schema, and since they are next to each other on the chromosome, they would be most likely preserved. On the other hand, a relationship between the tax rate on the lowest quintile and the money for welfare (aid to low-income persons) would not likely to be preserved. , e.g.
```R(1) = 0.1 * * * *   * * * * *  * * * R(14) = 10.0E6 *  *
```
That is because the first is on R(1) and the second is on R(14). Practically every cross-over will ruin this connection.

There are two other operators, mutation, which is randomly changing one value in the chromosome and inversion which is changing the order of operations. In this space, we could allow individuals to make proposals of new chromosomes or changes. Mutations thus introduce new possible budgets into the mix.

An interesting twist would be to proceed the genetic algorithm by a process where individuals give their degree of associativeness between two characteristics. This would be a pseudo-inversion.

MR. Lessard, a graduate student in our department, has started two programs in that direction: He implemented a web system for the above genetic algorithm.

1. After logging in and do "informed consent" stuff, the participants enters their ideal budget. Thus, the initial population consists of everyone's ideal budgets.
2. When the system has everyone's ideal budget (or at least from enough people), it will cross them over. Each of these will be presented for rating.
3. The users will rate five budgets each.
4. The system will do cross over on the results. These will be based on the fitness ratings in step three.