Commit

Release v2.3.0
- add support for MCT algorithm
- update documentation
- fix minor bugs
Ayan-Kumar-Saha committed Aug 28, 2020
1 parent 8e5235e commit 9fe0757
Showing 5 changed files with 51 additions and 28 deletions.
54 changes: 34 additions & 20 deletions README.md
@@ -1,11 +1,11 @@
# Markov Chain Type 4 Rank Aggregation
**implementation of MC4 Rank Aggregation algorithm using Python**
**implementation of MC4 and MCT Rank Aggregation algorithms using Python**

## Description

This project is all about implementing one of the most popular rank aggregation algorithms **Markov Chain Type 4** or **MC4**. In the field of Machine Learning and many other scientific problems, several items are often needed to be ranked based on some criterion. However, different ranking schemes order the items based on different preference criteria. Hence the rankings produced by them may differ greatly.
This project is all about implementing two of the most popular rank aggregation algorithms: **Markov Chain Type 4** (**MC4**) and **MCT**. In the field of Machine Learning and many other scientific problems, several items often need to be ranked based on some criterion. However, different ranking schemes order the items based on different preference criteria, so the rankings they produce may differ greatly.

Therefore a rank aggregation technique is often used for combining the individual rank lists into a single aggregated ranking. Though there are many rank aggregation algorithms, MC4 is one of the most renowned ones.
Therefore, a rank aggregation technique is often used to combine the individual rank lists into a single aggregated ranking. Though there are many rank aggregation algorithms, MC4 and MCT are two of the most renowned.

## Resource

@@ -23,24 +23,31 @@ For a specific release, `pip install mc4=={version}` such as `pip install mc4==1

## General Usage

Using this package is very easy. You just need the following three lines of code to use the package.
Using this package is very easy.

1. Prepare a dataset containing ranks of all the items provided by different algorithms. See [here](https://github.com/kalyaniuniversity/MC4/blob/master/test_datasets/README.md) for sample datasets and more info.

2. Use the following lines of code to call the package. Make sure the arguments match your dataset; otherwise, the results will be incorrect.

```python
from mc4.algorithm import mc4_aggregator
import pandas as pd

aggregated_ranks = mc4_aggregator('dataset.csv')
# Method 1
aggregated_ranks = mc4_aggregator('test_dataset_1.csv', header_row = 0, index_col = 0)

# or

aggregated_ranks = mc4_aggregator(df)
# or Method 2
df = pd.read_csv('test_dataset_1.csv', header = 0, index_col = 0)
aggregated_ranks = mc4_aggregator(df, header_row = 0, index_col = 0)

print(aggregated_ranks)
```
here `dataset.csv` or `df` are lists of ranks provided by different ranking algorithms or rank lists. *You can refer [here](https://github.com/kalyaniuniversity/MC4/blob/master/test_datasets/datasets.md) for more info and some test datasets.*
Here `test_dataset_1.csv` is a sample dataset containing the ranks of different items provided by different algorithms.

`mc4_aggregator` takes some additional arguments as well.
`mc4_aggregator` takes the following optional arguments in addition to the data source:

* `order (string)`: order of the dataset, default is `'row'`. More on this, [here](https://github.com/kalyaniuniversity/MC4/blob/master/test_datasets/datasets.md).
* `algo (string)`: algorithm for rank aggregation, `mc4` or `mct`, default is `mc4`
* `order (string)`: order of the dataset, `row` or `column`, default is `row`. More on this, [here](https://github.com/kalyaniuniversity/MC4/blob/master/test_datasets/README.md).
* `header_row (int or None)`: row number of the dataset containing the header, default is `None`
* `index_col (int or None)`: column number of the dataset containing the index, default is `None`
* `precision (float)`: acceptable error margin for convergence, default is `1e-07`
@@ -49,49 +56,56 @@ here `dataset.csv` or `df` are lists of ranks provided by different ranking algo

## Command Line Usage

You can use this package directly from the command line if you already have the dataset prepared.

* To get help and usage details,
```shell
~$ mc4_aggregator -h or --help
```

* Use with default settings,
```shell
~$ mc4_aggregator <data source> e.g. mc4_aggregator dataset.csv
~$ mc4_aggregator dataset.csv
```

* Specify the algorithm for rank aggregation using `-a` or `--algo`, options: `mc4` or `mct`, default is `mc4`
```shell
~$ mc4_aggregator dataset.csv -a mct
```

* Specify order using `-o` or `--order`, default is `row`
* Specify order using `-o` or `--order`, options: `row` or `column`, default is `row`
```shell
~$ mc4_aggregator <data source> -o <order> e.g. mc4_aggregator dataset.csv -o column
~$ mc4_aggregator dataset.csv -o column
```

* Specify header row using `-hr` or `--header_row`, default is `None`
```shell
~$ mc4_aggregator <data source> -hr <header row> e.g. mc4_aggregator dataset.csv -hr 1
~$ mc4_aggregator dataset.csv -hr 0
```

* Specify index column using `-ic` or `--index_col`, default is `None`
```shell
~$ mc4_aggregator <data source> -ic <index column> e.g. mc4_aggregator dataset.csv -ic 1
~$ mc4_aggregator dataset.csv -ic 0
```

* Specify precision using `-p` or `--precision`, default is `1e-07`
```shell
~$ mc4_aggregator <data source> -p <precision> e.g. mc4_aggregator dataset.csv -p 0.000001
~$ mc4_aggregator dataset.csv -p 0.000001
```

* Specify iterations using `-i` or `--iterations`, default is `200`
```shell
~$ mc4_aggregator <data source> -i <iterations> e.g. mc4_aggregator dataset.csv -i 300
~$ mc4_aggregator dataset.csv -i 300
```

* Specify ergodic number using `-e` or `--erg_number`, default is `0.15`
```shell
~$ mc4_aggregator <data source> -p <precision> e.g. mc4_aggregator dataset.csv -e 0.20
~$ mc4_aggregator dataset.csv -e 0.20
```

* All together,
```shell
~$ mc4_aggregator dataset.csv -o column -hr 1 -ic 1 -p 0.000001 -i 300 -e 0.20
~$ mc4_aggregator dataset.csv -a mct -o column -hr 0 -ic 0 -p 0.000001 -i 300 -e 0.20
```

## Output
19 changes: 14 additions & 5 deletions mc4/algorithm.py
@@ -48,12 +48,13 @@ def get_matrix_shape(df):
return rows, cols


def get_partial_transition_matrix(df, items, lists):
def get_partial_transition_matrix(df, algo, items, lists):

"""Returns the partial transition matrix from the dataframe containing different ranks
Args:
df (pandas.core.DataFrame): dataframe object containing different ranks
algo (string): mc4 or mct
items (int): number of items
lists (int): number of lists
@@ -70,10 +71,13 @@ def get_partial_transition_matrix(df, items, lists):

if result == 0 and i==j:
val = -1
elif result >= (lists/2):
elif result > (lists/2):
val = 0
else:
val = 1
if algo == 'mc4':
val = 1
else:
val = (lists-result) / lists

matrix_input = val
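
The branching added in this hunk can be read as a small pure function. The sketch below is hypothetical: the helper name `partial_transition_value` and the `same_item` flag (standing in for `i==j`) are illustrative, and it assumes `result` counts the rank lists that place the current item ahead of the candidate item, since the diff does not show how `result` is computed.

```python
def partial_transition_value(result: int, lists: int, algo: str, same_item: bool) -> float:
    """Hypothetical restatement of the per-pair rule in get_partial_transition_matrix.

    Assumes `result` is the number of rank lists that place the current item
    ahead of the candidate item; `lists` is the total number of rank lists.
    """
    if result == 0 and same_item:
        return -1  # diagonal placeholder, presumably filled in during normalization
    if result > lists / 2:
        return 0  # a strict majority keeps the current item: no transition
    if algo == "mc4":
        return 1  # MC4: move to the candidate whenever no strict majority objects
    # MCT: weight the move by the share of lists that do not back the current item
    return (lists - result) / lists
```

Note the effect of changing `>=` to `>`: with an even number of lists, an exact tie (`result == lists/2`) previously blocked the transition and now allows it, with weight `1` under MC4 and `0.5` under MCT.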

@@ -216,12 +220,13 @@ def get_mapped_final_ranks(df, final_ranks, index_col):
return ranks


def mc4_aggregator(source, order = 'row', header_row=None, index_col=None, precision=0.0000001, iterations=200, erg_number=0.15):
def mc4_aggregator(source, algo='mc4', order = 'row', header_row=None, index_col=None, precision=0.0000001, iterations=200, erg_number=0.15):

"""Performs aggregation on different ranks using Markov Chain Type 4 Rank Aggregation algorithm and returns the aggregated ranks
Args:
source (string or pandas.DataFrame): path of the CSV dataset file, or a dataframe, containing all different ranks
algo (string): mc4 or mct, default is mc4
order (string): order of the dataset, default is row i.e. row-major
header_row (int or None): row number of the dataset containing the header, default is None
index_col (int or None): column number of the dataset containing the index, default is None
@@ -233,6 +238,9 @@ def mc4_aggregator(source, order = 'row', header_row=None, index_col=None, preci
list: contestantwise aggregated ranks
"""

if algo not in ['mc4', 'mct']:
raise Exception(f"Invalid ranking algorithm '{algo}'")

if isinstance(source, str) and is_csv(source):

if is_valid_path(source):
@@ -251,9 +259,10 @@ def mc4_aggregator(source, order = 'row', header_row=None, index_col=None, preci
else:
raise Exception(f"Unsupported data source '{get_filename(source)}'")


rows, cols = get_matrix_shape(df)

partial_transition_matrix = get_partial_transition_matrix(df, rows, cols)
partial_transition_matrix = get_partial_transition_matrix(df, algo, rows, cols)

normalized_transition_matrix = get_normalized_transition_matrix(partial_transition_matrix, rows)
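
The hunk only shows the first steps of the pipeline. The remaining parameters (`precision`, `iterations`, `erg_number`) suggest a standard Markov chain ranking: normalize the transition weights, mix in a uniform random jump, and iterate to the stationary distribution. What follows is a generic pure-Python sketch of that idea, not the package's own implementation; the function name `stationary_ranks`, dividing each off-diagonal weight by the number of items, putting the leftover probability on the diagonal, and treating `erg_number` as the random-jump probability are all assumptions.

```python
def stationary_ranks(partial, precision=1e-7, iterations=200, erg_number=0.15):
    """Hypothetical sketch: turn a partial transition matrix into aggregated ranks.

    partial[i][j] (i != j) is a non-negative weight for moving from item i to item j;
    diagonal entries are placeholders (-1 in the diff) and are ignored here.
    """
    n = len(partial)
    # 1. Normalize (assumed scheme): scale off-diagonal weights by 1/n and keep the
    #    leftover mass on the diagonal as a self-loop, so every row sums to 1.
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i][j] = partial[i][j] / n
        P[i][i] = 1.0 - sum(P[i])
    # 2. Ergodic mixing (assumed meaning of erg_number): with that probability,
    #    jump to a uniformly random item instead of following P.
    T = [[(1 - erg_number) * P[i][j] + erg_number / n for j in range(n)] for i in range(n)]
    # 3. Power iteration until successive distributions differ by less than `precision`.
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(nxt, pi)) < precision
        pi = nxt
        if converged:
            break
    # 4. Higher stationary probability means a better (smaller) rank.
    order = sorted(range(n), key=lambda i: pi[i], reverse=True)
    ranks = [0] * n
    for position, item in enumerate(order, start=1):
        ranks[item] = position
    return ranks
```

With a positive `erg_number`, every entry of the mixed matrix is strictly positive, so the chain is ergodic and the iteration converges to a unique distribution regardless of the input rank lists.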

4 changes: 2 additions & 2 deletions mc4/command_line.py
@@ -4,6 +4,7 @@
parser = argparse.ArgumentParser(description='Takes necessary inputs for mc4_aggregator')

parser.add_argument('source', type=str, help='source of the lists of ranks')
parser.add_argument('-a', '--algo', type=str, default='mc4', help='rank aggregation algorithm, mc4 or mct, default is mc4', choices=['mc4', 'mct'])
parser.add_argument('-o', '--order', type=str, default='row', help='order of the dataset, default is row', choices=['row', 'column'])
parser.add_argument('-hr', '--header_row', type=int, help='row number of the header, default is None')
parser.add_argument('-ic', '--index_col', type=int, help='column number of the index, default is None')
@@ -14,5 +15,4 @@
args = parser.parse_args()

def main():
print(mc4_aggregator(args.source, args.order, args.header_row, args.index_col, args.precision, args.iterations, args.erg_number))

print(mc4_aggregator(args.source, args.algo, args.order, args.header_row, args.index_col, args.precision, args.iterations, args.erg_number))
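
The command-line flags documented in the README map onto `argparse` in the usual way. Only `source`, `--algo`, `--order`, `--header_row`, and `--index_col` are visible in this diff; the `-p`, `-i`, and `-e` definitions below are reconstructed from the README's documented flags and defaults and are assumptions, as are the help strings and the hypothetical `build_parser` helper. Parsing an explicit argument list makes the mapping easy to check without a dataset.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented mc4_aggregator CLI."""
    parser = argparse.ArgumentParser(description="Takes necessary inputs for mc4_aggregator")
    parser.add_argument("source", type=str, help="source of the lists of ranks")
    parser.add_argument("-a", "--algo", type=str, default="mc4", choices=["mc4", "mct"],
                        help="rank aggregation algorithm, default is mc4")
    parser.add_argument("-o", "--order", type=str, default="row", choices=["row", "column"],
                        help="order of the dataset, default is row")
    parser.add_argument("-hr", "--header_row", type=int, help="row number of the header")
    parser.add_argument("-ic", "--index_col", type=int, help="column number of the index")
    # Assumed definitions: types and defaults taken from the README, not from this diff.
    parser.add_argument("-p", "--precision", type=float, default=1e-07, help="convergence tolerance")
    parser.add_argument("-i", "--iterations", type=int, default=200, help="maximum iterations")
    parser.add_argument("-e", "--erg_number", type=float, default=0.15, help="ergodic number")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["dataset.csv", "-a", "mct", "-o", "column", "-hr", "0", "-ic", "0",
         "-p", "0.000001", "-i", "300", "-e", "0.20"]
    )
    # Same positional order as the new signature: source, algo, order, header_row, ...
    print(args.source, args.algo, args.order, args.header_row, args.index_col,
          args.precision, args.iterations, args.erg_number)
```

Because `choices` is set on `--algo`, an invalid value such as `-a foo` is rejected by `argparse` before `mc4_aggregator` is ever called, complementing the `algo not in ['mc4', 'mct']` check inside the function.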
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@

setup(
name="mc4",
version="2.2.1",
version="2.3.0",
author="Ayan Kumar Saha",
author_email="ayankumarsaha96@gmail.com",
description="A python package for implementing Markov Chain Type 4 rank aggregation",
File renamed without changes.
