grape.ensmallen.datasets.linqs
This sub-module offers methods to automatically retrieve graphs from the LINQS repository.
"""This sub-module offers methods to automatically retrieve graphs from the LINQS repository."""
from .pubmeddiabetes import PubMedDiabetes
from .cora import Cora
from .citeseer import CiteSeer

__all__ = [
    "PubMedDiabetes", "Cora", "CiteSeer",
]
def PubMedDiabetes(
    directed: bool = False,
    preprocess: bool = True,
    load_nodes: bool = True,
    verbose: int = 2,
    cache: bool = True,
    cache_path: str = "graphs/linqs",
    version: str = "latest",
    **additional_graph_kwargs: Dict
) -> Graph:
    """Return new instance of the PubMedDiabetes graph.

    The graph is automatically retrieved from the LINQS repository. The Pubmed
    Diabetes dataset consists of 19717 scientific publications from the PubMed
    database pertaining to diabetes classified into one of three classes.
    The citation network consists of 44338 links. Each publication in the
    dataset is described by a TF/IDF weighted word vector from a dictionary
    which consists of 500 unique words.

    Parameters
    -------------------
    directed: bool = False
        Whether to load the graph as directed or undirected.
        By default false.
    preprocess: bool = True
        Whether to preprocess the graph to be loaded in optimal time and memory.
    load_nodes: bool = True
        Whether to load the nodes vocabulary or treat the nodes
        simply as a numeric range.
    verbose: int = 2
        Whether to show loading bars during the retrieval and building
        of the graph.
    cache: bool = True
        Whether to use cache, i.e. download files only once
        and preprocess them only once.
    cache_path: str = "graphs/linqs"
        Where to store the downloaded graphs.
    version: str = "latest"
        The version of the graph to retrieve.
    additional_graph_kwargs: Dict
        Additional graph kwargs.

    Returns
    -----------------------
    Instance of PubMedDiabetes graph.

    References
    ---------------------
    Please cite the following if you use the data:

    ```bib
    @inproceedings{namata2012query,
      title={Query-driven active surveying for collective classification},
      author={Namata, Galileo and London, Ben and Getoor, Lise and Huang, Bert and EDU, UMD},
      booktitle={10th International Workshop on Mining and Learning with Graphs},
      volume={8},
      year={2012}
    }
    ```
    """
    return AutomaticallyRetrievedGraph(
        graph_name="PubMedDiabetes",
        repository="linqs",
        version=version,
        directed=directed,
        preprocess=preprocess,
        load_nodes=load_nodes,
        verbose=verbose,
        cache=cache,
        cache_path=cache_path,
        additional_graph_kwargs=additional_graph_kwargs,
        callbacks=[
            parse_linqs_pubmed_incidence_matrix
        ],
        callbacks_arguments=[
            {
                "cites_path": "Pubmed-Diabetes/Pubmed-Diabetes/data/Pubmed-Diabetes.DIRECTED.cites.tab",
                "content_path": "Pubmed-Diabetes/Pubmed-Diabetes/data/Pubmed-Diabetes.NODE.paper.tab",
                "node_path": "nodes.tsv",
                "edge_path": "edges.tsv"
            }
        ]
    )()
Return a new instance of the PubMedDiabetes graph.
The graph is automatically retrieved from the LINQS repository. The Pubmed Diabetes dataset consists of 19717 scientific publications from the PubMed database pertaining to diabetes, each classified into one of three classes. The citation network consists of 44338 links. Each publication in the dataset is described by a TF/IDF weighted word vector from a dictionary which consists of 500 unique words.
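To make the "TF/IDF weighted word vector" description concrete, here is a minimal sketch of one common TF-IDF variant (relative term frequency times the natural log of inverse document frequency). The exact weighting used to build the dataset is not stated on this page, so the formula, the function name `tf_idf_vectors`, and the toy documents are all assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(documents, vocabulary):
    """Return one TF-IDF weighted vector per document, ordered like `vocabulary`."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vector = []
        for word in vocabulary:
            # Relative term frequency within this document.
            tf = counts[word] / len(doc) if doc else 0.0
            # Inverse document frequency; zero for words that never occur.
            idf = math.log(n_docs / doc_freq[word]) if doc_freq[word] else 0.0
            vector.append(tf * idf)
        vectors.append(vector)
    return vectors


# Toy corpus (hypothetical); the real dictionary has 500 words.
docs = [
    ["insulin", "glucose", "insulin"],
    ["glucose", "retina"],
    ["insulin", "kidney"],
]
vocab = ["glucose", "insulin", "kidney", "retina"]
vectors = tf_idf_vectors(docs, vocab)
```

Each publication ends up with one weight per dictionary word; words absent from a publication get a weight of zero.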
Parameters
- directed (bool = False): Whether to load the graph as directed or undirected. Defaults to False.
- preprocess (bool = True): Whether to preprocess the graph to be loaded in optimal time and memory.
- load_nodes (bool = True): Whether to load the nodes vocabulary or treat the nodes simply as a numeric range.
- verbose (int = 2): Whether to show loading bars during the retrieval and building of the graph.
- cache (bool = True): Whether to use the cache, i.e. download files only once and preprocess them only once.
- cache_path (str = "graphs/linqs"): Where to store the downloaded graphs.
- version (str = "latest"): The version of the graph to retrieve.
- additional_graph_kwargs (Dict): Additional keyword arguments, passed through to AutomaticallyRetrievedGraph.
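The cache behaviour described above (download once, preprocess once) can be illustrated with a small sketch. This is not the library's implementation: `cached_file` and `fake_download` are hypothetical names, and a temporary directory stands in for the real `cache_path`.

```python
import os
import tempfile


def cached_file(cache_path, name, produce):
    """Return the path of `name` under `cache_path`, calling `produce` only on a cache miss."""
    os.makedirs(cache_path, exist_ok=True)
    target = os.path.join(cache_path, name)
    if not os.path.exists(target):
        # Cache miss: build the content once and store it on disk.
        with open(target, "w", encoding="utf-8") as handle:
            handle.write(produce())
    return target


calls = []


def fake_download():
    # Hypothetical stand-in for an expensive download or preprocessing step.
    calls.append(1)
    return "source\tdestination\n"


with tempfile.TemporaryDirectory() as root:
    cache_dir = os.path.join(root, "graphs", "linqs")
    first = cached_file(cache_dir, "edges.tsv", fake_download)
    second = cached_file(cache_dir, "edges.tsv", fake_download)
    same_path = first == second
```

The second call finds the file already on disk, so the expensive step runs only once.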
Returns
- Instance of PubMedDiabetes graph.
References
Please cite the following if you use the data:
@inproceedings{namata2012query,
title={Query-driven active surveying for collective classification},
author={Namata, Galileo and London, Ben and Getoor, Lise and Huang, Bert and EDU, UMD},
booktitle={10th International Workshop on Mining and Learning with Graphs},
volume={8},
year={2012}
}
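A hedged usage sketch follows. It assumes the `grape` package is installed and that the function is importable from the module path shown in this page's title; the first call also needs network access to fetch the data. The helper names `linqs_options` and `load_pubmed_diabetes` are hypothetical and exist only for this example.

```python
def linqs_options(**overrides):
    """Build keyword arguments that mirror the documented defaults."""
    options = {
        "directed": False,
        "preprocess": True,
        "load_nodes": True,
        "verbose": 2,
        "cache": True,
        "cache_path": "graphs/linqs",
        "version": "latest",
    }
    # This helper is stricter than the documented functions, which also
    # accept additional graph kwargs; here unknown names are rejected.
    unknown = set(overrides) - set(options)
    if unknown:
        raise TypeError(f"unexpected options: {sorted(unknown)}")
    options.update(overrides)
    return options


def load_pubmed_diabetes(**overrides):
    """Retrieve the PubMedDiabetes graph with the given overrides."""
    # Imported lazily: requires grape and, on first use, a download.
    # The import path is an assumption taken from this page's title.
    from grape.ensmallen.datasets.linqs import PubMedDiabetes
    return PubMedDiabetes(**linqs_options(**overrides))
```

For instance, `load_pubmed_diabetes(directed=True)` would request the directed version of the graph while keeping every other default.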
def Cora(
    directed: bool = False,
    preprocess: bool = True,
    load_nodes: bool = True,
    verbose: int = 2,
    cache: bool = True,
    cache_path: str = "graphs/linqs",
    version: str = "latest",
    **additional_graph_kwargs: Dict
) -> Graph:
    """Return new instance of the Cora graph.

    The graph is automatically retrieved from the LINQS repository. The Cora
    dataset consists of 2708 scientific publications classified into one of
    seven classes. The citation network consists of 5429 links. Each
    publication in the dataset is described by a 0/1-valued word vector
    indicating the absence/presence of the corresponding word from the
    dictionary. The dictionary consists of 1433 unique words.

    Parameters
    -------------------
    directed: bool = False
        Whether to load the graph as directed or undirected.
        By default false.
    preprocess: bool = True
        Whether to preprocess the graph to be loaded in optimal time and memory.
    load_nodes: bool = True
        Whether to load the nodes vocabulary or treat the nodes
        simply as a numeric range.
    verbose: int = 2
        Whether to show loading bars during the retrieval and building
        of the graph.
    cache: bool = True
        Whether to use cache, i.e. download files only once
        and preprocess them only once.
    cache_path: str = "graphs/linqs"
        Where to store the downloaded graphs.
    version: str = "latest"
        The version of the graph to retrieve.
    additional_graph_kwargs: Dict
        Additional graph kwargs.

    Returns
    -----------------------
    Instance of Cora graph.

    References
    ---------------------
    Please cite the following if you use the data:

    ```bib
    @incollection{getoor2005link,
      title={Link-based classification},
      author={Getoor, Lise},
      booktitle={Advanced methods for knowledge discovery from complex data},
      pages={189--207},
      year={2005},
      publisher={Springer}
    }

    @article{sen2008collective,
      title={Collective classification in network data},
      author={Sen, Prithviraj and Namata, Galileo and Bilgic, Mustafa and Getoor, Lise and Galligher, Brian and Eliassi-Rad, Tina},
      journal={AI magazine},
      volume={29},
      number={3},
      pages={93--93},
      year={2008}
    }
    ```
    """
    return AutomaticallyRetrievedGraph(
        graph_name="Cora",
        repository="linqs",
        version=version,
        directed=directed,
        preprocess=preprocess,
        load_nodes=load_nodes,
        verbose=verbose,
        cache=cache,
        cache_path=cache_path,
        additional_graph_kwargs=additional_graph_kwargs,
        callbacks=[
            parse_linqs_incidence_matrix
        ],
        callbacks_arguments=[
            {
                "cites_path": "cora/cora/cora.cites",
                "content_path": "cora/cora/cora.content",
                "node_path": "nodes.tsv",
                "edge_path": "edges.tsv"
            }
        ]
    )()
Return a new instance of the Cora graph.
The graph is automatically retrieved from the LINQS repository. The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.
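The 0/1-valued word vector can be illustrated by parsing a couple of records. The sketch below assumes the tab-separated layout commonly described for the LINQS `.content` files (paper identifier, then one 0/1 flag per dictionary word, then the class label); the sample records, the four-word dictionary, and the function name `parse_content_line` are hypothetical.

```python
def parse_content_line(line):
    """Split one tab-separated record into (paper_id, binary_features, label)."""
    fields = line.rstrip("\n").split("\t")
    # First field is the identifier, last field is the class label,
    # everything in between is one flag per dictionary word.
    paper_id, *flags, label = fields
    features = [int(flag) for flag in flags]
    if any(value not in (0, 1) for value in features):
        raise ValueError(f"non-binary feature in record {paper_id!r}")
    return paper_id, features, label


# Toy records; the real Cora dictionary has 1433 words.
sample = [
    "paper_a\t0\t1\t1\t0\tNeural_Networks\n",
    "paper_b\t1\t0\t0\t0\tTheory\n",
]
dictionary = ["graph", "learning", "network", "proof"]

records = [parse_content_line(line) for line in sample]
# Map each paper to the dictionary words flagged as present.
present = {
    paper_id: [word for word, flag in zip(dictionary, features) if flag]
    for paper_id, features, _ in records
}
```

A flag of 1 at position i means the i-th dictionary word occurs in the publication, and 0 means it does not.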
Parameters
- directed (bool = False): Whether to load the graph as directed or undirected. Defaults to False.
- preprocess (bool = True): Whether to preprocess the graph to be loaded in optimal time and memory.
- load_nodes (bool = True): Whether to load the nodes vocabulary or treat the nodes simply as a numeric range.
- verbose (int = 2): Whether to show loading bars during the retrieval and building of the graph.
- cache (bool = True): Whether to use the cache, i.e. download files only once and preprocess them only once.
- cache_path (str = "graphs/linqs"): Where to store the downloaded graphs.
- version (str = "latest"): The version of the graph to retrieve.
- additional_graph_kwargs (Dict): Additional keyword arguments, passed through to AutomaticallyRetrievedGraph.
Returns
- Instance of Cora graph.
References
Please cite the following if you use the data:
@incollection{getoor2005link,
title={Link-based classification},
author={Getoor, Lise},
booktitle={Advanced methods for knowledge discovery from complex data},
pages={189--207},
year={2005},
publisher={Springer}
}
@article{sen2008collective,
title={Collective classification in network data},
author={Sen, Prithviraj and Namata, Galileo and Bilgic, Mustafa and Getoor, Lise and Galligher, Brian and Eliassi-Rad, Tina},
journal={AI magazine},
volume={29},
number={3},
pages={93--93},
year={2008}
}
def CiteSeer(
    directed: bool = False,
    preprocess: bool = True,
    load_nodes: bool = True,
    verbose: int = 2,
    cache: bool = True,
    cache_path: str = "graphs/linqs",
    version: str = "latest",
    **additional_graph_kwargs: Dict
) -> Graph:
    """Return new instance of the CiteSeer graph.

    The graph is automatically retrieved from the LINQS repository. The
    CiteSeer dataset consists of 3312 scientific publications classified into
    one of six classes. The citation network consists of 4732 links. Each
    publication in the dataset is described by a 0/1-valued word vector
    indicating the absence/presence of the corresponding word from the
    dictionary. The dictionary consists of 3703 unique words.

    Parameters
    -------------------
    directed: bool = False
        Whether to load the graph as directed or undirected.
        By default false.
    preprocess: bool = True
        Whether to preprocess the graph to be loaded in optimal time and memory.
    load_nodes: bool = True
        Whether to load the nodes vocabulary or treat the nodes
        simply as a numeric range.
    verbose: int = 2
        Whether to show loading bars during the retrieval and building
        of the graph.
    cache: bool = True
        Whether to use cache, i.e. download files only once
        and preprocess them only once.
    cache_path: str = "graphs/linqs"
        Where to store the downloaded graphs.
    version: str = "latest"
        The version of the graph to retrieve.
    additional_graph_kwargs: Dict
        Additional graph kwargs.

    Returns
    -----------------------
    Instance of CiteSeer graph.

    References
    ---------------------
    Please cite the following if you use the data:

    ```bib
    @incollection{getoor2005link,
      title={Link-based classification},
      author={Getoor, Lise},
      booktitle={Advanced methods for knowledge discovery from complex data},
      pages={189--207},
      year={2005},
      publisher={Springer}
    }

    @article{sen2008collective,
      title={Collective classification in network data},
      author={Sen, Prithviraj and Namata, Galileo and Bilgic, Mustafa and Getoor, Lise and Galligher, Brian and Eliassi-Rad, Tina},
      journal={AI magazine},
      volume={29},
      number={3},
      pages={93--93},
      year={2008}
    }
    ```
    """
    return AutomaticallyRetrievedGraph(
        graph_name="CiteSeer",
        repository="linqs",
        version=version,
        directed=directed,
        preprocess=preprocess,
        load_nodes=load_nodes,
        verbose=verbose,
        cache=cache,
        cache_path=cache_path,
        additional_graph_kwargs=additional_graph_kwargs,
        callbacks=[
            parse_linqs_incidence_matrix
        ],
        callbacks_arguments=[
            {
                "cites_path": "citeseer/citeseer/citeseer.cites",
                "content_path": "citeseer/citeseer/citeseer.content",
                "node_path": "nodes.tsv",
                "edge_path": "edges.tsv"
            }
        ]
    )()
Return a new instance of the CiteSeer graph.
The graph is automatically retrieved from the LINQS repository. The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 3703 unique words.
Parameters
- directed (bool = False): Whether to load the graph as directed or undirected. Defaults to False.
- preprocess (bool = True): Whether to preprocess the graph to be loaded in optimal time and memory.
- load_nodes (bool = True): Whether to load the nodes vocabulary or treat the nodes simply as a numeric range.
- verbose (int = 2): Whether to show loading bars during the retrieval and building of the graph.
- cache (bool = True): Whether to use the cache, i.e. download files only once and preprocess them only once.
- cache_path (str = "graphs/linqs"): Where to store the downloaded graphs.
- version (str = "latest"): The version of the graph to retrieve.
- additional_graph_kwargs (Dict): Additional keyword arguments, passed through to AutomaticallyRetrievedGraph.
Returns
- Instance of CiteSeer graph.
References
Please cite the following if you use the data:
@incollection{getoor2005link,
title={Link-based classification},
author={Getoor, Lise},
booktitle={Advanced methods for knowledge discovery from complex data},
pages={189--207},
year={2005},
publisher={Springer}
}
@article{sen2008collective,
title={Collective classification in network data},
author={Sen, Prithviraj and Namata, Galileo and Bilgic, Mustafa and Getoor, Lise and Galligher, Brian and Eliassi-Rad, Tina},
journal={AI magazine},
volume={29},
number={3},
pages={93--93},
year={2008}
}
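The source listings above pass `cites_path`, `content_path`, `node_path` and `edge_path` to a parsing callback. As a rough sketch of what such a conversion could look like, the code below turns toy citation and content records into a node list and an edge list in TSV form. It is not the library's `parse_linqs_incidence_matrix`: the function name, the column headers, and the assumed input layout (each `.cites` line as cited paper followed by citing paper; each `.content` line as identifier, features, label) are assumptions for illustration.

```python
import csv
import os
import tempfile


def write_node_and_edge_lists(cites_lines, content_lines, node_path, edge_path):
    """Write a node list (name, label) and an edge list (citing, cited) as TSV."""
    with open(node_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["node_name", "node_type"])
        for line in content_lines:
            fields = line.rstrip("\n").split("\t")
            # Keep the identifier (first field) and class label (last field).
            writer.writerow([fields[0], fields[-1]])
    with open(edge_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["source", "destination"])
        for line in cites_lines:
            cited, citing = line.split()
            # A citation points from the citing paper to the cited paper.
            writer.writerow([citing, cited])


# Toy inputs (hypothetical identifiers and labels).
cites = ["paper_a\tpaper_b\n", "paper_a\tpaper_c\n"]
content = [
    "paper_a\t0\t1\tTheory\n",
    "paper_b\t1\t0\tAgents\n",
    "paper_c\t1\t1\tTheory\n",
]

with tempfile.TemporaryDirectory() as root:
    node_path = os.path.join(root, "nodes.tsv")
    edge_path = os.path.join(root, "edges.tsv")
    write_node_and_edge_lists(cites, content, node_path, edge_path)
    with open(node_path, encoding="utf-8") as handle:
        node_rows = handle.read().splitlines()
    with open(edge_path, encoding="utf-8") as handle:
        edge_rows = handle.read().splitlines()
```

Writing nodes and edges to separate TSV files keeps the class labels with the nodes and lets the edge list stay a plain two-column table.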