Aine Fairbrother-Browne, Sonia García-Ruiz, Regina Hertfelder Reynolds, Mina Ryten, Alan Hodgkinson, ensemblQueryR: fast, flexible and high-throughput querying of Ensembl LD API endpoints in R, Gigabyte, 2023 https://doi.org/10.46471/gigabyte.91
ensemblQueryR function | Ensembl API endpoint | Arguments | Output | Description |
---|---|---|---|---|
ensemblQueryGetPops | Information [5] | N/A | A list of human populations for which LD metrics can be retrieved. | This function retrieves a list of the Ensembl populations for which LD metrics can be queried (for further information on the available populations, see Ensembl [13]). These can be supplied to the ‘pop’ argument across ensemblQueryR functions. |
pingEnsembl | Information [5] | N/A | An integer (and a message) to indicate the status of the Ensembl API. Returns 1 and reports “Server OK.” if the server is up. | This function checks and informs the user of the status of the Ensembl API. |
ensemblQueryLDwithSNPwindow | Window [5] | rsid, r2, d.prime, window.size, pop | A data frame with five columns: ‘query’ (the variant input to ‘rsid’), ‘snp_in_ld’ (variant(s) in LD with ‘query’), ‘r2’ (r-squared statistic), ‘d_prime’ (D′ statistic), ‘population_name’ (the population supplied to ‘pop’). | This function retrieves variants in LD with the query variant within a given genomic window. |
ensemblQueryLDwithSNPwindowDataframe | Window [5] | in.table, r2, d.prime, window.size, pop, cores | A data frame with five columns: ‘query’ (the variant input to ‘rsid’), ‘snp_in_ld’ (variant(s) in LD with ‘query’), ‘r2’ (r-squared statistic), ‘d_prime’ (D′ statistic), ‘population_name’ (the population supplied to ‘pop’). | This function takes a data frame with a column of variant rsIDs (Reference SNP cluster IDs). It retrieves variants in LD with each query variant within a given genomic window. |
ensemblQueryLDwithSNPpair | Pair [5] | rsid1, rsid2, pop | A data frame with five columns: ‘query1’ (the variant input to ‘rsid1’), ‘query2’ (the variant input to ‘rsid2’), ‘r2’ (r-squared statistic), ‘d_prime’ (D′ statistic), ‘population_name’ (the population supplied to ‘pop’). | This function takes a pair of rsIDs and retrieves their LD metrics (D′ and R²). |
ensemblQueryLDwithSNPpairDataframe | Pair [5] | in.table, pop, cores | A data frame with five columns: ‘query1’ (the variant input to ‘rsid1’), ‘query2’ (the variant input to ‘rsid2’), ‘r2’ (r-squared statistic), ‘d_prime’ (D′ statistic), ‘population_name’ (the population supplied to ‘pop’). | This function takes a data frame containing paired rsIDs and retrieves LD metrics (D′ and R²) for all pairs. |
ensemblQueryLDwithSNPregion | Region [5] | chr, start, end, pop | A data frame with eight columns: ‘query_chr’ (the query chromosome supplied to ‘chr’), ‘query_start’ (the query start coordinate supplied to ‘start’), ‘query_end’ (the query end coordinate supplied to ‘end’), ‘rsid1’ (variant one of two in the pair), ‘rsid2’ (variant two of two in the pair), ‘r2’ (r-squared statistic), ‘d_prime’ (D′ statistic), ‘population_name’ (the population supplied to ‘pop’). | This function takes genomic coordinates (chromosome, start and end) and retrieves LD metrics (D′ and R²) for all rsIDs within the defined region. |
ensemblQueryLDwithSNPregionDataframe | Region [5] | in.table, pop, cores | A data frame with eight columns: ‘query_chr’ (the query chromosome supplied to ‘chr’), ‘query_start’ (the query start coordinate supplied to ‘start’), ‘query_end’ (the query end coordinate supplied to ‘end’), ‘rsid1’ (variant one of two in the pair), ‘rsid2’ (variant two of two in the pair), ‘r2’ (r-squared statistic), ‘d_prime’ (D′ statistic), ‘population_name’ (the population supplied to ‘pop’). | This function takes a data frame containing genomic coordinate(s) and retrieves LD metrics (D′ and R²) for all rsIDs within the defined region(s). |
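
To make the argument lists in the table concrete, the following is a minimal sketch of how the single-query and data frame variants of the window query might be called. The function and argument names are those listed in the table. The rsIDs, the thresholds, the population string, the `rsid` column name in the input table and the assumption that `window.size` is given in kilobases are illustrative and should be checked against the package documentation.

```r
# Hedged sketch: function and argument names come from the table above;
# all input values and the 'rsid' column name are illustrative assumptions.

# Population to query (assumed to be among those returned by ensemblQueryGetPops).
query_pop <- "1000GENOMES:phase_3:EUR"

# Input table for the data frame variant; the column name 'rsid' is an assumption.
query_table <- data.frame(
  rsid = c("rs3851179", "rs7153434"),
  stringsAsFactors = FALSE
)

# The queries need the package to be installed and network access to Ensembl,
# so they are only attempted when the package is available.
if (requireNamespace("ensemblQueryR", quietly = TRUE)) {

  # pingEnsembl is described as returning 1 when the server is up.
  if (isTRUE(ensemblQueryR::pingEnsembl() == 1)) {

    # Single query: variants in LD with one rsID inside a genomic window.
    single_res <- ensemblQueryR::ensemblQueryLDwithSNPwindow(
      rsid = query_table$rsid[1],
      r2 = 0.8,            # minimum r-squared (illustrative threshold)
      d.prime = 0.8,       # minimum D' (illustrative threshold)
      window.size = 500,   # assumed to be in kilobases
      pop = query_pop
    )

    # Data frame variant: the same query for every rsID in the input table,
    # spread over two cores.
    batch_res <- ensemblQueryR::ensemblQueryLDwithSNPwindowDataframe(
      in.table = query_table,
      r2 = 0.8,
      d.prime = 0.8,
      window.size = 500,
      pop = query_pop,
      cores = 2
    )
  }
}
```

According to the table, both calls should return a data frame with the columns ‘query’, ‘snp_in_ld’, ‘r2’, ‘d_prime’ and ‘population_name’. The pair and region functions follow the same pattern with their own arguments.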