Python module
Overview
BibTeX
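- GooseBib.bibtex.clean – Clean a BibTeX database.
- GooseBib.bibtex.select – Remove unnecessary fields from a BibTeX database.
- GooseBib.bibtex.unique – Merge items that have the same key in a BibTeX database.
- GooseBib.bibtex.get_identifiers – Get an entry's digital identifiers.
- GooseBib.bibtex.format_journal_arxiv – Format the journal entry for arXiv preprints.
- GooseBib.bibtex.dbsearch_arxiv – Check online databases (can be slow!).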
Automatic formatting
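- GooseBib.reformat.remove_wrapping_braces – Remove wrapping "{...}".
- GooseBib.reformat.abbreviate_firstname – Abbreviate first name(s) to initials.
- GooseBib.reformat.name2key – Return last name as 'citation key'.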
Automatic information extraction
- GooseBib.recognise.doi – Try to match a doi, return the first match.
- GooseBib.recognise.arxivid – Try to match an arxiv-id, return the first match.
Official journal names
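- GooseBib.journals.read – Load a journal-database from a YAML-file (see GooseBib.journals).
- GooseBib.journals.load – Load database(s) from default locations.
- GooseBib.journals.get_configdir – Return the config directory.
- GooseBib.journals.update_default – Update the default databases shipped with GooseBib.
- GooseBib.journals.generate_default – Generate (an up-to-date) version of one of the default databases shipped in GooseBib.
- GooseBib.journals.download_from_jabref – Generate a database from JabRef.
- GooseBib.journals.dump – Dump database to YAML-file (see GooseBib.journals).
- GooseBib.journals.Journal – Simple class to store journal info.
- GooseBib.journals.JournalList – Store journal database as list of journals, which allows efficient handling.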
GooseBib.bibtex
For BibTeX files:
Automatic formatting.
Check if up-to-date.
Compare.
- GooseBib.bibtex.GbibClean(cli_args: list[str] = None)
Command-line tool to clean a BibTeX database, see
--help.
- GooseBib.bibtex.GbibDiscover()
Command-line tool to compare a BibTeX database against online databases, see --help.
- GooseBib.bibtex.GbibShowAuthorRename()
Show author rename if GbibClean is applied, see --help.
- class GooseBib.bibtex.MyBibTexParser(data=None, **args)
Overload of bibtexparser.bparser.BibTexParser, adding an extra internal field "DISPLAY_ORDER" to preserve the order of each item.
- parse(bibtex_str, *args, **kwargs)
Parse a BibTeX string into an object.
- Parameters:
bibtex_str (str or unicode) – BibTeX string.
partial (boolean) – If True, print errors only on parsing failures. If False, an exception is raised.
- Returns:
bibliographic database
- Return type:
BibDatabase
- class GooseBib.bibtex.MyBibTexWriter(*args, **kwargs)
Overload of bibtexparser.bwriter.BibTexWriter, acting on an extra internal field "DISPLAY_ORDER" to preserve the order of each item. In addition, there is an extra parameter sort_entries = False that controls whether entries are sorted by citation-key.
- write(*args, **kwargs)
Converts a bibliographic database to a BibTeX-formatted string.
- Parameters:
bib_database (BibDatabase) – bibliographic database to be converted to a BibTeX string
- Returns:
BibTeX-formatted string
- Return type:
str or unicode
- GooseBib.bibtex.abbreviate_journal(data: list[dict], journal_type: str = 'abbreviation', journal_database: list[str] = ['pnas', 'physics', 'mechanics']) -> list[dict]
- GooseBib.bibtex.abbreviate_journal(data: str, *args, **kwargs) -> BibDatabase
- GooseBib.bibtex.abbreviate_journal(data: str, *args, **kwargs)
- GooseBib.bibtex.abbreviate_journal(data: IOBase, *args, **kwargs)
Abbreviate journals based on a standard library.
- Parameters:
data – The BibTeX database.
journal_type – Rename journal to its "title", "abbreviation", or "acronym".
journal_database – Database(s) with official journal names/abbreviations/acronyms to use.
- GooseBib.bibtex.clean(data: list[dict], sep_name: str = '', sep_journal: str = '', title: bool = True, protect_math: bool = True, rm_unicode: bool = True, no_abbreviate: list[str] = [], select_fields: bool = True) -> list[dict]
- GooseBib.bibtex.clean(data: str, *args, **kwargs) -> BibDatabase
- GooseBib.bibtex.clean(data: str, *args, **kwargs) -> str
- GooseBib.bibtex.clean(data: IOBase, *args, **kwargs) -> str
Clean a BibTeX database.
Remove unnecessary fields (see GooseBib.bibtex.select()).
Unify the formatting of authors (see GooseBib.reformat.abbreviate_firstname()).
Ensure proper math formatting (see GooseBib.reformat.protect_math()).
Convert unicode to TeX (see GooseBib.reformat.rm_unicode()).
Fill in a digital identifier if it is not present but can be recognised from a different field (see GooseBib.bibtex.get_identifiers()).
- Parameters:
data – The BibTeX database.
sep_name – Separator for name initials (e.g. "", " ").
sep_journal – Separator for journal abbreviations (e.g. "", " ").
title – Include title.
protect_math – Apply fix in GooseBib.reformat.protect_math().
rm_unicode – Apply fix in GooseBib.reformat.rm_unicode().
no_abbreviate – List of entries for which to skip author abbreviation.
select_fields – Apply selection() to the output.
- GooseBib.bibtex.clever_merge(data: list[dict], merge: bool = True) -> list[dict]
- GooseBib.bibtex.clever_merge(data: str, *args, **kwargs) -> BibDatabase
- GooseBib.bibtex.clever_merge(data: str, *args, **kwargs)
- GooseBib.bibtex.clever_merge(data: IOBase, *args, **kwargs)
Try to merge the same entries.
- Parameters:
data – The BibTeX database.
- GooseBib.bibtex.dbsearch_arxiv(data: list[dict], silent: bool = False) -> dict
- GooseBib.bibtex.dbsearch_arxiv(data: str, *args, **kwargs) -> BibDatabase
Check online databases (can be slow!).
- Parameters:
silent – Hide status bar.
- Returns:
Dictionary with discovered items.
- GooseBib.bibtex.format_journal_arxiv(data: list[dict], fmt: str, journal_database: list[str] = ['arxiv']) -> list[dict]
- GooseBib.bibtex.format_journal_arxiv(data: str, *args, **kwargs) -> BibDatabase
- GooseBib.bibtex.format_journal_arxiv(data: str, *args, **kwargs)
- GooseBib.bibtex.format_journal_arxiv(data: IOBase, *args, **kwargs)
Format the journal entry for arXiv preprints. Use "{}" in the formatter to include the arxivid.
- Parameters:
data – The BibTeX database.
fmt – Formatter, e.g. "Preprint" or "Preprint: arXiv {}".
journal_database – Database(s) with known arXiv variants.
- GooseBib.bibtex.get_identifiers(entry: dict) -> dict
Get entry’s digital identifiers. The following identifiers are returned (if found): "doi", "arxivid". Note that an arxivid given in the form of a doi is returned (only) as arxivid.
- Parameters:
entry – The bib-entry.
- Returns:
Dictionary with the found identifiers.
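The precedence rule above (an arXiv identifier written as a DOI counts only as arxivid) can be sketched as follows. This is a minimal illustration under stated assumptions, not GooseBib's implementation: the name find_identifiers and both regular expressions are hypothetical and cover only new-style arXiv identifiers.

```python
import re

# Assumed patterns, for illustration only; real identifiers have more variants.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")
ARXIV_RE = re.compile(r"arxiv(?:\.org/abs/|[:.]\s*)(\d{4}\.\d{4,5})", re.IGNORECASE)


def find_identifiers(entry: dict) -> dict:
    """Return the "doi" and/or "arxivid" found in the string fields of ``entry``."""
    found = {}
    for value in entry.values():
        if not isinstance(value, str):
            continue
        arxiv = ARXIV_RE.search(value)
        if arxiv:
            # A field holding an arXiv identifier (even in DOI form) yields only "arxivid".
            found.setdefault("arxivid", arxiv.group(1))
            continue
        doi = DOI_RE.search(value)
        if doi:
            found.setdefault("doi", doi.group(0))
    return found


# Made-up identifiers, used only to exercise the sketch.
print(find_identifiers({"doi": "10.48550/arXiv.1234.56789"}))
print(find_identifiers({"url": "https://doi.org/10.1000/xyz123"}))
```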
- GooseBib.bibtex.manual_merge(data: list[dict], keys: list[Tuple[str, str]]) -> Tuple[list[dict], dict]
- GooseBib.bibtex.manual_merge(data: str, *args, **kwargs) -> Tuple[BibDatabase, dict]
- GooseBib.bibtex.manual_merge(data: str, *args, **kwargs) -> Tuple[str, dict]
- GooseBib.bibtex.manual_merge(data: IOBase, *args, **kwargs) -> Tuple[str, dict]
Merge items.
- Parameters:
data – The BibTeX database.
keys – List of key pairs to merge (for each pair, key[1] is merged into key[0]).
- Returns:
The BibTeX database. A dictionary mapping the new keys to the old keys.
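As a rough sketch of what merging key[1] into key[0] could look like on a list of entry dictionaries: the helper merge_pairs is hypothetical, it assumes each entry stores its citation key under "ID" (as bibtexparser entries do), and the shape of the returned mapping (new key to list of old keys) is an assumption rather than the library's documented format.

```python
def merge_pairs(data: list[dict], keys: list[tuple[str, str]]) -> tuple[list[dict], dict]:
    """Merge entry ``old`` into entry ``new`` for every ``(new, old)`` pair in ``keys``.

    Fields already present in the kept entry win; the merged-away entry is dropped.
    """
    by_id = {entry["ID"]: dict(entry) for entry in data}
    renamed = {}
    for new, old in keys:
        removed = by_id.pop(old)
        for field, value in removed.items():
            # Only add fields that the kept entry does not have yet.
            by_id[new].setdefault(field, value)
        renamed.setdefault(new, []).append(old)
    return list(by_id.values()), renamed


# Made-up entries, used only to exercise the sketch.
data = [
    {"ID": "a", "title": "T"},
    {"ID": "b", "title": "Other", "doi": "10.1000/xyz123"},
]
merged, renamed = merge_pairs(data, [("a", "b")])
print(merged)
print(renamed)
```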
- GooseBib.bibtex.parse(bibtex_str: str, aggresive: bool = False) -> str
Parse a BibTeX string once.
- Parameters:
aggresive – Use aggressive interpretation strategy.
- GooseBib.bibtex.read_display_order(bibtex_str: str, tabsize: int = 2) -> Tuple[dict, int]
Read order of fields of all entries.
- Parameters:
bibtex_str – A BibTeX ‘file’.
tabsize – Replace each tab by this number of spaces.
- Returns:
A dictionary with a list of fields per key. The typical indentation.
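To illustrate the first return value (a list of fields per key), here is a minimal regex-based sketch. It is not the library's parser: the function field_order is hypothetical, it assumes one "field = value" pair per line, and it does not compute the typical indentation.

```python
import re

# Assumed patterns: "@type{key," starts an entry, "name =" at line start is a field.
ENTRY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
FIELD_RE = re.compile(r"^\s*([A-Za-z][\w-]*)\s*=", re.MULTILINE)


def field_order(bibtex_str: str) -> dict:
    """Map each citation key to its field names, in the order they appear."""
    order = {}
    starts = list(ENTRY_RE.finditer(bibtex_str))
    for i, match in enumerate(starts):
        # The body of an entry runs until the next entry starts (or the string ends).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bibtex_str)
        order[match.group(1)] = FIELD_RE.findall(bibtex_str[match.end():end])
    return order


text = """@article{key1,
  title = {A},
  author = {B},
}
@book{key2,
  year = {2000},
  title = {C},
}
"""
print(field_order(text))
```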
- GooseBib.bibtex.select(data: list[dict], fields: dict[list[str]] | list[str] = None, ensure_link: bool = True, remove_url: bool = True) -> list[dict]
- GooseBib.bibtex.select(data: BibDatabase, *args, **kwargs) -> BibDatabase
- GooseBib.bibtex.select(data: str, *args, **kwargs) -> str
- GooseBib.bibtex.select(data: IOBase, *args, **kwargs)
Remove unnecessary fields from a BibTeX database.
- Parameters:
data – The BibTeX database.
fields – Fields to keep per entry type (default from selection()). If a list is specified, all entry types are treated the same.
ensure_link – Add URL to fields if no doi, arxivid, or eprint is present.
remove_url – Remove URL when either a doi, arxivid, or eprint is present.
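The interplay of ensure_link and remove_url can be sketched for a single entry as below. This is an illustration under assumptions, not the library's code: the helper select_fields is hypothetical, it takes a plain list of fields, and it always keeps the bibtexparser bookkeeping keys "ID" and "ENTRYTYPE".

```python
def select_fields(entry: dict, fields: list[str],
                  ensure_link: bool = True, remove_url: bool = True) -> dict:
    """Keep only ``fields``, deciding on "url" from the identifiers that are present."""
    keep = set(fields) | {"ID", "ENTRYTYPE"}
    has_identifier = any(entry.get(name) for name in ("doi", "arxivid", "eprint"))
    if ensure_link and not has_identifier:
        keep.add("url")  # no identifier: the URL is the only link left
    if remove_url and has_identifier:
        keep.discard("url")  # an identifier already provides the link
    return {name: value for name, value in entry.items() if name in keep}


# Made-up entries, used only to exercise the sketch.
with_doi = {"ID": "a", "ENTRYTYPE": "article", "title": "T", "abstract": "text",
            "url": "https://example.org", "doi": "10.1000/xyz123"}
without_doi = {"ID": "b", "ENTRYTYPE": "misc", "title": "T",
               "url": "https://example.org", "note": "x"}
print(select_fields(with_doi, ["title", "doi", "url"]))
print(select_fields(without_doi, ["title"]))
```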
- GooseBib.bibtex.selection(use_bibtexparser: bool = False) -> dict
List of fields to keep in a BibTeX file to get a useful list of references: fields that are not in this selection may be useful for a database, but might only cloud BibTeX output.
- Parameters:
use_bibtexparser – Add bibtexparser specific fields to select (not part of BibTeX output).
- GooseBib.bibtex.unique(data: list[dict], merge: bool = True) -> list[dict]
- GooseBib.bibtex.unique(data: BibDatabase, *args, **kwargs) -> BibDatabase
- GooseBib.bibtex.unique(data: str, *args, **kwargs) -> str
Merge items that have the same key in the BibTeX database.
- Parameters:
data – The BibTeX database.
merge – Add fields from duplicate entries to the first entry.
- Returns:
The BibTeX database.
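A minimal sketch of the merge option, assuming entries are dictionaries with the citation key under "ID"; the helper merge_duplicate_keys is hypothetical and only mirrors the behaviour described above (fields from later duplicates are added to the first entry, never overwriting it).

```python
def merge_duplicate_keys(data: list[dict], merge: bool = True) -> list[dict]:
    """Keep the first entry per key; optionally add missing fields from later duplicates."""
    first = {}
    for entry in data:
        key = entry["ID"]
        if key not in first:
            first[key] = dict(entry)
        elif merge:
            for field, value in entry.items():
                # Existing fields of the first entry are left untouched.
                first[key].setdefault(field, value)
    return list(first.values())


# Made-up entries, used only to exercise the sketch.
entries = [
    {"ID": "a", "title": "T"},
    {"ID": "a", "year": "2000", "title": "X"},
    {"ID": "b"},
]
print(merge_duplicate_keys(entries))
print(merge_duplicate_keys(entries, merge=False))
```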
- GooseBib.bibtex.unique_keys(data: list[dict]) -> Tuple[list[dict], dict]
- GooseBib.bibtex.unique_keys(data: BibDatabase, *args, **kwargs) -> Tuple[BibDatabase, dict]
- GooseBib.bibtex.unique_keys(data: str, *args, **kwargs) -> Tuple[str, dict]
Rename keys that occur more than once in the BibTeX database.
- Parameters:
data – The BibTeX database.
- Returns:
The BibTeX database. A dictionary mapping the new keys to the old keys.
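One possible way to rename repeated keys and report a new-to-old mapping is sketched below. The helper rename_duplicate_keys and its "_2", "_3" suffix scheme are assumptions made for illustration; the naming scheme GooseBib actually uses is not stated here.

```python
def rename_duplicate_keys(data: list[dict]) -> tuple[list[dict], dict]:
    """Give every repeated "ID" a fresh key and return ``{new_key: old_key}``."""
    taken = {entry["ID"] for entry in data}  # keys that may not be reused
    used = set()  # keys already assigned in the output
    renamed = {}
    out = []
    for entry in data:
        key = entry["ID"]
        if key in used:
            suffix = 2
            while f"{key}_{suffix}" in taken:
                suffix += 1
            new_key = f"{key}_{suffix}"
            taken.add(new_key)
            renamed[new_key] = key
            key = new_key
        used.add(key)
        out.append({**entry, "ID": key})
    return out, renamed


entries = [{"ID": "a"}, {"ID": "b"}, {"ID": "a"}, {"ID": "a"}]
result, mapping = rename_duplicate_keys(entries)
print([entry["ID"] for entry in result])
print(mapping)
```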
- GooseBib.bibtex.yaml_dump(filename, data, force=False)
Dump data to YAML file.
- Parameters:
filename (str) – The output filename.
data (list, dict) – The data to dump.
force (bool, optional) – Do not prompt to overwrite file.
GooseBib.reformat
Automatic formatting.
- GooseBib.reformat.abbreviate_firstname(name: str, sep: str = ' ') -> str
Abbreviate first name(s) to initials.
For example:
de Geus, Thomas Willem Jan -> de Geus, T. W. J.
- Parameters:
name – The name formatted as “Lastname, firstname secondname …”.
sep – Separator to place between initials.
- Returns:
Formatted name.
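The transformation in the example above can be approximated in a few lines. This sketch is not the library's implementation: the function initials is hypothetical, and it ignores hyphenated first names and names that are already abbreviated.

```python
def initials(name: str, sep: str = " ") -> str:
    """Abbreviate the first name(s) in "Lastname, firstname secondname" to initials."""
    if "," not in name:
        return name  # no first names to abbreviate
    last, first = (part.strip() for part in name.split(",", 1))
    abbreviated = sep.join(f"{word[0]}." for word in first.split())
    return f"{last}, {abbreviated}"


print(initials("de Geus, Thomas Willem Jan"))
print(initials("de Geus, Thomas Willem Jan", sep=""))
```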
- GooseBib.reformat.autoformat_names(names: str, sep: str = ' ') -> str
Automatically format names. E.g.:
de Geus, Thomas Willem Jan and Wyart, Matthieu -> de Geus, T.W.J. and Wyart, M.
- Parameters:
names – Names formatted as “lastname, firstname and lastname, firstname …”.
sep – Separator to place between initials.
- Returns:
Formatted names.
- GooseBib.reformat.name2key(name: str) -> str
Return last name as ‘citation key’.
This returns the last name:
Without accents.
Without spaces.
Starting with a capital letter.
- Parameters:
name – The name formatted as “Lastname, firstname secondname …”.
- Returns:
Formatted name.
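The three properties listed above (no accents, no spaces, leading capital) can be sketched with the standard library. The function last_name_key is hypothetical, and letters that have no Unicode decomposition (such as "Ø") are left unchanged by this approach.

```python
import unicodedata


def last_name_key(name: str) -> str:
    """Return the last name without accents or spaces, starting with a capital."""
    last = name.split(",", 1)[0].strip()
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", last)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    key = "".join(stripped.split())
    return key[:1].upper() + key[1:]


print(last_name_key("de Geus, Thomas Willem Jan"))
print(last_name_key("Müller, Hans"))
```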
- GooseBib.reformat.number_range(string: str) -> str
Format page range. This replaces “-” with “–”.
- Parameters:
string – A string.
- Returns:
The reformatted string.
- GooseBib.reformat.protect_math(text: str) -> str
Protect math mode.
- Parameters:
text – Some text.
- Returns:
Formatted text.
- GooseBib.reformat.remove_wrapping_braces(string: str) -> str
Remove wrapping “{…}”.
- Parameters:
string – A string.
- Returns:
The reformatted string.
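A subtlety when removing wrapping braces is that a string such as "{a} and {b}" starts with "{" and ends with "}" without being wrapped as a whole. The sketch below checks that the first brace is closed by the very last character. The function strip_outer_braces is hypothetical, it removes nested wrappers repeatedly (a design choice of this sketch), and it does not treat escaped braces specially.

```python
def strip_outer_braces(string: str) -> str:
    """Remove "{...}" as long as the first "{" is closed by the last "}"."""
    while len(string) >= 2 and string[0] == "{" and string[-1] == "}":
        depth = 0
        closing = -1
        for i, ch in enumerate(string):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            if depth == 0:
                closing = i  # position where the first "{" is closed
                break
        if closing != len(string) - 1:
            break  # e.g. "{a} and {b}": the braces do not wrap everything
        string = string[1:-1]
    return string


print(strip_outer_braces("{{Some Title}}"))
print(strip_outer_braces("{a} and {b}"))
```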
- GooseBib.reformat.rm_accents(text: str) -> str
Remove accents.
- Parameters:
text – Some text.
- Returns:
Formatted text.
- GooseBib.reformat.rm_unicode(text: str) -> str
Remove unicode.
- Parameters:
text – Some text.
- Returns:
Formatted text.
GooseBib.recognise
- GooseBib.recognise.arxivid() -> str
- GooseBib.recognise.arxivid(*args: str) -> str
- GooseBib.recognise.arxivid(entry: dict) -> str
Try to match an arxiv-id, return the first match.
- Parameters:
args – Arguments to check.
- Returns:
The first match (stripped for url etc.).
- GooseBib.recognise.doi() -> str
- GooseBib.recognise.doi(*args: str) -> str
- GooseBib.recognise.doi(entry: dict) -> str
Try to match a doi, return the first match.
- Parameters:
args – Arguments to check.
- Returns:
The first match (stripped for url etc.).
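"Return the first match, stripped of the URL" can be illustrated with a single regular expression tried on each argument in turn. This is a sketch only: the function first_doi and its pattern are assumptions, and returning None when nothing matches is a choice of this example rather than documented behaviour.

```python
import re

# Assumed pattern: "10.", a registrant code, a slash, then the suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def first_doi(*args: str):
    """Return the first DOI found in ``args`` (without any URL prefix), else None."""
    for arg in args:
        match = DOI_PATTERN.search(arg)
        if match:
            # Drop punctuation that typically trails a DOI in running text.
            return match.group(0).rstrip(".,;")
    return None


# Made-up DOI, used only to exercise the sketch.
print(first_doi("no identifier here", "https://doi.org/10.1000/xyz123"))
```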
GooseBib.journals
Construct/apply journal database.
In GooseBib, a journal database is stored as a YAML-file, for example:
- abbreviation: Proc. Natl. Acad. Sci.
  acronym: PNAS
  name: Proceedings of the National Academy of Sciences
  variations:
  - Proc. Nat. Acad. Sci.
- abbreviation: Phys. Rev. Lett.
  acronym: PRL
  name: Physical Review Letters
Note that only the name is required; the abbreviation, acronym, and variations are optional.
- class GooseBib.journals.Journal(name: str = None, abbreviation: str = None, acronym: str = None, variations: list[str] = None, index: list[int] = None, abbreviation_is_acronym: bool = False)
Simple class to store journal info.
- Parameters:
name – Journal’s name.
abbreviation – Abbreviation of the journal’s name (optional).
acronym – Acronym of the journal’s name (optional).
variations – Known variations used for the journal’s name, abbreviation, etc. (optional).
index – For internal use only. Construction can be simplified by specifying name, abbreviation, acronym, and variations as a single list using the variations option (name, abbreviation, and acronym should be left blank in that case). index then indicates the indices in this list corresponding to [name, abbreviation, acronym] (the same index may be used multiple times if there is no abbreviation or acronym and, for example, the name is used instead).
abbreviation_is_acronym – Use abbreviation as acronym if no acronym is specified.
- add_variation(arg: str)
Add a variation (does not change the name, abbreviation, or acronym).
- Parameters:
arg – Name.
- add_variations(arg: list[str])
Add a list of variations (does not change the name, abbreviation, or acronym).
- Parameters:
arg – Names.
- set_abbreviation(arg: str, also_acronym: bool = False)
(Over)write the abbreviation of the journal’s name.
- Parameters:
arg – Name.
also_acronym – Use also as acronym.
- set_acronym(arg: str)
(Over)write the acronym of the journal’s name.
- Parameters:
arg – Name.
- set_name(arg)
(Over)write the journal’s name.
- Parameters:
arg – Name.
- unique()
In place operation. Removes duplicates from list of stored name, abbreviation, acronym, variations. Does not change the output in any way.
- class GooseBib.journals.JournalList(data: dict[Journal] | list[Journal] = None)
Store journal database as list of journals, which allows efficient handling.
- Parameters:
data – List of journals. A dict input is interpreted as [value for (key, value) in sorted(data.items())].
- map2abbreviation(journals: list[str], case_sensitive: bool = False) -> list[str]
Map a list of journal names to their abbreviations.
- Parameters:
journals – List to map.
case_sensitive – Keep case during look-up.
- Returns:
Input list with abbreviation replaced where a positive match was found.
- map2acronym(journals: list[str], case_sensitive: bool = False) -> list[str]
Map a list of journal names to their acronyms.
- Parameters:
journals – List to map.
case_sensitive – Keep case during look-up.
- Returns:
Input list with acronym replaced where a positive match was found.
- map2name(journals: list[str], case_sensitive: bool = False) -> list[str]
Map a list of journal names to their official names.
- Parameters:
journals – List to map.
case_sensitive – Keep case during look-up.
- Returns:
Input list with official name replaced where a positive match was found.
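The three mapping methods share one idea: every known spelling of a journal points to the requested form, and unmatched input is returned unchanged. A minimal sketch with plain dictionaries follows. The helpers build_lookup and map_names are hypothetical, the database is the YAML example from this section written as Python data, and falling back to the name when the requested form is missing is an assumption.

```python
def build_lookup(journals: list[dict], target: str, case_sensitive: bool = False) -> dict:
    """Map every known spelling of a journal to its ``target`` form."""
    lookup = {}
    for journal in journals:
        aliases = [journal["name"], journal.get("abbreviation"), journal.get("acronym")]
        aliases += journal.get("variations", [])
        for alias in filter(None, aliases):
            key = alias if case_sensitive else alias.lower()
            # setdefault: in case of duplicates the first journal found wins.
            lookup.setdefault(key, journal.get(target, journal["name"]))
    return lookup


def map_names(names: list[str], journals: list[dict],
              target: str = "abbreviation", case_sensitive: bool = False) -> list[str]:
    """Replace each name by its ``target`` form where a match is found."""
    lookup = build_lookup(journals, target, case_sensitive)
    return [lookup.get(n if case_sensitive else n.lower(), n) for n in names]


database = [
    {"name": "Proceedings of the National Academy of Sciences",
     "abbreviation": "Proc. Natl. Acad. Sci.", "acronym": "PNAS",
     "variations": ["Proc. Nat. Acad. Sci."]},
    {"name": "Physical Review Letters",
     "abbreviation": "Phys. Rev. Lett.", "acronym": "PRL"},
]
print(map_names(["PRL", "Unknown Journal"], database))
```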
- tolist() -> list[dict]
Return as list of dictionaries. Same as:
    ret = []
    for i in data:
        ret += [dict(i)]
- unique(force_first=True) -> bool
Merge journals that have a common entry. Note that this applies changes in-place.
- Parameters:
force_first – Add the second, third, … duplicates only as name variations, do not change name, abbreviation, and acronym of the first duplicate. If False, the name, abbreviation, and acronym of all duplicates are considered if they are not present in the first duplicate.
- Returns:
True if the list was unique, i.e. if no changes were applied.
- GooseBib.journals.download_from_jabref(*domains) -> dict[Journal]
Generate a database from JabRef.
- Parameters:
domains – Domain(s) to include in the database. Choose from: "acs", "ams", "annee-philologique", "dainst", "entrez", "general", "geology_physics", "geology_physics_variations", "ieee", "ieee_strings", "lifescience", "mathematics", "mechanical", "medicus", "meteorology", "sociology", "webofscience-dots", "webofscience".
- Returns:
A dictionary of Journal. The keys of the dictionary are the journal’s names extracted from JabRef’s database.
- GooseBib.journals.dump(filepath: str, data: dict[Journal] | list[Journal] | JournalList, force: bool = False)
Dump database to YAML-file (see GooseBib.journals).
- Parameters:
filepath – Filename.
data – The database.
force – Do not prompt to overwrite file.
- GooseBib.journals.generate_default(domain: str) -> dict[Journal]
Generate (an up-to-date) version of one of the default databases shipped in GooseBib.
- Parameters:
domain – Domain. Choose from: "physics", "mechanics", "PNAS", "PNAS-USA".
- Returns:
A dictionary of Journal. The keys of the dictionary are the journal’s names.
- GooseBib.journals.get_configdir() -> str
Return the config directory.
- GooseBib.journals.load(*args: str) -> JournalList
Load database(s) from default locations. Note that the order matters: in case of duplicates, the first entry found determines the title, abbreviation, and acronym.
To add custom databases, store a YAML-file to:
    import os
    import GooseBib

    dirname = GooseBib.get_configdir()
    stylename = "mystyle"
    filepath = os.path.join(dirname, f"{stylename}.yaml")
See GooseBib.journals for structure of the YAML-file.
Note
Files stored in get_configdir() are prioritised over default files shipped with the library.
- Parameters:
args – "physics", "mechanics", "PNAS", "PNAS-USA", …
- Returns:
The loaded database (JournalList).
- GooseBib.journals.read(filepath: str, abbreviation_is_acronym: bool = True) -> JournalList
Load a journal-database from a YAML-file (see GooseBib.journals).
Tip
To construct a JournalList based on several YAML-files, proceed as follows:
    db = GooseBib.journals.read("/path/to/first/name.yaml")
    db += GooseBib.journals.read("/path/to/seconds/name.yaml")
    db += GooseBib.journals.read("/path/to/third/name.yaml")
    # ...
Note that the order matters: in case of duplicates, the first entry found determines the title, abbreviation, and acronym.
- Parameters:
filepath – File-path.
abbreviation_is_acronym – Use abbreviation for missing acronym (otherwise title is used).
- Returns:
The loaded database (JournalList).
- GooseBib.journals.update_default()
Update the default databases shipped with GooseBib. This updates the YAML-files (see GooseBib.journals) in the library directory.
Tip
To update the YAML-files in the repository, simply run this file from the repository, as its main is adapted for this.