Python module

Overview

BibTeX

`GooseBib.bibtex.clean`(...)	Clean a BibTeX database.
`GooseBib.bibtex.select`(...)	Remove unnecessary fields from BibTeX database.
`GooseBib.bibtex.unique`(...)	Merge items that have the same keys from BibTeX database.
`GooseBib.bibtex.get_identifiers`(entry)	Get entry's digital identifiers.
`GooseBib.bibtex.format_journal_arxiv`(...)	Format the journal entry for arXiv preprints.
`GooseBib.bibtex.dbsearch_arxiv`()	Check online databases (can be slow!).

Automatic formatting

`GooseBib.reformat.remove_wrapping_braces`(string)	Remove wrapping "{...}".
`GooseBib.reformat.abbreviate_firstname`(name)	Abbreviate first name(s) to initials.
`GooseBib.reformat.name2key`(name)	Return last name as 'citation key'.

Automatic information extraction

`GooseBib.recognise.doi`(*args)	Try to match a doi, return the first match.
`GooseBib.recognise.arxivid`(*args)	Try to match an arxiv-id, return the first match.

Official journal names

`GooseBib.journals.read`(filepath[, ...])	Load a journal-database from a YAML-file (`GooseBib.journals`).
`GooseBib.journals.load`(*args)	Load database(s) from default locations.
`GooseBib.journals.get_configdir`()	Return the config directory.
`GooseBib.journals.update_default`()	Update the default databases shipped with GooseBib.
`GooseBib.journals.generate_default`(domain)	Generate (an up-to-date) version of one of the default databases shipped in GooseBib.
`GooseBib.journals.download_from_jabref`(*domains)	Generate a database from JabRef.
`GooseBib.journals.dump`(filepath, data[, force])	Dump database to YAML-file (see `GooseBib.journals`).
`GooseBib.journals.Journal`([name, ...])	Simple class to store journal info.
`GooseBib.journals.JournalList`([data])	Store journal database as list of journals, which allows efficient handling.

GooseBib.bibtex

For BibTeX files:

Automatic formatting.
Check if up-to-date.
Compare.

GooseBib.bibtex.GbibClean(cli_args: list[str] = None): Command-line tool to clean a BibTeX database, see --help.

GooseBib.bibtex.GbibDiscover(): Command-line tool to compare a BibTeX database for online databases, see --help.

GooseBib.bibtex.GbibShowAuthorRename(): Show author rename if GbibClean is applied, see --help.

class GooseBib.bibtex.MyBibTexParser(data=None, **args)

Overload of bibtexparser.bparser.BibTexParser adding an extra internal field "DISPLAY_ORDER" to preserve the order of the fields of each item.

parse(bibtex_str, *args, **kwargs)

Parse a BibTeX string into an object

Parameters:

bibtex_str (str or unicode) – BibTeX string
partial (boolean) – If True, print errors only on parsing failures. If False, an exception is raised.

Returns:

bibliographic database

Return type:

BibDatabase

class GooseBib.bibtex.MyBibTexWriter(*args, **kwargs)

Overload of bibtexparser.bwriter.BibTexWriter acting on an extra internal field "DISPLAY_ORDER" to preserve the order of the fields of each item. In addition, there is an extra parameter sort_entries = False that controls whether entries are sorted based on the citation-key.

write(*args, **kwargs)

Converts a bibliographic database to a BibTeX-formatted string.

Parameters:: bib_database (BibDatabase) – bibliographic database to be converted to a BibTeX string
Returns:: BibTeX-formatted string
Return type:: str or unicode

GooseBib.bibtex.abbreviate_journal(data: list[dict], journal_type: str = 'abbreviation', journal_database: list[str] = ['pnas', 'physics', 'mechanics']) → list[dict]

GooseBib.bibtex.abbreviate_journal(data: BibDatabase, *args, **kwargs) → BibDatabase

GooseBib.bibtex.abbreviate_journal(data: str, *args, **kwargs)

GooseBib.bibtex.abbreviate_journal(data: IOBase, *args, **kwargs)

Abbreviate journals based on a standard library.

Parameters:

data – The BibTeX database.
journal_type – Rename journal to its "title", "abbreviation", or "acronym".
journal_database – Database(s) with official journal names/abbreviations/acronyms to use.

GooseBib.bibtex.clean(data: list[dict], sep_name: str = '', sep_journal: str = '', title: bool = True, protect_math: bool = True, rm_unicode: bool = True, no_abbreviate: list[str] = [], select_fields: bool = True) → list[dict]

GooseBib.bibtex.clean(data: BibDatabase, *args, **kwargs) → BibDatabase

GooseBib.bibtex.clean(data: str, *args, **kwargs) → str

GooseBib.bibtex.clean(data: IOBase, *args, **kwargs) → str

Clean a BibTeX database.

Remove unnecessary fields (see GooseBib.bibtex.select()).
Unify the formatting of authors (see GooseBib.reformat.abbreviate_firstname()).
Ensure proper math formatting (see GooseBib.reformat.protect_math()).
Convert unicode to TeX (see GooseBib.reformat.rm_unicode()).
Fill in the digital identifier if it is not present but can be recognised from a different field (see GooseBib.bibtex.get_identifiers()).

Parameters:

data – The BibTeX database.
sep_name – Separator for name initials (e.g. "" or " ").
sep_journal – Separator for journal abbreviations (e.g. "" or " ").
title – Include title.
protect_math – Apply fix in GooseBib.reformat.protect_math().
rm_unicode – Apply fix in GooseBib.reformat.rm_unicode().
no_abbreviate – List of entries for which to skip author abbreviation.
select_fields – Apply select() to the output (see GooseBib.bibtex.select() and GooseBib.bibtex.selection()).

GooseBib.bibtex.clever_merge(data: list[dict], merge: bool = True) → list[dict]

GooseBib.bibtex.clever_merge(data: BibDatabase, *args, **kwargs) → BibDatabase

GooseBib.bibtex.clever_merge(data: str, *args, **kwargs)

GooseBib.bibtex.clever_merge(data: IOBase, *args, **kwargs)

Try to merge duplicate entries.

Parameters:: data – The BibTeX database.

GooseBib.bibtex.dbsearch_arxiv(data: list[dict], silent: bool = False) → dict

GooseBib.bibtex.dbsearch_arxiv(data: str, *args, **kwargs) → BibDatabase

Check online databases (can be slow!).

Parameters:: silent – Hide status bar.
Returns:: Dictionary with discovered items.

GooseBib.bibtex.format_journal_arxiv(data: list[dict], fmt: str, journal_database: list[str] = ['arxiv']) → list[dict]

GooseBib.bibtex.format_journal_arxiv(data: BibDatabase, *args, **kwargs) → BibDatabase

GooseBib.bibtex.format_journal_arxiv(data: str, *args, **kwargs)

GooseBib.bibtex.format_journal_arxiv(data: IOBase, *args, **kwargs)

Format the journal entry for arXiv preprints. Use "{}" in the formatter to include the arxivid.

Parameters:

data – The BibTeX database.
fmt – Formatter, e.g. "Preprint" or "Preprint: arXiv {}".
journal_database – Database(s) with known arXiv variants.
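The formatter is an ordinary Python format string in which "{}" is replaced by the arXiv identifier. The following is a minimal stand-alone sketch of that substitution, not GooseBib's implementation: it assumes bibtexparser-style dicts, stores the identifier in a hypothetical "arxivid" field, and replaces the "arxiv" journal database by a small hard-coded set of spellings.

```python
def format_journal_arxiv(entries: list[dict], fmt: str) -> list[dict]:
    """Hypothetical sketch: rewrite the journal field of arXiv preprints."""
    # Hypothetical stand-in for the "arxiv" journal database: known spellings.
    arxiv_variants = {"arxiv", "arxiv preprint", "arxiv e-prints"}
    out = []
    for entry in entries:
        entry = dict(entry)  # copy, so the input is left untouched
        journal = entry.get("journal", "").strip().lower()
        if journal in arxiv_variants and "arxivid" in entry:
            # "{}" in fmt receives the identifier; a formatter without "{}" is used as-is
            entry["journal"] = fmt.format(entry["arxivid"])
        out.append(entry)
    return out


entries = [{"ID": "Key2021", "journal": "arXiv preprint", "arxivid": "2101.00001"}]
print(format_journal_arxiv(entries, "Preprint: arXiv {}")[0]["journal"])
```

Because str.format ignores surplus arguments, a plain formatter such as "Preprint" works without special-casing.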

GooseBib.bibtex.get_identifiers(entry: dict) → dict

Get entry’s digital identifiers. The following identifiers are returned (if found):

"doi"
"arxivid". Note that an arXiv identifier stored as a doi is returned only as "arxivid".

Parameters:: entry – The bib-entry.
Returns:: Dictionary with the found identifiers.
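To illustrate the rule that an arXiv doi is reported only as "arxivid", here is a minimal stand-alone sketch. It is not GooseBib's implementation: it inspects only the "doi" and "eprint" fields (an assumption), and it relies on the prefix 10.48550/arXiv. that arXiv uses for the DOIs it registers.

```python
import re

# arXiv-registered DOIs look like 10.48550/arXiv.2101.00001
ARXIV_DOI = re.compile(r"^10\.48550/arxiv\.(.+)$", re.IGNORECASE)


def get_identifiers(entry: dict) -> dict:
    """Hypothetical sketch: collect "doi" and "arxivid" from a bib-entry."""
    ret = {}
    doi = entry.get("doi", "").strip()
    if doi:
        match = ARXIV_DOI.match(doi)
        if match:
            ret["arxivid"] = match.group(1)  # arXiv DOI: report only as arxivid
        else:
            ret["doi"] = doi
    # assumption: "eprint" holds an arXiv identifier
    if "arxivid" not in ret and entry.get("eprint"):
        ret["arxivid"] = entry["eprint"].strip()
    return ret
```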

GooseBib.bibtex.manual_merge(data: list[dict], keys: list[Tuple[str, str]]) → Tuple[list[dict], dict]

GooseBib.bibtex.manual_merge(data: BibDatabase, *args, **kwargs) → Tuple[BibDatabase, dict]

GooseBib.bibtex.manual_merge(data: str, *args, **kwargs) → Tuple[str, dict]

GooseBib.bibtex.manual_merge(data: IOBase, *args, **kwargs) → Tuple[str, dict]

Merge items.

Parameters:

data – The BibTeX database.
keys – List of keys for merge (key[1] merged into key[0]).

Returns:

The BibTeX database, and a dictionary mapping the new keys to the old keys.

GooseBib.bibtex.parse(bibtex_str: str, aggresive: bool = False) → str

Parse a BibTeX string once.

Parameters:: aggresive – Use aggressive interpretation strategy.

GooseBib.bibtex.read_display_order(bibtex_str: str, tabsize: int = 2) → Tuple[dict, int]

Read order of fields of all entries.

Parameters:

bibtex_str – A BibTeX ‘file’.
tabsize – Replace each tab ("\t") by this number of spaces.

Returns:

A dictionary with a list of fields per key, and the typical indentation.
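A minimal regex-based sketch of what this function computes, assuming each field starts on its own line as "field = value". It is an illustration only, not GooseBib's parser: it does not handle fields sharing a line or values whose continuation lines start with "word =".

```python
import re
from collections import Counter

# "@type{key," opens an entry; @string/@comment blocks without "key," are skipped.
ENTRY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
# "<spaces>field =" at the start of a line; the spaces give the indentation.
FIELD_RE = re.compile(r"^([ ]*)(\w+)\s*=", re.MULTILINE)


def read_display_order(bibtex_str: str, tabsize: int = 2) -> tuple[dict, int]:
    """Hypothetical sketch: field order per entry key, plus the most common indentation."""
    bibtex_str = bibtex_str.replace("\t", " " * tabsize)
    starts = list(ENTRY_RE.finditer(bibtex_str))
    order = {}
    indents = Counter()
    for i, start in enumerate(starts):
        # an entry's text runs until the next entry starts (or the end of the string)
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bibtex_str)
        fields = []
        for field in FIELD_RE.finditer(bibtex_str, start.end(), end):
            indents[len(field.group(1))] += 1
            fields.append(field.group(2).lower())
        order[start.group(1)] = fields
    indent = indents.most_common(1)[0][0] if indents else tabsize
    return order, indent
```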

GooseBib.bibtex.select(data: list[dict], fields: dict[list[str]] | list[str] = None, ensure_link: bool = True, remove_url: bool = True) → list[dict]

GooseBib.bibtex.select(data: BibDatabase, *args, **kwargs) → BibDatabase

GooseBib.bibtex.select(data: str, *args, **kwargs) → str

GooseBib.bibtex.select(data: IOBase, *args, **kwargs)

Remove unnecessary fields from BibTeX database.

Parameters:

data – The BibTeX database.
fields – Fields to keep per entry type (default from selection()). If a list is specified, all entry types are treated the same.
ensure_link – Add URL to fields if no doi, arxivid, or eprint is present.
remove_url – Remove URL when a doi, arxivid, or eprint is present.
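The interplay of fields, ensure_link, and remove_url can be sketched as below. This is a hypothetical stand-alone illustration on bibtexparser-style dicts (which carry the bookkeeping keys "ID" and "ENTRYTYPE"), not the library's code; the real default selection comes from selection().

```python
from __future__ import annotations

# Fields that count as a digital identifier (from the parameter descriptions above).
LINK_FIELDS = ("doi", "arxivid", "eprint")


def select(
    entries: list[dict],
    fields: dict[str, list[str]] | list[str],
    ensure_link: bool = True,
    remove_url: bool = True,
) -> list[dict]:
    """Hypothetical sketch: keep only the selected fields of each entry."""
    out = []
    for entry in entries:
        if isinstance(fields, dict):
            keep = set(fields.get(entry["ENTRYTYPE"], []))
        else:
            keep = set(fields)  # a plain list applies to every entry type
        has_identifier = any(entry.get(name) for name in LINK_FIELDS)
        if ensure_link and not has_identifier:
            keep.add("url")
        if remove_url and has_identifier:
            keep.discard("url")
        keep |= {"ID", "ENTRYTYPE"}  # bibtexparser bookkeeping keys are always kept
        out.append({key: value for key, value in entry.items() if key in keep})
    return out
```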

GooseBib.bibtex.selection(use_bibtexparser: bool = False) → dict

List of fields to keep in a BibTeX file to get a useful list of references: fields that are not in this selection may be useful for a database, but might only cloud BibTeX output.

Parameters:: use_bibtexparser – Add bibtexparser specific fields to select (not part of BibTeX output).

GooseBib.bibtex.unique(data: list[dict], merge: bool = True) → list[dict]

GooseBib.bibtex.unique(data: BibDatabase, *args, **kwargs) → BibDatabase

GooseBib.bibtex.unique(data: str, *args, **kwargs) → str

Merge items that have the same keys from BibTeX database.

Parameters:

data – The BibTeX database.
merge – Add fields from duplicate entries to the first entry.

Returns:

The BibTeX database.
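A minimal sketch of the merge behaviour described by the merge parameter, assuming bibtexparser-style dicts whose citation key is stored under "ID". It is an illustration, not GooseBib's implementation: the first entry of each key is kept, and later duplicates only fill in fields that the first entry lacks.

```python
def unique(entries: list[dict], merge: bool = True) -> list[dict]:
    """Hypothetical sketch: keep the first entry per key, optionally absorbing fields."""
    first = {}  # citation key -> the (copied) entry that is kept
    out = []
    for entry in entries:
        key = entry["ID"]
        if key not in first:
            first[key] = dict(entry)
            out.append(first[key])
        elif merge:
            for field, value in entry.items():
                # fill only fields missing in the first entry, never overwrite
                first[key].setdefault(field, value)
    return out
```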

GooseBib.bibtex.unique_keys(data: list[dict]) → Tuple[list[dict], dict]

GooseBib.bibtex.unique_keys(data: BibDatabase, *args, **kwargs) → Tuple[BibDatabase, dict]

GooseBib.bibtex.unique_keys(data: str, *args, **kwargs) → Tuple[str, dict]

Rename keys that occur more than once in the BibTeX database.

Parameters:: data – The BibTeX database.
Returns:: The BibTeX database, and a dictionary mapping the new keys to the old keys.

GooseBib.bibtex.yaml_dump(filename, data, force=False)

Dump data to YAML file.

Parameters:

filename (str) – The output filename.
data (list, dict) – The data to dump.
force (bool, optional) – Do not prompt to overwrite file.

GooseBib.reformat

Automatic formatting.

GooseBib.reformat.abbreviate_firstname(name: str, sep: str = ' ') → str

Abbreviate first name(s) to initials.

For example:

de Geus, Thomas Willem Jan ->
de Geus, T. W. J.

Parameters:

name – The name formatted as “Lastname, firstname secondname …”.
sep – Separator to place between initials.

Returns:

Formatted name.
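The transformation in the example can be sketched in a few lines. This is a simplified stand-alone illustration, not GooseBib's implementation: it assumes the "Lastname, firstname secondname" format, keeps only the first letter of each given name (so a hyphenated name such as "Jean-Paul" becomes "J."), and leaves names without a comma untouched.

```python
def abbreviate_firstname(name: str, sep: str = " ") -> str:
    """Hypothetical sketch: "Lastname, First Second" -> "Lastname, F. S."."""
    if "," not in name:
        return name  # format not recognised: leave untouched
    last, first = (part.strip() for part in name.split(",", 1))
    initials = [part[0] + "." for part in first.split()]
    return f"{last}, {sep.join(initials)}"


print(abbreviate_firstname("de Geus, Thomas Willem Jan"))          # de Geus, T. W. J.
print(abbreviate_firstname("de Geus, Thomas Willem Jan", sep=""))  # de Geus, T.W.J.
```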

GooseBib.reformat.autoformat_names(names: str, sep: str = ' ') → str

Automatically format names. E.g.:

de Geus, Thomas Willem Jan and Wyart, Matthieu ->
de Geus, T.W.J. and Wyart, M.

Parameters:

names – Names formatted as “lastname, firstname and lastname, firstname …”.
sep – Separator to place between initials.

Returns:

Formatted names.

GooseBib.reformat.name2key(name: str) → str

Return last name as ‘citation key’.

This returns the last name:

Without accents.
Without spaces.
Starting with a capital letter.

Parameters:: name – The name formatted as “Lastname, firstname secondname …”.
Returns:: Formatted name.
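The three rules listed above (no accents, no spaces, leading capital) can be sketched with the standard library alone. This is a hypothetical illustration, not GooseBib's code: accents are removed by Unicode NFKD decomposition followed by dropping the combining marks.

```python
import unicodedata


def name2key(name: str) -> str:
    """Hypothetical sketch: last name without accents or spaces, capitalised."""
    last = name.split(",", 1)[0].strip()
    # decompose accented characters (e.g. "ü" -> "u" + combining diaeresis), drop the marks
    decomposed = unicodedata.normalize("NFKD", last)
    plain = "".join(char for char in decomposed if not unicodedata.combining(char))
    plain = plain.replace(" ", "")
    return plain[:1].upper() + plain[1:]
```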

GooseBib.reformat.number_range(string: str) → str

Format page range. This replaces "-" with "--" (the BibTeX en-dash).

Parameters:: string – A string.
Returns:: The reformatted string.
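A one-line sketch of such a page-range fix, assuming the target is BibTeX's "--" and that only hyphens between two digits should be touched (so ranges like "R12-R34" are left alone). This is a hypothetical illustration, not GooseBib's implementation.

```python
import re


def number_range(string: str) -> str:
    """Hypothetical sketch: "12-34" or "12 - 34" -> "12--34"."""
    # any run of hyphens (with optional spaces) between two digits becomes exactly "--"
    return re.sub(r"(?<=\d)\s*-+\s*(?=\d)", "--", string)
```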

GooseBib.reformat.protect_math(text: str) → str

Protect math mode.

Parameters:: text – Some text.
Returns:: Formatted text.

GooseBib.reformat.remove_wrapping_braces(string: str) → str: Remove wrapping “{…}”. :param string: A string. :return: The reformatted string.

GooseBib.reformat.rm_accents(text: str) → str

Remove accents.

Parameters:: text – Some text.
Returns:: Formatted text.

GooseBib.reformat.rm_unicode(text: str) → str

Remove unicode by converting it to TeX.

Parameters:: text – Some text.
Returns:: Formatted text.

GooseBib.recognise

GooseBib.recognise.arxivid() → str

GooseBib.recognise.arxivid(*args: str) → str

GooseBib.recognise.arxivid(entry: dict) → str

Try to match an arxiv-id, return the first match.

Parameters:: args – Arguments to check.
Returns:: The first match (stripped for url etc.).
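A sketch of matching the two arXiv identifier schemes (new style YYMM.NNNNN since 2007, old style archive/YYMMNNN) and returning the first hit. The regular expressions are hypothetical illustrations, not GooseBib's; only the *args form is covered, and None is returned when nothing matches (the library's behaviour in that case is not documented here).

```python
import re
from typing import Optional

# New style: YYMM.NNNN or YYMM.NNNNN, optional version, e.g. 2101.00001v2
NEW_STYLE = r"(?<!\d)\d{4}\.\d{4,5}(?:v\d+)?(?!\d)"
# Old style: archive(.subject-class)/YYMMNNN, e.g. cond-mat/0102536 or math.GT/0309136
OLD_STYLE = r"(?<![\w-])[a-z]+(?:-[a-z]+)*(?:\.[A-Z]{2})?/\d{7}(?:v\d+)?(?!\d)"
ARXIV_RE = re.compile(f"{NEW_STYLE}|{OLD_STYLE}")


def arxivid(*args: str) -> Optional[str]:
    """Hypothetical sketch: first arXiv identifier found in the arguments."""
    for arg in args:
        match = ARXIV_RE.search(arg)
        if match:
            return match.group(0)  # the URL or "arXiv:" prefix is not part of the match
    return None
```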

GooseBib.recognise.doi() → str

GooseBib.recognise.doi(*args: str) → str

GooseBib.recognise.doi(entry: dict) → str

Try to match a doi, return the first match.

Parameters:: args – Arguments to check.
Returns:: The first match (stripped for url etc.).
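Searching for a DOI inside a longer string (a URL, a "doi:" prefix) naturally strips the surrounding text. A hypothetical sketch using a commonly used DOI pattern ("10." + registrant code + "/" + suffix); it covers only the *args form, trims trailing periods picked up from prose, and is not GooseBib's implementation.

```python
import re
from typing import Optional

# "10." + 4-9 digit registrant code + "/" + suffix of common DOI characters
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def doi(*args: str) -> Optional[str]:
    """Hypothetical sketch: first DOI found in the arguments."""
    for arg in args:
        match = DOI_RE.search(arg)
        if match:
            return match.group(0).rstrip(".")  # drop a sentence-ending period
    return None
```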

GooseBib.journals

Construct/apply journal database.

In GooseBib, a journal database is stored as a YAML-file, for example:

- abbreviation: Proc. Natl. Acad. Sci.
  acronym: PNAS
  name: Proceedings of the National Academy of Sciences
  variations:
  - Proc. Nat. Acad. Sci.
- abbreviation: Phys. Rev. Lett.
  acronym: PRL
  name: Physical Review Letters

Note that only the name is required; the abbreviation, acronym, and variations are optional.

class GooseBib.journals.Journal(name: str = None, abbreviation: str = None, acronym: str = None, variations: list[str] = None, index: list[int] = None, abbreviation_is_acronym: bool = False)

Simple class to store journal info.

Parameters:

name – Journal’s name.
abbreviation – Abbreviation of the journal’s name (optional).
acronym – Acronym of the journal’s name (optional).
variations – Known variations used for the journal’s name, abbreviation, etc. (optional).
index – For internal use only. Construction can be simplified by specifying name, abbreviation, acronym, and variations as a single list using the variations option (name, abbreviation, and acronym should be left blank in that case). index then indicates the indices in this list corresponding to [name, abbreviation, acronym] (the same index may be used multiple times if there is no abbreviation or acronym and, for example, the name is used instead).
abbreviation_is_acronym – Use abbreviation as acronym if no acronym is specified.

add_variation(arg: str)

Add a variation (does not change the name, abbreviation, or acronym).

Parameters:: arg – Name.

add_variations(arg: list[str])

Add a list of variations (does not change the name, abbreviation, or acronym).

Parameters:: arg – Names.

set_abbreviation(arg: str, also_acronym: bool = False)

(Over)write the abbreviation of the journal’s name.

Parameters:

arg – Name.
also_acronym – Use also as acronym.

set_acronym(arg: str)

(Over)write the acronym of the journal’s name.

Parameters:: arg – Name.

set_name(arg)

(Over)write the journal’s name.

Parameters:: arg – Name.

unique(): In place operation. Removes duplicates from list of stored name, abbreviation, acronym, variations. Does not change the output in any way.

class GooseBib.journals.JournalList(data: dict[Journal] | list[Journal] = None)

Store journal database as list of journals, which allows efficient handling.

Parameters:: data – List of journals. A dict input is interpreted as [value for (key, value) in sorted(data.items())].

map2abbreviation(journals: list[str], case_sensitive: bool = False) → list[str]

Map a list of journal names to their abbreviations.

Parameters:

journals – List to map.
case_sensitive – Keep case during look-up.

Returns:

Input list with abbreviation replaced where a positive match was found.

map2acronym(journals: list[str], case_sensitive: bool = False) → list[str]

Map a list of journal names to their acronyms.

Parameters:

journals – List to map.
case_sensitive – Keep case during look-up.

Returns:

Input list with acronym replaced where a positive match was found.

map2name(journals: list[str], case_sensitive: bool = False) → list[str]

Map a list of journal names to their official names.

Parameters:

journals – List to map.
case_sensitive – Keep case during look-up.

Returns:

Input list with official name replaced where a positive match was found.
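The three map2* methods share one look-up idea: every known spelling of a journal (name, abbreviation, acronym, variations) points to the requested field, and unmatched inputs pass through unchanged. A hypothetical stand-alone sketch on plain dicts, using the PNAS and PRL records of the YAML example in the GooseBib.journals section; falling back to the name when the requested field is missing is an assumption, and this is not the JournalList implementation.

```python
# The PNAS and PRL records of the YAML example, as Python dicts.
JOURNALS = [
    {
        "name": "Proceedings of the National Academy of Sciences",
        "abbreviation": "Proc. Natl. Acad. Sci.",
        "acronym": "PNAS",
        "variations": ["Proc. Nat. Acad. Sci."],
    },
    {"name": "Physical Review Letters", "abbreviation": "Phys. Rev. Lett.", "acronym": "PRL"},
]


def map_journals(names, journals, target, case_sensitive=False):
    """Hypothetical sketch of map2name / map2abbreviation / map2acronym."""
    lookup = {}
    for journal in journals:
        spellings = [journal.get(field) for field in ("name", "abbreviation", "acronym")]
        spellings += journal.get("variations", [])
        for spelling in filter(None, spellings):
            key = spelling if case_sensitive else spelling.lower()
            # on a clash the first journal in the list wins
            lookup.setdefault(key, journal.get(target) or journal["name"])
    return [lookup.get(n if case_sensitive else n.lower(), n) for n in names]
```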

tolist() → list[dict]

Return as list of dictionaries. Same as:

ret = []
for i in data:
    ret += [dict(i)]

unique(force_first=True) → bool

Merge journals that have a common entry. Note that this applies changes in-place.

Parameters:: force_first – Add the second, third, … duplicates only as name variations, do not change name, abbreviation, and acronym of the first duplicate. If False, the name, abbreviation, and acronym of all duplicates are considered if they are not present in the first duplicate.
Returns:: True if the list was unique, i.e. if no changes were applied.

GooseBib.journals.download_from_jabref(*domains) → dict[Journal]

Generate a database from JabRef.

Parameters:

domains –

Domain(s) to include in the database. Choose from:

"acs"
"ams"
"annee-philologique"
"dainst"
"entrez"
"general"
"geology_physics"
"geology_physics_variations"
"ieee"
"ieee_strings"
"lifescience"
"mathematics"
"mechanical"
"medicus"
"meteorology"
"sociology"
"webofscience-dots"
"webofscience"

Returns:

A dictionary of Journal. The keys of the dictionary are the journal’s names extracted from JabRef’s database.

GooseBib.journals.dump(filepath: str, data: dict[Journal] | list[Journal] | JournalList, force: bool = False)

Dump database to YAML-file (see GooseBib.journals).

Parameters:

filepath – Filename.
data – The database.
force – Do not prompt to overwrite file.

GooseBib.journals.generate_default(domain: str) → dict[Journal]

Generate (an up-to-date) version of one of the default databases shipped in GooseBib.

Parameters:

domain –

Domain. Choose from:

"physics"
"mechanics"
"PNAS"
"PNAS-USA"

Returns:

A dictionary of Journal. The keys of the dictionary are the journal’s names.

GooseBib.journals.get_configdir() → str: Return the config directory.

GooseBib.journals.load(*args: str) → JournalList

Load database(s) from default locations. Note that the order matters: in case of duplicates, the first entry found determines the title, abbreviation, and acronym.

To add custom databases, store a YAML-file to:

import os
import GooseBib.journals

dirname = GooseBib.journals.get_configdir()
stylename = "mystyle"
filepath = os.path.join(dirname, f"{stylename}.yaml")

See GooseBib.journals for structure of the YAML-file.

Note

Files stored in get_configdir() are prioritised over default files shipped with the library.

Parameters:: args – "physics", "mechanics", "PNAS", "PNAS-USA", …
Returns:: JournalList

GooseBib.journals.read(filepath: str, abbreviation_is_acronym: bool = True) → JournalList

Load a journal-database from a YAML-file (GooseBib.journals).

Tip

To construct a JournalList based on several YAML-files, proceed as follows:

import GooseBib.journals

db = GooseBib.journals.read("/path/to/first/name.yaml")
db += GooseBib.journals.read("/path/to/second/name.yaml")
db += GooseBib.journals.read("/path/to/third/name.yaml")
# ...

Note that the order matters: in case of duplicates, the first entry found determines the title, abbreviation, and acronym.

Parameters:

filepath – File-path.
abbreviation_is_acronym – Use abbreviation for missing acronym (otherwise title is used).

Returns:

JournalList

GooseBib.journals.update_default(): Update the default databases shipped with GooseBib. This updates the YAML-files (see GooseBib.journals) in the library directory.

Tip

To update the YAML-files in the repository, simply run this file from the repository; its main block is set up for this purpose.