Python module
Overview
BibTeX
Clean a BibTeX database.
Remove unnecessary fields from BibTeX database.
Merge items that have the same keys from BibTeX database.
Get entry's digital identifiers.
Format the journal entry for arXiv preprints.
Check online databases (can be slow!).
Automatic formatting
Remove wrapping "{...}".
Abbreviate first name(s) to initials.
Return last name as 'citation key'.
Automatic information extraction
Try to match a doi, return the first match.
Try to match an arXiv id, return the first match.
Official journal names
Load a journal-database from a YAML-file (see GooseBib.journals).
Load database(s) from default locations.
Return the config directory.
Update the default databases shipped with GooseBib.
Generate (an up-to-date) version of one of the default databases shipped in GooseBib.
Generate a database from JabRef.
Dump database to YAML-file (see GooseBib.journals).
Simple class to store journal info.
Store journal database as list of journals, which allows efficient handling.
GooseBib.bibtex
For BibTeX files:
Automatic formatting.
Check if up-to-date.
Compare.
- GooseBib.bibtex.GbibClean(cli_args: list[str] = None)
Command-line tool to clean a BibTeX database, see --help.
- GooseBib.bibtex.GbibDiscover()
Command-line tool to compare a BibTeX database against online databases, see --help.
- GooseBib.bibtex.GbibShowAuthorRename()
Show how authors would be renamed if GbibClean is applied, see --help.
- class GooseBib.bibtex.MyBibTexParser(data=None, **args)
Overload of bibtexparser.bparser.BibTexParser adding an extra internal field "DISPLAY_ORDER" to preserve the order of each item.
- parse(bibtex_str, *args, **kwargs)
Parse a BibTeX string into an object.
- Parameters:
bibtex_str (str or unicode) – BibTeX string
partial (boolean) – If True, print errors only on parsing failures. If False, an exception is raised.
- Returns:
bibliographic database
- Return type:
BibDatabase
- class GooseBib.bibtex.MyBibTexWriter(*args, **kwargs)
Overload of bibtexparser.bwriter.BibTexWriter acting on an extra internal field "DISPLAY_ORDER" to preserve the order of each item. In addition, there is an extra parameter sort_entries = False that controls whether entries are sorted by citation key.
- write(*args, **kwargs)
Converts a bibliographic database to a BibTeX-formatted string.
- Parameters:
bib_database (BibDatabase) – bibliographic database to be converted to a BibTeX string
- Returns:
BibTeX-formatted string
- Return type:
str or unicode
- GooseBib.bibtex.abbreviate_journal(data: list[dict], journal_type: str = 'abbreviation', journal_database: list[str] = ['pnas', 'physics', 'mechanics']) list[dict]
- GooseBib.bibtex.abbreviate_journal(data: str, *args, **kwargs) BibDatabase
- GooseBib.bibtex.abbreviate_journal(data: str, *args, **kwargs)
- GooseBib.bibtex.abbreviate_journal(data: IOBase, *args, **kwargs)
Abbreviate journals based on a standard library.
- Parameters:
data – The BibTeX database.
journal_type – Rename journal to its "title", "abbreviation", or "acronym".
journal_database – Database(s) with official journal names/abbreviations/acronyms to use.
- GooseBib.bibtex.clean(data: list[dict], sep_name: str = '', sep_journal: str = '', title: bool = True, protect_math: bool = True, rm_unicode: bool = True, no_abbreviate: list[str] = [], select_fields: bool = True) list[dict]
- GooseBib.bibtex.clean(data: str, *args, **kwargs) BibDatabase
- GooseBib.bibtex.clean(data: str, *args, **kwargs) str
- GooseBib.bibtex.clean(data: IOBase, *args, **kwargs) str
Clean a BibTeX database.
Remove unnecessary fields (see GooseBib.bibtex.select()).
Unify the formatting of authors (see GooseBib.reformat.abbreviate_firstname()).
Ensure proper math formatting (see GooseBib.reformat.protect_math()).
Convert unicode to TeX (see GooseBib.reformat.rm_unicode()).
Fill the digital identifier if it is not present but can be recognised from a different field (see GooseBib.bibtex.get_identifiers()).
- Parameters:
data – The BibTeX database.
sep_name – Separator for name initials (e.g. "", " ").
sep_journal – Separator for journal abbreviations (e.g. "", " ").
title – Include title.
protect_math – Apply fix in GooseBib.reformat.protect_math().
rm_unicode – Apply fix in GooseBib.reformat.rm_unicode().
no_abbreviate – List of entries for which to skip author abbreviation.
select_fields – Apply selection() to the output.
- GooseBib.bibtex.clever_merge(data: list[dict], merge: bool = True) list[dict]
- GooseBib.bibtex.clever_merge(data: str, *args, **kwargs) BibDatabase
- GooseBib.bibtex.clever_merge(data: str, *args, **kwargs)
- GooseBib.bibtex.clever_merge(data: IOBase, *args, **kwargs)
Try to merge the same entries.
- Parameters:
data – The BibTeX database.
- GooseBib.bibtex.dbsearch_arxiv(data: list[dict], silent: bool = False) dict
- GooseBib.bibtex.dbsearch_arxiv(data: str, *args, **kwargs) BibDatabase
Check online databases (can be slow!).
- Parameters:
silent – Hide status bar.
- Returns:
Dictionary with discovered items.
- GooseBib.bibtex.format_journal_arxiv(data: list[dict], fmt: str, journal_database: list[str] = ['arxiv']) list[dict]
- GooseBib.bibtex.format_journal_arxiv(data: str, *args, **kwargs) BibDatabase
- GooseBib.bibtex.format_journal_arxiv(data: str, *args, **kwargs)
- GooseBib.bibtex.format_journal_arxiv(data: IOBase, *args, **kwargs)
Format the journal entry for arXiv preprints. Use "{}" in the formatter to include the arxivid.
- Parameters:
data – The BibTeX database.
fmt – Formatter, e.g. "Preprint" or "Preprint: arXiv {}".
journal_database – Database(s) with known arXiv variants.
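The role of "{}" in fmt can be illustrated with plain string formatting. This is a minimal sketch on a single entry stored as a dict, not the library's implementation: the helper name format_arxiv_journal is hypothetical, and it assumes the identifier is stored under "arxivid" and the journal under "journal".

```python
def format_arxiv_journal(entry: dict, fmt: str) -> dict:
    """Return a copy of `entry` whose journal is `fmt`, with "{}" filled by the arXiv id."""
    out = dict(entry)
    # str.format leaves a formatter without "{}" (e.g. "Preprint") unchanged.
    out["journal"] = fmt.format(entry.get("arxivid", ""))
    return out


entry = {"ID": "DeGeus2019", "journal": "arXiv preprints", "arxivid": "1904.07635"}
print(format_arxiv_journal(entry, "Preprint")["journal"])
# Preprint
print(format_arxiv_journal(entry, "Preprint: arXiv {}")["journal"])
# Preprint: arXiv 1904.07635
```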
- GooseBib.bibtex.get_identifiers(entry: dict) dict
Get entry’s digital identifiers. The following identifiers are returned (if found): "doi", "arxivid". Note that an arxivid given as a doi is returned (only) as arxivid.
- Parameters:
entry – The bib-entry.
- Returns:
Dictionary with the found identifiers.
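The rule that an arXiv identifier written as a doi is reported only as arxivid can be sketched as follows. The regular expressions and the name identifiers_sketch are assumptions for illustration (GooseBib's own patterns are not shown here); the sketch relies on arXiv DOIs having the form 10.48550/arXiv.<id>.

```python
import re

# Assumed patterns, for illustration only.
_ARXIV_DOI = re.compile(r"10\.48550/arXiv\.(\d{4}\.\d{4,5})", re.IGNORECASE)
_DOI = re.compile(r"10\.\d{4,9}/\S+")


def identifiers_sketch(entry: dict) -> dict:
    """Scan all fields of `entry` and collect "doi" and/or "arxivid" if found."""
    found = {}
    for value in entry.values():
        text = str(value)
        arxiv = _ARXIV_DOI.search(text)
        if arxiv:
            # An arXiv DOI counts as arxivid only, never as doi.
            found.setdefault("arxivid", arxiv.group(1))
            continue
        doi = _DOI.search(text)
        if doi:
            found.setdefault("doi", doi.group(0))
    return found


print(identifiers_sketch({"url": "https://doi.org/10.1103/PhysRevLett.121.1"}))
# {'doi': '10.1103/PhysRevLett.121.1'}
print(identifiers_sketch({"doi": "10.48550/arXiv.1904.07635"}))
# {'arxivid': '1904.07635'}
```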
- GooseBib.bibtex.manual_merge(data: list[dict], keys: list[Tuple[str, str]]) Tuple[list[dict], dict]
- GooseBib.bibtex.manual_merge(data: str, *args, **kwargs) Tuple[BibDatabase, dict]
- GooseBib.bibtex.manual_merge(data: str, *args, **kwargs) Tuple[str, dict]
- GooseBib.bibtex.manual_merge(data: IOBase, *args, **kwargs) Tuple[str, dict]
Merge items.
- Parameters:
data – The BibTeX database.
keys – List of keys for merge (key[1] merged into key[0]).
- Returns:
The BibTeX database. A dictionary mapping the new keys to the old keys.
- GooseBib.bibtex.parse(bibtex_str: str, aggresive: bool = False) str
Parse a BibTeX string once.
- Parameters:
aggresive – Use aggressive interpretation strategy.
- GooseBib.bibtex.read_display_order(bibtex_str: str, tabsize: int = 2) -> (<class 'dict'>, <class 'int'>)
Read order of fields of all entries.
- Parameters:
bibtex_str – A BibTeX ‘file’.
tabsize – Number of spaces by which each tab character is replaced.
- Returns:
A dictionary with a list of fields per key. The typical indentation.
- GooseBib.bibtex.select(data: list[dict], fields: dict[list[str]] | list[str] = None, ensure_link: bool = True, remove_url: bool = True) list[dict]
- GooseBib.bibtex.select(data: BibDatabase, *args, **kwargs) BibDatabase
- GooseBib.bibtex.select(data: str, *args, **kwargs) str
- GooseBib.bibtex.select(data: IOBase, *args, **kwargs)
Remove unnecessary fields from BibTeX database.
- Parameters:
data – The BibTeX database.
fields – Fields to keep per entry type (default from selection()). If a list is specified all entry types are treated the same.
ensure_link – Add URL to fields if no doi, arxivid, or eprint is present.
remove_url – Remove URL when either a doi, arxivid, or eprint is present.
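How ensure_link and remove_url interact can be sketched for a single entry. This is an illustration under assumptions, not the library's code: select_sketch is a hypothetical helper, fields is taken as a flat list, and the entry is assumed to carry bibtexparser-style "ID" and "ENTRYTYPE" keys that are always kept.

```python
LINK_FIELDS = ("doi", "arxivid", "eprint")


def select_sketch(entry: dict, fields: list, ensure_link: bool = True,
                  remove_url: bool = True) -> dict:
    """Keep only `fields` of `entry`, adjusting the URL as described above."""
    has_link = any(entry.get(name) for name in LINK_FIELDS)
    keep = set(fields) | {"ID", "ENTRYTYPE"}  # assumed bookkeeping keys
    if ensure_link and not has_link:
        keep.add("url")  # the URL is the only link left: keep it
    if remove_url and has_link:
        keep.discard("url")  # a doi/arxivid/eprint makes the URL redundant
    return {key: value for key, value in entry.items() if key in keep}


with_doi = {"ID": "x", "ENTRYTYPE": "article", "title": "T",
            "doi": "10.1/abc", "url": "http://example.org", "abstract": "..."}
print(sorted(select_sketch(with_doi, ["title", "doi", "url"])))
# ['ENTRYTYPE', 'ID', 'doi', 'title']
```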
- GooseBib.bibtex.selection(use_bibtexparser: bool = False) dict
List of fields to keep in a BibTeX file to get a useful list of references: fields that are not in this selection may be useful for a database, but might only cloud BibTeX output.
- Parameters:
use_bibtexparser – Add bibtexparser specific fields to select (not part of BibTeX output).
- GooseBib.bibtex.unique(data: list[dict], merge: bool = True) list[dict]
- GooseBib.bibtex.unique(data: BibDatabase, *args, **kwargs) BibDatabase
- GooseBib.bibtex.unique(data: str, *args, **kwargs) str
Merge items that have the same keys from BibTeX database.
- Parameters:
data – The BibTeX database.
merge – Add fields from duplicate entries to the first entry.
- Returns:
The BibTeX database.
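The effect of merge can be sketched on a list of dicts. This is a simplified illustration, assuming the citation key is stored under "ID" (as bibtexparser does); unique_sketch is a hypothetical name, not the library function.

```python
def unique_sketch(entries: list, merge: bool = True) -> list:
    """Keep the first entry per citation key; optionally add fields from duplicates."""
    merged = {}
    for entry in entries:
        key = entry["ID"]
        if key not in merged:
            merged[key] = dict(entry)
        elif merge:
            # Only fields missing from the first entry are added.
            for field, value in entry.items():
                merged[key].setdefault(field, value)
    return list(merged.values())


entries = [
    {"ID": "a", "title": "T"},
    {"ID": "b", "title": "U"},
    {"ID": "a", "title": "other", "year": "2020"},
]
print(unique_sketch(entries))
# [{'ID': 'a', 'title': 'T', 'year': '2020'}, {'ID': 'b', 'title': 'U'}]
```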
- GooseBib.bibtex.unique_keys(data: list[dict]) Tuple[list[dict], dict]
- GooseBib.bibtex.unique_keys(data: BibDatabase, *args, **kwargs) Tuple[BibDatabase, dict]
- GooseBib.bibtex.unique_keys(data: str, *args, **kwargs) Tuple[str, dict]
Rename keys that occur more than once in the BibTeX database.
- Parameters:
data – The BibTeX database.
- Returns:
The BibTeX database. A dictionary mapping the new keys to the old keys.
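The shape of the returned pair (database, mapping of new keys to old keys) can be illustrated as below. The renaming scheme (appending "-2", "-3", … and leaving the first occurrence untouched) is purely an assumption made for this sketch; the scheme GooseBib uses is not documented here.

```python
from collections import Counter


def unique_keys_sketch(entries: list) -> tuple:
    """Rename repeated citation keys; return (entries, {new_key: old_key})."""
    taken = {entry["ID"] for entry in entries}
    seen = Counter()
    renamed = {}
    out = []
    for entry in entries:
        old = entry["ID"]
        seen[old] += 1
        new = old
        if seen[old] > 1:
            # Hypothetical scheme: add a numeric suffix that is not in use yet.
            n = seen[old]
            new = f"{old}-{n}"
            while new in taken:
                n += 1
                new = f"{old}-{n}"
            taken.add(new)
            renamed[new] = old
        out.append({**entry, "ID": new})
    return out, renamed


data, mapping = unique_keys_sketch([{"ID": "a"}, {"ID": "b"}, {"ID": "a"}, {"ID": "a"}])
print([entry["ID"] for entry in data])
# ['a', 'b', 'a-2', 'a-3']
print(mapping)
# {'a-2': 'a', 'a-3': 'a'}
```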
- GooseBib.bibtex.yaml_dump(filename, data, force=False)
Dump data to YAML file.
- Parameters:
filename (str) – The output filename.
data (list, dict) – The data to dump.
force (bool, optional) – Do not prompt to overwrite file.
GooseBib.reformat
Automatic formatting.
- GooseBib.reformat.abbreviate_firstname(name: str, sep: str = ' ') str
Abbreviate first name(s) to initials.
For example:
de Geus, Thomas Willem Jan -> de Geus, T. W. J.
- Parameters:
name – The name formatted as “Lastname, firstname secondname …”.
sep – Separator to place between initials.
- Returns:
Formatted name.
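The transformation in the example above can be sketched in a few lines. This is a simplified stand-in (abbreviate_firstname_sketch is a hypothetical name); it ignores special cases such as hyphenated or already abbreviated first names, which the library may treat differently.

```python
def abbreviate_firstname_sketch(name: str, sep: str = " ") -> str:
    """Abbreviate the first name(s) in "Lastname, firstname secondname ..." to initials."""
    if "," not in name:
        # Without a comma the last name cannot be told apart: leave the input as is.
        return name
    last, first = (part.strip() for part in name.split(",", 1))
    initials = [word[0].upper() + "." for word in first.split()]
    return f"{last}, {sep.join(initials)}"


print(abbreviate_firstname_sketch("de Geus, Thomas Willem Jan"))
# de Geus, T. W. J.
print(abbreviate_firstname_sketch("de Geus, Thomas Willem Jan", sep=""))
# de Geus, T.W.J.
```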
- GooseBib.reformat.autoformat_names(names: str, sep: str = ' ') str
Automatically format names. E.g.:
de Geus, Thomas Willem Jan and Wyart, Matthieu -> de Geus, T.W.J. and Wyart, M.
- Parameters:
names – Names formatted as “lastname, firstname and lastname, firstname …”.
sep – Separator to place between initials.
- Returns:
Formatted names.
- GooseBib.reformat.name2key(name: str) str
Return last name as ‘citation key’.
This returns the last name:
Without accents.
Without spaces.
Starting with a capital letter.
- Parameters:
name – The name formatted as “Lastname, firstname secondname …”.
- Returns:
Formatted name.
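The three properties listed above can be sketched with the standard library. name2key_sketch is a hypothetical stand-in; the outputs shown are those of this sketch and are not guaranteed to match the library for every name.

```python
import unicodedata


def name2key_sketch(name: str) -> str:
    """Last name without accents, without spaces, starting with a capital letter."""
    last = name.split(",", 1)[0]
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", last)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Remove all whitespace, then capitalise the first letter only.
    joined = "".join(plain.split())
    return joined[:1].upper() + joined[1:]


print(name2key_sketch("de Geus, Thomas Willem Jan"))
# DeGeus
```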
- GooseBib.reformat.number_range(string: str) str
Format page range. This replaces “-” with “–“.
- Parameters:
string – A string.
- Returns:
The reformatted string.
- GooseBib.reformat.protect_math(text: str) str
Protect math mode.
- Parameters:
text – Some text.
- Returns:
Formatted text.
- GooseBib.reformat.remove_wrapping_braces(string: str) str
Remove wrapping “{…}”.
- Parameters:
string – A string.
- Returns:
The reformatted string.
- GooseBib.reformat.rm_accents(text: str) str
Remove accents.
- Parameters:
text – Some text.
- Returns:
Formatted text.
- GooseBib.reformat.rm_unicode(text: str) str
Remove unicode.
- Parameters:
text – Some text.
- Returns:
Formatted text.
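Converting unicode characters to TeX commands amounts to a table look-up, as in the sketch below. The table is a tiny, illustrative subset chosen for this example; the conversion table used by GooseBib is not shown in this document, and to_tex_sketch is a hypothetical name.

```python
# Illustrative subset only: maps a unicode character to a TeX equivalent.
TEX = {
    "\u00e9": r"{\'e}",  # e with acute accent
    "\u00fc": r'{\"u}',  # u with diaeresis
    "\u00f8": r"{\o}",   # o with stroke
    "\u2013": "--",      # en-dash
}


def to_tex_sketch(text: str) -> str:
    """Replace every character found in TEX; leave all other characters untouched."""
    return "".join(TEX.get(char, char) for char in text)


print(to_tex_sketch("Caf\u00e9"))
# Caf{\'e}
```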
GooseBib.recognise
- GooseBib.recognise.arxivid() str
- GooseBib.recognise.arxivid(*args: str) str
- GooseBib.recognise.arxivid(entry: dict) str
Try to match an arXiv id, return the first match.
- Parameters:
args – Arguments to check.
- Returns:
The first match (stripped for url etc.).
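"Return the first match" over several arguments can be sketched as follows. The pattern is an assumption that covers only new-style identifiers (four digits, a dot, four or five digits, optional version); first_arxivid is a hypothetical name, and returning None when nothing matches is a choice of this sketch.

```python
import re
from typing import Optional

# Assumed pattern: new-style identifiers only; a trailing version ("v2") is dropped.
_ARXIV_NEW = re.compile(r"(?<!\d)(\d{4}\.\d{4,5})(?:v\d+)?(?!\d)")


def first_arxivid(*args: str) -> Optional[str]:
    """Check the arguments in order; return the first identifier found, without URL etc."""
    for arg in args:
        match = _ARXIV_NEW.search(arg)
        if match:
            return match.group(1)
    return None


print(first_arxivid("no identifier here", "https://arxiv.org/abs/1904.07635v2"))
# 1904.07635
```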
- GooseBib.recognise.doi() str
- GooseBib.recognise.doi(*args: str) str
- GooseBib.recognise.doi(entry: dict) str
Try to match a doi, return the first match.
- Parameters:
args – Arguments to check.
- Returns:
The first match (stripped for url etc.).
GooseBib.journals
Construct/apply journal database.
In GooseBib, a journal database is stored as a YAML-file, for example:
- abbreviation: Proc. Natl. Acad. Sci.
  acronym: PNAS
  name: Proceedings of the National Academy of Sciences
  variations:
  - Proc. Nat. Acad. Sci.
- abbreviation: Phys. Rev. Lett.
  acronym: PRL
  name: Physical Review Letters
Note that the minimal requirement is to store the name; the abbreviation, acronym, and variations are optional.
- class GooseBib.journals.Journal(name: str = None, abbreviation: str = None, acronym: str = None, variations: list[str] = None, index: list[int] = None, abbreviation_is_acronym: bool = False)
Simple class to store journal info.
- Parameters:
name – Journal’s name.
abbreviation – Abbreviation of the journal’s name (optional).
acronym – Acronym of the journal’s name (optional).
variations – Known variations used for the journal’s name, abbreviation, etc. (optional).
index – For internal use only. Construction can be simplified by specifying name, abbreviation, acronym, and variations as a single list using the variations option (name, abbreviation, and acronym should be left blank in that case). index then indicates the indices in this list corresponding to [name, abbreviation, acronym] (the same index may be used multiple times if there is no abbreviation or acronym and, for example, the name is used instead).
abbreviation_is_acronym – Use abbreviation as acronym if no acronym is specified.
- add_variation(arg: str)
Add a variation (does not change the name, abbreviation, or acronym).
- Parameters:
arg – Name.
- add_variations(arg: list[str])
Add a list of variations (does not change the name, abbreviation, or acronym).
- Parameters:
arg – Names.
- set_abbreviation(arg: str, also_acronym: bool = False)
(Over)write the abbreviation of the journal’s name.
- Parameters:
arg – Name.
also_acronym – Use also as acronym.
- set_acronym(arg: str)
(Over)write the acronym of the journal’s name.
- Parameters:
arg – Name.
- set_name(arg)
(Over)write the journal’s name.
- Parameters:
arg – Name.
- unique()
In place operation. Removes duplicates from list of stored name, abbreviation, acronym, variations. Does not change the output in any way.
- class GooseBib.journals.JournalList(data: dict[Journal] | list[Journal] = None)
Store journal database as list of journals, which allows efficient handling.
- Parameters:
data – List of journals. A dict input is interpreted as [value for (key, value) in sorted(data.items())].
- map2abbreviation(journals: list[str], case_sensitive: bool = False) list[str]
Map list of names.
- Parameters:
journals – List to map.
case_sensitive – Keep case during look-up.
- Returns:
Input list with abbreviation replaced where a positive match was found.
- map2acronym(journals: list[str], case_sensitive: bool = False) list[str]
Map list of names.
- Parameters:
journals – List to map.
case_sensitive – Keep case during look-up.
- Returns:
Input list with acronym replaced where a positive match was found.
- map2name(journals: list[str], case_sensitive: bool = False) list[str]
Map list of names.
- Parameters:
journals – List to map.
case_sensitive – Keep case during look-up.
- Returns:
Input list with official name replaced where a positive match was found.
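The three mapping methods share one idea: look up each input among all known spellings of a journal and replace it on a positive match. A minimal sketch for the map2name case, using plain dicts shaped like the YAML example of GooseBib.journals instead of Journal objects (map2name_sketch is a hypothetical name, not the class method):

```python
def map2name_sketch(journals: list, database: list, case_sensitive: bool = False) -> list:
    """Replace each known spelling by the official name; keep unknown input as is."""
    normalise = (lambda text: text) if case_sensitive else str.lower
    lookup = {}
    for record in database:
        aliases = [record["name"], record.get("abbreviation"), record.get("acronym")]
        aliases += record.get("variations", [])
        for alias in aliases:
            if alias:
                # setdefault: in case of duplicates the first record found is leading.
                lookup.setdefault(normalise(alias), record["name"])
    return [lookup.get(normalise(journal), journal) for journal in journals]


database = [
    {"name": "Proceedings of the National Academy of Sciences",
     "abbreviation": "Proc. Natl. Acad. Sci.", "acronym": "PNAS",
     "variations": ["Proc. Nat. Acad. Sci."]},
    {"name": "Physical Review Letters", "abbreviation": "Phys. Rev. Lett.", "acronym": "PRL"},
]
print(map2name_sketch(["prl", "Unknown J."], database))
# ['Physical Review Letters', 'Unknown J.']
```

Mapping to the abbreviation or acronym would only change the value stored in the look-up table.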
- tolist() list[dict]
Return as list of dictionaries. Same as:
ret = []
for i in data:
    ret += [dict(i)]
- unique(force_first=True) bool
Merge journals that have a common entry. Note that this applies changes in-place.
- Parameters:
force_first – Add the second, third, … duplicates only as name variations, do not change name, abbreviation, and acronym of the first duplicate. If False, the name, abbreviation, and acronym of all duplicates are considered if they are not present in the first duplicate.
- Returns:
True if the list was unique, i.e. if no changes were applied.
- GooseBib.journals.download_from_jabref(*domains) dict[Journal]
Generate a database from JabRef.
- Parameters:
domains –
Domain(s) to include in the database. Choose from:
"acs"
"ams"
"annee-philologique"
"dainst"
"entrez"
"general"
"geology_physics"
"geology_physics_variations"
"ieee"
"ieee_strings"
"lifescience"
"mathematics"
"mechanical"
"medicus"
"meteorology"
"sociology"
"webofscience-dots"
"webofscience"
- Returns:
A dictionary of Journal. The keys of the dictionary are the journal’s names extracted from JabRef’s database.
- GooseBib.journals.dump(filepath: str, data: dict[Journal] | list[Journal] | JournalList, force: bool = False)
Dump database to YAML-file (see GooseBib.journals).
- Parameters:
filepath – Filename.
data – The database.
force – Do not prompt to overwrite file.
- GooseBib.journals.generate_default(domain: str) dict[Journal]
Generate (an up-to-date) version of one of the default databases shipped in GooseBib.
- Parameters:
domain –
Domain. Choose from:
"physics"
"mechanics"
"PNAS"
"PNAS-USA"
- Returns:
A dictionary of Journal. The keys of the dictionary are the journal’s names.
- GooseBib.journals.get_configdir() str
Return the config directory.
- GooseBib.journals.load(*args: str) JournalList
Load database(s) from default locations. Note that the order matters: In case of duplicates the first found entry is leading in determining the title, abbreviation, and acronym.
To add custom databases, store a YAML-file to:
import os

dirname = GooseBib.get_configdir()
stylename = "mystyle"
filepath = os.path.join(dirname, f"{stylename}.yaml")
See GooseBib.journals for structure of the YAML-file.
Note
Files stored in get_configdir() are prioritised over default files shipped with the library.
- Parameters:
args – "physics", "mechanics", "PNAS", "PNAS-USA", …
- Returns:
- GooseBib.journals.read(filepath: str, abbreviation_is_acronym: bool = True) JournalList
Load a journal-database from a YAML-file (GooseBib.journals).
Tip
To construct a JournalList based on several YAML-files, proceed as follows:
db = GooseBib.journals.read("/path/to/first/name.yaml")
db += GooseBib.journals.read("/path/to/seconds/name.yaml")
db += GooseBib.journals.read("/path/to/third/name.yaml")
# ...
Note that the order matters: In case of duplicates the first found entry is leading in determining the title, abbreviation, and acronym.
- Parameters:
filepath – File-path.
abbreviation_is_acronym – Use abbreviation for missing acronym (otherwise title is used).
- Returns:
- GooseBib.journals.update_default()
Update the default databases shipped with GooseBib. This updates the YAML-files (see GooseBib.journals) in the library directory.
Tip
To update the YAML-files in the repository, simply run this file from the repository, as its main is adapted for this.