Ticket #64 (closed enhancement: fixed)

Opened 3 years ago

Last modified 2 years ago

Improve Header interface; fold CardList functionality into it

Reported by: stsci_embray Owned by: stsci_embray
Priority: high Milestone: 3.1.0
Version: dev Severity: normal
Keywords: Cc:

Description (last modified by stsci_embray) (diff)

Phil pointed out that the attribute Header.ascard is not a terribly useful name (I only use it because it's what was already there). There are at least two problems with it:

  • Who knows what a "card" is anymore? Although it's a useful abstraction at times, a user looking for a way to view a header as a list of keyword,value,comment triplets would not necessarily know to go looking for "cards".
  • It's just not very gramatically accurate: It should at least be Header.ascards or Header.ascardlist since a header is composed of multiple cards.

Phil proposed renaming it Header.aslist or something of the like. I'd go a bit further and say that the Header interface needs to be reworked.

For one, the existing CardList interface should be an entirely internal detail, and should probably be folded into the Header class. The two classes already share quite a bit of functionality (for example, they both act a little bit like a sequence and a little bit like a mapping, though CardList is more directly sequence-like since it currently subclasses list).

A FITS header is essentially an ordered mapping and thus should have an interface similar to Python's built in OrderedDict, though with some special functionality required by FITS headers:

  • It must support insertion of new key/value pairs at arbitrary positions in the ordering.
  • Similarly, it should support other sequence-like operations such as concatenation/extend().
  • It must support duplicate keywords. Although duplicate keywords (other than commentary keywords) are a bad idea, they are theoretically supported by the FITS standard. They are also required for record-valued keyword card support.
  • It must also be able to support an arbitrary number of COMMENT and HISTORY items as well as blank items in the sequence.
  • It must allow position-based indexing as well as keyword-based indexing.
    • When position-based indexing is used, the position refers to an actual 'card' in the sequence of 80-byte blocks. This can include blank and commentary cards. This interface would allow direct manipulation of the header. For now, position-based indexing will just return the value of the n-th card, like with keyword-based. Keeping the interface more consistent I think would be less confusing in the long term.
    • When keyword-based indexing is used the value of that keyword is returned. There are some additional requirements here:
      • If there are duplicate cards with the same keyword, the value of the first card with that keyword is returned and a warning is issued explaining that there are duplicate keywords and one should be removed.
      • It should also allow a [keyword,n] style of indexing where n refers to the n-th card with the same keyword. This allows the values of duplicate keywords to be accessed. Though this is a rare case, it should be easy enough to support.
      • In the case of the 'COMMENT' and 'HISTORY' keywords it should return lists of all the COMMENT and HISTORY values in the order that they appear in the header. ['COMMENT',n] and ['HISTORY',n] indexing would still return the n-th COMMENT or HISTORY values respectively.
  • Currently, indexing a Header object returns the value of the Card associated with that index. To access the Card object itself, which contains the keyword, value, and comment values, one must go through Header.ascard to access the underlying CardList object. Since I want to make CardList an internal detail, there still should be a way to access things like the comment. I would propose having Header return proxies for the values that have the .keyword, .value, and .comment attributes associated with the card that the value came from. By using proxies, the returned values can still be used as ints, floats, strings, etc., but will have these additional attributes attached to them. After further consideration, I think that the proxy idea is to frought with perils. I think that for most cases it would work, but it's not worth the effort for the corner cases where it gets complicated. Also, only one of the three attached attributes are useful: .key isn't that useful, since in most cases values are looked up by keyword, so you already know the key. .value would only be a reference to the actual value being proxied, and so that leaves .comment as the only attribute that adds new information. A simpler approach would be to have a Header.comments attribute that can return card comments given either an integer index or keyword.
  • CONTINUE cards must be completely abstracted away from the user (as they are currently). A card composed of multiple CONTINUE cards must be represented to the user as a single item in the card sequence.
  • Howard pointed out a while back that Headers should support intelligent extend/concatenate functionality. For example, to support adding keywords from a primary header to an extension header or vice-versa. Duplicate and standard keywords should be stripped from the header on the RHS of the operation.

Change History

comment:1 Changed 3 years ago by stsci_embray

  • Description modified (diff)

comment:2 Changed 3 years ago by stsci_embray

  • Description modified (diff)

comment:3 Changed 3 years ago by stsci_embray

  • Description modified (diff)

Updated the description with some thoughts on how to handle comments.

comment:4 Changed 3 years ago by stsci_embray

  • Description modified (diff)

comment:5 Changed 3 years ago by stsci_embray

  • Description modified (diff)

comment:6 Changed 3 years ago by stsci_embray

  • Status changed from new to assigned

Been working on this in the header refactoring branch for some time.

comment:7 Changed 3 years ago by stsci_embray

[In r1181] Refs #64. This restores basic support for record-valued keyword cards. Gone is the separate RecordValuedKeywordCardclass--instead support is built right into the Card class.

From an architectural perspective I kind of liked how there were separate HierarchCard and RecordValuedKeywordCard classes. It's a nice idea to have an interface whereby it's possible for extension classes to add support for parsing arbitrary custom card formats.

However, in practice it never worked that well. Hieararch cards worked pretty well, but RecordValuedKeywordCard always required special support in the Header class, so there was never really much of an abstraction there after all. If somebody actually asked for the ability to create extensions to an abstract Card base class, I might try to come up with a way to make that work. But since I doubt anyone actually cares to have that, it's ultimately not worth the effort.

As it is, adding support for RVKCs back into the Card class turned out to be a relatively simple diff. Though slightly kludgy, it's still far less code than having the separate RVKC class, and ultimately a bit simpler I think.

There does seem to be a slight performance degradation with this change, which I would like to get to the bottom of at some point. I think it mostly just comes from the fact that ever card needs to be checked whether or not it's a RVKC (which was previously the case as well). But I still suspect there are opportunities for speedup.

comment:8 Changed 3 years ago by stsci_embray

  • Description modified (diff)

comment:9 Changed 3 years ago by stsci_embray

  • Description modified (diff)

comment:10 Changed 3 years ago by stsci_embray

[In r1197] Refs #64.

  • Moved most of the details of reading headers from files and writing them to files into the Header class where they really belong.
  • Reimplemented Header.toTxtFile in terms of the new Header.fromfile classmethod and marked it as deprecated (Header.fromTxtFile hasn't been reworked yet, since it needs to work a little differently).
  • Added a _CardAccessor helper class taht works the same way as the _HeaderComments class, but is a little more generic in that it just returns Card objects. _HeaderComments is reimplemented as a subclass of _CardAccessor that takes the returned Cards and returns their .comment attributes.
  • Added a _HeaderCommentaryCards _CardAccessor for viewing lists of commentary cards in a friendly manner.
  • Added several more Header unit tests.

comment:11 Changed 3 years ago by stsci_embray

"For #64, adds improved header concatenation that strips HDU-specific keywords by default (though the strip functionality still needs to be improved to work with more HDU types).

comment:12 Changed 3 years ago by stsci_embray

[In r1205] Fixed up fromTxtFile a bit, and a few bugs along the way. Also added a simple test for it. Still haven't set it as deprecated as I'm not sure how to best go about offering people a replacement. header.extend(Header.fromfile(...)) would mostly replace it, but with slightly different behavior. The default behavior of fromTxtFile is partly sensible: It updates the values of keywords already in the header, and adds any new keywords, (but no duplicates). But it has other strange behaviors that I'm not sure I like...

comment:13 Changed 3 years ago by stsci_embray

As a followup to the last comment: the thing I don't like about fromTxtFile() as implemented is that if the input header has a SIMPLE keyword, that is, it's a primary header and the header being modified has an EXTENSION keyword, the EXTENSION keyword is replaced with the SIMPLE keyword, effectively converting it to a Primary HDU. This breaks things because nothing else is done to ensure that the resulting header is valid for a Primary HDU.

The opposite is true too: A SIMPLE keyword in the existing header is replaced with an EXTENSION keyword from the input header. A more reliable and straightforward approach would be for it to work like header.extend() and header.copy() in that only keywords that don't have special meaning for the HDU type are copied over. This makes for a cleaner merge in most cases.

For now I will leave fromTxtFile() with its existing functionality, but will eventually recommend that it not be used, in favor of a different approach. Though what that approach ends up being depends in part on the needs of people who actually use this method (assuming that they exist).

comment:14 Changed 2 years ago by stsci_embray

Merged all the header-refactoring work into trunk in r1307. Still some work to be done here, primarily on documentation: The changelog should give a brief overview of the major header changes. However, there should also be small guide that gives a complete overview of the changes, possibly in a separate file. The changelog should reference this guide.

comment:15 Changed 2 years ago by stsci_embray

  • Status changed from assigned to closed
  • Resolution set to fixed

r1622 updated the changelog and added the aforementioned transition guide. With that, I'm calling this done for all intents and purposes. Might continue to tweak the guide a bit, and there will undoubtedly still be further tweaks to the header interface. But those can be taken as new issues.

Note: See TracTickets for help on using tickets.