History of Black Writing works in the HathiTrust Digital Library

Data and contextual information for “History of Black Writing works in the HathiTrust Digital Library,” a workset curated by Maryemma Graham, Sarah Arbuthnot-Lendt, Jade Harrison, Brendan Williams-Childs, and Ashley Patricia Simmons in collaboration with the HathiTrust Research Center with generous support from the Mellon Foundation.

View the Project on GitHub htrc/scwared-history-of-black-writing

Introduction to the History of Black Writing Workset

Maryemma Graham, Jade Harrison, Ashley Simmons, Brendan Williams-Childs (HBW)

The “History of Black Writing’s Scholar-Curated Workset for Analysis, Reuse, and Dissemination” (which we will refer to as “the HBW Workset”) is a new collection of resources within the HathiTrust Digital Library system based on roughly 1,400 novels drawn from the History of Black Writing Corpus and bibliography. The History of Black Writing project (HBW) is a research center that focuses on African American literary recovery; digital preservation, curation, and access; professional development and curriculum innovation; and public humanities programming. HBW curates an expanding database that serves as a research tool for a wide range of data-driven scholarship across the humanities. For 40 years, one of HBW’s main goals has been to promote diversity and inclusion in higher education. What follows is an Introduction to the HBW Workset, which was developed over a twenty-four-month period as a flagship model for the HTRC project “Scholarly Curated Worksets for Analysis, Reuse, and Dissemination” (SCWAReD), funded by the Andrew W. Mellon Foundation. A companion Project Report details both the course of our work in this effort, including three “spinoff” student projects, and some closely related non-SCWAReD instantiations of the larger HBW.

The HBW Corpus and bibliography constitute the only collection of its kind currently in existence and are continually expanding through the identification of new titles and the consolidation of other special collections. As part of HBW’s ambitious research agenda, which includes several initiatives, the HBW Corpus is directly associated with the Black Book Interactive Project (BBIP).1 This workset Introduction is divided into four sections: (1) HBW’s history and the formation, content, and presentations of the HBW Corpus; (2) the creation and application of the HBW Workset; (3) sample projects in progress; and (4) conclusions. The HBW Workset is an important addition to both the HathiTrust and HBW ecosystems, and a unique opportunity to apply new methods of research to a critical data source.

1. History of Black Writing: History, Formation, and Content

HBW began as a recovery and preservation project in 1983 at the University of Mississippi and has been instrumental in reintroducing thousands of Black-authored texts into the literary canon. The core of HBW’s work is its Corpus, which supports a wide range of teaching and research activities. As of December 2021 (as work on the collaborative SCWAReD project began), the HBW Corpus and bibliography consisted of approximately 4,000 identified titles of fiction by African American writers published since 1853. While works by major authors constitute an important component, most of the titles are by lesser-known and understudied authors. English is the dominant language, except for a few works by Black writers who originally published in other languages, especially French. A variety of genres are represented, including popular and young adult titles. The collection falls into the following periods: 1853-1900, American slavery through the post-Emancipation era; 1900-1960, the Great Migration, New Negro Renaissance, and post-World War II eras; 1960-2000, Civil Rights, the Black Arts Movement, and the Black Women’s Literary Renaissance; and 2000 to the present, Twenty-First Century Fiction. The Corpus is complemented by descriptive metadata, both general and specific, with special attention to race and difference. A radical increase in the writing and publishing of Black fiction, especially by Black women, occurred after the 1970s, just as Black fiction began to attract more specialized attention, accompanied by a significant body of criticism after the 1980s. Similarly, the steady growth of a Black consumer market has continued to generate new authors and new subjects for Black fiction since 1980. The goal of HBW is neither to distinguish between literary fiction and genre or popular fiction as commonly understood nor to judge the quality of a work.

The content of the HBW bibliography shows an increase in the quantity of publishing over time and greater sophistication and experimentation with techniques of modern and modernist writing (multiple plot sequences, stream-of-consciousness, non-linear time, and diversification of characters). The significant increase in reader-driven fiction, urban fiction, mystery, fantasy, speculative fiction, women-centered fiction, and erotica continues to draw the particular interest of scholars. The growth in quantity parallels the increase in the number of award-winning authors. Both highlight the need for more large-scale research on the works themselves and more immersive studies. While the term “Black imaginary” is identified primarily with the Black speculative movement in narrative fiction, film, and popular culture, fiction as “imagined reality” offers a window into a world shaped by a culture with a distinct legacy of Black storytelling about the past, present, and future as conceived by the authors. Black writing becomes an intervention in a social and intellectual landscape; the books themselves provide essential insights and conversations in extended form. Because most of the texts in the HBW bibliography have been inaccessible, and therefore underread and understudied, HBW’s goal in creating, maintaining, and digitizing the Corpus is to a) prevent the ongoing erasure of this crucial body of knowledge; b) ensure that our understanding of human history is as comprehensive as possible; and c) emphasize that what we know is only a tiny part of what exists. Given the exclusionary practices long associated with the preservation and production of knowledge by Black authors and about the Black experience, until the mid-20th century and the rise of the Black Studies Movement in higher education, new methods must be devised to disrupt traditions and practices that perpetuate such exclusions, whether intentionally or not. As we engage with the digital turn, these methods must be coupled with a process that Roopika Risam calls “decolonizing the digital humanities,” which HBW’s partnership with HTRC represents.2

The HBW bibliography has continued to grow in the last decade with new collections, including the Essence Best Sellers List, a project developed by Jacinta Saffold,3 consisting of fiction targeting Black women readers that appeared monthly in Essence Magazine from 1994 to 2010; the Kathleen Bethel4 “African American Fiction Summer List,” an annual posting of Black fiction published between 1990 and 2018; and the Detroit Public Library’s African American Booklist,5 a theme-based annotated bibliography and monograph published annually since 1986. Major libraries, including the Schomburg Center for Research in Black Culture (New York Public Library), the Moorland Spingarn Collection (Howard University), the John Hope and Aurelia E. Franklin Library (Fisk University), and the Charles L. Blockson Afro-American Collection (Temple University), are currently among the largest institutional holders of these titles, but their holdings are not sufficiently catalogued or described, which makes research more difficult for scholars who do not already know the names of individual texts or authors. Critical special collections, such as those of Dr. Joseph A. Pierce in San Antonio, Texas, and the Joanne Banks Collection at the University of Pennsylvania, have yet to be digitized.

The creation of a full-text HBW Corpus with nuanced and flexible search capabilities, hosted by the University of Chicago’s Textual Optics Lab, marked a major leap in both accessibility and digital sophistication. This version of the Corpus includes both openly accessible public-domain items (about 55 works as of this writing) and restricted-access items (an additional 1,100 works), and is available at https://textual-optics-lab.uchicago.edu/black_writing_corpus. More details are available in our Project Report.

2. The HBW Workset

Creation

Using the HBW Corpus bibliography of 3,144 volumes, HTRC staff searched publicly available HathiTrust Digital Library metadata, the Hathifiles,6 for volumes matching author-title pairs in the HBW list. The automated search looked for a match for each author-title combination using fuzzy matching, which tolerates variations in spelling. To improve the accuracy of fuzzy matching between the HathiTrust metadata and the HBW author-title list, author names and titles were first normalized. Regular expressions were used for this preprocessing, with common steps such as removing punctuation from author names and arranging them in “Surname, First Name” format. Additionally, because HathiTrust metadata inconsistently includes author birth and death dates in the name field, these dates were removed to match the HBW metadata.
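The normalization and matching steps described above can be sketched roughly as follows. This is an illustrative sketch, not HTRC's actual code: the source does not say which fuzzy-matching algorithm was used, so Python's standard-library `difflib.SequenceMatcher.ratio()` stands in as the similarity measure. The function names (`normalize_author`, `normalize_title`, `best_match`), the regular-expression patterns, the record shape (dicts with `author` and `title` keys), and the 0.85 threshold are all hypothetical choices made for the example.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical patterns: trailing life dates ("1858-1932.", "1901-", "b. 1901")
# and parenthetical fuller names ("(Charles Waddell)") in the author field.
LIFE_DATES = re.compile(r",?\s*(?:[bd]\.\s*)?\d{4}\??-?(?:\d{4}\??)?\.?\s*$")
PARENTHETICAL = re.compile(r"\([^)]*\)")
ARTICLE = re.compile(r"^(?:the|a|an)\s+")


def strip_accents(text: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    return "".join(c for c in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(c))


def normalize_author(name: str) -> str:
    """Reduce an author string to lowercase 'surname, first names' form."""
    name = PARENTHETICAL.sub("", name.strip())
    name = LIFE_DATES.sub("", name)          # remove birth/death dates
    if "," not in name and " " in name.strip():
        # Naive inversion: treat the last token as the surname.
        first, last = name.strip().rsplit(" ", 1)
        name = f"{last}, {first}"
    name = re.sub(r"[^\w\s,]", "", strip_accents(name).lower())  # drop punctuation
    return re.sub(r"\s+", " ", name).strip()


def normalize_title(title: str) -> str:
    """Lowercase a title; drop the subtitle or statement of responsibility,
    punctuation, and a leading article (a simplification that can over-match)."""
    title = strip_accents(title).lower()
    title = re.split(r"\s*[:/]\s*", title, maxsplit=1)[0]
    title = re.sub(r"[^\w\s]", "", title)
    title = ARTICLE.sub("", title.strip())
    return re.sub(r"\s+", " ", title).strip()


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(author, title, records, threshold=0.85):
    """Return (score, record) for the best-scoring record, or None if the
    best score falls below the threshold. Score = mean of author and title
    similarity after normalization."""
    want_author, want_title = normalize_author(author), normalize_title(title)
    best = None
    for rec in records:
        s = (similarity(want_author, normalize_author(rec["author"]))
             + similarity(want_title, normalize_title(rec["title"]))) / 2
        if best is None or s > best[0]:
            best = (s, rec)
    return best if best is not None and best[0] >= threshold else None
```

With these assumptions, an HBW entry such as "Charles W. Chesnutt" / "The Marrow of Tradition" and a catalog-style record such as "Chesnutt, Charles W. (Charles Waddell), 1858-1932." / "The marrow of tradition / by Charles W. Chesnutt." both normalize to the same strings, and a misspelling such as "Chesnut" still scores well above the threshold. Averaging the author and title scores is only one possible design; a stricter matcher might require each score to clear its own threshold.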

The results of this fuzzy-matching process were then re-checked, both automatically and manually, for false positives (items erroneously marked as “found”) and false negatives (items erroneously marked as “not found”). Titles not found were examined for false negatives caused by data formatting, and additional preprocessing was done as needed. This verified process yielded 1,999 matched volumes (about 64%) from the initial list of 3,144 titles. More details on this result, including comparisons with the lacunae revealed by SCWAReD sister projects, can be found in the overall SCWAReD gap-filling report.
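One common way to organize the kind of automatic-plus-manual re-check described above is to bin match scores into bands. The sketch below is an assumption for illustration, not a description of HTRC's workflow: the `Candidate` type, its `hbw_id` field, and the 0.95 and 0.75 cutoffs are hypothetical. High scores are accepted, a middle band is routed to manual review (where false positives tend to hide), and low scores are provisionally "not found" and can be re-examined for false negatives after further preprocessing. The `match_rate` helper also reproduces the arithmetic behind the reported figure: 1,999 of 3,144 is about 63.6%, which rounds to 64%.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    hbw_id: str   # hypothetical identifier for one HBW author-title pair
    score: float  # best fuzzy-match score in [0, 1]; 0.0 if nothing was returned


def triage(candidates, accept=0.95, reject=0.75):
    """Split candidates into auto-accepted, manual-review, and not-found bins."""
    bins = {"accept": [], "review": [], "not_found": []}
    for c in candidates:
        if c.score >= accept:
            bins["accept"].append(c)
        elif c.score >= reject:
            bins["review"].append(c)      # check by hand for false positives
        else:
            bins["not_found"].append(c)   # re-examine for false negatives
    return bins


def match_rate(matched: int, total: int) -> float:
    """Percentage of the bibliography found in the library."""
    return 100 * matched / total
```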

Application

SCWAReD has allowed us to provide a cohesive and robust online source for more focused research on the Corpus. The workset is its own collection that provides a functional environment for individual and collaborative research queries on Black fiction contained within the collection. While the workset cannot substitute for close reading of African American texts, it can provide a digital infrastructure and guide a set of practices for exploring a specific set of research questions.

The workset has been carefully curated as a starting point for researchers. It also serves an important purpose in identifying gaps in the HathiTrust Digital Library’s current holdings, using the HBW bibliography as the best available scholarly measure of the elusive idea of “completeness.” By identifying those gaps and attempting to add missing works to the Library, HathiTrust and its member institutions may dramatically improve access to resources that not only document a key component of Black literary production for 275 years but also establish a crucial archive by and about Black people, their lives and events as interpreted by those within that community.

3. Sample Projects

Much of the day-to-day work creating the HBW Workset was undertaken by its student staff in collaboration with HTRC staff. In addition to contributing their expert knowledge to a project intended to create a generalized, reusable resource, these early-career scholars also had specific research questions of their own, which they hoped to answer using computational methods and, if possible, data from the Workset they were engaged in creating.

Three students were most directly involved in creating the HBW Workset while also pursuing collaborative research projects with HTRC (and their own studies): Ashley Patricia Simmons, Jade Harrison, and Brendan Williams-Childs.

In the end, unfortunately, the HBW Workset presented here proved to be less than ideal for all three projects – but for reasons that are important and instructive. In the interests of transparency and lessons learned, we describe these projects briefly here, along with some reasons why the Workset was not appropriate to a particular research question. The students’ full project narratives are included in the accompanying HBW Project Report.

Simmons’s work focused on the theme of respectability in Black writing. While acknowledging that the term respectability was coined in late twentieth-century scholarship, she sought to use its characteristics as a “retroactive historical framework” to identify fictional texts from earlier periods that might illustrate or instantiate this modern concept. She initially sought to use the Voyant text analysis suite to identify the most frequently co-occurring terms in three known “respectability” texts as seeds for further searching within the HBW Workset, but when this approach failed, she had to rely instead on later scholarly judgment to assemble her corpus for further analysis. One lesson from the course of this project may be that some subtle questions of literary style and sociology simply benefit more from scholarly judgment than from computation.
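Voyant performs collocation analysis interactively; as a rough stand-in, the idea of finding terms that co-occur with a seed word can be sketched as a fixed-window count. This is not Voyant's implementation. The `collocates` function, its symmetric five-token window, the simple tokenizer, and the tiny stopword list are all assumptions made for illustration; a real analysis would use a fuller stopword list and normalize for word frequency.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not Voyant's list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "her", "his",
             "she", "he", "it", "that", "with", "for", "as", "on", "at", "by", "is"}


def tokenize(text):
    """Lowercase word tokens; apostrophes are kept inside words."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(texts, keyword, window=5, top=10):
    """Count non-stopword terms within `window` tokens of `keyword` across texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok != keyword:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                neighbor = tokens[j]
                if j != i and neighbor != keyword and neighbor not in STOPWORDS:
                    counts[neighbor] += 1
    return counts.most_common(top)
```

The top-ranked collocates from a handful of known texts could then serve as seed terms for searching a larger workset, which is the approach the paragraph above describes.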

Harrison’s research traced the use of Black Vernacular English in selected works by 20th-century women writers. She began by focusing on works known to have high concentrations of Black English, which led to her first impediment: most of these works were missing from the HBW Workset. This is a well-known problem of gaps in the HathiTrust collection, which SCWAReD attempted to address with only limited success, so she had to scan the texts herself and then convert and clean the data before using text-analytic tools. This difficulty was compounded when her research revealed a need to separate narration from direct speech.
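A first pass at separating narration from direct speech can be sketched with a regular expression, assuming (as an illustration, not a description of Harrison's method) that dialogue is enclosed in straight or curly double quotation marks. The `split_speech` name is hypothetical. This heuristic will misfire on multi-paragraph quotations where the closing mark is omitted, nested or single quotation marks, dialogue introduced by dashes, and unmarked free indirect speech, which is part of why the task proved difficult.

```python
import re

# A span opened by a straight or left curly double quote and closed by the
# next straight or right curly double quote (non-greedy, may span lines).
QUOTE = re.compile(r'["\u201c](.*?)["\u201d]', re.DOTALL)


def split_speech(text):
    """Return (speech, narration): lists of quoted and unquoted segments."""
    speech, narration, last = [], [], 0
    for m in QUOTE.finditer(text):
        between = text[last:m.start()].strip()
        if between:
            narration.append(between)
        speech.append(m.group(1).strip())
        last = m.end()
    tail = text[last:].strip()
    if tail:
        narration.append(tail)
    return speech, narration
```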

Williams-Childs’s work was impeded by even more significant gaps in the HathiTrust collections. It centered on comparisons of Black-authored fiction in trade and vanity publishing, but he found that the collection included practically no vanity publications at all (not surprising, given that academic libraries are its sole source). This meant that the HBW Workset would simply not be helpful for his particular research question. An important lesson from this experience is that scholars should acknowledge the essential incompleteness of the HathiTrust collection (despite its size), and probably of all collections. This is especially crucial when studying not only marginalized authors and genres (a particular focus of the SCWAReD project) but also works marginalized by the publishing, distribution, and library selection regimes that fed the HathiTrust collection.

While we believe that each of these projects might eventually have made more use of the HBW Workset, which was being created even as the students were pursuing their own research, in every case they would have required not only more time for experimentation on the part of the researchers, but also more resources and greater intentionality for digitization (on the part of HathiTrust’s member libraries) in order to fill gaps in the base collection. We are grateful that our student researchers pushed the limits of the HBW Workset, and hope that they, as well as future researchers, will continue both to push its limits and to find new uses for it.

4. General Conclusions

Creating the HBW Workset provided an important opportunity for curating and studying focused worksets of titles from historically under-resourced textual communities. Using HTRC’s existing methods demonstrated both the advantages and the limitations of the process. An obvious benefit is that more diverse collections in the HathiTrust Digital Library are highlighted and made available for non-consumptive research. At the same time, new, more specialized approaches will need to be developed to improve scholarly discovery, just as descriptive metadata will need to be revised to produce enough results to answer research questions. In each of the sample projects, both manual and computational approaches were required to generate data for the research questions. This is not a new observation, as Nicole Brown et al. confirmed in an earlier study, “In Search of Zora/When Metadata Isn’t Enough” (2019).7 Although various search methods were used, they were generally unable to surface the information and relevant titles associated with the research questions. Researchers must rely on their prior knowledge of texts for their research samples. The more diverse the collections within HTRC become, therefore, the more prepared we must be to respond to “low returns.” For research questions intended to tell us more about the public and private lives of communities who remained outside the majority culture, and especially for textual analysis of works produced by these communities, it is essential to acknowledge the need for a mixed-methods approach, a convergence of other data models, or the development of new or modified models.

Creating the HBW Workset helped us identify other challenges associated with creating reusable worksets. Curating and studying focused worksets, especially in African American literature, requires a much more rigorous search process if we are to learn more about culture and access, as well as the ways in which certain specific networks (independent Black publishing, Black magazines, book clubs, etc.) displace conventional forms of cultural authority. Searches of the HBW Workset alone do not allow us to unpack the complicated silences around the explosion of contemporary African American fiction in the last thirty years, an explosion that has brought in more readers and writers and provided new opportunities for research and public engagement.

The Black collections assembled for these sample graduate student projects were based on author and/or genre, but these turned out not to be the most reliable sources of information for researchers asking questions with limited knowledge of the authors or titles that happen to be in the collection. Collections on Black publishing companies, on books read by book clubs, or on books reviewed in Black print media, for example, which are continually being uncovered, are necessary and must become attendant collections in need of curation. Future work with HTRC could include the creation of smaller focused worksets based on such criteria. Moreover, digital studies research is only beginning to question long-held assumptions about what exists and what does not, and to act upon this knowledge. For example, one such assumption about the late 19th and early 20th centuries, roughly between 1877 and 1915, a period historian Rayford Logan called “The Nadir,”8 was that literary and cultural production was at an all-time low due to the intensity of racial oppression and political repression. What we have learned more recently is the opposite: there was an explosion in print culture, including an independent Black press, self-publishing, and the beginning of a period of expanding literary and cultural expression that would reach its peak in the New Negro Movement (1915-1930). Thus, recovery work must become a cornerstone for the continued development of information retrieval systems for Black literature and Black print culture more broadly. Correspondingly, Black print culture after the 1970s, especially contemporary Black fiction, is more heterogeneous than ever and aims to speak more directly to the lived experiences of Black people, which are multiple and contradictory. For the HBW Workset, the term “Black fiction” remains a useful classification, but its use must also account for similarity and divergence as important for understanding the development of a discrete literary tradition.

Using race as a signifier requires us to abandon fixed or accepted ideas of Black life and culture. From this perspective, the Black novel, which comprises the most significant component of the HBW corpus, presents a “new world novel,”9 i.e., a wholly new form of intellectual production. Over time, these texts reorder the perception and interpretation of older terms, including racism, nationalism, and culture, through the stories they tell. Explorations into how formerly invisible histories and cultures interact with the public domain come to the foreground. This new freedom has allowed for further experimentation that raises questions about marginalization within and across communities and about the reworking of identity. It opens the way to investigations into the lasting effects of slavery, colonialism, immigration, and migration, among other topics. In other words, the idea of “Blackness” as an identity or a construction requires us to explore alternative literary traditions and to use mixed modes of representation that often privilege memory, a staple especially of contemporary Black fiction.

Taken as a whole, the HBW Workset maps a distinct, diverse, and complex culture and history, marked by linguistic diversity and a fluid use of genre. These observations are based on traditional critical approaches to the literary corpus in recent years. Getting at the multimodal and multivocal responses to the worlds the authors occupied and engaged, their relationships with their readers, and their imaginations requires significant further refinement of our retrieval methods. As Elizabeth McHenry reminds us, formal written literacy, like education, was seen as a potential threat to maintaining Black subordination during enslavement, yet it quickly became an avenue for voicing demands as much as evidence of one’s humanity and of a sense of belonging to and equal participation in a nation,10 regardless of one’s legal rights. Thus, Black writing is a record of change as much as it is a guide for understanding those changes and imagining new realities.

Nevertheless, a consistent element within Black fiction is its emphasis on mobility or place, whether within the US or beyond. These “migrations” can be literal, as when one moves from the rural South to the urban North, or expatriates, leaving the US for places beyond its borders. Migration can also take the form of transmigration, the movement back and forth between global Black communities. Figurative migrations allow for constant engagement with a wide range of subject matter deriving from African American experience and culture, since writers are exploring and reexamining an untold or distorted history, or recreating one that has been eradicated.

Despite its limitations, the HBW workset is an essential intervention in the critical invisibility of these texts, a crucial complement to close reading, and our guide in pointing out new avenues for data-driven research. We often make such a statement, but it is only through expanding the possibilities for sharing knowledge more broadly that we can build more extensive networks of researchers and scholars with the specialized knowledge necessary to expand the scope of this work.

Ongoing challenges

The overall value of the HBW Corpus and bibliography lies in their attention to the range and breadth of Black fiction, which the curated workset makes clear. Known authors coexist with little-known authors, an important feature that reduces the bias associated with human judgment. Because of the popularity of Southern literature, Black subject matter has long been a shared focus of interest for Black and white authors, especially from the South, which has often made it difficult to determine the race of many of the authors we have identified. A significant challenge is the difficulty of tracking bibliographic information, given the self- and independent publishing that has been part of Black literary production from its beginning.

Possibilities

Recognizing the sustained popularity of Black subject matter on its own terms, well before Black authors gained access to mainstream publishing, is an important reason for deeper investigations into how Black writing has become increasingly identified with experimentation and innovation within writing traditions. The collection itself prompts a desire for more information about early organized activities that supported active writing and publishing. Several contemporary projects suggest possibilities for further use of the workset. Richard Jean So relied on the HBW bibliography for a selection of titles that led to his book Redlining Culture: A Data History of Racial Inequality and Postwar Fiction (2021). The book models traditional and data-driven approaches to its conclusions and serves as an extended model for Harrison’s and Simmons’s sample projects on Black women writers. Kinohi Nishikawa’s Street Players: Black Pulp Fiction and the Making of a Literary Underground (2018) offers a unique look into a collection that has been all but forgotten, serving as a singularly important model for understanding the intersection of readers, forms of popular culture, and unique publishing trends, which supports the approach that Williams-Childs takes in his study of vanity presses. Both books showcase the world of Black print culture in new ways. While these studies are more expansive than the samples provided here, increased accessibility of these works will encourage more in-depth quantitative and qualitative research into these under-discussed collections. Likewise, the American Antiquarian Society’s Summer Seminar, “Black Print, Black Activism, Black Study,” led by Derrick R. Spires and Benjamin Fagan in July 2022, points to a critical direction for exploring other significant intersections between Black print and Black activism, along with Black print practices themselves.

The existence of individual works of literature also raises the question of their relationship to multiple forms of print that have been more consistently associated with activism, including the pamphlet, the newspaper, and organizational records and documents. Likewise, attention to the single-authored text during consecutive periods of massive organizing within the Black community has often been lost in favor of more direct action and necessarily collective approaches to social change. The workset, therefore, returns us to a tradition and to the need to attend to a much too understudied legacy (once considered a vacuum of evidence) that has significant meaning for the contemporary period, when the writing of fiction has become such a valuable and heavily marketed enterprise. Granted, it will take many more studies to yield high-impact results, since the more results we have, the more reliable our conclusions can be.

However, what is presented here is an opportunity and an approach to analyzing fiction as an expressive function, which implies the creative uses of Black language in relationship to Black culture and Black agency. By rejecting the idea of “major” and “minor” writers and focusing on the published works themselves, through the HBW Workset we can begin to (re)construct, as rigorously as possible, how a written tradition takes multiple shapes over time. The HBW Workset offers a unique ability to foreground the frequently conflicted set of relationships that African American literature has with America’s literary traditions, even as it becomes the unacknowledged precursor for many of them. With the only workset focused exclusively on Black fiction, and with tools becoming increasingly available, we can begin to address the complex structure of African American fiction as both act and artifact as it continues to evolve. If we ask questions from the vantage point of the uniquely contextualized rootedness that this literature signals, deriving as it did from practices associated with a highly verbal people and culture, any use of this workset will reveal not only matters about the texts themselves but much more.

Notes

  1. BBIP is a research community associated with the Corpus that includes instruction in DH methods and funding for teaching and publishing. BBIP partners with The Textual Optics Lab at the University of Chicago, the HBCU Library Alliance, the College Language Association, and AFRO-PWW at the University of Illinois. Since 2016, funding has been provided by NEH and ACLS. 

  2. Roopika Risam, “Decolonizing the Digital Humanities in Theory and Practice,” in The Routledge Companion to Media Studies and Digital Humanities, ed. Jentery Sayers (2018), http://hdl.handle.net/20.500.13013/421. Risam is also the author of New Digital Worlds: Postcolonial Digital Humanities in Theory, Praxis and Pedagogy (Northwestern, 2019).

  3. Currently Assistant Professor of English, University of New Orleans, and a BBIP Scholar. 

  4. Bethel is African American Studies Librarian at Northwestern University with a long history of developing African American resource materials. 

  5. Under the direction of Jo Anne Montdowney, Executive Director, Detroit Public Library.

  6. https://www.hathitrust.org/member-libraries/resources-for-librarians/data-resources/hathifiles/ 

  7. Nicole Brown, Ruby Mendenhall, Michael Black, Mark Van Moer, Karen Flynn, Malaika McKee, Assata Zerai, Ismini Lourentzou, and ChengXiang Zhai, “In Search of Zora/When Metadata Isn’t Enough: Rescuing the Experiences of Black Women Through Statistical Modeling,” Journal of Library Metadata 19, no. 3-4 (2019): 141-162.

  8. See Rayford Logan, The Negro in American Life and Thought, 1877-1901, first published in 1954 and reprinted and expanded as The Betrayal of the Negro: From Rutherford B. Hayes to Woodrow Wilson (1970).

  9. Maryemma Graham, “Negotiating Memory: Nationalism, Globalism and the New World Novel,” in Transcultural Visions of Identities in Images and Texts, ed. Wilfried Raussert (Heidelberg: Universitätsverlag, 2008), 281-308.

  10. Elizabeth McHenry, Forgotten Readers: Recovering the Lost History of African American Literary Societies. (Durham: Duke University Press, 2002), 23.