Concerns about the ACM Digital Library

This post reports an email exchange among three people—Sándor Fekete, a long-standing member of the SOCG community, Wayne Graves, the Director of Information Systems at ACM, and myself—describing some concerns about ACM’s stewardship of its Digital Library. Sándor sent me the original message, which I forwarded to Paul Beame and Donna Cappo, who in turn forwarded it to Wayne, who kindly responded directly to everyone.

For the most part, I will let the emails speak for themselves, but I do want to emphasize two points.

First, this email was motivated by an exchange a few months ago, involving Sándor, Joe Mitchell, me, and various people at ACM, about several errors and inconsistencies in the Digital Library, many of which were well over a year old, and about the fact that SOCG 2013 was not yet indexed in DBLP nine months after the conference. Those issues have all been addressed.

But I also want to emphasize Wayne’s concluding “call for action”, explicitly requesting feedback to improve the Digital Library. Independently from any decision about the future of SOCG, this openness to community feedback is fantastic news for everyone.


Sándor Fekete:

Particularly important aspects in the discussion about the affiliation between scientific communities (such as Computational Geometry) and the ACM are the validation, impact and visibility of the corresponding literature. Clearly, these issues are of crucial importance for the career paths of scientists.

This has always been one of the calling cards of the ACM in the past: In principle, the ACM label has been perceived as a stamp of quality for publications, and the digital library has been promoted as enhancing impact and visibility of publications. In many cases, these promises have been strong enough to overrule other concerns – like cost, additional hassles for local organizers (which are always representatives of the community) and other aspects like copyright.

Unfortunately, experience over the years concerning the ACM DL has been below expectations.

  • The delays in having publications readily accessible have been agonizing; for example, the proceedings of SODA 2013 and 2014 are still not listed. [Actually, SODA 2013 is listed. —Jeff]
  • The overall appearance is cluttered, hard to scan, and in many cases organized in an ad-hoc manner (as visible by recurring typos that needed several rounds of fixing). For example, accessing proceedings volumes is only possible by scrolling through one huge top-level page that has no hierarchy; the search function only yields individual articles. Google from outside the DL works much better than search from inside of the DL.
  • There is no apparent effort to make (and keep) outside search engines up-to-date on ACM DL content. The database DBLP has become an invaluable resource not just for scientists, but also for hiring and tenure committees. (I see this all the time!) In many cases, it has taken an extraordinary effort (involving multiple emails and phone calls) to ensure inclusion of ACM proceedings in DBLP, undertaken by individual scientists, not ACM staff. (For example, this was the case with the proceedings of SoCG 2013.) For career decisions, committees tend to count what can readily be found.
  • The readiness of staff to pick up and respond to suggestions for improvement has been quite discouraging. (Examples upon request.)

Taking care of these matters should be guaranteed by ACM staff and electronic setup, not be left in the hands of scientists and organizers; having to work through ACM staff becomes a serious obstacle.  Thus, the ACM DL is in danger of becoming a liability, not an asset, given that there are alternatives that are easy to control and more reliable in providing service.


Wayne Graves:

The delays in having publications readily accessible have been agonizing;  for example, the proceedings of SODA 2013 and 2014 are still not listed.

The proceedings of SODA 2013 are in the Digital Library.  ACM does not handle the production of the SODA conference proceedings – it is handled by SIAM.  We recognize that SODA 2014 has fallen through the cracks and we have contacted SIAM’s production manager to work out a process for setting up an automatic feed for future years.  SODA ’14 is now in the Digital Library as well. [Actually, SODA 2014 is still not available. —Jeff]

The overall appearance is cluttered, hard to scan, and in many cases organized in an ad-hoc manner (as visible by recurring typos that needed several rounds of fixing).

We are always looking for feedback and it would be very helpful to understand the details here. Sándor, I would be more than willing to have a conversation with you on potential improvements.

For example, accessing proceedings volumes is only possible by scrolling through one huge top-level page that has no hierarchy;

There is a large proceedings listing: http://dl.acm.org/proceedings.cfm available within the DL.  This particular listing has changed over time and has evolved to a large flat listing based on feedback.  The use case here is that users wanted this simple listing to allow for “find on page” local browser functionality.  Each of the conference series headings allows the user to view an aggregated view of the entire conference series archive, ex: SODA – http://dl.acm.org/event.cfm?id=RE382

There are a few other browseable paths to conference proceedings within the DL:

  • Through the SIG, “Publications Archive” view, in this case it would be through SIGACT: http://dl.acm.org/sig.cfm?id=SP914, then click the [Publication Archive] tab.
  • There is a Conference listing page: http://dl.acm.org/events.cfm on which SODA is listed.  Clicking through on this link will take you to an aggregated view of the SODA archive back to 1990.  It appears that the SIAM site goes back to 2009.
  • There is a “recent and upcoming” conference listing that contains the past 12 months of conferences that can be filtered by the user and upcoming conferences that can be subscribed to via RSS.

the search function only yields individual articles.

The primary function of the current search engine is to return articles.  Once on an article page there is a link to the publication in which the article appears.  There are a variety of facets that enable additional levels of navigation to filter the result set by particular publications or types of publications or year…

The search also returns related journals or related SIGs when possible.

We are investigating a variety of new search solutions to enhance the current user experience.

Google from outside the DL works much better than search from inside of the DL.

Google does work very well.  We work directly with Google to enhance the metadata of all of the materials within the ACM DL to ensure that Google can continue to drive a significant amount of traffic to the DL.

There is no apparent effort to make (and keep) outside search engines up-to-date on ACM DL content. The database DBLP has become an invaluable resource not just for scientists, but also for hiring and tenure committees. (I see this all the time!) In many cases, it has taken an extraordinary effort (involving multiple emails and phone calls) to ensure inclusion of ACM proceedings in DBLP, undertaken by individual scientists, not ACM staff. (For example, this was the case with the proceedings of SoCG 2013.)

We do establish procedures and relationships with as many outside search engines as possible.  In some cases we allow for crawling and in others we create regular data feeds.

The readiness of staff to pick up and respond to suggestions for improvement has been quite discouraging. (Examples upon request.)

Again, it would be great to have a conversation about this — examples would be very helpful.

Taking care of these matters should be guaranteed by ACM staff and electronic setup, not be left in the hands of scientists and organizers; having to work through ACM staff becomes a serious obstacle.  Thus, the ACM DL is in danger of becoming a liability, not an asset, given that there are alternatives that are easy to control and more reliable in providing service.

Our goal with the ACM DL is to make sure all of these matters are attended to in a timely and professional manner.


Jeff Erickson:

Wayne —

Thanks for your quick and informative response.

Do I have your permission to post your response, or any part thereof, to my blog makingsocg.wordpress.com? As I told Paul and Donna, I am planning to post Sándor’s original message, but I would like to offer you the opportunity to respond publicly first.

Sándor is traveling and cannot respond quickly, but I can offer some further details on one point.

In many cases, it has taken an extraordinary effort (involving multiple emails and phone calls) to ensure inclusion of ACM proceedings in DBLP, undertaken by individual scientists, not ACM staff. (For example, this was the case with the proceedings of SoCG 2013.)

We do establish procedures and relationships with as many outside search engines as possible. In some cases we allow for crawling and in others we create regular data feeds.

These mechanisms broke down with the SOCG 2013 proceedings, which were only listed in DBLP in March of this year, thanks to Sándor’s efforts.

Joe Mitchell, Sándor Fekete, and I contacted ACM about several mistakes in the SOCG 2013 proceedings in the ACM DL (including broken DOI links, inconsistent spelling and capitalization, and even a few misspellings of “Symposium” and “Proceedings”), which Emily Eng fixed almost immediately.

However, when we brought up SOCG 2013’s omission from DBLP, her response was:

In regards to your DBLP question, you are correct, this is something you would need to address with them. ACM does not feed data to them, DBLP spiders the information from our site.

When Sándor contacted the administrators of DBLP, Michael Ley (the lead) responded immediately. Michael also reported that DBLP obtains its metadata by crawling the ACM DL; however, he said DBLP’s efforts are hampered both by limits on web crawling imposed by ACM (which other publishers such as IEEE do not impose on DBLP) and by the ad-hoc structure of the DL.

DBLP is extremely popular with the computing research community; in my experience, DBLP serves as THE public record for computer science publications (preprints, conferences, and journals) from all publishers (ACM, IEEE, SIAM, and others), with links directly to official publisher pages for each paper. This site is regularly used by hiring, tenure, promotions, and funding committees to independently verify publication records.

Given the popularity and importance of DBLP and the service that it provides both ACM and the community, we believe that ACM should be providing DBLP with a regular, direct, high-quality data feed. It should be ACM’s responsibility — not the responsibility of individual scientists like Sándor — to ensure that ACM publications are correctly indexed in DBLP in a timely manner, just as you already do with Google Scholar and (presumably) commercial indexing services such as SCOPUS and Web of Science.

Again, I welcome any further response. Please let me know whether I have your permission to post your response publicly.


Wayne Graves:

Jeff,

You can definitely use my comments in your post. I will also push forward on the DBLP concerns. There have been discussions in the past about DBLP taking advantage of the feeds from the DL and I will set up those conversations again. There may be other collaborative efforts that we, ACM, could look into with DBLP as well.

I appreciate all of the comments and concerns about the ACM DL. Please contact me with any other issues or ideas you have. In fact, along with problems and bugs, I would love to hear “wish list” functionality that could not only bring the ACM DL up to your expectations but reach beyond that and better serve the community. Can we work some kind of call to action like this into your blog posting?

About Jeff Erickson

I'm a professor of computer science at the University of Illinois, Urbana-Champaign and the chair of the steering committee for the International Symposium on Computational Geometry.

3 Responses to Concerns about the ACM Digital Library

  1. eppstein says:

    Relatedly and FWIW I found ACM to be very responsive a year or two ago when I reported some shortcomings in the BibTeX that they were generating. I still use MathSciNet data when I can — it’s even cleaner — but they’re a lot better in this respect than most other publishers I’ve checked.

  2. Noman says:

    Is this a typo: “tend to count want can”?
    Maybe “tend to count what can”?

  3. Pingback: For Me, It’s About Open Access | Making SOCG
