Present: I am a second-year PhD student at the Information School at the University of Washington, advised by Jevin D. West.
My research is funded by the Center for an Informed Public and falls broadly within Computational Social Science.
I study how AI is changing the institution of science, an emerging field I call the Science of AI-mediated Science.
I am interested in the question: Can AI do science? Can we create mini versions of AI agents that are homines silici, computational analogues of us humans, who can autonomously interact and collaborate with each other and do research like a PhD student does (minus the coffee dependency, hopefully ;))? Basically, I'm trying to figure out if an AI system could someday take over my job.
I also study scientific collaboration—because yes, I can have varied interests, and frankly, it's fascinating to study that! Teamwork and diverse perspectives fuel so much of what drives innovation in science.
I also focus on AI alignment and AI ethics. It might seem like a bit of a mixed bag, but trust me, it's all connected. How AI works, its sensemaking process, and the values embedded within it are all essential factors in developing better AI systems.
Past: I did my Master's at the Language Technologies Institute (LTI) at Carnegie Mellon University in the lovely (and chilly) Pittsburgh, where I was lucky to be mentored by Bhiksha Raj and Rita Singh on various directed research projects in the field of Speech Processing and Voice Forensics. I also completed a Master's thesis on "Characterizing Misinformed Online Health Communities" under the guidance of Kathleen M. Carley and David R. Mortensen. My Bachelor's was in Computer Science from Carnegie Mellon University in Qatar, where I was mentored by Ingmar Weber and Saquib Razak for my undergraduate thesis on "Lifestyle Disease Surveillance Using Population Search Behaviour." Before starting my PhD, I also worked as a Research Associate at New York University in Abu Dhabi. This experience resulted in several important projects of my career in the field of Science of Science, in collaboration with Bedoor AlShebli, Kinga R. Makovi, Talal Rahwan, and Wifag Adnan.
Future: I have been fortunate enough to find amazing and caring advisors and collaborators who took a leap of faith to support and guide me, collectively shaped my interests, and gave me opportunities to tackle interesting questions. I try to optimize for long-term collaborations and mentorship. I also often think about the implications of my research, and hope to eventually be able to make some difference in the world, a world where impact is not just measured by citations (though, hey, I wouldn't say no to a few of those either!).
I'm always open to collaboration and discussions. So don't hesitate to reach out if you would like to collaborate with me, or just talk about life. Prospective PhD applicants, especially those from underrepresented backgrounds, are more than welcome to email me about questions related to the application process or PhD experience. :)
Shahan Ali Memon,
Soham De,
Riyan Mujtaba,
Sungha Kang,
Nic Weber,
Bedoor AlShebli,
Jaime Snyder,
Jevin D. West
In Preparation
In this study, we investigate the ways in which AI systems interpret identity and competence markers in professional Curricula Vitae (CVs) and transform structured documents into photo-realistic headshots. Our objective is to qualitatively analyze how AI systems, such as ChatGPT, process identity and competence markers such as gender, experience, job titles, education, and skills, and translate them into fully realized portraits. We find that the AI system frequently generates masculine representations, regardless of the CV holder's actual gender, and often resorts to stereotypical associations, such as academics "wear glasses" or appear "kind and approachable." Through this visual elicitation exercise, we examine what traits AI selects or emphasizes in its generated portraits and how biases propagate between models in AI systems.
Shahan Ali Memon,
James Koppel,
Tom Hope,
Jevin D. West
In Preparation
Quoted by Nature News
"AI Scientist" is a multi-agent AI system built using large language models intended to automate scientific research. The preprint associated with this system has recently attracted significant attention across social and news media, positioning it among many similar AI-driven systems under development. In this commentary, we critically examine the AI Scientist, focusing on issues such as bias, plagiarism, and hallucinations in its generated papers and reviews. Additionally, we address the system's methodological shortcomings. We further examine key technological challenges with the recent academic efforts in this area, highlight important social, ethical, and epistemological implications of such efforts, and provide an agenda for future research and communication in this area.
Shahan Ali Memon,
Jevin D. West
Accepted for presentation at ICSSI (2024)
In Preparation
Slides (ICSSI)
The various processes of scientific research are increasingly being influenced by artificial intelligence (AI). Innovative tools are emerging to assist scholars in tasks such as hypothesis generation, literature review, data collection, experimentation, and writing. As these AI-driven technologies are integrated into research practices, they are fundamentally transforming the nature of scientific inquiry and the knowledge it produces. This evolving landscape has given rise to a new field of study known as the "Science of AI-mediated Science," which examines the impact of AI on the methodology and outcomes of scientific research.
Yueran Duan,
Shahan Ali Memon,
Bedoor AlShebli*,
Qing Guan,
Petter Holme*,
Talal Rahwan
Under Review
Preprint
Twitter thread
Postdoctoral training is commonly recognized as a challenging and intense period in one's career, where many talented PhD graduates encounter unforeseen circumstances that can impact their academic aspirations. Utilizing a specialized data set encompassing academic publications and career trajectories, we aim to comprehensively map out the varied outcomes of postdoctoral experiences.
Shahan Ali Memon,
Kinga Makovi*,
Bedoor AlShebli*
Nature Human Behaviour (to appear; IF:22.3)
Accepted for presentation at ICSSI (2023) **Best Paper Award**
Accepted for presentation at Frontiers of Network Science Workshop (2022 & 2023)
Accepted for presentation at IC2S2 (2022)
Recording (IC2S2)
Slides (IC2S2)
Slides (ICSSI)
Preprint
Code
Retracting academic papers is a fundamental tool of quality control when the validity of papers or the integrity of authors is questioned post-publication. While retractions do not eliminate papers from the record, they have far-reaching consequences for retracted authors and their careers, serving as a visible and permanent signal of potential transgressions. Previous studies have highlighted the adverse effects of retractions on citation counts and coauthors' citations; however, the broader impacts beyond these have not been fully explored. We address this gap by leveraging Retraction Watch, the most extensive data set on retractions, and linking it to Microsoft Academic Graph, a comprehensive data set of scientific publications and their citation networks, and to Altmetric, which monitors online attention to scientific output. Our investigation focuses on: 1) the likelihood of authors exiting scientific publishing following a retraction, and 2) the evolution of collaboration networks among authors who continue publishing after a retraction. Our empirical analysis reveals that retracted authors, particularly those with less experience, tend to leave scientific publishing in the aftermath of retraction, especially if their retractions attract widespread attention. We also uncover that retracted authors who remain active in publishing maintain and establish more collaborations compared to their similar non-retracted counterparts. Nevertheless, retracted authors generally retain less senior and less productive coauthors, but gain more impactful coauthors post-retraction. Taken together, notwithstanding the indispensable role of retractions in upholding the integrity of the academic community, our findings shed light on the disproportionate impact that retractions impose on early-career authors.
Bedoor AlShebli*,
Shahan Ali Memon,
James A. Evans,
Talal Rahwan*
Nature Scientific Reports (2024; IF:3.80)
Accepted for presentation at ICSSI (2023)
Accepted as a poster at IC2S2 (2023)
Paper
Slides (ICSSI)
Poster (IC2S2)
Twitter thread
Code
Fierce geopolitical tensions between China and the U.S. have led to policies that discourage cross-border collaboration and migration in the field of Artificial Intelligence. Despite this, our analysis of a dataset of 363,000 AI scientists and 5,400,000 papers shows that China and the U.S. have been leading the field since 2000 in terms of impact, novelty, productivity, and workforce. Significant bidirectional migration is observed, with both countries being primary destinations for one another. Collaborations between the two countries, while increasing, still represent a small fraction of their total productivity. Yet, we show that the two countries produce more impactful research when collaborating, suggesting that promoting cross-border collaboration and migration could benefit the field of AI.
Shahan Ali Memon,
Jevin D. West
Center for an Informed Public Rapid Research Blog
Presented at the Center for an Informed Public Meeting
Selected as a required reading for IMT 589: "Problematic Information" class at University of Washington.
Commentary
Preprint
Slides (CIP Meeting)
In this commentary, we discuss the evolving nature of search engines, as they begin to generate, index, and distribute content created by generative artificial intelligence (GenAI). Our discussion highlights challenges in the early stages of GenAI integration, particularly around factual inconsistencies and biases. We discuss how output from GenAI carries an unwarranted sense of credibility while decreasing transparency and sourcing ability. Furthermore, search engines are already answering queries with error-laden, generated content, further blurring the provenance of information and impacting the integrity of the information ecosystem. We argue that all these factors could reduce the reliability of search engines. Finally, we summarize some of the active research directions and open questions.
Shahan Ali Memon,
Saquib Razak,
Ingmar Weber
Journal of Medical Internet Research (2020; IF:7.08)
Accepted for presentation at the Population Association of America (PAA 2021)
Accepted for presentation at CMU Qatar Meeting of the Minds (MoM 2017)
Paper
Slides (PAA)
Poster (MoM)
Code
Recording (PAA)
Slides for Google Trends Denormalization
As the process of producing official health statistics for lifestyle diseases is slow, researchers have explored using Web search data as a proxy for lifestyle disease surveillance. Existing studies, however, are prone to at least one of the following issues: ad-hoc keyword selection, overfitting, insufficient predictive evaluation, lack of generalization, and failure to compare against trivial baselines. The aims of this study were to (1) employ a corrective approach improving previous methods; (2) study the key limitations in using Google Trends for lifestyle disease surveillance; and (3) test the generalizability of our methodology to other countries beyond the United States.
Shahan Ali Memon*,
Wenbo Zhao*,
Bhiksha Raj,
Rita Singh
International Joint Conference on Neural Networks (IJCNN 2019)
Paper
Slides
Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one. Current approaches for RvC use ad-hoc discretization strategies and are suboptimal. We propose a neural regression tree model for RvC. In this model, we employ a joint optimization framework where we learn optimal discretization thresholds while simultaneously optimizing the features for each node in the tree.
Shahan Ali Memon,
Kathleen M. Carley
International Workshop on Mining Actionable Insights from Social Networks (MAISoN) (in conj. with CIKM 2020)
Funded by Center for Machine Learning and Health (CMLH)
Paper
Slides
Data
Codebook
Recording
From conspiracy theories to fake cures and fake treatments, COVID-19 has become a hotbed for the spread of misinformation online. It is more important than ever to identify methods to debunk and correct false information online. In this paper, we present a methodology and analyses to characterize the two competing COVID-19 misinformation communities online: (i) misinformed users or users who are actively posting misinformation, and (ii) informed users or users who are actively spreading true information, or calling out misinformation. The goals of this study are two-fold: (i) collecting a diverse, annotated COVID-19 Twitter dataset that can be used by the research community to conduct meaningful analysis; and (ii) characterizing the two target communities in terms of their network structure, linguistic patterns, and their membership in other communities.
Shahan Ali Memon,
Aman Tyagi,
David R. Mortensen,
Kathleen M. Carley
International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation (SBP-BRiMS 2020)
Funded by Center for Machine Learning and Health (CMLH)
Paper
Slides
Public health practitioners and policy makers grapple with the challenge of devising effective message-based interventions for debunking public health misinformation in cyber communities. Framing and personalization of the message is one of the key features for devising a persuasive messaging strategy. For effective health communication, it is imperative to focus on preference-based framing, where the preferences of the target sub-community are taken into consideration. To achieve that, it is important to understand and hence characterize the target sub-communities in terms of their social interactions. In the context of health-related misinformation, vaccination remains the most prevalent topic of discord. Hence, in this paper, we conduct a sociolinguistic analysis of the two competing vaccination communities on Twitter: pro-vaxxers or individuals who believe in the effectiveness of vaccinations, and anti-vaxxers or individuals who are opposed to vaccinations.
Wenbo Zhao,
Yang Gao,
Shahan Ali Memon,
Bhiksha Raj,
Rita Singh
25th International Conference on Pattern Recognition (ICPR 2020)
Paper
Slides
In regression tasks, the data distribution is often too complex to be fitted by a single model. In contrast, partition-based models are developed where data is divided and fitted by local models. These models partition the input space and do not leverage the input-output dependency of multimodal-distributed data, and strong local models are needed to make good predictions. Addressing these problems, we propose a binary tree-structured hierarchical routing mixture of experts (HRME) model that has classifiers as non-leaf node experts and simple regression models as leaf node experts.
Hira Dhamyal,
Shahan Ali Memon,
Bhiksha Raj,
Rita Singh
Annual Conference of the International Speech Communication Association (INTERSPEECH 2020)
Paper
Can vocal emotions be emulated? This question has been a recurrent concern of the speech community, and has also been vigorously investigated. It has been fueled further by its link to the issue of validity of acted emotion databases. Much of the speech and vocal emotion research has relied on acted emotion databases as valid proxies for studying natural emotions. To create models that generalize to natural settings, it is crucial to work with valid prototypes: ones that can be assumed to reliably represent natural emotions. More concretely, it is important to study emulated emotions against natural emotions in terms of their physiological and psychological concomitants. In this paper, we present an on-scale systematic study of the differences between natural and acted vocal emotions.
Susan Dun,
Hatim Rachdi,
Shahan Ali Memon,
Yelena Mejova,
Ingmar Weber
International Journal of Sport Communication (2022; IF:1.59; Q-Index:Q2)
Accepted for presentation at the NCA 105th Annual Convention (2019)
Paper
We assessed the discussion around FIFA World Cup 2022 in the Twittersphere to shed some light on whether Qatar’s nation-branding and soft power attempts are reflected in public perceptions.
Navin Kumar, Isabel Corpus, Meher Hans, Nikhil Harle, Nan Yang,
Curtis McDonald, Shinpei Nakamura Sakai, Kamila A Janmohamed, Weiming Tang,
Jason L Schwartz, S Mo Jones-Jang, Koustuv Saha,
Shahan Ali Memon,
Chris Bauch,
Munmun De Choudhury,
Orestis Papakyriakopoulos,
Joseph D Tucker, Abhay Goyal,
Aman Tyagi,
Kaveh Khoshnood,
Saad Omer
BMC Public Health (2022; IF:3.98)
Paper
The purpose of this analysis was to detail the behavior of top Reddit users, posts’ relationship with events early in the vaccine timeline, and the relationship between subreddits that shared COVID-19 vaccine posts. Research questions are as follows: What is the behavior of top Reddit users with regard to COVID-19 vaccines (RQ1)? What are Reddit posts’ relationship with events early in the vaccine timeline (RQ2)? What is the relationship between subreddits that shared COVID-19 vaccine posts (RQ3)?
Shahan Ali Memon,
Hira Dhamyal,
Oren Wright,
Daniel Justice,
Vijaykumar Palat,
William Boler,
Bhiksha Raj,
Rita Singh
arXiv
Paper
Slides
Recording
Do men and women perceive emotions differently? Popular convictions place women as more emotionally perceptive than men. Empirical findings, however, remain inconclusive. Most prior studies focus on visual modalities. In addition, almost all of the studies are limited to experiments within controlled environments. The generalizability and scalability of these studies have not been sufficiently established. In this paper, we study the differences in perception of emotion between genders from speech data in the wild, annotated through crowdsourcing. While we limit ourselves to a single modality (i.e. speech), our framework is applicable to studies of emotion perception from all such loosely annotated data in general. Our paper addresses multiple serious challenges related to making statistically viable conclusions from crowdsourced data. Overall, the contributions of this paper are twofold: a reliable novel framework for perceptual studies from crowdsourced data; and the demonstration of statistically significant differences in speech-based emotion perception between genders.
Shahan Ali Memon,
Rohith Krishnan Pillai,
Susan Dun,
Yelena Mejova,
Ingmar Weber
International ACM Conference on Web Science (WebSci 2017)
Accepted for presentation at 2016 Qatar Foundation Annual Research Conference (QFARC)
Accepted for presentation at CMU Qatar Meeting of the Minds (MoM 2017)
Paper
Poster
Abstract
Is it possible to "hack" an image of an international entity by driving international and domestic media? Here, we present an image/brand monitoring tool for a country, Qatar, which presents an overview of the contexts and references to media in which it is mentioned on social media. Tracking dozens of languages, this tool allows a global understanding of the perceptions and concerns Twitter users associate with Qatar, and which mainstream media may be driving these sentiments.
Member: AI Ethics Advisory Board, Washington Office of Superintendent of Public Instruction (OSPI), 2023-2024
Journal Reviewer: eLife, 2023
Journal Reviewer: SAGE Communication & Sport, 2023
Journal Reviewer: Elsevier's Information Processing and Management Journal, 2021
Journal Reviewer: Journal of Medical Internet Research, 2020
Conference Reviewer: IEEE International Conference on Machine Learning and Applications, 2020
Moderator: SBP-BRiMS, 2020
Peer Health Advocate: Mental Health Advocate, Student Health Services, Carnegie Mellon University, 2016-2017
Co-founder/Co-designer: CMU Qatar Mindfulness Room, 2016-2017
Board Member: Academic Review Board and University Disciplinary Committee (ARB-UDC), Carnegie Mellon University, 2015-2017