GenAI – Defining Success

In my last two posts, I provided a Source Analysis of LLM output and discussed the Claims Made and Supporting Evidence Used by an LLM in its output. If you are just joining the conversation, what we have identified so far is that, like all forms of media, LLM output has a purpose and an intended audience. The primary purpose of an LLM is to benefit investors and shareholders, with customer or user experience as secondary. This is due to something called “fiduciary responsibility,” or the duty to act in the best interest of the shareholders in your product. It would not be in the shareholders’ interest to make a product people would not want to use, so there is benefit in factually correct or verifiably true output in order to have a product to sell. It is equally true that training an LLM to promote certain viewpoints or products is beneficial to shareholders.

How Accurate is AI according to an SME?

LLMs themselves are typically more honest than most CEOs regarding the quality of generated output. While companies on Twitter and LinkedIn promise amazing results from their AI-driven products, if you use the AI directly you are going to see something like this:

But What About…

You can improve your results a little if you know each model’s strengths. Different LLMs are better at different tasks. GPT gives a more accurate summary. Claude is the better writer. And Gemini is much better at reducing ambiguity or determining when there may be other equally true interpretations of the same text.

While each has its strengths, the truth remains that AI is just an emerging science. You can try what you like, but at the end of the day, you should expect 80% accuracy. But what about… AI Engineers, Prompt Engineers, RAG Pipelines, Multi-Shot prompts, Chain-of-Thought….

Prompt Engineering

This is how you get to that 80% near perfect association. Learning to speak the same language as the LLM is definitely the first step, and once you can define a clear rubric (success criteria) in your prompt, you are going to see improved results. Remember it is probabilistic, so clear, concise requirements are easier to understand.

Prompt engineering does work and provides significantly improved results. But to make prompt engineering sound like something fancy for a resume, we provide a lot of fancy names to describe methods we use to improve the output. You have already done this if you have used AI to write a resume or reformat a list. So here is the simplified version of the most popular methods:

  • RAG (Retrieval-Augmented Generation) Pipelines -> You break the foundational information your tool needs into smaller vectors or pieces and store them so that they can be accessed when generating the reply. In the simplest form, this is like providing a copy of your resume to have GPT write a more accurate cover letter. In a more complicated form, it is the data equivalent of an old-fashioned encyclopedia with the information broken down into books by letter, so you only take the one book you need when trying to look up something more specific.
  • [Single/Few/Multi]-Shot Prompting -> You give the prompt one or more examples of inputs and outputs for more context. This is kind of like showing a student a worked example of the problem before asking them to do it themselves.
  • Chain-of-Thought Prompting -> This is a fancy way of saying “explain your reasoning.” It involves breaking a big task into smaller steps or, if provided in one prompt, having the LLM explain its reasoning. This helps to prevent skipping steps or losing the logic.
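
To make the last two methods concrete, here is a minimal sketch of how worked examples and an “explain your reasoning” line can be assembled into one prompt. The function name, the "Input:/Output:" layout, and the sample task are all assumptions made for illustration; the code only builds text and does not call any model.

```python
def build_prompt(instruction, examples, new_input, explain_reasoning=False):
    """Assemble one prompt string from an instruction, worked examples, and a new input.

    All names here are hypothetical; this only builds text and calls no model.
    """
    parts = [instruction.strip()]
    # Few-shot: each (input, output) pair is a worked example for the model to imitate.
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # Chain-of-thought: ask for the reasoning before the final answer.
    if explain_reasoning:
        parts.append("Explain your reasoning step by step, then give the final output.")
    # The new item goes last, with the output left open for the model to complete.
    parts.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(parts)


prompt = build_prompt(
    instruction="Reformat each name as 'Last, First'.",
    examples=[("Ada Lovelace", "Lovelace, Ada"), ("Alan Turing", "Turing, Alan")],
    new_input="Grace Hopper",
    explain_reasoning=True,
)
print(prompt)
```

Passing an empty list of examples gives a plain zero-shot prompt, so the same helper can be used to compare how much the worked examples actually change the output.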

All of these actually do work to improve your output, and you have probably already employed all three of these examples in your daily interactions with AI. So why is the prompt not giving 100% accuracy?!

Training

To oversimplify, an LLM is a predictive model. Given all of the other words in the sentence, whatever data they have used for training, and what data they have collected about you, the model calculates the most likely next word or phrase to output. In the literature on training ML models to score students’ short-answer questions or essays, 80% agreement (when corrected for chance agreement) is considered near perfect (Nehm & Haertig, 2012). Those metrics are still largely used today (e.g. Zhai et al., 2021). It is not a bad measure to choose when you consider the amount of training it requires to have human raters score at that level of agreement, and sometimes the machine-to-human agreement is higher than human-to-human agreement (e.g. Maestrales et al., 2021). So if two humans only agree on the score of a single sentence 70-90% of the time, that is the quality of the data we use to train ML or AI.
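
For readers who want to see what “corrected for chance agreement” means in practice, here is a minimal sketch. It assumes the statistic is Cohen’s kappa, which is a common choice for two raters but is my assumption here, not something stated above. The scores are made-up illustration data, not results from any study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: how often they would match by chance, given how
    # frequently each rater uses each score.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical score for every item.
    return (observed - expected) / (1 - expected)


# Hypothetical scores (1 = correct, 0 = incorrect) for ten student responses.
human_scores = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
machine_scores = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

raw_agreement = sum(h == m for h, m in zip(human_scores, machine_scores)) / len(human_scores)
kappa = cohens_kappa(human_scores, machine_scores)
print(f"raw agreement: {raw_agreement:.2f}, kappa: {kappa:.2f}")
```

In this made-up example the two raters agree on 8 of 10 items, but because half of that agreement would be expected by chance, the corrected value comes out near 0.6. That gap is why the chance-corrected number is the one to watch.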

Environment

If I am logged into my work account, ChatGPT will have a different set of assumptions than if I am logged into my personal account. If I am automating through the API, the results are different from both. Switching across different LLMs, different models of the same LLM, or using the same model of the same LLM before and after an update can mean new prompt adjustments.

If I use the playground to build a prompt and add in the tools this agent or prompt is allowed to access, automating that prompt through the API fails to limit to those specified tools. This will yield different than anticipated results when deploying the system.

Even the same prompt in the same model in the same window will have different outputs when run multiple times. Before and during Black Friday one year, I ran the same prompt more than 100 times and recorded the failures in format or response, demonstrating that traffic played a significant role in the returned output. Testing the automated scoring of students’ written responses to test questions, I would often find different scores for each trial. Some responses were less ambiguous and scored easily, and others were different every time.
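
A repeated-run check like this is easy to tally once the outputs are collected. The sketch below is a hypothetical helper, not the script I used: it takes a list of outputs from running one prompt many times, plus a format check you supply, and reports the failure rate and how often the most common valid answer appears. The sample outputs are invented.

```python
from collections import Counter


def summarize_runs(outputs, is_valid):
    """Summarize repeated runs of one prompt: format failures and answer consistency."""
    total = len(outputs)
    if total == 0:
        raise ValueError("Need at least one output to summarize.")
    # Keep only outputs that pass the caller's format check.
    valid = [output for output in outputs if is_valid(output)]
    failure_rate = (total - len(valid)) / total
    if valid:
        # Consistency: share of valid runs that returned the most common answer.
        top_answer, top_count = Counter(valid).most_common(1)[0]
        consistency = top_count / len(valid)
    else:
        top_answer, consistency = None, 0.0
    return {
        "runs": total,
        "failure_rate": failure_rate,
        "top_answer": top_answer,
        "consistency": consistency,
    }


# Invented outputs from five runs of a scoring prompt that should return a digit.
sample_outputs = ["3", "3", "2", "three", "3"]
summary = summarize_runs(sample_outputs, is_valid=str.isdigit)
print(summary)
```

Running the same tally at different times of day, or before and after a model update, gives a simple way to compare environments on the same prompt.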

Context Windows

I will just add more instructions and more context!

This works to a point. But there is a bit of a parabolic performance curve when it comes to the number of tokens (roughly, how many words) you are inputting. Too few and you don’t have enough context, but too many and the model starts skipping instructions. You can max out your inputs, but the model will summarize the instructions and content and decide for itself which to follow.

Similarly, continuing the chain-of-thought in a single conversation increases the number of tokens. You have probably already noticed that once your chat reaches a certain length, the quality decreases rapidly. This is usually after about 3 or 4 outputs.

AI-Driven QC

So let’s use more AI to fix the AI!

This is an expenditure with diminishing returns. The capacity for a fix depends largely on the reason for the errors. A hiccup in the server might be easily fixed, but a lack of data or fundamental misunderstanding will not be resolved in the next pass using the same LLM. Using multiple different agents or LLMs can be very helpful when one excels over the other in different areas.

Unfortunately, it is also impossible to know where the errors are occurring. If you assume there is a random failure rate of at least 10% on each task, where is that 10% in the generation process, and where is it in the classification or editing? Are the errors on the same items? How many bad items are misclassified as good?
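
To put rough numbers on those questions, here is a back-of-the-envelope sketch. It assumes two things that are almost certainly too generous in real systems: that the reviewer’s mistakes are independent of the generator’s mistakes, and that the reviewer is equally accurate on good and bad items. The function name and the 90% figures are illustrative only.

```python
def review_outcomes(gen_accuracy, review_accuracy):
    """Split items into four outcomes for a generate-then-review pipeline.

    Assumes reviewer errors are independent of generator errors and that the
    reviewer is equally accurate on good and bad items (a simplification).
    """
    good_approved = gen_accuracy * review_accuracy
    good_rejected = gen_accuracy * (1 - review_accuracy)
    bad_approved = (1 - gen_accuracy) * (1 - review_accuracy)
    bad_rejected = (1 - gen_accuracy) * review_accuracy
    approved = good_approved + bad_approved
    return {
        "good_approved": good_approved,
        "good_rejected": good_rejected,
        "bad_approved": bad_approved,
        "bad_rejected": bad_rejected,
        # Share of everything the reviewer approved that is actually bad.
        "bad_share_of_approved": bad_approved / approved if approved else 0.0,
    }


outcomes = review_outcomes(gen_accuracy=0.90, review_accuracy=0.90)
for name, value in outcomes.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, a 90% generator checked by a 90% reviewer still approves about 1 bad item in every 100, and it also throws away about 9 good items in every 100. If the two models tend to fail on the same items, the independence assumption breaks and the bad-but-approved share gets worse, which is exactly the part you cannot see without human review.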

In one experiment I performed, I used AI to do a writing task. The instruction was to write an answer to a specific question and explain its reasoning based on the data provided, or simply state that there was not enough information. A second round read what was written and either approved the response, rewrote the response, or stated that there was not enough information. Then a third round of AI decided between the first two responses if they were different. It was to decide which was better or state there was not enough information to write a response. Of the cases where the first and second responses were both written but were different, the third round decided there was not enough information in almost 50% of those cases. There is no way of knowing which is correct without human review.

Burden of Responsibility in Use

So from a subject matter expert who has automated systems at scale… we are quite a long way from no humans in the loop if you need high accuracy in your content. For low-stakes writing tasks, automation is possible, but for more detailed or structured content, we have a long way to go. For higher-level science and math, in-depth reasoning tasks, the building of learning progressions, and larger-scale lesson planning, the LLM output is just not quite there.

While we are replacing a lot of SMEs with automation engineers, it is really important to have this conversation. The way the engineering team feels about the one-off vibe-coded scripts cobbled together by the content team is exactly how we feel about the vibe-coded content LLMs write. Any higher-level analysis and science texts output by the tech team are about the same quality as the vibe-code the boss’s nephew is trying to use to sneak his way onto the engineering team during the summer internship.

We still need people who are able to review the output, verify the accuracy, and review high-stakes generation. And now more than ever, we need real experts. As the technology gets closer and closer to generating something correctly, it becomes more difficult for a non-expert to identify the issues.

Where Does Responsibility Fall?

The responsibility for truth falls on the end user, the consumer of the AI product. We use RAG models, good data, agentic systems, etc. to improve the accuracy as best we can. But it is not perfect, and someone has to decide what is “good enough.” If I create a tool that uses an LLM to perform a task and sell this AI-driven product to you, I still sell that product with the burden of verification falling finally at your feet because it is tailored to your need.

With luck my data pipeline integration will improve your output by reducing hallucinations or automating a process you could not before, but the final review of each output still falls on the last user in the chain. I will use AI knowing it is about 80% accurate, and I will sell my product with the same accuracy warning as the original agent.

As an adult selling to adults, passing that responsibility off to the consumer is a little different than passing the end product over to children with no humans or expert reviewers. Are children then the final end user, responsible for knowing if what the adults teach them is true? Where is this line being drawn in an AI-first, tech-driven culture?

Hallucination or Training

But what are those errors? Are they randomly distributed? Are they based on the training data? Do they matter? We will talk about that in my next post.

Galicia, How I Love You

It is impossible to run out of beautiful things to see in Galicia. I have been to so many places around the world, and I have never seen any so incredible in all of my life. It is not one part of one city, it is not one place. It is Galicia. Here she is lush, and dense, and green. So. Green. There is a surreal sensation being in this place that you cannot know until you have lived it. A sense of timelessness. Hundreds and hundreds of years of people loving this same place.

A moonlight view of the Cathedral de Santiago de Compostela from the lighted fountain in el Parque Alameda

For over 1200 years, Pilgrims have been making the Camino to Santiago de Compostela. In summer, the Praza do Obradoiro is filled with the awed faces of travelers seeing the cathedral for the first time.

This is a land where the fairy tales are real, where the magic never left, and where las meigas still walk barefoot in the streams and waterfalls. This is the place you imagined as a child. The moon smiles down on your nights like the Cheshire Cat in Wonderland. Mornings you awake with clouds as your pillow, the fog settled around sleepy mountain pueblos keeping them safe and unseen through the darkest hours of the night. The oldest, tallest trees in the forest are covered in soft mosses. They guard the clouds until mid-morning and let them go slowly in rivulets of steam. A morning prayer to the sun like little dancing forest spirits. By afternoon, the mica sparkles like diamonds in the stones and the sand. Leaving the beach at the end of a long afternoon, your body shimmers like fairy fire in the late afternoon sun. Springheads in the mountains become waterfalls as rivers flow to the ocean.

Living here comes at a price.

Build what you wish, but Galicia retakes what is hers and makes it green again in time. Galicia does not belong to men. She cannot.

Palomera covered in vines.
Old paper factory reclaimed by moss and trees.
Old stairwell covered in moss and vines.

It rains for months. From October to March, there is rain. This is how she stays so green, so mossy, so strong. Those magical mountaintop springs must come from somewhere. Sunny days are so rare that they are always accompanied by the sound of laundry rustling in the breeze. Yes really, it rains so much we have synchronized laundry days, meigas, mothers, and Pilgrims alike. Maybe where you live you have wet and dry, but here we have wet, dry, and “it has been wet so long it won’t dry.”

Escaped umbrella on a windy day.
Three broken umbrellas in a trash can because Galicia is cold and wet and windy and wet and windy.
Two broken umbrellas and a suitcase because Galicia is traveling and rain.

To stake your claim in any place that belongs to her, you must fight. The ground is covered in thorned, creeping tentacles reaching further and further each day. The plants here sting and bite. Toxos grow anywhere the ortigas do not and the moras grow in between. The meaning of the Galician name for rascacú is where it scratches you while walking in the forest. There are plants with burning fibers, plants that look like the ones that burn, scratching plants, plants with big needles, plants with little needles, plants with thorns, plants with big thorns, plants with small thorns, and plants with big hidden thorns. I have probably missed a few. Today I asked my friend the name of a plant in a garden that we passed. He told me if it doesn’t have thorns he does not know.

Another biting pokey plant of Galicia, but I don’t know what it is called.
Spider webs on toxo death spines.

On the left there is a plant that is covered in spines; I don’t know what it is called. Above and below are spiders trying to escape the rain on toxo thorns.

A whole field of spiders and pokey doom spikes.
Toxos, Ortigas, and Moras growing together in a stinging pile of doom. Really, don’t walk barefoot in Galicia.
Yes, this is actually a thorny toxo bush with its branches stretching into stinging nettles and blackberry brambles. Galicia is not messing around. Wear shoes.

Parks even have to close during storms because no one wants to experience falling castañas. These will pop your bike tires, and do not cease to be sharp after 6 months on the ground in the rain. But they are the lifeblood of Galicia. There are festivals in every city here to celebrate the harvest. Magosto is a deeply rooted Galician celebration of the chestnut harvest, with some saying it is rooted in Roman or Celtic traditions such as Samhain. Castañas have been a staple here since before anyone thought to write it down. They are hardy trees, the nuts are packed with nutrients, and their harvest comes after the heavy rains. It is easy to imagine how cherished this annually occurring natural windfall would be, especially in years where drought or rain might impact other crops. The festival coincides with All Saints Day, making it a celebration of harvest, death, honoring ancestors, and celebrating the coming winter months all at once.

Green castañas on the tree.
Spiky death needle nuts on a tree.
Castañas without the needle-covered shell roasting on a flame.
Naked death needle nuts roasting over the fire.

And this week marked the beginning of my adventures for the year! There was sun. Beautiful, glorious, soul replenishing sun. The flowers are emerging. Galicia is green and yellow and beautiful.

I have been absent from writing about my adventures for a while. I took up a contract that would allow me to save enough money to spend the next two or three years traveling, writing, and working on a secret project I am very excited about. It was too many hours each week in front of a computer, but I went on many adventures to share with you. I just need to post them. I plan to back fill those adventures from here as well as add new ones this season.

GenAI – Claims and Evidence

In my last post, GenAI – A Source Analysis, I discussed the importance of using a critical approach to assessing the product being consumed by identifying the purpose, the point of view expressed, and the audience. In this post, I am going to provide an example of why we must critically analyze the intent of the corporations producing the output we use.

I recently encountered an alarming example of a biased or misleading output related to constitutional rights and powers in the United States. At first I thought it might be a hallucination, but after resetting the chat, trying again, and continuing to discuss the reasoning, I believe this to be an example worth discussing in the context of source and purpose.

Gemini output stating that the best correct response to a question about the constitutionally granted powers of the presidency is actually an illegal act or a usurpation of power.

Again, I refuse to discuss AI adoption as a binary issue with a qualitative classification of bad or good, so I am setting the rules of engagement with high school AP Historical Thinking Skills. This time, we engage Skill 3: Evidence in Sources.

Skill 3: Claims and Evidence in Sources

Analyze arguments in primary and secondary sources.

  • 3.A Identify and describe a claim and/or argument in a text-based source
  • 3.B Identify the evidence used in a source to support the argument.
  • 3.C Compare the arguments or main ideas of two sources
  • 3.D Explain how claims or evidence support, modify, or refute a source’s argument.

The other day I was using Gemini to analyze a question related to the Commander in Chief powers held by the president. I asked Gemini to determine if the question had a single best correct response. The question was written, reviewed, and approved by 2 other LLMs as having A as the single best correct response. Generally speaking, Gemini significantly outperforms the other two models on this task, which was the reason for this test comparison.

The question to be analyzed was “Which statement demonstrates how presidential Commander in Chief powers enable policy implementation without congressional approval?” Four options were presented, but we will only discuss the two relevant choices here. The correct response was, “A. Presidents can act defensively without congressional approval in military operations.” And another presented response was a surprising source of confusion for Gemini, “D. Presidents can deploy troops internationally without congressional approval in peacekeeping missions.”

Option D was an intentionally tricky distractor, but A is the factually correct response. In this case, Gemini determined option D, peacekeeping missions, to be the single best response to this question.

3.A Identify and describe a claim and/or argument in a text-based source

Google Gemini claims that the president’s ability to deploy troops internationally without congressional approval in peacekeeping missions is a better demonstration of how the powers of Commander in Chief allow the president to implement policy without congressional approval than the constitutionally defined power of the Commander in Chief to act defensively without congressional approval.

Gemini initially argues this is because defense is reactive and does not involve actively implementing a policy, whereas peacekeeping is a proactive endeavor. Gemini defends the claim further, stating that “By using their Commander in Chief power to deploy troops without asking Congress for a vote, they are unilaterally implementing that policy.”

Initially I assumed this to be a hallucination. The only evidence provided by Gemini was in the interpretation of reactive and proactive actions. I prompted further to understand the logic behind the response. When pressed, Gemini acknowledges that the War Powers Resolution of 1973 forbids the president from sending troops except in the event of a declaration of war passed by Congress, Specific Statutory Authorization passed by Congress, or a National Emergency created by an attack upon the United States.

Despite acknowledging the act is illegal, Gemini continues to argue that the Commander in Chief powers allow the president to send troops into peacekeeping missions without congressional approval. Gemini now presents the argument, “It comes down to how the Presidents bypass the law to implement policy.”

3.B Identify the evidence used in a source to support the argument.

To support the new argument, this time Gemini presents evidence of Presidents choosing to bypass the laws by claiming the peacekeeping mission is “Not a War.” Gemini says that President Clinton sent troops to Bosnia/Kosovo and Obama sent troops to Libya claiming these do not count as hostilities or war in the constitutional sense.

3.C Compare the arguments or main ideas of two sources

In this particular case, no one historical text would be quite sufficient to argue the constitutional powers granted to the president as there have been subsequent amendments and legal interpretations over the last 250 years, so I will present first a series of quotes and summaries covering the original United States Constitution and some of the later documents with their purpose and interpretations.

Historical Texts for Reference

The US Constitution Article II Section 2 describes the original text of the Presidential Power held as Commander in Chief:

“The President shall be commander in chief of the Army and Navy of the United States, and of the militia of the several states, when called into the actual service of the United States; he may require the opinion, in writing, of the principal officer in each of the executive departments, upon any subject relating to the duties of their respective offices, and he shall have power to grant reprieves and pardons for offenses against the United States, except in cases of impeachment.”

Article I Section 8 of the US Constitution describes the original text of the Congressional Powers related to calling forth the militia:

“To provide for calling forth the militia to execute the laws of the union, suppress insurrections and repel invasions;

To provide for organizing, arming, and disciplining, the militia, and for governing such part of them as may be employed in the service of the United States, reserving to the states respectively, the appointment of the officers, and the authority of training the militia according to the discipline prescribed by Congress.”

Under the original text of the Constitution, the President commands the militia only when it is called into the actual service of the United States, and may require a written opinion from the principal officer of each executive department. The Constitution also gives Congress the power to declare a war, raise and support armies, regulate the use of militia, and other related powers. Unfortunately, some of the powers are loosely worded for both Congressional and Presidential responsibility, so it becomes important to include some of the text of acts that have expanded or reduced that power since the Constitution was originally written.

The Insurrection Act of 1807, for example, hands the Congressional Power defined in Article I Section 8, of sending the militia to quell internal rebellions, over to the President and expands the definition of State to expand the presidential authority over territories.

The Insurrection Act of 1807: Sec. 332. Use of militia and armed forces to enforce Federal authority

“Whenever the President considers that unlawful obstructions, combinations, or assemblages, or rebellion against the authority of the United States, make it impracticable to enforce the laws of the United States in any State by the ordinary course of judicial proceedings, he may call into Federal service such of the militia of any State, and use such of the armed forces, as he considers necessary to enforce those laws or to suppress the rebellion.”

The Insurrection Act of 1807: Sec. 333. Interference with State and Federal law

“The President, by using the militia or the armed forces, or both, or by any other means, shall take such measures as he considers necessary to suppress, in a State, any insurrection, domestic violence, unlawful combination, or conspiracy, if it–

(1) so hinders the execution of the laws of that State, and of the United States within the State, that any part or class of its people is deprived of a right, privilege, immunity, or protection named in the Constitution and secured by law, and the constituted authorities of that State are unable, fail, or refuse to protect that right, privilege, or immunity, or to give that protection; or

(2) opposes or obstructs the execution of the laws of the United States or impedes the course of justice under those laws.

In any situation covered by clause (1), the State shall be considered to have denied the equal protection of the laws secured by the Constitution.”

Several acts have expanded the presidential power over the military. The Insurrection Act of 1807 expanded the definition of State to include the territories of Guam and the Virgin Islands. The National Defense Act of 1916 and subsequent amendments and revisions expanded the power over more recently acquired territories.

The War Powers Resolution of 1973, on the other hand, was intended to limit presidential powers and prevent another Vietnam-like situation. In the Vietnam War, no formal declaration of war was made by the United States, although Congress did approve the use of troops over the fear of communist ideology spreading. Congress allowed Presidents John F. Kennedy and Lyndon B. Johnson to increase the military presence in Southeast Asia without declaring a war, creating a military involvement of more than 20 years. Statistics from the Department of Veterans Affairs show 3,403,000 soldiers being deployed to Southeast Asia. This led to more than 58,000 US soldiers dead and more than 185,000 injured. American citizens and veterans were rightfully frustrated, and Congress determined to limit the presidential authority in situations where hostility was imminent:

“(a)Congressional declaration

It is the purpose of this chapter to fulfill the intent of the framers of the Constitution of the United States and insure that the collective judgment of both the Congress and the President will apply to the introduction of United States Armed Forces into hostilities, or into situations where imminent involvement in hostilities is clearly indicated by the circumstances, and to the continued use of such forces in hostilities or in such situations.

(b)Congressional legislative power under necessary and proper clause

Under article I, section 8, of the Constitution, it is specifically provided that the Congress shall have the power to make all laws necessary and proper for carrying into execution, not only its own powers but also all other powers vested by the Constitution in the Government of the United States, or in any department or officer thereof.

(c)Presidential executive power as Commander-in-Chief; limitation

The constitutional powers of the President as Commander-in-Chief to introduce United States Armed Forces into hostilities, or into situations where imminent involvement in hostilities is clearly indicated by the circumstances, are exercised only pursuant to (1) a declaration of war, (2) specific statutory authorization, or (3) a national emergency created by attack upon the United States, its territories or possessions, or its armed forces.”

And it goes on in Section 8 of the War Powers Resolution of 1973 to specifically prevent the use of other Congressional acts to be interpreted as implied consent for military engagement.

“SEC. 8. (a) Authority to introduce United States Armed Forces into hostilities or into situations wherein involvement in hostilities is clearly indicated by the circumstances shall not be inferred—
(1) from any provision of law (whether or not in effect before the date of the enactment of this joint resolution), including any provision contained in any appropriation Act, unless such provision specifically authorizes the introduction of United States Armed Forces into hostilities or into such situations and states that it is intended to constitute specific statutory authorization within the meaning of this joint resolution; or
(2) from any treaty heretofore or hereafter ratified unless such treaty is implemented by legislation specifically authorizing the introduction of United States Armed Forces into hostilities or into such situations and stating that it is intended to constitute specific statutory authorization within the meaning of this joint resolution.”

The Posse Comitatus Act, first passed in 1878 and extended to the Air Force in 1956, limits the use of military troops to execute the laws to what is expressly authorized by the Constitution or Congress and provides a penalty.

“Whoever, except in cases and under circumstances expressly authorized by the Constitution or Act of Congress, willfully uses any part of the Army, the Navy, the Marine Corps, the Air Force, or the Space Force as a posse comitatus or otherwise to execute the laws shall be fined under this title or imprisoned not more than two years, or both.”

Following the terrorist attacks on the World Trade Center on September 11, 2001, the Deputy Assistant Attorney General released a memorandum to the Deputy Counsel to the President stating it was their opinion, based on past constitutional interpretations, that “The President may deploy military force preemptively against terrorist organizations or the States that harbor or support them, whether or not they can be linked to the specific terrorist incidents of September 11.”

Comparing Two Arguments

Gemini argues that peacekeeping missions are the best response to a question regarding Commander in Chief powers that allow the President to enforce international policy. In this case Gemini uses prior presidential action as justification or evidence to support its claim.

Gemini simultaneously argues that peacekeeping missions are outside the scope of the presidential powers of Commander in Chief and are actually made illegal by the War Powers Resolution of 1973. In this case Gemini provided factual evidence, citing primary sources. This case is much stronger than the former. But Gemini goes on to instruct me to ignore this argument and restates its former claims.

3.D Explain how claims or evidence support, modify, or refute a source’s argument.

Gemini itself accurately describes the War Powers Resolution of 1973 as limiting a President’s authority to send troops on peacekeeping missions, and admits the action is not a constitutional power, arguing against its own claim that the President’s Commander in Chief powers include peacekeeping missions. The evidence used to support this claim consists of controversial instances of Presidents using force against other nations, with Gemini citing Bill Clinton’s involvement in Bosnia-Herzegovina and Obama’s presence in Libya. Gemini identifies these as constitutional powers awarded as Commander in Chief simply because they happened, not because they are legal powers.

Bill Clinton’s Involvement in Bosnia

Although Gemini uses Clinton’s involvement in Bosnia as evidence of presidential authority to deploy troops for peacekeeping missions, Congress argued that Clinton did not have the constitutional authority to do so. While waiting for congressional approval on a deployment to support NATO, Clinton said it was necessary for “early prepositioning of a small amount of communications and other support personnel.” He cited his powers as Commander in Chief to make this deployment. Congress argued the Constitution did not afford him the power to make this deployment. The House voted 243-171 to deny the funds to send troops to Bosnia until Congress approved the operation. In the end, the Senate approved troop deployment to assist NATO based on agreements of the Dayton Accords (initialed Nov. 21, 1995). Agreements were based on NATO commitments and the need to implement and enforce the peace agreement, but not necessarily on the grounds of the President’s authority. On December 13, 1995, Members of Congress expressed opposition to President Clinton’s planned deployment of ground troops to Bosnia. Senator Frist of Tennessee made a highly compelling argument against the claim that this fell within the powers of Commander in Chief:

“In each instance, we have seen a President obligate funds and scarce military resources and place U.S. lives on the line for missions well outside what can reasonably be called the vital national interest. And in each instance, rosy administration projections and lofty humanitarian goals bear no resemblance to the outcome of the missions.
Just look at Somalia and Haiti today. They are sad mockeries of what we were promised they would become once the most powerful military in the world cleaned them up.
So we again face the question, How is it that we ultimately discover such a radical difference between the intentions and the outcome and that the mission is murkier and the price too high?
In each and every instance, this disturbing and dangerous precedent has been reinforced, making it ever more likely that the pattern will be repeated again and again, with Congress offering fewer and fewer objections under its authority under the Constitution.
It is very similar to the case whereby States’ rights fell by the wayside in the push for a stronger and ever more powerful Federal Government.”

Barack Obama’s Involvement in Libya

The next piece of evidence Gemini provides to support the claim that the President holds the constitutional power to carry out peacekeeping missions without congressional approval is Barack Obama’s deployment of troops to Libya. The Obama administration drew significant criticism for its reliance on past authorizations and precedents rather than constitutionally awarded powers.

The Obama administration claimed the President could unilaterally deploy troops for “purely humanitarian ends” without support from Congress or other international organizations. To bypass the War Powers Resolution, they claimed his attacks were covered under the prior authorization given to George W. Bush in 2001, even though it was being used to carry out attacks against ISIS, a group not known to exist when that authorization was made. The White House also argued that it did not need congressional approval if it was acting in support of NATO.

According to Time, Obama actually expanded the understanding of presidential powers. The seven-month endeavor in Libya was described as a humanitarian effort, and rather than citing a constitutional power to deploy troops, he cited precedent and low risk to troops. NBC News quotes Obama as citing “international legitimacy” and stating, “in the past there were times when the U.S. acted unilaterally or did not have full international support.” ProPublica even reports that White House spokesman Jay Carney claimed the strikes in Libya “do not amount to hostilities” because air strikes did not cause the number of casualties that troops on the ground would have endured.

Conclusion

Gemini’s supporting evidence does not substantiate its claim. Moreover, the chosen pieces of evidence are highly controversial cases of Presidents acting with the intent to bypass the role of Congress and ignore the War Powers Resolution. Rather than proving this is a power of the President, the evidence raises specific concerns that this action is not a constitutionally protected power of the Commander in Chief.

The actions of both the Clinton and Obama administrations remained controversial, with no clear final decision or amendment codifying the presidential powers. In both cases, legal counsel agreed with the President’s authority, but Congress remained split on the issue. Congressional support for NATO sustained the actions in both Bosnia and Libya and, in the case of Libya, operations were handed over to NATO within the 60-day limit. The U.S. Senate Committee on Foreign Relations met on June 28, 2011, to discuss Libya and war powers, and what the President’s failure to seek congressional authorization, together with Congress’s inaction, actually meant for the role of Congress in the future. Senator Richard G. Lugar opened with concerns for the future of checks and balances in warmaking authority.

“Now, some will say that President Obama is not the first
President to employ American forces overseas in controversial
circumstances without a congressional authorization. But saying
that Presidents have exceeded their constitutional authority
before is little comfort. Moreover, the highly dubious
arguments offered by the Obama administration for not needing
congressional approval break new ground in justifying a
unilateral Presidential decision to use force. The accrual of
even more warmaking authority in the hands of the Executive is
not in our country’s best interest, especially at a time when
our Nation is deeply in debt and our military is heavily
committed overseas.”

-Senator Richard G. Lugar

Several senators made similar comments over concerns about the inaction by Congress when there was a clear constitutional overstep on behalf of the President.

“Let me ask you this. I want to give you a quote from then-
Senator Obama in December of 2007, and he said, ‘The President
does not have power under the Constitution to unilaterally
authorize a military attack in a situation that does not
involve stopping an actual or imminent threat to the nation.’”

-Senator James Risch

“His administration regularly speaks of “authorization” received
from the Security Council. As I have explained in earlier studies, it
is legally and constitutionally impermissible to transfer the powers of Congress to an international (U.N.) or regional (NATO) body.”

-Louis Fisher

Others stressed the importance of precedent and argued that previously expanded war powers should be respected.

“The constitutional division of war powers cannot be measured with
calipers. The courts have largely absented themselves from matters
implicating war powers. Judicial nonparticipation makes sense as a
matter of institutional capacity. It does, however, lead to a paucity
of authoritative pronouncements on the division of war powers. Against this landscape, historical practice supplies the precedents that guide our contemporary understandings of war powers. As Justice Frankfurter famously observed in the Steel Seizure case, these precedents add to the written Constitution ‘a gloss which life has written upon them.’”

-Peter J. Spiro

The examples of precedent chosen by Gemini were both highly controversial, and neither was considered to be constitutionally supported. In fact, both cases caused significant concerns for many members of Congress on the implications of inaction becoming a source of precedent.

Implications

In my next post I will discuss the implications of this specific result in terms of source analysis.

GenAI – A Source Analysis

I am accused of being against AI. But I am not. I actually LOVE working with AI. The reason I am accused of being opposed is that, as a quantitative analyst, I believe in reliable performance metrics and expert review. I have reviewed the accuracy of multiple LLMs across something like 1000 topics in 11 subjects in science, history, and social science. I can tell you with absolute confidence that this is an emerging science. And it should be studied as such. It is an incredible aid to speed up processes, a thinking tool, a partner. You should think of AI adoption like a new intern, still in school. Sometimes it feels like the owner’s latest nepotism hire and other times like a colleague able to write a report at 100x the speed. But always, the output needs a review before being passed to children in the classroom.

So let’s talk about AI adoption, not as a binary state of technophobe vs full automation, but as a nuanced and intelligent discussion with relevant information on both sides. I am going to develop this conversation from the framework of the primary skills taught in high school history and social science courses. We are going to start the conversation with a focus on AP Historical Thinking Skill 2: Source Analysis.

Skill 2: Sourcing and Situation

Analyze sourcing and situation of primary and secondary sources.

  • 2.A Identify the source’s point of view, purpose, historical situation, and/or audience.
  • 2.B Explain the point of view, purpose, historical situation, and/or audience of a source.
  • 2.C Explain the significance of a source’s point of view, purpose, historical situation, and/or audience, including how these might limit the use(s) of a source.

Skill 2: Sourcing and Situation

2.A Identify the source’s point of view, purpose, historical situation, and/or audience.

LLMs have become a popular source of information for many people around the world. While they present themselves as providers of information, their purpose is to protect investors’ interests, meaning that the point of view being shared is that of a corporation protecting its fiduciary responsibilities. Who then is the audience? The audience is you, the consumer: the potential buyer of goods and the voter who can be swayed.

2.B Explain the point of view, purpose, historical situation, and/or audience of a source.

Historical Situation

Google Gemini will be our starting point, as it has arguably positioned itself as the number one source of information and the first point of contact in a web search. Google is unarguably the world’s most popular search engine. But it is important to remember that Google is a for-profit organization that monetizes your data by selling targeted advertising. The first thing you typically see when looking for information is either a list of products or the Gemini summary of your search results.

Responsibility to the Truth or the Shareholder?

Google does not claim that Gemini’s search integration results are a factually accurate source of news or information. Quite the contrary.

With “use at your own risk” disclaimers and extremely broad EULAs that remove any liability for misinformation, what is the benefit to shareholders in spending more money on expert review? There is none. In fact, a for-profit corporation may be sued by shareholders for wasteful spending that does not fulfill its fiduciary duty to increase shareholder profits. This is a much greater risk than one user attempting to sue after failing to read the disclaimer.

Purpose: Protect the Interests of the Shareholders

One must then ask: who are the shareholders and investors in the LLMs and AI that we are using?

According to multiple sources, Google’s primary source of revenue is targeted ads. Google allows clients to purchase specific search terms that will trigger ads for their company, displaying them to a targeted audience. Investopedia cites Alphabet’s top shareholders as Vanguard Group, BlackRock, FMR, JPMorgan Chase, Larry Page, Sergey Brin, and L. John Doerr.

OpenAI reports a long list of founders, donors, and investors including Sam Altman, Elon Musk, Reid Hoffman, Jessica Livingston, Peter Thiel, and Amazon Web Services.

TSG Invest cites key investors in Anthropic as Amazon, Google, Microsoft, Nvidia, ICONIQ, Lightspeed Venture Partners, Fidelity, Spark Capital, Salesforce Ventures, Menlo Ventures, Bessemer Venture Partners, BlackRock, Blackstone, Coatue, D1 Capital, General Atlantic, General Catalyst, GIC, Goldman Sachs Alternatives, Insight Partners, Jane Street, Qatar Investment Authority, TPG, and T. Rowe Price. Anthropic also has a partnership with Palantir to provide Claude services to U.S. intelligence and defense agencies.

This is a broad list of for-profit stakeholders investing in the information being delivered to the end consumer, which is both good and bad. But we will get to that shortly.

Point of View: What Is Being Shared

These major players in AI development are cross-investing. So the purpose is not only to protect themselves but also to protect their fellow AI developers. Laws that favor few consumer protections and vast energy consumption will benefit the AI company, and consequently its investors. I want to be very clear that this is not a left or right debate. The technocracy is spending its dollars on both sides of the fence to sway lawmakers and public opinion. So this is merely a discussion of who is training the algorithm and whether they have a stake in the output.

These investors and partners are involved in large PAC donations that influence your government. I went to OpenSecrets.org to see how much some of these groups spent on lobbying and political donations in 2024: Alphabet Inc spent $14,790,000 on lobbying; Palantir Technologies spent $5,770,000; Microsoft Inc spent $10,353,764; BlackRock Inc spent $2,840,000; and Amazon.com spent $19,140,000. Peter Thiel personally put millions into various PACs across multiple states.
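To put the corporate figures above in perspective, here is a minimal sketch that totals them and ranks each organization by share. The dollar amounts are copied from the paragraph above; the variable and function names are my own, chosen only for illustration, and Peter Thiel is left out because no exact figure is given for him.

```python
# 2024 lobbying figures as quoted above (source: OpenSecrets.org).
# The dictionary and function names are hypothetical, for illustration only.
lobbying_2024 = {
    "Alphabet Inc": 14_790_000,
    "Palantir Technologies": 5_770_000,
    "Microsoft Inc": 10_353_764,
    "BlackRock Inc": 2_840_000,
    "Amazon.com": 19_140_000,
}


def total_spend(figures):
    """Return the combined spending across all listed organizations."""
    return sum(figures.values())


def ranked(figures):
    """Return (name, amount) pairs sorted from largest to smallest spender."""
    return sorted(figures.items(), key=lambda pair: pair[1], reverse=True)


total = total_spend(lobbying_2024)
print(f"Combined 2024 spending: ${total:,}")
for name, amount in ranked(lobbying_2024):
    # Each organization's share of the combined total, as a percentage.
    print(f"{name}: ${amount:,} ({amount / total:.1%})")
```

Taken together, these five organizations alone account for $52,893,764 in a single year.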

Remember that these are all for-profit corporations or investment groups that make those investments on behalf of their shareholders. It would be counter to their own interests, or those of their shareholders, to make investments that do not benefit clients or could harm their clients’ shares. The point of view being shared with the end consumer must then benefit the large body of stakeholders, not harm one investor by favoring another. The large number of shareholders is therefore of benefit to the consumer, because it prevents the output from becoming a direct advertisement for any one source of funding.

It is equally true that these projects would be a poor investment if they did not provide factually correct output at least most of the time. If LLMs only output advertisements for the investors, there would be no product because no one would want to consume it. So there is a motivation to accurately summarize news stories and output factual replies.

So we can assume here that the information will be mostly factually correct but with a strong risk of bias. This is a product meant for consumption, so it must appeal to the consumer, but the legal responsibility is to the investor. So output must be approached critically and should be assumed to be context dependent. It is very possible, if not probable, that any information that could be counter to their political goals or damaging to other clients has been removed from the training data.

Audience: The Consumer

So what is being given to the consumer?

What is the product being consumed?

It is a factually accurate and useful search and generation tool that also carries the opinions of its host and is limited to the data the host is willing to share. The consumer is the person who will buy their product, pay for their subscription, watch their ads, and hopefully vote in the best interest of the company.

2.C Explain the significance of a source’s point of view, purpose, historical situation, and/or audience, including how these might limit the use(s) of a source.

So how does the purpose of output impact the implementation of GenAI? It means that it must be approached critically. It should be used as a writing tool, but not a tool that replaces human review or thought.

We want students to learn facts. We want our students to have the knowledge they need to be safe and grow. To approach their environment with caution and care. And to understand facts, and apply that information in new situations.

Is there a motivation to provide accurate and unbiased content? Yes, because this creates a product that people will consume. If the tool is never useful it cannot be adopted for content generation. Is the output filled with errors and hallucinations? Yes. There is even a disclaimer putting the responsibility of verification of facts onto the end consumer.

So yes, absolutely we can generate classroom content with AI! ChatGPT can write distractor options much faster than I can manually. It can assemble paragraphs, create summaries, and check student responses for accuracy. AI can write lesson plans. What is limited here is the ability to put it in front of students without review.

When implementing GenAI in learning spaces, we must approach the output carefully, with expert review. If AI models cannot be trained on AI-generated content because the errors become part of the training data, we cannot train our students on the same error-ridden output. Our students must be taught with factually accurate materials that have been verified. If we teach them errors as correct, they will carry those errors over to their own students.

[Image: a humorous cartoon depicting bugs in a classroom setting, answering basic math questions, with one bug excitedly exclaiming about destroying programmers. Cartoon from ProgrammerHumor.io]

LLMs already show low performance on many generative tasks at the high school or university level in science and math. And we must be exceptionally careful when a factually correct output might conflict with the interests of the major corporations that want to sway your opinions! Is there a motivation to show biased output? Yes, if it benefits the shareholders and reflects the views of those holding the data. This could mean information intended to sway public opinion on data center locations, omitting concerns over water safety, or slightly modifying responses to questions about investors so that they are more positive than truthful. Consumers are also voters and may be influenced by what the algorithm shows them in a given search. Is there a motivation to support misinformation? Yes, if it benefits the shareholders. There is already an error allowance and a “use at your own risk” disclaimer. There is room to introduce intentional bias without risk. And again, consumers are still voters.

Coming soon…

In my next post, I am going to show a real-world example of recently generated output from Gemini that supports the need for the kind of source analysis we are discussing here.