How do I find and edit my robots.txt file?

Solved by paul2009

Posted

Site URL: https://www.stevemaskery.com/

Hi, I'm new to SS and don't have a lot of previous experience with HTML.

I have two domains, workshopessentials.com and stevemaskery.com, both of which are forwarded to:

https://amphibian-celery-b7en.squarespace.com

I'm getting a nag from Google saying that I need to add a couple of lines to my robots.txt file so that it can index my images properly. How do I find and access this file please?

Many thanks
Steve

 

  • Solution
Posted (edited)
On 3/16/2022 at 10:47 AM, SteveMaskery said:

How do I find and access [my robots.txt] file please?

All Squarespace sites use the same robots.txt file and as a Squarespace user you cannot access or edit it. Squarespace have already specified the pages that should not be crawled by search engines because they’re for internal use only or display duplicate content. For example, /config/ is your Admin login page, and /api/ blocks the Analytics tracking cookie.

Here are some pages that Squarespace ask search engines not to crawl. These pages organize content that exists elsewhere on your site.

/search

/*?author=*

/*&author=*

/*?tag=*

/*&tag=*

/*?month=*

/*&month=*

/*?view=*

/*&view=*

/*?format=*

/*&format=*

/*?reversePaginate=*

/*&reversePaginate=*

For a complete list of the excluded pages, view the robots.txt file on any Squarespace website.
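To see how patterns like the ones listed above apply to a given URL, here is a minimal Python sketch. It assumes the common wildcard convention (`*` matches any run of characters, a trailing `$` anchors the end of the URL) and covers only a handful of the patterns quoted in this post, not the full file. The function names and example paths are made up for illustration, and `Allow` rules and rule precedence are ignored. As far as I know, Python's built-in `urllib.robotparser` does plain prefix matching, which is why the wildcard handling is written by hand here.

```python
import re

# A few of the Disallow patterns quoted in the post above (not the full file).
DISALLOWED = [
    "/search",
    "/*?author=*",
    "/*&author=*",
    "/*?tag=*",
    "/*&tag=*",
    "/*?format=*",
    "/*&format=*",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end.
    Everything else is treated as literal text.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, patterns=DISALLOWED) -> bool:
    """Return True if any pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path_and_query) for p in patterns)


print(is_disallowed("/blog?tag=woodwork"))   # True: a tag filter view
print(is_disallowed("/blog/my-first-post"))  # False: an ordinary blog post
```

The point of the sketch is that these rules target filtered and search views, while an ordinary page or post URL does not match any of them.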

Edited by paul2009
Added additional information

Me: I'm Paul, a SQSP user for >18 yrs & Circle Leader since 2017. I value honesty, transparency, diversity and good design ♥.
Work: Founder of SF.DIGITAL. We provide high quality original extensions to supercharge your Squarespace website. 
Content: Views and opinions are my own. Links in my posts may refer to my own SF.DIGITAL products or may be affiliate links.
Forum advice is completely free. You can thank me by selecting a feedback emoji. Buying a coffee is generous but optional.

Posted (edited)

Ah, OK, thank you.

Google is telling me this:

Quote

 

Dear Google Merchant Center user,

Merchant Center Account:  xxxxxxxxx

We have detected a recent configuration change of the site hosting your images that result in the disapproval of some of your items in your Merchant Center account.

Since images are an important part of the rich product information shown in Shopping ads, we require that all items include a valid image that can be indexed by Google. We crawl the images you submit to Merchant Center every few weeks to ensure that users always see the most recent version. Our ability to crawl your images can be restricted with a robots.txt file, which might lead to the disapproval of affected items. Learn more about robots.txt by visiting https://support.google.com/webmasters/answer/6062608.

What's the issue?
We have detected that a robots.txt file that controls the indexing of some of your provided images has been updated recently. As a result of these updates we aren't able to index the images of some of your items. This will result in the disapproval of the affected items.

Details and impact:
Estimated percentage of offers affected: 100
File: http://www.workshopessentials.com/robots.txt

In order for us to access these images, please modify the robots.txt file mentioned above to allow the user-agent Googlebot-Image to index these images. You can do this by adding the following lines to the robots.txt file:

User-agent:       Googlebot-Image
Disallow:

If modifying this robots.txt file is not feasible you might want to consider hosting your images on a different hosting service that allows images to be indexed by Google.

 

Is this something I should be worried about or can I safely ignore it?

Many thanks
Steve
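For anyone curious what the two lines in Google's email actually do, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the image URL are invented for illustration and are not Squarespace's real file. An empty `Disallow:` under a user agent means that nothing is disallowed for that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every crawler, then add Google's two lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot-Image
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

image_url = "https://www.example.com/images/chisel.jpg"  # placeholder URL

# The Googlebot-Image group has an empty Disallow, so everything is allowed.
print(parser.can_fetch("Googlebot-Image", image_url))  # True
# Any other crawler falls back to the "*" group, which blocks the whole site.
print(parser.can_fetch("SomeOtherBot", image_url))     # False
```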

 

 

Edited by SteveMaskery
Remove sensitive info
  • 1 month later...
Posted

@burmdoge What issue are you trying to address? 

Posted

Hi Paul,

I'm trying to get the index page of our Squarespace website indexed, but URL inspection in Google Search Console is showing:

URL is not on Google: Indexing errors

Failed: Redirect error  

 https://mesaio.com.au

It looks like it's related to the robots.txt file. Could you please help resolve this issue?

Thank you,

Dion

  • 3 weeks later...
Posted

I'm having the same issue, and my clients are getting very anxious about not being findable on Google. It's been almost 3 months since we launched arcticroadrally.com. Please help!

  • 1 month later...
  • 1 month later...
Posted

Squarespace robots.txt is blocking Google from indexing my blog posts.

How do I edit the robots.txt to stop this? Totally ridiculous. Checking before I have to move my site off of Squarespace as it will not allow blog posts to be indexed. Regular pages are indexed.

Screen Shot 2022-09-05 at 11.27.31 AM.png

  • 1 month later...
  • 2 weeks later...
Posted
On 9/5/2022 at 5:29 PM, MJB1923 said:

Squarespace robots.txt is blocking Google from indexing my blog posts.

How do I edit the robots.txt to stop this? Totally ridiculous. Checking before I have to move my site off of Squarespace as it will not allow blog posts to be indexed. Regular pages are indexed.

Screen Shot 2022-09-05 at 11.27.31 AM.png

I'm experiencing exactly the same thing! Can someone from Squarespace please help? Blog posts are certainly NOT pages that Google should not crawl. Tell us how to fix it, or a lot of people will move their business away from Squarespace.

Posted
On 11/15/2022 at 11:14 AM, Chiara1234 said:

Squarespace robots.txt is blocking Google from indexing my blog posts. Can someone from Squarespace please help? Because blogposts sure are NOT pages that google should not crawl.

@Chiara1234 I believe you have misunderstood the message from Google.

Squarespace is not blocking Google from indexing your blog posts. Google is indexing your blog, but Squarespace is correctly preventing Google from crawling the category links on your blog page, for example /blog/category/Reviews.

This link leads to duplicate content (the same blog posts that have already been indexed individually) and so this link should not be indexed, as I mentioned in my answer above.

  • 8 months later...
Posted
On 3/6/2023 at 7:36 AM, Desi-Rae said:

Can Squarespace PLEASE stop controlling this? It's unnecessary, has far reaching negative implications for users, and why? Who decided these things? Why should it not be indexed? Why can't the user decide this? What exactly will this break for Squarespace while it is positively hurting customers?

As a web developer with 14 years' experience, I 100% disagree. A robots.txt file is extremely important to SEO, and given that Squarespace has intimate knowledge of its own systems that everyday users could not possibly have, it absolutely should set the default robots.txt.

You need to have more trust in a company that literally creates and runs millions of websites than in some article you read online... context is key here. You get the benefit of their expertise, and essentially they are saving you from killing your SEO in hundreds of different ways through a misconfigured robots.txt or missing rules, because you don't know all the paths Squarespace has available.

You have asked a lot of questions, and if you don't already know the answers, you should not be making changes to a robots.txt, as you will ruin your website, despite what you think.

Ideally, a section in the admin, with sufficient warnings, that lets you add rules that are appended to the default robots.txt could be good. Then again, it would be a very advanced feature, and users would need to understand some really important whys and why-nots before adding pages there.

  • 4 months later...
Posted
On 8/8/2023 at 5:43 AM, BenCircular said:

As a web developer with 14 years' experience, I 100% disagree. A robots.txt file is extremely important to SEO, and given that Squarespace has intimate knowledge of its own systems that everyday users could not possibly have, it absolutely should set the default robots.txt.

You need to have more trust in a company that literally creates and runs millions of websites than in some article you read online... context is key here. You get the benefit of their expertise, and essentially they are saving you from killing your SEO in hundreds of different ways through a misconfigured robots.txt or missing rules, because you don't know all the paths Squarespace has available.

You have asked a lot of questions, and if you don't already know the answers, you should not be making changes to a robots.txt, as you will ruin your website, despite what you think.

Ideally, a section in the admin, with sufficient warnings, that lets you add rules that are appended to the default robots.txt could be good. Then again, it would be a very advanced feature, and users would need to understand some really important whys and why-nots before adding pages there.

I would have to disagree with you on this - I have worked in e-commerce and SEO for 15 years and being able to control your own robots.txt is a pretty essential functionality. 

I'm not so interested in what Squarespace chooses not to index by default (I trust them on that too), but I have additional pages on my site that I want to block from being indexed or crawled. As far as I can see, I can't define this, and without access to robots.txt I don't have the power to.

There are many people out here using Squarespace sites who aren't developers but have a lot of web experience and do need to be able to manage the SEO of their sites properly. If Squarespace wants to claim to be a good platform for SEO, then they need to provide the tools for businesses to make that happen.

I've worked with many major e-commerce and web platforms and consider access to this to be a very standard feature these days.

Posted
1 hour ago, asillince said:

I have additional pages on my site that I want to be able to block from being indexed/crawled and as far as I can see I'm not able to define this

Squarespace has a built in feature that allows you to hide pages (and collections) from search engines. You'll find this setting in the settings panel of each page, within the SEO tab. To find this, hover over a page title in the PAGES panel and click the gear icon next to it.
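If you want to confirm that a page really is hidden after changing this setting, one option is to look for a robots meta tag in the page source. The sketch below assumes the setting is delivered as a `noindex` robots meta tag in the HTML; if it were sent only as an HTTP header, this check would not see it. The class name, function name, and sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Invented sample pages, standing in for saved page source.
hidden_page = '<html><head><meta name="robots" content="noindex"></head></html>'
visible_page = "<html><head><title>Shop</title></head></html>"

print(has_noindex(hidden_page))   # True
print(has_noindex(visible_page))  # False
```

To use it on a real page, save the page source from your browser and pass its contents to `has_noindex`.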

Posted (edited)

Google Search Console is telling me that my https://home.mydomain.com/robots.txt file is 404 Not Fetched (Not Found) - Hostload Exceeded. This is a Squarespace homepage and I'm not sure why it's not working correctly. I checked the SEO settings and everything looks normal. Any ideas?

Edited by ChrisL123
  • 4 months later...
Posted
On 1/2/2024 at 5:58 PM, paul2009 said:

Squarespace has a built in feature that allows you to hide pages (and collections) from search engines. You'll find this setting in the settings panel of each page, within the SEO tab. To find this, hover over a page title in the PAGES panel and click the gear icon next to it.

As far as I can see this doesn't apply to individual product pages. Can you confirm that this is correct and if there is a way to do that?

Posted
2 hours ago, asillince said:

As far as I can see this doesn't apply to individual product pages. Can you confirm that this is correct and if there is a way to do that?

It is true that the products themselves do not have settings to control SEO access. The SEO setting applies to the whole collection.

Two possible solutions.

Create two Store pages (collections): one that allows crawlers and one that does not.

Please see the following. The issue in this thread is different but the JavaScript would be similar. I know of no code that currently does this.

 

Find my contributions useful? Please like, upvote, mark my answer as the best ( solution ), and see my profile. Thanks for your support! I am a Squarespace ( and other technological things ) consultant open for new projects.

  • 2 weeks later...
Posted
On 1/2/2024 at 8:58 AM, paul2009 said:

Squarespace has a built in feature that allows you to hide pages (and collections) from search engines. You'll find this setting in the settings panel of each page, within the SEO tab. To find this, hover over a page title in the PAGES panel and click the gear icon next to it.

The problem is not with Squarespace URLs - it's actually with pages that DON'T exist on the Squarespace site, but DID exist on the prior site.

Case in point: we've moved multiple clients from WordPress to Squarespace. In each case, old WP URLs are actively being crawled by Google to this day. We need the option to disallow Google from crawling folders like /wp-includes/ and /wp-content/ because that content is simply no longer there.

P.S.

A 301 redirect is not an appropriate solution here because the old URLs are spammy content caused by a hack of the old website. Squarespace does not offer an option to set a 410 server response at the individual URL level either (another feature Squarespace should absolutely offer to site administrators).

Posted

Regarding 410s, a 404 status code is fine.

 

Building high-performance websites with expert SEO for local and national reach. Have an SEO question? Google to find our advice or book a Zoom session for tailored help.  Squarespace since 2013 

  • 2 weeks later...
Posted

I get it: we're not allowed to edit robots.txt.

Then please let me know how else I can fix the 401 error so that Google can crawl the website. It seems that the robots.txt blocks crawling.

[Screenshot attached]

  • 2 weeks later...
Posted (edited)

I am at a complete loss, hoping someone can help.

Suddenly nothing is showing up for my website, see screen shots below. I haven't added any new code to my site, none of my main pages are toggled to be hidden from search results, and I've spent hours trying to figure out how to fix this. I did toggle off AI Crawlers, but nothing else.

Thanks in advance for any insight or help.

 

 

 

[Attached screenshots: image.png, Screenshot 2024-06-29 at 16.52.54.png]

Edited by ilseS
More detail
Posted

@paul2009 or anyone here, I wonder if you might be able to help? I'm out of my league with robots.txt. Suddenly my site and all of its pages appear to be blocked on Google due to a robots.txt setting, and I have ZERO idea why. I've not done anything new or installed any strange code, and I'm at a loss...

Any help would be tremendously appreciated 🙂

 

[Attached screenshots: image.png, Screenshot 2024-06-29 at 16.52.54.png]

  • ilseS changed the title to HELP please? suddenly my site is blocked in google search results by Robots.txt ...
Posted (edited)
1 hour ago, ilseS said:

suddenly my site and all of its pages appear to be blocked on Google due to a robots.txt setting

I suspect that crawling has been disabled in Settings > Website > Crawlers.

[Screenshot attached]

Looking at your robots.txt we can see the disallow.

[Screenshot attached]

Please see Excluding your site from AI scans.

Let us know how it goes.
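If you'd rather check this yourself than read the file by eye, here is a rough sketch that lists which user agents are blocked from an entire site. Copy the text of your own robots.txt (yourdomain.com/robots.txt) into the function. The sample content and the function name are invented for illustration and are not taken from any real Squarespace site.

```python
def fully_blocked_agents(robots_txt: str) -> list[str]:
    """Return user agents whose group contains a bare 'Disallow: /'."""
    blocked = []
    agents = []        # user agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif key in ("allow", "disallow"):
            in_rules = True
            if key == "disallow" and value == "/":
                blocked.extend(a for a in agents if a not in blocked)
    return blocked


# Invented sample: two crawlers blocked entirely, everyone else only partly.
SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /config
Disallow: /search
"""

print(fully_blocked_agents(SAMPLE))  # ['GPTBot', 'CCBot']
```

If `*` shows up in the result, every crawler that follows robots.txt is being asked to stay away from the whole site, which would match the symptom described above.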

Edited by creedon
