PoshCode PowerShell Code Repository

Get-FileEncoding by JasonMArcher, 4 years ago (modification of a post by Chad Miller)

The Get-FileEncoding function determines a file's encoding by inspecting its byte order mark (BOM). A file with no recognized BOM is reported as ASCII.

function Get-FileEncoding {
<#
.SYNOPSIS
Gets file encoding.
.DESCRIPTION
The Get-FileEncoding function determines a file's encoding by inspecting its byte order mark (BOM).
Files without a recognized BOM are reported as ASCII.
Based on a port of C# code from http://www.west-wind.com/Weblog/posts/197245.aspx
.EXAMPLE
Get-ChildItem *.ps1 | select FullName, @{n='Encoding';e={Get-FileEncoding $_.FullName}} | where {$_.Encoding -ne 'ASCII'}
This command gets the ps1 files in the current directory whose encoding is not ASCII.
.EXAMPLE
Get-ChildItem *.ps1 | select FullName, @{n='Encoding';e={Get-FileEncoding $_.FullName}} | where {$_.Encoding -ne 'ASCII'} | foreach {(get-content $_.FullName) | set-content $_.FullName -Encoding ASCII}
Same as the previous example, but rewrites each matching file as ASCII using Set-Content.
Characters outside the ASCII range are lost in the conversion.
.NOTES
Version History
v1.0   - 2010/08/10, Chad Miller - Initial release
v1.1   - 2010/08/16, Jason Archer - Improved pipeline support and added detection of little endian BOMs.
#>
    [CmdletBinding()]
    param (
        [Alias("PSPath")]
        [Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)]
        [String]$Path
    )

    process {
        # Default result when no BOM is recognized.
        $Encoding = "ASCII"
        # Read at most the first four bytes of the file as a single chunk.
        # -Encoding Byte is Windows PowerShell syntax; PowerShell 6+ uses -AsByteStream instead.
        [Byte[]]$byte = Get-Content -Encoding Byte -ReadCount 4 -TotalCount 4 -Path $Path

        # Every BOM checked below is at least two bytes long, so shorter (or empty) files have none.
        if ($null -eq $byte -or $byte.Length -lt 2) {
            return $Encoding
        }

        # The UTF-32 checks come before the UTF-16 ones because the UTF-32 LE BOM
        # (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
        if ($byte.Length -ge 3 -and $byte[0] -eq 0xEF -and $byte[1] -eq 0xBB -and $byte[2] -eq 0xBF) {
            $Encoding = "UTF8"
        } elseif ($byte.Length -ge 4 -and $byte[0] -eq 0 -and $byte[1] -eq 0 -and $byte[2] -eq 0xFE -and $byte[3] -eq 0xFF) {
            ## UTF-32 Big-Endian
            $Encoding = "UTF32"
        } elseif ($byte.Length -ge 4 -and $byte[0] -eq 0xFF -and $byte[1] -eq 0xFE -and $byte[2] -eq 0 -and $byte[3] -eq 0) {
            ## UTF-32 Little-Endian
            $Encoding = "UTF32"
        } elseif ($byte[0] -eq 0xFE -and $byte[1] -eq 0xFF) {
            ## 1201 UTF-16 Big-Endian
            $Encoding = "Unicode"
        } elseif ($byte[0] -eq 0xFF -and $byte[1] -eq 0xFE) {
            ## 1200 UTF-16 Little-Endian
            $Encoding = "Unicode"
        } elseif ($byte.Length -ge 3 -and $byte[0] -eq 0x2B -and $byte[1] -eq 0x2F -and $byte[2] -eq 0x76) {
            $Encoding = "UTF7"
        }

        $Encoding
    }
}
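
The function reads the leading bytes with Get-Content -Encoding Byte, which exists in Windows PowerShell but was replaced by -AsByteStream in PowerShell 6 and later. As a rough sketch of an edition-neutral alternative, the leading bytes can be read through .NET instead. Read-LeadingBytes is a hypothetical helper name and is not part of the original post; it assumes a full file system path, because [System.IO.File] resolves relative paths against the process working directory rather than the current PowerShell location.

```powershell
# Hypothetical helper (not from the original post): returns up to $Count leading bytes of a file.
function Read-LeadingBytes {
    param (
        [Parameter(Mandatory = $true)]
        [String]$LiteralPath,   # assumed to be a full file system path
        [Int]$Count = 4
    )

    $stream = [System.IO.File]::OpenRead($LiteralPath)
    try {
        $buffer = New-Object Byte[] $Count
        $read = $stream.Read($buffer, 0, $Count)
        # Return only the bytes actually read; an empty file yields an empty array.
        if ($read -le 0) { return ,[Byte[]]@() }
        return ,[Byte[]]$buffer[0..($read - 1)]
    } finally {
        $stream.Dispose()
    }
}

# Usage: write a UTF-8 BOM followed by the letter 'A' to a temp file, then read it back.
$tmp = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllBytes($tmp, [Byte[]](0xEF, 0xBB, 0xBF, 0x41))
$bytes = Read-LeadingBytes -LiteralPath $tmp
Remove-Item -LiteralPath $tmp
```

The resulting byte array could then be compared against the same BOM patterns that Get-FileEncoding checks.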
